<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.4" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Djoerd Hiemstra's home page</title>
	<link>http://wwwhome.cs.utwente.nl/~hiemstra</link>
	<description>A bit of teaching, some research, shake well...</description>
	<pubDate>Thu, 18 Mar 2010 20:19:25 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.4</generator>
	<language>en</language>
			<item>
		<title>Query-based Sampling: Can we do Better than Random?</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/querybased-sampling-can-we-do-better-than-random.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/querybased-sampling-can-we-do-better-than-random.html#comments</comments>
		<pubDate>Tue, 16 Mar 2010 15:13:21 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Paper abstracts</category>
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/querybased-sampling-can-we-do-better-than-random.html</guid>
		<description><![CDATA[
by Almer Tigelaar and Djoerd Hiemstra

Many servers on the web offer content that is only accessible
via a search interface. These are part of the deep web.
Using conventional crawling to index the content of these remote
servers is impossible without some form of cooperation.
Query-based sampling provides an alternative to crawling
requiring no cooperation beyond a basic search interface. [...]]]></description>
			<content:encoded><![CDATA[
<p>by Almer Tigelaar and Djoerd Hiemstra</p>

<p>Many servers on the web offer content that is only accessible
via a search interface. These are part of the deep web.
Using conventional crawling to index the content of these remote
servers is impossible without some form of cooperation.
Query-based sampling provides an alternative to crawling
requiring no cooperation beyond a basic search interface. In
this approach, conventionally, random queries are sent to a
server to obtain a sample of documents of the underlying
collection. The sample represents the entire server content.
This representation is called a resource description. In this
research we explore whether better resource descriptions can be
obtained by using alternative query construction strategies.
The results indicate that randomly choosing queries from the
vocabulary of sampled documents is indeed a good strategy.
However, we show that, when sampling a large collection,
using the least frequent terms in the sample yields a better
resource description than using randomly chosen terms.</p>

<p>[<a href="/~hiemstra/papers/tr-ctit-10-04.pdf">download pdf</a>]</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/querybased-sampling-can-we-do-better-than-random.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Learning to Merge Search Results</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/learning-to-merge-search-results.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/learning-to-merge-search-results.html#comments</comments>
		<pubDate>Tue, 16 Mar 2010 15:05:46 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Paper abstracts</category>
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/learning-to-merge-search-results.html</guid>
		<description><![CDATA[
Learning to Merge Search Results for Efficient Distributed Information Retrieval

Kien Tjin-Kam-Jet and Djoerd Hiemstra


Merging search results from different servers is a major problem
in Distributed Information Retrieval. We used Regression-SVM and
Ranking-SVM, which learn a function
that merges results based on information that is readily
available, i.e. the ranks, titles, summaries and URLs contained
in the results pages. By [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>Learning to Merge Search Results for Efficient Distributed Information Retrieval</strong></p>

<p>Kien Tjin-Kam-Jet and Djoerd Hiemstra</p>

<p>
Merging search results from different servers is a major problem
in Distributed Information Retrieval. We used Regression-SVM and
Ranking-SVM, which learn a function
that merges results based on information that is readily
available, i.e. the ranks, titles, summaries and URLs contained
in the results pages. By not downloading additional
information, such as the full document, we decrease bandwidth
usage. CORI and Round Robin merging were used as
our baselines; surprisingly, our results show that the SVM methods
do not improve over those baselines.
</p>

<p>[<a href="/~hiemstra/papers/dir2010.pdf">download pdf</a>]</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/learning-to-merge-search-results.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Home Work Series 2 now on-line</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/home-work-series-2-now-on-line.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/home-work-series-2-now-on-line.html#comments</comments>
		<pubDate>Tue, 16 Mar 2010 13:39:21 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Course XML &amp; DB 1</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/home-work-series-2-now-on-line.html</guid>
		<description><![CDATA[
You can find the home work assignments for XML &#038; Databases 1 in the &#8220;Assignments&#8221; section on Blackboard. Deadline: Friday, 2 April 2010.

]]></description>
			<content:encoded><![CDATA[
<p>You can find the home work assignments for XML &#038; Databases 1 in the &#8220;Assignments&#8221; section on <a href="https://blackboard.utwente.nl/bin/common/course.pl?course_id=_1696_1">Blackboard</a>.<br /><strong>Deadline: Friday, 2 April 2010.</strong></p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/home-work-series-2-now-on-line.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Ralf Schimmel graduates on keyword suggestion</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/ralf-schimmel-graduates-on-keyword-suggestion.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/ralf-schimmel-graduates-on-keyword-suggestion.html#comments</comments>
		<pubDate>Mon, 15 Mar 2010 06:35:42 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/ralf-schimmel-graduates-on-keyword-suggestion.html</guid>
		<description><![CDATA[
Keyword Suggestion for Search Engine Marketing

by Ralf Schimmel


Every person acquainted with the web is also a frequent user of search engines
like Yahoo and Google. Any person with a web site makes this web site with a
vision in mind; most of the time this entails being found on the web. Search
engines offer several methods to users [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>Keyword Suggestion for Search Engine Marketing</strong></p>

<p>by Ralf Schimmel</p>

<p>
Every person acquainted with the web is also a frequent user of search engines
like Yahoo and Google. Any person with a web site makes this web site with a
vision in mind; most of the time this entails being found on the web. Search
engines offer several methods to users that help them to be found. One group
of the techniques used in this field is Search Engine Optimization (SEO), which
covers everything that can be done to optimize a web site for the search engine.
The whole idea of SEO is to ensure that a web site is listed in the set of
search results once a matching query is entered by a user. A second important
part of the search engines is Search Engine Advertisement (SEA). Billions of
dollars are paid by companies that bid on keywords that match their advertisements
to a user&#8217;s query. These keywords are hard to find: a company of course
knows what it sells, but it does not know how users search for the same
products or services. Advertising in search engines can be done in multiple
ways. The focus of this research lies in finding many long-tail keywords: words
that often have a low search volume, but which are cheap (low competition)
and which are often specific enough to ensure high conversion rates (a visitor
becomes a customer). Several keyword suggestion techniques are researched and
evaluated for practical use. One applicable technique is chosen, implemented
and evaluated. The chosen technique is a web-based technique that uses
an undirected weighted graph of candidate terms (nodes), where the weight of
an edge is the semantic similarity between the two nodes it connects, and where the
term frequency of the term is stored in the node. The evaluation shows that
the technique is capable of suggesting many relevant keywords that can be used
for search engine marketing. According to the evaluation the technique is capable of
using the term frequencies and the semantic similarities to find and rank suggestions
based on popularity and relevance. The most important conclusion is that, for
single-term suggestions, the system outperforms Google&#8217;s suggestion system.
Google&#8217;s precision on single-term suggestions is better than the precision of the
new tool; however, the relative recall of Google is a lot worse, for both obvious
and non-obvious single-term suggestions. Currently the tool can only be used to
complement Google&#8217;s tool; however, once extended with support for multi-term
suggestions it can replace the entire system.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/ralf-schimmel-graduates-on-keyword-suggestion.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Beyond Shot Retrieval</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/beyond-shot-retrieval.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/beyond-shot-retrieval.html#comments</comments>
		<pubDate>Mon, 08 Mar 2010 14:20:23 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Multimedia Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/beyond-shot-retrieval.html</guid>
		<description><![CDATA[
Searching for Broadcast News Items Using Language Models of Concepts

by Robin Aly, Aiden Doherty, Djoerd Hiemstra, and Alan Smeaton

Current video search systems commonly return video shots
as results. We believe that users may better relate to longer, semantic
video units and propose a retrieval framework for news story items, which
consist of multiple shots. The framework is divided [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>Searching for Broadcast News Items Using Language Models of Concepts</strong></p>

<p>by Robin Aly, Aiden Doherty, Djoerd Hiemstra, and Alan Smeaton</p>

<p>Current video search systems commonly return video shots
as results. We believe that users may better relate to longer, semantic
video units and propose a retrieval framework for news story items, which
consist of multiple shots. The framework is divided into two parts: (1)
a concept-based language model that ranks news items with known
occurrences of semantic concepts by the probability that an important
concept is produced from the concept distribution of the news item and
(2) a probabilistic model of the uncertain presence, or risk, of these
concepts. In this paper we use a method to evaluate the performance of
story retrieval, based on the TRECVID shot-based retrieval ground truth.
Our experiments on the TRECVID 2005 collection show a significant
performance improvement against four standard methods.
</p>

<p><em>The paper will be presented at the 32nd European Conference on Information Retrieval (ECIR) in Milton Keynes, UK.</em></p>

<p>[<a href="/~hiemstra/papers/ecir10story.pdf">download pdf</a>]</p>
]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/beyond-shot-retrieval.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>New release of PF/Tijah</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/new-release-of-pftijah.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/new-release-of-pftijah.html#comments</comments>
		<pubDate>Thu, 04 Mar 2010 10:03:20 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>PF/Tijah</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/new-release-of-pftijah.html</guid>
		<description><![CDATA[
The newest stable release of MonetDB/PF/Tijah is now on-line, version 0.13.0 as part of Pathfinder 0.36.1.

More info on the PF/Tijah site.

]]></description>
			<content:encoded><![CDATA[
<p>The newest stable release of MonetDB/PF/Tijah is now on-line, version 0.13.0 as part of Pathfinder 0.36.1.</p>

<p>More info on the <a href="http://dbappl.cs.utwente.nl/pftijah">PF/Tijah site</a>.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/new-release-of-pftijah.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Erwin de Moel graduates on managing recorded lectures for Collegerama</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/erwin-de-moel-graduates-on-managing-recorded-lectures-for-collegerama.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/erwin-de-moel-graduates-on-managing-recorded-lectures-for-collegerama.html#comments</comments>
		<pubDate>Thu, 04 Mar 2010 10:00:41 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Multimedia Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/erwin-de-moel-graduates-on-managing-recorded-lectures-for-collegerama.html</guid>
		<description><![CDATA[
Expanding the usability of recorded lectures: A new age in teaching and classroom instruction

by Erwin de Moel


The status of recorded lectures at Delft University of Technology has been studied in order to
expand their usability in the university's present and future educational environment. Possibilities for the
production of single file vodcasts have been tested. These videos allow for [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>Expanding the usability of recorded lectures: A new age in teaching and classroom instruction</strong></p>

<p>by Erwin de Moel</p>

<p>
The status of recorded lectures at Delft University of Technology has been studied in order to
expand their usability in the university's present and future educational environment. Possibilities for the
production of single file vodcasts have been tested. These videos allow for an increased
accessibility of the recorded lectures through other distribution platforms.
Furthermore the production of subtitles has been studied. This was done with an ASR system
called SHoUT, developed at University of Twente, and machine translation of subtitles into
other languages. SHoUT-generated transcripts always require post-processing for subtitling.
Machine translation could produce translated subtitles of sufficient quality.
Navigation of recorded lectures needs to be improved, requiring input of the lecturer.
Collected metadata from lecture chapter titles, slide data (titles, content and notes) as well as
ASR results have been used for the creation of a lecture search engine, which also produces
interactive tables of contents and tag clouds for each lecture.
Recorded lectures could further be enhanced with time-based discussion boards, for the
asking and answering of questions. Further improvements have been proposed for allowing
recorded lectures to be re-used in recurring online-based courses.
</p>

<p><a href="http://eprints.eemcs.utwente.nl/17634/">Read More</a></p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/erwin-de-moel-graduates-on-managing-recorded-lectures-for-collegerama.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Google&#8217;s MapReduce patent - no threat to stuffed elephants</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/googles-mapreduce-patent-no-threat-to-stuffed-elephants.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/googles-mapreduce-patent-no-threat-to-stuffed-elephants.html#comments</comments>
		<pubDate>Tue, 23 Feb 2010 06:33:20 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Course MapReduce</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/googles-mapreduce-patent-no-threat-to-stuffed-elephants.html</guid>
		<description><![CDATA[
You now officially have a 50% chance of getting a job at Google.  


Google hired about half the students who took Bisciglia&#8217;s first class.



Read the Register&#8217;s article on MapReduce.

 
]]></description>
			<content:encoded><![CDATA[
<p>You now officially have a 50% chance of getting a job at Google. <img src='http://wwwhome.cs.utwente.nl/~hiemstra/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>

<blockquote>
Google hired about half the students who took Bisciglia&#8217;s first class.
</blockquote>

<p>
Read the <a href="http://www.theregister.co.uk/2010/02/22/google_mapreduce_patent/">Register&#8217;s article on MapReduce</a>.</p>

 
]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/googles-mapreduce-patent-no-threat-to-stuffed-elephants.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>MapReduce book by Lin and Dyer</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/mapreduce-book-by-lin-and-dyer.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/mapreduce-book-by-lin-and-dyer.html#comments</comments>
		<pubDate>Mon, 22 Feb 2010 10:58:38 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Course MapReduce</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/mapreduce-book-by-lin-and-dyer.html</guid>
		<description><![CDATA[
Data-Intensive Text Processing with MapReduce


An interesting book by Jimmy Lin and Chris Dyer is forthcoming, in which
they show how MapReduce can be used to solve large-scale text processing problems, including
examples that use Expectation Maximization training.



This book is about MapReduce algorithm design, particularly for text processing applications. Although our
presentation most closely follows implementations in [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>Data-Intensive Text Processing with MapReduce</strong></p>

<p>
An interesting book by <em>Jimmy Lin</em> and <em>Chris Dyer</em> is forthcoming, in which
they show how MapReduce can be used to solve large-scale text processing problems, including
examples that use Expectation Maximization training.
</p>

<p><em>
This book is about MapReduce algorithm design, particularly for text processing applications. Although our
presentation most closely follows implementations in the Hadoop open-source implementation of MapReduce, this book is explicitly not about Hadoop programming. We don’t for example, discuss APIs, driver programs for composing jobs, command-line invocations for running jobs, etc. 
</em>
</p>

<p>See <a href="http://www.umiacs.umd.edu/~jimmylin/book.html">pre-prints of the book</a>.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/mapreduce-book-by-lin-and-dyer.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Final grades for MapReduce course</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2010/final-grades-for-mapreduce-course.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2010/final-grades-for-mapreduce-course.html#comments</comments>
		<pubDate>Fri, 19 Feb 2010 13:09:19 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Course MapReduce</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2010/final-grades-for-mapreduce-course.html</guid>
		<description><![CDATA[
Final grades for the course are out. You can find them on Blackboard&#8217;s personal grade center.

]]></description>
			<content:encoded><![CDATA[
<p>Final grades for the course are out. You can find them on <a href="http://blackboard.utwente.nl/bin/common/course.pl?course_id=_1695_1">Blackboard</a>&#8217;s personal grade center.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2010/final-grades-for-mapreduce-course.html/feed/</wfw:commentRSS>
		</item>
	</channel>
</rss>
