<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.4" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Djoerd Hiemstra's home page</title>
	<link>http://wwwhome.cs.utwente.nl/~hiemstra</link>
	<description>A bit of teaching, some research, shake well...</description>
	<pubDate>Tue, 15 May 2012 08:09:30 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.4</generator>
	<language>en</language>
			<item>
		<title>Treinplanner nominated best ICT project</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-nominated-best-ict-project.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-nominated-best-ict-project.html#comments</comments>
		<pubDate>Mon, 14 May 2012 07:53:13 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-nominated-best-ict-project.html</guid>
		<description><![CDATA[
Treinplanner.info nominated best ICT project by Computable


Each year, the Dutch journal Computable presents awards to companies, projects and people in five categories. Treinplanner has been nominated for the best ICT project award 2012 in the Industry category. If you like our project, please vote for Treinplanner at Computable (in Dutch).


See also: Treinplanner on Dutch television.

]]></description>
			<content:encoded><![CDATA[
<p><strong>Treinplanner.info nominated best ICT project by Computable</strong></p>

<p>
Each year, the Dutch journal <strong>Computable</strong> presents awards to companies, projects and people in five categories. Treinplanner has been <a href="http://www.computable.nl/artikel/computable_awards/4495310/1853296/universiteit-twente-en-ns-project-treinplanner.html">nominated for the best ICT project award 2012</a> in the <i>Industry</i> category. If you like our project, please <a href="http://www2.computable.nl/computableawards/stem/">vote for Treinplanner</a> at Computable (in Dutch).
</p>

<p>See also: <a href="/~hiemstra/2012/treinplanner-on-dutch-television.html">Treinplanner on Dutch television</a>.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-nominated-best-ict-project.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Exploring Language Identification Techniques for Dutch Folktales</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/exploring-language-identification-techniques-for-dutch-folktales.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/exploring-language-identification-techniques-for-dutch-folktales.html#comments</comments>
		<pubDate>Fri, 27 Apr 2012 09:19:08 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Cultural heritage</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/exploring-language-identification-techniques-for-dutch-folktales.html</guid>
		<description><![CDATA[
by Dolf Trieschnigg, Djoerd Hiemstra, Mariët Theune, Franciska de Jong, and Theo Meder


The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination
of languages including (Middle and 17th century) Dutch, Frisian and a number of Dutch dialects. In this work we compare a number [...]]]></description>
			<content:encoded><![CDATA[
<p>by Dolf Trieschnigg, Djoerd Hiemstra, Mariët Theune, Franciska de Jong, and Theo Meder</p>

<p>
The Dutch <a href="http://www.verhalenbank.nl/">Folktale Database</a> contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination
of languages including (Middle and 17th century) Dutch, Frisian and a number of Dutch dialects. In this work we compare a number of
approaches to automatic language identification for this collection. We show that in comparison to typical language identification tasks,
classification performance for highly similar languages with little training data is low. The studied dataset, consisting of over 39,000
documents in 16 languages and dialects, is available on request for follow-up research.
</p>

<p><em>The paper will be presented at the LREC Workshop 
 <a href="http://www.c-phil.uni-hamburg.de/view/Main/LrecWorkshop2012">Adaptation of Language Resources and Tools for Processing Cultural Heritage Objects</a> 
on 26 May 2012 in Istanbul, Turkey</em></p>

<p>[<a href="/~hiemstra/papers/lrec-ch12.pdf">download preprint</a>]</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/exploring-language-identification-techniques-for-dutch-folktales.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Saving the Old IR Literature</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/saving-the-old-ir-literature.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/saving-the-old-ir-literature.html#comments</comments>
		<pubDate>Fri, 20 Apr 2012 13:45:47 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/saving-the-old-ir-literature.html</guid>
		<description><![CDATA[

The SIGIR project Saving the Old IR Literature has scanned and
released a new batch of historic IR (Information Retrieval) papers, including early papers on
the SMART system and papers on the development of test collections. The
papers are written by, amongst others, Gerard Salton, Karen Sparck Jones,
William Cooper, Keith van Rijsbergen, Stephen Robertson, Martin Kay,
Michael Lesk, and [...]]]></description>
			<content:encoded><![CDATA[
<p>
The SIGIR project <b>Saving the Old IR Literature</b> has scanned and
released a new batch of historic IR (Information Retrieval) papers, including early papers on
the SMART system and papers on the development of test collections. The
papers are written by, amongst others, Gerard Salton, Karen Sparck Jones,
William Cooper, Keith van Rijsbergen, Stephen Robertson, Martin Kay,
Michael Lesk, and Nicholas Belkin. The new batch is listed below and
available from the <a href="http://www.sigir.org/museum/newcontents.html">SIGIR web site</a>.
</p>

<p>
The collection contains some unique documents, for instance 
Karen Sparck Jones&#8217; and Keith van Rijsbergen&#8217;s <a href="http://www.sigir.org/museum/pdfs/pub-14/pub_14.pdf">Report on the Need for and Provision for an &#8216;IDEAL&#8217; Information Retrieval Test Collection</a>, written in 1975, which I anxiously searched for when doing my Ph.D. research. The document is an important milestone towards the current <a href="http://trec.nist.gov">TREC</a> conferences, work that had already started in 1960 with <a href="/~hiemstra/2008/how-cyril-cleverdon-set-the-stage-for-ir-research.html">Cyril Cleverdon&#8217;s</a> Cranfield experiments, one of computer science&#8217;s earliest examples of empirical testing in a laboratory setting.
</p>

<p>
It&#8217;s all there, enjoy!
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/saving-the-old-ir-literature.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Study tour to South Korea and China</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/study-tour-to-south-korea-and-china.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/study-tour-to-south-korea-and-china.html#comments</comments>
		<pubDate>Wed, 11 Apr 2012 07:13:58 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Photos</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/study-tour-to-south-korea-and-china.html</guid>
		<description><![CDATA[

Noodle is the name of the 2012 study tour organized by study association Inter-Actief from the University of Twente. In September and October 2012 we will visit companies and universities in South Korea and China. Before the students depart they research the countries they will be visiting. All participants conduct research in one of the [...]]]></description>
			<content:encoded><![CDATA[
<p>
<b>Noodle</b> is the name of the 2012 study tour organized by study association Inter-Actief from the University of Twente. In September and October 2012 we will visit companies and universities in South Korea and China. Before the students depart they research the countries they will be visiting. All participants conduct research in one of the six research tracks defined within the tour&#8217;s theme <em>IT Integrated Lifestyle: how IT affects and enriches our daily lives</em>.
</p>

<table border="0" width="325">
<tr>
<td><a href="/~hiemstra/wp-content/stucie_noodle.jpg"><img width="320" height="192" src="/~hiemstra/wp-content/stucie_noodle.jpg" border="0" alt="Stucie Noodle"/></a><br />
<em>The Study Tour Committee: David Huistra, Lex Utama, Marijn Mensinga, Mark Oude Veldhuis, Nils van Kleef, and Yme Joustra</em></td>

</tr>

</table>

<p>Follow the Noodle study tour preparations at <a href="http://noodle2012.nl">http://noodle2012.nl</a>.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/study-tour-to-south-korea-and-china.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>MIREX in ERCIM News Big Data Special</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/mirex-in-ercim-news-big-data-special.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/mirex-in-ercim-news-big-data-special.html#comments</comments>
		<pubDate>Wed, 11 Apr 2012 06:51:40 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
	<category>MIREX</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/mirex-in-ercim-news-big-data-special.html</guid>
		<description><![CDATA[
by Djoerd Hiemstra and Claudia Hauff



MIREX (MapReduce Information Retrieval Experiments) is a software library initially developed by
the Database Group of the University of Twente for running large-scale information retrieval
experiments on clusters of machines. MIREX has been tested on web crawls of up to half a billion
web pages, totalling about 12.5 TB of data uncompressed. [...]]]></description>
			<content:encoded><![CDATA[
<p>by Djoerd Hiemstra and Claudia Hauff</p>

<p>
<img align="left" src="/~hiemstra/wp-content/ercim89_cover.jpg" alt="ERCIM News 89" height="142" width="100" hspace="8" />
MIREX (MapReduce Information Retrieval Experiments) is a software library initially developed by
the Database Group of the University of Twente for running large-scale information retrieval
experiments on clusters of machines. MIREX has been tested on web crawls of up to half a billion
web pages, totalling about 12.5 TB of data uncompressed. MIREX shows that the execution of test
queries by a brute-force linear scan of pages is a viable alternative to running the test queries on a
search engine’s inverted index. MIREX is open source and available at <a href="http://mirex.sourceforge.net">SourceForge</a>.
</p>

<p>More information in <a href="http://ercim-news.ercim.eu/en89/special/brute-force-information-retrieval-experiments-using-mapreduce">ERCIM News 89</a>.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/mirex-in-ercim-news-big-data-special.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Emma Search Service</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/emma-search-service.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/emma-search-service.html#comments</comments>
		<pubDate>Tue, 27 Mar 2012 19:22:34 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>IR for children</category>
	<category>Videos</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/emma-search-service.html</guid>
		<description><![CDATA[





This demonstrator showcases the PuppyIR framework by incorporating numerous child-specific components developed as part of the PuppyIR project. The demonstrator is built for Emma’s Children&#8217;s Hospital in Amsterdam and provides children with a novel and exciting interface that supports their information needs while they are staying in or visiting the hospital.


EmSe will be demonstrated at [...]]]></description>
			<content:encoded><![CDATA[
<p>
<iframe width="373" height="240" src="http://www.youtube.com/embed/J2GaaMkK9cc" frameborder="0" allowfullscreen></iframe>
</p>

<p>
This demonstrator showcases the PuppyIR framework by incorporating numerous child-specific components developed as part of the PuppyIR project. The demonstrator is built for Emma’s Children&#8217;s Hospital in Amsterdam and provides children with a novel and exciting interface that supports their information needs while they are staying in or visiting the hospital.
</p>

<p><em>EmSe will be demonstrated at the 34th European Conference on Information Retrieval (ECIR) in Barcelona on 1-5 April 2012</em>
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/emma-search-service.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Wrapper induction for search results</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/wrapper-induction-for-search-results.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/wrapper-induction-for-search-results.html#comments</comments>
		<pubDate>Sun, 18 Mar 2012 20:47:01 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/wrapper-induction-for-search-results.html</guid>
		<description><![CDATA[

Ranking XPaths for extracting search result records


by Dolf Trieschnigg, Kien Tjin-Kam-Jet and Djoerd Hiemstra


Extracting search result records (SRRs) from webpages is
useful for building an aggregated search engine which combines
search results from a variety of search engines. Most
automatic approaches to search result extraction are not
portable: the complete process has to be rerun on a new
search result [...]]]></description>
			<content:encoded><![CDATA[
<p>
<strong>Ranking XPaths for extracting search result records</strong>
</p>

<p>by Dolf Trieschnigg, Kien Tjin-Kam-Jet and Djoerd Hiemstra</p>

<p>
Extracting search result records (SRRs) from webpages is
useful for building an aggregated search engine which combines
search results from a variety of search engines. Most
automatic approaches to search result extraction are not
portable: the complete process has to be rerun on a new
search result page. In this paper we describe an algorithm to
automatically determine XPath expressions to extract SRRs
from webpages. Based on a single search result page, an
XPath expression is determined which can be reused to 
extract SRRs from pages based on the same template. The
algorithm is evaluated on six datasets, including two new
datasets containing a variety of web, image, video, shopping
and news search results. The evaluation shows that for 85%
of the tested search result pages, a useful XPath is determined. 
The algorithm is implemented as a browser plugin
and as a standalone application which are available as open
source software.
</p>

<p>[<a href="/~hiemstra/papers/tr-ctit-12-08.pdf">download pdf</a>]</p>

<p>Download <a href="http://snipdex.org/srf/">Search Result Finder</a> Firefox plugin.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/wrapper-induction-for-search-results.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>MapReduce grades and evaluation</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/mapreduce-grades-and-evaluation.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/mapreduce-grades-and-evaluation.html#comments</comments>
		<pubDate>Fri, 17 Feb 2012 19:25:28 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Course MapReduce</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/mapreduce-grades-and-evaluation.html</guid>
		<description><![CDATA[

The MapReduce, Pig Latin and Cloud Computing assignments have been graded. The final grades can be found in Blackboard&#8217;s grade center. Please join the course evaluation session on 21 February in hall B 2C from 12:30 to 13:30 (including a free lunch).


]]></description>
			<content:encoded><![CDATA[
<p>
The MapReduce, Pig Latin and Cloud Computing assignments have been graded. The final grades can be found in Blackboard&#8217;s grade center. Please join the course evaluation session on 21 February in hall B 2C from 12:30 to 13:30 (including a free lunch).
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/mapreduce-grades-and-evaluation.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Searching the deep web</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/searching-the-deep-web.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/searching-the-deep-web.html#comments</comments>
		<pubDate>Thu, 16 Feb 2012 21:59:07 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
	<category>Videos</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/searching-the-deep-web.html</guid>
		<description><![CDATA[





Today on Radio 1: An interview by Deborah Blekkenhorst on our attempts to search the deep web. And&#8230; no, the deep web is not the part of the web where terrorists hang out. (in Dutch)


]]></description>
			<content:encoded><![CDATA[
<p>
<iframe width="373" height="240" src="http://www.youtube.com/embed/lflmilysokk" frameborder="0" allowfullscreen></iframe>
</p>

<p>
Today on Radio 1: An interview by Deborah Blekkenhorst on our attempts to search the deep web. And&#8230; no, the deep web is <em>not</em> the part of the web where terrorists hang out. (in Dutch)
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/searching-the-deep-web.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Peer-to-Peer Information Retrieval: An Overview</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html#comments</comments>
		<pubDate>Fri, 10 Feb 2012 15:05:18 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html</guid>
		<description><![CDATA[
by Almer Tigelaar, Djoerd Hiemstra, Dolf Trieschnigg


Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype
peer-to-peer information retrieval systems have been developed. Unfortunately, none of these
have seen widespread real-world adoption and thus, in contrast with file sharing, information
retrieval is still dominated by centralised solutions. In this paper we provide [...]]]></description>
			<content:encoded><![CDATA[
<p>by Almer Tigelaar, Djoerd Hiemstra, Dolf Trieschnigg</p>

<p>
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype
peer-to-peer information retrieval systems have been developed. Unfortunately, none of these
have seen widespread real-world adoption and thus, in contrast with file sharing, information
retrieval is still dominated by centralised solutions. In this paper we provide an overview of
the key challenges for peer-to-peer information retrieval and the work done so far. We want
to stimulate and inspire further research to overcome these challenges. This will open the door
to the development and large-scale deployment of real-world peer-to-peer information retrieval
systems that rival existing centralised client-server solutions in terms of scalability, performance,
user satisfaction and freedom.
</p>

<p>
<em>The paper will appear in <a href="http://tois.acm.org/">ACM Transactions on Information Systems</a>.</em>
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html/feed/</wfw:commentRSS>
		</item>
	</channel>
</rss>

