Archive for 2012

Emma Search Service

Tuesday, March 27th, 2012, posted by Djoerd Hiemstra

This demonstrator showcases the PuppyIR framework by incorporating numerous child-specific components developed as part of the PuppyIR project. The demonstrator is for Emma’s Children’s Hospital in Amsterdam and provides children with a novel and exciting interface that supports their information needs while they are staying at or visiting the hospital.

EmSe will be demonstrated at the 34th European Conference on Information Retrieval (ECIR) in Barcelona on 1-5 April 2012.

Wrapper induction for search results

Sunday, March 18th, 2012, posted by Djoerd Hiemstra

Ranking XPaths for extracting search result records

by Dolf Trieschnigg, Kien Tjin-Kam-Jet and Djoerd Hiemstra

Extracting search result records (SRRs) from webpages is useful for building an aggregated search engine which combines search results from a variety of search engines. Most automatic approaches to search result extraction are not portable: the complete process has to be rerun on a new search result page. In this paper we describe an algorithm to automatically determine XPath expressions to extract SRRs from webpages. Based on a single search result page, an XPath expression is determined which can be reused to extract SRRs from pages based on the same template. The algorithm is evaluated on six datasets, including two new datasets containing a variety of web, image, video, shopping and news search results. The evaluation shows that for 85% of the tested search result pages, a useful XPath is determined. The algorithm is implemented as a browser plugin and as a standalone application which are available as open source software.
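To illustrate the reuse step only, here is a minimal sketch in Python. It does not implement the paper's ranking algorithm: the XPath below is simply assumed to have been determined already, and the two sample pages, the XPath itself, and the function name `extract_srrs` are hypothetical. It shows how one expression found on a single result page can extract SRRs from another page built on the same template.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified result pages that share one template.
PAGE_ONE = """<html><body>
<div id="nav"><a href="/about">About</a></div>
<div id="results">
  <div class="result"><a href="http://example.org/a">Result A</a><p>Snippet A</p></div>
  <div class="result"><a href="http://example.org/b">Result B</a><p>Snippet B</p></div>
</div>
</body></html>"""

PAGE_TWO = """<html><body>
<div id="nav"><a href="/about">About</a></div>
<div id="results">
  <div class="result"><a href="http://example.org/c">Result C</a><p>Snippet C</p></div>
</div>
</body></html>"""

# Assumption: this expression was determined once from PAGE_ONE.
# The ranking of candidate XPaths described in the paper is not shown here.
SRR_XPATH = ".//div[@id='results']/div[@class='result']"


def extract_srrs(page, xpath):
    """Apply a previously determined XPath to a page and return its SRRs."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(xpath):
        link = node.find("a")
        snippet = node.find("p")
        records.append({
            "title": link.text if link is not None else None,
            "url": link.get("href") if link is not None else None,
            "snippet": snippet.text if snippet is not None else None,
        })
    return records


# The same expression is reused on both pages without rerunning any analysis.
for page in (PAGE_ONE, PAGE_TWO):
    for record in extract_srrs(page, SRR_XPATH):
        print(record["title"], record["url"])
```

The sketch uses the limited XPath subset of Python's standard library and well-formed sample markup; real search result pages would typically need a tolerant HTML parser with full XPath support.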

[download pdf]

Search Result Finder XPaths

Download Search Result Finder Firefox plugin.

MapReduce grades and evaluation

Friday, February 17th, 2012, posted by Djoerd Hiemstra

The MapReduce, Pig Latin and Cloud Computing assignments have been graded. The final grades can be found in Blackboard’s grade center. Please join the course evaluation session on 21 February in hal B 2C from 12:30 to 13:30 (including a free lunch).

Searching the deep web

Thursday, February 16th, 2012, posted by Djoerd Hiemstra

Today on Radio 1: An interview by Deborah Blekkenhorst on our attempts to search the deep web. And… no, the deep web is not the part of the web where terrorists hang out. (in Dutch)

Peer-to-Peer Information Retrieval: An Overview

Friday, February 10th, 2012, posted by Djoerd Hiemstra

by Almer Tigelaar, Djoerd Hiemstra, Dolf Trieschnigg

Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

The paper will appear in ACM Transactions on Information Systems.

[download pdf]

Treinplanner on Dutch television

Sunday, January 29th, 2012, posted by Djoerd Hiemstra

Dutch broadcaster BNN tests the intuitive train planner developed at the Database Group. Their verdict: “ingenious”, and “approved for elderly”. Picture of Kien Tjin-Kam-Jet proudly in the back (in Dutch). See the treinplanner in action at:

CLEF 2012 in Rome

Monday, January 23rd, 2012, posted by Djoerd Hiemstra

CLEF 2012: Conference and Labs of the Evaluation Forum: First Call for Participation

CLEF 2012 is the next edition of the popular CLEF campaign and workshop series, which has run since 2000 and contributes to the systematic evaluation of information access systems, primarily through experimentation on shared tasks. In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions and laboratory evaluation workshops. Labs fall under two types: laboratories to conduct evaluation of information access systems, and workshops to discuss and pilot innovative evaluation activities. In 2012, CLEF will take place on September 17-20 in Rome, and researchers and practitioners from all segments of the information access and related communities are invited to participate in the following Evaluation Labs:

  • CHiC - Cultural Heritage in CLEF
  • CLEF-IP - Information Retrieval in the Intellectual Property domain
  • ImageCLEF - Cross Language Image Retrieval
  • INEX - INitiative for the Evaluation of XML Retrieval
  • PAN - Uncovering Plagiarism, Authorship, and Social Software Misuse
  • QA4MRE - Question Answering for Machine Reading Evaluation
  • RepLab 2012 - Online Reputation Management
  • CLEFeHealth - Electronic Health

More information at:


Monday, January 23rd, 2012, posted by Djoerd Hiemstra

We released a demo today: the Treinplanner, built by Kien, which lets you search the Dutch Railways journey planner with a single search box. (in Dutch)

Iwe Muiser joins Database Group

Tuesday, January 17th, 2012, posted by Djoerd Hiemstra

Iwe Muiser joined the Database Group to work on the project FACT: “Folktales as Classifiable Texts”. Welcome Iwe!

English Wikipedia off-line as protest

Tuesday, January 17th, 2012, posted by Djoerd Hiemstra

Yesterday, the Wikipedia community announced its decision to black out the English-language Wikipedia for 24 hours, worldwide, on Wednesday, January 18. The blackout is a protest against proposed legislation in the United States - the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA). See:

If I understand things right, the Stop Online Piracy Act will allow a U.S. court to legally demand to take off-line, just because a student or professor published a link to circumvent internet censorship on his/her University of Twente web page, for instance a link like: