Emma Search Service

This demonstrator showcases the PuppyIR framework by incorporating numerous child-specific components developed as part of the PuppyIR project. The demonstrator is for Emma’s Children’s Hospital in Amsterdam and provides children with a novel and exciting interface to help support their information needs while staying in or visiting the hospital.

EmSe will be demonstrated at the 34th European Conference on Information Retrieval (ECIR) in Barcelona on 1-5 April 2012.

Wrapper induction for search results

Ranking XPaths for extracting search result records

by Dolf Trieschnigg, Kien Tjin-Kam-Jet and Djoerd Hiemstra

Extracting search result records (SRRs) from webpages is useful for building an aggregated search engine which combines search results from a variety of search engines. Most automatic approaches to search result extraction are not portable: the complete process has to be rerun on a new search result page. In this paper we describe an algorithm to automatically determine XPath expressions to extract SRRs from webpages. Based on a single search result page, an XPath expression is determined which can be reused to extract SRRs from pages based on the same template. The algorithm is evaluated on six datasets, including two new datasets containing a variety of web, image, video, shopping and news search results. The evaluation shows that for 85% of the tested search result pages, a useful XPath is determined. The algorithm is implemented as a browser plugin and as a standalone application which are available as open source software.
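To make the idea of a reusable XPath concrete, here is a minimal sketch of the extraction step only. It is not the algorithm from the paper, which determines the XPath automatically. The expression SRR_XPATH, the two toy pages and the function extract_srrs are all made up for illustration, and the pages are well-formed XML so that Python's standard library can parse them; real result pages would likely need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath, standing in for one determined from a single result page.
SRR_XPATH = ".//div[@class='result']"

# Two toy pages built from the same (invented) template.
PAGE_ONE = """<html><body>
<div class="header">Results for: puppy</div>
<div class="result"><a href="http://example.org/1">First hit</a></div>
<div class="result"><a href="http://example.org/2">Second hit</a></div>
</body></html>"""

PAGE_TWO = """<html><body>
<div class="header">Results for: kitten</div>
<div class="result"><a href="http://example.org/3">Third hit</a></div>
</body></html>"""


def extract_srrs(page, xpath=SRR_XPATH):
    """Return (url, title) pairs for every node matched by the XPath."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(xpath):
        link = node.find(".//a")  # first link inside the matched record
        if link is not None:
            records.append((link.get("href"), (link.text or "").strip()))
    return records


if __name__ == "__main__":
    # The same expression is reused on both pages without re-analysis.
    for page in (PAGE_ONE, PAGE_TWO):
        print(extract_srrs(page))
```

The point of the sketch is that the second page is handled with the expression obtained for the first, which is what makes the approach portable across pages that share a template.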

[download pdf]

Search Result Finder XPaths

Download Search Result Finder Firefox plugin.

Peer-to-Peer Information Retrieval: An Overview

by Almer Tigelaar, Djoerd Hiemstra, Dolf Trieschnigg

Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

The paper will appear in ACM Transactions on Information Systems.

[download pdf]

Treinplanner on Dutch television

Dutch broadcaster BNN tests the intuitive train planner developed at the Database Group. Their verdict: “ingenious”, and “approved for elderly”. Kien Tjin-Kam-Jet can be seen proudly in the background (the item is in Dutch). See the treinplanner in action at: http://treinplanner.info

CLEF 2012 in Rome

CLEF 2012: Conference and Labs of the Evaluation Forum: First Call for Participation

CLEF 2012 is next year's edition of the popular CLEF campaign and workshop series, which has run since 2000 and contributes to the systematic evaluation of information access systems, primarily through experimentation on shared tasks. In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions, and laboratory evaluation workshops. Labs fall under two types: laboratories to conduct evaluation of information access systems, and workshops to discuss and pilot innovative evaluation activities. In 2012, CLEF will take place on September 17-20 in Rome, and researchers and practitioners from all segments of the information access and related communities are invited to participate in the following Evaluation Labs:

  • CHiC – Cultural Heritage in CLEF
  • CLEF-IP – Information Retrieval in the Intellectual Property domain
  • ImageCLEF – Cross Language Image Retrieval
  • INEX – INitiative for the Evaluation of XML Retrieval
  • PAN – Uncovering Plagiarism, Authorship, and Social Software Misuse
  • QA4MRE – Question Answering for Machine Reading Evaluation
  • RepLab 2012 – Online Reputation Management
  • CLEFeHealth – Electronic Health

More information at: http://clef2012.org/

English Wikipedia off-line as protest

Yesterday, the Wikipedia community announced its decision to black out the English-language Wikipedia for 24 hours, worldwide, on Wednesday, January 18. The blackout is a protest against proposed legislation in the United States – the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA). See:
http://wikimediafoundation.org/wiki/English_Wikipedia_anti-SOPA_blackout

If I understand things right, the Stop Online Piracy Act will allow a U.S. court to legally demand that utwente.nl be taken off-line, just because a student or professor published a link to a tool for circumventing internet censorship on his/her University of Twente web page, for instance a link like:
https://addons.mozilla.org/en-US/firefox/addon/desopa/