Participate in the Dutch Common Crawl Challenge

Wednesday, November 14th, 2012, posted by Djoerd Hiemstra

What can you do with 6 billion webpages?

Together with Common Crawl and SARA, we invite students and researchers studying at or employed by research institutes or universities in the Netherlands to dive into the Common Crawl web corpus using the SARA Hadoop service. The best submission will receive the Norvig Web Data Science Award, a tablet, and 1500 Euro to spend on travel, accommodation, and the conference registration fee for SIGIR 2013, to be held in Dublin, Ireland.

The award is named after Peter Norvig, Google’s director of research with a resume too impressive to summarize. Peter is on the advisory board of Common Crawl, and is chair of the jury for this award. Other jury members are Ricardo Baeza-Yates (Yahoo!), Hilary Mason, Jimmy Lin (University of Maryland), and Evert Lammerts (SARA).

Find out more at the Norvig Award page on GitHub or the Common Crawl Blog, or come to the Inter-Actief Challenges Information Lunch on 22 November at 12.30h in Absint.

Emma Search Service

Tuesday, March 27th, 2012, posted by Djoerd Hiemstra

This demonstrator showcases the PuppyIR framework by incorporating numerous child-specific components developed as part of the PuppyIR project. The demonstrator is for Emma’s Children’s Hospital in Amsterdam and provides children with a novel and exciting interface that supports their information needs while staying in or visiting the hospital.

EmSe will be demonstrated at the 34th European Conference on Information Retrieval (ECIR) in Barcelona on 1-5 April 2012.

Searching the deep web

Thursday, February 16th, 2012, posted by Djoerd Hiemstra

Today on Radio 1: An interview by Deborah Blekkenhorst on our attempts to search the deep web. And… no, the deep web is not the part of the web where terrorists hang out. (in Dutch)

Treinplanner on Dutch television

Sunday, January 29th, 2012, posted by Djoerd Hiemstra

Dutch broadcaster BNN tests the intuitive train planner developed at the Database Group. Their verdict: “ingenious”, and “approved for elderly”. Picture of Kien Tjin-Kam-Jet proudly in the back (in Dutch). See the treinplanner in action at:

U. Twente promo

Tuesday, September 1st, 2009, posted by Djoerd Hiemstra

The university finally realized that there are new media. Way to go!

StreetTivo: Multimedia analysis in your living room

Monday, June 18th, 2007, posted by Djoerd Hiemstra

Great performance of Peter Boncz…

Vojkan Mihajlovic defends Ph.D. thesis on structured information retrieval

Friday, December 8th, 2006, posted by Djoerd Hiemstra

Score Region Algebra: A flexible framework for structured information retrieval

by Vojkan Mihajlovic

The scope of the research presented in this thesis is the retrieval of relevant information from structured documents. The thesis describes a framework for information retrieval in documents that have some form of annotation used for describing logical and semantic document structure, such as XML and SGML. The development of the structured information retrieval framework follows ideas from both the database and the information retrieval worlds. It uses a three-level database architecture and implements relevance scoring mechanisms inherited from information retrieval models.

To develop the structured retrieval framework, the problem of structured information retrieval is analyzed and elementary requirements for structured retrieval systems are specified. These requirements are: (1) entity selection - the selection of different entities in structured documents, such as elements, terms, attributes, image and video references, which are parts of the user query; (2) entity relevance score computation - the computation of relevance scores for different structured elements with respect to the content they contain; (3) relevance score combination - the combination of relevance scores from (different) elements in a document structure, resulting in a common element relevance score; (4) relevance score propagation - the propagation of scores from different elements to common ancestor or descendant elements following the query. These four requirements are supported when developing a database logical algebra in harmony with the retrieval models used for ranking. In the specification of the logical algebra we face the challenge of a transparent instantiation of retrieval models, i.e., the specification of different retrieval models without affecting the algebra operators.
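
To make the four requirements concrete, here is a toy Python sketch on a tiny XML document. It is not the Score Region Algebra from the thesis: the function names, the relative-term-frequency score, the product used to combine scores, and the sum used for propagation are all assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-section document, used only for this illustration.
DOC = """
<article>
  <sec><title>XML retrieval</title><p>ranking XML elements</p></sec>
  <sec><title>Databases</title><p>query algebra and ranking</p></sec>
</article>
"""

def select(root, tag):
    """(1) Entity selection: all elements with the given tag name."""
    return list(root.iter(tag))

def score(elem, term):
    """(2) Score computation: relative frequency of the term in the element's text
    (an assumed, very simple retrieval model)."""
    words = " ".join(elem.itertext()).lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0

def combine_and(a, b):
    """(3) Score combination: product of two scores (assumed choice for a conjunction)."""
    return a * b

def propagate_up(ancestor, tag, term):
    """(4) Score propagation: sum the scores of matching descendants into the ancestor
    (assumed choice)."""
    return sum(score(child, term) for child in select(ancestor, tag))

def rank_sections(root):
    """Rank <sec> elements whose title is about 'xml' and whose paragraphs are about 'ranking'."""
    def sec_score(sec):
        return combine_and(propagate_up(sec, "title", "xml"),
                           propagate_up(sec, "p", "ranking"))
    return sorted(select(root, "sec"), key=sec_score, reverse=True)

root = ET.fromstring(DOC)
sections = select(root, "sec")
ranked = rank_sections(root)
print([sec.find("title").text for sec in ranked])
```

Swapping `score`, `combine_and`, or `propagate_up` for a different formula leaves `select` and `rank_sections` unchanged, which loosely mirrors the goal of instantiating different retrieval models without affecting the operators.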

Download Vojkan’s thesis from EPrints.