Archive for the 'Challenges' Category

Twente at NTCIR 2013

Thursday, May 2nd, 2013, posted by Djoerd Hiemstra

An API-based Search System for One Click Access to Information

by Dan Ionita, Niek Tax, and Djoerd Hiemstra

This paper proposes a prototype One Click access system, based on previous work in the field and the related 1CLICK-2@NTCIR10 task. The proposed solution integrates methods from previous attempts into a three-tier algorithm (query categorization, information extraction, and output generation) and offers suggestions on how each tier can be implemented. Finally, a thorough user-based evaluation concludes that such an information retrieval system outperforms the textual preview collected from Google search results, based on a paired sign test. Based on the validation results, suggestions for future improvements are proposed.
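For readers unfamiliar with the paired sign test mentioned in the abstract, here is a minimal sketch of how such a test can be computed. It is not the evaluation code from the paper: the function name `paired_sign_test`, the two-sided formulation, and the ratings in the usage example are all assumptions made up for illustration.

```python
from math import comb


def paired_sign_test(scores_a, scores_b):
    """Two-sided exact sign test on paired scores (hypothetical helper).

    Counts how often system A beats system B and vice versa, drops ties,
    and returns the probability of a split at least this lopsided if
    both outcomes were equally likely.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses  # ties carry no information and are ignored
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of k or fewer successes under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up ratings from eight users for two systems (illustration only)
one_click = [5, 4, 5, 3, 4, 5, 4, 5]
baseline = [3, 3, 4, 3, 2, 4, 3, 4]
p_value = paired_sign_test(one_click, baseline)
```

In this made-up example the first system wins seven of the seven untied pairs, so the p-value is small. Whether the paper used a one-sided or two-sided variant is not stated here.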

To be presented at the Japanese National Institute of Informatics (NII) Testbeds and Community for Information access Research (NTCIR-10) Conference at the National Center of Sciences, Tokyo, Japan on June 18-21

[download pdf]

Traitor: Associating Concepts using the WWW

Wednesday, April 17th, 2013, posted by Djoerd Hiemstra

by Wanno Drijfhout, Oliver Jundt, and Lesley Wevers

Traitor uses Common Crawl’s 25 TB data set of web pages to construct a database of associated concepts using Hadoop. The database can be queried through a web application with two query interfaces. A textual interface allows searching for similarities and differences between multiple concepts using a query language similar to set notation, and a graphical interface lets users visualize similarity relationships between concepts in a force-directed graph.

To be presented at the 13th Dutch-Belgian Information Retrieval Workshop DIR 2013 on 26 April in Delft, The Netherlands

[download pdf]

Try Traitor at

Readability of the Web

Monday, April 15th, 2013, posted by Djoerd Hiemstra

A study on 1 billion web pages.

by Marije de Heus

Automated Readability Index for the Web

We have performed a readability study on more than 1 billion web pages. The Automated Readability Index was used to determine the average grade level required to easily comprehend a website. Among the results: a 16-year-old can easily understand 50% of the web, and an 18-year-old can easily understand 77% of the web. This information can be used in a search engine to filter out websites that are likely to be incomprehensible for younger users.
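As a rough illustration of the measure, the Automated Readability Index combines the average number of characters per word with the average number of words per sentence. The sketch below uses the standard published formula. The tokenization (whitespace-separated words, sentences split on `.`, `!` and `?`) and the function name are assumptions, and the study may have processed pages differently.

```python
import re


def automated_readability_index(text: str) -> float:
    """Approximate ARI grade level of a text (illustrative sketch)."""
    words = text.split()
    # Assumption: a sentence ends at one or more of . ! ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count letters and digits only, ignoring punctuation
    characters = sum(ch.isalnum() for ch in text)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


score = automated_readability_index("The cat sat. The dog ran.")
```

Short words and short sentences give a low score, as in this toy example. Longer words and sentences push the score up toward higher grade levels.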

To be presented at the 13th Dutch-Belgian Information Retrieval Workshop DIR 2013 on 26 April in Delft, The Netherlands

[download pdf]

Google Online Marketing Challenge

Wednesday, January 23rd, 2013, posted by Djoerd Hiemstra

Interested in online advertising and marketing? Together with Inter-Actief we will run a second science challenge in the next quarter, from 11 February to 4 April. With a US$250 budget provided by Google, students will develop an online advertising strategy for a real business or non-profit organization that has not used Google’s AdWords in the last six months. The winners will receive a trip to the Google Headquarters in Mountain View, California to meet with the AdWords team. For more information, and to enroll, visit

Also, see the Google Online Marketing Challenge page.

Participate in the Dutch Common Crawl Challenge

Wednesday, November 14th, 2012, posted by Djoerd Hiemstra

What can you do with 6 billion webpages?

Together with Common Crawl and SARA, we invite students and researchers studying at or employed by research institutes or universities in the Netherlands to dive into the Common Crawl web corpus using the SARA Hadoop service. The best submission will receive the Norvig Web Data Science Award, a tablet, and 1500 euros to spend on travel, accommodation, and the conference registration fee for SIGIR 2013, to be held in Dublin, Ireland.

The award is named after Peter Norvig, Google’s director of research, with a resume too impressive to summarize. Peter is on the advisory board of Common Crawl, and is chair of the jury for this award. Other jury members are Ricardo Baeza-Yates (Yahoo!), Hilary Mason, Jimmy Lin (University of Maryland), and Evert Lammerts (SARA).

Find out more at the Norvig Award page at Github, the Common Crawl Blog, or come to the Inter-Actief Challenges Information Lunch on 22 November at 12:30 in Absint.

Join TREC FedWeb’13

Tuesday, November 13th, 2012, posted by Djoerd Hiemstra

FedWeb’13 is the new TREC (Text Retrieval Conference) Federated Web Search task. It will provide a test collection that organizes and stimulates research in many areas related to federated search, including aggregated search, distributed search, peer-to-peer search, and meta-search engines. The track will evaluate federated and aggregated search in a large heterogeneous setting using the search results of existing search engines.

Join the mailing list and keep up-to-date with FedWeb’13.

OLC-IT Annual Report 2011-2012

Tuesday, September 25th, 2012, posted by Djoerd Hiemstra

The IT programme committee (OLC-IT) deals with the examination regulations and the curriculum of the bachelor programmes Technische Informatica and Telematica and the master programmes Computer Science and Telematics. It has the legal right to give solicited and unsolicited advice to the programme director and the dean. Every year the OLC writes an annual report. In this year's report:

  • Curriculum changes
  • Quality assurance
  • University-wide Education and Examination Regulations (OER)
  • Measures to speed up study progress
  • Twente educational model
  • Interaction between students and teachers

Read the full annual report 2011-2012.