SIKS/Twente Seminar on Searching Speech

The 6th SSR on Searching Speech: Evaluation of Speech Recognition in Context will take place on 5 July 2012 at the University of Twente. Invited speakers are:

  • Gareth Jones (Dublin City University, Ireland)
  • David van Leeuwen (University Nijmegen and Netherlands Forensic Institute)
  • Lori Lamel (Limsi – CNRS, France)

SSR-6 is organized by Franciska de Jong, Laurens van der Werff and Thijs Verschoor, and will take place on the campus of the University of Twente at the Citadel (building 9), lecture hall H327. The event is sponsored by the Netherlands research School for Information and Knowledge Systems (SIKS), the Netherlands Organisation for Scientific Research (NWO), and the Centre for Telematics and Information Technology (CTIT). Please visit the SSR-6 home page for more information.

Query log analysis for Treinplanner

An analysis of free-text queries for a multi-field web form

by Kien Tjin-Kam-Jet, Dolf Trieschnigg, and Djoerd Hiemstra

We report how users interact with an experimental system that transforms single-field textual input into a multi-field query for an existing travel planner system. The experimental system was made publicly available and we collected over 30,000 queries from almost 12,000 users. From the free-text query log, we examined how users formulated structured information needs into free-text queries. The query log analysis shows that there is great variety in query formulation: over 400 query templates were found that occurred at least 4 times. Furthermore, with over 100 respondents to our questionnaire, we provide both quantitative and qualitative evidence indicating that end-users significantly prefer a single-field interface over a multi-field interface when performing structured search.
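To give a feel for what turning single-field input into a multi-field query involves, here is a minimal sketch in Python. It is not the method used by the experimental system: the two regular-expression templates, the field names (origin, destination, time) and the function parse_query are assumptions made up for this illustration.

```python
import re

# Hypothetical templates: each regex maps one free-text phrasing to form fields.
# These patterns are illustrative assumptions, not templates from the query log.
TEMPLATES = [
    re.compile(
        r"^from (?P<origin>.+?) to (?P<destination>.+?)"
        r"(?: at (?P<time>\d{1,2}:\d{2}))?$",
        re.IGNORECASE,
    ),
    re.compile(
        r"^(?P<origin>.+?) to (?P<destination>.+?)"
        r"(?: at (?P<time>\d{1,2}:\d{2}))?$",
        re.IGNORECASE,
    ),
]


def parse_query(text):
    """Return the form fields of the first matching template, or None."""
    cleaned = " ".join(text.split())  # normalise whitespace
    for pattern in TEMPLATES:
        match = pattern.match(cleaned)
        if match:
            # Drop optional fields that the user did not supply.
            return {k: v for k, v in match.groupdict().items() if v is not None}
    return None


if __name__ == "__main__":
    print(parse_query("from Enschede to Amsterdam at 9:30"))
    print(parse_query("Enschede to Amsterdam"))
```

With more than 400 templates observed in the log, a hand-written list like this would clearly not scale; the sketch only shows the shape of the mapping from free text to form fields.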

The paper will be presented at the fourth Information Interaction in Context Symposium, IIiX 2012 on August 21-24, 2012 in Nijmegen, the Netherlands.

[download pdf]

Initial Evaluation of EmSe

EmSe: Initial Evaluation of a Child-friendly Medical Search System

by PuppyIR

When undergoing medical treatment in combination with extended stays in hospitals, children have been frequently found to develop an interest in their condition and the course of treatment. A natural means of searching for related information would be to use a web search engine. The medical domain, however, imposes several key challenges on young and inexperienced searchers, such as difficult terminology, potentially frightening topics or non-objective information offered by lobbyists or pharmaceutical companies. To address these problems, we present the design and usability study of EmSe, a search service for children in a hospital environment.

The paper will be presented at the fourth Information Interaction in Context Symposium, IIiX 2012 on August 21-24, 2012 in Nijmegen, the Netherlands.

[download pdf]

Bessensap 2012 and the deep web

More than 99 percent of the world wide web currently cannot be searched by search engines. As a result, much information remains inaccessible. Relatively simple questions such as 'What is the best train journey from Enschede to Amsterdam on 4 June 2012?' and 'What is the phone number of Djoerd Hiemstra from Enschede?' cannot be answered by search engines such as Google and Bing. Yet the answers are available on the web, namely in the deep web, which search engines cannot reach because they have not downloaded the pages in advance. The reasons for this are diverse, and the University of Twente investigates methods for finding this information nonetheless: interpreting questions correctly, routing questions to the right source, and interpreting search results and integrating them with results from other sources. The first demonstration of results from this research (http://treinplanner.info) has attracted tens of thousands of visitors since early 2012.

Photo of Djoerd at Bessensap in the Museon: Jan Taco te Gussinklo. A nice report can be found at: Dutch Button Works.

Exploring Language Identification Techniques for Dutch Folktales

by Dolf Trieschnigg, Djoerd Hiemstra, Mariët Theune, Franciska de Jong, and Theo Meder

The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination of languages including (Middle and 17th century) Dutch, Frisian and a number of Dutch dialects. In this work we compare a number of approaches to automatic language identification for this collection. We show that in comparison to typical language identification tasks, classification performance for highly similar languages with little training data is low. The studied dataset, consisting of over 39,000 documents in 16 languages and dialects, is available on request for follow-up research.
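For readers unfamiliar with automatic language identification, the sketch below shows one common baseline: comparing character trigram profiles by cosine similarity. It is not necessarily one of the approaches compared in the paper. The function names are assumptions, and the two training sentences are invented toy examples, not documents from the Dutch Folktale Database.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased, space-padded text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def train(samples):
    """Build one n-gram profile per label from (label, text) pairs."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def identify(text, profiles):
    """Return the label whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(grams, profiles[label]))


# Invented toy training data (assumption); real profiles need far more text.
SAMPLES = [
    ("Dutch", "de kat zit op de mat en het is een mooie dag"),
    ("Frisian", "de kat sit op 'e matte en it is in moaie dei"),
]

if __name__ == "__main__":
    profiles = train(SAMPLES)
    print(identify("een mooie dag", profiles))
```

The two toy sentences share many trigrams, which hints at why closely related languages with little training data are hard to tell apart.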

The paper will be presented at the LREC Workshop Adaptation of Language Resources and Tools for Processing Cultural Heritage Objects on 26 May 2012 in Istanbul, Turkey.

[download preprint]

Saving the Old IR Literature

The SIGIR project Saving the Old IR Literature has scanned and released a new batch of historic IR (Information Retrieval) papers, including early papers on the SMART system and papers on the development of test collections. The papers are written by, amongst others, Gerard Salton, Karen Sparck Jones, William Cooper, Keith van Rijsbergen, Stephen Robertson, Martin Kay, Michael Lesk, and Nicholas Belkin. The new batch is listed below and available from the SIGIR web site.

The collection contains some unique documents, for instance Karen Sparck Jones' and Keith van Rijsbergen's Report on the Need for and Provision for an 'IDEAL' Information Retrieval Test Collection, written in 1975, which I anxiously searched for when doing my Ph.D. research. The document is an important milestone towards the current TREC conferences, work that already started in 1960 with Cyril Cleverdon's Cranfield experiments, one of Computer Science's earliest examples of empirical testing in a laboratory setting.

It's all there, enjoy!

Study tour to South Korea and China

Noodle is the name of the 2012 study tour organized by study association Inter-Actief from the University of Twente. In September and October 2012 we will visit companies and universities in South Korea and China. Before the students depart they research the countries they will be visiting. All participants conduct research in one of the six research tracks defined within the tour's theme IT Integrated Lifestyle: how IT affects and enriches our daily lives.

The Study Tour Committee: David Huistra, Lex Utama, Marijn Mensinga, Mark Oude Veldhuis, Nils van Kleef, and Yme Joustra

Follow the Noodle study tour preparations at http://noodle2012.nl.

MIREX in ERCIM News Big Data Special

by Djoerd Hiemstra and Claudia Hauff

MIREX (MapReduce Information Retrieval Experiments) is a software library initially developed by the Database Group of the University of Twente for running large scale information retrieval experiments on clusters of machines. MIREX has been tested on web crawls of up to half a billion web pages, totalling about 12.5 TB of data uncompressed. MIREX shows that executing test queries by a brute force linear scan of pages is a viable alternative to running the test queries on a search engine's inverted index. MIREX is open source and available at SourceForge.
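The idea of answering test queries by a linear scan rather than an inverted index can be sketched in a few lines of Python. This is not MIREX code and does not use its API: the function names, the in-memory dictionaries and the simple term-count score are assumptions chosen to keep the example short. The map step scores one page against every test query; the reduce step keeps the best pages per query.

```python
import heapq
from collections import Counter


def map_document(doc_id, text, queries):
    """Map step: score one document against every test query."""
    term_counts = Counter(text.lower().split())
    for query_id, terms in queries.items():
        # Assumed toy score: summed term frequency of the query terms.
        score = sum(term_counts[t] for t in terms)
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(scored_docs, k=2):
    """Reduce step: keep the k best-scoring documents for one query."""
    return heapq.nlargest(k, scored_docs)


def linear_scan(documents, queries, k=2):
    """Scan every document once and return the top-k results per query."""
    grouped = {query_id: [] for query_id in queries}
    for doc_id, text in documents.items():
        for query_id, scored in map_document(doc_id, text, queries):
            grouped[query_id].append(scored)
    return {qid: reduce_query(scored, k) for qid, scored in grouped.items()}


if __name__ == "__main__":
    documents = {
        "d1": "deep web search",
        "d2": "web search engines search the web",
        "d3": "train travel planner",
    }
    queries = {"q1": ["web", "search"], "q2": ["train"]}
    print(linear_scan(documents, queries))
```

Because every page is read exactly once for all queries together, the work splits naturally over many machines, and no index has to be built before an experiment can run.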

More information in ERCIM News 89.