2011 – Page 2 – Djoerd Hiemstra

SIGIR 2011 best papers

This year's SIGIR best paper award was presented to Mikhail Ageev (Moscow State University), and Qi Guo, Dmitry Lagun, and Eugene Agichtein (Emory University) for their paper Find It If You Can: A Game for Modeling Different Types of Web Search Success Using Interaction Data, in which they propose a principled formalization of different types of success for informational search, and a scalable game-like infrastructure for crowdsourcing search behavior studies.

The best student paper award went to Shuang-Hong Yang (Georgia Institute of Technology), Bo Long and Alexander J. Smola (Yahoo! Labs), Hongyuan Zha (Georgia Institute of Technology), and Zhaohui Zheng (Yahoo! Labs Beijing) for their paper Collaborative Competitive Filtering: Learning Recommender using Context of User Choice. The paper proposes Collaborative Competitive Filtering (CCF), a framework for learning user preferences by modeling the choice process in recommender systems.

There were honorable mentions for the papers: Parameterized Concept Weighting in Verbose Queries, Understanding Re-finding Behaviour in Naturalistic Email Interaction Log, Out of sight, not out of mind: On the effect of social and physical detachment on information need, Enhanced Results for Web Search, and Recommending Ephemeral Items at Web Scale.

DBDBD 2011 in Twente

The Dutch Belgian Database Day (DBDBD) will be held in Twente this year, on 2 December 2011. The DBDBD is a yearly one-day workshop on database research, organized by a Belgian or Dutch university. DBDBD invites submissions (1 page abstract) on a broad range of database and database-related topics, including but not limited to data storage and management, theoretical database issues, database performance, data integration, data mining, data security, and data search.

At the DBDBD, junior researchers from the Netherlands and Belgium can present their recent results, and meet senior researchers in the field of databases. It is an excellent opportunity to meet up with your Belgian/Dutch colleagues, and to get informed about the (recent) database-related research performed in Belgian/Dutch universities. The workshop is also open to non-Belgian/Dutch participants (presentations are in English). The workshop consists of oral presentations. There are no printed proceedings. Abstracts of talks will be published on the workshop's website.

The keynote speaker at the DBDBD will be Prof. Stefano Ceri from Politecnico di Milano, Italy.

See the call for abstracts on the DBDBD 2011 web site.

Folktales As Classifiable Texts

FACT, Folktales As Classifiable Texts, is a project funded by the NWO Catch program. In the FACT project, the HMI group and DB group of the University of Twente will cooperate with the Meertens Institute to study new possibilities for researchers from humanities disciplines (folktale and narratology researchers, documentalists, etc.) to explore folktales based on annotations and links generated by data-driven methods. To this end, FACT will develop software that enables the computer to automatically enrich a corpus of Dutch folktales with metadata such as names, genre, type, and a summary. In addition, FACT represents the first effort to systematically apply and evaluate various clustering techniques on a very large (40,000+) and diverse collection of folktales. The algorithms developed in the project will be integrated in a user-friendly platform that supports annotation as well as exploratory research into variability in oral and written transmission, using XML database technology to model all folktale data (both annotations and the text of the tale itself) in one unifying framework.

A large part of the scientific research in FACT will deal with the pros and cons of human classification and computerized clustering to investigate variation in (oral) transmission. By using document clustering, we hope to discover relationships between documents that cannot be readily identified by human annotators. The main challenge will be to make the computer decide which texts are related and which are not. This is not a black-or-white issue: folktales may be related to each other on different dimensions and to varying degrees. Will the computer be able to recognize the cultural DNA of tales, and make a distinction between different types (no kinship) and versions of the same type (kinship)?
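To give a feel for what "related to varying degrees" could mean in practice, here is a minimal sketch of document clustering on toy data. It is not the FACT project's method: the bag-of-words vectors, the cosine similarity measure, the greedy grouping, the `threshold` value, and the three example tales are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors, in [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Greedy grouping: a text joins the first cluster that contains
    a sufficiently similar member, otherwise it starts a new cluster.
    The threshold is a hypothetical value, not a tuned one."""
    vectors = [vectorize(t) for t in texts]
    clusters = []  # each cluster is a list of indices into texts
    for i, vec in enumerate(vectors):
        for group in clusters:
            if any(cosine(vec, vectors[j]) >= threshold for j in group):
                group.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Invented example tales, for illustration only.
tales = [
    "the wolf eats the girl in the red hood",
    "a girl in a red hood meets the wolf",
    "the fisherman catches a golden fish",
]
print(cluster(tales))
```

The similarity score is graded rather than yes-or-no, so the choice of threshold decides where "version of the same type" ends and "different type" begins. That choice is exactly the kind of decision the project description calls the main challenge.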

More information at: Nederlandse Volksverhalenbank, and eLab Oral Culture.

PF/Tijah frozen in time

Maintenance of PF/Tijah has been discontinued. We can give limited support to people who use the old code base, but we will no longer fix bugs introduced by, for instance, running PF/Tijah on new operating systems and new hardware.

More information on PF/Tijah.

ImagePile: an Alternative for Vertical Results Lists

by Saskia Akkersdijk, Merel Brandon, Hanna Jochmann-Mannak, Djoerd Hiemstra, and Theo Huibers

Recent work shows that children are quite capable of searching with Google, due to their familiarity with the interface. However, children do have difficulties with the vertical list representation of the results. In this paper, we present an alternative result representation for a touch interface, the ImagePile. The ImagePile displays the results as a pile of images through which the user navigates by horizontal swiping. This representation was tested on a search engine for the Emma child hospital's library. In a within-subject experiment, both representations were tested with children to compare their usability. The vertical representation was perceived as easier to use, but the ImagePile system was considered more fun to use. Also, with the ImagePile system the children chose more relevant results, and they were more aware of the number of results.

[download pdf]

Proud of Twente

The university's main entrance honours the FC

PhD position: Deep Web Entity Monitoring

The Database Group of the University of Twente offers a PhD student position in the Dutch national project COMMIT, a 100M Euro project involving 10 universities and 70 companies. The program brings together leading researchers in search engines, parallel computing, databases, interaction in context, embedded systems and knowledge technology.

A large part of the web, the invisible web or deep web, cannot be indexed by web crawlers, for instance dynamic web pages that are returned in response to filling in a web form, or performing a search in a search engine. Instead of crawling deep web data, the approach will monitor web pages for certain (types of) queries. The objective is to develop approaches for monitoring web data that allow users to see a page's full history of relevant/important changes by identifying entities: people, organizations, products, geographic locations, events, etc. The approach should relate changes in multiple web sites, giving the user a data-warehouse-like overview of the pages they monitor; drilling down to time periods, persons, events, etc.

The research will be done in co-operation with WCC. WCC, started in 1996, is a successful software company based in Utrecht (NL) and Reston (USA). WCC's current focus areas are the Employment and Identification Security markets. Both commercial and government customers worldwide use WCC's smart search & match solutions to support their primary processes. Both WCC and the Database Group of the University of Twente have made significant advances in entity matching and entity ranking, applied to, for instance, Employment Matching and Expert Search. This project will extend this work to monitoring of deep web pages, such as social networking sites, micro-blogging sites, job sites, etc. The candidate will spend part of the time at WCC in Utrecht.

[official vacancy text] (deadline: July 3rd, 2011)

Visual Exploration of Health Information for Children

by Frans van der Sluis, Sergio Duarte, Djoerd Hiemstra, Betsy van Dijk and Frea Kruisinga

Children experience several difficulties retrieving information using current Information Retrieval (IR) systems. Particularly, children struggle to find the right keywords to construct queries given their lack of domain knowledge. This problem is even more critical in the case of the specialized health domain. In this work we present a novel method to address this problem using a cross-media search interface in which the textual data is searched through visual images. This solution aims to solve the recall and recognition problem which is salient for health information, by replacing the need for a vocabulary with the easy task of recognising the different body parts.

[download pdf]

Solutions Home Work Series 2

The solutions to Home Work Series 2 are now on-line in the Assignment section on Blackboard.

Free-Text Search over Complex Web Forms

by Kien Tjin-Kam-Jet, Dolf Trieschnigg, and Djoerd Hiemstra

This paper investigates the problem of using free-text queries as an alternative means for searching 'behind' web forms. We introduce a novel specification language for specifying free-text interfaces, and report the results of a user study where we evaluated our prototype in a travel planner scenario. Our results show that users prefer this free-text interface over the original web form and that they are about 9% faster on average at completing their search tasks.
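As a rough illustration of the idea of searching 'behind' a web form with free text, the sketch below maps a typed travel query onto form fields. It is not the specification language introduced in the paper: the field names (`from`, `to`, `time`), the regular expressions, and the example queries are hypothetical and only handle a few simple English phrasings.

```python
import re

# Hypothetical form fields of a travel planner and the phrasing
# that fills each one. These patterns are assumptions for illustration.
FIELD_PATTERNS = {
    "from": re.compile(
        r"\bfrom\s+(?P<value>[A-Za-z][A-Za-z\s\-]*?)(?=\s+(?:to|at)\b|$)"
    ),
    "to": re.compile(
        r"\bto\s+(?P<value>[A-Za-z][A-Za-z\s\-]*?)(?=\s+(?:from|at)\b|$)"
    ),
    "time": re.compile(r"\bat\s+(?P<value>\d{1,2}:\d{2})\b"),
}


def parse_query(query):
    """Map a free-text query to a dict of form field values.
    Fields that are not mentioned in the query are left out."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(query)
        if match:
            fields[name] = match.group("value").strip()
    return fields


print(parse_query("from Enschede to Amsterdam at 9:30"))
print(parse_query("to Den Haag from Utrecht"))
```

The resulting dictionary could then be submitted as the values of the original form, so the user types one line instead of filling in several fields.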

The paper will be presented at the Information Retrieval Facility Conference (IRFC 2011) on 6 June in Vienna, Austria.

[download preprint]