10th SIKS/Twente Seminar on Searching and Ranking



The goal of this seminar is to bring together researchers from academia and companies working on the development and evaluation of information systems, in particular retrieval, filtering, and recommending systems. Invited speakers are:

Unfortunately, the invited talk by Jaime Arguello (University of North Carolina in Chapel Hill, USA) has been canceled due to heavy snow and storm in the South East of the USA.

The symposium will take place at the campus of the University of Twente in building Carré, room 1333.
See Travel information. The event is part of the SIKS educational program. PhD students working in the field of (interactive) information filtering, recommending, and retrieval are encouraged to participate.


13:30 Coffee and Welcome
Making Sense of the Quantified Self

In this talk I will give an introduction to, and an overview of, lifelogging. I will cover lifelogging technology, both wearable sensors and environmental sensors, and show how data from these can be gathered and combined to help create a digital record of our day-to-day activities. I will then describe the ways in which we have been processing lifelog data, especially data from wearable cameras, covering event detection and concept detection, and I will illustrate some of the many challenges that are widespread when dealing with lifelogs. Then I will justify the need for lifelogging through several applications and use cases. Finally, I will sketch out some of the remaining problems and a roadmap of where I think developments in lifelogging will head.

Alan Smeaton is a researcher and academic at Dublin City University. Among his accomplishments are founding TRECVid and the Centre for Digital Video Processing, and winning the University President's Research Award in Science and Engineering in 2002 and the DCU Educational Trust Leadership Award in 2009. He was a Principal Investigator and Deputy Director of CLARITY: Centre for Sensor Web Technologies (2008-2013) and is now Director of the Insight Centre for Data Analytics at Dublin City University. In 2012 Prof. Smeaton was appointed by Minister Sean Sherlock to the board of the Irish Research Council. He was elected in May 2013 as a Member of the Royal Irish Academy, the highest academic distinction in Ireland.

Alan Smeaton (Dublin City University, Ireland)
14:30 Break
Copulas for Information Retrieval

(A SIGIR 2013 publication by Carsten Eickhoff, Arjen P. de Vries, and Kevyn Collins-Thompson) In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated learning methods that make it difficult for humans to understand the origin of the final ranking. To address these issues, we introduce the use of copulas, a powerful statistical framework for modeling complex multidimensional dependencies, to information retrieval tasks. We provide a formal background to copulas and demonstrate their effectiveness on standard IR tasks such as combining multidimensional relevance estimates and fusion of results from multiple search engines. We introduce copula-based versions of standard relevance estimators and fusion methods and show that these lead to significant performance improvements on several tasks, as evaluated on large-scale standard corpora, compared to their non-copula counterparts. We also investigate criteria for understanding the likely effect of using copula models in a given retrieval scenario. We conclude the talk by discussing very recent (and preliminary) results of applying nested copulas to deal with the high-dimensional class of problems that arise in the learning-to-rank setup, where every feature would form a relevance dimension by itself.


Arjen de Vries is group leader of the CWI (Centrum voor Wiskunde & Informatica) research group Interactive Information Access (INS2). His research includes structured document retrieval, entity ranking, multimedia retrieval, recommender systems, nearest neighbour search, and the integration of search technology and database technology. Arjen is also full professor of Multimedia Dataspaces at Delft University, and he is co-founder of the CWI spin-off Spinque.
Arjen de Vries (CWI Amsterdam)
15:30 Closing
Information Retrieval for Children: Search Behavior and Solutions

Nowadays, very young children and teenagers use the Internet extensively for entertainment and educational purposes. The number of active young users on the Internet is increasing every day as the Internet becomes more accessible at home, at school, and even on the move through cellphones and tablets.
The most popular search engines are designed for adults and do not provide customized tools for young users. Given that young and adult users have different interests and search strategies, research aimed at understanding the activities that young users carry out on the Internet, the way they search for information, and the difficulties that they encounter with state-of-the-art search engines is urgently needed. The first contribution of this thesis addresses these research aims by providing a large-scale characterization of the search behavior of young users. The problems they face when they search for information on the web, the topics they search for, and the online activities that motivate search were explored in detail and contrasted against the search behavior of adult users. The results presented in this thesis have important implications for the development of search tools for young users and for the design of educational literacy. Two central problems were identified in the search process of young users: (1) difficulty representing their information needs with keyword queries, and (2) difficulty exploring the list of results.
We found that focused queries are often required to access high-quality content for young users with modern search engines. However, young users were found to submit queries that lack the specificity needed to retrieve content that is suitable for them, which leads to frustration during the search process. This observation motivates the second contribution of this thesis. We propose novel query recommendation methods to improve the chances of young users finding content that is suitable and on topic for them. Concretely, we present an effective biased random walk based on information gain metrics. This method is combined with topical and specialized features designed for the information domain of young users. We show that our query suggestions outperform, by a large margin, not only related query recommendation methods but also the query suggestions offered by the search services available today.
With respect to the second difficulty, it was found that young users have a strong click bias, in which results ranked at the bottom of the result list are rarely clicked. This behavior greatly hampers their navigational skills and exploration of results. It also reduces the chances of young users finding suitable information, since appropriate content for this audience is ranked, on average, at lower positions in the result list than content aimed at the average web user.
The third contribution of this thesis aims at helping young users improve their chances of finding appropriate content and at easing the exploration of results. For this purpose, we envisage an aggregated search system for young users, in which parents, teachers and young users add search services (i.e. verticals) with content of interest to young audiences. We propose a test collection with a large number of verticals with moderated content, a carefully selected set of search queries, and vertical relevance judgments. We also provide novel methods of vertical selection in this information domain based on social media and on the estimation of the amount of content that is appropriate for young users in each vertical. We show that our methods outperform state-of-the-art vertical selection methods in this information domain.
We also show in a case study with children aged 9 to 10 that result pages derived from the proposed collection are preferred over the result pages provided by modern search engines. We provide evidence showing that the interaction with and exploration of results are improved with aggregated pages built using this collection, even though the users in the study were unaware of the differences between the types of pages displayed to them.
The thesis concludes by providing concrete follow-up research directions and by suggesting other information domains that can potentially benefit from the proposed methods.

PhD Defense by Sergio Duarte Torres (University of Twente)


CTIT Centre for Telematics and Information Technology
SIKS Netherlands research school for Information and Knowledge Systems


Please send your name and affiliation to if you plan to attend the symposium, and help us estimate the required catering.