2010 – Page 6 – Djoerd Hiemstra

Claudia Hauff defends PhD thesis on performance prediction

Predicting the Effectiveness of Queries and Retrieval Systems

by Claudia Hauff

The thesis considers users' attempts to express their information needs through queries, or search requests and tries to predict whether those requests will be of high or low quality. Intuitively, a query's quality is determined by the outcome of the query, that is, whether the retrieved search results meet the user's expectations. The second type of prediction methods under investigation are those which attempt to predict the quality of search systems themselves. Given a number of search systems to consider, these methods estimate how well or how poorly the systems will perform in comparison to each other.

The motivation for this research effort stems primarily from the enormous benefits originating from successfully predicting the quality of a query or a system. Accurate predictions enable the employment of adaptive retrieval components which would have a considerable positive effect on the user experience. Furthermore, if we would achieve sufficiently accurate predictions of the quality of retrieval systems, the cost of evaluation would be significantly reduced.

In a first step, pre-retrieval predictors are investigated, which predict a query's effectiveness before the retrieval step and are thus independent of the ranked list of results. Such predictors base their predictions solely on query terms, collection statistics and possibly external sources such as WordNet or Wikipedia. A total of twenty-two prediction algorithms are categorized and their quality is assessed on three different TREC test collections, including two large Web collections. A number of newly applied methods for combining various predictors are examined to obtain a better prediction of a query's effectiveness. In order to adequately and appropriately compare such techniques the current evaluation methodology is critically examined. It is shown that the standard evaluation measure, namely the linear correlation coefficient, can provide a misleading indication of performance. To address this issue, the current evaluation methodology is extended to include cross validation and statistical testing to determine significant differences.

Building on the analysis of pre-retrieval predictors, post-retrieval approaches are then investigated, which estimate a query's effectiveness on the basis of the retrieved results. The thesis focuses in particular on the Clarity Score approach and provides an analysis of its sensitivity towards different variables such as the collection, the query set and the retrieval approach. Adaptations to Clarity Score are introduced which improve the estimation accuracy of the original algorithm on most evaluated test collections.

The utility of query effectiveness prediction methods is commonly evaluated by reporting correlation coefficients, such as Kendall's Tau and the linear correlation coefficient, which denote how well the methods perform at predicting the retrieval effectiveness of a set of queries. Despite the significant amount of research dedicated to this important stage in the retrieval process, the following question has remained unexplored: what is the relationship of the current evaluation methodology for query effectiveness prediction and the change in effectiveness of retrieval systems that employ a predictor? We investigate this question with a large scale study for which predictors of arbitrary accuracy are generated in order to examine how the strength of their observed Kendall's Tau coefficient affects the retrieval effectiveness in two adaptive system settings: selective query expansion and meta-search. It is shown that the accuracy of currently existing query effectiveness prediction methods is not yet high enough to lead to consistent positive changes in retrieval performance in these particular settings.

The last part of the thesis is concerned with the task of estimating the ranking of retrieval systems according to their retrieval effectiveness without relying on costly relevance judgments. Five different system ranking estimation approaches are evaluated on a wide range of data sets which cover a variety of retrieval tasks and a variety of test collections. The issue that has long prevented this line of automatic evaluation to be used in practice is the severe mis-ranking of the best systems. In the experiments reported in this work, however, we show this not to be an inherent problem of system ranking estimation approaches, it is rather data set dependent. Under certain conditions it is indeed possible to automatically identify the best systems correctly. Furthermore, our analysis reveals that the estimated ranking of systems is not equally accurate for all topics of a topic set, which motivates the investigation of relying on topic subsets to improve the accuracy of the estimate. A study to this effect indicates the validity of the approach.

[download pdf]

Strong Twente delegation at DIR 2010

Photos taken by Wessel Kraaij and Suzan Verberne during DIR 2010 in Nijmegen are now on-line

Remko Nolten graduates on automatic hyperlinking

WikiLink: Anchor Detection and Link Generation in Wikiâ€™s

by Remko Nolten

In this research we try to automate the process of link generation in Wikiâ€™s by looking at existing link generation techniques and enhancing these with our own ideas. We started the research by analyzing a large document corpus to find out more about the links we want to create. In our analysis we looked into three aspects of our datasets. First, we wanted to know more about the relation between the text that is used to display the link and the title of the page where the link points to. We showed that a large majority of the links could theoretically be identified by matching the text of the link with the page title of the appropriate page, but we also identified several problems with this approach. Second, we wanted to learn more about the existing link structure in our dataset. Here, we confirmed most advantages and disadvantages of using existing links in a link generation algorithm that were also identified by other studies. Finally, we decided to analyze the grammatical structure of links, to see if we could use this later on. Our analysis showed that a very large majority of the links were nouns or noun phrases, which suggests that this would be a good way to identify links in a text.

Based on the results of this analysis, we built a framework in which we could implement new and existing methods for link generation. In the framework, the process of â€˜anchor detectionâ€™ (the technique of discovering phrases in a larger text that could be used a basis for a link) and â€˜destination findingâ€™ (the process of finding a suitable destination page for a short piece of text) where separated. This way we could try multiple combinations to see which would work best. Using this framework, we found that our grammar based anchor detection algorithm combined with multiple destination finding algorithms resulted in the best performance. Our final performance figures were better than most competitors which showed the potential of our techniques.

Searching in the free world

Google faced a cyber attack originating from computers in China, that was serious enough to send an ultimatum to the Chinese government:

…We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all…

See: Google's blog.

Final MapReduce assignment on Blackboard

The final assignment, Assignment 7 is now on Blackboard. The deadline for this assignment is January 29. This is a hard deadline.

More info on Blackboard.

Another SIKS/Twente Seminar

The 3rd SIKS/Twente Seminar on Searching and Ranking takes place on January 29, 2010 at the University of Twente. The goal of the one day workshop is to bring together researchers from companies and academia working on the effectiveness of search engines. The workshop will take place at the University of Twente at the Spiegel (building 2), lecture hall SP-6. Speakers are:

Leif Azzopardi (University of Glasgow, UK)
Arjen de Vries (CWI and University of Delft)
Vanessa Murdock (Yahoo Research, Barcelona, Spain)

After the seminar, Claudia Hauff will defend her PhD Thesis: Predicting the Effectiveness of Queries and Retrieval Systems. The seminar is sponsored by SIKS (the Netherlands research School for Information and Knowledge Systems) and the CTIT (Centre for Telematics and Information Technology). For more information, check out the SSR 2010 website.

Progress on MapReduce assignments

Happy New Year! Here's an update on the course Distributed Data Processing using MapReduce:

Running and debugging the results of Assignment 4 on the cluster takes some more time than expected. Results of all students were tried at least once on the cluster, or are currently running. For the assignments that need improvements, you can send in new versions.
Assignment 5 (Sawzall) is graded, see the Grade Center on Blackboard.
The dead line Assignment 6 is Friday, January 8. Please let me know before hand if you need more time.

DetectSim software released

DetectSim: contains software for simulating concept detectors for video retrieval. Researchers can use the software to test their concept-based video retrieval approaches without the need to build real detectors.

Concept based video retrieval is a promising search paradigm because it is fully automated and it investigates the fine grained content of a video, which is normally not captured by human annotations. Concepts are captured by so-called concept detectors. However, since these detectors do not yet show a sufficient performance, the evaluation of retrieval systems, which are built on top of the detector output, is difficult. In this report we describe a software package which generates simulated detector output for a specified performance level. Afterwards, this output can be used to execute a search run and ultimately to evaluate the performance of the proposed retrieval method, which is normally done through comparison to a baseline. The probabilistic model of the detectors are two Gaussians, one for the positive and one for the negative class. Thus, the parameters for the simulation are the two means and deviations plus the prior probability of the concept in the dataset.

Download Technical Report.