Archive for the 'Performance Prediction' Category

Query Performance Prediction

Friday, April 23rd, 2010, posted by Djoerd Hiemstra

Evaluation Contrasted with Effectiveness

by Claudia Hauff, Leif Azzopardi, Djoerd Hiemstra, and Franciska de Jong

Query performance predictors are commonly evaluated by reporting correlation coefficients to denote how well the methods perform at predicting the retrieval performance of a set of queries. Despite the amount of research dedicated to this area, one aspect remains neglected: how strong does the correlation need to be in order to realize an improvement in retrieval effectiveness in an operational setting? We address this issue in the context of two settings: Selective Query Expansion and Meta-Search. In an empirical study, we control the quality of a predictor in order to examine how the strength of the correlation achieved affects the effectiveness of an adaptive retrieval system. The results of this study show that many existing predictors fail to achieve a correlation strong enough to reliably improve retrieval effectiveness in either the Selective Query Expansion or the Meta-Search setting.
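
As a rough illustration of the evaluation methodology discussed above, the sketch below simulates predictors of varying quality by perturbing per-query average precision values with noise and then reports Kendall's Tau and the linear correlation coefficient. It is a minimal, hypothetical example (the noise model and numbers are illustrative only), not the controlled-quality procedure used in the paper.

```python
# Hypothetical sketch: simulate predictors of varying quality by perturbing
# the true per-query average precision (AP) values with Gaussian noise,
# then measure how strongly prediction and truth correlate.
import numpy as np
from scipy.stats import kendalltau, pearsonr

rng = np.random.default_rng(42)
true_ap = rng.uniform(0.0, 0.6, size=150)   # stand-in for per-query AP values

for noise_level in (0.0, 0.05, 0.15, 0.4):
    predicted = true_ap + rng.normal(0.0, noise_level, size=true_ap.size)
    tau, _ = kendalltau(predicted, true_ap)
    r, _ = pearsonr(predicted, true_ap)
    print(f"noise={noise_level:.2f}  Kendall tau={tau:.2f}  linear r={r:.2f}")
```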

[download pdf]

A Case for Automatic System Evaluation

Friday, April 16th, 2010, posted by Djoerd Hiemstra

by Claudia Hauff, Djoerd Hiemstra, Leif Azzopardi, and Franciska de Jong

Ranking a set of retrieval systems according to their retrieval effectiveness without relying on relevance judgments was first explored by Soboroff et al. (Soboroff, I., Nicholas, C., Cahan, P.: Ranking retrieval systems without relevance judgments. In: SIGIR 2001, pp. 66-73). Over the years, a number of alternative approaches have been proposed, all of which have been evaluated on early TREC test collections. In this work, we perform a wider analysis of system ranking estimation methods on sixteen TREC data sets, which cover more tasks and corpora than previously considered. Our analysis reveals that the performance of system ranking estimation approaches varies across topics. This observation motivates the hypothesis that the performance of such methods can be improved by selecting the “right” subset of topics from a topic set. We show that using topic subsets improves the performance of automatic system ranking methods by 26% on average, with a maximum of 60%. We also observe that the commonly experienced problem of underestimating the performance of the best systems is data set dependent and not inherent to system ranking estimation. These findings support the case for automatic system evaluation and motivate further research.
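
For readers unfamiliar with the idea of ranking systems without relevance judgments, the sketch below gives a minimal, hypothetical rendering of the Soboroff-style approach: sample pseudo-relevant documents at random from the pooled results and score each system against them. The data structures and sampling details are assumptions for illustration, not the exact procedure of any of the methods evaluated in the paper.

```python
# Hedged sketch of the Soboroff-style idea: sample pseudo-relevant documents
# from the pool of retrieved results and rank systems by how many pseudo-
# relevant documents they retrieve. 'runs' maps system -> topic -> ranked doc ids.
import random
from collections import defaultdict

def rank_systems(runs, sample_rate=0.1, seed=0):
    random.seed(seed)
    topics = {t for run in runs.values() for t in run}
    scores = defaultdict(float)
    for topic in topics:
        pool = {d for run in runs.values() for d in run.get(topic, [])[:100]}
        pseudo_relevant = {d for d in pool if random.random() < sample_rate}
        for system, run in runs.items():
            retrieved = run.get(topic, [])[:100]
            hits = sum(1 for d in retrieved if d in pseudo_relevant)
            scores[system] += hits / max(len(pseudo_relevant), 1)
    # Systems retrieving more pseudo-relevant documents are ranked higher.
    return sorted(scores, key=scores.get, reverse=True)
```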

[download pdf]

Claudia Hauff defends PhD thesis on performance prediction

Monday, February 1st, 2010, posted by Djoerd Hiemstra

Predicting the Effectiveness of Queries and Retrieval Systems

by Claudia Hauff

The thesis considers users’ attempts to express their information needs through queries, or search requests, and tries to predict whether those requests will be of high or low quality. Intuitively, a query’s quality is determined by its outcome, that is, whether the retrieved search results meet the user’s expectations. The second type of prediction method under investigation attempts to predict the quality of search systems themselves. Given a number of search systems to consider, such methods estimate how well or how poorly the systems will perform in comparison to each other.

The motivation for this research effort stems primarily from the substantial benefits of successfully predicting the quality of a query or a system. Accurate predictions enable the deployment of adaptive retrieval components, which would have a considerable positive effect on the user experience. Furthermore, if sufficiently accurate predictions of the quality of retrieval systems could be achieved, the cost of evaluation would be significantly reduced.

In a first step, pre-retrieval predictors are investigated; these predict a query’s effectiveness before the retrieval step and are thus independent of the ranked list of results. Such predictors base their predictions solely on query terms, collection statistics and possibly external sources such as WordNet or Wikipedia. A total of twenty-two prediction algorithms are categorized and their quality is assessed on three different TREC test collections, including two large Web collections. A number of newly applied methods for combining various predictors are examined to obtain a better prediction of a query’s effectiveness. In order to adequately compare such techniques, the current evaluation methodology is critically examined. It is shown that the standard evaluation measure, namely the linear correlation coefficient, can provide a misleading indication of performance. To address this issue, the current evaluation methodology is extended to include cross-validation and statistical testing to determine significant differences.
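
To make the notion of a pre-retrieval predictor concrete, here is a minimal sketch of one widely used example, the average inverse document frequency (avg-IDF) of the query terms, which needs only the query and collection statistics. The input structures (`df`, `num_docs`) are hypothetical placeholders.

```python
# Minimal sketch of a typical pre-retrieval predictor: the average inverse
# document frequency (avg-IDF) of the query terms, computed from collection
# statistics alone (no ranked list needed). df and num_docs are assumed inputs.
import math

def avg_idf(query_terms, df, num_docs):
    """df: dict term -> document frequency; num_docs: collection size."""
    idfs = [math.log(num_docs / df[t]) for t in query_terms if df.get(t)]
    return sum(idfs) / len(idfs) if idfs else 0.0

# A higher avg-IDF suggests more discriminative query terms, which is often
# (but not always) associated with better retrieval effectiveness.
```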

Building on the analysis of pre-retrieval predictors, post-retrieval approaches, which estimate a query’s effectiveness on the basis of the retrieved results, are then investigated. The thesis focuses in particular on the Clarity Score approach and provides an analysis of its sensitivity to variables such as the collection, the query set and the retrieval approach. Adaptations to Clarity Score are introduced which improve the estimation accuracy of the original algorithm on most evaluated test collections.
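
The sketch below illustrates the basic Clarity Score computation, the KL divergence between a query language model estimated from the top-ranked documents and the collection language model, under simplifying assumptions (uniform document weights, precomputed per-document term distributions); it is not one of the adapted versions proposed in the thesis.

```python
# Hedged sketch of the Clarity Score: KL divergence between a query language
# model (estimated from the top-k retrieved documents) and the collection
# language model. Inputs are simplified: each doc is a term->probability dict.
import math

def clarity_score(top_doc_models, collection_model, doc_weights=None):
    k = len(top_doc_models)
    # Uniform document weighting assumed; the original uses P(d|q) weights.
    doc_weights = doc_weights or [1.0 / max(k, 1)] * k
    vocab = set().union(*top_doc_models)
    score = 0.0
    for term in vocab:
        p_q = sum(w * m.get(term, 0.0) for w, m in zip(doc_weights, top_doc_models))
        p_c = collection_model.get(term, 1e-9)
        if p_q > 0:
            score += p_q * math.log2(p_q / p_c)
    return score   # higher score = query model diverges more from the collection
```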

The utility of query effectiveness prediction methods is commonly evaluated by reporting correlation coefficients, such as Kendall’s Tau and the linear correlation coefficient, which denote how well the methods perform at predicting the retrieval effectiveness of a set of queries. Despite the significant amount of research dedicated to this important stage in the retrieval process, the following question has remained unexplored: what is the relationship between the current evaluation methodology for query effectiveness prediction and the change in effectiveness of retrieval systems that employ a predictor? We investigate this question in a large-scale study in which predictors of arbitrary accuracy are generated in order to examine how the strength of their observed Kendall’s Tau coefficient affects retrieval effectiveness in two adaptive system settings: selective query expansion and meta-search. It is shown that the accuracy of currently existing query effectiveness prediction methods is not yet high enough to lead to consistent positive changes in retrieval performance in these particular settings.
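
The selective query expansion setting referred to above can be pictured as follows: a per-query predictor score decides whether expansion is applied. The sketch is hypothetical; `retrieve`, `expand`, `predict_effectiveness`, and the threshold are placeholders, and the policy shown (expand only queries predicted to perform well) is one common variant.

```python
# Minimal sketch of selective query expansion driven by a performance
# predictor. retrieve(), expand() and predict_effectiveness() are hypothetical
# placeholders for a retrieval system, an expansion module, and a predictor.
def selective_expansion_run(queries, retrieve, expand, predict_effectiveness,
                            threshold=0.5):
    results = {}
    for qid, query in queries.items():
        # One common policy: expand only queries predicted to perform well,
        # since expansion tends to hurt poorly performing queries. An accurate
        # predictor is needed for this policy to pay off.
        if predict_effectiveness(query) >= threshold:
            query = expand(query)
        results[qid] = retrieve(query)
    return results
```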

The last part of the thesis is concerned with the task of estimating the ranking of retrieval systems according to their retrieval effectiveness without relying on costly relevance judgments. Five different system ranking estimation approaches are evaluated on a wide range of data sets covering a variety of retrieval tasks and test collections. The issue that has long prevented this line of automatic evaluation from being used in practice is the severe mis-ranking of the best systems. The experiments reported in this work, however, show that this is not an inherent problem of system ranking estimation approaches; rather, it is data set dependent. Under certain conditions it is indeed possible to automatically identify the best systems correctly. Furthermore, our analysis reveals that the estimated ranking of systems is not equally accurate for all topics of a topic set, which motivates the investigation of topic subsets to improve the accuracy of the estimate. A study to this effect indicates the validity of the approach.

[download pdf]

Another SIKS/Twente Seminar

Friday, January 8th, 2010, posted by Djoerd Hiemstra

The 3rd SIKS/Twente Seminar on Searching and Ranking takes place on January 29, 2010 at the University of Twente. The goal of the one-day workshop is to bring together researchers from companies and academia working on the effectiveness of search engines. The workshop will take place in the Spiegel (building 2), lecture hall SP-6. Speakers are:

  • Leif Azzopardi (University of Glasgow, UK)
  • Arjen de Vries (CWI and University of Delft)
  • Vanessa Murdock (Yahoo Research, Barcelona, Spain)
After the seminar, Claudia Hauff will defend her PhD Thesis: Predicting the Effectiveness of Queries and Retrieval Systems. The seminar is sponsored by SIKS (the Netherlands research School for Information and Knowledge Systems) and the CTIT (Centre for Telematics and Information Technology). For more information, check out the SSR 2010 website.

Indexing half a billion web pages

Tuesday, October 27th, 2009, posted by Djoerd Hiemstra

by Claudia Hauff and Djoerd Hiemstra

The University of Twente participated in three tasks of TREC 2009: the adhoc task, the diversity task and the relevance feedback task. All experiments are performed on the English part of ClueWeb09. In this draft paper, we describe our approach to tuning our retrieval system in the absence of training data (Section 3), the use of categories and a query log for diversifying search results (Section 4), and preliminary results for the relevance feedback task (Section 5).

[download pdf]

Relying on Topic Subsets for System Ranking Estimation

Tuesday, September 1st, 2009, posted by Djoerd Hiemstra

by Claudia Hauff, Djoerd Hiemstra, Franciska de Jong, and Leif Azzopardi

Ranking a number of retrieval systems according to their retrieval effectiveness without relying on costly relevance judgments was first explored by Soboroff et al. [6]. Over the years, a number of alternative approaches have been proposed. We perform a comprehensive analysis of system ranking estimation approaches on a wide variety of TREC test collections and topic sets. Our analysis reveals that the performance of such approaches is highly dependent upon the topic or topic subset used for estimation. We hypothesize that the performance of system ranking estimation approaches can be improved by selecting the “right” subset of topics and show that using topic subsets improves the performance by 32% on average, with a maximum improvement of up to 70% in some cases.
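
An oracle-style sketch of the topic-subset idea, under the assumption that a ground-truth system ranking is available for comparison: estimate the system ranking from a subset of topics and measure its agreement with the true ranking via Kendall's Tau. `estimate_scores` is a hypothetical stand-in for any system ranking estimation method; the exhaustive search is for illustration only.

```python
# Hedged sketch: measure how well a topic subset supports system ranking
# estimation by comparing the estimated ranking (on that subset) with the
# ground-truth ranking via Kendall's tau. estimate_scores() is a hypothetical
# stand-in for any system ranking estimation method.
from itertools import combinations
from scipy.stats import kendalltau

def tau_for_subset(topics, systems, estimate_scores, true_scores):
    est = [estimate_scores(s, topics) for s in systems]
    ref = [true_scores[s] for s in systems]
    tau, _ = kendalltau(est, ref)
    return tau

def best_subset(all_topics, systems, estimate_scores, true_scores, size=10):
    # Exhaustive search over small subsets; a greedy strategy would be
    # preferable in practice for larger topic sets.
    return max(combinations(all_topics, size),
               key=lambda sub: tau_for_subset(sub, systems, estimate_scores,
                                              true_scores))
```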

The paper will be presented at the 18th ACM Conference on Information and Knowledge Management (CIKM 2009) in Hong Kong, China.

[download pdf]

Two Twente-Yahoo papers at SIGIR 2009

Tuesday, April 21st, 2009, posted by Djoerd Hiemstra

Both Pavel Serdyukov and Claudia Hauff have joint papers accepted for the 32nd Annual ACM SIGIR Conference in Boston, USA.

Placing Flickr Photos on a Map
by Pavel Serdyukov, Vanessa Murdock (Yahoo!), and Roelof van Zwol (Yahoo!)

In this paper we investigate generic methods for placing photos uploaded to Flickr on the world map. As primary input for our methods we use the textual annotations provided by the users to predict the single most probable location where the image was taken. Central to our approach is a language model based entirely on the annotations provided by users. We define extensions to improve over the language model using tag-based smoothing and cell-based smoothing, and by leveraging spatial ambiguity. Furthermore, we demonstrate how to incorporate GeoNames, a large external database of locations. For varying levels of granularity, we are able to place images on a map with at least twice the precision of the state of the art reported in the literature.
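
The core of the approach can be pictured as a per-cell tag language model; the sketch below picks the map cell whose (smoothed) tag model best explains a photo's annotations. The smoothing scheme and constants are illustrative assumptions, not the tuned models from the paper.

```python
# Hedged sketch of the core idea: pick the map cell whose tag language model
# best explains a photo's annotations. cell_tag_counts maps cell -> tag counts;
# the smoothing constants are illustrative, not the tuned values from the paper.
import math

def most_probable_cell(photo_tags, cell_tag_counts, vocab_size, mu=100.0):
    best_cell, best_logp = None, float("-inf")
    for cell, counts in cell_tag_counts.items():
        total = sum(counts.values())
        logp = 0.0
        for tag in photo_tags:
            # Dirichlet-style smoothing toward a uniform background model.
            p = (counts.get(tag, 0) + mu / vocab_size) / (total + mu)
            logp += math.log(p)
        if logp > best_logp:
            best_cell, best_logp = cell, logp
    return best_cell
```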

Efficiency trade-offs in two-tier web search systems
by Ricardo Baeza-Yates (Yahoo!), Vanessa Murdock (Yahoo!), and Claudia Hauff

Search engines rely on searching multiple partitioned corpora to return results to users in a reasonable amount of time. In this paper we analyze the standard two-tier architecture for Web search, with the difference that the corpus to be searched for a given query is predicted in advance. We show that any predictor better than random yields time savings, but that this decrease in processing time comes at an increase in infrastructure cost. We provide an analysis and investigate this trade-off in the context of two different scenarios on real-world data. We demonstrate that in general the decrease in answer time is justified by a small increase in infrastructure cost.
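
As a back-of-the-envelope illustration (not the paper's analysis), the toy model below compares the expected answer time of the standard try-tier-1-then-fall-back scheme with a scheme where a predictor routes some queries directly to the second tier; in this simplified model, whether prediction pays off depends on how the time saved on true positives compares with the latency paid by false positives.

```python
# Toy model (not the paper's analysis) of the two-tier trade-off: a query that
# the first tier cannot answer normally pays t1 + t2 (try tier 1, fall back);
# if the predictor routes it straight to tier 2 it pays only t2, but every
# false positive pays t2 for a query tier 1 could have answered in t1.
def expected_times(p_tier2, recall, precision, t1, t2):
    baseline = (1 - p_tier2) * t1 + p_tier2 * (t1 + t2)
    tp = p_tier2 * recall                      # correctly routed to tier 2
    fp = tp / max(precision, 1e-9) - tp        # wrongly routed to tier 2
    fn = p_tier2 - tp                          # missed, still fall back
    rest = 1.0 - tp - fp - fn                  # answered by tier 1 as usual
    with_pred = tp * t2 + fp * t2 + fn * (t1 + t2) + rest * t1
    return baseline, with_pred

# Illustrative numbers only: roughly (110.0, 102.0) time units.
print(expected_times(0.3, 0.8, 0.9, 50, 200))
```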

See: list of accepted papers at SIGIR’09.

The Combination and Evaluation of Query Performance Prediction Methods

Friday, December 12th, 2008, posted by Djoerd Hiemstra

by Claudia Hauff, Leif Azzopardi, and Djoerd Hiemstra

In this paper, we examine a number of newly applied methods for combining pre-retrieval query performance predictors in order to obtain a better prediction of a query’s performance. However, in order to adequately compare such techniques, we critically examine the current evaluation methodology and show how using linear correlation coefficients (i) does not provide an intuitive measure indicative of a method’s quality, (ii) can provide a misleading indication of performance, and (iii) overstates the performance of combined methods. To address this, we extend the current evaluation methodology to include cross-validation, report a more intuitive and descriptive statistic, and apply statistical testing to determine significant differences. In the course of a comprehensive empirical study over several TREC collections, we evaluate nineteen pre-retrieval predictors and three combination methods.
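
One simple way to combine predictors under an evaluation methodology with cross-validation is to fit a linear model of per-query average precision from several predictor scores and to evaluate it on held-out queries; the sketch below (using scikit-learn, with hypothetical inputs `X` and `y`) is an illustrative stand-in, not necessarily one of the three combination methods evaluated in the paper.

```python
# Hedged sketch (not necessarily the paper's exact method): combine several
# pre-retrieval predictor scores with a linear model and evaluate it under
# cross-validation instead of fitting and testing on the same queries.
import numpy as np
from scipy.stats import kendalltau
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def cross_validated_tau(X, y, n_splits=5, seed=0):
    """X: (n_queries, n_predictors) array of predictor scores; y: per-query AP."""
    taus = []
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model = LinearRegression().fit(X[train], y[train])
        tau, _ = kendalltau(model.predict(X[test]), y[test])
        taus.append(tau)
    return float(np.mean(taus))
```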

The paper will be presented at the 31st European Conference on Information Retrieval (ECIR), April 6-9, 2009 in Toulouse, France.

[download pdf]

A Survey of Pre-Retrieval Query Performance Predictors

Thursday, October 30th, 2008, posted by Djoerd Hiemstra

by Claudia Hauff, Djoerd Hiemstra, and Franciska de Jong

The focus of research on query performance prediction is to predict the effectiveness of a query given a search system and a collection of documents. If the performance of queries can be estimated in advance of, or during, the retrieval stage, specific measures can be taken to improve the overall performance of the system. In particular, pre-retrieval predictors predict the query performance before the retrieval step and are thus independent of the ranked list of results; such predictors base their predictions solely on the query terms, collection statistics and possibly external sources such as WordNet. In this paper, 22 pre-retrieval predictors are categorized and assessed on three different TREC test collections.

[download pdf]

Improved query difficulty prediction for the web

Friday, August 22nd, 2008, posted by Djoerd Hiemstra

by Claudia Hauff, Vanessa Murdock, Ricardo Baeza-Yates

Query performance prediction aims to predict whether a query will have high or low average precision when retrieving from a particular collection. An accurate estimator of the quality of search engine results can allow the search engine to decide to which queries to apply query expansion, for which queries to suggest alternative search terms, whether to adjust the sponsored results, or whether to return results from specialized collections. In this paper we present an evaluation of state-of-the-art query prediction algorithms, both post-retrieval and pre-retrieval, and we analyze their sensitivity to the retrieval algorithm. We evaluate query difficulty predictors over three widely different collections and query sets and present an analysis of why prediction algorithms perform significantly worse on Web data. Finally we introduce Improved Clarity, and demonstrate that it outperforms state-of-the-art predictors on three standard collections, including two large Web collections.

The paper will be presented at the ACM Conference on Information and Knowledge Management (CIKM 2008) in Napa Valley, USA.

[download pdf]