Archive for the 'Expert Search' Category

Pavel Serdyukov defends PhD thesis on Expert Search

Thursday, June 25th, 2009, posted by Djoerd Hiemstra

The automatic search for knowledgeable people within an organization is a key function that makes modern enterprise search systems commercially successful and socially demanded. A number of effective approaches to expert finding were recently proposed in academic publications. Although most of them use reasonably defined measures of personal expertise, they often limit themselves to rather unrealistic and sometimes oversimplified principles. In this thesis, we explore several ways to go beyond state-of-the-art assumptions used in research on expert finding and propose several novel solutions for this and related tasks. First, we describe measures of expertise that do not assume independent occurrence of terms and persons in a document, which makes them perform better than measures based on the independence of all entities in a document. One of these measures makes persons central to the process of term generation in a document. Another assumes that the position of the person’s mention in a document with respect to the positions of query terms indicates the relation of the person to the document’s relevant content. Second, we find ways to go beyond direct expertise evidence, that is, evidence concentrated within the document space of the person’s current employer and only within those organizational documents that mention the person. We successfully utilize the predictive potential of additional indirect expertise evidence publicly available on the Web and in the organizational documents implicitly related to a person. Finally, besides the expert finding methods we proposed, we also demonstrate solutions for tasks from related domains. In one case, we use several algorithms of multi-step relevance propagation to search for typed entities in Wikipedia. In another case, we suggest generic methods for placing photos uploaded to Flickr on the world map using language models of locations built entirely on the annotations provided by users, with a few task-specific extensions.

[download pdf]

2nd SIKS/Twente Seminar on Searching and Ranking

Monday, June 8th, 2009, posted by Djoerd Hiemstra

On June 24, 2009 at the University of Twente

http://www.cs.utwente.nl/~hiemstra/ssr2009/

The goal of the one-day seminar is to bring together researchers from companies and academia working on enterprise search problems. The speakers at the seminar are: David Hawking from Funnelback Internet and Enterprise Search & the Australian National University, who will talk about Practical Methods for Evaluating Enterprise Search; Iadh Ounis from the University of Glasgow, who will present Voting Techniques for Expert Search; and Maarten de Rijke from the University of Amsterdam, who will talk about Expert Profiling Out In the Wild.

University of Twente at the TREC 2008 Enterprise Track

Friday, October 24th, 2008, posted by Djoerd Hiemstra

Using the Global Web as an expertise evidence source

by Pavel Serdyukov, Robin Aly, Djoerd Hiemstra

This is the fourth (and last) year of the TREC Enterprise Track and the second year the University of Twente submitted runs for the expert finding task. In the methods that were used to produce these runs, we mostly rely on the predictive potential of those expertise evidence sources that are publicly available on the Global Web, but not hosted at the website of the organization under study (CSIRO). This paper describes follow-up studies complementary to our recent research, which demonstrated that taking the web factor seriously significantly improves the performance of expert finding in the enterprise.

The paper will be presented at the 17th Text Retrieval Conference (TREC), November 19-21, at the United States National Institute of Standards and Technology in Gaithersburg, USA.

[download draft paper] [More info]

Multi-step Relevance Propagation for Expert Finding

Friday, August 15th, 2008, posted by Djoerd Hiemstra

by Pavel Serdyukov, Henning Rode, and Djoerd Hiemstra

Figure: A fragment of the real expertise graph with links between documents (white nodes) and candidate experts (black nodes) for the query 'sustainable ecosystems'.

An expert finding system allows a user to type a simple text query and retrieve names and contact information of individuals that possess the expertise expressed in the query. This paper proposes a novel approach to expert finding in large enterprises or intranets by modeling candidate experts (persons), web documents and various relations among them with so-called expertise graphs. In contrast to state-of-the-art approaches, which estimate personal expertise through one-step propagation of relevance probability from documents to the related candidates, our methods are based on the principle of multi-step relevance propagation in topic-specific expertise graphs. We model the process of expert finding by probabilistic random walks of three kinds: finite, infinite and absorbing. Experiments on TREC Enterprise Track data originating from two large organizations show that our methods using multi-step relevance propagation improve over the baseline one-step propagation based method in almost all cases.
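To make the contrast between one-step and multi-step propagation concrete, here is a minimal sketch of a finite walk on a toy bipartite graph. It is not the paper's exact model: the uniform transition probabilities, the function names `spread` and `propagate`, the toy data, and the requirement that every document mentions at least one candidate are all assumptions made for illustration.

```python
from collections import defaultdict


def spread(mass, links):
    """Move probability mass from each node evenly to its neighbours."""
    out = defaultdict(float)
    for node, m in mass.items():
        for neighbour in links[node]:
            out[neighbour] += m / len(links[node])
    return dict(out)


def propagate(doc_relevance, doc_candidates, rounds=1):
    """Score candidates after `rounds` document-to-candidate hops.

    rounds=1 is one-step propagation; larger values let relevance flow
    back to documents and out to candidates again (a finite walk).
    Assumes every document mentions at least one candidate.
    """
    # Reverse adjacency: candidate -> documents that mention the candidate.
    cand_docs = defaultdict(list)
    for doc, cands in doc_candidates.items():
        for cand in cands:
            cand_docs[cand].append(doc)

    # The walk starts in documents, proportionally to their relevance.
    total = sum(doc_relevance.values())
    doc_mass = {d: r / total for d, r in doc_relevance.items()}

    cand_mass = {}
    for hop in range(rounds):
        if hop > 0:
            # Candidates hand their mass back to the documents they occur in.
            doc_mass = spread(cand_mass, cand_docs)
        cand_mass = spread(doc_mass, doc_candidates)
    return cand_mass


# Toy data (hypothetical): relevance scores and candidate mentions.
relevance = {"d1": 0.6, "d2": 0.3, "d3": 0.1}
mentions = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}

one_step = propagate(relevance, mentions, rounds=1)   # alice ~0.75, bob ~0.25
two_step = propagate(relevance, mentions, rounds=2)   # alice ~0.625, bob ~0.375
```

In this toy graph the second round moves some mass from alice to bob, because bob shares a document with her; this is the kind of indirect evidence that one-step propagation cannot see.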

The paper will be presented at the ACM Conference on Information and Knowledge Management (CIKM 2008) in Napa Valley, USA.

[download pdf]

Being Omnipresent to be Almighty

Friday, June 20th, 2008, posted by Djoerd Hiemstra

The Importance of the Global Web Evidence for Organizational Expert Finding

by Pavel Serdyukov and Djoerd Hiemstra

Modern expert finding algorithms are developed under the assumption that all possible expertise evidence for a person is concentrated in the company that currently employs the person. Evidence that can be acquired outside of the enterprise is traditionally ignored. At the same time, the Web is full of personal information that is sufficiently detailed to judge a person’s skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on the data acquired from six different sources, accessible through the APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.

The paper will be presented at the Future Challenges in Expertise Retrieval (fCHER) workshop in Singapore.

[download pdf]

Pavel Serdyukov wins ECIR best student paper award

Tuesday, April 1st, 2008, posted by Djoerd Hiemstra

Pavel shows his check

Great news: yesterday, Pavel Serdyukov won the best student paper award at the European Conference on Information Retrieval (ECIR) in Glasgow for his paper "Modeling documents as mixtures of persons for expert finding". The award includes a check for $1200 sponsored by Yahoo.

[download pdf]

ECIR tutorial slides on-line

Monday, March 31st, 2008, posted by Djoerd Hiemstra

Djoerd performing at ECIR

I enjoyed giving the advanced language modeling tutorial at the European Conference on Information Retrieval (ECIR). The slides are now available for download below.

[download pdf]

Modeling documents as mixtures of persons

Friday, March 28th, 2008, posted by Djoerd Hiemstra

by Pavel Serdyukov and Djoerd Hiemstra

In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm based on the assumption that terms in documents are produced by the people who are mentioned in them. We represent documents retrieved for a query as mixtures of candidate experts' language models. Two methods of extracting personal language models are proposed, as well as a way of combining them with other evidence of expertise. Experiments conducted with the TREC Enterprise collection demonstrate the superiority of our approach in comparison with the best of the existing solutions.
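The idea of treating a document as a mixture of the language models of the persons mentioned in it can be sketched with a small expectation-maximization loop. This is only an illustration under strong assumptions, not either of the two extraction methods from the paper: mixture weights are taken as uniform over the persons in a document, there is no background smoothing, and the function names and toy documents are hypothetical.

```python
from collections import defaultdict


def normalise(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def person_language_models(docs, iterations=20):
    """Estimate P(term | person) from (term_counts, persons) pairs.

    Each document is modelled as a uniform mixture of the language
    models of the persons it mentions (an assumption of this sketch).
    """
    # Initialise every person's model from all documents mentioning them.
    pooled = defaultdict(lambda: defaultdict(float))
    for counts, persons in docs:
        for person in persons:
            for term, n in counts.items():
                pooled[person][term] += n
    models = {p: normalise(c) for p, c in pooled.items()}

    for _ in range(iterations):
        expected = defaultdict(lambda: defaultdict(float))
        for counts, persons in docs:
            for term, n in counts.items():
                # E-step: how likely is it that each person produced this term?
                weights = {p: models[p].get(term, 0.0) for p in persons}
                total = sum(weights.values())
                if total == 0.0:
                    continue
                for person, w in weights.items():
                    expected[person][term] += n * w / total
        # M-step: re-estimate each person's model from the expected counts.
        models = {p: normalise(c) for p, c in expected.items()}
    return models


# Toy collection (hypothetical): term counts and the persons mentioned.
docs = [
    ({"retrieval": 3, "ranking": 1}, ["alice"]),
    ({"retrieval": 1, "databases": 3}, ["alice", "bob"]),
    ({"databases": 2}, ["bob"]),
]
models = person_language_models(docs)
```

In the shared document, the term "retrieval" is attributed mostly to alice and "databases" mostly to bob, because their other documents already suggest those topics. A query could then be scored against each person by multiplying the query term probabilities under that person's model.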

[download pdf]

Relevance propagation for expert search

Monday, February 11th, 2008, posted by Djoerd Hiemstra

by Pavel Serdyukov, Henning Rode, and Djoerd Hiemstra

This paper describes several approaches that we used for the expert search task of the TREC 2007 Enterprise track. We studied several methods of relevance propagation from documents to related candidate experts. Instead of one-step propagation from documents to directly related candidates, used by many systems in previous years, we do not limit the relevance flow and disseminate it further through mutual document-candidate connections. We model relevance propagation using random walk principles or, in formal terms, discrete Markov processes. We experiment with infinite and finite numbers of propagation steps. We also demonstrate how additional information, namely hyperlinks among documents, the organizational structure of the enterprise and relevance feedback, may be utilized by the presented techniques.
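The case of an infinite number of propagation steps can be illustrated as a Markov chain iterated to its stationary distribution. The sketch below is an assumption-laden illustration rather than the system described in the paper: it uses uniform transitions along document-candidate links, and it adds a hypothetical `jump` probability that restarts the walk in a document chosen by relevance, which keeps the result query-dependent and makes the iteration converge. Document and candidate identifiers are assumed not to collide, and every document is assumed to mention at least one candidate.

```python
def infinite_walk(doc_relevance, doc_candidates, jump=0.15,
                  tol=1e-12, max_iter=1000):
    """Approximate stationary candidate scores of a walk with restarts."""
    # Reverse adjacency: candidate -> documents that mention the candidate.
    cand_docs = {}
    for doc, cands in doc_candidates.items():
        for cand in cands:
            cand_docs.setdefault(cand, []).append(doc)

    # One neighbour table for both node types (ids must not collide).
    neighbours = {**doc_candidates, **cand_docs}

    # Restart distribution: documents, proportionally to their relevance.
    total = sum(doc_relevance.values())
    restart = {d: r / total for d, r in doc_relevance.items()}

    mass = {node: restart.get(node, 0.0) for node in neighbours}
    for _ in range(max_iter):
        # With probability `jump` the walk restarts in a relevant document.
        new_mass = {node: jump * restart.get(node, 0.0) for node in neighbours}
        # Otherwise it follows a link chosen uniformly at random.
        for node, m in mass.items():
            share = (1.0 - jump) * m / len(neighbours[node])
            for neighbour in neighbours[node]:
                new_mass[neighbour] += share
        change = sum(abs(new_mass[n] - mass[n]) for n in neighbours)
        mass = new_mass
        if change < tol:
            break
    return {cand: mass[cand] for cand in cand_docs}


# Toy data (hypothetical): relevance scores and candidate mentions.
relevance = {"d1": 0.6, "d2": 0.3, "d3": 0.1}
mentions = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}
scores = infinite_walk(relevance, mentions)
```

Without the restart term, a walk on a connected graph would converge to a distribution that depends only on node degrees, so some query-dependent mechanism of this kind is needed when the number of steps is unbounded.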

[download pdf]

Tutorial: Advanced language modeling approaches

Thursday, January 10th, 2008, posted by Djoerd Hiemstra

(Case study: Expert search)

I will give a tutorial at the 30th European Conference on Information Retrieval (ECIR). The tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models and expectation maximization training. Expert search will be used as a case study to explain the consequences of modeling assumptions.

[download pdf]

See the ECIR tutorials and workshops page