Efficient graph-based context-sensitive search

by Debora Donato (Yahoo, Research Barcelona, Spain)
joint work with: Carlos Castillo, Aristides Gionis, Antti Ukkonen

We study the problem of context-sensitive search, in which a query consists of a set of terms q and a document p already in the collection, and the task is to rank the documents in decreasing order of relevance with respect to the query hq, pi. The document p provides the context in which the query terms q should be understood. By attaching a context to the query terms, the search results of a search initiated in a particular page can be made more relevant. If we consider the collection of documents as a directed graph G with vertices the documents and edges the links among the documents, an intuitive approach for the problem is to employ the personalized PageRank computation by defining the teleport vector to be the vector that jumps back to the query document p. However, computing and storing personalized PageRank vectors for each documents separately is infeasible. Instead we compute a partition of G by using a spectral method and we store a personalization vector only for each cluster. In addition to these approximate personalization vectors we use also traditional information retrieval measures such as BM25 and other graph-based similarity measures to form feature vectors. RankSVM is then used to learn weights for individual features given suitably constructed training data. Documents are ranked at query time using the inner product of the feature and the weight vectors.
The experiments made using Wikipedia indicate that the proposed method considerably improves results obtained by a more traditional approach that does not take the context into account.