by Debora Donato (Yahoo, Research Barcelona, Spain)
joint work with: Carlos Castillo, Aristides Gionis, Antti Ukkonen
We study the problem of context-sensitive search, in which a
query consists of a set of terms q and a document p already
in the collection, and the task is to rank the documents in
decreasing order of relevance with respect to the query hq, pi.
The document p provides the context in which the query
terms q should be understood. By attaching a context to
the query terms, the search results of a search initiated in a
particular page can be made more relevant.
If we consider the collection of documents as a directed
graph G with vertices the documents and edges the links
among the documents, an intuitive approach for the problem
is to employ the personalized PageRank computation by
defining the teleport vector to be the vector that jumps back
to the query document p. However, computing and storing
personalized PageRank vectors for each documents separately
is infeasible. Instead we compute a partition of G by
using a spectral method and we store a personalization vector
only for each cluster. In addition to these approximate
personalization vectors we use also traditional information
retrieval measures such as BM25 and other graph-based similarity
measures to form feature vectors. RankSVM is then
used to learn weights for individual features given suitably
constructed training data. Documents are ranked at query
time using the inner product of the feature and the weight
vectors.
The experiments made using Wikipedia indicate that the
proposed method considerably improves results obtained by
a more traditional approach that does not take the context
into account.