Query Load Balancing in P2P Search
Monday, January 10th, 2011, posted by Djoerd Hiemstra
Query Load Balancing by Caching Search Results in Peer-to-Peer Information Retrieval Networks
by Almer Tigelaar and Djoerd Hiemstra
For peer-to-peer web search engines, it is important to keep the delay between receiving a query and providing search results within a range that is acceptable to the end user. How to achieve this remains an open challenge. One way to reduce delays is to cache search results for queries and to allow peers to access each other's caches. In this paper we explore the limitations of search result caching in large-scale peer-to-peer information retrieval networks by simulating such networks with increasing levels of realism. We find that cache hit ratios of at least thirty-three percent are attainable.
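To make "cache hit ratio" concrete, here is a minimal sketch of a per-peer search result cache that may also consult its neighbours' caches, with the hit ratio computed as hits / (hits + misses). It is not the simulator used in the paper: the class name `QueryResultCache`, the LRU eviction policy, the case/whitespace query normalisation, the `neighbours` list and the stand-in `fake_search` function are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import Callable, List


class QueryResultCache:
    """Hypothetical per-peer LRU cache of search results, keyed by query."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, List[str]]" = OrderedDict()
        self.neighbours: List["QueryResultCache"] = []  # caches this peer may consult (assumption)
        self.hits = 0    # answered from the local cache or a neighbour's cache
        self.misses = 0  # had to run the (expensive) search

    @staticmethod
    def normalise(query: str) -> str:
        # Assumption: queries differing only in case or whitespace are identical.
        return " ".join(query.lower().split())

    def _store(self, key: str, results: List[str]) -> None:
        self.entries[key] = results
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def lookup(self, query: str, search: Callable[[str], List[str]]) -> List[str]:
        key = self.normalise(query)
        if key in self.entries:  # local hit: refresh its recency
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        for peer in self.neighbours:  # remote hit in another peer's cache
            if key in peer.entries:
                self.hits += 1
                results = peer.entries[key]
                self._store(key, results)  # keep a local copy for next time
                return results
        self.misses += 1  # nobody had it cached: run the search and cache it
        results = search(key)
        self._store(key, results)
        return results

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_search(query: str) -> List[str]:
    # Hypothetical stand-in for forwarding the query into the P2P network.
    return [f"result for {query}"]


peer_a = QueryResultCache(capacity=2)
peer_b = QueryResultCache(capacity=2)
peer_b.neighbours.append(peer_a)

for q in ["p2p search", "P2P  search", "caching", "p2p search", "hadoop"]:
    peer_a.lookup(q, fake_search)
print(peer_a.hits, peer_a.misses, peer_a.hit_ratio())  # 2 3 0.4

# peer_b has never seen "hadoop", but its neighbour peer_a has it cached.
print(peer_b.lookup("hadoop", fake_search), peer_b.hits, peer_b.misses)
```

The second lookup in the stream is a hit only because of normalisation, and "caching" is evicted once "hadoop" arrives in the two-entry cache. Letting peer_b read peer_a's cache turns what would have been a miss into a hit, which is the intuition behind sharing caches between peers; real hit ratios depend on the query log, cache sizes and network topology being simulated.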
The paper will be presented at the 11th Dutch-Belgian Information Retrieval Workshop (DIR) on February 4 in Amsterdam.
MIREX (MapReduce Information Retrieval Experiments) provides tools to easily and quickly run large-scale information retrieval experiments on a cluster of machines using Hadoop. Version 0.1 includes tools for the TREC ClueWeb09 collection. The code is available to other researchers at: