Traitor: Associating Concepts using the WWW

by Wanno Drijfhout, Oliver Jundt, and Lesley Wevers

Traitor uses Hadoop to construct a database of associated concepts from Common Crawl’s 25 TB data set of web pages. The database can be queried through a web application with two query interfaces. A textual interface allows searching for similarities and differences between multiple concepts using a query language similar to set notation, and a graphical interface lets users visualize similarity relationships between concepts in a force-directed graph.
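To give a rough feel for what set-style queries over associated concepts could look like, here is a minimal Python sketch. It is not Traitor's implementation or its query syntax. It assumes that associations come from words co-occurring in the same document, which the abstract does not specify, and it runs on three invented toy documents rather than Common Crawl data. The names `build_associations` and `associated` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for web pages (invented for illustration, not Common Crawl data).
documents = [
    "coffee tea milk",
    "coffee espresso milk",
    "tea herbal honey",
]

def build_associations(docs):
    """Count, for each word, how often every other word shares a document with it.

    Assumption: co-occurrence within a document defines an association.
    """
    assoc = {}
    for doc in docs:
        words = sorted(set(doc.split()))
        for a, b in combinations(words, 2):
            assoc.setdefault(a, Counter())[b] += 1
            assoc.setdefault(b, Counter())[a] += 1
    return assoc

def associated(assoc, concept):
    """Return the set of concepts associated with the given concept."""
    return set(assoc.get(concept, {}))

assoc = build_associations(documents)

# Similarity as set intersection: concepts associated with both inputs.
similar = associated(assoc, "coffee") & associated(assoc, "tea")

# Difference as set difference: concepts associated with the first input only.
different = associated(assoc, "coffee") - associated(assoc, "tea")

print(sorted(similar))
print(sorted(different))
```

In this sketch a similarity query maps to set intersection and a difference query to set difference. A real system at this scale would presumably compute the co-occurrence counts in a distributed job and rank the results by association strength rather than returning plain sets.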

To be presented at the 13th Dutch-Belgian Information Retrieval Workshop (DIR 2013) on 26 April in Delft, The Netherlands.

[download pdf]

Try Traitor at

