PF/Tijah site views and downloads

Friday, January 18th, 2008, posted by Djoerd Hiemstra

The MultimediaN board asks for “Economic impact”, so I gathered some statistics on the usage of PF/Tijah:

  • Between 1 May 2007 and now, the site received 1162 visits and 5127 page views in total.
  • Between 22 October 2007 (the last release) and now, there were 1100 downloads of MonetDB/XQuery 0.20 (which includes PF/Tijah); 46 of those downloads, or 4.2%, were referred from the PF/Tijah site.

PF/Tijah documentation available as technical report

Wednesday, December 19th, 2007, posted by Djoerd Hiemstra

by Djoerd Hiemstra, Henning Rode, and Jan Flokstra

PF/Tijah (Pathfinder/Tijah, pronounced “Pee Ef Teeja“) is a flexible open source text search system developed at the University of Twente in cooperation with CWI Amsterdam and TU München. The system is integrated in the Pathfinder XQuery database system and can be downloaded as part of MonetDB/XQuery. This report contains user documentation of PF/Tijah, including example usage in three show cases.

For more information, see: PF/Tijah site

SIGIR’s 30th anniversary: an analysis of trends in IR research and the topology of its community

Wednesday, November 21st, 2007, posted by Djoerd Hiemstra

by Djoerd Hiemstra, Claudia Hauff, Franciska de Jong, and Wessel Kraaij

This paper presents an analysis of all SIGIR proceedings to date in order to summarize what IR researchers discussed over the years, where they are from, and whether subcommunities can be identified, determined by co-authorship.

[pdf] [more info]
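One simple way to make "subcommunities determined by co-authorship" concrete is to treat authors as nodes, link two authors whenever they share a paper, and read off the connected components of the resulting graph. The Python sketch below does this with a union-find structure. It is an illustrative assumption, not necessarily the method used in the paper, and the function name `coauthor_components` and the toy author lists are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def coauthor_components(papers: List[List[str]]) -> List[List[str]]:
    """Return the connected components of the co-authorship graph, largest first."""
    parent: Dict[str, str] = {}

    def find(a: str) -> str:
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for authors in papers:
        for author in authors:
            find(author)  # register authors of single-author papers too
        for other in authors[1:]:
            union(authors[0], other)  # all co-authors of one paper end up connected

    groups = defaultdict(set)
    for author in list(parent):
        groups[find(author)].add(author)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))

# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
print(coauthor_components(papers))  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Connected components are the coarsest notion of a subcommunity; a real analysis would likely refine large components further, for example by weighting edges by the number of shared papers.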

SIGIR 30th Anniversary Search Demo

Thursday, June 21st, 2007, posted by Djoerd Hiemstra

SIGIR will have its 30th conference this year. To celebrate this, we created some fun search applications that search the abstracts of 30 years of SIGIR proceedings at:

Enter your favorite IR topic to search in the abstracts of 30 years of SIGIR and find experts, periods and geographical locations associated with your search. The “mystery link” will be revealed at the conference in Amsterdam.

Vojkan Mihajlovic defends Ph.D. thesis on structured information retrieval

Friday, December 8th, 2006, posted by Djoerd Hiemstra

Score Region Algebra: A flexible framework for structured information retrieval

by Vojkan Mihajlovic

The scope of the research presented in this thesis is the retrieval of relevant information from structured documents. The thesis describes a framework for information retrieval in documents that have some form of annotation describing their logical and semantic structure, such as XML and SGML. The development of the structured information retrieval framework follows ideas from both the database and the information retrieval worlds. It uses a three-level database architecture and implements relevance scoring mechanisms inherited from information retrieval models.

To develop the structured retrieval framework, the problem of structured information retrieval is analyzed and elementary requirements for structured retrieval systems are specified. These requirements are: (1) entity selection - the selection of different entities in structured documents, such as elements, terms, attributes, image and video references, which are parts of the user query; (2) entity relevance score computation - the computation of relevance scores for different structured elements with respect to the content they contain; (3) relevance score combination - the combination of relevance scores from (different) elements in a document structure, resulting in a common element relevance score; (4) relevance score propagation - the propagation of scores from different elements to common ancestor or descendant elements following the query. These four requirements are supported by developing a logical database algebra in harmony with the retrieval models used for ranking. In specifying the logical algebra, we face the challenge of transparently instantiating retrieval models, i.e., specifying different retrieval models without affecting the algebra operators.

Download Vojkan’s thesis from EPrints.
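The four requirements can be illustrated with a toy model in which elements and term occurrences are both regions (start and end positions in the document), in the spirit of a region algebra. The Python sketch below is a minimal illustration under that assumption, not the algebra defined in the thesis: the `Region` type, the operator names `select`, `score_about`, `combine` and `propagate_up`, and the length-normalised term-frequency scoring are all hypothetical choices. The scoring model is passed in as a parameter to hint at the "transparent instantiation" goal.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Region:
    """A hypothetical region: an element or term occurrence spanning [start, end]."""
    start: int
    end: int
    name: str
    score: float = 1.0

def inside(outer: Region, inner: Region) -> bool:
    """True if inner lies within outer's span."""
    return outer.start <= inner.start and inner.end <= outer.end

def select(regions: List[Region], name: str) -> List[Region]:
    """(1) Entity selection: keep regions with a given element name or term."""
    return [r for r in regions if r.name == name]

def tf_over_length(tf: int, length: int) -> float:
    """Illustrative retrieval model (an assumption): term frequency / element length."""
    return tf / length

def score_about(elements, terms, model: Callable[[int, int], float] = tf_over_length):
    """(2) Entity relevance scoring: score each element by the terms it contains."""
    scored = []
    for e in elements:
        tf = sum(1 for t in terms if inside(e, t))
        scored.append(Region(e.start, e.end, e.name, model(tf, e.end - e.start + 1)))
    return scored

def combine(a, b, op=lambda x, y: x + y):
    """(3) Score combination: merge two scorings of the same elements (sum by default)."""
    b_scores = {(r.start, r.end): r.score for r in b}
    return [Region(r.start, r.end, r.name, op(r.score, b_scores.get((r.start, r.end), 0.0)))
            for r in a]

def propagate_up(ancestors, descendants, agg=max):
    """(4) Score propagation: give each ancestor the aggregated score of its descendants."""
    result = []
    for anc in ancestors:
        inner = [d.score for d in descendants if inside(anc, d)]
        result.append(Region(anc.start, anc.end, anc.name, agg(inner) if inner else 0.0))
    return result

# Toy document (hypothetical positions): an article with a title and two sections.
doc = [
    Region(0, 20, "article"), Region(1, 4, "title"),
    Region(5, 12, "sec"), Region(13, 19, "sec"),
    Region(2, 2, "xml"), Region(3, 3, "retrieval"),
    Region(6, 6, "xml"), Region(9, 9, "retrieval"), Region(10, 10, "xml"),
    Region(15, 15, "database"),
]

# Score sections about "xml retrieval", then propagate the best section score to the article.
secs = select(doc, "sec")
combined = combine(score_about(secs, select(doc, "xml")),
                   score_about(secs, select(doc, "retrieval")))
articles = propagate_up(select(doc, "article"), combined)
print([r.score for r in combined], articles[0].score)  # [0.375, 0.0] 0.375
```

Replacing `tf_over_length` with any other function of `(tf, length)` changes the ranking without touching `select`, `combine`, or `propagate_up`, which is the kind of separation between retrieval model and algebra operators the abstract describes.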

PF/Tijah: text search in an XML database system

Wednesday, July 26th, 2006, posted by Djoerd Hiemstra

by Djoerd Hiemstra, Henning Rode, Roel van Os and Jan Flokstra

This paper introduces the PF/Tijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, explain some of the system internals, and discuss plans for future work. PF/Tijah is part of the open source release of MonetDB/XQuery.

[download pdf] [more info]

PFTijah is up and running!

Wednesday, April 12th, 2006, posted by Djoerd Hiemstra
We’ve run our first NEXI query on the combined system! The NEXI query is compiled from within XQuery, executed, and the results are stored in a BAT. Jan is now working on relating the results back to Pathfinder nodes.

PFTijah Wiki public

Tuesday, February 14th, 2006, posted by Djoerd Hiemstra
PFTijah is the name of an internal project we started within MultimediaN MN5 semantic access. The main goal of the project is to create a flexible environment for setting up search systems by integrating the Pathfinder XQuery system with our Tijah XML information retrieval system. Watch the Wiki for system releases and new features.

