PF/Tijah frozen in time

Monday, June 6th, 2011, posted by Djoerd Hiemstra

Maintenance of PF/Tijah is discontinued. We can give limited support to people who use the old code base, but we will no longer fix bugs introduced by, for instance, running PF/Tijah on new operating systems and new hardware.

New release of PF/Tijah

Thursday, March 4th, 2010, posted by Djoerd Hiemstra

The newest stable release of MonetDB/PF/Tijah is now online: version 0.13.0, as part of Pathfinder 0.36.1.

Towards Affordable Disclosure of Spoken Heritage Archives

Friday, December 11th, 2009, posted by Djoerd Hiemstra

by Roeland Ordelman, Willemijn Heeren, Franciska de Jong, Marijn Huijbregts, and Djoerd Hiemstra

This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded interviews with Dutch survivors of World War II concentration camp Buchenwald. Given such collections, we at least want to provide search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is not yet satisfactory and requires additional research.

To be published in the Journal of Digital Information 10(6).

Pathfinder meeting in Dagstuhl

Monday, February 16th, 2009, posted by Djoerd Hiemstra

Project Meeting Pathfinder

Everyone, please stand on the castle’s axis steps

Digital museum of information retrieval research

Tuesday, February 3rd, 2009, posted by Djoerd Hiemstra

by Djoerd Hiemstra, Tristan Pothoven, Marijn van Vliet, and Donna Harman

As more and more of the world becomes digital, and documents become easily available over the Internet, we are suddenly able to access all kinds of information. The downside of this, however, is that information that is not digital becomes less accessed, and is liable to be lost to us and to future generations. Whereas there are many scanning projects underway, such as Google Books and the Open Library Alliance, these projects are not going to know about, much less find, the specialized scientific literature within various fields. This short paper describes the beginnings of a project to digitize some of the older literature in the information retrieval field. The paper finishes with some thoughts for future work on making more of our IR literature available for searching.

The technology behind StreetTiVo

Monday, January 26th, 2009, posted by Djoerd Hiemstra

StreetTiVo: Using a P2P XML Database System to Manage Multimedia Data in Your Living Room

by Ying Zhang, Arjen de Vries, Peter Boncz, Djoerd Hiemstra, and Roeland Ordelman

StreetTiVo is a project that aims at bringing research results into the living room; in particular, a mix of current results in the areas of Peer-to-Peer XML Database Management System (P2P XDBMS), advanced multimedia analysis techniques, and advanced information retrieval techniques. The project develops a plug-in application for the so-called Home Theatre PCs, such as set-top boxes with MythTV or Windows Media Center Edition installed, that can be considered as programmable digital video recorders. StreetTiVo distributes compute-intensive multimedia analysis tasks over multiple peers (i.e., StreetTiVo users) that have recorded the same TV program, such that a user can search in the content of a recorded TV program shortly after its broadcasting; i.e., it enables near real-time availability of the meta-data (e.g., speech recognition) required for searching the recorded content. StreetTiVo relies on our P2P XDBMS technology, which in turn is based on a DHT overlay network, for distributed collaborator discovery, work coordination and meta-data exchange in a volatile WAN environment. The technologies of video analysis and information retrieval are seamlessly integrated into the system as XQuery functions.

The paper will be presented at the Joint International Conferences on Asia-Pacific Web Conference (APWeb) and Web-Age Information Management (WAIM) on 1-4 April, 2009 in Suzhou, China.

PF/Tijah facts and figures

Monday, January 12th, 2009, posted by Djoerd Hiemstra

The MultimediaN project will finish later this year, and the MultimediaN board asks for the “economic impact” of PF/Tijah.

  • In 2008, the PF/Tijah web site was visited 1,885 times, with 6,284 page views in total.
  • During that period, MonetDB/XQuery was downloaded 75 times via the PF/Tijah site. In total, MonetDB/XQuery, including PF/Tijah, was downloaded over 2,000 times in 2008.

Saving and Accessing the Old IR Literature

Thursday, January 8th, 2009, posted by Djoerd Hiemstra

SIGIR presents the first results of a project to digitize the older literature in the information retrieval field. So far 14 of the old reports, such as the Cranfield reports and the SMART reports, have been scanned, along with Karen Sparck Jones’s Information Retrieval Experiment book. The PDF versions of these are available from the SIGIR Digital Museum of Information Retrieval Research, which provides room for exhibits of historic interest and allows searching of the material using the PF/Tijah XML search system. The complete library is available for download on request. Requests can be directed to the SIGIR Information Director by sending an email to

Efficient XML and Entity Retrieval with PF/Tijah

Wednesday, December 3rd, 2008, posted by Djoerd Hiemstra

by Henning Rode, Djoerd Hiemstra, Arjen de Vries, and Pavel Serdyukov

PF/Tijah is a research prototype created by the University of Twente and CWI Amsterdam with the goal of creating a flexible environment for setting up search systems. PF/Tijah is first of all a system for structured retrieval on XML data. Compared to other open source retrieval systems it comes with a number of unique features:

  • It can execute any NEXI query without limits to a predefined set of tags. Using the same index, it can easily produce a “focused”, “thorough”, or “article” ranking, depending only on the specified query and retrieval options.
  • The applied retrieval model, score propagation and combination operators are set at query time, which makes PF/Tijah an ideal experimental platform.
  • PF/Tijah embeds NEXI queries as functions in the XQuery language. This way the system supports ad hoc result presentation by means of its query language. The INEX efficiency task submission described in the paper demonstrates this feature. The declared function INEXPath for instance computes a string that matches the desired INEX submission format.
  • PF/Tijah supports text search combined with traditional database querying, including for instance joins on values. The entity ranking experiments described in this article intensively exploit this feature.
With this year’s INEX experiments, we try to demonstrate the mentioned features of the system. All experiments were carried out with the least possible pre- and post-processing outside PF/Tijah.
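
To make the difference between the “thorough”, “focused”, and “article” rankings concrete, here is a minimal Python sketch. It is not PF/Tijah code and does not use any PF/Tijah API; the `Hit` type, the path notation, and the scores are hypothetical, and ranking an article by its best-scoring element is a simplifying assumption. It only shows how three result lists could be derived from one set of scored XML elements.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    path: str     # hypothetical element path, e.g. "/article[1]/sec[2]"
    score: float  # hypothetical retrieval score


def overlaps(a: str, b: str) -> bool:
    """True if one path is the other, or an ancestor of the other."""
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def thorough(hits: List[Hit]) -> List[Hit]:
    """All scored elements, best first; overlapping elements are allowed."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


def focused(hits: List[Hit]) -> List[Hit]:
    """Best first, skipping any element that overlaps one already kept."""
    kept: List[Hit] = []
    for hit in thorough(hits):
        if not any(overlaps(hit.path, k.path) for k in kept):
            kept.append(hit)
    return kept


def article(hits: List[Hit]) -> List[Hit]:
    """Whole articles only; each article gets its best element score (assumption)."""
    best = {}
    for hit in hits:
        root = "/" + hit.path.split("/")[1]
        best[root] = max(best.get(root, float("-inf")), hit.score)
    return thorough([Hit(path, score) for path, score in best.items()])


hits = [
    Hit("/article[1]", 0.4),
    Hit("/article[1]/sec[1]", 0.9),
    Hit("/article[1]/sec[1]/p[2]", 0.7),
    Hit("/article[2]", 0.6),
    Hit("/article[2]/sec[3]", 0.5),
]

for name, ranking in (("thorough", thorough), ("focused", focused), ("article", article)):
    print(name, [h.path for h in ranking(hits)])
```

In this toy setting the “thorough” list keeps all five elements, the “focused” list keeps only non-overlapping ones, and the “article” list contains one entry per article.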

New group member: Anandeshwar Singh

Sunday, November 23rd, 2008, posted by Djoerd Hiemstra

Anandeshwar Singh will work on an XQuery Full-text version of PF/Tijah. XQuery Full-text is a W3C Candidate Recommendation that extends XQuery for text search in XML data. Welcome Anandeshwar!