New release of PF/Tijah
Thursday, March 4th, 2010, posted by Djoerd Hiemstra
The newest stable release of MonetDB/PF/Tijah is now online: version 0.13.0, as part of Pathfinder 0.36.1.
More info on the PF/Tijah site.
by Roeland Ordelman, Willemijn Heeren, Franciska de Jong, Marijn Huijbregts, and Djoerd Hiemstra
This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, we at least want to provide search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition – supporting, e.g., within-document search – are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is not yet satisfactory, and requires additional research.
To be published in the Journal of Digital Information 10(6).
by Djoerd Hiemstra, Tristan Pothoven, Marijn van Vliet, and Donna Harman
As more and more of the world becomes digital, and documents become easily available over the Internet, we are suddenly able to access all kinds of information. The downside of this, however, is that information that is not digital is accessed less, and is liable to be lost to us and to future generations. Whereas there are many scanning projects underway, such as Google Books and the Open Library Alliance, these projects are not going to know about, much less find, the specialized scientific literature within various fields. This short paper describes the beginnings of a project to digitize some of the older literature in the information retrieval field. The paper finishes with some thoughts for future work on making more of our IR literature available for searching.
StreetTiVo: Using a P2P XML Database System to Manage Multimedia Data in Your Living Room
by Ying Zhang, Arjen de Vries, Peter Boncz, Djoerd Hiemstra, and Roeland Ordelman
StreetTiVo is a project that aims at bringing research results into the living room; in particular, a mix of current results in the areas of Peer-to-Peer XML Database Management System (P2P XDBMS), advanced multimedia analysis techniques, and advanced information retrieval techniques. The project develops a plug-in application for the so-called Home Theatre PCs, such as set-top boxes with MythTV or Windows Media Center Edition installed, that can be considered as programmable digital video recorders. StreetTiVo distributes compute-intensive multimedia analysis tasks over multiple peers (i.e., StreetTiVo users) that have recorded the same TV program, such that a user can search in the content of a recorded TV program shortly after its broadcasting; i.e., it enables near real-time availability of the meta-data (e.g., speech recognition) required for searching the recorded content. StreetTiVo relies on our P2P XDBMS technology, which in turn is based on a DHT overlay network, for distributed collaborator discovery, work coordination and meta-data exchange in a volatile WAN environment. The technologies of video analysis and information retrieval are seamlessly integrated into the system as XQuery functions.
The paper will be presented at the Joint International Conferences on Asia-Pacific Web Conference (APWeb) and Web-Age Information Management (WAIM) on 1-4 April 2009 in Suzhou, China.
The MultimediaN project will finish later this year, and the MultimediaN board asks for the “economic impact” of PF/Tijah.
Go to: PF/Tijah.
SIGIR presents the first results of a project to digitize the older literature in the information retrieval field. So far 14 of the old reports, such as the Cranfield reports and the SMART reports, have been scanned, along with Karen Sparck Jones’s Information Retrieval Experiment book. The PDF versions of these are available from the SIGIR Digital Museum of Information Retrieval Research, which provides room for exhibits of historic interest and allows searching of the material using the PF/Tijah XML search system. The complete library is available for download on request. Requests can be directed to the SIGIR Information Director by sending an email to infodir_sigir@acm.org.
by Henning Rode, Djoerd Hiemstra, Arjen de Vries, and Pavel Serdyukov
PF/Tijah is a research prototype created by the University of Twente and CWI Amsterdam with the goal of creating a flexible environment for setting up search systems. PF/Tijah is first of all a system for structured retrieval on XML data. Compared to other open source retrieval systems, it comes with a number of unique features:
by Roeland Ordelman, Willemijn Heeren, Marijn Huijbregts, Djoerd Hiemstra, and Franciska de Jong
This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, the least we want to be able to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition – supporting, e.g., within-document search – are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory, and requires additional research.