Archive for the 'Multimedia Search' Category

Towards Affordable Disclosure of Spoken Heritage Archives

Friday, December 11th, 2009, posted by Djoerd Hiemstra

by Roeland Ordelman, Willemijn Heeren, Franciska de Jong, Marijn Huijbregts, and Djoerd Hiemstra

This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, we at least want to provide search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is not yet satisfactory and requires additional research.

To be published in the Journal of Digital Information 10(6).

[download pdf]

Concept Detectors: How Good is Good Enough?

Monday, August 10th, 2009, posted by Djoerd Hiemstra

A Monte Carlo Simulation Based Approach

by Robin Aly and Djoerd Hiemstra

Today, semantic-concept-based video retrieval systems often show insufficient performance for real-life applications. A large part of the reason is the limited performance of the detectors for these concepts. While concept detectors continue to improve, the following important questions need to be addressed: “How good do detectors need to be to produce usable search systems?” and “How does detector performance influence different concept combination methods?” We use Monte Carlo simulations to provide answers to these questions. The main contribution of this paper is a probabilistic model of detectors that output confidence scores indicating the likelihood that a concept occurs. This score is also converted into a posterior probability and a binary classification. We investigate the influence of changes to the model’s parameters on the performance of multiple concept combination methods. Current web search engines produce a mean average precision (MAP) of around 0.20. Our simulation reveals that the best performing video search methods achieve this performance using detectors with 0.60 MAP, and are therefore usable in real life. Furthermore, perfect detection allows the best performing combination method to produce 0.39 search MAP in an artificial environment with oracle settings. We also find that MAP is not necessarily a good evaluation measure for concept detectors, since it is not always correlated with search performance.
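To illustrate the kind of simulation described above, the sketch below draws detector confidence scores for shots that do and do not contain a concept from two Gaussians, then measures the average precision of ranking by score. The Gaussian score model and its parameters are simplifying assumptions for this sketch, not the detector model used in the paper:

```python
import random

def simulate_detector(n_pos, n_neg, quality, rng):
    """Draw (score, relevant) pairs for shots that do / do not contain a concept.

    Higher `quality` separates the two score distributions further.
    The Gaussian model here is an illustrative assumption.
    """
    pos = [(rng.gauss(quality, 1.0), True) for _ in range(n_pos)]
    neg = [(rng.gauss(0.0, 1.0), False) for _ in range(n_neg)]
    return pos + neg

def average_precision(scored):
    """Average precision of ranking shots by detector confidence."""
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    hits, total = 0, 0.0
    for i, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

# Repeat the simulation to estimate mean AP for a detector of a given quality.
rng = random.Random(42)
runs = [average_precision(simulate_detector(50, 950, 2.0, rng))
        for _ in range(100)]
mean_ap = sum(runs) / len(runs)
```

Sweeping `quality` and re-running gives the detector-performance-versus-search-performance curves that the paper's questions ask about.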

The paper will be presented at ACM Multimedia in Beijing, China.

[download pdf]

Robin Aly presents at SIGIR Doctoral Consortium

Wednesday, June 10th, 2009, posted by Djoerd Hiemstra

Modeling Uncertainty in Video Retrieval: A Retrieval Model for Uncertain Semantic Representations of Videos

by Robin Aly

The need for content-based multimedia retrieval increases rapidly because of ever faster growing collection sizes. However, retrieval systems often do not perform well enough for real-life applications. A promising approach is to detect semantic primitives at indexing time. Currently investigated primitives are the uttering of words and the occurrence of so-called semantic concepts, such as “Outdoor” and “Person”. We refer to a concrete instantiation of these primitives as the representation of the video document. Most detector programs emit scores reflecting the likelihood of each primitive. However, the detection is far from perfect, and a lot of uncertainty about the real representation remains. Some retrieval algorithms ignore this uncertainty, which clearly hurts precision and recall. Other methods use the scores as anonymous features and learn their relationship to relevance. This has the disadvantage of requiring vast amounts of training data, and has to be redone for every detector change.

The main contribution of our work is a formal retrieval model for treating this uncertainty. We conceptually consider the retrieval problem as two steps: (1) the determination of the posterior probability distribution over all representations given the scores (using existing methods), and (2) the derivation of a ranking status value (RSV) for each representation. We then take the expected RSV, weighted by the representation’s posterior probability, as the effective RSV of this shot for ranking. We claim that our approach has the following advantages: (a) step (2) is more easily achieved than with the machine learning alternative, and (b) it benefits from all detector improvements.
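The two-step scheme above can be sketched in a few lines for binary concept representations. The independence assumption across concepts and the toy RSV function are simplifications made for this sketch, not part of the published model:

```python
from itertools import product

def expected_rsv(concept_posteriors, rsv):
    """Expected ranking status value of a shot.

    `concept_posteriors[i]` is P(concept i present | detector scores);
    independence across concepts is assumed here for simplicity.
    `rsv` maps a concrete binary representation to a score (step 2).
    """
    total = 0.0
    # Enumerate every possible representation (step 1 gives its posterior).
    for representation in product([0, 1], repeat=len(concept_posteriors)):
        p = 1.0
        for present, post in zip(representation, concept_posteriors):
            p *= post if present else (1.0 - post)
        total += p * rsv(representation)
    return total

# Toy RSV: count how many query concepts are present in the representation.
score = expected_rsv([0.9, 0.3], rsv=sum)  # expected count = 0.9 + 0.3
```

Enumerating all representations is exponential in the number of concepts; for a counting-style RSV the expectation collapses by linearity, which is why closed-form rankings are attractive in practice.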

[more information]

Reusing Annotation Labor for Concept Selection

Wednesday, May 6th, 2009, posted by Djoerd Hiemstra

by Robin Aly, Djoerd Hiemstra and Arjen de Vries

Describing shots through the occurrence of semantic concepts is the first step towards modeling the content of a video semantically. An important challenge is to automatically select the right concepts for a given information need. For example, systems should be able to decide whether the concept “Outdoor” should be included in a search for “Street Basketball”. In this paper we provide an innovative method to automatically select concepts for an information need. To achieve this, we estimate the occurrence probability of a concept in relevant shots, which helps us quantify the helpfulness of the concept. Our method reuses existing training data, annotated with concept occurrences, to build a text collection. Searching this collection with a text retrieval system, combined with knowledge of the concept occurrences, allows us to arrive at a good estimate of this probability. We evaluate our method against a concept selection benchmark and against search runs on both the TRECVID 2005 and 2007 collections. These experiments show that the estimation consistently improves retrieval effectiveness.
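The estimation idea can be sketched as follows: run a text query against the annotated training collection, then measure how often the concept occurs among the top-ranked shots. The term-overlap scoring and all data below are illustrative stand-ins for the actual retrieval system and annotations used in the paper:

```python
def estimate_concept_probability(query_terms, training_docs, concept, k=10):
    """Estimate P(concept occurs | shot is relevant to the query).

    `training_docs` is a list of (text, concept_set) pairs built from
    annotated training shots. The simple term-overlap scoring below
    stands in for a real text retrieval system.
    """
    def score(text):
        words = set(text.lower().split())
        return sum(1 for t in query_terms if t in words)

    ranked = sorted(training_docs, key=lambda d: score(d[0]), reverse=True)[:k]
    if not ranked:
        return 0.0
    # Fraction of top-k retrieved shots annotated with the concept.
    return sum(1 for _, concepts in ranked if concept in concepts) / len(ranked)

# Hypothetical toy collection of annotated shots.
docs = [
    ("street basketball game on an outdoor court", {"Outdoor", "Person"}),
    ("indoor chess tournament", {"Indoor", "Person"}),
]
p = estimate_concept_probability(["basketball"], docs, "Outdoor", k=1)
```

A high estimate for “Outdoor” given the query “Street Basketball” is exactly the signal that lets the system include that concept automatically.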

[download pdf]

The technology behind StreetTiVo

Monday, January 26th, 2009, posted by Djoerd Hiemstra

StreetTiVo: Using a P2P XML Database System to Manage Multimedia Data in Your Living Room

by Ying Zhang, Arjen de Vries, Peter Boncz, Djoerd Hiemstra, and Roeland Ordelman

StreetTiVo is a project that aims at bringing research results into the living room; in particular, a mix of current results in the areas of Peer-to-Peer XML Database Management Systems (P2P XDBMS), advanced multimedia analysis techniques, and advanced information retrieval techniques. The project develops a plug-in application for so-called Home Theatre PCs, such as set-top boxes with MythTV or Windows Media Center Edition installed, which can be considered programmable digital video recorders. StreetTiVo distributes compute-intensive multimedia analysis tasks over multiple peers (i.e., StreetTiVo users) that have recorded the same TV program, such that a user can search the content of a recorded TV program shortly after its broadcast; i.e., it enables near real-time availability of the meta-data (e.g., speech recognition output) required for searching the recorded content. StreetTiVo relies on our P2P XDBMS technology, which in turn is based on a DHT overlay network, for distributed collaborator discovery, work coordination and meta-data exchange in a volatile WAN environment. The technologies for video analysis and information retrieval are seamlessly integrated into the system as XQuery functions.

The paper will be presented at the Joint International Conferences on Asia-Pacific Web Conference (APWeb) and Web-Age Information Management (WAIM) on 1-4 April, 2009 in Suzhou, China.

[download pdf]

User study for concept retrieval available

Monday, November 3rd, 2008, posted by Djoerd Hiemstra

In our recent TRECVID experiments we evaluated a concept retrieval approach to video retrieval, i.e., the user searches a collection of video shots by using automatically detected concepts such as face, people, indoor, sky, building, etc. The performance of such systems is still far from sufficient to be usable in practice, but is this because automatic detectors are bad? Because users cannot write concept queries? Because systems cannot rank concept queries? Or, possibly, all of the above?

To help researchers answer this question, we have made the data from a user study involving 24 users available. In the experiment, users had to select from a set of 101 concepts those they expected to be helpful for finding certain information. For instance, suppose one needs to find shots of “one or more palm trees”. Most people, 18 out of 24, chose the concept tree, but others chose outdoor (15), vegetation (9), sky (8), beach (8), or desert (4). The summarized results can now be accessed from Robin Aly’s page.

Download the user study data.

TREC Video Workshop 2008

Friday, October 31st, 2008, posted by Djoerd Hiemstra

by Robin Aly, Djoerd Hiemstra, Arjen de Vries, and Henning Rode

In this report we describe our experiments performed for TRECVID 2008. We participated in the High Level Feature extraction task and the Search task. For the High Level Feature extraction task we mainly set up our detection environment. In the Search task we applied our new PRFUBE ranking model, together with a method that estimates a vital parameter of the model: the probability of a concept occurring in relevant shots. The PRFUBE model has similarities to well-known probabilistic text information retrieval methodology and follows the Probability Ranking Principle.

[download pdf]

Towards Affordable Disclosure of Spoken Word Archives

Thursday, October 30th, 2008, posted by Djoerd Hiemstra

by Roeland Ordelman, Willemijn Heeren, Marijn Huijbregts, Djoerd Hiemstra, and Franciska de Jong

This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, the least we want to be able to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory and requires additional research.

[download pdf]

Guest lecture: Thijs Westerveld of TeezIR

Friday, September 26th, 2008, posted by Djoerd Hiemstra

Opinion Mining and Multimedia Information Retrieval

Who: Thijs Westerveld (TeezIR)
When: Wednesday, October 1, 2008, 13.45-15.30 h., room HO-3136
What: Opinion Mining and Multimedia Information Retrieval

Thijs Westerveld has over 10 years of experience in various areas of information retrieval, mostly in an academic setting. He received a PhD in Computer Science from the Human Media Interaction group at the University of Twente in 2004, on the use of generative probabilistic models for multimedia retrieval. Thijs has worked on numerous national and European projects in the areas of information retrieval and multimedia retrieval, and published in international journals and conferences in the field. Thijs currently works at TeezIR Search Solutions in Utrecht.

A Probabilistic Ranking Framework using Unobservable Binary Events for Video Search

Monday, May 19th, 2008, posted by Djoerd Hiemstra

by Robin Aly, Djoerd Hiemstra, Arjen de Vries, and Franciska de Jong

This paper concerns the problem of searching using the output of concept detectors (also known as high-level features) for video retrieval. Unlike term occurrence in text documents, the occurrence of an audiovisual concept is only indirectly observable. We develop a probabilistic ranking framework for unobservable binary events to search in videos, called PR-FUBE. The framework explicitly models the probability of relevance of a video shot through the presence and absence of concepts. From our framework, we derive a ranking formula and show its relationship to previously proposed formulas. We evaluate our framework against two other retrieval approaches using the TRECVID 2005 and 2007 datasets. Using large numbers of concepts for retrieval gives especially good performance. We attribute the observed robustness against the noise introduced by less related concepts to the effective combination of concept presence and absence in our method. The experiments show that an accurate estimate of the probability of occurrence of a particular concept in relevant shots is crucial to obtain effective retrieval results.
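How presence and absence can both contribute to a shot's score may be sketched as an expected log-likelihood-ratio over the query concepts. This is an illustrative stand-in in the spirit of the framework, not the published PR-FUBE formula, and all probability values below are hypothetical:

```python
import math

def rank_score(presence_probs, p_rel, p_nonrel):
    """Expected log-likelihood-ratio score for one shot.

    presence_probs[i] = P(concept i present in this shot | detector output),
    p_rel[i]          = P(concept i present | relevant shot),
    p_nonrel[i]       = P(concept i present | non-relevant shot).
    Both the presence and the absence of each concept contribute.
    """
    s = 0.0
    for q, r, n in zip(presence_probs, p_rel, p_nonrel):
        s += q * math.log(r / n)                       # concept likely present
        s += (1.0 - q) * math.log((1.0 - r) / (1.0 - n))  # concept likely absent
    return s

# A shot likely showing the concept outranks one likely lacking it,
# when the concept is more frequent in relevant shots (0.8 vs 0.2).
high = rank_score([0.9], p_rel=[0.8], p_nonrel=[0.2])
low = rank_score([0.1], p_rel=[0.8], p_nonrel=[0.2])
```

Note how the estimate of the concept's probability in relevant shots (`p_rel`) drives the score, matching the paper's finding that this parameter is crucial.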

The paper will be presented at the ACM International Conference on Image and Video Retrieval CIVR 2008 in Niagara Falls, Canada.

[download pdf]