Archive for the 'Multimedia Search' Category

Will concept-based video retrieval ever work?

Monday, June 6th, 2011, posted by Djoerd Hiemstra

Simulating the future of concept-based video retrieval under improved detector performance

by Robin Aly, Djoerd Hiemstra, Franciska de Jong and Peter Apers

Multimedia Tools and Applications

In this paper we address the following important questions for concept-based video retrieval: (1) What is the impact of detector performance on the performance of concept-based retrieval engines, and (2) will these engines be applicable to real-life search tasks if detector performance improves in the future? We use Monte Carlo simulations to answer these questions. To generate the simulation input, we propose to use a probabilistic model of two Gaussians for the confidence scores that concept detectors emit. Modifying the model’s parameters affects both detector performance and search performance. We study the relation between these two performances on two video collections. For detectors with similar discriminative power and a concept vocabulary of around 100 concepts, the simulation reveals that in order to achieve a search performance of 0.20 mean average precision (MAP), which is considered sufficient for real-life applications, one needs detectors with at least 0.60 MAP. We also find that, given our simulation model and low detector performance, MAP is not always a good evaluation measure for concept detectors, since it is not strongly correlated with search performance.
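A minimal sketch of why the two-Gaussian model lets one dial detector quality up and down: if scores on shots containing the concept follow N(mu_pos, sigma_pos²) and scores on the other shots follow N(mu_neg, sigma_neg²), the probability that a random positive shot outscores a random negative one (the ROC AUC) has a closed form. Note the assumption here: the paper measures detector performance in MAP, not AUC; AUC is used only because it depends on the means and deviations alone, whereas MAP also depends on the concept prior. The function names are illustrative.

```python
import random
from statistics import NormalDist


def analytic_auc(mu_pos, sigma_pos, mu_neg, sigma_neg):
    """P(positive-shot score > negative-shot score) for two independent Gaussians."""
    spread = (sigma_pos ** 2 + sigma_neg ** 2) ** 0.5
    return NormalDist().cdf((mu_pos - mu_neg) / spread)


def sampled_auc(mu_pos, sigma_pos, mu_neg, sigma_neg, n=20000, seed=1):
    """Monte Carlo estimate of the same probability by drawing score pairs."""
    rng = random.Random(seed)
    wins = sum(
        rng.gauss(mu_pos, sigma_pos) > rng.gauss(mu_neg, sigma_neg)
        for _ in range(n)
    )
    return wins / n


# Moving the positive mean away from the negative mean raises discriminative power.
for mu_pos in (0.0, 1.0, 2.0, 3.0):
    print(mu_pos,
          round(analytic_auc(mu_pos, 1.0, 0.0, 1.0), 3),
          round(sampled_auc(mu_pos, 1.0, 0.0, 1.0), 3))
```

The sampled estimate should track the closed form, which is a quick sanity check on any score generator before it is used for search experiments.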

This article is published with open access at Springer.com

[download pdf]

AXES: Access to Audiovisual Archives

Monday, January 3rd, 2011, posted by Djoerd Hiemstra


AXES is a large-scale integrating project (IP) funded by the European Union’s 7th Framework Programme that starts in January 2011. The goal of AXES is to develop tools that provide various types of users with new engaging ways to interact with audiovisual libraries, helping them discover, browse, navigate, search and enrich archives. In particular, apart from a search-oriented scheme, we will explore how suggestions for audiovisual content exploration can be generated via a myriad of information trails crossing the archive. This will be approached from three perspectives (or axes): users, content, and technology.

Within AXES, innovative indexing techniques will be developed in close cooperation with a number of user communities through tailored use cases and validation stages. Rather than only investing in new technical solutions, the project proposes to co-develop innovative paradigms of use together with novel navigation and search facilities. We will target media professionals, educators, students, amateur researchers, and home users.

Based on an existing Open Source service platform for digital libraries, novel navigation and search functionalities will be offered via interfaces tuned to user profiles and workflow. To this end, AXES will develop tools for content analysis deploying weakly supervised classification methods. Information in scripts, audio tracks, wikis or blogs will be used for the cross-modal detection of people, places, events, etc., and for link generation between audiovisual content. Users will be engaged in the annotation process: with the support of selection and feedback tools, they will enable the gradual improvement of tagging performance. AXES technology will open up audiovisual digital libraries, increasing their cultural value and their exposure to the European public and academia at large.

The consortium is a perfect match for the multidisciplinary nature of the project, with professional content owners, academic and industrial experts in audiovisual analysis, retrieval, and user studies, and partners experienced in system integration and project management. Our partners in AXES are: GEIE ERCIM, Katholieke Universiteit Leuven, University of Oxford, Institut National de Recherche en Informatique et en Automatique (INRIA), Dublin City University, Fraunhofer Gesellschaft, BBC, Netherlands Institute for Sound and Vision, Deutsche Welle, Technicolor, EADS, and Erasmus University Rotterdam.

PhD-position: semantic linking of multimedia content

Friday, November 5th, 2010, posted by Djoerd Hiemstra

The digital library of the future will be a dynamic and highly networked entity, consisting of both the original documents and user-generated annotations and links to and from external resources. Among other things, the Human Media Interaction (HMI) group of the University of Twente investigates the possibilities of multimedia content analysis and information linking for navigating and exploring digital libraries with content in a variety of formats, including text, audio, images and video. There is funding available for a PhD position starting from January 2011.

The PhD research will be carried out in the context of AXES, a multidisciplinary research project funded by the EU (FP7, Digital Libraries). The research will focus on deploying diverse, automatically generated, time-labeled annotations (for example, those produced by automatic speech recognition) for connecting heterogeneous data sources, and will be strongly evaluation-driven.

More information (deadline: 21 November)

Robin Aly defends PhD thesis on uncertainty in concept-based multimedia retrieval

Thursday, July 29th, 2010, posted by Djoerd Hiemstra

by Robin Aly

This thesis considers concept-based multimedia retrieval, where documents are represented by the occurrence of concepts (also referred to as semantic concepts or high-level features). A concept can be thought of as a kind of label, which is attached to (parts of) the multimedia documents in which it occurs. Since concept-based document representations are user-, language- and modality-independent, using them for retrieval has great potential for improving search performance. As collections grow quickly in size, manually labeling concept occurrences becomes infeasible, and so-called concept detectors are used to decide automatically whether concepts occur in the documents.

The following fundamental problems in concept-based retrieval are identified and addressed in this thesis. First, concept detectors frequently make mistakes while detecting concepts. Second, it is difficult for users to formulate their queries, since they are unfamiliar with the concept vocabulary, and setting weights for each concept requires knowledge of the collection. Third, for supporting retrieval of longer video segments, single concept occurrences are not sufficient to differentiate relevant from non-relevant documents, and some notion of the importance of a concept in a segment is needed. Finally, since current detection techniques lack performance, it is important to be able to predict the search performance that retrieval engines will achieve if detection performance improves.

The main contribution of this thesis is the uncertain document representation ranking framework (URR). Based on the Nobel Prize-winning Portfolio Selection Theory, the URR framework considers the distribution over all possible concept-based representations of a document given the observed confidence scores of the concept detectors. For a given score function, documents are ranked by the expected score plus an additional term based on the variance of the score, which represents the risk attitude of the system.
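The mean-variance idea can be sketched by brute force for a handful of concepts: enumerate every binary representation of a document, weight each by its probability, and rank by expected score plus b times the variance. This is a sketch under stated assumptions, not the thesis's implementation: concepts are treated as independent, the posteriors P(concept occurs | confidence score) are taken as given, and the names `representation_moments`, `urr_rank`, `linear_score` and the risk parameter `b` are hypothetical.

```python
from itertools import product


def representation_moments(posteriors, score_fn):
    """Mean and variance of score_fn over all binary concept representations.

    posteriors[i] is the assumed probability that concept i occurs in the
    document given its detector confidence score; concepts are treated as
    independent (an assumption made for this sketch).
    """
    mean, second = 0.0, 0.0
    for rep in product((0, 1), repeat=len(posteriors)):
        prob = 1.0
        for occurs, p in zip(rep, posteriors):
            prob *= p if occurs else 1.0 - p
        s = score_fn(rep)
        mean += prob * s
        second += prob * s * s
    return mean, second - mean * mean


def urr_rank(docs, score_fn, b=-1.0):
    """Rank documents by E[score] + b * Var[score]; b < 0 is risk-averse."""
    values = {}
    for doc_id, posteriors in docs.items():
        mean, var = representation_moments(posteriors, score_fn)
        values[doc_id] = mean + b * var
    return sorted(values, key=values.get, reverse=True)


WEIGHTS = (1.0, 2.0)  # hypothetical query weights for two concepts


def linear_score(rep):
    """A simple weighted-sum score function over a binary representation."""
    return sum(w * c for w, c in zip(WEIGHTS, rep))


docs = {"shot_A": (1.0, 0.5), "shot_B": (0.0, 0.95)}
print(urr_rank(docs, linear_score, b=0.0))   # ['shot_A', 'shot_B']
print(urr_rank(docs, linear_score, b=-1.0))  # ['shot_B', 'shot_A']
```

In this toy case, shot_A has the higher expected score but is uncertain about its high-weight concept, so a risk-neutral ranking puts it first while a risk-averse one prefers shot_B. Enumeration is exponential in the number of concepts, so it only serves to illustrate the definition.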

User-friendly concept selection is achieved by re-using an annotated development collection. Each video shot of the development collection is transformed into a textual description, which yields a collection of textual descriptions. This collection is then searched with a textual query, which does not require the user to know the concept vocabulary. The ranking of the textual descriptions, together with the known concept occurrences in the development collection, allows the selection of useful concepts and their weights.
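A toy sketch of this selection pipeline, with every concrete choice being an assumption: the concept-to-text mapping `CONCEPT_TEXT` is hypothetical, simple term overlap stands in for a real text retrieval model, and a concept's weight is taken as how much more often it occurs in the top-k shots than in the whole development collection. The thesis's actual description generation and weighting may differ.

```python
from collections import Counter

# Hypothetical concept-to-text mapping used to describe annotated shots.
CONCEPT_TEXT = {
    "car": "car automobile vehicle traffic",
    "road": "road street highway",
    "person": "person people man woman",
    "building": "building house city",
}


def describe(shot_concepts):
    """Textual description (bag of words) of a shot built from its known concepts."""
    return Counter(word for c in shot_concepts for word in CONCEPT_TEXT[c].split())


def select_concepts(query, dev_shots, k=2):
    """Return {concept: weight} for concepts over-represented in the top-k shots."""
    terms = set(query.lower().split())
    # Term overlap stands in for a real text retrieval model.
    ranked = sorted(dev_shots,
                    key=lambda shot: sum(describe(shot)[t] for t in terms),
                    reverse=True)
    top = ranked[:k]
    weights = {}
    for concept in CONCEPT_TEXT:
        in_top = sum(concept in shot for shot in top) / len(top)
        in_all = sum(concept in shot for shot in dev_shots) / len(dev_shots)
        if in_top > in_all:
            weights[concept] = in_top - in_all
    return weights


dev_shots = [{"car", "road"}, {"car", "road", "building"},
             {"person", "building"}, {"person"}, {"building"}]
print(select_concepts("vehicle on a street", dev_shots))
```

The query never mentions the concept names "car" or "road"; they are selected because the descriptions of the shots annotated with them match the query words.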

The URR framework and the proposed concept selection method are used to derive a shot and a video segment retrieval framework. For shot retrieval, the probabilistic ranking framework for unobservable events is proposed. The framework re-uses the well-known probability of relevance score function from text retrieval. Because of the representation uncertainty, documents are ranked by their expected retrieval score given the confidence scores from the concept detectors.
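For a score function that is a weighted sum of binary concept occurrences, the expected score has a simple form: each weight is multiplied by the probability that the concept occurs given the detector output. The sketch below assumes classic binary-independence (log-odds) weights as the probability of relevance score function; the thesis's exact formulation may differ, and the per-concept probabilities `p_rel`/`p_nonrel` and the detector posteriors are made-up numbers.

```python
import math


def relevance_weight(p_rel, p_nonrel):
    """Binary-independence-style log-odds weight of one concept.

    p_rel:    assumed P(concept occurs | relevant shot)
    p_nonrel: assumed P(concept occurs | non-relevant shot)
    """
    return math.log(p_rel * (1 - p_nonrel) / (p_nonrel * (1 - p_rel)))


def expected_score(posteriors, weights):
    """E[sum_i c_i * w_i] given P(c_i = 1 | confidence score) = posteriors[i].

    By linearity of expectation this step needs no independence assumption.
    """
    return sum(p * w for p, w in zip(posteriors, weights))


# Hypothetical two-concept query and detector posteriors for three shots.
weights = [relevance_weight(0.8, 0.2), relevance_weight(0.6, 0.3)]
shots = {"shot_1": [0.9, 0.2], "shot_2": [0.5, 0.9], "shot_3": [0.1, 0.1]}
ranking = sorted(shots, key=lambda s: expected_score(shots[s], weights), reverse=True)
print(ranking)
```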

For video segment retrieval, the uncertain concept language model is proposed for retrieving news items, a particular type of video segment. A news item is modeled as a series of shots and represented by the frequency of each selected concept. Using the parallel between concept frequencies and term frequencies, a concept language model score function is derived from the language modelling framework. The concept language model score function is then used according to the URR framework, and documents are ranked by the expected concept language model score plus an additional term based on the score’s variance.
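A sketch of how the expected concept language model score and its variance could be computed exactly for one news item. Assumptions (not taken from the thesis): concept occurrences in different shots are independent given the detector posteriors, so a concept's frequency follows a Poisson-binomial distribution; the language model uses Jelinek-Mercer smoothing with parameter `lam`; query concepts are independent of each other; and the names, weights and collection probabilities are hypothetical.

```python
import math


def poisson_binomial(probs):
    """Distribution of a concept's frequency in a news item, i.e. the number of
    shots in which it occurs, given independent per-shot probabilities."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)    # concept absent from this shot
            new[k + 1] += mass * p      # concept present in this shot
        dist = new
    return dist


def uncertain_clm(item, query, p_coll, lam=0.1, b=-0.5):
    """Expected concept language model score, its variance, and the
    risk-adjusted ranking value E + b * Var for one news item.

    item:   {concept: [P(concept occurs in shot j) for each shot j]}
    query:  {concept: query weight}
    p_coll: {concept: collection-level concept probability (must be > 0)}
    """
    mean, var = 0.0, 0.0
    for concept, weight in query.items():
        probs = item[concept]
        n_shots = len(probs)
        dist = poisson_binomial(probs)
        # Smoothed log-probability for every possible concept frequency tf.
        values = [weight * math.log((1 - lam) * tf / n_shots + lam * p_coll[concept])
                  for tf in range(n_shots + 1)]
        m = sum(p * v for p, v in zip(dist, values))
        mean += m
        # Summing per-concept variances assumes the concepts are independent.
        var += sum(p * (v - m) ** 2 for p, v in zip(dist, values))
    return mean, var, mean + b * var


p_coll = {"anchor": 0.05, "crowd": 0.1}
query = {"anchor": 1.0, "crowd": 0.5}
certain = {"anchor": [1.0, 1.0, 0.0], "crowd": [0.0, 1.0, 1.0]}
uncertain = {"anchor": [0.6, 0.7, 0.5], "crowd": [0.4, 0.6, 0.5]}
for name, item in (("certain", certain), ("uncertain", uncertain)):
    print(name, uncertain_clm(item, query, p_coll))
```

When all posteriors are 0 or 1 the variance vanishes and the score reduces to an ordinary concept language model score; uncertain detector output adds variance, which a negative `b` penalizes.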

Monte Carlo simulation is used to predict the behavior of current retrieval models under improved concept detector performance. First, a probabilistic model of concept detector output is defined as two Gaussian distributions, one for the shots in which the concept occurs and one for the shots in which it does not. Randomly generating concept detector scores for a collection with known concept occurrences and executing a search on the generated output estimates the expected search performance given the model’s parameters. By modifying the model parameters, the simulated detector performance can be increased and the future search performance can be predicted.
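The simulation loop described above can be sketched as follows: repeatedly draw detector scores for a collection whose concept occurrences are known, run a search on the simulated scores, and average the resulting average precision. Everything concrete here is an assumption for illustration: the ground truth is randomly generated rather than taken from a real collection, a shot counts as relevant when both query concepts occur, unit-variance Gaussians are used with a negative mean of 0, and query concepts are combined by simply summing their scores (a stand-in for the combination methods studied in the thesis).

```python
import random
import statistics


def average_precision(ranked_relevance):
    """Non-interpolated average precision of a ranked list of relevance flags."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    n_rel = sum(ranked_relevance)
    return precision_sum / n_rel if n_rel else 0.0


def simulated_search_ap(occurrences, relevant, query, mu_pos, runs=20, seed=0):
    """Mean and standard deviation of search AP over simulated detector runs.

    occurrences: per shot, a dict concept -> bool (known ground truth)
    relevant:    per shot, bool relevance judgement for the query
    query:       concepts whose simulated scores are summed for ranking
    """
    rng = random.Random(seed)
    aps = []
    for _ in range(runs):
        # Draw detector scores: N(mu_pos, 1) if the concept occurs, else N(0, 1).
        scores = [sum(rng.gauss(mu_pos if shot[c] else 0.0, 1.0) for c in query)
                  for shot in occurrences]
        order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        aps.append(average_precision([relevant[i] for i in order]))
    return statistics.mean(aps), statistics.stdev(aps)


# Hypothetical ground truth: 1000 shots, two concepts with prior 0.2 each;
# a shot is relevant when both concepts occur.
gt_rng = random.Random(42)
occurrences = [{"boat": gt_rng.random() < 0.2, "water": gt_rng.random() < 0.2}
               for _ in range(1000)]
relevant = [shot["boat"] and shot["water"] for shot in occurrences]
for mu_pos in (0.5, 1.5, 3.0):
    print(mu_pos, simulated_search_ap(occurrences, relevant, ["boat", "water"], mu_pos))
```

The number of runs controls the Monte Carlo error; the reported standard deviation gives a rough idea of how much the estimate varies between simulated detector outputs.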

Experiments on several collections of the TRECVid evaluation benchmark showed that the URR framework often significantly improves search performance compared to several state-of-the-art baselines. The simulation of concept detectors shows that today’s video shot retrieval models will achieve acceptable performance once detector performance reaches around 0.60 mean average precision. The simulation of video segment retrieval suggests that this task is easier and will become applicable to real-life applications sooner.

[download pdf]

Guest lecture by Alexander Hauptmann at SSR-4

Thursday, June 17th, 2010, posted by Djoerd Hiemstra

The 4th SIKS/Twente Seminar on Searching and Ranking will take place on 2 July at the University of Twente. The goal of the one-day seminar is to bring together researchers from companies and academia working on the effectiveness of search engines. Invited speakers are:

  • Alexander Hauptmann (Carnegie Mellon University, USA)
  • Arjen de Vries (CWI and University of Delft, Netherlands)
  • Wessel Kraaij (TNO and Radboud University Nijmegen, Netherlands)

The seminar will take place on the campus of the University of Twente, in the Citadel (building 9), lecture hall T300. SSR is sponsored by SIKS and CTIT.

More information at SSR-4.

Beyond Shot Retrieval

Monday, March 8th, 2010, posted by Djoerd Hiemstra

Searching for Broadcast News Items Using Language Models of Concepts

by Robin Aly, Aiden Doherty, Djoerd Hiemstra, and Alan Smeaton

Current video search systems commonly return video shots as results. We believe that users may better relate to longer, semantic video units and propose a retrieval framework for news story items, which consist of multiple shots. The framework is divided into two parts: (1) a concept-based language model, which ranks news items with known occurrences of semantic concepts by the probability that an important concept is produced from the concept distribution of the news item, and (2) a probabilistic model of the uncertain presence, or risk, of these concepts. In this paper we use a method to evaluate the performance of story retrieval based on the TRECVID shot-based retrieval ground truth. Our experiments on the TRECVID 2005 collection show a significant performance improvement over four standard methods.

The paper will be presented at the 32nd European Conference on Information Retrieval (ECIR) in Milton Keynes, UK, and in the DB colloquium of 24 March.

[download pdf]

Erwin de Moel graduates on managing recorded lectures for Collegerama

Thursday, March 4th, 2010, posted by Djoerd Hiemstra

Expanding the usability of recorded lectures: A new age in teaching and classroom instruction

by Erwin de Moel

The status of recorded lectures at Delft University of Technology has been studied in order to expand their usability in the university’s present and future educational environment. Possibilities for the production of single-file vodcasts have been tested. These videos make the recorded lectures more accessible through other distribution platforms. Furthermore, the production of subtitles has been studied. This was done with an ASR system called SHoUT, developed at the University of Twente, and with machine translation of subtitles into other languages. SHoUT-generated transcripts always require post-processing for subtitling. Machine translation could produce translated subtitles of sufficient quality. Navigation of recorded lectures needs to be improved, which requires input from the lecturer. Collected metadata from lecture chapter titles, slide data (titles, content and notes) as well as ASR results have been used to create a lecture search engine, which also produces interactive tables of contents and tag clouds for each lecture. Recorded lectures could further be enhanced with time-based discussion boards for asking and answering questions. Further improvements have been proposed to allow recorded lectures to be re-used in recurring online courses.


DetectSim software released

Monday, January 4th, 2010, posted by Djoerd Hiemstra

DetectSim contains software for simulating concept detectors for video retrieval. Researchers can use the software to test their concept-based video retrieval approaches without the need to build real detectors.

Concept-based video retrieval is a promising search paradigm because it is fully automated and it investigates the fine-grained content of a video, which is normally not captured by human annotations. Concepts are detected by so-called concept detectors. However, since these detectors do not yet perform sufficiently well, the evaluation of retrieval systems built on top of the detector output is difficult. In this report we describe a software package that generates simulated detector output for a specified performance level. This output can then be used to execute a search run and ultimately to evaluate the performance of the proposed retrieval method, which is normally done through comparison to a baseline. The probabilistic model of a detector consists of two Gaussians, one for the positive and one for the negative class. Thus, the parameters for the simulation are the two means and standard deviations plus the prior probability of the concept in the dataset.
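The five simulation parameters named above can be bundled into a small generator. This is a hypothetical sketch, not DetectSim's actual interface or output format: the class name `DetectorModel`, its `simulate` method and the (occurs, score) output are illustrative, and concept occurrences are drawn from the prior here, whereas a simulation on a real annotated collection would use its known occurrences instead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorModel:
    """Parameters of a simulated detector for one concept (names are illustrative)."""
    mu_pos: float     # mean confidence score on shots containing the concept
    sigma_pos: float  # standard deviation on those shots
    mu_neg: float     # mean confidence score on shots without the concept
    sigma_neg: float  # standard deviation on those shots
    prior: float      # probability that the concept occurs in a shot

    def simulate(self, n_shots, seed=None):
        """Return (occurs, score) pairs for n_shots simulated shots."""
        rng = random.Random(seed)
        shots = []
        for _ in range(n_shots):
            occurs = rng.random() < self.prior
            mu, sigma = ((self.mu_pos, self.sigma_pos) if occurs
                         else (self.mu_neg, self.sigma_neg))
            shots.append((occurs, rng.gauss(mu, sigma)))
        return shots


model = DetectorModel(mu_pos=2.0, sigma_pos=1.0, mu_neg=0.0, sigma_neg=1.5, prior=0.05)
output = model.simulate(10000, seed=7)
print(output[:3])
```

Passing a fixed `seed` makes a simulated run reproducible, which helps when the same simulated output is fed to both a proposed method and its baseline.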

Download Now!

Download Technical Report.

Towards Affordable Disclosure of Spoken Heritage Archives

Friday, December 11th, 2009, posted by Djoerd Hiemstra

by Roeland Ordelman, Willemijn Heeren, Franciska de Jong, Marijn Huijbregts, and Djoerd Hiemstra

This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded interviews with Dutch survivors of World War II concentration camp Buchenwald. Given such collections, we at least want to provide search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is not yet satisfactory and requires additional research.

To be published in the Journal of Digital Information 10(6).

[download pdf]

Concept Detectors: How Good is Good Enough?

Monday, August 10th, 2009, posted by Djoerd Hiemstra

A Monte Carlo Simulation Based Approach

by Robin Aly and Djoerd Hiemstra

Today, semantic concept-based video retrieval systems often show insufficient performance for real-life applications. Clearly, a big share of the reason is the lacking performance of the detectors of these concepts. While concept detectors continue to improve, the following important questions need to be addressed: “How good do detectors need to be to produce usable search systems?” and “How does the detector performance influence different concept combination methods?”. We use Monte Carlo simulations to provide answers to these questions. The main contribution of this paper is a probabilistic model of detectors which output confidence scores to indicate the likelihood of a concept occurring. This score is also converted into a posterior probability and a binary classification. We investigate the influence of changes to the model’s parameters on the performance of multiple concept combination methods. Current web search engines produce a mean average precision (MAP) of around 0.20. Our simulation reveals that the best performing video search methods achieve this performance using detectors with 0.60 MAP and are therefore usable in real life. Furthermore, perfect detection allows the best performing combination method to produce 0.39 search MAP in an artificial environment with oracle settings. We also find that MAP is not necessarily a good evaluation measure for concept detectors, since it is not always correlated with search performance.
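Under the two-Gaussian model, converting a confidence score into a posterior probability is a direct application of Bayes' rule with the concept prior, and a binary classification follows by thresholding that posterior. The sketch assumes a threshold of 0.5, which the abstract does not specify; the function names and parameter values are illustrative.

```python
from statistics import NormalDist


def posterior(score, mu_pos, sigma_pos, mu_neg, sigma_neg, prior):
    """P(concept occurs | confidence score) via Bayes' rule on the two Gaussians."""
    pos = prior * NormalDist(mu_pos, sigma_pos).pdf(score)
    neg = (1 - prior) * NormalDist(mu_neg, sigma_neg).pdf(score)
    return pos / (pos + neg)


def classify(score, *params, threshold=0.5):
    """Binary decision: the concept is judged present if its posterior exceeds
    the (assumed) threshold."""
    return posterior(score, *params) > threshold


params = (1.0, 1.0, -1.0, 1.0, 0.1)  # mu_pos, sigma_pos, mu_neg, sigma_neg, prior
for score in (-1.0, 0.0, 1.0, 2.0, 3.0):
    print(score, round(posterior(score, *params), 3), classify(score, *params))
```

With a low prior such as 0.1, a score sitting exactly at the positive mean is still classified as negative, which illustrates why the posterior, and not the raw score, is the natural quantity to threshold.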

The paper will be presented at ACM Multimedia in Beijing, China.

[download pdf]