Archive for the 'Multimedia Search' Category

Johannes Wassenaar graduates on automatic video hyperlinking

Monday, April 16th, 2018, posted by Djoerd Hiemstra

by Johannes Wassenaar

Linking segments of video using text-based methods and a flexible form of segmentation

To let users explore and use large archives, video hyperlinking aims to help the user link segments of video to other video segments, similar to the way hyperlinks are used on the web, instead of using a regular search tool. Indexing, querying and re-ranking multimodal data, in this case videos, are common subjects in the video hyperlinking community. A video hyperlinking system contains an index of multimodal (video) data, and the currently watched segment is translated into a query in the query generation phase. Finally, the system responds to the user with a ranked list of targets that are about the anchor segment. In this study, the payload of terms in the form of position and offset in Elasticsearch is used to obtain time-based information along the speech transcripts to link users directly to spoken text. The queries are generated by a statistics-based method using TF-IDF, a grammar-based part-of-speech tagger, or a combination of both. Finally, results are ranked by weighting specific components and by cosine similarity. The system is evaluated with the Precision at 5 and MAiSP measures, which are used in the TRECVid benchmark on this topic. The results show that TF-IDF and cosine similarity work best for the proposed system.
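
The query generation and ranking steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`tokenize`, `tf_idf`, `cosine`, `rank_targets`), the smoothed IDF formula, and the cut-off `k` are assumptions made for illustration, and the sketch works on plain transcript strings rather than on an Elasticsearch index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(tokens, doc_freq, n_docs):
    """Weight each term by its frequency times a smoothed inverse document frequency."""
    counts = Counter(tokens)
    return {
        term: tf * math.log((1 + n_docs) / (1 + doc_freq.get(term, 0)))
        for term, tf in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_targets(anchor_text, segments, k=5):
    """Build a query from the k highest-weighted anchor terms and rank segments."""
    tokenized = {sid: tokenize(text) for sid, text in segments.items()}
    doc_freq = Counter(t for toks in tokenized.values() for t in set(toks))
    n_docs = len(segments)
    anchor_vec = tf_idf(tokenize(anchor_text), doc_freq, n_docs)
    # Query generation: keep only the k terms with the highest TF-IDF weight.
    top_terms = sorted(anchor_vec, key=lambda t: (-anchor_vec[t], t))[:k]
    query = {t: anchor_vec[t] for t in top_terms}
    scored = [
        (sid, cosine(query, tf_idf(toks, doc_freq, n_docs)))
        for sid, toks in tokenized.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up transcripts, for illustration only.
segments = {
    "s1": "the orchestra plays a symphony in the concert hall",
    "s2": "the football match ended in a draw",
    "s3": "the weather forecast predicts rain",
}
ranking = rank_targets("a symphony concert by the orchestra", segments)
```

Here `ranking` lists the segment ids with their similarity to the anchor, most similar first.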

[download pdf]

Emiel Mols graduates on sharding Spotify search

Wednesday, August 15th, 2012, posted by Djoerd Hiemstra

Today, Emiel Mols graduated after presenting the master thesis project he did at Spotify in Stockholm, Sweden. Emiel got quite some attention last year when he launched SpotifyOnTheWeb, leaving Spotify “no choice but to hire him”.

In the master thesis, Emiel describes a prototype implementation of a term-sharded full-text search architecture. The system’s requirements are based on the use case of searching for music in the Spotify catalogue. He benchmarked the system using non-synthetic data gathered from Spotify’s infrastructure.
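
As a rough illustration of what term sharding means, the sketch below partitions an inverted index by term, so that each shard holds the complete posting list for the terms assigned to it. This is a toy, in-memory example and not the architecture from the thesis: the class `TermShardedIndex`, the CRC32-based shard assignment, and the sample titles are all assumptions.

```python
import zlib
from collections import defaultdict


class TermShardedIndex:
    """Toy inverted index in which each term lives on exactly one shard."""

    def __init__(self, n_shards):
        # Each shard maps a term to the set of document ids that contain it.
        self.shards = [defaultdict(set) for _ in range(n_shards)]

    def _shard_for(self, term):
        # CRC32 is stable across runs, unlike Python's salted built-in hash().
        return zlib.crc32(term.encode("utf-8")) % len(self.shards)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.shards[self._shard_for(term)][term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        postings = []
        for term in query.lower().split():
            shard = self.shards[self._shard_for(term)]
            postings.append(shard.get(term, set()))
        if not postings:
            return set()
        return set.intersection(*postings)


# Made-up titles, for illustration only.
index = TermShardedIndex(n_shards=4)
index.add(1, "midnight train")
index.add(2, "train to nowhere")
index.add(3, "midnight sun")
```

With this layout, a query only contacts the shards that own its terms, but a multi-term query has to combine posting lists that may come from different shards.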

The thesis will be available from ePrints.

A framework for concept-based video retrieval

Friday, August 10th, 2012, posted by Djoerd Hiemstra

The Uncertain Representation Ranking Framework for Concept-Based Video Retrieval

by Robin Aly, Aiden Doherty (DCU, Ireland), Djoerd Hiemstra, Franciska de Jong, and Alan Smeaton (DCU, Ireland)

Concept based video retrieval often relies on imperfect and uncertain concept detectors. We propose a general ranking framework to define effective and robust ranking functions, through explicitly addressing detector uncertainty. It can cope with multiple concept-based representations per video segment and it allows the re-use of effective text retrieval functions which are defined on similar representations. The final ranking status value is a weighted combination of two components: the expected score of the possible scores, which represents the risk-neutral choice, and the scores’ standard deviation, which represents the risk or opportunity that the score for the actual representation is higher. The framework consistently improves the search performance in the shot retrieval task and the segment retrieval task over several baselines in five TRECVid collections and two collections which use simulated detectors of varying performance.
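
The combination of expected score and standard deviation can be sketched as follows. The sketch makes assumptions that are not in the abstract: the score function is a weighted sum of binary concept occurrences, concepts occur independently with the probability given by the detector, and the names `urr_score`, `rank_segments` and `risk` are hypothetical.

```python
import math


def urr_score(weights, probs, risk=0.0):
    """Expected score plus `risk` times the standard deviation of the score.

    weights: concept -> weight in the (assumed) linear score function
    probs:   concept -> detector probability that the concept occurs
    """
    expected = sum(w * probs[c] for c, w in weights.items())
    # Variance of a weighted sum of independent Bernoulli variables.
    variance = sum(w * w * probs[c] * (1.0 - probs[c]) for c, w in weights.items())
    return expected + risk * math.sqrt(variance)


def rank_segments(weights, segments, risk=0.0):
    """Sort segment ids by descending ranking status value."""
    return sorted(
        segments,
        key=lambda sid: urr_score(weights, segments[sid], risk),
        reverse=True,
    )


# Made-up detector outputs, for illustration only.
weights = {"car": 1.0, "road": 0.5}
segments = {
    "uncertain": {"car": 0.5, "road": 0.5},
    "certain": {"car": 1.0, "road": 0.0},
}
neutral = rank_segments(weights, segments, risk=0.0)
seeking = rank_segments(weights, segments, risk=1.0)
```

With `risk=0.0` the ranking follows the expected score alone; a positive value rewards segments whose actual score could turn out higher, and a negative value penalizes them.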

[more information]

AXES at TRECVid 2011

Friday, December 23rd, 2011, posted by Djoerd Hiemstra

by Kevin McGuinness, Robin Aly, et al.

The AXES project participated in the interactive known-item search task (KIS) and the interactive instance search task (INS) for TRECVid 2011. We used the same system architecture and a nearly identical user interface for both the KIS and INS tasks. Both systems made use of text search on ASR, visual concept detectors, and visual similarity search. The user experiments were carried out with media professionals and media students at the Netherlands Institute for Sound and Vision, with media professionals performing the KIS task and media students participating in the INS task. This paper describes the results and findings of our experiments.

[download pdf]

AXES: Access to Audiovisual Archives

Monday, January 3rd, 2011, posted by Djoerd Hiemstra


AXES is a large-scale integrating project (IP) funded by the European Union’s 7th Framework Programme that starts in January 2011. The goal of AXES is to develop tools that provide various types of users with new engaging ways to interact with audiovisual libraries, helping them discover, browse, navigate, search and enrich archives. In particular, apart from a search-oriented scheme, we will explore how suggestions for audiovisual content exploration can be generated via a myriad of information trails crossing the archive. This will be approached from three perspectives (or axes): users, content, and technology.

Within AXES, innovative indexing techniques are developed in close cooperation with a number of user communities through tailored use cases and validation stages. Rather than just starting new investments in technical solutions, we propose the co-development of innovative paradigms of use and novel navigation and search facilities. We will target media professionals, educators, students, amateur researchers, and home users.

Based on an existing Open Source service platform for digital libraries, novel navigation and search functionalities will be offered via interfaces tuned to user profiles and workflow. To this end, AXES will develop tools for content analysis deploying weakly supervised classification methods. Information in scripts, audio tracks, wikis or blogs will be used for the cross-modal detection of people, places, events, etc., and for link generation between audiovisual content. Users will be engaged in the annotation process: with the support of selection and feedback tools, they will enable the gradual improvement of tagging performance. AXES technology will open up audiovisual digital libraries, increasing their cultural value and their exposure to the European public and academia at large.

The consortium is a perfect match to the multi-disciplinary nature of the project, with professional content owners, academic and industrial experts in audiovisual analysis, retrieval, and user studies, and partners experienced in system integration and project management. Our partners in AXES are: GEIE ERCIM, Katholieke Universiteit Leuven, University of Oxford, Institut National de Recherche en Informatique et en Automatique (INRIA), Dublin City University, Fraunhofer Gesellschaft, BBC, Netherlands Institute for Sound and Vision, Deutsche Welle, Technicolor, EADS, and Erasmus University Rotterdam.

PhD-position: semantic linking of multimedia content

Friday, November 5th, 2010, posted by Djoerd Hiemstra

The digital library of the future will be a dynamic and highly networked entity, consisting of both the original documents and user-generated annotations and links to and from external resources. Among other things, the Human Media Interaction (HMI) group of the University of Twente investigates the possibilities for multimedia content analysis and information linking to support and provide facilities for navigating and exploring digital libraries with content in a variety of formats including text, audio, images and video. There is funding available for a PhD position starting from January 2010.

The PhD research will be carried out in the context of AXES, a multidisciplinary research project funded by the EU (FP7, Digital Libraries). The research will focus on deploying diverse, automatically generated, time-labeled annotations (for example, those coming from automatic speech recognition) for connecting heterogeneous data sources, and will be strongly evaluation-driven.

More information (deadline: 21 November)

Robin Aly defends PhD thesis on uncertainty in concept-based multimedia retrieval

Thursday, July 29th, 2010, posted by Djoerd Hiemstra

by Robin Aly

This thesis considers concept-based multimedia retrieval, where documents are represented by the occurrence of concepts (also referred to as semantic concepts or high-level features). A concept can be thought of as a kind of label, which is attached to (parts of) the multimedia documents in which it occurs. Since concept-based document representations are user, language and modality independent, using them for retrieval has great potential for improving search performance. As collections quickly grow both in volume and size, manually labeling concept occurrences becomes infeasible and the so-called concept detectors are used to decide upon the occurrence of concepts in the documents automatically.

The following fundamental problems in concept-based retrieval are identified and addressed in this thesis. First, the concept detectors frequently make mistakes while detecting concepts. Second, it is difficult for users to formulate their queries since they are unfamiliar with the concept vocabulary, and setting weights for each concept requires knowledge of the collection. Third, for supporting retrieval of longer video segments, single concept occurrences are not sufficient to differentiate relevant from non-relevant documents and some notion of the importance of a concept in a segment is needed. Finally, since current detection techniques lack performance, it is important to be able to predict what search performance retrieval engines yield, if the detection performance improves.

The main contribution of this thesis is the uncertain document representation ranking framework (URR). Based on the Nobel Prize-winning Portfolio Selection Theory, the URR framework considers the distribution over all possible concept-based document representations of a document given the observed confidence scores of concept detectors. For a given score function, documents are ranked by the expected score plus an additional term of the variance of the score, which represents the risk attitude of the system.

User-friendly concept selection is achieved by re-using an annotated development collection. Each video shot of the development collection is transformed into a textual description which yields a collection of textual descriptions. This collection is then searched for a textual query which does not require the user’s knowledge of the concept vocabulary. The ranking of the textual descriptions and the knowledge of the concept occurrences in the development collection allows a selection of useful concepts together with their weights.

The URR framework and the proposed concept selection method are used to derive a shot and a video segment retrieval framework. For shot retrieval, the probabilistic ranking framework for unobservable events is proposed. The framework re-uses the well-known probability of relevance score function from text retrieval. Because of the representation uncertainty, documents are ranked by their expected retrieval score given the confidence scores from the concept detectors.

For video segment retrieval, the uncertain concept language model is proposed for retrieving news items — a particular video segment type. A news item is modeled as a series of shots and represented by the frequency of each selected concept. Using the parallel between concept frequencies and term frequencies, a concept language model score function is derived from the language modelling framework. The concept language model score function is then used according to the URR framework and documents are ranked by the expected concept language score plus an additional term of the score’s variance.
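
The parallel between concept frequencies and term frequencies can be made concrete with a small sketch of a language model score over concepts. It covers only the score function for known concept occurrences, not the expectation and variance over uncertain detector output. The Jelinek-Mercer smoothing, the parameter `lam`, the collection priors, and the function name `concept_lm_score` are assumptions for illustration.

```python
import math


def concept_lm_score(query_concepts, item_shots, collection_prior, lam=0.5):
    """Log-probability of the query concepts under a smoothed concept language model.

    item_shots:       list of sets, one set of occurring concepts per shot
    collection_prior: concept -> probability of the concept in the collection (> 0)
    lam:              weight of the collection model in the smoothing
    """
    n_shots = len(item_shots)
    score = 0.0
    for concept in query_concepts:
        # Concept frequency: the number of shots in which the concept occurs.
        freq = sum(1 for shot in item_shots if concept in shot)
        p_item = freq / n_shots if n_shots else 0.0
        p = (1.0 - lam) * p_item + lam * collection_prior[concept]
        score += math.log(p)
    return score


# Made-up news items, for illustration only.
prior = {"anchor": 0.2, "studio": 0.1}
item_a = [{"anchor"}, {"anchor", "studio"}, {"crowd"}]
item_b = [{"crowd"}, {"crowd"}, {"street"}]
score_a = concept_lm_score(["anchor", "studio"], item_a, prior)
score_b = concept_lm_score(["anchor", "studio"], item_b, prior)
```

Smoothing with the collection prior keeps the score finite for items in which a query concept never occurs, as it does for terms in text retrieval.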

The Monte Carlo Simulation method is used to predict the behavior of current retrieval models under improved concept detector performance. First, a probabilistic model of concept detector output is defined as two Gaussian distributions, one for the shots in which the concept occurs and one for the shots in which it does not. Randomly generating concept detector scores for a collection with known concept occurrences and executing a search on the generated output estimates the expected search performance given the model’s parameters. By modifying the model parameters, the detector performance can be improved and the future search performance can be predicted.
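
A minimal sketch of such a simulation is given below. It simplifies the setup in the thesis: the "search" is just a ranking of shots by the simulated score of a single concept, and performance is measured by average precision. The function names, the parameter values, and the number of runs are assumptions.

```python
import random


def average_precision(ranked_labels):
    """Average precision of a ranked list of booleans (True means relevant)."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def simulate_ap(occurrences, mu_pos, mu_neg, sigma_pos, sigma_neg, runs=50, seed=0):
    """Estimate the expected average precision of a simulated detector.

    Scores are drawn from one Gaussian for shots in which the concept occurs
    and from another Gaussian for shots in which it does not.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        scores = [
            rng.gauss(mu_pos, sigma_pos) if occurs else rng.gauss(mu_neg, sigma_neg)
            for occurs in occurrences
        ]
        # Rank the shots by descending simulated detector score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        results.append(average_precision([occurrences[i] for i in order]))
    return sum(results) / len(results)


# A made-up collection: the concept occurs in 20 of 200 shots.
occurrences = [True] * 20 + [False] * 180
weak = simulate_ap(occurrences, mu_pos=0.5, mu_neg=0.0, sigma_pos=1.0, sigma_neg=1.0)
strong = simulate_ap(occurrences, mu_pos=3.0, mu_neg=0.0, sigma_pos=1.0, sigma_neg=1.0)
```

Moving the two means further apart, or shrinking the standard deviations, models a better detector, so the effect of improved detection on the measured performance can be read off directly.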

Experiments on several collections of the TRECVid evaluation benchmark showed that the URR framework often significantly improves the search performance compared to several state-of-the-art baselines. The simulation of concept detectors indicates that today’s video shot retrieval models will show acceptable performance once the detector performance is around 0.60 mean average precision. The simulation of video segment retrieval suggests that this task is easier and will sooner be applicable to real-life applications.

[download pdf]

Guest lecture by Alexander Hauptmann at SSR-4

Thursday, June 17th, 2010, posted by Djoerd Hiemstra

The 4th SIKS/Twente Seminar on Searching and Ranking will take place on the 2nd of July at the University of Twente. The goal of the one-day seminar is to bring together researchers from companies and academia working on the effectiveness of search engines. Invited speakers are:

  • Alexander Hauptmann (Carnegie Mellon University, USA)
  • Arjen de Vries (CWI and University of Delft, Netherlands)
  • Wessel Kraaij (TNO and Radboud University Nijmegen, Netherlands)

The workshop will take place at the campus of the University of Twente at the Citadel (building 9), lecture hall T300. SSR is sponsored by SIKS and CTIT.

More information at SSR-4.

Beyond Shot Retrieval

Monday, March 8th, 2010, posted by Djoerd Hiemstra

Searching for Broadcast News Items Using Language Models of Concepts

by Robin Aly, Aiden Doherty, Djoerd Hiemstra, and Alan Smeaton

Current video search systems commonly return video shots as results. We believe that users may better relate to longer, semantic video units and propose a retrieval framework for news story items, which consist of multiple shots. The framework is divided into two parts: (1) A concept based language model which ranks news items with known occurrences of semantic concepts by the probability that an important concept is produced from the concept distribution of the news item and (2) a probabilistic model of the uncertain presence, or risk, of these concepts. In this paper we use a method to evaluate the performance of story retrieval, based on the TRECVID shot-based retrieval groundtruth. Our experiments on the TRECVID 2005 collection show a significant performance improvement against four standard methods.

The paper will be presented at the 32nd European Conference on Information Retrieval (ECIR) in Milton Keynes, UK (and in the DB colloquium of 24 March).

[download pdf]

Erwin de Moel graduates on managing recorded lectures for Collegerama

Thursday, March 4th, 2010, posted by Djoerd Hiemstra

Expanding the usability of recorded lectures: A new age in teaching and classroom instruction

by Erwin de Moel

The status of recorded lectures at Delft University of Technology has been studied in order to expand their usability in the university’s present and future educational environment. Possibilities for the production of single-file vodcasts have been tested. These videos increase the accessibility of the recorded lectures through other distribution platforms. Furthermore, the production of subtitles has been studied. This was done with an ASR system called SHoUT, developed at the University of Twente, and machine translation of subtitles into other languages. SHoUT-generated transcripts always require post-processing for subtitling. Machine translation could produce translated subtitles of sufficient quality. Navigation of recorded lectures needs to be improved, requiring input from the lecturer. Collected metadata from lecture chapter titles, slide data (titles, content and notes) as well as ASR results have been used for the creation of a lecture search engine, which also produces interactive tables of contents and tag clouds for each lecture. Recorded lectures could further be enhanced with time-based discussion boards for asking and answering questions. Further improvements have been proposed for allowing recorded lectures to be re-used in recurring online courses.

Read More