Robin Aly presents at SIGIR Doctoral Consortium
Wednesday, June 10th, 2009, posted by Djoerd Hiemstra
Modeling Uncertainty in Video Retrieval: A Retrieval Model for Uncertain Semantic Representations of Videos
by Robin Aly
The need for content-based multimedia retrieval is increasing rapidly as collections grow ever faster. However, retrieval systems often do not perform well enough for real-life applications. A promising approach is to detect semantic primitives at indexing time. Currently investigated primitives are the utterance of words and the occurrence of so-called semantic concepts, such as “Outdoor” and “Person”. We refer to a concrete instantiation of these primitives as the representation of the video document. Most detector programs emit scores reflecting the likelihood of each primitive. However, the detection is far from perfect, and a lot of uncertainty about the real representation remains. Some retrieval algorithms ignore this uncertainty, which clearly hurts precision and recall. Other methods use the scores as anonymous features and learn their relationship to relevance. This has the disadvantage of requiring vast amounts of training data, and the learning has to be redone for every detector change.
The main contribution of our work is a formal retrieval model for treating this uncertainty. We conceptually consider the retrieval problem as two steps: (1) the determination of the posterior probability distribution over all representations given the scores (using existing methods) and (2) the derivation of a ranking status value (RSV) for each representation. We then take the expected RSV, weighted by each representation’s posterior probability, as the effective RSV of the shot for ranking. We claim that our approach has the following advantages: (a) step (2) is easier to achieve than with the machine learning alternative and (b) it benefits from all detector improvements.
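The two steps can be illustrated with a minimal Python sketch. It is not the model from the paper: it assumes that step (1) yields one probability per concept, that concepts are independent given the scores, and that the RSV of a representation is a simple weighted sum. The names `posterior`, `expected_rsv`, `toy_rsv` and the weights are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod


def posterior(representation, concept_probs):
    """Step (1), simplified: probability of one binary representation.

    Assumption: concepts are independent given the detector scores, and
    concept_probs[i] is P(concept i present | score), obtained elsewhere.
    """
    return prod(p if present else 1.0 - p
                for present, p in zip(representation, concept_probs))


def toy_rsv(representation, weights=(2.0, 1.0)):
    """Step (2), simplified: a hypothetical RSV that sums the weights of
    the concepts present in the representation."""
    return sum(w for w, present in zip(weights, representation) if present)


def expected_rsv(concept_probs, rsv):
    """Expected RSV of a shot: sum over all 2^n binary representations of
    the posterior probability times the RSV of that representation."""
    n = len(concept_probs)
    return sum(posterior(rep, concept_probs) * rsv(rep)
               for rep in product((0, 1), repeat=n))


# Two concepts, e.g. "Outdoor" and "Person", with made-up probabilities.
shot_score = expected_rsv((0.9, 0.2), toy_rsv)
```

Enumerating all representations is exponential in the number of concepts, so this sketch is only practical for a handful of them; it is meant to show the weighting, not an efficient ranking procedure.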
This paper concerns the problem of search using the output of
concept detectors (also known as high-level features) for
video retrieval.
Unlike term occurrence in text documents, the event of the occurrence
of an audiovisual concept is only indirectly observable. We
develop a probabilistic ranking framework for unobservable binary
events to search in videos, called PR-FUBE. The framework explicitly
models the probability of relevance of a video shot through
the presence and absence of concepts. From our framework, we
derive a ranking formula and show its relationship to previously
proposed formulas. We evaluate our framework against two other
retrieval approaches using the TRECVID 2005 and 2007 datasets.
Retrieval performs well, especially when large numbers of concepts are
used. We attribute the observed robustness against
the noise introduced by less related concepts to the effective combination
of concept presence and absence in our method. The experiments
show that an accurate estimate for the probability of occurrence
of a particular concept in relevant shots is crucial to obtain
effective retrieval results.
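How presence and absence of concepts might be combined in a score can be sketched in Python as follows. This is an illustration under stated assumptions, not necessarily the ranking formula derived in the paper: concepts are treated as independent binary events, and each concept contributes the sum of a "present" term and an "absent" term, each weighted by how likely that state is in relevant shots relative to the collection. The function name and all parameter names are hypothetical.

```python
def presence_absence_score(p_shot, p_relevant, p_prior):
    """Rank-equivalent score of one shot under independence assumptions.

    p_shot[i]     : P(concept i occurs | detector output for this shot)
    p_relevant[i] : P(concept i occurs | shot is relevant)
    p_prior[i]    : P(concept i occurs) in the collection, strictly
                    between 0 and 1
    All three are assumed to be estimated elsewhere.
    """
    score = 1.0
    for p_o, p_r, p in zip(p_shot, p_relevant, p_prior):
        present = p_r * p_o / p                          # concept occurs
        absent = (1.0 - p_r) * (1.0 - p_o) / (1.0 - p)   # concept does not occur
        score *= present + absent
    return score


# A concept that occurs in 90% of relevant shots and 50% of all shots:
likely = presence_absence_score([0.8], [0.9], [0.5])
unlikely = presence_absence_score([0.2], [0.9], [0.5])
```

In this sketch a concept whose detector output equals the collection prior contributes a factor of 1, so it does not change the ranking, which is one way to picture robustness against less related concepts. The score also depends directly on `p_relevant`, in line with the observation that this estimate is crucial.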