THE INFERENCE NETWORK MODEL
The use of inference networks for multimedia retrieval
MIRROR
WARNING!
This is just draft information. It has no official status! It is also
explicitely excluded from the indexing by robots. These pages are representing
the intuition behind my research program.
If you like or dislike the ideas put forward on this page, I am very
open to feedback.
INFERENCE NETWORKS FOR MULTIMEDIA RETRIEVAL
The main hypothesis underlying this work is that the inference network
retrieval model is suited for handling retrieval in large multimedia
databases.
In the INQUERY
system, it has been shown that this model is good at text retrieval.
This section demonstrates the benefits of the application
of the inference network model to retrieval in multimedia document
repositories with some (hypothetic) examples.
Multiple representations of a document
The inference network model can cope with multiple representations of
documents. This is crucial for content-based retrieval of multimedia
documents. Two examples may clarify this statement:
- Audio retrieval
- Traditional speech recognition needs a predefined vocabulary. For
retrieval of news items from television, this may not be a good idea,
since names of people and places will not occur in the
vocabulary. Using trigram indexing can possibly resolve this problem
[dV96], but will probably perform worse
on normal vocabulary. Using both recognizers gives two different
representations. The inference network retrieval model will take into
account evidence from both representations for answering the query.
- Presentations
- When we want to index material from (classroom) presentations, we can
use the audio of the speaker and video frames of the slides. Optical
character recognition may generate representations of the video frames
and speech recognition can generate representations of the audio
track (we could even use both approaches form the previous
example). Queries about the presentations are answered using combined
evidence from both information sources.
Multiple representations of a query
The "Query By Example" paradigm is very important for multimedia
retrieval processes. One of the first projects to demonstrate such
functionality was the
QBIC prototype
system. The user tells the system what kind of images to search for by giving
an example of a good image.
Needed: image to demonstrate the QBIC retrieval task using the inference
network model
With the inference network model, it is very easy to express the user's
information need by using multiple images. The retrieval model can easily
combine the amount of match with each of the example images to produce a
ranking based on the combined evidence.
Another possibility is the natural combination of searching the captions with
searching the images by the image similarity. The ranking will be calculated
based on both information sources transparently.
Modeling of uncertain evidence
Recognition algortihms are not error free. However, manually indexing
multimedia data is not possible. Therefore, we have to deal with uncertain
evidence if we search through multimedia databases.
In the inference network model, we can model uncertain evidence by adding a
child b to a node a. The likelihood ratio between
P(a=true) and P(a=false) then expresses the amount of belief we
have in concept a. The contribution of the node b to the belief in
the parent will automatically be taken into account during the belief
propagation.
PERFORMANCE ISSUES
Introduction
Approaches using Bayesian inference networks are known for the related
computational problems. However, this is only the case for situation where we
have many uninstantiated variables. Applying the inference network model to
information retrieval, we assume we know the probabilities of usefulness for
all terms. Once we constructed the query network, after linking it to the
document network we only have to propagate the belief one document at a time.
Basically, there is not much difference in required processing with the vector
space model. Using the vector space model, we have to calculate the similarity
between the query vector and the vectors for each document.
Query processing
Per term, the document ids and the weights are stored in inverted
lists. INQUERY also allows proximity operators in the query language. For
processing such queries, the locations of occurence of terms in documents are
added to the inverted lists as well.
Instead of the weights themselves, the term frequency is stored in the
inverted list for the representation concept. This is beneficial in two ways:
- We do not need belief updates when normalization factors like inverse
document frequency change; this also improves compressability of the
inverted lists.
- Changes in the weighting schema do not require reindexing the
collection.
Processing a query works as follows. First, we identify all documents that
contain the terms occuring in the query. Then we calculate the belief value
for each document referring to at least one query concept and then sort the
list of documents according to their ranking.
Query processing can be performed term at a time or document at a
time. The first method produces several large intermediate results that are
needed simultaneously (up to the depth of the query tree). These intermediate
results may not fit in main memory. The latter evaluates the query tree
multiple times but needs less main memory.
Adding a document has to update the normalization factors for the terms that
occur in the document. Since these normalization factors are not duplicated in
the belief lists, this is not much work. Also, the new document has to be
added to the inverted lists for the terms that occur in the document. This can
be implemented fairly efficient using a smart allocation schema for the
inverted lists.
The key problems
First, approximate retrieval cannot apply set reduction techniques like normal
databases. A boolean "and" does not per definition exclude documents
in which one of the terms does not exist, it just ranks these documents lower
in the retrieved list.
Second, the Zipf distribution of terms in natural language puts severe demands
on full text retrieval systems. Whereas 95% of the lists is smaller than 1 kB
(50% of the lists is only 16 bytes long), these inverted lists only contribute
for 5% to the file size. In other words, 95% of the file size comes from only
5% of the terms. Since these terms occur in most documents, almost every query
requires reading the inverted lists of these files.
Third, the relatively rich query language makes optimalization harder for the
inference network model than for the vector space model. However, the vector
space model can hardly combine evidence from different sources. The problem
with the optimalization techniques is caused by the fact that these techniques
are often only valid when we use just one query operator. Instead, we should
process the whole query tree at once.
Performance improvements
Basically, the performance improves when we read less inverted lists or
smaller inverted lists from secondary memory. This can be achieved
using the following techniques:
- Compress the inverted lists;
- A proximity operator only needs proximity information and a belief
operator only needs belief information. Therefore, split the inverted
lists in proximity lists and belief lists;
- Proximity operator works like a real boolean
"and". Therefore, begin processing with the term that occurs
in the smallest number of documents (so sort the lists on inverse
document frequency);
- Allow per document access within inverted lists using tables;
- Store belief lists in decreasing contributing order (relative to
inverse document frequency). Calculate the maximum amount of belief
that can theoretically be achieved and stop processing of documents if
this is smaller than the top N already identified documents.
OTHER ISSUES
We would like to assert the cost of the agents producing the
concepts in the data collection. Some information may not add very much
information. It is hard to assess the benefits of a new information
source. However, it can be noticed that the problem of assessing the cost of
the new features is similar to the feature selection problem in pattern
recognition. Multidimensional scaling is a technique to select the k
most useful features. The distinction is, that we would like to avoid picking
little features from many agents because that requires more space.
more to follow...
Last updated: $Id: infnets.html,v 1.5 1998/03/05 11:19:35 arjen Exp $
Maintained by: arjen@cs.utwente.nl