THE INFERENCE NETWORK MODEL

The use of inference networks for multimedia retrieval

MIRROR

WARNING!

This is just draft information. It has no official status! It is also explicitely excluded from the indexing by robots. These pages are representing the intuition behind my research program.

If you like or dislike the ideas put forward on this page, I am very open to feedback.


INFERENCE NETWORKS FOR MULTIMEDIA RETRIEVAL

The main hypothesis underlying this work is that the inference network retrieval model is suited for handling retrieval in large multimedia databases. In the INQUERY system, it has been shown that this model is good at text retrieval. This section demonstrates the benefits of the application of the inference network model to retrieval in multimedia document repositories with some (hypothetic) examples.

Multiple representations of a document

The inference network model can cope with multiple representations of documents. This is crucial for content-based retrieval of multimedia documents. Two examples may clarify this statement:
Audio retrieval
Traditional speech recognition needs a predefined vocabulary. For retrieval of news items from television, this may not be a good idea, since names of people and places will not occur in the vocabulary. Using trigram indexing can possibly resolve this problem [dV96], but will probably perform worse on normal vocabulary. Using both recognizers gives two different representations. The inference network retrieval model will take into account evidence from both representations for answering the query.
Presentations
When we want to index material from (classroom) presentations, we can use the audio of the speaker and video frames of the slides. Optical character recognition may generate representations of the video frames and speech recognition can generate representations of the audio track (we could even use both approaches form the previous example). Queries about the presentations are answered using combined evidence from both information sources.

Multiple representations of a query

The "Query By Example" paradigm is very important for multimedia retrieval processes. One of the first projects to demonstrate such functionality was the QBIC prototype system. The user tells the system what kind of images to search for by giving an example of a good image.

Needed: image to demonstrate the QBIC retrieval task using the inference network model

With the inference network model, it is very easy to express the user's information need by using multiple images. The retrieval model can easily combine the amount of match with each of the example images to produce a ranking based on the combined evidence.

Another possibility is the natural combination of searching the captions with searching the images by the image similarity. The ranking will be calculated based on both information sources transparently.

Modeling of uncertain evidence

Recognition algortihms are not error free. However, manually indexing multimedia data is not possible. Therefore, we have to deal with uncertain evidence if we search through multimedia databases.

In the inference network model, we can model uncertain evidence by adding a child b to a node a. The likelihood ratio between P(a=true) and P(a=false) then expresses the amount of belief we have in concept a. The contribution of the node b to the belief in the parent will automatically be taken into account during the belief propagation.


PERFORMANCE ISSUES

Introduction

Approaches using Bayesian inference networks are known for the related computational problems. However, this is only the case for situation where we have many uninstantiated variables. Applying the inference network model to information retrieval, we assume we know the probabilities of usefulness for all terms. Once we constructed the query network, after linking it to the document network we only have to propagate the belief one document at a time.

Basically, there is not much difference in required processing with the vector space model. Using the vector space model, we have to calculate the similarity between the query vector and the vectors for each document.

Query processing

Per term, the document ids and the weights are stored in inverted lists. INQUERY also allows proximity operators in the query language. For processing such queries, the locations of occurence of terms in documents are added to the inverted lists as well.

Instead of the weights themselves, the term frequency is stored in the inverted list for the representation concept. This is beneficial in two ways:

Processing a query works as follows. First, we identify all documents that contain the terms occuring in the query. Then we calculate the belief value for each document referring to at least one query concept and then sort the list of documents according to their ranking.

Query processing can be performed term at a time or document at a time. The first method produces several large intermediate results that are needed simultaneously (up to the depth of the query tree). These intermediate results may not fit in main memory. The latter evaluates the query tree multiple times but needs less main memory.

Adding a document has to update the normalization factors for the terms that occur in the document. Since these normalization factors are not duplicated in the belief lists, this is not much work. Also, the new document has to be added to the inverted lists for the terms that occur in the document. This can be implemented fairly efficient using a smart allocation schema for the inverted lists.

The key problems

First, approximate retrieval cannot apply set reduction techniques like normal databases. A boolean "and" does not per definition exclude documents in which one of the terms does not exist, it just ranks these documents lower in the retrieved list.

Second, the Zipf distribution of terms in natural language puts severe demands on full text retrieval systems. Whereas 95% of the lists is smaller than 1 kB (50% of the lists is only 16 bytes long), these inverted lists only contribute for 5% to the file size. In other words, 95% of the file size comes from only 5% of the terms. Since these terms occur in most documents, almost every query requires reading the inverted lists of these files.

Third, the relatively rich query language makes optimalization harder for the inference network model than for the vector space model. However, the vector space model can hardly combine evidence from different sources. The problem with the optimalization techniques is caused by the fact that these techniques are often only valid when we use just one query operator. Instead, we should process the whole query tree at once.

Performance improvements

Basically, the performance improves when we read less inverted lists or smaller inverted lists from secondary memory. This can be achieved using the following techniques:

OTHER ISSUES

We would like to assert the cost of the agents producing the concepts in the data collection. Some information may not add very much information. It is hard to assess the benefits of a new information source. However, it can be noticed that the problem of assessing the cost of the new features is similar to the feature selection problem in pattern recognition. Multidimensional scaling is a technique to select the k most useful features. The distinction is, that we would like to avoid picking little features from many agents because that requires more space.

more to follow...


Last updated: $Id: infnets.html,v 1.5 1998/03/05 11:19:35 arjen Exp $
Maintained by: arjen@cs.utwente.nl