I wrote a position paper about a different approach towards development of information extractors, which I call Sherlock Holmes-style based on his famous quote “when you have eliminated the impossible, whatever remains, however improbable, must be the truth”. The idea is that we fundamentally treat annotations as uncertain. We even start with a “no knowledge”, i.e., “everything is possible” starting point and then interactively add more knowledge, apply the knowledge directly to the annotation state by removing possible annotations and recalculating the probabilities of the remaining ones. For example, “Paris Hilton”, “Paris”, and “Hilton” can all be interpreted as a City, Hotel or Person name. But adding knowledge like “If a phrase is interpreted as a Person Name, then its subphrases should not be interpreted as a City” makes the annotations <"Paris Hilton":Person Name> and <"Paris":City> mutually exclusive. Observe that initially all annotations were independent, and these two are now dependent. We argue in the paper that the main challenge in this approach lies in efficient storage and conditioning of probabilistic dependencies, because trivial approaches do not work.
Handling Uncertainty in Information Extraction.
Maurice van Keulen, Mena Badieh Habib
This position paper proposes an interactive approach for developing information extractors based on the ontology definition process with knowledge about possible (in)correctness of annotations. We discuss the problem of managing and manipulating probabilistic dependencies.
The paper will be presented at the URSW workshop co-located with ICSW 2011, 23 October 2011, Bonn, Germany [details]