• Monday, September 05th, 2011
I wrote a position paper on a different approach to developing information extractors, which I call Sherlock Holmes-style after his famous quote: “when you have eliminated the impossible, whatever remains, however improbable, must be the truth”. The idea is to treat annotations as fundamentally uncertain. We start from a “no knowledge” state in which everything is possible, then interactively add knowledge and apply it directly to the annotation state by removing annotations that have become impossible and recalculating the probabilities of the remaining ones. For example, “Paris Hilton”, “Paris”, and “Hilton” can each be interpreted as a City, Hotel, or Person name. But adding knowledge such as “If a phrase is interpreted as a Person Name, then its subphrases should not be interpreted as a City” makes the annotations <"Paris Hilton":Person Name> and <"Paris":City> mutually exclusive. Observe that initially all annotations were independent, whereas these two are now dependent. We argue in the paper that the main challenge of this approach lies in the efficient storage and conditioning of probabilistic dependencies, because trivial approaches do not work.
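To make the “Paris Hilton” example concrete, here is a minimal Python sketch of the conditioning step. It is not the method from the paper: the prior probabilities, the names `candidates`, `violates`, `condition` and `marginal`, and the brute-force enumeration of all combinations are assumptions chosen purely for illustration.

```python
from itertools import product

# Hypothetical candidate annotations with invented prior probabilities.
# Initially they are treated as independent of each other.
candidates = {
    ("Paris Hilton", "Person Name"): 0.6,
    ("Paris", "City"): 0.5,
    ("Hilton", "Hotel"): 0.4,
}

def violates(world):
    """Rule from the text: if a phrase is a Person Name, no proper subphrase may be a City."""
    persons = [phrase for phrase, kind in world if kind == "Person Name"]
    cities = [phrase for phrase, kind in world if kind == "City"]
    return any(c != p and c in p for p in persons for c in cities)

def condition(candidates, violates):
    """Enumerate every combination of annotations, drop the impossible ones, renormalize."""
    keys = list(candidates)
    worlds = {}
    for bits in product([True, False], repeat=len(keys)):
        world = frozenset(k for k, present in zip(keys, bits) if present)
        weight = 1.0
        for k, present in zip(keys, bits):
            weight *= candidates[k] if present else 1.0 - candidates[k]
        if not violates(world):
            worlds[world] = weight
    total = sum(worlds.values())
    return {world: weight / total for world, weight in worlds.items()}

def marginal(worlds, annotation):
    """Probability that a single annotation holds after conditioning."""
    return sum(p for world, p in worlds.items() if annotation in world)

worlds = condition(candidates, violates)
for annotation in candidates:
    print(annotation, round(marginal(worlds, annotation), 3))
```

With these made-up priors, the combinations containing both <"Paris Hilton":Person Name> and <"Paris":City> are removed, so the probability of the Person Name annotation drops from 0.6 to 3/7 and that of the City annotation from 0.5 to 2/7, while the unrelated Hotel annotation stays at 0.4. The enumeration grows exponentially with the number of candidate annotations, which is the kind of trivial approach that cannot be expected to work on real documents.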
Handling Uncertainty in Information Extraction.
Maurice van Keulen, Mena Badieh Habib
This position paper proposes an interactive approach for developing information extractors based on the ontology definition process with knowledge about possible (in)correctness of annotations. We discuss the problem of managing and manipulating probabilistic dependencies.
The paper will be presented at the URSW workshop co-located with ISWC 2011, 23 October 2011, Bonn, Germany.
• Wednesday, September 01st, 2010
Urgently wanted: a student for the graduation project described below, for the ESCAPE project.
ESCAPE is a project on a new form of scientific communication that is no longer based on articles alone. It builds on semantic web technology, with which broad knowledge about articles, data, results, researchers, projects, organizations, and the relations between them can be stored, queried, and manipulated. Entering this data and knowledge, however, is rather labor-intensive. This assignment is about developing tools for automatic enrichment of the data and knowledge. At the lowest level, this means importing publication data from publishers' websites and similar sources; at a higher level, it means enrichment by automatically creating links to Linked Open Data and to other databases and websites.
• Thursday, November 05th, 2009
On Thursday 5 November 2009, Tjitze Rienstra defended his MSc thesis “Dealing with uncertainty in the semantic web”. The MSc project was supervised by me, Paul van der Vet, and Maarten Fokkinga. The work was evaluated by the committee as excellent and received the rarely awarded grade of 10.
“Dealing with uncertainty in the semantic web”
Standardizing the Semantic Web is still an ongoing process. For some aspects, the standardization seems to be complete. For example, the syntax layer, the RDF data model layer, and the RDFS and OWL semantic extensions have proven to fulfill their purpose in real-world applications. Other aspects, while necessary to realize the greater ideal of the Semantic Web, are yet to be standardized. One of these is dealing with uncertainty. Like classical logic, the languages of the Semantic Web (RDF, RDFS and OWL) work under the assumption that knowledge is certain. However, many forms of knowledge, e.g. in computer vision, computational linguistics and information retrieval, exhibit notions of uncertainty. Uncertainty also arises as a side effect of knowledge integration and ontology mapping. This thesis describes an extension for the Semantic Web to deal with uncertainty. The extension, called URDF (Uncertain RDF), extends RDF with the capability to express uncertainty by allowing RDF formulas to be associated with probabilities. It not only extends RDF, but also supports the semantics of RDFS and part of OWL. The main contribution is an extension that adheres to the incremental design of the Semantic Web language stack. It can act as a unifying framework for different kinds of probabilistic representation and reasoning, at different levels of expressivity (RDF, RDFS or OWL). In this thesis, we focus on two kinds of reasoning: rule-based reasoning with RDFS/OWL knowledge, and Bayesian networks and inference.
• Friday, February 06th, 2009
The project description for Tjitze Rienstra’s MSc project has been finalized. The project is being supervised by me and Paul van der Vet.
Dealing with uncertainty in the Semantic Web
The notion of data integration is essential to the Semantic Web. Its real advantage is that it enables us to gather data from different sources, reason over this data and get results that may otherwise not have been easy to find. However, data integration can lead to conflicts. Different sources may provide contradicting information about the same real world objects. The result is uncertainty. The technologies of the Semantic Web are assertional, which means that they cannot deal with uncertainty very well.
The essential standards (RDF, RDFS, OWL, SPARQL) will be extended in order to deal with uncertainty. We will first make clear what is required in terms of expressiveness. We then specify an extension by formalizing a ‘possible world’ semantics for RDF. It will be necessary to consider what the consequences are for RDFS and OWL. Finally, querying with SPARQL must be adapted to work with this possible world model while remaining computationally efficient. Validation will be done by testing a prototype against a movie database containing conflicting data from different sources.
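As a rough illustration of what a ‘possible world’ reading of conflicting RDF data could look like, here is a small Python sketch. It is not the project's design: the `ex:` identifiers, the probabilities, the representation of conflicts as lists of mutually exclusive alternatives, and the helper names `possible_worlds` and `answers` are all assumptions made for this example.

```python
from itertools import product

# Each inner list holds mutually exclusive alternatives for one fact, for example
# two sources that disagree. All triples and probabilities here are invented.
alternatives = [
    [(("ex:Movie1", "ex:releaseYear", "1999"), 0.7),
     (("ex:Movie1", "ex:releaseYear", "2000"), 0.3)],
    [(("ex:Movie1", "ex:director", "ex:PersonA"), 0.9),
     (("ex:Movie1", "ex:director", "ex:PersonB"), 0.1)],
]

def possible_worlds(alternatives):
    """Yield (set of triples, probability) for every way of resolving all conflicts."""
    for combo in product(*alternatives):
        triples = frozenset(triple for triple, _ in combo)
        weight = 1.0
        for _, p in combo:
            weight *= p
        yield triples, weight

def answers(alternatives, subject, predicate):
    """Answer the pattern (subject, predicate, ?o) with a probability per binding of ?o."""
    dist = {}
    for triples, weight in possible_worlds(alternatives):
        for s, p, o in triples:
            if s == subject and p == predicate:
                dist[o] = dist.get(o, 0.0) + weight
    return dist

print(answers(alternatives, "ex:Movie1", "ex:releaseYear"))
```

Each world picks exactly one alternative per conflict, and a query answer's probability is the total weight of the worlds in which it holds. Enumerating all worlds like this is only feasible for toy data, which is why computational efficiency is listed as a requirement for the adapted querying.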