• Wednesday, November 25th, 2009
My cooperation with Fabian Panse from the University of Hamburg has resulted in a paper accepted at the NTII workshop, co-located with ICDE 2010.
Duplicate Detection in Probabilistic Data
Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter
Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities.
The paper will be presented at the Second International Workshop on New Trends in Information Integration (NTII 2010), Long Beach, California, USA [details]
• Thursday, November 05th, 2009
On Thursday 5 November 2009, Tjitze Rienstra defended his MSc thesis “Dealing with uncertainty in the semantic web”. The MSc project was supervised by me, Paul van der Vet, and Maarten Fokkinga. The work was evaluated by the committee as excellent and received the rarely awarded grade of 10.
“Dealing with uncertainty in the semantic web” [download]
Standardizing the Semantic Web is still an ongoing process. For some aspects, standardization seems to be complete. For example, the syntax layer, the RDF data model layer and the RDFS and OWL semantic extensions have proven to fulfill their purpose in real-world applications. Other aspects, while necessary to realize the greater ideal of the Semantic Web, are yet to be standardized. One of these is dealing with uncertainty. Like classical logic, the languages of the Semantic Web (RDF, RDFS and OWL) work under the assumption that knowledge is certain. Many forms of knowledge, e.g. in computer vision, computational linguistics and information retrieval, exhibit notions of uncertainty. Uncertainty also arises as a side effect of knowledge integration and ontology mapping. This thesis describes an extension for the Semantic Web to deal with uncertainty. The extension, called URDF (Uncertain RDF), extends RDF with the capability to express uncertainty by allowing RDF formulas to be associated with probabilities. It not only extends RDF, but also supports the semantics of RDFS and part of OWL. The main contribution is an extension that adheres to the incremental design of the Semantic Web language stack. It can act as a unifying framework for different kinds of probabilistic representation and reasoning, at different levels of expressivity (RDF, RDFS or OWL). In this thesis, we focus on two kinds of reasoning: rule-based reasoning with RDFS/OWL knowledge, and Bayesian networks and inference.