On Friday 22 January 2010, Michiel Punter defended his MSc thesis “Multi-Source Entity Resolution“. The MSc project was supervised by me, Ander de Keijzer, and Riham Abdel Kader.
“Multi-Source Entity Resolution” [download]
Background: The focus of this research was on multi-source entity resolution in the setting of pair-wise data integration. In contrast to most existing approaches to entity resolution this research does not consider matching to be transitive. A consequence of this is that entity resolution on multiple sources is not guaranteed to be associative. The goal of this research was to construct a generic model for multi-source entity resolution in the setting of pair-wise data integration that is associative.
Results: The main contributions of this research are: (1) a formal model for multi-source entity resolution and (2) strategies that can be used to resolve matching conflicts in a way that renders multi-source entity resolution to be associative. The possible worlds semantics is used to handle uncertainty originating from possible matches. The presented model is generic enough to allow different match and merge function as well as allowing different strategies to resolve matching conflicts.
Conclusions: A formalization of an example of multi-source entity resolution is presented to show the utility of the proposed model. By using small examples in which three sources are integrated it is shown that the strategies resulted in associative behavior of the integrate function.