• Monday, August 22nd, 2011
One of my PhD students, Mena Badieh Habib, and I submitted a paper about improving the effectiveness of named entity extraction (NEE) with what we call “the reinforcement effect” to the MUD workshop of VLDB2011.
Named Entity Extraction and Disambiguation: The Reinforcement Effect.
Mena Badieh Habib, Maurice van Keulen
Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and semantic web. Although these topics are highly dependent, almost no existing works examine this dependency. It is the aim of this paper to examine the dependency and show how one affects the other, and vice versa. We conducted experiments with a set of descriptions of holiday homes with the aim to extract and disambiguate toponyms as a representative example of named entities. We experimented with three approaches for disambiguation with the purpose to infer the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation, and reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.
The paper will be presented at the MUD workshop co-located with VLDB 2011, 29 August 2011, Seattle, USA [details]
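To give a feel for the reinforcement loop described in the abstract, here is a minimal toy sketch in Python. It is not the method from the paper. The gazetteer, the voting scheme, the filtering rule (drop names that have no candidate in the inferred country), and all function names are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical toy gazetteer: toponym -> countries that have a place of that name.
GAZETTEER = {
    "Paris": {"France", "United States"},
    "Lyon": {"France"},
    "Bath": {"United Kingdom"},
}


def extract(text, blocked=frozenset()):
    """Naive extraction: keep every token found in the gazetteer, minus blocked names."""
    tokens = [token.strip(".,;:!?") for token in text.split()]
    return [t for t in tokens if t in GAZETTEER and t not in blocked]


def disambiguate(names):
    """Each extracted name casts one vote for each of its candidate countries."""
    votes = Counter()
    for name in names:
        for country in GAZETTEER[name]:
            votes[country] += 1
    return votes


def reinforce(text):
    """One round of extract -> disambiguate -> filter -> extract again."""
    names = extract(text)
    country = disambiguate(names).most_common(1)[0][0]
    # Assumed filter: a name with no candidate in the inferred country
    # is treated as a likely false positive and blocked.
    blocked = frozenset(n for n in names if country not in GAZETTEER[n])
    return country, extract(text, blocked)


TEXT = "Holiday home near Lyon, a short drive from Paris. Bath and shower included."
country, names = reinforce(TEXT)
print(country, names)
```

In this toy description the first pass wrongly picks up "Bath" as a toponym. The inferred country is then used to filter it out, so the second extraction pass is cleaner, which in turn gives the disambiguation step less noise to deal with.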
• Wednesday, December 22nd, 2010
One of my PhD students, Mena Badieh Habib, submitted a paper with his research plans in the Neogeography project to the PhD workshop of ICDE2011.
Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams
Mena Badieh Habib
Neogeography is the combination of user generated data and experiences with mapping technologies. In this paper we propose a research project to extract valuable structured information with a geographic component from unstructured user generated text in wikis, forums, or SMSes. The project intends to help workers communities in developing countries to share their knowledge, providing a simple and cheap way to contribute and get benefit using the available communication technology.
The paper will be presented at the PhD workshop co-located with ICDE 2011, 11 April 2011, Hannover, Germany [details]
Mena Badieh Habib started his PhD research in the Neogeography project today. For details, see my earlier post on “Kick-Off of Neogeography project”.
• Friday, March 12th, 2010
To improve the integration of the new faculty ITC (Geo-Information Science and Earth Observation) into the university, the boards of directors of ITC and UT decided some time ago to subsidize several cooperation projects, each with two PhD students: one at ITC and one at the UT. I am involved in one: “Neogeography: the challenge of channelling large and ill-behaved data streams” (see description below). Rolf de By (ITC) and I presented our Neogeography project at the kick-off meeting on 12 March 2010 [presentation]. Rolf’s PhD student is Clarisse Kagoyire; she arrived in The Netherlands just in time to make it to the meeting. My PhD student is Mena Badieh Habib; he will start on 1 May 2010.
Neogeography: the challenge of channelling large and ill-behaved data streams
In this project, we develop XML-based data technology to support the channeling of large and ill-behaved neogeographic data streams. In neogeography, geographic information is derived from end-users, not from official bodies like mapping agencies, cadasters or other official, (para-)governmental organizations. The motivation is that multiple (neo)geographic information sources on the same phenomenon can be mutually enriching.
Content provision and feedback from large communities of end-users have great potential for sustaining a high level of data quality. The technology is meant to reach a substantial user community in the less-developed world through content provision and delivery via cell phone networks. Exploiting such neogeographic data requires, among other things, the extraction of the where and when from textual descriptions. This comes with intrinsic uncertainty in space and time, but also thematically in terms of entity identification: which is the restaurant, bus stop, farm, market, or forest mentioned in this information source? The rise of sensor networks adds to the mix a badly needed verification mechanism for the real-time neogeographic data.
We strive for a proper mix of carefully integrated techniques in geoinformation handling, approaches to spatiotemporal imprecision and incompleteness, as well as data augmentation through sensors in a generic framework with which purpose-oriented end-user communities can be served appropriately.
The UT PhD position focuses on spatiotemporal data technology in XML databases and theory and support technology for storage, manipulation and reasoning with spatiotemporal and thematic uncertainty. The work is to be validated through testbed use cases, such as the H20 project with google.org (water consumers in Zanzibar), AGCommons project with the Gates Foundation (smallholder farmers in sub-Saharan Africa), and other projects with large user communities.
• Monday, September 07th, 2009
In cooperation with ITC (International Institute for Geo-Information Science and Earth Observation), we have a PhD position available on Neogeography: the challenge of channeling large and ill-behaved data streams. In neogeography, geographic information is derived from end-users, not from official bodies. The technology is meant to reach a substantial user community in the less-developed world through content provision and delivery via cell phone networks. Exploiting such neogeographic data requires, among other things, the extraction of the where and when from textual descriptions. This comes with intrinsic uncertainty in space and time, but also thematically in terms of entity identification: which is the restaurant, bus stop, farm, market, or forest mentioned in this information source? Anyone with an MSc degree interested in doing PhD research on this topic is welcome to apply before October 10 (see the vacancy for details).