Archive for the Category » Neogeography «

Thursday, May 15th, 2014 | Author:

Several Dutch news sites picked up the UT homepage news item on the research of my PhD student Mena Badieh Habib on Named Entity Extraction and Named Entity Disambiguation:
“UT laat politiecomputers tweets ‘begrijpen’ voor veiligheid bij evenementen” (UT lets police computers ‘understand’ tweets for safety at events)
“Universiteit Twente laat computers beter begrijpend lezen” (University of Twente teaches computers better reading comprehension)
“Twentse computer leest beter” (Twente computer reads better)

Wednesday, May 14th, 2014 | Author:

The news feed of the UT homepage features an item on the research of my PhD student Mena Badieh Habib.
Computers leren beter begrijpend lezen dankzij UT-onderzoek (“Computers learn better reading comprehension thanks to UT research”; in Dutch).
Mena defended his PhD thesis entitled “Named Entity Extraction and Disambiguation for Informal Text – The Missing Link” on May 9th.

Friday, May 09th, 2014 | Author:

Today, a PhD student of mine, Mena Badieh Habib Morgan, defended his thesis.
Named Entity Extraction and Disambiguation for Informal Text – The Missing Link
Social media content represents a large portion of all textual content appearing on the Internet. These streams of user generated content (UGC) provide an opportunity and a challenge for media analysts to analyze huge amounts of new data and use them to infer and reason with new information. A main challenge of natural language is its ambiguity and vagueness. When we move to the informal language widely used in social media, the language becomes even more ambiguous and thus more challenging for automatic understanding. Named Entity Extraction (NEE) is a subtask of Information Extraction (IE) that aims to locate phrases (mentions) in the text that represent names of entities such as persons, organizations or locations, regardless of their type. Named Entity Disambiguation (NED) is the task of determining which person, place, event, etc. a mention actually refers to. The main goal of this thesis is to mimic the human way of recognizing and disambiguating named entities, especially for domains that lack formal sentence structure. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The achieved robustness has been proven to be valid across languages and domains and to be independent of the selected extraction and disambiguation techniques. It is also shown to be robust against shortness in labeled training data and against the informality of the language used.
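To make the distinction between the two tasks concrete, here is a deliberately tiny sketch (the knowledge base, entity names, and context words are invented for illustration; this is not the framework of the thesis): NEE locates the mention, NED picks the referent.

```python
# Toy knowledge base: surface form -> candidate entities with context words.
# All names and context words here are invented.
KB = {
    "paris": {
        "Paris_France": {"france", "city", "eiffel"},
        "Paris_Texas": {"texas", "usa"},
    },
}

def extract_mentions(text):
    """NEE: locate phrases that may be entity names (naive KB lookup)."""
    return [tok for tok in text.lower().split() if tok in KB]

def disambiguate(mention, text):
    """NED: pick the candidate whose context words best overlap the text."""
    words = set(text.lower().split())
    candidates = KB[mention]
    return max(candidates, key=lambda e: len(candidates[e] & words))

tweet = "loving the eiffel tower in paris"
for m in extract_mentions(tweet):
    print(m, "->", disambiguate(m, tweet))
```

In informal text both steps are harder than this sketch suggests: mentions are misspelled or lowercased, and the context is only a handful of words.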

Monday, April 07th, 2014 | Author:

Last year we won the #Microposts2013 challenge; this year we came in second in the new #Microposts2014 challenge called NEEL, “Named Entity Extraction and Linking”, which, unlike last year's, also involves entity disambiguation (by linking to DBpedia).
Named Entity Extraction and Linking Challenge: University of Twente at #Microposts2014 [Download]
Mena Badieh Habib, Maurice van Keulen, Zhemin Zhu

Monday, May 13th, 2013 | Author:

Together with my PhD student Mena Badieh Habib and another PhD student of our group Zhemin Zhu, we participated in the “Making Sense of Microposts” challenge at the WWW 2013 conference … and we won the best IE award!
[paper | presentation | poster]

Tuesday, October 09th, 2012 | Author:

One of my PhD students, Mena Habib, has won the Best Student Paper Award at the 4th International Conference on Knowledge Discovery and Information Retrieval (KDIR 2012) in Barcelona, Spain, for our paper “Improving Toponym Disambiguation by Iteratively Enhancing Certainty of Extraction”.

Category: Information Extraction, Neogeography  | Comments off
Tuesday, September 04th, 2012 | Author:

One of my PhD students, Mena Badieh Habib, got a paper accepted at the Semantic Web and Information Extraction (SWAIE) workshop at the EKAW conference on improving NEE in twitter.
Unsupervised Improvement of Named Entity Extraction in Short Informal Context Using Disambiguation Clues
Mena Badieh Habib, Maurice van Keulen
Short context messages (like tweets and SMS messages) are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages pose challenges for Natural Language Processing tasks. Most efforts in this direction rely on machine learning techniques, which are expensive in terms of data collection and training.
In this paper we present an unsupervised Semantic Web-driven approach to improve the extraction process by using clues from the disambiguation process. For extraction we used a simple Knowledge-Base matching technique combined with a clustering-based approach for disambiguation. Experimental results on a self-collected set of tweets (as an example of short context messages) show improvement in extraction results when using unsupervised feedback from the disambiguation process.
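A toy sketch of the general idea (not the paper's actual algorithm; the knowledge base, topics, and voting scheme below are all invented): candidate mentions are extracted by simple knowledge-base matching, and a disambiguation clue, here topical agreement among the candidates, feeds back to filter unlikely extractions.

```python
from collections import Counter

# Toy knowledge base: surface form -> candidate (entity, topic) pairs.
# Entities and topics are invented for illustration.
KB = {
    "texas": [("Texas_US_state", "usa")],
    "paris": [("Paris_France", "france"), ("Paris_Texas", "usa")],
    "houston": [("Houston_Texas", "usa"), ("Whitney_Houston", "music")],
}

def resolve(tokens):
    """KB-matching extraction with unsupervised disambiguation feedback."""
    mentions = [t for t in tokens if t in KB]  # extraction step
    # Disambiguation clue: the dominant topic among all candidates
    votes = Counter(topic for m in mentions for _, topic in KB[m])
    dominant = votes.most_common(1)[0][0]
    # Feedback: keep a mention only if a candidate fits the dominant topic
    result = {}
    for m in mentions:
        fitting = [e for e, topic in KB[m] if topic == dominant]
        if fitting:
            result[m] = fitting[0]
    return result

print(resolve("visiting paris and houston in texas".split()))
```

Note how the neighbouring mentions push “paris” towards the US reading, without any labeled training data, which is the unsupervised feedback loop in miniature.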
The paper will be presented at the SWAIE workshop co-located with EKAW 2012, 8-12 October 2012, Galway City, Ireland [details]

Category: Information Extraction, Neogeography  | Comments off
Tuesday, July 24th, 2012 | Author:

One of my PhD students, Mena Badieh Habib, got a paper accepted at the Knowledge Discovery and Information Retrieval (KDIR) conference on improving NEE and NED by treating them as processes that can reinforce each other.
Improving Toponym Disambiguation by Iteratively Enhancing Certainty of Extraction
Mena Badieh Habib, Maurice van Keulen
Named entity extraction (NEE) and disambiguation (NED) have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. This paper addresses two problems with toponym extraction and disambiguation (as a representative example of named entities). First, almost no existing work examines the interdependency of extraction and disambiguation. Second, existing disambiguation techniques mostly take extracted named entities as input without considering the uncertainty and imperfection of the extraction process.
It is the aim of this paper to investigate both avenues and to show that explicit handling of the uncertainty of annotation has much potential for making both extraction and disambiguation more robust. We conducted experiments with a set of holiday home descriptions with the aim of extracting and disambiguating toponyms. We show that the extraction confidence probabilities are useful in enhancing the effectiveness of disambiguation. Reciprocally, retraining the extraction models with information automatically derived from the disambiguation results improves extraction. This mutual reinforcement is shown to have an effect even after several automatic iterations.
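The reinforcement loop can be sketched in a deliberately simplified form (the confidences, gazetteer, and update rule below are invented for illustration; the paper works with real extraction models and disambiguation, not this toy stand-in):

```python
GAZETTEER = {"Paris", "Springfield"}  # toy disambiguation resource

def disambiguate(mention):
    """Toy disambiguation: succeeds iff the mention is in the gazetteer."""
    return mention in GAZETTEER

def reinforce(candidates, rounds=3, threshold=0.5):
    """Iteratively adjust extraction confidence using disambiguation."""
    conf = dict(candidates)
    for _ in range(rounds):
        for m in list(conf):
            # Feedback: raise confidence when disambiguation succeeds,
            # lower it when it fails.
            delta = 0.2 if disambiguate(m) else -0.2
            conf[m] = min(1.0, conf[m] + delta)
        # "Retraining" stand-in: drop candidates below the threshold
        conf = {m: c for m, c in conf.items() if c >= threshold}
    return conf

# toponym candidates with invented initial extraction confidences
print(reinforce({"Paris": 0.9, "Hilton": 0.4, "Springfield": 0.6}))
```

Each pass disambiguation reshapes the extraction confidences, and the filtered candidate set in turn changes what disambiguation sees, mirroring the mutual reinforcement the abstract describes.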
The paper will be presented at the KDIR conference, 4-7 October 2012, Barcelona, Spain [details]

Friday, December 02nd, 2011 | Author:

One of my PhD students, Mena Badieh Habib, has given a talk on the Dutch-Belgian DataBase Day (DBDBD) about “Named Entity Extraction and Disambiguation from an Uncertainty Perspective”.

Monday, September 05th, 2011 | Author:

I wrote a position paper about a different approach towards the development of information extractors, which I call Sherlock Holmes-style after his famous quote “when you have eliminated the impossible, whatever remains, however improbable, must be the truth”. The idea is that we fundamentally treat annotations as uncertain. We even start from a “no knowledge”, i.e., “everything is possible”, starting point and then interactively add more knowledge, applying it directly to the annotation state by removing annotations that are no longer possible and recalculating the probabilities of the remaining ones. For example, “Paris Hilton”, “Paris”, and “Hilton” can each be interpreted as a City, Hotel, or Person name. But adding knowledge like “if a phrase is interpreted as a Person name, then its subphrases should not be interpreted as a City” makes the annotations <"Paris Hilton":Person name> and <"Paris":City> mutually exclusive. Observe that initially all annotations were independent, whereas these two are now dependent. We argue in the paper that the main challenge in this approach lies in the efficient storage and conditioning of probabilistic dependencies, because trivial approaches do not work.
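A minimal way to make this concrete (a toy sketch with invented prior probabilities, nothing like the efficient storage and conditioning machinery the paper is actually about) is to enumerate annotation combinations as possible worlds, eliminate the worlds that violate the rule, and renormalize:

```python
from itertools import product

# Candidate annotations with invented prior probabilities of being correct
annotations = [("Paris Hilton", "Person"), ("Paris", "City")]
prior = {("Paris Hilton", "Person"): 0.6, ("Paris", "City"): 0.5}

def worlds():
    """All keep/drop combinations of the annotations, with probabilities."""
    for choices in product([True, False], repeat=len(annotations)):
        kept = tuple(a for keep, a in zip(choices, annotations) if keep)
        p = 1.0
        for keep, a in zip(choices, annotations):
            p *= prior[a] if keep else 1 - prior[a]
        yield kept, p

def rule(kept):
    """A Person-name phrase forbids interpreting its subphrase as a City."""
    return not (("Paris Hilton", "Person") in kept and ("Paris", "City") in kept)

# Eliminate the impossible, then renormalize what remains
possible = [(kept, p) for kept, p in worlds() if rule(kept)]
total = sum(p for _, p in possible)
possible = [(kept, p / total) for kept, p in possible]
```

After elimination the two annotations are indeed dependent: the probability mass of the world containing both (0.3 here) is redistributed over the three remaining worlds. The number of worlds is exponential in the number of annotations, which hints at why trivial approaches to storing these dependencies do not scale.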
Handling Uncertainty in Information Extraction.
Maurice van Keulen, Mena Badieh Habib
This position paper proposes an interactive approach for developing information extractors based on the ontology definition process with knowledge about possible (in)correctness of annotations. We discuss the problem of managing and manipulating probabilistic dependencies.

The paper will be presented at the URSW workshop co-located with ISWC 2011, 23 October 2011, Bonn, Germany [details]