Together with my PhD student Mena Badieh Habib and another PhD student of our group Zhemin Zhu, we participated in the “Making Sense of Microposts” challenge at the WWW 2013 conference … and we won the best IE award!
[paper | presentation | poster]
Tag-Archive for ◊ twitter ◊
Thales managed to find partial funding for the TEC4SE-project (pronounce “tecforce”). The project is about developing an innovative control room for police and fire fighters. I’m going to develop a Twitter analysis component for this together with company ENAI.
One of my PhD students, Mena Badieh Habib, got a paper accepted at the Semantic Web and Information Extraction (SWAIE) workshop at the EKAW conference on improving NEE in twitter.
Unsupervised Improvement of Named Entity Extraction in Short Informal Context Using Disambiguation Clues
Mena Badieh Habib, Maurice van Keulen
Short context messages (like tweets and SMS’s) are a potentially rich source of continuously and instantly updated information. Shortness and informality of such messages are challenges for Natural Language Processing tasks. Most efforts done in this direction rely on machine learning techniques which are expensive in terms of data collection and training.
In this paper we present an unsupervised Semantic Web-driven approach to improve the extraction process by using clues from the disambiguation process. For extraction we used a simple Knowledge-Base matching technique combined with a clustering-based approach for disambiguation. Experimental results on a self-collected set of tweets (as an example of short context messages) show improvement in extraction results when using unsupervised feedback from the disambiguation process.
The paper will be presented at the EKAW workshop co-located with SWAIE 2012, 8-12 October 2012, Galway City, Ireland [details]
On Wednesday 14 July 2010, Tom Palsma defended his MSc thesis entitled “Discovering groups using short messages from social network profiles”. The MSc project was carried out at Topicus FinCare. It was supervised by me, Dolf Trieschnigg (UT), Jasper Laagland (FinCare), and Wouter de Jong (FinCare).
Discovering groups using short messages from social network profiles[download]
In the past few years people used the internet more and more for sharing their photos, thoughts and activities via photo albums, user profiles, blogs and short text messages on online social networking sites, like Facebook and MySpace. This information could be very useful for (personal) marketing and advertising. Besides groups formed by the users of the social network sites explicitly, people could have a relation based on similar interests that they implicitly leave in short text messages.
This research has a focus on discovering groups, taking into account semantic relations between user profiles and describing the characteristics of the groups and relations. Because it is hard to discover these types of relations at word level by matching similar words in messages, we introduce a hierarchical structure of concepts obtained from the Wikipedia category system to discover groups and (semantic) relations at more abstract conceptual levels between profiles with short text messages obtained from Twitter.
In order to provide a general approach that is not limited to a specific set of concepts we use a naive classification approach. Concepts (and their parent concepts) are assigned to profiles when concept (related) terms occur in a short message of the profile. Manual evaluation of this approach shows 37.4% of the assignments is correct (the precision), which results in an F-score of 0.54. To improve the precision of the classification results we use Support Vector Machines. Using features related to characteristics of the concept structure and the relations between concepts and profiles improves the classification results with 14% according to the F-score (0.68).
The grouping process consists of clustering of statistical data of concept occurrences in user profiles. Interesting groups discovered based on the clustering results are groups of concepts that are not grouped together in the original Wikipedia category structure. Besides these types of groups the results also show groups of concepts that have a semantic relation, which is not reflected in the Wikipedia category structure. This information could be used to improve the Wikipedia category structure.
The overall process shows that the usage of hierarchical concepts and clustering helps to discover groups based on semantic relations on abstract conceptual levels. The selection of concepts and assigning them to user profiles could guide the grouping results to desired domains and the concepts help to describe the groups. However, due to problems with ambiguous meaning of concepts and characteristics of the messages, another approach of assigning concepts could improve the quality of the discovered groups. To know how useful the groups are for marketing and advertising requires more research.
