Tuesday, February 2nd, 2016

The project proposal “Time To Care: Using sensor technology to dynamically model social interactions of healthcare professionals at work in relation to healthcare quality” has been accepted into our university’s Tech4People program. The project is a cooperation with Educational Sciences (chair OWK) and Psychology of Conflict, Risk and Safety (chair PCRS), with whom the funded PhD student will be shared.

What I am particularly enthusiastic about in this project is that it is not only an interdisciplinary cooperation towards a shared goal, but that each of the participating disciplines can also answer its own research questions. For me, it is a unique opportunity to test whether probabilistic modeling of the data quality problems and noise in the social interaction data obtained from the sensors indeed provides significantly different results when predicting team performance.
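
A toy sketch of that research question, with an invented threshold model and a made-up noise rate (nothing here comes from the actual project), comparing a single “cleaned” point estimate against propagating the sensor noise as a distribution:

```python
# Hypothetical example: a badge sensor reports 30 face-to-face
# interactions, but each reported contact is only real with p = 0.8.
import random

random.seed(42)

reported_contacts = 30
p_real = 0.8

def predict_performance(n_interactions):
    # Invented nonlinear model: teams cross a quality threshold
    # once they exceed 24 interactions.
    return 1.0 if n_interactions > 24 else 0.0

# Deterministic approach: plug in the single corrected point estimate.
point_prediction = predict_performance(reported_contacts * p_real)

# Probabilistic approach: sample plausible true counts and propagate
# them through the model, yielding a distribution over predictions.
samples = [
    predict_performance(sum(random.random() < p_real
                            for _ in range(reported_contacts)))
    for _ in range(10_000)
]
mean_prediction = sum(samples) / len(samples)

print(f"point estimate      : {point_prediction:.2f}")
print(f"distribution-based  : {mean_prediction:.2f}")
```

With a nonlinear model like this toy threshold, the point estimate (0.0) and the distribution-based answer (roughly 0.4) genuinely diverge; with a purely linear model the averages would coincide and only the uncertainty bands would differ.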

Thursday, January 28th, 2016

My PhD student Mohammad Khelgathi released his web harvesting software, called HarvestED.

Friday, January 22nd, 2016

I’ve been interviewed by NRCQ about natural language processing, in particular about computers learning to understand the more subtle aspects of language use, such as sarcasm:
Súperhandig hoor, computers die sarcasme kunnen herkennen (Super handy, computers that can recognize sarcasm; in Dutch)

Thursday, January 14th, 2016

Today I gave a presentation at the Data Science Northeast Netherlands Meetup about
Managing uncertainty in data: the key to effective management of data quality problems [slides (PDF)]

Business analytics and data science are significantly impaired by a wide variety of ‘data handling’ issues, especially when data from different sources are combined and when unstructured data is involved. The root cause of many such problems lies in data semantics and data quality. We have developed a generic method based on modeling such problems as uncertainty *in* the data. A recently conceived new kind of DBMS can store, manage, and query large volumes of uncertain data: the UDBMS or “Uncertain Database”. Together, they allow one to, for example, postpone the resolution of data problems and assess their influence on analytical results. We furthermore develop technology for data cleansing, web harvesting, and natural language processing which uses this method to deal with the ambiguity of natural language and many other problems encountered when using unstructured data.
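
To make the idea concrete, here is a minimal sketch, assuming a simple tuple-level model in which each record carries mutually exclusive alternatives with probabilities and records are independent of one another; this illustrates the concept only and is not the actual UDBMS interface:

```python
# Two harvested person records with an unresolved data quality problem:
# instead of forcing a choice, both readings of r1 are kept, weighted.
people = {
    "r1": [({"name": "J. Smith", "city": "Enschede"}, 0.7),
           ({"name": "J. Smith", "city": "Hengelo"},  0.3)],
    "r2": [({"name": "A. Jones", "city": "Enschede"}, 1.0)],
}

def match_probability(alternatives, predicate):
    """Probability that a record satisfies the predicate, i.e. the
    total probability mass of its matching alternatives."""
    return sum(p for value, p in alternatives if predicate(value))

# Query: who lives in Enschede, and how certain is each answer?
for rid, alternatives in people.items():
    p = match_probability(alternatives, lambda t: t["city"] == "Enschede")
    if p > 0:
        print(rid, f"P = {p:.2f}")
```

The point is that the Enschede/Hengelo conflict in r1 never has to be resolved up front: the query runs anyway, and the unresolved data problem surfaces in the answer as a probability rather than blocking the analysis.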

Wednesday, October 14th, 2015

Dolf Trieschnigg and I received a subsidy to valorize some of the research results of the COMMIT/ TimeTrails, PayDIBI, and FedSS projects. The company involved is Mydatafactory.
SmartCOPI: Smart Consolidation of Product Information
[download public version of project proposal]
Maintaining the quality of detailed product data, ranging from data about required raw materials to detailed specifications of tools and spare parts, is of vital importance in many industries. Ordering or using wrong spare parts (based on wrong or incomplete information) may result in significant production loss or even impact health and safety. The web provides a wealth of product information in various formats, at various levels of detail, targeted at a variety of audiences. Semi-automatically locating, extracting, and consolidating this information would be a “killer app” for enriching and improving product data quality, with a significant impact on production cost and quality. Mydatafactory, an industry partner new to COMMIT/, is interested in the web harvesting and data cleansing technologies developed in the COMMIT/ projects P1/Infiniti and P19/TimeTrails, both for this potential and for improving its own data cleansing services. The ICT science questions behind data cleansing and web harvesting are how noise can be detected and reduced in discrete structured data, and how human cognitive skills in information navigation and extraction can be mimicked. Research results on these questions may benefit a wide range of applications in various domains, such as fraud detection and forensics, creating a common operational picture, and safety in food and pharmaceuticals.
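
As a rough illustration of what consolidation of harvested product data could look like (source names, reliability weights, and attribute values below are all invented):

```python
# Merge product attributes harvested from several web sources; where
# sources disagree, keep all candidate values with source-reliability
# weights instead of silently picking one.
from collections import defaultdict

# (source, reliability weight, extracted attributes) for one spare part
harvested = [
    ("manufacturer-site", 0.9, {"thread": "M8",  "material": "steel"}),
    ("distributor-a",     0.6, {"thread": "M8",  "material": "stainless steel"}),
    ("forum-post",        0.2, {"thread": "M10", "material": "steel"}),
]

def consolidate(records):
    """Per attribute, return a normalized confidence per candidate value."""
    scores = defaultdict(lambda: defaultdict(float))
    for _source, weight, attrs in records:
        for attr, value in attrs.items():
            scores[attr][value] += weight
    return {
        attr: {value: score / sum(candidates.values())
               for value, score in candidates.items()}
        for attr, candidates in scores.items()
    }

for attr, candidates in consolidate(harvested).items():
    print(attr, {v: round(c, 2) for v, c in candidates.items()})
```

Keeping the conflicting candidates visible, rather than overwriting them, is what allows a low-confidence value (the kind that leads to ordering the wrong spare part) to be flagged before it does damage.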

Tuesday, March 3rd, 2015

I have been nominated for the “decentrale onderwijsprijs”, a teaching award issued by the computer science student association Inter-Actief. Part of the election process is that each nominee gives a short (10-15 min) mini-lecture. Mine was about “Onzekere databases” (Uncertain databases; in Dutch). The winner will be announced on March 9th, 2015.

Wednesday, February 25th, 2015

Today I gave a presentation at the SIKS Smart Auditing workshop at the University of Tilburg.

Thursday, May 15th, 2014

Tweakers.net, NU.nl and Kennislink.nl picked up the UT homepage news item on the research of my PhD student Mena Badieh Habib on Named Entity Extraction and Named Entity Disambiguation.
Tweakers.net: UT laat politiecomputers tweets ‘begrijpen’ voor veiligheid bij evenementen (UT makes police computers ‘understand’ tweets for safety at events; in Dutch)
NU.nl: Universiteit Twente laat computers beter begrijpend lezen (University of Twente teaches computers to read with better comprehension; in Dutch)
Kennislink.nl: Twentse computer leest beter (Twente computer reads better; in Dutch)

Wednesday, May 14th, 2014

The news feed of the UT homepage features an item on the research of my PhD student Mena Badieh Habib.
Computers leren beter begrijpend lezen dankzij UT-onderzoek (Computers learn to read with better comprehension thanks to UT research; in Dutch).
Mena defended his PhD thesis, entitled “Named Entity Extraction and Disambiguation for Informal Text – The Missing Link”, on May 9th.

Friday, May 9th, 2014

Today, a PhD student of mine, Mena Badieh Habib Morgan, defended his thesis.
Named Entity Extraction and Disambiguation for Informal Text – The Missing Link
Social media content represents a large portion of all textual content appearing on the Internet. These streams of user generated content (UGC) provide an opportunity and a challenge for media analysts to analyze huge amounts of new data and use them to infer and reason with new information. A main challenge of natural language is its ambiguity and vagueness. When we move to the informal language widely used in social media, the language becomes even more ambiguous and thus more challenging for automatic understanding. Named Entity Extraction (NEE) is a subtask of Information Extraction (IE) that aims to locate phrases (mentions) in the text that represent names of entities such as persons, organizations, or locations, regardless of their type. Named Entity Disambiguation (NED) is the task of determining which specific person, place, event, etc. is referred to by a mention. The main goal of this thesis is to mimic the human way of recognizing and disambiguating named entities, especially for domains that lack formal sentence structure. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The achieved robustness has been proven to be valid across languages and domains and to be independent of the selected extraction and disambiguation techniques. It is also shown to be robust against shortness in labeled training data and against the informality of the used language.
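
As a very rough illustration of the two tasks (a deliberately naive stand-in, not the techniques from the thesis):

```python
# NEE: find candidate mentions by capitalization (naive, but tolerant
# of informal text). NED: score knowledge-base candidates by overlap
# between the tweet's words and each candidate's typical context words.
import re

# Tiny invented knowledge base: mention -> candidate entities, each
# with a bag of context words that typically co-occur with it.
KB = {
    "paris": [
        ("Paris, France", {"france", "eiffel", "city", "seine"}),
        ("Paris Hilton",  {"hotel", "celebrity", "show"}),
    ],
}

def extract_mentions(text):
    """NEE: naive candidate extraction via capitalized tokens."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)

def disambiguate(mention, text):
    """NED: pick the candidate whose context best matches the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    candidates = KB.get(mention.lower(), [])
    if not candidates:
        return None
    return max(candidates, key=lambda cand: len(cand[1] & words))[0]

tweet = "just saw the eiffel tower, Paris is so beautiful"
for mention in extract_mentions(tweet):
    print(mention, "->", disambiguate(mention, tweet))
```

In this sketch, extraction and disambiguation run as a one-way pipeline; the thesis argues instead for a combined framework in which the two tasks inform each other, which is what yields the robustness on informal text.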