Tag-Archive for » data science «

Thursday, September 07th, 2017 | Author:

Loved to talk about “Data Quality” at the Second Data Science MeetUp Twente on Responsible Data Analytics. Thanks to Susan Kamies of MoneyBird and Sven van Munster of Green Orange for organizing the event.
[Slides on Slideshare] [2nd Data Science MeetUp Twente]

Thursday, January 14th, 2016 | Author:

Today I gave a presentation at the Data Science Northeast Netherlands Meetup about
Managing uncertainty in data: the key to effective management of data quality problems [slides (PDF)]

Business analytics and data science are significantly impaired by a wide variety of ‘data handling’ issues, especially when data from different sources are combined and when unstructured data is involved. The root cause of many such problems centers around data semantics and data quality. We have developed a generic method which is based on modeling such problems as uncertainty *in* the data. A recently conceived new kind of DBMS can store, manage, and query large volumes of uncertain data: the UDBMS or “Uncertain Database”. Together, they allow one to, e.g., postpone the resolution of data problems, assess what their influence is on analytical results, etc. We furthermore develop technology for data cleansing, web harvesting, and natural language processing which uses this method to deal with ambiguity of natural language and many other problems encountered when using unstructured data.