• Monday, May 13th, 2013

My PhD student Mena Badieh Habib, Zhemin Zhu (another PhD student in our group), and I participated in the “Making Sense of Microposts” challenge at the WWW 2013 conference … and we won the best IE award!
[paper | presentation | poster]

• Wednesday, May 08th, 2013

Following New Scientist, WebWereld now also features an article about my identity extraction work with Fox-IT: “Politiesoftware filtert slim identiteiten uit digibewijs” (in Dutch; roughly, “Police software smartly filters identities from digital evidence”).

• Friday, May 03rd, 2013

The popular science magazine New Scientist features a short article on one of my “Crime Science” endeavors with Hans Henseler and Jop Hofsté from the company Fox-IT: Fast digital forensics sniff out accomplices (it also appeared in Mafia Today). It is based on Jop Hofsté’s MSc project work, which will be demonstrated at ICAIL 2013.

• Tuesday, March 19th, 2013

I’ve been invited to join the Executive Board of EDBT, the organization behind the EDBT conference and the EDBT summer school series.

Category: Organization
• Tuesday, March 12th, 2013

The University of Twente is currently completely redesigning all its bachelor studies. I am the coordinator of the first module for the study “Technische Informatica”. Today, the new module structure was announced publicly for the first time, including ‘my’ module “Inleiding Informatica” (“Introduction to Computer Science”). By the way, we are thinking about a new name for the module … to be continued. [Announcement (Dutch)]

Category: Inleiding Technische Informatica
• Thursday, February 28th, 2013

The University of Twente is currently completely redesigning all its bachelor studies. As coordinator of the design of the first module for the study “Technische Informatica”, I presented the design in a feedback meeting today. [Presentation]

Category: Inleiding Technische Informatica
• Wednesday, February 27th, 2013

Tomorrow, 28 Feb 2013, one of my PhD students, Victor de Graaff, will give a presentation on how to estimate the boundaries of objects for which you have only a point location, using other public data such as OpenStreetMap [Announcement].
Point of Interest to Region of Interest Conversion
Date/Time: Thursday, February 28, 2013 – 13:30 to 14:30; Room: 0-142
GPS trajectories from a mobile device, such as a smartphone, indirectly contain a vast amount of information on the interests of the owner of the device. Collections of GPS trajectories even provide insight into the popularity of locations and the time spent at those locations. To obtain this information, the visited places on such a trajectory need to be recognized. However, the location information on a point of interest (POI) in a database is normally limited to an address and a GPS coordinate, rather than a geometry describing its boundaries. To create a match with a GPS trajectory, a two-dimensional shape representing this place, a region of interest (ROI), is needed. In the absence of expensive and hard-to-obtain detailed spatial data such as cadastral data, we need to estimate this ROI. In this research project, we bridge this gap by presenting several approaches to estimate the size and shape of the ROI, and we validate these estimations against the cadastral data of the city of Enschede, The Netherlands.
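To make the POI-to-ROI idea concrete, here is a minimal sketch of one possible heuristic. It is not one of the approaches from the talk (the abstract does not describe them). It assumes the ROI is a circle around the POI whose radius is half the distance to the nearest other POI, capped at a maximum. The function names, the cap of 100 m, and the coordinates are all made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def estimate_roi_radius_m(poi, other_pois, max_radius_m=100.0):
    """Hypothetical heuristic: half the distance to the nearest other POI, capped."""
    if not other_pois:
        return max_radius_m
    nearest = min(haversine_m(poi, other) for other in other_pois)
    return min(nearest / 2, max_radius_m)


def points_inside_roi(trajectory, poi, radius_m):
    """Return the trajectory points that fall inside the circular ROI."""
    return [pt for pt in trajectory if haversine_m(pt, poi) <= radius_m]


# Illustrative coordinates near Enschede (assumed, not taken from the talk).
poi = (52.2215, 6.8937)
neighbours = [(52.2215, 6.8947)]
trajectory = [(52.2215, 6.8939), (52.2215, 6.8960)]

radius = estimate_roi_radius_m(poi, neighbours)
inside = points_inside_roi(trajectory, poi, radius)
```

A circle is the crudest possible shape. The point of the sketch is only that a trajectory can be matched against a POI once some two-dimensional region has been estimated for it.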

• Thursday, December 20th, 2012

On 20 December 2012, Jasper Stoop defended his MSc thesis on process mining for fraud detection in the procurement process. The MSc project was carried out at KPMG.
“Process Mining and Fraud Detection: A case study on the theoretical and practical value of using process mining for the detection of fraudulent behavior in the procurement process”[download]
This thesis presents the results of a six-month research period on process mining and fraud detection. This thesis aimed to answer the research question as to how process mining can be utilized in fraud detection and what the benefits of using process mining for fraud detection are. Based on a literature study, it provides a discussion of the theory and application of process mining and its various aspects and techniques. Using both a literature study and an interview with a domain expert, the concepts of fraud and fraud detection are discussed. These results are combined with an analysis of existing case studies on the application of process mining and fraud detection to construct an initial setup of two case studies, in which process mining is applied to detect possible fraudulent behavior in the procurement process. Based on the experiences and results of these case studies, the 1+5+1 methodology is presented as a first step towards operationalizing principles with advice on how process mining techniques can be used in practice when trying to detect fraud. This thesis presents three conclusions: (1) process mining is a valuable addition to fraud detection, (2) using the 1+5+1 concept it was possible to detect indicators of possibly fraudulent behavior, and (3) the practical use of process mining for fraud detection is diminished by the poor performance of the current tools. The techniques and tools that do not suffer from performance issues are an addition to, rather than a replacement for, regular data analysis techniques, providing either new, quicker, or more easily obtainable insights into the process and possible fraudulent behavior.
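For readers unfamiliar with process mining, the following is a minimal sketch of the kind of analysis involved. It is not the 1+5+1 methodology, which the abstract does not detail. It assumes a tiny, made-up procurement event log and computes a directly-follows relation (a basic building block of process discovery) plus two simple, hypothetical fraud indicators. All names and data are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, user), already ordered by time.
EVENT_LOG = [
    ("PO-1", "create order", "alice"),
    ("PO-1", "approve order", "bob"),
    ("PO-1", "pay invoice", "carol"),
    ("PO-2", "create order", "dave"),
    ("PO-2", "approve order", "dave"),
    ("PO-2", "pay invoice", "carol"),
    ("PO-3", "create order", "alice"),
    ("PO-3", "pay invoice", "carol"),
]


def traces(log):
    """Group events per case, preserving their order in the log."""
    by_case = defaultdict(list)
    for case_id, activity, user in log:
        by_case[case_id].append((activity, user))
    return dict(by_case)


def directly_follows(log):
    """Count how often activity b directly follows activity a within a case."""
    counts = Counter()
    for events in traces(log).values():
        for (a, _), (b, _) in zip(events, events[1:]):
            counts[(a, b)] += 1
    return counts


def suspicious_cases(log):
    """Flag cases that match two assumed indicators of possible fraud."""
    flagged = {}
    for case_id, events in traces(log).items():
        users = {activity: user for activity, user in events}
        reasons = []
        if "pay invoice" in users and "approve order" not in users:
            reasons.append("paid without approval")
        if ("approve order" in users
                and users.get("create order") == users["approve order"]):
            reasons.append("creator approved own order")
        if reasons:
            flagged[case_id] = reasons
    return flagged


dfg = directly_follows(EVENT_LOG)
flagged = suspicious_cases(EVENT_LOG)
```

Such indicators only point at cases worth a closer look. Whether a flagged case is actually fraudulent still requires investigation.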

• Friday, December 07th, 2012

On 7 December 2012, Paul Stapersma defended his MSc thesis “Efficient Query Evaluation on Probabilistic XML Data”. The MSc project was supervised by me, Maarten Fokkinga and Jan Flokstra. The thesis is the result of more than two years of cooperation between Paul and me to build a probabilistic XML database system on top of a relational one: MayBMS.
“Efficient Query Evaluation on Probabilistic XML Data”[download]
In many application scenarios, reliability and accuracy of data are of great importance. Data is often uncertain or inconsistent because the exact state of represented real-world objects is unknown. A number of uncertain data models have emerged to cope with imperfect data in order to guarantee a level of reliability and accuracy. These models include probabilistic XML (P-XML), an uncertain semi-structured data model, and U-Rel, an uncertain table-structured data model. U-Rel is used by MayBMS, an uncertain relational database management system (URDBMS) that provides scalable query evaluation. In contrast to U-Rel, no efficient query evaluation mechanism exists for P-XML.
In this thesis, we approach this problem by instructing MayBMS to cope with P-XML in order to evaluate XPath queries on P-XML data as SQL queries on uncertain relational data. This approach entails two aspects: (1) a data mapping from P-XML to U-Rel that ensures that the same information is represented by database instances of both data structures, and (2) a query mapping from XPath to SQL that ensures that the same question is specified in both query languages.
We present a specification of a P-XML to U-Rel data mapping and a corresponding XPath to SQL mapping. Additionally, we present two designs of this specification. The first design constructs a data mapping in such a way that the corresponding query mapping is a traditional XPath to SQL mapping. The second design differs from the first in the sense that a component of the data mapping is evaluated as part of the query evaluation process. This offers the advantage that the data mapping is more efficient. Additionally, the second design allows for a number of optimizations that affect the performance of the query evaluation process. However, this process is burdened with the extra task of evaluating the data mapping component.
An extensive experimental evaluation on synthetically generated data sets and real-world data sets shows that our implementation of the second design is more efficient in most scenarios. Not only is the P-XML data mapping executed more efficiently, the query evaluation performance is also improved in most scenarios.
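To illustrate what a data mapping and a query mapping mean here, the sketch below shows the classic idea for ordinary (certain) XML only. It leaves out probabilities entirely and does not use MayBMS or U-Rel, so it is not the thesis's design. It assumes a simple edge-table encoding in SQLite and supports only child-axis paths such as `/a/b/c`. The table layout and function names are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Data mapping: store every element as a row (id, parent, tag, text)."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "tag TEXT, text TEXT)"
    )
    next_id = 0

    def walk(elem, parent):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)


def xpath_to_sql(path):
    """Query mapping: translate a child-axis-only path into one SQL self-join.

    Returns (sql, params); tag names are passed as parameters.
    """
    steps = [s for s in path.split("/") if s]
    if not steps:
        raise ValueError("empty path")
    tables = ", ".join(f"node n{i}" for i in range(len(steps)))
    conds = ["n0.parent IS NULL"]  # the first step must match the root
    conds += [f"n{i}.parent = n{i - 1}.id" for i in range(1, len(steps))]
    conds += [f"n{i}.tag = ?" for i in range(len(steps))]
    last = len(steps) - 1
    sql = (f"SELECT n{last}.text FROM {tables} "
           f"WHERE {' AND '.join(conds)} ORDER BY n{last}.id")
    return sql, steps


conn = sqlite3.connect(":memory:")
shred(conn, "<persons><person><name>Ann</name></person>"
            "<person><name>Bob</name></person></persons>")
sql, params = xpath_to_sql("/persons/person/name")
names = [row[0] for row in conn.execute(sql, params)]
```

In the probabilistic setting, each row would additionally carry the conditions under which it exists, which is where an uncertain relational system comes in.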

• Wednesday, November 21st, 2012

Two of my PhD students, Mohammad Khelgati and Victor de Graaff, are presenting at the Dutch-Belgian DataBase Day (DBDBD): Mohammad on “Size Estimation of Non-Cooperative Data Collections” and Victor on “Semantic Enrichment of GPS Trajectories”.

Category: COMMIT, Paper abstracts, Research