Following New Scientist, WebWereld also features an article about my identity extraction work with Fox-IT: “Politiesoftware filtert slim identiteiten uit digibewijs” (Dutch; “Police software smartly filters identities from digital evidence”).
The popular science magazine New Scientist features a short article on one of my “Crime Science” endeavors with Hans Henseler and Jop Hofsté from the company Fox-IT: Fast digital forensics sniff out accomplices (also published in Mafia Today). It is based on Jop Hofsté’s MSc project, which will be demonstrated at ICAIL 2013.
The University of Twente is currently redesigning all of its bachelor programmes from scratch. I am the coordinator of the first module of the “Technische Informatica” (Technical Computer Science) programme. Today, the new module structure was announced publicly for the first time, including ‘my’ module “Inleiding Informatica” (Introduction to Computer Science). By the way, we are thinking about a new name for the module … to be continued. [Announcement (Dutch)]
The University of Twente is currently redesigning all of its bachelor programmes from scratch. As coordinator of the design of the first module of the “Technische Informatica” (Technical Computer Science) programme, I presented the design today at a feedback meeting. [Presentation]
On 20 December 2012, Jasper Stoop defended his MSc thesis on process mining for fraud detection in the procurement process. The MSc project was carried out at KPMG.
“Process Mining and Fraud Detection: A case study on the theoretical and practical value of using process mining for the detection of fraudulent behavior in the procurement process” [download]
This thesis presents the results of a six-month research project on process mining and fraud detection. It aims to answer the research question of how process mining can be used in fraud detection and what the benefits of doing so are. Based on a literature study, it discusses the theory and application of process mining and its various aspects and techniques. Using both a literature study and an interview with a domain expert, the concepts of fraud and fraud detection are discussed. These results are combined with an analysis of existing case studies on the application of process mining and fraud detection to construct the initial setup of two case studies, in which process mining is applied to detect possibly fraudulent behavior in the procurement process. Based on the experiences and results of these case studies, the 1+5+1 methodology is presented as a first step towards operationalizing principles, with advice on how process mining techniques can be used in practice when trying to detect fraud. The thesis draws three conclusions: (1) process mining is a valuable addition to fraud detection; (2) using the 1+5+1 concept, it was possible to detect indicators of possibly fraudulent behavior; and (3) the practical use of process mining for fraud detection is limited by the poor performance of the current tools. The techniques and tools that do not suffer from performance issues complement, rather than replace, regular data analysis techniques by providing new, quicker, or more easily obtainable insights into the process and possible fraudulent behavior.
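The abstract does not spell out the 1+5+1 methodology, so the following is not it. It is a minimal sketch of the kind of indicator such an analysis can surface, assuming a hypothetical procurement event log of (case, activity, resource, timestamp) records. The activity names and the two rules (payment before goods receipt, and one person both creating and approving a purchase order) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical event log: (case_id, activity, resource, timestamp).
# Activity names and rules below are assumptions for illustration only.
EVENT_LOG = [
    ("PO1", "create_po", "alice", 1),
    ("PO1", "approve_po", "bob", 2),
    ("PO1", "receive_goods", "carol", 3),
    ("PO1", "pay_invoice", "dave", 4),
    ("PO2", "create_po", "erin", 1),
    ("PO2", "approve_po", "erin", 2),     # same person creates and approves
    ("PO2", "pay_invoice", "frank", 3),   # paid before goods were received
    ("PO2", "receive_goods", "carol", 4),
]


def traces(log):
    """Group events per case and order them by timestamp."""
    by_case = defaultdict(list)
    for case, activity, resource, ts in log:
        by_case[case].append((ts, activity, resource))
    return {case: sorted(events) for case, events in by_case.items()}


def fraud_indicators(log):
    """Return, per case, a list of red flags raised by two illustrative rules."""
    flags = defaultdict(list)
    for case, events in traces(log).items():
        order = [activity for _, activity, _ in events]
        who = {activity: resource for _, activity, resource in events}
        # Rule 1: an invoice should only be paid after the goods were received.
        if "pay_invoice" in order and (
            "receive_goods" not in order
            or order.index("pay_invoice") < order.index("receive_goods")
        ):
            flags[case].append("payment before goods receipt")
        # Rule 2: segregation of duties between creating and approving a PO.
        creator = who.get("create_po")
        if creator is not None and creator == who.get("approve_po"):
            flags[case].append("creator approved own purchase order")
    return dict(flags)


print(fraud_indicators(EVENT_LOG))
```

On this toy log only PO2 is flagged, with both indicators. Process mining tools typically discover the process model from the log rather than relying on hard-coded rules like these; the fixed rules just keep the sketch short.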
On 7 December 2012, Paul Stapersma defended his MSc thesis “Efficient Query Evaluation on Probabilistic XML Data”. The MSc project was supervised by Maarten Fokkinga, Jan Flokstra, and me. The thesis is the result of more than two years of cooperation between Paul and me to build a probabilistic XML database system on top of a relational one: MayBMS.
“Efficient Query Evaluation on Probabilistic XML Data” [download]
In many application scenarios, the reliability and accuracy of data are of great importance. Data is often uncertain or inconsistent because the exact state of the real-world objects it represents is unknown. A number of uncertain data models have emerged to cope with imperfect data in order to guarantee a level of reliability and accuracy. These models include probabilistic XML (P-XML), an uncertain semi-structured data model, and U-Rel, an uncertain table-structured data model. U-Rel is used by MayBMS, an uncertain relational database management system (URDBMS) that provides scalable query evaluation. In contrast to U-Rel, there is no efficient query evaluation mechanism for P-XML.
In this thesis, we approach this problem by instructing MayBMS to cope with P-XML, so that XPath queries on P-XML data can be evaluated as SQL queries on uncertain relational data. This approach entails two aspects: (1) a data mapping from P-XML to U-Rel that ensures that database instances of both data structures represent the same information, and (2) a query mapping from XPath to SQL that ensures that the same question is specified in both query languages.
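To make the data mapping in (1) more concrete, here is a minimal sketch of one way to map a P-XML tree onto U-Rel-style tables. It assumes a hypothetical encoding of P-XML (ordinary elements plus probability nodes whose alternatives are mutually exclusive) and introduces one independent random variable per probability node. Every element row carries a world-set descriptor (WSD): the conjunction of variable assignments on its path. This illustrates the idea only; it is not the mapping specified in the thesis, nor MayBMS's actual storage format.

```python
from itertools import count
from math import prod

# Hypothetical P-XML encoding (an assumption, not the thesis's format):
#   ("elem", tag, children)          an ordinary XML element
#   ("prob", [(p, children), ...])   a probability node; its alternatives
#                                    (possibility nodes) are mutually exclusive
DOC = ("elem", "persons", [
    ("elem", "person", [
        ("prob", [
            (0.7, [("elem", "name", [])]),
            (0.3, [("elem", "alias", [])]),
        ]),
    ]),
])


def to_urel(doc):
    """Map a P-XML tree to U-Rel-style tables.

    Returns (nodes, world): nodes holds (id, parent, tag, wsd) rows, where wsd
    is a tuple of (variable, value) pairs; world holds (variable, value,
    probability) rows.
    """
    nodes, world = [], []
    ids, var_ids = count(1), count(1)

    def visit(item, parent, wsd):
        if item[0] == "elem":
            _, tag, children = item
            node_id = next(ids)
            nodes.append((node_id, parent, tag, wsd))
            for child in children:
                visit(child, node_id, wsd)
        else:  # "prob": one fresh variable, one value per possibility
            var = f"x{next(var_ids)}"
            for value, (p, children) in enumerate(item[1], start=1):
                world.append((var, value, p))
                for child in children:
                    # Possibility nodes are not elements: children attach to
                    # the enclosing element and inherit the extra condition.
                    visit(child, parent, wsd + ((var, value),))

    visit(doc, None, ())
    return nodes, world


def confidence(wsd, world):
    """Probability of one WSD; its variables are distinct and independent."""
    p = {(var, val): prob for var, val, prob in world}
    return prod(p[pair] for pair in wsd)


nodes, world = to_urel(DOC)
for node_id, parent, tag, wsd in nodes:
    print(node_id, parent, tag, wsd, confidence(wsd, world))
```

Because each probability node gets its own variable, the variables within one WSD are distinct, so its probability is simply the product of the assignment probabilities. The confidence of a query answer supported by several WSDs would instead be the probability of their disjunction, which is the kind of computation an uncertain database such as MayBMS takes care of.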
We present a specification of a P-XML to U-Rel data mapping and a corresponding XPath to SQL mapping. Additionally, we present two designs of this specification. The first design constructs the data mapping in such a way that the corresponding query mapping is a traditional XPath to SQL mapping. The second design differs from the first in that a component of the data mapping is evaluated as part of the query evaluation process. This has the advantage that the data mapping is more efficient. Moreover, the second design allows for a number of optimizations that can improve the performance of the query evaluation process. However, this process is burdened with the extra task of evaluating the data mapping component.
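A “traditional XPath to SQL mapping”, as used by the first design, can be illustrated on certain (non-probabilistic) data. The sketch below assumes a hypothetical edge table node(id, parent, tag), translates an absolute child-axis path into a chain of self-joins, and runs it with Python's built-in sqlite3 module. It ignores WSDs and supports no predicates, `//`, or wildcards; it is not the query mapping from the thesis.

```python
import sqlite3


def xpath_to_sql(path):
    """Translate an absolute child-axis path such as /a/b/c into SQL.

    Only '/'-separated element names are supported; each location step
    becomes one self-join on the hypothetical node(id, parent, tag) table.
    """
    steps = [s for s in path.split("/") if s]
    if not path.startswith("/") or not steps:
        raise ValueError("expected an absolute path like /a/b")
    joins = ["node n1"]
    conditions = ["n1.parent IS NULL", "n1.tag = ?"]  # first step: the root
    params = [steps[0]]
    for i, tag in enumerate(steps[1:], start=2):
        joins.append(f"JOIN node n{i} ON n{i}.parent = n{i - 1}.id")
        conditions.append(f"n{i}.tag = ?")
        params.append(tag)
    last = len(steps)
    sql = (f"SELECT n{last}.id FROM " + " ".join(joins)
           + " WHERE " + " AND ".join(conditions) + f" ORDER BY n{last}.id")
    return sql, params


# A tiny document stored as an edge table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, None, "persons"), (2, 1, "person"), (3, 2, "name"),
    (4, 1, "person"), (5, 4, "name"), (6, 4, "email"),
])

sql, params = xpath_to_sql("/persons/person/name")
print(sql)
print([row[0] for row in conn.execute(sql, params)])
```

Each location step adds one join on the parent column. In the uncertain setting, the WSDs of the joined rows would additionally have to be combined, which is where the two designs in the thesis differ.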
An extensive experimental evaluation on synthetically generated and real-world data sets shows that our implementation of the second design is more efficient in most scenarios: not only is the P-XML data mapping executed more efficiently, but query evaluation performance also improves in most scenarios.
The University of Twente is completely reorganizing all of its bachelor programmes. We are going to adopt a system of one module per quarter, with more activating forms of teaching. I’ve been asked to coordinate the design and realization of the first 15 EC module of the Technical Computer Science (Technische Informatica) programme. The team I’ve assembled consists of Arend Rensink, Pieter-Tjerk de Boer, Pascal van Eck, and Jan Kamphuis.
An MSc student of mine, Jasper Kuperus, was nominated for the ENIAC thesis award for his thesis “Catching criminals by chance”, on named entity extraction in digital forensics. Unfortunately, he didn’t win.
On 1 November 2012, Jop Hofsté defended his MSc thesis “Scalable identity extraction and ranking in Tracks Inspector”. The MSc project was carried out at Fox-IT.
“Scalable identity extraction and ranking in Tracks Inspector” [download]
The digital forensic world deals with a growing amount of data that has to be processed. In general, investigators do not have the time to analyze all the digital evidence manually to get a good picture of the suspect. Most investigations involve multiple evidence units per case. This research addresses the extraction and resolution of identities from evidence data. Investigators are supported in their investigations by being presented with the identities involved. These identities are extracted from multiple heterogeneous sources such as system accounts, emails, documents, address books, and communication items. Identity resolution is used to merge identities at case level when multiple evidence units are involved.
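Merging identities at case level can be sketched with a union-find structure. The merge rule used here (identities that share an email address, compared case-insensitively, belong together) and the sample identities are assumptions for illustration; the abstract does not describe the rules Tracks Inspector actually applies.

```python
from collections import defaultdict

# Hypothetical identities extracted from different evidence units:
# (identity_id, evidence_unit, display_name, email addresses)
IDENTITIES = [
    ("i1", "laptop", "J. Jansen", {"j.jansen@example.com"}),
    ("i2", "phone", "Jan", {"J.Jansen@example.com", "jan@example.org"}),
    ("i3", "usb-stick", "Jan J.", {"jan@example.org"}),
    ("i4", "laptop", "P. de Vries", {"pdv@example.net"}),
]


class UnionFind:
    """Disjoint sets with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve(identities):
    """Group identities that share a (case-insensitive) email address."""
    uf = UnionFind()
    first_owner = {}  # normalized address -> first identity seen with it
    for ident, _unit, _name, emails in identities:
        uf.find(ident)  # register identities that share nothing, too
        for email in emails:
            key = email.lower()
            if key in first_owner:
                uf.union(ident, first_owner[key])
            else:
                first_owner[key] = ident
    groups = defaultdict(list)
    for ident, *_ in identities:
        groups[uf.find(ident)].append(ident)
    return sorted(sorted(group) for group in groups.values())


print(resolve(IDENTITIES))
```

On this sample, i1, i2, and i3 end up in one group (i3 only transitively, through the address it shares with i2), while i4 stays on its own. Union-find makes such transitive merges cheap as more evidence units are added to a case.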
The functionality for extracting, resolving, and ranking identities is implemented in the forensic tool Tracks Inspector and tested on five datasets. The results are compared with those of two other forensic products, Clearwell and Trident, with respect to the extent to which they support this identity functionality. Tracks Inspector delivers very promising results compared to these products: among its top 10 identities, it extracts at least as many of the relevant identities as Clearwell and Trident do. The tests also show that Tracks Inspector achieves high accuracy: compared to Clearwell, it has better precision and approximately equal recall.
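The comparison in terms of relevant identities in the top 10, precision, and recall can be made concrete with the usual top-k definitions. The sketch below assumes a ranked list of identity IDs and a manually labelled set of relevant identities; the toy values are made up and are not results from the thesis.

```python
def precision_recall_at_k(ranked, relevant, k=10):
    """Precision and recall of the top-k entries of a ranked list.

    Precision is computed over the identities actually returned (at most k),
    recall over all identities labelled as relevant.
    """
    top = ranked[:k]
    hits = sum(1 for ident in top if ident in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (made up): a tool's ranked identities and the labelled relevant ones.
ranked = ["alice", "bob", "mallory", "carol", "trent"]
relevant = {"alice", "bob", "carol", "dave"}

print(precision_recall_at_k(ranked, relevant, k=3))
print(precision_recall_at_k(ranked, relevant))
```

With k=3, two of the three top-ranked identities are relevant (precision 2/3) and two of the four relevant identities are found (recall 1/2).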
The contribution of this research is a method for the extraction and ranking of identities in Tracks Inspector. In the digital forensic world this is a fairly new approach, because no other software products support this kind of functionality. Investigations can now start by exploring the most relevant identities in a case, and the nodes involved in an identity can be recognized quickly. This means that the evidence data can be filtered at an early stage.
On 9 August 2012, Rudo Denneman defended his MSc thesis on a requirements analysis for business intelligence on the CRM processes of municipalities. The MSc project was carried out at Exxellence.
“Management information requirements for customer relationship management in municipalities” [download]
This research project investigates the management information requirements of municipalities in the Netherlands related to their customer relationship programme. Information requirements engineering methodologies for data warehouses are reviewed, and a method is selected based on its perceived suitability for the municipal context. The chosen methodology, by Winter and Strauch, matches information requirements elicitation with an analysis of the data sources to obtain an overview of the requirements and whether they are attainable. The results are a list of management information requirements, a list of representation requirements, and advice to Exxellence Group on how it can meet this demand.
The resulting list of management information requirements seems to indicate that the management of client contact centres would like to see more management information than is currently prescribed by the Antwoord© concept, on which they have largely based their management information needs. The list was sent back to the municipalities to allow them to comment on the information needs and rate them on their usefulness. In addition, the COPC standard, on which the Antwoord© indicators are based, and the Antwoord© indicators themselves were compared to the results. The results seem to cover almost all of the COPC metrics, except for several process areas that are less relevant in the municipal context. Potentially interesting additions to the results, drawn from the COPC standard, were also identified. The indicators from the Antwoord© concept score relatively high in the ranking of information needs and are a solid basis for measurements.
Overall, the information needs voiced by municipalities are at an operational level: measuring the performance of departments and individual employees over time. To satisfy these needs, Exxellence Group will have to combine data from several back-office source systems with information from other sources, such as customer satisfaction surveys. These sources will have to be identified per municipality because of the large variety of back-office systems used by different municipalities. A data warehouse schema should be created that matches the information needs; the sources of information used to fill the data warehouse can then be identified per municipality.
In addition, municipalities will have to assess their processes and the training level of their personnel to see whether they are able to correctly capture all the information required to satisfy the information needs.
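The recommended data warehouse schema that matches the information needs could, for instance, take the form of a star schema. The sketch below is hypothetical: the tables, columns, and the first-contact-resolution indicator are illustrative assumptions and are not taken from the thesis, the Antwoord© concept, or COPC. It uses Python's sqlite3 module so that it runs without a database server.

```python
import sqlite3

# Hypothetical star schema for client contact centre management information.
# Table and column names are illustrative assumptions, not from the thesis.
SCHEMA = """
CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_channel (channel_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_contact (
    contact_id INTEGER PRIMARY KEY,
    contact_date TEXT,              -- ISO date; a date dimension is left out
    department_key INTEGER REFERENCES dim_department(department_key),
    channel_key INTEGER REFERENCES dim_channel(channel_key),
    handling_seconds INTEGER,
    resolved_first_contact INTEGER  -- 1 if answered without referral
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_department VALUES (?, ?)",
                 [(1, "Permits"), (2, "Taxes")])
conn.executemany("INSERT INTO dim_channel VALUES (?, ?)",
                 [(1, "phone"), (2, "counter")])
conn.executemany(
    "INSERT INTO fact_contact VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2012-06-01", 1, 1, 120, 1),
        (2, "2012-06-01", 1, 2, 300, 0),
        (3, "2012-06-02", 2, 1, 60, 1),
    ],
)

# Example indicators per department: contact volume, average handling time,
# and the share of contacts resolved at first contact.
REPORT = """
SELECT d.name, COUNT(*), AVG(f.handling_seconds), AVG(f.resolved_first_contact)
FROM fact_contact f JOIN dim_department d USING (department_key)
GROUP BY d.name
ORDER BY d.name
"""

for row in conn.execute(REPORT):
    print(row)
```

Keeping the measures in one fact table and the descriptive attributes in dimension tables means that the back-office sources, which differ per municipality, only have to be mapped onto the same few tables.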
