
Friday, January 13th, 2012 | Author:

On 18 January 2012, Sjoerd van der Spoel defended his MSc thesis “Outcome and variable prediction for discrete processes: A framework for finding answers to business questions using (process) data”. The MSc project was supervised by me and Chintan Amrit.
“Outcome and variable prediction for discrete processes: A framework for finding answers to business questions using (process) data”[download]
The research described in this paper is aimed at solving planning problems associated with a new hospital declaration methodology called DOT. With this methodology, which becomes mandatory on January 1st, 2012, hospitals will no longer be able to tell in advance how much they will receive for the care they provide. A related problem is that hospitals do not know when delivered care becomes declarable. Topicus Fincare wants to find a solution to both of these problems.
This research aims to solve these problems and, more generally, the problem of answering business questions that involve predicting process outcomes and variables. The approach chosen is to model the business process as a graph, to predict the path through that graph, and to use the path to predict the variables of interest. For the hospital, the nodes in the graph represent care activities, and the variables to predict are the care product, which determines the value of the provided care, and the duration of care.
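To make the graph model concrete, here is a minimal sketch of how a process graph could be built from recorded activity sequences. It is an illustration only and not the thesis's actual elicitation technique: the function name `elicit_graph`, the activity names, and the example traces are all invented, and the sketch assumes that an edge simply means "one activity was directly followed by another".

```python
from collections import defaultdict


def elicit_graph(traces):
    """Count how often each activity is directly followed by another.

    Returns a dict mapping (source, target) activity pairs to the number
    of times that transition occurs in the traces.
    """
    edges = defaultdict(int)
    for trace in traces:
        # Pair every activity with its direct successor in the same trace.
        for src, dst in zip(trace, trace[1:]):
            edges[(src, dst)] += 1
    return dict(edges)


# Hypothetical care-activity traces, invented for illustration only.
traces = [
    ["intake", "scan", "surgery", "checkup"],
    ["intake", "scan", "checkup"],
    ["intake", "surgery", "checkup"],
]

graph = elicit_graph(traces)
for (src, dst), count in sorted(graph.items()):
    print(f"{src} -> {dst}: seen {count} time(s)")
```

The transition counts could then serve as a basis for edge weights, for example by giving frequently observed transitions a lower cost. That choice is also an assumption of this sketch.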
A literature study has found data mining and shortest path algorithms in combination with a naive graph elicitation technique to be the best way of accomplishing these two goals. Specifically, Random Forests was found to be the most accurate technique for predicting path-variable relations and for predicting the final step of a process. The Floyd-Warshall shortest path algorithm was found to be the best technique for predicting the path between two nodes in the process graph.
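As an illustration of the path prediction step, the following is a minimal sketch of the Floyd-Warshall algorithm with path reconstruction on a small process graph. The node names and edge weights are invented for the example, and how the thesis derives its edge weights is not stated here, so treat the weights as placeholders.

```python
import math


def floyd_warshall(nodes, weights):
    """All-pairs shortest paths; returns distance and next-hop tables."""
    dist = {u: {v: math.inf for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = 0.0
        nxt[u][u] = u
    for (u, v), w in weights.items():
        if w < dist[u][v]:  # keep the cheapest edge, ignore costly self-loops
            dist[u][v] = w
            nxt[u][v] = v
    # Allow each node k in turn to act as an intermediate stop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def reconstruct_path(nxt, start, end):
    """Follow the next-hop table from start to end; [] if unreachable."""
    if nxt[start][end] is None:
        return []
    path = [start]
    while start != end:
        start = nxt[start][end]
        path.append(start)
    return path


# Hypothetical process graph; the weights are illustrative only.
nodes = ["intake", "scan", "surgery", "checkup"]
weights = {
    ("intake", "scan"): 1.0,
    ("scan", "surgery"): 1.0,
    ("surgery", "checkup"): 1.0,
    ("scan", "checkup"): 3.0,
    ("intake", "surgery"): 2.5,
}

dist, nxt = floyd_warshall(nodes, weights)
predicted = reconstruct_path(nxt, "intake", "checkup")
print(predicted, dist["intake"]["checkup"])
```

Given a known start activity and a predicted final activity, the reconstructed path acts as the predicted sequence of intermediate steps. Floyd-Warshall computes all pairs at once, so the same tables can answer queries for any start and end node.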
To test these findings, a number of experiments were performed for the hospital case. These experiments show that Random Forests and the Floyd-Warshall algorithm are indeed the most accurate techniques in the test. Using Random Forests, the care product for a set of performed activities can be predicted with 50% accuracy on average, with lows of 30% and highs of 70%. Using Floyd-Warshall, the subsequent set of steps can be predicted with 45% accuracy on average, with lows of 25% and highs of 100%.
From the experiment with the hospital data, a set of processing steps for producing an answer to a business question was derived. The steps are transforming the business question, analyzing and transforming the data, and then, depending on the business question, either classifier training and variable prediction or process elicitation and path prediction. The final step is to analyze the result to see whether it adequately answers the question. That these processing steps actually work was validated using a dataset from Topicus’ bug tracking software. In conclusion, the approach presented predicts the total cash flow to be expected from the provided care with an average error between 6 and 17 percent. The time at which the provided care becomes declarable cannot be accurately predicted.