On Friday 30 January 2009, Luuk Peters defended his MSc thesis “Battle of the Bulk: Corporate XMLDB vs. Research XMLDB”. The MSc project was carried out at Finan. It was supervised by me, Riham Abdel Kader (UT), Michiel Schipper (Finan) and Joost Willemse (Finan).
“Battle of the Bulk: Corporate XMLDB vs. Research XMLDB”
Finan, a company offering solutions for financial analysis, uses Oracle’s XML support to store, query and analyze financial reports obtained in XBRL, an open XML-based standard for defining and exchanging business and financial performance information. The goal of the project was to improve the performance of querying a high volume of financial XML documents with changing schemas. A secondary goal was to compare Oracle’s performance on this task with that of MonetDB/XQuery as a representative of a successful XML DBMS from academia.
Luuk thoroughly investigated many strategies for improvement: Oracle’s storage strategies (CLOB, Binary XML, XMLType, XMLTable), some schema-less and some schema-based; other, non-standard XML document schemas; and variations in query formulation. He experimented with many possible combinations of these alternatives under varying conditions (database size, query complexity).
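To give an impression of how quickly such a set of experiments grows, here is a minimal sketch that enumerates a grid of configurations. It is only an illustration and not the setup used in the thesis: the storage strategies and schema modes are the ones named above, while the database sizes, the query complexity levels and all function and variable names are hypothetical placeholders. The sketch also ignores that some combinations may not be valid in practice.

```python
from itertools import product

# Storage strategies and schema modes as named in the post.
STORAGE = ["CLOB", "Binary XML", "XMLType", "XMLTable"]
SCHEMA_MODE = ["schema-less", "schema-based"]

# Hypothetical placeholder levels; the thesis may have used different ones.
DB_SIZE = ["small", "medium", "large"]
QUERY_COMPLEXITY = ["simple", "complex"]


def experiment_grid():
    """Return every combination of the four dimensions as a list of dicts."""
    keys = ("storage", "schema_mode", "db_size", "query_complexity")
    combos = product(STORAGE, SCHEMA_MODE, DB_SIZE, QUERY_COMPLEXITY)
    return [dict(zip(keys, combo)) for combo in combos]


grid = experiment_grid()
print(len(grid), "configurations to measure")
```

Even with these few placeholder levels, the number of configurations is the product of the sizes of all dimensions, before alternative document schemas and query formulations are taken into account.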
Since performance comparisons with Oracle are sensitive information, I cannot say anything specific about the outcomes. What I can say is that Oracle offers a wide variety of XML techniques, each with its own strengths and weaknesses. Much was learned about how these techniques work and how they affect execution performance. It proved hard to formulate simple and concrete advice regarding the best strategy, because so many factors play a role. Nevertheless, considerable improvement could be obtained by choosing the right strategies. Furthermore, MonetDB/XQuery also proved up to the task of financial analysis. We believe that industry and academia can learn from each other’s techniques. My hope is that this MSc project brought us a step closer to efficient and scalable general-purpose XML DBMS technology.
A public version of Luuk’s MSc thesis is not available yet, but I will of course write about it here as soon as it is.