Archive for the Category » XML databases «

Tuesday, April 07th, 2009 | Author:

On Friday 30 January 2009, Luuk Peters defended his MSc thesis “Battle of the Bulk: Corporate XMLDB vs. Research XMLDB”. The MSc project was carried out at Finan. It was supervised by me, Riham Abdel Kader (UT), Michiel Schipper (Finan) and Joost Willemse (Finan).

“Battle of the Bulk: Corporate XMLDB vs. Research XMLDB”
Finan, a company offering solutions for financial analysis, uses Oracle’s XML support to store, query and analyze financial reports obtained in XBRL, an open XML-based standard for defining and exchanging business and financial performance information. The goal of the project was to improve the performance of querying a high volume of financial XML documents with changing schemas. A secondary goal was to compare Oracle’s performance on this task with that of MonetDB/XQuery as a representative of a successful XML DBMS from academia.

Luuk thoroughly investigated many strategies for improvement: Oracle’s storage strategies (CLOB, Binary XML, XMLType, XMLTable), some schema-less and some schema-based; alternative, non-standard XML document schemas; and variations in query formulation. He experimented with many possible combinations of these alternatives under varying conditions (database size, query complexity).

Since performance comparisons with Oracle are sensitive information, I cannot say anything specific about the outcomes. What I can say is that Oracle offers a wide variety of XML techniques, each with its own strengths and weaknesses. Much was learned about how these techniques work and how they affect execution performance. It proved hard to formulate simple and concrete advice regarding the best strategy, because so many factors play a role. Nevertheless, considerable improvement could be obtained by choosing the right strategies. Furthermore, MonetDB/XQuery also proved up to the task of financial analysis. We believe that both industry and academia can learn from each other’s techniques. My hope is that this MSc project brought us a step closer to efficient and scalable general-purpose XML DBMS technology.

A public version of Luuk’s MSc thesis is not available yet, but I will of course write about it here as soon as it is.

Category: Student projects, XML databases
Wednesday, March 11th, 2009 | Author:

ROX: Run-time Optimization of XQueries
Riham Abdel Kader (UT), Peter Boncz (CWI), Stefan Manegold (CWI), Maurice van Keulen (UT)
Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates.

In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices.

While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called “zero-investment” algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property.
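The idea of choosing the next join by sampling, instead of trusting a cost model, can be illustrated with a small sketch. This is not the ROX algorithm itself: the function names, the hash index standing in for a structural or value index, and the toy data are all assumptions made for illustration. The point it shows is that probing an index with a sample of the already materialized intermediate costs time proportional to the sample size (plus the matches found), which is the spirit of the “zero-investment” property mentioned above.

```python
import random

def build_index(keys):
    """Hash index: join key -> list of positions (hypothetical stand-in for a real index)."""
    index = {}
    for pos, key in enumerate(keys):
        index.setdefault(key, []).append(pos)
    return index

def estimate_join_size(left, right_index, sample_size, rng):
    """Estimate the join result size by probing the index with a sample of left keys."""
    if not left:
        return 0.0
    sample = rng.sample(left, min(sample_size, len(left)))
    matches = sum(len(right_index.get(key, ())) for key in sample)
    # Scale the observed matches up to the full intermediate result.
    return matches * len(left) / len(sample)

def pick_next_edge(intermediate, candidates, sample_size=50, seed=0):
    """Return the candidate edge with the smallest estimated result, plus all estimates."""
    rng = random.Random(seed)
    estimates = {name: estimate_join_size(intermediate, index, sample_size, rng)
                 for name, index in candidates.items()}
    best = min(estimates, key=estimates.get)
    return best, estimates

# Toy data: one edge matches every key three times, the other matches only 10 keys.
intermediate = list(range(1000))
candidates = {
    "broad": build_index(k for k in range(1000) for _ in range(3)),
    "selective": build_index(range(0, 1000, 100)),
}
best, estimates = pick_next_edge(intermediate, candidates)
print(best, estimates)
```

In this toy setting the selective edge is always chosen, because even a sample that happens to hit all ten matching keys yields a far smaller estimate than the broad edge.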

We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.

The paper will be presented at the ACM International Conference on Management of Data (SIGMOD 2009), 29 June – 2 July 2009, Providence, Rhode Island, USA. [details]

Category: Opaque, XML databases
Wednesday, February 25th, 2009 | Author:

Clarisse Kagoyire today defended her MSc thesis at the International Institute for Geo-Information Science and Earth Observation (ITC) in Enschede. She explored the application of XML database technology for distributed spatial data processing using web services. The idea is that XML database technology, if equipped with spatial support, can work directly on exchanged GML data and thus stay in the XML domain, avoiding the development and run-time overhead of typical relational SDI*-based solutions. Clarisse implemented a soil erosion scenario involving 5 independent XML database servers, using MonetDB/XQuery as the XML database platform. MonetDB/XQuery’s support for XRPC proved important and powerful for realizing the distributed query processing. Furthermore, Clarisse used and helped specify the recently implemented XQuery spatial functionality. The research shows that XML database technology is suitable for implementing web services and that the preliminary, unoptimized spatial support in MonetDB/XQuery is already sufficient for certain distributed spatial data processing tasks.
(*) SDI = Spatial Data Infrastructure

Tuesday, February 17th, 2009 | Author:

Last week (11 – 13 February) we held a Pathfinder project meeting. 20 people attended from 6 institutes (Uni.Tübingen, CWI, Uni.Twente, NFI Den Haag, Uni.Konstanz, ETH Zürich). We discussed scalability, multiple front- and backends, full-text support, porting to MonetDB5, etc. Regarding the last of these, we decided to invest in porting Pathfinder to MonetDB5, starting in April. There were also a number of presentations. I presented the first attempts at adding spatial support to Pathfinder and Riham presented our ROX run-time join optimization approach for XQuery.

Friday, December 05th, 2008 | Author:

The project description for Joost Diepenmaat’s MSc project has been finalized. The project is being supervised by me and Jan Flokstra.

Object XML Mapping
Database Management Systems (DBMSs) play an important role in application development. Almost every information system stores its information in a DBMS. Increasingly, application domains require specific and detailed storage structures for complex objects. These domains range from document standards we use every day (e.g. OpenXML for word processing) to specific business process documents (e.g. the Universal Business Language for purchase orders, invoices, etc.). Because these standards are complex but useful, there is a growing need for agile adoption techniques within development frameworks.
It would be easier to access XML data using mechanisms that fit the application development language better. Object Relational Mappings/Domain Models currently support this for the relational SQL world: they manage interconnected objects, where each object represents something meaningful within the application domain. A Domain Model in an application contains a whole layer of objects that model the business you’re working in. These objects mimic the data in the business and/or capture the business rules.

The thesis studies the possibilities of an Object XML Mapping with XML/XPath as a base language for persistent storage. We explore and describe the fundamentals of Object XML Mappings. Various database back-ends (with or without built-in XML support) will be studied. A Ruby-based prototype implementation, with support for a built-in XML database (XPath Accelerator) or a native database, will show the capabilities of the Object XML Mapping.
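To give a feel for what an Object XML Mapping could look like from the application side, here is a minimal sketch. It is written in Python purely for illustration (the thesis prototype is Ruby-based), it works on an in-memory document rather than a database back-end, and the class names `XPathField`, `XmlObject` and `Invoice` as well as the sample document are hypothetical. Each object attribute is bound to an XPath expression, so reading or writing the attribute reads or writes the underlying XML node.

```python
import xml.etree.ElementTree as ET

class XPathField:
    """Descriptor binding an attribute to the text of the first node matching a path."""
    def __init__(self, path):
        self.path = path

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        node = obj._elem.find(self.path)
        return None if node is None else node.text

    def __set__(self, obj, value):
        node = obj._elem.find(self.path)
        if node is None:
            raise KeyError(self.path)
        node.text = str(value)

class XmlObject:
    """Base class for domain objects backed by an XML element."""
    def __init__(self, elem):
        self._elem = elem

    def to_xml(self):
        return ET.tostring(self._elem, encoding="unicode")

class Invoice(XmlObject):
    # Hypothetical domain object; the paths assume the sample document below.
    number = XPathField("./id")
    customer = XPathField("./customer/name")

invoice = Invoice(ET.fromstring(
    "<invoice><id>42</id><customer><name>Acme</name></customer></invoice>"))
invoice.customer = "Example BV"
print(invoice.number, invoice.customer)
print(invoice.to_xml())
```

A real mapping layer would additionally have to translate such attribute accesses into queries and updates against the chosen back-end, which is the part the thesis sets out to study.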

Category: Student projects, XML databases
Thursday, December 04th, 2008 | Author:

MSc student Luuk Peters received the “green light” for his concept thesis “Battle of the bulk – Corporate XMLDB vs. Research XMLDB”. The defence is expected to take place in January 2009. This MSc research was conducted at FINAN, a company that specializes in software for financial analysis for risk management. Luuk performed extensive experiments with Oracle to find out how to apply its XML support for scalable querying of XBRL documents (XBRL is an XML standard for financial reporting). Moreover, Luuk compared the performance and scalability of Oracle with those of MonetDB/XQuery. MonetDB/XQuery clearly beats Oracle on raw performance, but when the collection grows beyond 80,000 documents (1.7GB), MonetDB/XQuery starts to show stability problems, while Oracle continues to produce answers, albeit slowly.

Category: Student projects, XML databases
Wednesday, December 03rd, 2008 | Author:

Today I gave a demonstration of the “EPrints Clickable Views” prototype during the CWI meeting (Committee on Scientific Information). EPrints is the faculty’s publication management system. The prototype allows end-users to define views on the publication database and see the results of their changes in real time, in a WYSIWYG fashion. The prototype is based on the MonetDB/XQuery XML database and demonstrates its power and scalability for applications like these. The committee decided that it is definitely worthwhile to pursue this effort further. A project will be started to develop a production version in 2009.

Category: XML databases
Monday, June 26th, 2006 | Author:

MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state of the art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive comparison of all major XMark results published previously, which confirms that the goal of purely relational XQuery processing, namely speed and scalability, was met.
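As an illustration of point (i), the sketch below encodes a small document into rows of (pre, size, level, tag), where pre is the document-order rank of an element and size is its number of descendants. This pre/size/level layout is one common form of range-based encoding; that it matches the paper's actual tables column for column is an assumption, and the function names are made up for this example. With such rows, the descendant axis becomes a plain range selection on pre, which is what makes a relational engine a natural fit.

```python
import xml.etree.ElementTree as ET

def encode(xml_text):
    """Return (pre, size, level, tag) rows, one per element, in document order."""
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)  # reserve the slot; size is known only after the subtree
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1  # number of descendants
        rows[pre] = (pre, size, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows

def descendants(rows, pre):
    """Descendant axis as a range selection: pre < v.pre <= pre + size."""
    size = rows[pre][1]
    return [r for r in rows if pre < r[0] <= pre + size]

def children(rows, pre):
    """Child axis: descendants that sit exactly one level deeper."""
    level = rows[pre][2]
    return [r for r in descendants(rows, pre) if r[2] == level + 1]

table = encode("<a><b><c/><d/></b><e><f/></e></a>")
for row in table:
    print(row)
print([r[3] for r in descendants(table, 1)])  # descendants of <b>
print([r[3] for r in children(table, 0)])     # children of <a>
```

The list comprehension scans all rows for clarity; because the rows are stored in pre order, the same selection can be answered by reading the contiguous slice from pre + 1 to pre + size.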

The paper was presented at the ACM International Conference on Management of Data (SIGMOD 2006), 26-29 June 2006, Chicago, IL, USA. [electronic version] [details]