Clarisse Kagoyire today defended her MSc thesis at the International Institute for Geo-Information Science and Earth Observation (ITC) in Enschede. She explored the application of XML database technology for distributed spatial data processing using web services. The idea is that XML database technology, if equipped with spatial support, can work directly on the exchanged GML data, so you stay in the XML domain. Compared to typical relational SDI*-based solutions, this can avoid development and run-time overhead. Clarisse implemented a soil erosion scenario involving five independent XML database servers, using MonetDB/XQuery as the XML database platform. MonetDB/XQuery's support for XRPC proved important and powerful for realizing the distributed query processing. Furthermore, Clarisse used and helped specify the recently implemented XQuery spatial functionality. The research shows that XML database technology is suitable for implementing web services and that the preliminary, unoptimized spatial support in MonetDB/XQuery is already sufficient for certain distributed spatial data processing tasks.
(*) SDI = Spatial Data Infrastructure
Last week (11–13 February) we held a Pathfinder project meeting. Twenty people attended from six institutes (Uni.Tübingen, CWI, Uni.Twente, NFI Den Haag, Uni.Konstanz, ETH Zürich). We discussed scalability, multiple front- and back-ends, full-text support, porting to MonetDB5, etc. Regarding the latter, we decided to invest in porting Pathfinder to MonetDB5, starting in April. There were also a number of presentations. I presented the first attempts at adding spatial support to Pathfinder, and Riham presented our ROX run-time join optimization approach for XQuery.
The project description for Joost Diepenmaat's MSc project has been finalized. The project is being supervised by Jan Flokstra and me.
Object XML Mapping
Database Management Systems (DBMSs) play an important role in application development. Almost every information system stores its information in a DBMS. Increasingly, application domains require specific and detailed storage structures for complex objects. These domains range from document standards we use every day (e.g. OpenXML for word processing) to specific business process documents (e.g. the Universal Business Language for purchase orders, invoices, etc.). Because these standards are complex but useful, there is a stronger need for agile adoption techniques within development frameworks.
It would be easier to access XML data using mechanisms that fit the application development language better. Object Relational Mappings/Domain Models currently support this for the relational SQL world: they manage interconnected objects, where each object represents something meaningful within the application domain. A Domain Model in an application contains a whole layer of objects that model the business you're working in. These objects mimic the data in the business and/or capture the business rules.
The thesis studies the possibilities of an Object XML Mapping with XML/XPath as a base language for persistent storage. We explore and describe the fundamentals of Object XML Mappings. Various database back-ends (with or without built-in XML support) will be studied. A Ruby-based prototype implementation, with support for a database with built-in XML support (XPath Accelerator) or a native XML database, will show the capabilities of the Object XML Mapping.
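To make the idea of an Object XML Mapping concrete, here is a minimal sketch. It is not the thesis prototype: that one is Ruby-based, whereas this illustration uses Python and an in-memory document from the standard library instead of an XML database. The names `XmlField`, `XmlModel` and `Invoice`, and the sample invoice document, are hypothetical.

```python
import xml.etree.ElementTree as ET


class XmlField:
    """Descriptor that maps an object attribute to the text of a child element.

    `path` is a relative path in the limited XPath subset that ElementTree
    understands. Hypothetical helper, for illustration only.
    """

    def __init__(self, path, cast=str):
        self.path = path
        self.cast = cast

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        node = obj.element.find(self.path)
        if node is None or node.text is None:
            return None
        return self.cast(node.text)

    def __set__(self, obj, value):
        node = obj.element.find(self.path)
        if node is None:
            # Creating a missing node only works for single-step paths.
            node = ET.SubElement(obj.element, self.path)
        node.text = str(value)


class XmlModel:
    """Base class: each instance wraps one XML element."""

    def __init__(self, element):
        self.element = element

    @classmethod
    def select(cls, root, xpath):
        # Wrap every element matched by the path expression in a domain object.
        return [cls(e) for e in root.findall(xpath)]


class Invoice(XmlModel):
    number = XmlField("number")
    total = XmlField("total", float)


DOC = """<invoices>
  <invoice><number>A-1</number><total>12.50</total></invoice>
  <invoice><number>A-2</number><total>99.00</total></invoice>
</invoices>"""

root = ET.fromstring(DOC)
invoices = Invoice.select(root, "./invoice")
invoices[0].total = 15.0  # written straight back into the XML tree
```

The application code works with `Invoice` objects and their attributes, while the XML tree stays the single source of truth. In a database-backed mapping, `select` would send the path expression to the back-end rather than evaluate it in memory.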
MSc student Luuk Peters received the "green light" for his draft thesis "Battle of the bulk – Corporate XMLDB vs. Research XMLDB". The defence is expected to take place in January 2009. This MSc research was conducted at FINAN, a company that specializes in software for financial analysis for risk management. Luuk performed extensive experiments with Oracle to find out how to apply its XML support for scalable querying of XBRL documents (XBRL is an XML standard for financial reporting). Moreover, Luuk compared the performance and scalability of Oracle with those of MonetDB/XQuery. MonetDB/XQuery clearly beats Oracle on raw performance, but when the collection grows beyond 80,000 documents (1.7GB), MonetDB/XQuery starts to show stability problems, while Oracle continues to produce answers, albeit slowly.
Today I gave a demonstration of the "EPrints Clickable Views" prototype during the CWI meeting (Committee on Scientific Information). EPrints is the faculty's publication management system. The prototype allows end-users to define views on the publication database and see the results of their changes in real time, in WYSIWYG fashion. The prototype is based on the MonetDB/XQuery XML database and demonstrates its power and scalability for applications like these. The committee decided that it is definitely worthwhile to pursue this effort further. A project will be started to develop a production version in 2009.
MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state of the art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive comparison of all major XMark results published previously, which confirms that the goal of purely relational XQuery processing, namely speed and scalability, was met.
The paper was presented at the ACM International Conference on Management of Data (SIGMOD 2006), 26-29 June 2006, Chicago, IL, USA.
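As an illustration of point (i) in the abstract, the sketch below shows one way a range-based encoding can turn an XML tree into relational-style rows. It assumes a pre/size/level scheme (preorder rank, number of descendants, depth); the abstract itself only says "range-based", so the exact columns are an assumption. The function names and the tiny sample document are hypothetical, and only element nodes are encoded, with text and attributes left out for brevity.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Return one (pre, size, level, tag) row per element, in document order."""
    rows = []

    def visit(node, level):
        pre = len(rows)    # preorder rank = position in document order
        rows.append(None)  # reserve the slot; size is known only after the subtree
        for child in node:
            visit(child, level + 1)
        size = len(rows) - pre - 1  # number of descendants
        rows[pre] = (pre, size, level, node.tag)

    visit(root, 0)
    return rows


def descendants(rows, pre):
    """Descendant axis as a range selection: pre < pre' <= pre + size."""
    size = rows[pre][1]
    return rows[pre + 1 : pre + size + 1]


def children(rows, pre):
    """Child axis: descendants that sit exactly one level deeper."""
    level = rows[pre][2]
    return [r for r in descendants(rows, pre) if r[2] == level + 1]


DOC = "<site><people><person><name/></person><person/></people><regions/></site>"
rows = encode(ET.fromstring(DOC))
```

Because every subtree occupies a contiguous range of `pre` values, an XPath axis step becomes a range predicate over a table. That property is what allows a relational engine to evaluate such steps with ordinary selections and joins.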
