ROX: Run-time Optimization of XQueries
Riham Abdel Kader (UT), Peter Boncz (CWI), Stefan Manegold (CWI), Maurice van Keulen (UT)
Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates.
In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices.
While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called “zero-investment” algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property.
We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.
The paper will be presented at the ACM International Conference on Management of Data (SIGMOD 2009), 29 June – 2 July 2009, Providence, Rhode Island, USA. [details]