Archive for the Category » Opaque «

Thursday, November 25th, 2010 | Author:

On November 25th, Riham Abdel Kader defended her thesis on her ROX-approach for run-time optimization of XQueries. Her work and thesis were well-received by the PhD committee. The ROX-approach brings more robustness to query optimizers in finding near-optimal execution plans and it can exploit intricate correlations in the data. Albeit meant for XML databases, the approach can be applied to ordinary relational databases as well RDF stores. Riham recently accepted a position at ASML. I am very proud of her and her work.
“ROX: Run-Time Optimization of XQueries”[download, OPAQUE project]
Query optimization is the most important and complex phase of answering a user query. While sufficient for some applications, the widely used type of relational optimizers are not always robust, picking execution plans that are far from optimal. This is due to several reasons. First, they depend on statistics and a cost model which are often inaccurate, and sometimes even absent. Second, they fail to detect correlations which can unexpectedly make certain plans considerably cheaper than others. Finally, they cannot efficiently handle the large search space of big queries.
The challenges faced by traditional relational optimizers and their impact on the quality of the chosen plans are aggravated in the context of XML and XQueries. This is due to the fact that in XML, it is harder to collect and maintain representative statistics since they have to capture more information about the document. Moreover, the search space of plans for an XQuery query is on average larger than that of relational queries, due to the higher number of joins resulting from the existence of many XPath steps in a typical XQuery.
To overcome the above challenges, we propose ROX, a Run-time Optimizer for XQueries. ROX is autonomous, i.e. it does not depend on any statistics and cost models, robust in always finding a good execution plan while detecting and benefiting from correlations, and efficient in exploring the search space of plans. We show, through experiments, that ROX is indeed robust and efficient, and performs better than relational compile-time optimizers. ROX adopts a fundamentally different internal design which moves the optimization to run-time, and interleaves it with query execution. The search space is efficiently explored by alternating optimization and execution phases, defining the plan incrementally. Every execution step executes a set of operators and materializes the results, allowing the next optimization phase to benefit from the knowledge extracted from the newly materialized intermediates. Sampling techniques are used to accurately estimate the cardinality and cost of operators. To detect correlations, we introduce the chain sampling technique, the first generic and robust method to deal with any type of correlated data. We also extend the ROX idea to pipelined architectures to allow most of the existing database systems to benefit from our research.

Friday, April 02nd, 2010 | Author:

We designed a variant of our ROX approach for run-time query optimization that works on database systems with a pipelined architecture (i.e., almost all commercial relational databases).
Run-time Optimization for Pipelined Systems
Riham Abdel Kader, Maurice van Keulen, Peter Boncz, Stefan Manegold
Traditional optimizers fail to pick good execution plans, when faced with increasingly complex queries and large data sets. This failure is even more acute in the context of XQuery, due to the structured nature of the XML language. To overcome the vulnerabilities of traditional optimizers, we have previously proposed ROX, a Run-time Optimizer for XQueries, which interleaves optimization and execution of full tables. ROX has proved to be robust, even in the presence of strong correlations, but it has one limitation: it uses full materialization of intermediate results making it unsuitable for pipelined systems. Therefore, this paper proposes ROX-sampled, a variant of ROX, which executes small data samples, thus generating smaller intermediates. We conduct extensive experiments which proved that ROX-sampled is comparable to ROX in performance, and that it is still robust against correlations. The main benefit of ROX-sampled is that it allows the large number of pipelined databases to import the ROX idea into their optimization paradigm.

The paper will be presented at the IV Alberto Mendelzon Workshop on Foundations of Data Management (AMW2010), 17-20 May 2010, Buenos Aires, Argentina [details]

Friday, September 18th, 2009 | Author:

We developed a demonstration that shows and explains what happens behind the scenes of our ROX approach for run-time query optimization.
The Robustness of a Run-Time XQuery Optimizer against Correlated Data
Riham Abdel Kader, Peter Boncz, Stefan Manegold, Maurice van Keulen
We demonstrate ROX, a run-time optimizer of XQueries, that focuses on finding the best execution order of XPath steps and relational joins in an XQuery. The problem of join ordering has been extensively researched, but the proposed techniques are still unsatisfying. These either rely on a cost model which might result in inaccurate estimations, or explore only a restrictive number of plans from the search space. ROX is developed to tackle these problems. ROX does not need any cost model, and defers query optimization to run-time intertwining optimization and execution steps. In every optimization step, sampling techniques are used to estimate the cardinality of unexecuted steps and joins to make a decision which sequence of operators to process next. Consequently, each execution step will provide updated and accurate knowledge about intermediate results, which will be used during the next optimization round. This demonstration will focus on: (i) illustrating the steps that ROX follows and the decisions it makes to choose a good join order, (ii) showing ROX’s robustness in the face of data with different degree of correlation, (iii) comparing the performance of the plan chosen by ROX to different plans picked from the search space, (iv) proving that the run-time overhead needed by ROX is restricted to a small fraction of the execution time.

The paper will be presented at the 26th International Conference on Data Engineering (ICDE2010), 1-6 Mar 2010, Long Beach, California, USA [details]

Wednesday, March 11th, 2009 | Author:

ROX: Run-time Optimization of XQueries
Riham Abdel Kader (UT), Peter Boncz (CWI), Stefan Manegold (CWI), Maurice van Keulen (UT)
Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates.

In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices.

While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called “zero-investment” algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property.

We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.

The paper will be presented at the ACM International Conference on Management of Data (SIGMOD 2009), 29 June – 2 July 2009, Providence, Rhode Island, USA. [details]

Category: Opaque, XML databases  | Tags: , , , , , ,  | Comments off
Tuesday, February 17th, 2009 | Author:

Last week (11 – 13 February) we held a Pathfinder project meeting. 20 people attended from 6 institutes (Uni.Tübingen, CWI, Uni.Twente, NFI Den Haag, Uni.Konstanz, ETH Zürich). We discussed about scalability, multiple front- and backends, full-text support, porting to MonetDB5, etc. Regarding the latter, we decided to indeed invest in porting Pathfinder to MonetDB5 starting in April. There were also a number of presentations. I presented the first attempts in adding spatial support to Pathfinder and Riham presented our ROX run-time join optimization approach for XQuery.