230 likes | 464 Views
Query Optimization. Allison Griffin. Importance of Optimization. Time is money Queries are faster Helps everyone who uses the server Solution to speed lies in the algorithm Different performance improvements with different database engines and schemas. Brief History.
E N D
Query Optimization Allison Griffin
Importance of Optimization • Time is money • Queries are faster • Helps everyone who uses the server • Solution to speed lies in the algorithm • Different performance improvements with different database engines and schemas
Brief History • Before 1970’s: Dark days, manual optimization • Late 70’s to mid 80’s: • Birth of relational data model and declarative SQL • Optimization is job of system • System R-beginning work on join order optimization • Dynamic Programming: Heuristic Optimizers • Mid 80’s to early 90’s: • Extensible query optimization (Exodus) • Mid 90’s to late 90’s: • Materialized Views
Volcano Extensible Query Optimizer Generator • General purpose cost based query optimizer, based on equivalence rules in algebra • Equivalences: join associativity, select push down, aggregate push down • Extensible: new operations and equivalences can be easily added • Developed by Graefe and McKenna 1993
Materialized Views • Can materialize (pre-compute and store) views to speed up queries • Incremental maintenance • when database is updated, propagate updates to materialized view without complete re-computation • Deciding when to use materialized views • even if query does not refer to materialized view, optimizer can figure out it can be used
Deciding What to Materialize • Maintenance cost and query cost • Workload depends on what is materialized: • queries and update transactions • weights for each component of workload • Goal: find set of views that gives minimum cost if materialized, subject to space constraints
What we already know… • Query optimizer analyzes set of query execution plans and gives optimal (least cost) • Heavily dependent on optimizer’s estimate for number of rows that will result at each step of QEP • Estimates rely on statistics typically stored in histograms
Recent Approaches to Improve Statistics • Paper “Distinct-Value Synopses for Multiset Operations” by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwalk, and Yannis Sismanis, 2007 • IBM’s LEO (Learning Empirical Results in Query Optimization), 2001
Summary of Paper Results • Addresses the problem of efficient estimate of number of distinct values of an attribute • Builds on leveraging of randomized algorithms • Claim to have unbiased estimator for distinct values with lower mean squared error • Past attempts tend to by higher than the actual number so they have come up with way to cut that number down to be more reasonable
Distinct-Value Estimation • Propose summary structure (synopsis) for a relation • Synopsis can be used to estimate number of DVs in the partition • Synopses can be combined to create synopses for compound partitions created from base partitions using multiset union, intersection or difference operations • Updates can be performed on compound partitions by using synopses from base relations
LEO - Learning Emperical Results in Query Optimization • Autonomic feedback loops that create a self-tuning database query optimizer • Self-validates and adjusts to improve query optimization and execution without requiring user interaction to repair incorrect statistics or cardinality estimates • Reduces the total cost of owning database management systems by simplifying database administration
How it works • Monitors queries as they execute • Compares the optimizer’s estimates with actuals at each step in a QEP • Then computes adjustments to its estimates that may be used during future optimizations of similar queries • Moreover, estimation errors can also trigger re-optimization of a query in mid-execution.
Challenges in Research of LEO • (1) ensuring stability and convergence of the autonomic system • (2) guaranteeing consistency of the overall optimizer's model upon refinements
Results • Reduction of query execution time by orders of magnitude at negligible additional run-time cost • Reduced administration time • Fewer problem queries • Overall improved query performance with increased robustness and predictability of query response times
Bibliography • “LEO-Learning Empirical Results in Query Optimization.” IBM. <http://domino.watson.ibm.com/comm/research.nsf/pages/r.datamgmt.innovation.html>. • “Optimizing for Query Speed”. SQL. <http://www.devshed.com/c/a/MySQL/Optimizing-for-Query-Speed/1/ • “Optimizing Database Queries”. IBM. <http://www.stevengould.org/portfolio/developerWorks/efficientPHP/wa-effphp/wa-effphp-4-1.html>. • “Optimize Queries Theory in Practice”. <http://www.serverwatch.com/tutorials/article.php/2175621/How-to-Optimize-Queries-Theory-an-Practice.htm>. • Beyer, Kevin, Gemulla, Rainer, Haas, Peter J., Reinwald, Berthold, Sismani, Yannis. “Distinct-Value Synopses for Multiset Operations”. Communications of the ACM. Vol. 52. October 2009. • Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization-Data Management Meets Statistical Estimation”. Communications of the ACM. Vol. 52. October 2009.