Query Optimization


  1. Query Optimization Allison Griffin

  2. Importance of Optimization • Time is money • Queries are faster • Helps everyone who uses the server • Solution to speed lies in the algorithm • Different performance improvements with different database engines and schemas

  3. Brief History • Before the 1970s: dark days, manual optimization • Late 1970s to mid 1980s: • Birth of the relational data model and declarative SQL • Optimization becomes the job of the system • System R: early work on join-order optimization • Dynamic Programming: Heuristic Optimizers • Mid 1980s to early 1990s: • Extensible query optimization (Exodus) • Mid 1990s to late 1990s: • Materialized Views

  4. Volcano Extensible Query Optimizer Generator • General purpose cost based query optimizer, based on equivalence rules in algebra • Equivalences: join associativity, select push down, aggregate push down • Extensible: new operations and equivalences can be easily added • Developed by Graefe and McKenna 1993
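
To make the idea of an equivalence rule concrete, here is a minimal sketch of select push down on a toy plan tree. The classes Scan, Select and Join and the function push_select_down are hypothetical names invented for this illustration; they are not Volcano's interfaces. A cost-based optimizer would generate such alternatives and compare their costs, whereas this sketch only applies the one rewrite.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Select:
    column: str  # qualified as "table.column" (an assumption of this sketch)
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Select, Join]


def tables(plan):
    """Return the set of base tables that a plan reads."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Select):
        return tables(plan.child)
    return tables(plan.left) | tables(plan.right)


def push_select_down(plan):
    """Rewrite Select(Join(L, R)) so the selection sits on the side it filters."""
    if isinstance(plan, Select) and isinstance(plan.child, Join):
        table = plan.column.split(".")[0]
        join = plan.child
        if table in tables(join.left):
            return Join(push_select_down(Select(plan.column, join.left)), join.right)
        if table in tables(join.right):
            return Join(join.left, push_select_down(Select(plan.column, join.right)))
    return plan  # the rule does not apply; leave the plan unchanged


original = Select("orders.total", Join(Scan("customers"), Scan("orders")))
rewritten = push_select_down(original)
```

Both plans return the same rows, but the rewritten one filters orders before the join, so the join sees fewer rows.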

  5. Materialized Views • Can materialize (pre-compute and store) views to speed up queries • Incremental maintenance • when database is updated, propagate updates to materialized view without complete re-computation • Deciding when to use materialized views • even if query does not refer to materialized view, optimizer can figure out it can be used
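
Incremental maintenance can be sketched for a simple aggregate view. The example below assumes a hypothetical view equivalent to SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region; the class and method names are made up for illustration. Each insert or delete on the base table adjusts only the affected group instead of recomputing the whole view.

```python
from collections import defaultdict


class SalesByRegionView:
    """Materialized per-region totals, maintained incrementally (illustrative only)."""

    def __init__(self, rows):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)
        for region, amount in rows:  # one full computation when the view is created
            self.apply_insert(region, amount)

    def apply_insert(self, region, amount):
        # Propagate a base-table insert: touch only one group.
        self.totals[region] += amount
        self.counts[region] += 1

    def apply_delete(self, region, amount):
        # Propagate a base-table delete; drop the group once it is empty.
        self.totals[region] -= amount
        self.counts[region] -= 1
        if self.counts[region] == 0:
            del self.totals[region]
            del self.counts[region]

    def snapshot(self):
        return dict(self.totals)


view = SalesByRegionView([("east", 10.0), ("west", 5.0)])
view.apply_insert("east", 2.5)
view.apply_delete("west", 5.0)
```

The row count per group is kept alongside the sum so that a delete can tell when a group has disappeared entirely.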

  6. Deciding What to Materialize • Must weigh maintenance cost against query cost • What to materialize depends on the workload: • queries and update transactions • weights for each component of workload • Goal: find set of views that gives minimum cost if materialized, subject to space constraints
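
One simple way to approach this selection problem is a greedy heuristic, sketched below. It is not the method of any particular system: the Candidate fields, the numbers, and the assumption that each view's weighted query saving and maintenance cost are known and independent of the other views are all simplifications made for illustration. A greedy choice is also not guaranteed to find the minimum-cost set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    size: int                # space the materialized view would occupy
    query_saving: float      # weighted query cost saved across the workload
    maintenance_cost: float  # weighted cost of keeping the view up to date

    @property
    def net_benefit(self):
        return self.query_saving - self.maintenance_cost


def choose_views(candidates, space_budget):
    """Pick views by net benefit per unit of space until the budget is used up."""
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c.net_benefit / c.size, reverse=True)
    for c in ranked:
        if c.net_benefit > 0 and used + c.size <= space_budget:
            chosen.append(c.name)
            used += c.size
    return chosen


candidates = [
    Candidate("A", size=40, query_saving=100, maintenance_cost=20),
    Candidate("B", size=50, query_saving=60, maintenance_cost=10),
    Candidate("C", size=30, query_saving=10, maintenance_cost=25),
    Candidate("D", size=20, query_saving=50, maintenance_cost=20),
]
selected = choose_views(candidates, space_budget=70)  # ['A', 'D']
```

View C is never chosen because it costs more to maintain than it saves, and B is skipped because it no longer fits once A and D are materialized.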

  7. What we already know… • Query optimizer analyzes a set of query execution plans (QEPs) and picks the optimal (least-cost) one • Plan choice depends heavily on the optimizer’s estimate of the number of rows that will result at each step of the QEP • Estimates rely on statistics, typically stored in histograms
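
As an illustration of how a histogram turns into a row estimate, the sketch below builds an equi-width histogram and estimates how many rows satisfy value < threshold. The function names and the sample data are assumptions for this example, and real systems often use other histogram types. The estimate assumes values are spread uniformly inside each bucket, which is exactly where estimation errors come from.

```python
def build_histogram(values, low, high, buckets):
    """Count how many values fall into each of `buckets` equal-width ranges."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        index = min(int((v - low) / width), buckets - 1)  # clamp v == high
        counts[index] += 1
    return counts


def estimate_rows_below(counts, low, high, threshold):
    """Estimate rows with value < threshold, assuming uniformity within a bucket."""
    width = (high - low) / len(counts)
    estimate = 0.0
    for i, count in enumerate(counts):
        start = low + i * width
        end = start + width
        if threshold >= end:
            estimate += count  # whole bucket qualifies
        elif threshold > start:
            estimate += count * (threshold - start) / width  # partial bucket
    return estimate


values = [1, 2, 3, 4, 11, 12, 25, 38]
counts = build_histogram(values, low=0, high=40, buckets=4)      # [4, 2, 1, 1]
estimated = estimate_rows_below(counts, 0, 40, threshold=15)     # 5.0
actual = sum(1 for v in values if v < 15)                        # 6
```

The gap between the estimate (5.0) and the actual count (6) comes from the bucket covering 10 to 20, where both values happen to sit below the threshold.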

  8. Recent Approaches to Improve Statistics • Paper “Distinct-Value Synopses for Multiset Operations” by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis, 2007 • IBM’s LEO (Learning Empirical Results in Query Optimization), 2001

  9. Summary of Paper Results • Addresses the problem of efficiently estimating the number of distinct values of an attribute • Builds on randomized algorithms • Claims an unbiased estimator for the number of distinct values with lower mean squared error • Past estimators tend to be higher than the actual number, so the authors devised a way to bring the estimate down to a more reasonable value

  10. Distinct-Value Estimation • Propose summary structure (synopsis) for a relation • Synopsis can be used to estimate number of DVs in the partition • Synopses can be combined to create synopses for compound partitions created from base partitions using multiset union, intersection or difference operations • Updates can be performed on compound partitions by using synopses from base relations
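
A minimal sketch of such a synopsis, assuming a k-minimum-values design: hash every value into [0, 1), keep the k smallest distinct hashes, and estimate the number of distinct values as (k - 1) divided by the k-th smallest hash. The function names and the use of SHA-256 are choices made for this illustration. Only the union operation is shown; handling intersection and difference needs extra bookkeeping that is omitted here.

```python
import hashlib


def hash_to_unit(value):
    """Map a value to a pseudo-random point in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_synopsis(values, k):
    """Keep the k smallest distinct hash values seen in a partition."""
    hashes = {hash_to_unit(v) for v in values}
    return sorted(hashes)[:k]


def union_synopsis(a, b, k):
    """Synopsis of the union of two partitions, built from their synopses alone."""
    return sorted(set(a) | set(b))[:k]


def estimate_distinct(synopsis, k):
    """Estimate the number of distinct values summarized by a synopsis."""
    if len(synopsis) < k:
        return float(len(synopsis))  # fewer than k distinct values: count is exact
    return (k - 1) / synopsis[k - 1]


k = 256
part_a = build_synopsis(range(0, 6000), k)
part_b = build_synopsis(range(4000, 10000), k)
merged = union_synopsis(part_a, part_b, k)
estimate = estimate_distinct(merged, k)  # close to 10000, the true distinct count
```

The merged synopsis is identical to the one that would be built by scanning the combined data directly, which is what lets compound partitions be summarized without revisiting the base data.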

  11. LEO - Learning Empirical Results in Query Optimization • Autonomic feedback loops that create a self-tuning database query optimizer • Self-validates and adjusts to improve query optimization and execution without requiring user interaction to repair incorrect statistics or cardinality estimates • Reduces the total cost of owning database management systems by simplifying database administration

  12. How it works • Monitors queries as they execute • Compares the optimizer’s estimates with actuals at each step in a QEP • Then computes adjustments to its estimates that may be used during future optimizations of similar queries • Moreover, estimation errors can also trigger re-optimization of a query in mid-execution.
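
The feedback loop described above can be sketched in a few lines. This is a toy model, not IBM's implementation: the FeedbackStore class, the use of a predicate string as the lookup key, and the tolerance of 10 for triggering re-optimization are all assumptions made for illustration.

```python
class FeedbackStore:
    """Remembers how far off past cardinality estimates were (illustrative only)."""

    def __init__(self):
        self.adjustments = {}  # predicate text -> actual / estimated ratio

    def record(self, predicate, estimated_rows, actual_rows):
        # Monitoring step: compare the estimate with the observed row count.
        if estimated_rows > 0:
            self.adjustments[predicate] = actual_rows / estimated_rows

    def adjusted_estimate(self, predicate, estimated_rows):
        # Optimization step: correct a new estimate for a similar query.
        return estimated_rows * self.adjustments.get(predicate, 1.0)


def needs_reoptimization(estimated_rows, actual_rows, tolerance=10.0):
    """Flag a step whose estimate is off by more than `tolerance` times (assumed threshold)."""
    low, high = sorted((max(estimated_rows, 1), max(actual_rows, 1)))
    return high / low > tolerance


store = FeedbackStore()
store.record("orders.status = 'open'", estimated_rows=100, actual_rows=400)
corrected = store.adjusted_estimate("orders.status = 'open'", 250)  # 1000.0
```

A predicate that has never been observed keeps its original estimate, since its adjustment factor defaults to 1.0.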

  13. Challenges in Research of LEO • (1) ensuring stability and convergence of the autonomic system • (2) guaranteeing consistency of the overall optimizer's model upon refinements

  14. Results • Reduction of query execution time by orders of magnitude at negligible additional run-time cost • Reduced administration time • Fewer problem queries • Overall improved query performance with increased robustness and predictability of query response times

  15. Bibliography • “LEO-Learning Empirical Results in Query Optimization”. IBM. <http://domino.watson.ibm.com/comm/research.nsf/pages/r.datamgmt.innovation.html>. • “Optimizing for Query Speed”. SQL. <http://www.devshed.com/c/a/MySQL/Optimizing-for-Query-Speed/1/>. • “Optimizing Database Queries”. IBM. <http://www.stevengould.org/portfolio/developerWorks/efficientPHP/wa-effphp/wa-effphp-4-1.html>. • “Optimize Queries Theory in Practice”. <http://www.serverwatch.com/tutorials/article.php/2175621/How-to-Optimize-Queries-Theory-an-Practice.htm>. • Beyer, Kevin, Gemulla, Rainer, Haas, Peter J., Reinwald, Berthold, Sismanis, Yannis. “Distinct-Value Synopses for Multiset Operations”. Communications of the ACM. Vol. 52. October 2009. • Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization-Data Management Meets Statistical Estimation”. Communications of the ACM. Vol. 52. October 2009.