1 / 14

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. Kabra and DeWitt presented by Zack Ives CSE 590DB, May 11, 1998. The Problem. Query execution plans are formulated based on estimated cost of operations, which in turn depend on estimates of table size and cardinality

adanne
Download Presentation

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans Kabra and DeWitt presented by Zack Ives CSE 590DB, May 11, 1998

  2. The Problem • Query execution plans are formulated based on estimated cost of operations, which in turn depend on estimates of table size and cardinality • These estimates may be highly inaccurate, especially for user-defined types or predicates • The errors become multiplicative as the number of joins increases • We might have chosen a more nearly optimal plan based on greater knowledge

  3. The Solution • We shall monitor how the query is doing at key points, and consider dynamically re-optimizing those portions of the query which have not yet been started • Since re-optimization is expensive, we shall only do it if we think we will see an improvement

  4. Elements of the Algorithm • Annotated Query Execution Plans • Annotate plan with estimates of size • Runtime Collection of Statistics • Statistics collectors embedded in execution tree • Keep overhead down • Dynamic Resource Re-allocation • Reallocate memory to individual operations • Query Plan Modification • May wish to re-optimize the remainder of query

  5. Annotated Query Plans • We save at each point in the tree the expected: • Sizes and cardinalities • Selectivities of predicates • Estimates of number of groups to be aggregated

  6. Statistics Collectors • Add into tree • Must be collectable in a single pass • Will only help with portions of query “beyond” the current pipeline

  7. Resource Re-Allocation • Based on improved estimates, we can modify the memory allocated to each operation • Results: less I/O, better performance • Only for operations that have not yet begun executing

  8. Plan Modification • Create new plan for remainder, treating temp as an input • Only re-optimize part not begun • Suspend query, save intermediate in temp file

  9. Re-Optimization • When to re-optimize: • Calculate time current should take (using gathered stats) • Only consider re-optimization if: • Our original estimate was off by at least some factor 2 and if • Topt, estimated < 1Tcur-plan,improved where 1  5% and cost of optimization depends on number of operators, esp. joins • Only modify the plan if the new estimate, including the cost of writing the temp file, is better

  10. Low-Overhead Statistics • Want to find “most effective” statistics • Don’t want to gather statistics for “simple” queries • Want to limit effect of algorithm to maximum overhead ratio,  • Factors: • Probability of inaccuracy • Fraction of query affected

  11. Inaccuracy Potentials • The following heuristics are used: • Inaccuracy potential = low, medium, high • Lower if we have more information on table value distribution • 1+max of inputs for multiple-input selection • Always high for user-defined methods • Always high for non-equijoins • For most other operators, same as worst of inputs

  12. More Heuristics • Check fraction of query affected • Check how many other operators use the same statistic • The winner: • Higher inaccuracy potentials first • Then, if a tie, the one affecting the larger portion of the plan

  13. Implementation • On top of Paradise (parallel database that supports ADTs, built on OO framework) • Using System-R optimizer • New SCIA (Stat Collector Insertion Algorithm) and Dynamic Re-Optimization modules

  14. It Works! • Results are 5% worse for simple queries, much better for complex queries • Of course, we would not really collect statistics on simple queries • Data skew made a slight difference - both normal and re-optimized queries performed slightly better

More Related