Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. Kabra and DeWitt. Presented by Zack Ives, CSE 590DB, May 11, 1998.
The Problem
• Query execution plans are formulated based on the estimated cost of operations, which in turn depends on estimates of table size and cardinality
• These estimates may be highly inaccurate, especially for user-defined types or predicates
• The errors compound multiplicatively as the number of joins increases
• With better knowledge, we might have chosen a plan much closer to optimal
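To make the compounding concrete, here is a small illustrative calculation; the per-join growth factor and the factor-of-2 per-join estimation error are made-up numbers for illustration, not figures from the paper.

```python
# Illustrative only: shows how per-operator cardinality estimation errors
# compound multiplicatively across a chain of joins.
estimated = 1_000      # optimizer's cardinality estimate for the base input
actual = 1_000         # true cardinality of the base input
per_join_error = 2.0   # assume each join's output is underestimated by 2x

for num_joins in range(1, 5):
    estimated *= 10                 # optimizer expects 10x growth per join
    actual *= 10 * per_join_error   # reality grows 20x per join
    print(f"after {num_joins} join(s): estimated={estimated:,.0f} "
          f"actual={actual:,.0f} (off by {actual / estimated:.0f}x)")
```

After four joins the estimate is already off by a factor of 16, which is what makes a plan chosen up front so fragile.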
The Solution
• Monitor how the query is doing at key points, and consider dynamically re-optimizing those portions of the query that have not yet started
• Since re-optimization is expensive, only do it if we expect to see an improvement
Elements of the Algorithm
• Annotated query execution plans
  • Annotate the plan with estimates of size
• Runtime collection of statistics
  • Statistics collectors embedded in the execution tree
  • Keep overhead down
• Dynamic resource re-allocation
  • Reallocate memory to individual operations
• Query plan modification
  • May wish to re-optimize the remainder of the query
Annotated Query Plans
• At each point in the tree, we save the expected:
  • Sizes and cardinalities
  • Selectivities of predicates
  • Estimated number of groups to be aggregated
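A minimal sketch of what such an annotated plan node might hold, assuming a simple tree representation; the field names are hypothetical, not taken from the paper or from Paradise.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    operator: str                      # e.g. "hash_join", "scan", "group_by"
    children: list["PlanNode"] = field(default_factory=list)
    # Optimizer-time annotations, saved so they can be compared against
    # actual values observed at run time:
    est_cardinality: float = 0.0       # expected number of output tuples
    est_size_bytes: float = 0.0        # expected output size
    est_selectivity: float = 1.0       # expected selectivity of predicates
    est_num_groups: float = 0.0        # expected number of aggregate groups
```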
Statistics Collectors
• Added into the tree
• Must be collectable in a single pass
• Only help with portions of the query "beyond" the current pipeline
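A minimal sketch of a statistics-collector operator, assuming an iterator-style executor; the class and its crude size measure are illustrative, not the paper's implementation.

```python
class StatsCollector:
    """Wraps a child operator and tallies cardinality and size in one pass."""
    def __init__(self, child):
        self.child = child
        self.cardinality = 0
        self.size_bytes = 0

    def __iter__(self):
        for tup in self.child:
            self.cardinality += 1
            self.size_bytes += len(str(tup))  # crude size proxy
            yield tup                         # pass tuples through unchanged
```

Once the collector's input has been fully consumed, its observed values can replace the optimizer's estimates for everything downstream of the current pipeline.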
Resource Re-Allocation
• Based on the improved estimates, we can modify the memory allocated to each operation
• Results: less I/O, better performance
• Applies only to operations that have not yet begun executing
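A minimal sketch of one possible re-allocation policy: give each not-yet-started operator memory in proportion to its revised input size. The proportional rule is an assumption for illustration, not the paper's exact algorithm.

```python
def reallocate_memory(pending_ops, total_memory_bytes):
    """pending_ops: list of (operator_name, revised_input_size_bytes)."""
    total_need = sum(size for _, size in pending_ops)
    allocations = {}
    for name, size in pending_ops:
        share = size / total_need if total_need else 0.0
        allocations[name] = int(total_memory_bytes * share)
    return allocations

# Example: the sort's input turned out far larger than estimated, so it
# now receives most of the remaining memory and spills less to disk.
print(reallocate_memory([("hash_join", 50_000_000), ("sort", 450_000_000)],
                        256 * 1024 * 1024))
```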
Plan Modification
• Suspend the query and save the intermediate result in a temp file
• Create a new plan for the remainder, treating the temp file as an input
• Only the part of the query that has not begun is re-optimized
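A minimal sketch of that sequence of steps. The helper callables (materialize, optimize, resume) and the replace_subtree method are hypothetical stand-ins, not Paradise's API.

```python
def modify_plan(current_plan, completed_subtree, catalog,
                materialize, optimize, resume):
    # 1. Suspend execution and write the already-computed intermediate
    #    result out to a temporary file (hypothetical materialize helper).
    temp_table = materialize(completed_subtree)

    # 2. Form the remainder of the query, treating the temp table as a
    #    base input in place of the finished subtree.
    remainder = current_plan.replace_subtree(completed_subtree, temp_table)

    # 3. Re-optimize only the remainder, then resume execution.
    return resume(optimize(remainder, catalog))
```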
Re-Optimization
• When to re-optimize:
  • Calculate the time the current plan should take (using the gathered statistics)
  • Only consider re-optimization if:
    • our original estimate was off by at least some factor ε₂, and
    • T_opt,estimated < ε₁ · T_cur-plan,improved, where ε₁ ≈ 5% and the cost of optimization depends on the number of operators, especially joins
• Only modify the plan if the new estimate, including the cost of writing the temp file, is better
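A minimal sketch of that two-part test; the default epsilon values and the way costs are passed in are illustrative assumptions, not the paper's tuning.

```python
def should_reoptimize(est_cost, observed_cost, reopt_cost,
                      improved_remaining_cost,
                      epsilon2=2.0, epsilon1=0.05):
    # Condition 1: the original estimate was off by at least factor epsilon2
    # (in either direction).
    estimate_way_off = (observed_cost >= epsilon2 * est_cost or
                        est_cost >= epsilon2 * observed_cost)
    # Condition 2: the estimated cost of running the optimizer again is a
    # small fraction (epsilon1) of the current plan's improved time estimate.
    reopt_is_cheap = reopt_cost < epsilon1 * improved_remaining_cost
    return estimate_way_off and reopt_is_cheap
```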
Low-Overhead Statistics
• Want to find the "most effective" statistics
• Don't want to gather statistics for "simple" queries
• Want to limit the algorithm's cost to a maximum overhead ratio
• Factors:
  • Probability of inaccuracy
  • Fraction of the query affected
Inaccuracy Potentials
• The following heuristics are used:
  • Inaccuracy potential = low, medium, or high
  • Lower if we have more information on the table's value distribution
  • For a multiple-input selection: 1 + the maximum of its inputs' potentials
  • Always high for user-defined methods
  • Always high for non-equijoins
  • For most other operators, the same as the worst of their inputs
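A minimal sketch of how those heuristics could be combined, assuming an ordinal encoding low=0, medium=1, high=2 and a dict-based operator description; both encodings are assumptions made for illustration.

```python
LOW, MEDIUM, HIGH = 0, 1, 2

def inaccuracy_potential(op, input_potentials):
    if op.get("user_defined_method") or op.get("non_equijoin"):
        return HIGH                            # always high for these cases
    if op.get("kind") == "selection" and len(input_potentials) > 1:
        # Multiple-input selection: one level worse than its worst input.
        return min(HIGH, 1 + max(input_potentials))
    if op.get("has_distribution_stats"):
        return LOW                             # better stats lower the potential
    # Most other operators: same as the worst of their inputs.
    return max(input_potentials, default=MEDIUM)
```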
More Heuristics
• Check the fraction of the query affected
• Check how many other operators use the same statistic
• The winner:
  • Higher inaccuracy potentials first
  • Then, if there is a tie, the statistic affecting the larger portion of the plan
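A minimal sketch of that ranking rule; the dict keys and the hard cap on how many collectors to keep are illustrative stand-ins for the overhead limit mentioned above, not the paper's exact mechanism.

```python
def pick_statistics(candidates, max_count):
    """candidates: list of dicts with 'inaccuracy_potential' (0..2) and
    'fraction_of_plan_affected' (0..1); returns the top max_count."""
    ranked = sorted(
        candidates,
        key=lambda c: (c["inaccuracy_potential"],
                       c["fraction_of_plan_affected"]),
        reverse=True)  # higher potential first, then larger plan fraction
    return ranked[:max_count]
```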
Implementation
• Built on top of Paradise (a parallel database that supports ADTs, built on an OO framework)
• Uses a System-R optimizer
• New SCIA (Stat Collector Insertion Algorithm) and Dynamic Re-Optimization modules
It Works!
• Results are about 5% worse for simple queries, and much better for complex queries
• Of course, we would not really collect statistics on simple queries
• Data skew made a slight difference: both normal and re-optimized queries performed slightly better