180 likes | 302 Views
Florian Waas, EMC Corp. Leo Giakoumakis , Microsoft Corp. Shin Zhang, Microsoft Corp 06/13/ 2011. Plan Space Analysis Detecting Plan Regressions in Cost-based Query Optimizers. Tale of a Plan Regression. Standard regression test just don’t cut it!.
E N D
Florian Waas, EMC Corp. Leo Giakoumakis, Microsoft Corp. Shin Zhang, Microsoft Corp 06/13/2011 Plan Space AnalysisDetecting Plan Regressions in Cost-based Query Optimizers Florian Waas, EMC/Greenplum
Tale of a Plan Regression Standard regression test just don’t cut it! Florian Waas, EMC/Greenplum • Applied ‘obvious’ improvement to optimizer • Passed all regression tests without problem • Shipped proudly! • Lots of customers complain about plan regressions • Hard conversation between Dev and QA • Dev: Why didn’t you find this? This change affects virtually all queries! • QA: Why can’t you tell me what to look for if it’s so ‘obvious’?
Takeaways Florian Waas, EMC/Greenplum • Conventional testing of optimizer focuses on a singlebest-plan found per query • Ignores massive space of rejected alternatives • Plan Space Analysis • Takes many/all plans considered into account • Quantifies optimizer changes – even if result not affected • Detects regressions early in the development process
Nomenclature Florian Waas, EMC/Greenplum • Optimizer selects best plan found (BPF) • Rejects non-trivial numbers of alternatives • Explicitly or implictly • Plan regression • Code-level change to optimizer leads to bad plan choice • Perceived or actual
Dilemma of Optimizer Testing Quantify – but do not judge Florian Waas, EMC/Greenplum • Optimizers work off theoretical models • All practical models have limitations • Most non-trivial queries exceed limitations of model • May lead to contradicting optimization problems • Get query Q1 right OR query Q2… • Right or wrong is a matter of view point and business priority
Standard Test Procedure Inconclusive, labor intensive, does not scale Florian Waas, EMC/Greenplum • Choose relevant workload • Freeze BPF • Apply modification • Test against frozen BPF • Diff may indicate regression • Manual intervention needed to determine actual impact • In practice: lots of false positives/negatives
Desiderata for better regression tests Calls for abstract evaluation framework Florian Waas, EMC/Greenplum • Simplicity, transparency • Simple number • Meaningful correlation to system • Technology agnostic, targeted • Does not reverse engineer optimizer • ‘understands’ executor • Surgical, specific • Actionable • Applicable to any and every workload • Practical • Easy to compute, robust methodology
Plan Spaces Florian Waas, EMC/Greenplum • Set of alternatives considered by optimizer • Product specific • Non-trivial size • E.g., TPC-H 5: 230+ million alternatives • Contains optimal plan(s) • According to database parameters • Think: statistics • Pairwise relationships based on cost function • E.g., cost(Popt) < cost(P)
Observation Optimizer must assess plans conclusively Florian Waas, EMC/Greenplum Given a query For each plan alternative P There exists a configuration so that P is optimal Even if distinctly suboptimal in original query/configuration
Ideal optimizer Establishes correct order of plan alternatives Florian Waas, EMC/Greenplum • Makes no mistakes • Establishes partial order betweenalternatives according to estimates • Estimated order matches actual execution • Regardless of actual cost values
Plan Space Analysis: Principle Use Correlation as measure for quality/impact Florian Waas, EMC/Greenplum • Enumerate plan alternatives • Have optimizer cost them • Determine order O1 according to estimated cost • Execute all plans alternatives • Determine order O2 according to actual execution cost • Compute correlation of O1 and O2
Plan Space Analysis: Correlation Florian Waas, EMC/Greenplum • Spearman-Coefficient • Value range [-1,1] • 1 perfect monotone function • 0 uncorrelated • etc.
Plan Space Analysis: in Practice Confine PSA to relevant areas of space Florian Waas, EMC/Greenplum • Use sample of space • Uniform sampling • Galindo-Legariaet al. VLDB 1994 • Waas, Galindo-Legaria, SIGMOD 2000 • Simple hints/forcing will do too • Ignore certain plans • cost(P) > cost(Popt) * k • | act(P1) – act(P2) | < d
Experiments Florian Waas, EMC/Greenplum • Commercial query optimizer • Built-in ranking module for sampling • Sample of 20 plans/query • Fixed seed for repeatability • 3 iterations for execution
TPC-H • 1GB scale factor • (Very) good results overall • Known issues Results match expectations Florian Waas, EMC/Greenplum
Sensitivity to Regressions PSA is effective early warning system Florian Waas, EMC/Greenplum • Modified cost model parameter • Costing of hash in HJ • BPF only affected by last modification • Detects any detrimental change immediately • Applies to all types of regressions
Takeaways Florian Waas, EMC/Greenplum • Conventional testing of optimizer focuses on a singlebest-plan found per query • Ignores massive space of rejected alternatives • Plan Space Analysis • Takes many/all plans considered into account • Quantifies optimizer changes – even if result not affected • Detects regressions early in the development process
Florian Waas, EMC Corp. florian.waas@emc.com leogia@microsoft.com Shin.zhang@microsoft.com www.database-research.com Thank you! Florian Waas, EMC/Greenplum