1 / 38

Proactive Re-optimization

Proactive Re-optimization. Research by Shivnath Babu, Pedro Bizarro, David DeWitt. Presented by Pedro Bizarro. Roadmap. Introduction Re-Optimization (and its Limitations) Proactive Re-Optimization Bounding Boxes Robust and Switchable Plans Random Samples Experimental Results

noe
Download Presentation

Proactive Re-optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Proactive Re-optimization Research by Shivnath Babu, Pedro Bizarro, David DeWitt Presented by Pedro Bizarro

  2. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  3. What is the Problem? Sometimes database optimizers choose query plans that are sub-optimal by orders of magnitude

  4. How the Problem Arises (1) • Optimizer relies on statistics to cost plans However: • some statistics are not up-to-date and • most are not even in the catalog • Missing statistics are estimated using • other (possibly estimated) statistics • assumptions (independency, uniformity, etc) • default values Errors in Estimates

  5. How the Problem Arises (2) • Well-know fact: errors in estimates for statistics in intermediate expressions grow exponentially • Also, cost functions are not smooth • Small errors in estimate may be big errors in cost • E.g.: Cost Memory Small errors become big errors

  6. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  7. Re-optimization Overview[KD98, M+04] • Compute plan as normal • Add check operators to plan to: • check when plan becomes sub-optimal • check for significant discrepancies between estimated and observed values • Execute and react approach • Trigger re-optimization when checks fail • Else, proceed

  8. Traditional optimizer chooses plan P Re-optimizer chooses same plan P and adds checks INLJ INLJ CHECK T T INLJ INLJ CHECK R S R S Choosing Plans:Optimization vs. Re-Optimization

  9. Re-optimization’sMain Limitation • Optimizer picks plan unaware of possible future re-optimizations • I.e., optimization assumes no re-optimization • What can go wrong? Can we do better?

  10. Example: Query: σ(R) S Plan P1: σ(R) HHJ S Plan P2: S HHJσ(R) P1 Cost P2 Size of σ(R) Estimated size of σ(R) Re-optimization Limitations • Re-optimization is expensive(could avoid it by using robust plans) • May lose partial work • If start on P1 and re-optimize to P2, will repeat scan on R P1 is risky! P2 is robust.

  11. Re-optimize Re-optimize Re-optimize INLJ HHJ HHJ HHJ T T T σ(R) INLJ HHJ HHJ HHJ σ(R) σ(R) S S S S σ(R) T Re-optimization Limitations (2) • Limited information collected at run-time • Only detects when to re-optimize • E.g., detects that some selectivity > 5% • But, future re-optimization needs more information • E.g., what is the true value of selectivity? • May “thrash” without that knowledge

  12. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  13. Proactive Re-Optimization in a Nutshell If DBMS knows re-optimization can happen: It can (proactively) pick plans that are better than picking a plan and then (reactively) re-optimizing if something goes wrong. It should try to avoid it It can (proactively) collect statistics needed for future re-optimization steps for the same query. It should plan for it

  14. Building Blocks ofProactive Re-optimization • Use of bounding boxes (intervals around estimates) • Use of robust plans and switchable plans • Enhanced run-time statistics collection

  15. 1. Compute bounding boxes for estimates No, reoptimize Estimate within the bounding box? Yes, use robust or switchable plan 3. Execute query; Collectaccurate statistics estimates Run-time estimates Proactive Re-optimization Architecture Optimization QUERY 2. Use bounding boxes to pick robust or switchable plans CATALOG Execution

  16. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  17. Bounding Boxes:Representing Uncertainty • Interval around estimate is: • wide if optimizer uncertain about estimate • narrow if optimizer certain about estimate • How is uncertainty measured? • From the way the statistic is estimated, e.g.: • Histogram -> very certain • Multiplication of selectivities -> uncertain • Default guess -> very uncertain • Etc.

  18. Query: σ(R) S Bounding Boxes:Representing Uncertainty • Interval around estimate is: • wide if optimizer uncertain about estimate • narrow if optimizer certain about estimate |S| Bounding box high est. low |σ(R)| low estimated high

  19. Bounding Boxes:Plan Costing and Pruning • Costing - Computes three costs per each plan tree: (2-dim bounding box using cardinality estimates from sub-plans) • Pruning • For each join subset and interesting order: • Finds 3 plans: BestLow, BestEst, and BestHigh • The plans with lowest CLow, CEst, and CHigh, respectively • All others are pruned Cost? |S| CHigh, cost here CEst, cost here CLow, cost here HHJ high est. σ(R) low S |σ(R)| low estimated high

  20. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  21. At the end of plan enumeration there are three seed plans: Four cases: The seeds are the same plan One of the seeds is robust A switchable plan can be created from them No single plan, not robust, not switchable P1 Query: σ(R) S Plan P1: σ(R) HHJ S Plan P2: S HHJσ(R) Cost P2 Size of σ(R) Bounding box for σ(R) estimate low estimated high cost close to optimal in all 3 points of bounding box Selecting Plans BestEst=P1 BestHigh=P2 BestLow=P1

  22. Not really switchable, but late binding decision Switchable Plans • Switchable Plan Goals: • Pick plan only after uncertainty resolved • Do no lose or repeat work • Plans are switchable if: • Have different root operator • Have the same sub-plan for deep input of root • Have the same base table as other input E.g.: INLJ Hash2 Hash3 Index Seek on T Scan T Scan T Hash1 Hash1 Hash1 Scan R Scan S Scan R Scan S Scan R Scan S

  23. switch operator INLJ, Index seek on T ? Hash2, Scan T buffer operator Hash3, Scan T Implementation ofa Switchable Plan • Buffer tuples until a tuple random sample is obtained • Compute estimate and pass it up to switch operator • Switch operator instantiates correct operator • No work lost or repeated Switchable Plan T Hash1 Scan R Scan S

  24. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  25. Observing Statistics at Run-Time • Goal 1: Detect when to re-optimize • Detect when outside bounding box • Goal 2: Must be cheap • To minimize impact on execution • Goal 3: Must be quick • To avoid spending time in wrong plan • Goal 4: Must be accurate • To avoid re-optimization thrashing

  26. b c d f g i j Emit eos(30%) punctuation to parent operator The Idea:Random Sample Prefix • Prefix output of operators with random sample of entire output Normal output without random prefix a a b c d e e f g h h i j Output with random prefix • When eos punctuation arrives system can: • Compute estimate based on random sample • Determine if bounding box violation happened • Use revised estimate in future re-optimization

  27. ImplementingRandom Sample Prefixes • Assume random samples of base relations • For each relation R, there is an R_sample • R_sample is a random sample of R • Modified scan operator: • scan R_sample • emit eos • scan R skipping tuples in R_sample • Similarly for indexed scan

  28. NLJ Output R.S tuples and EOS … … Probe side Build side R sample S sample Memory: rest of R rest of S Disk: ImplementingRandom Sample Prefixes - Joins • Modified BNLJ and INLJ operators • Pass eos from outer relation • Ignore eos from inner relation • True random sample of join if outer is FK side • Hash join: eos eos Then, do second pass as in normal hash Read S into memory until EOS of S Read rest of S, partition, probe R, output tuples Read all of R, partition, probe S sample R’ S sample S’ R sample S sample rest of R rest of S R’’ S’’

  29. Traditional Re-Optimizer Proactive Re-Opt Switch HHJ HHJ σ(O) σ(O) σ(O) Switch HHJ HHJ σ(A) C σ(A) C σ(A) C Re-optimizes Re-optimizes Sub-optimal HHJ HHJ Optimal σ(O) HHJ HHJ σ(A) Sub-optimal σ(A) C C σ(O) Wrap-up Example: σ(A)*C*σ(O) Assume error in Estimate σ(A)!

  30. Roadmap • Introduction • Re-Optimization (and its Limitations) • Proactive Re-Optimization • Bounding Boxes • Robust and Switchable Plans • Random Samples • Experimental Results • Conclusions

  31. Experimental Evaluation Setup • Optimizers used: • TRAD – Traditional optimizer • VRO – Reactive re-optimizer • Rio – Our proactive re-optimizer • Built within Predator DBMS • DMV Dataset: Adapted from real dataset • Contains correlations between attributes Our implementation of [MR+04]

  32. Using Robust Plans: Join σ(A)*CError in |σ(A)| estimate, 150MB per hash join

  33. Using Switchable Plans: Join σ(A)*CError in |σ(A)| estimate, 50MB per hash join

  34. 3-way join: σ(A)*C*σ(O)Error in |σ(A)| estimate, 50MB per hash join

  35. Query ComplexityErrors due to correlated attributesVRO-R is VRO optimizer plus random sampling

  36. Conclusions • “Look around” with bounding boxes • Use robust plans • Use late binding switchable plans • Either avoid re-optimization • Or re-optimize quickly and with accurate estimates • Use tuple random samples • Overhead small: optimization time <1.7%

  37. Acknowledgements… Jennifer Widom for discussion and feedback Guy Lohman and Volker Markl for providing DMV data and workload generator … and References [KD98] Kabra and DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. SIGMOD’98 [M+04] Markl, et al. Robust Query Processing through Progressive Optimization. SIGMOD'04.

  38. Thank you! Questions? Feedback? Rio is a Pro-Active Re-Optimizer developed by Shivnath Babu, Pedro Bizarro, and David DeWitt

More Related