1 / 40

Adaptive Query Processing in the Looking Glass

Adaptive Query Processing in the Looking Glass. Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ. of Wisconsin, Madison). Adaptive Query Processing (AQP) Systems: Publication Timeline. STREAM. Tukwila. POP. Re-Opt. River. Eddies. Query Scrambling. CAPE. NiagaraCQ. Parametric opt.

minya
Download Presentation

Adaptive Query Processing in the Looking Glass

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ. of Wisconsin, Madison)

  2. Adaptive Query Processing (AQP) Systems:Publication Timeline STREAM Tukwila POP Re-Opt River Eddies Query Scrambling CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. Introduction

  3. Motivation • Plenty of recent work on Adaptive Query Processing (AQP) in different contexts • Conventional DBMS query processing, data integration, continuous queries in stream systems • No exhaustive, in-depth categorization and comparison of AQP systems to date • Difficult to answer questions like: • Will techniques from one system work on another? • What are the shortcomings of each system? • Which system is best for a new application domain? Introduction

  4. Our Contributions • Detailed study of current AQP systems • Classification of AQP systems into 3 families • Comparison across families in terms of AQP tasks • Identification of shortcomings & new approaches to address them Introduction

  5. Roadmap • Introduction to AQP • The three AQP system families • Comparison across families in terms of AQP tasks • Summary of what we learned

  6. Uses stats to cost plans Chosen plan Catalog (table sizes, histograms) Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats Primer on Traditional Query Processing Query Optimizer: Chooses best plan Introduction

  7. Need for Adaptive Query Processing Detect plan suboptimality, re-optimize Errors in stats estimates, optimizer mistakes Correlated & skewed data distributions Stats & system conditions may change while query is running Monitor for changes, re-optimize Continuous queries, long-running queries AQP is integral to the current CS-wide push towards autonomic computing Introduction

  8. Our Focus: AQP for a Single Query • AQP System: • A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query Introduction

  9. Roadmap • Introduction to AQP • The three AQP system families • Comparison across families in terms of AQP tasks • Summary of what we learned

  10. AQP System Families • Plan-based AQP systems • AQP for traditional plan-based DBMSs • Continuous-Query-based (CQ-based) AQP systems • AQP for long-running continuous queries over data streams • Routing-based AQP systems • AQP for DBMSs and continuous queries based on adaptive tuple routing AQP Families

  11. Uses stats to cost plans + Extra operators Catalog (table sizes, histograms) Collected stats AQP in Plan-based Systems Query Optimizer: Chooses best plan Chosen plan Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats AQP Families

  12. Uses stats to cost plans + Re-optimize Extra operators Catalog (Original + observed stats) Collected stats AQP in Plan-based Systems Query Optimizer: Chooses best plan Chosen plan Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats AQP Families

  13. Example Plan-based AQP Systems STREAM Tukwila POP Re-Opt River Eddies Query Scrambling CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. AQP Families

  14. Primer on Continuous Query Processing Chosen packets • Continuous Queries (CQs) are long-running queries usually over data streams • Example CQ: Filtering packet streams • Stream properties or system conditions may change while query is running  best plan may change σ3 σ2 σ1 Packets AQP Families

  15. Uses stats to cost plans Chosen plan Catalog (table sizes, histograms) Executor: Runs chosen plan AQP in CQ-based Systems Query Optimizer: Chooses best plan Statistics Tracker: Creates/updates stats Runstats AQP Families

  16. Uses stats to cost plans Chosen plan Executor: Runs chosen plan AQP in CQ-based Systems Continuous Query Optimizer: Chooses best plan Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions AQP Families

  17. Chosen plan Executor: Runs chosen plan AQP in CQ-based Systems Continuous Query Optimizer: Ensures that plan is best for current stats Uses stats to cost plans Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions AQP Families

  18. Chosen plan Executor: Runs chosen plan Combined in-part for efficiency AQP in CQ-based Systems Continuous Query Optimizer: Ensures that plan is best for current stats Uses stats to cost plans Re-optimize Catalog (stream rates, data distr.) Stats to track Statistics Tracker: Monitors stream stats and system conditions AQP Families

  19. Example CQ-based AQP Systems STREAM Tukwila POP Re-Opt River Eddies Query Scrambling CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. AQP Families

  20. Chosen packets Chosen packets σ3 σ2 σ3 σ2 σ1 Tuple Router σ1 Packets Packets Using tuple routing Using a plan Primer on Routing-based Processing • Non-plan-based architecture where tuples are routed individually through operators • No optimizer • Exemplified by Eddies [AH00] AQP Families

  21. Uses stats to cost plans Chosen plan Catalog (table sizes, histograms) Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats AQP in Routing-based Systems Query Optimizer: Chooses best plan AQP Families

  22. Uses stats to choose efficient routes Selective routing of tuples Chosen plan In-memory catalog (operator costs, selectivities, etc.) Executor: Pool of operators Executor: Runs chosen plan AQP in Routing-based Systems Query or Continuous Query Tuple Router: Integrated Optimizer & Stats Tracker AQP Families

  23. Example Routing-based AQP Systems STREAM Tukwila POP Re-Opt River Query Scrambling Eddies CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. AQP Families

  24. Roadmap • Introduction to AQP • The three AQP system families • Comparison across families in terms of AQP tasks • Summary of what we learned

  25. Comparison Across AQP System Families • Goal: To bring out AQP algorithms and features, not performance numbers • Models, assumptions, and approach • Techniques for tracking statistics • Re-optimization subtasks • When and how to re-optimize • Switching between plans • Pros & cons of using a conventional optimizer • Performance issues • Quality of re-optimization • Run-time overhead & thrashing • Scalability Comparison

  26. Comparison Across AQP System Families • Goal: To bring out AQP algorithms and features, not performance numbers • Models, assumptions, and approach • Techniques for tracking statistics • Re-optimization subtasks • When and how to re-optimize • Switching between plans • Pros & cons of using a conventional optimizer • Performance issues • Quality of re-optimization • Run-time overhead & thrashing • Scalability Comparison

  27. Techniques for Tracking Statistics • Observation • Mostly in Plan-based systems • Competition • Mostly in Plan-based systems • Profiling • Mostly in CQ-based systems • Exploration • In Routing-based systems Comparison

  28. Chosen packets σ3 σ2 Selectivity of s1 on input stream can be observed here σ1 Packets Tracking Statistics: Observation [KD98] • Collect statistics on operator behavior or intermediate subexpressions in a plan Comparison

  29. Chosen packets σ3 σ2 Selectivity of s1 on input stream Selectivity of s2 on input stream σ1 σ2 Packets Tracking Statistics: Competition [A93] • Extra processing to collect statistics Comparison

  30. Tracking Statistics: Profiling [BMM+04] • Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics • Builds a “statistical profile” that can be used to estimate many individual statistics σ3 σ2 σ1 Profiled tuples Comparison

  31. Tracking Statistics: Exploration [AH00] Chosen packets • A fraction of tuples are routed along routes different from the current best route to track statistics along those routes • No redundant processing σ2 σ3 σ1 Tuple Router Packets Comparison

  32. Comparing Statistics-Tracking Techniques: Extra Overhead Introduced • Observation • Exploration (inefficient routes for some tuples) Increasing overhead • Profiling (extra processing on some tuples) • Competition (lots of extra work) Comparison

  33. Comparing Statistics-Tracking Techniques: Coverage of Different Statistics • Observation & Competition (limited by plan) Increasing coverage • Exploration (limited by large number of routes) • Profiling (highest since it builds statistics profile) Comparison

  34. Comparing Statistics-Tracking Techniques: Accuracy of Estimation Increasing accuracy • Exploration (but, susceptible to routing bias) • Profiling (depends on sampling fraction) • Observation & Competition Comparison

  35. Roadmap • Introduction to AQP • The three AQP system families • Comparison across families in terms of AQP tasks • Summary of what we learned

  36. s INLJ S Unclustered index R Example Query: sp1and p2 (R) S ⋈ What have we learned? (1) • Many similarities in internals of different AQP families • Can re-use many current (and new) AQP techniques across families • Ex: Profiling from CQ-based systems • Enables, e.g., faster detection of plan suboptimality in Plan-based systems • Generates more accurate statistics at lower cost in Routing-based systems New Ideas

  37. ⋈ Example Query: sp1and p2 (R) S ⋈ s s Hash Join INLJ S INLJ S Cost Unclustered index R Hash Join R |σ(R)| What have we learned? (2) • Current AQP systems are reactive • E.g., do not consider sensitivity to errors/changes in stats Proactive Re-optimization New Ideas

  38. What have we learned? (3) • Challenging meta problems in AQP for continuous queries need to be addressed • Larger and more complex plan spaces  higher costs for statistics tracking and re-optimization • Tracking “Return-of-Investment” on AQP • Avoiding thrashing, e.g., on bursty changes in statistics Proposal: Plan Logging for Continuous Queries New Ideas

  39. P1 P2 (R,S) Rate(R) Plan Logging for Continuous Queries • Log the statistics and re-optimization history • Query is long-running • Example view over log for R S T ⋈ ⋈ time Plans lying in a high-dimensional space of statistics New Ideas

  40. Summary • AQP is becoming important: • New data and application trends • CS-wide push towards Autonomic Computing • Significant amount of work on AQP in recent years • Our contributions: • In-depth categorization and comparison of AQP systems and techniques • Identified current shortcomings and new approaches to AQP Conclusions

More Related