420 likes | 524 Views
Multi-Query Optimization and Applications. Prasan Roy Indian Institute of Technology - Bombay. Motivation. Queries often involve repeated computation Queries on overlapping views, stored procedures, nested queries, etc. Update expressions for a set of overlapping materialized views
E N D
Multi-Query Optimization andApplications Prasan Roy Indian Institute of Technology - Bombay
Motivation • Queries often involve repeated computation • Queries on overlapping views, stored procedures, nested queries, etc. • Update expressions for a set of overlapping materialized views • Automatically generated queries • XML-QL complex path expressions SQL query batches • Our focus: Faster query processing by avoiding repeated computation Multi-Query Optimization and Applications
Outline • Multi-query optimization • Application to related problems • Query result caching • Materialized view selection and maintenance • Conclusions and future work Multi-Query Optimization and Applications
Multi-Query Optimization Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000
Motivating Example Best Plan for A JOIN B JOIN C Best Plan for B JOIN C JOIN D 100 100 10 B A 100 100 10 C B C D 10 10 10 10 Total Cost = 460 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications
Motivating Example 100 100 10 10 BC 10 D A 10 10 100 C B 10 10 Total Cost = 370 Benefit = 90 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications
Problem Statement • Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs) • Assumption: No shared pipelines D A C B Common Subexpression Multi-Query Optimization and Applications
Problems • Locally optimal subplans may not be globally optimal • Mutually exclusive alternatives (A JOIN B JOIN C) (B JOIN C JOIN D) (C JOIN D JOIN E) What to share: (B JOIN C) or (C JOIN D) ? • Materializing and sharing a CSE not necessarily cheaper Multi-Query Optimization and Applications
Example Best Plan for A JOIN B JOIN C Best Plan for B JOIN C JOIN D 10 100 10 B A 1 10 10 C B C D 10 1 1 1 Total Cost = 154 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications
Example 100 10 10 10 BC 10 D A 1 10 10 C B 10 1 Total Cost = 172 Benefit = -18 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications
Approach • Set up the search space of execution plans • Explore the search space to find the best execution plan Multi-Query Optimization and Applications
Representation of Plan Space • AND/OR Query DAG BCD ABC Equivalence Class (OR node) Operation (AND node) BC CD AB Example Plan (Solution Graph) C D A B Multi-Query Optimization and Applications
DAG Generation ModificationsUnification • Volcano: Duplicate subexpressions No CSEs! ABC BCD BC BC CD AB C A B C D B • Modification: Duplicate subexpressions unified Multi-Query Optimization and Applications
DAG Generation ModificationsSubsumption • Volcano: No expression subsumption Missed CSEs (A>50) (A<10) (A>50) Subsumption derivation (A<10) (A>50) (A>50) (A<10 or A>50) (A>10) • Modification: Subsumption derivations introduced Multi-Query Optimization and Applications
Exploring the Search SpaceAn Exhaustive Algorithm Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • Y = set of equivalence nodes in DAG • Pick X Y which minimizes BestCost(Q, X) • Return X BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized Too expensive! Need heuristics. Multi-Query Optimization and Applications
Exploring the Search SpaceA Greedy Heuristic Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z}) Appeared in [Gupta, ICDT97]. Our Contribution: improve efficiency Multi-Query Optimization and Applications
Improving EfficiencySummary Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X • Restrict the set of materialization candidates • Compute Benefit efficiently • Heuristically avoid computing Benefit for some nodes Multi-Query Optimization and Applications
Improving EfficiencyOnly CSEs Materialized • CSEs identified in a bottom-up traversal BCD ABC Common Subexpression BC CD AB C D A B Multi-Query Optimization and Applications
Improving EfficiencySummary Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X • Restrict the set of materialization candidates • Compute Benefit efficiently • Heuristically avoid computing Benefit for some nodes Multi-Query Optimization and Applications
Efficient Benefit Computation Incremental Re-optimization X : Set of CSEs already materialized z : unmaterialized CSE Best plan given X materialized Best plan given X U {z} materialized • Observation Best plans change only for the ancestors of z Multi-Query Optimization and Applications
Incremental Re-optimization Example X = {} BCD ABC 230 230 230 120 120 230 z = (B JOIN C) 100 100 100 100 10 10 BC BC 10 130 CD AB 100 100 100 Best Plan C D A B 10 10 10 10 Multi-Query Optimization and Applications
Incremental Re-optimization Efficient Propagation • Ancestor nodes visited bottom-up in a topological order • Guarantees no revisits • Propagation path pruned if the current node’s best cost remains unchanged Multi-Query Optimization and Applications
Improving EfficiencySummary Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X • Restrict the set of materialization candidates • Compute Benefit efficiently • Heuristically avoid computing Benefit for some nodes Multi-Query Optimization and Applications
Avoiding Benefit Computation • Monotonicity Assumption • Benefit of a node does not increase due to materialization of other nodes • Often true • An earlier benefit of a node is an upper bound on its current benefit • Do not recompute a node’s benefit if another node’s current benefit is greater Optimization costs decrease by 90% Multi-Query Optimization and Applications
Experimental Results • TPCD-0.1 on Microsoft SQL Server 6.5 • using SQL rewriting for MQO Multi-Query Optimization and Applications
Alternatives to Greedy Volcano-SH • A lightweight post-pass heuristic • Compute the best plan for each query independently, using Volcano • Find the set of nodes in the best plans to materialize (cost-based) • Similar previous work [Subramanium and Venkataraman, SIGMOD 1998] Multi-Query Optimization and Applications
Alternatives to Greedy Volcano-RU • A lightweight extension of Volcano • Batched queries optimized in sequence Q1, Q2, …, Qn • Find the best plan for query Qi given the best plans for queries Qj, j < i • Cost based materialization of nodes in best plans of Qj, j < i • Plan quality sensitive to the query sequence Multi-Query Optimization and Applications
Experimental Results • TPCD-0.1 query batches Multi-Query Optimization and Applications
Experimental Results • TPCD-0.1 query batches Multi-Query Optimization and Applications
Features • Easily implemented • First MQO implementation integrated with a state-of-the-art optimizer (as far as we know) • Also partially prototyped on Microsoft SQL-Server • Support for index selection • Index modeled as physical property (like “interesting order”) • Extensible and flexible • New operators, data models • Readily adapts to other problems • Query result caching • Materialized view selection/maintenance Multi-Query Optimization and Applications
Query Result Caching P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don’t Trash Your Intermediate Results, Cache ‘em, Submitted for publication
Problem Statement • Minimize the total execution time of an online workload by • Caching intermediate/final results of individual queries, and • Using these cached results to answer later queries Multi-Query Optimization and Applications
System Model Multi-Query Optimization and Applications
Contributions • Intermediate as well as final results cached • Optimizer-driven cache management • Adapts to workload changes • Cache-aware cost-based optimization • Novel framework for cached result matching Multi-Query Optimization and Applications
Experimental Results • Overheads negligible • Performance on 900 query TPCD-1 based uniform cube-point workload Multi-Query Optimization and Applications
Materialized View Selection and Maintenance Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication
Problem Statement • Speed up maintenance of a set of materialized views by • Exploiting CSEs between different view maintenance expressions • Selecting additional views to be materialized Multi-Query Optimization and Applications
Contributions • Optimization of maintenance expressions • Support for transiently materialized “delta’’ views • Nicely integrates transient vs permanent view materialization choices Multi-Query Optimization and Applications
Experimental Results • Overheads negligible • Performance benefit for maintenance of two TPCD-0.1 based SPJA views Multi-Query Optimization and Applications
Conclusion • MQO is practical • Low overheads, high benefits • Easily implemented and integrated • Leads to novel solutions to related problems • Query result caching • Materialized view selection and maintenance Multi-Query Optimization and Applications
Future Work • Further extensions of MQO • Shared execution pipelines • Query result caching in presence of updates • Other problems • Continuous queries, XML view caching, etc. Multi-Query Optimization and Applications
Other Contributions • Garbage Collection in Object Oriented Databases • Developed a “transaction-aware” cyclic reference counting algorithm • Provided a formal proof of correctness S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997 Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998 Multi-Query Optimization and Applications