350 likes | 538 Views
Materialized View Selection and Maintenance using Multi-Query Optimization. Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham. Materialized Views. Complex results materialized in order to speed up queries that depend on these results
E N D
Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham
Materialized Views Complex results materialized in order to speed up queries that depend on these results • Increasingly being supported by commercial database systems (e.g. Oracle8i) • Crucial in data warehousing environments
Materialized View Maintenance As underlying data changes, the materialized views need to be refreshed Efficient view maintenance crucial! • Need to provide up-to-date query responses growing • Amount of data added to data warehouses increasing • Maintenance time window shrinking
Focus Efficient techniques for maintenance of a set of materialized views (MVs) by • Transient materialization of common subexpressions (CSEs) • Selection of additional MVs • Computation of the best maintenance policy and plan for each MV
Transient Materialization of Common Subexpressions CSEs materialized to reduce maintenance cost by sharing computation, disposed after use • Motivated by Blakeley et al. [SIGMOD86], Ross et al. [SIGMOD96] • Huge search space; considered impractical • Earlier work by Sellis [TODS88] • Efficient heuristic algorithms proposed by Roy et al. [SIGMOD00]
Selection of Additional MVs Additional views materialized permanently to reduce the overall maintenance cost • Motivated by Ross et al. [SIGMOD96] • restricted to incremental maintenance only • do not consider transient materialization • MV selection in general addressed in Roussopolous [TODS82], Agrawal et al. [VLDB00]
Best Maintenance Policy and Plan Computation For each MV, • Determine the best maintenance policy (incremental or recomputation) • Find the corresponding best plan • Earlier work by Vista [EDBT98] • Does not take into account transient materialization of CSEs or presence of other MVs • Current systems need manual specification of the maintenance policy
Contribution A framework that consolidates the choice of • CSEs to be transiently materialized • Additional MVs • Best maintenance plan (incremental/recomputation) Integrated with a state of the art query optimizer (Volcano [ICDE93])
Example initial set incremental refresh recomputation recomputation CDE BCDE ABC permanent permanent permanent merge incremental refresh DE permanent BC merge transient dA B C D dE
Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection
Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection
Setting Up the Maintenance Plan Space • The Query DAG representation for recomputation plans • Incorporating incremental plans
Representation of the Recomputation Plan Space AND/OR Query DAG BCD ABC Equivalence Class (OR node) Operation (AND node) BC CD AB Best Plan C D A B Additionally incorporates subsumption derivations Details in Roy et al. [SIGMOD00]
Incremental Plans:Propagation Based Differential Generation • Differentials propagated one at a time • For each differential dR • Start at dR and compute node differentials bottom-up along the “best plan” in a topological order • Differential of a node computed as a function of its inputs and their differentials • e.g. d(E1E2) = E1 dE2 U E2dE1 U dE1dE2 where dEi= differential of Ei wrt dR • Refresh the relation R and the affected MVs wrt dR by merging with the differentials computed as above Ross et al. [SIGMOD96]
Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dA BCdA Equivalence Class (OR node) Operation (AND node) BC BdA Best Plan C dA B
Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dB CDdB ACdB Equivalence Class (OR node) Operation (AND node) CdB CD AdB Best Plan C D A dB
Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dC BDdC ABdC Equivalence Class (OR node) Operation (AND node) BdC DdC AB Best Plan dC D A B
Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dD BCdD Equivalence Class (OR node) Operation (AND node) BC CdD Best Plan C dD B
Incorporating Incremental Plans Merge operator AB Logical representation incremental plan recomputation plan BdA AdB dA B A dB • For each equiv node and each base differential affecting it • Introduce a new equiv node representing its differential • Populate with the differential plans • Maintain statistics for the full expression after successive merges Large space overhead!
Incorporating Incremental Plans Actual space-efficient representation AB BdA AdB dA B A dB Reuse the same structure for successive propagation cycles • separate best plan pointers for each cycle • separate statistics for the full expression after successive merges Also incorporates sort-orders, indices, etc. Roy et al. [SIGMOD00]
Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection
Maintenance Plan Computation Given • Set of nodes Mt materialized transiently • can include full results as well as differentials • Set of nodes Mp materialized permanently • includes full results but not differentials compute the best consolidated maintenance plan for Mp
Maintenance Plan Computation Best plan computed using a query optimizer extended as follows: • Plan accessing a materialized view (trans/perm) does not include its computation, only its use • Cost of a maintenance plan totalcost(Mp, Mt) = eMpmaintcost(e | Mp, Mt) + eMttrmatcost(e | Mp, Mt) where maintcost(Mp, Mt): cost of cheapest maintenance plan for e (recomputation/incremental) trmatcost(Mp, Mt): cost of computing and materializing e
Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection
Transient/Permanent Materialized View Selection Given set of MVs M already materialized, determine • Set of nodes Mt to materialize transiently • Set of nodes Mp ( M) to materialize permanently such that totalcost(Mp, Mt) is minimized Exhaustive approach too expensive. Need heuristics!
Transient/Permanent Materialized View SelectionA Greedy Heuristic Input: Initial MVs M Output: Mp ( M) , Mt, corresp. best plan Begin Mp = M; Mt = {} S = set of equivalence nodes in the DAG for M While ( S {} ) Pick z S which maximizes Benefit(z | Mp, Mt) If ( Benefit(z | Mp, Mt) 0 ) break If ( z is a full result and maintcost(z | Mp, Mt) < trmatcost(z | Mp, Mt) ) Mp =Mp U {z} else Mt = Mt U {z} S = S – {z} Return (Mp, Mt) End How to compute Benefit(z | Mp, Mt)?
Transient/Permanent Materialized View SelectionBenefit Computation Benefit(z | Mp, Mt) = gain(z | Mp, Mt) - investment(z | Mp, Mt) where gain(z | Mp, Mt) = eMp(maintcost(e | Mp, Mt) - maintcost(e | Mp, Mt U {z})) + eMt(trmatcost(e | Mp, Mt) - trmatcost(e | Mp, Mt U {z})) and investment(z | Mp, Mt) = min(maintcost(z | Mp, Mt), trmatcost(z | Mp, Mt)) if z is a full result trmatcost(z | Mp, Mt) if z is a differential Benefit computation expensive. Need efficient techniques!
Transient/Permanent Materialized View SelectionImproving Efficiency of the Greedy Heuristic • Cost-propagation based incremental techniques to efficiently compute Benefit • Monotonicity assumption • Reduces the number of Benefit computations • Techniques to determine if a node can be shared across a given maintenance plan • Reduces the number of nodes considered for transient materialization Adapted from Roy et al. [SIGMOD00]. See paper for details.
Benchmark • Single Views • Same views as above, refreshed separately • Set of Views • 10 views (5 with aggregates, 5 without) on 8 distinct relations, refreshed together
Effect of Transient and Permanent Materialization Single Views Set of Views
Effect of Adaptive Maintenance Policy Selection Single Views Set of Views
Scalability Analysis Optimization Memory Requirements Optimization Time Negligible one-time costs
Conclusion Presented techniques • Automate sharing of computation • Automate view selection • Automate maintenance policy selection and plan computation • Do the above in an integrated manner • leading to benefits greater than could be achieved by considering each dimension individually • Are efficient and scalable • the overall benefits greatly outweigh the one-time cost • Integrate with state-of-the-art optimizers (e.g. MS SQL-Server)
Future Work Extend presented techniques • To handle limited space • To speed up a workload of queries in addition to maintenance of a set of materialized views • To work in dynamic query result caching environments