280 likes | 386 Views
Incremental Maintenance for Non-Distributive Aggregate Functions. Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh. work done at IBM Almaden Research Center. Motivation. large amounts of data stored in databases often times data warehouses are used
E N D
Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center
Motivation • large amounts of data stored in databases • often times data warehouses are used • consolidate data from many sources • offer more general and descriptive view of data • queried by business intelligence tools and decision support systems • produce expensive OLAP queries • these OLAP queries have nice properties: • based on same set of tables • perform similar aggregations Themis Palpanas - U of Toronto
Motivation (cont’d) • can efficiently support such queries with Automatic Summary Tables (ASTs) • materialized queries defined over a set of base tables • precomputed once, used many times • answer complex queries fast • must maintain ASTs when base tables change • inserts, updates, deletes Themis Palpanas - U of Toronto
base tables insert/update/delete Motivation (cont’d) AST AST definition
Aggregate Functions • characterization of functions wrt insertion and deletion operations • updates are series of deletions and insertions • distributive aggregate functions • new value computed based on old value and value of operation • SUM() • non-distributive aggregate functions • above property does not hold • STDDEV() • MIN() (because of deletions) Themis Palpanas - U of Toronto
Problem Statement • given ASTs with aggregate functions • distributive • SUM, COUNT • non-distributive • STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, … • when base tables change • incrementally maintain affected ASTs efficient maintenance of ASTs with non-distributive aggregate functions Themis Palpanas - U of Toronto
Outline • Current Approach • Our Solution • Experimental Evaluation • Related Work • Conclusions Themis Palpanas - U of Toronto
Propagate phase Apply phase combine old and new values base tables insert/update/delete Current Approach AST delta AST definition
Current Approach (cont’d) • works for distributive • SUM, COUNT • does not work for non-distributive • STDDEV, CORRELATION, REGRESSION • MIN/MAX • XMLAGG • need new way to deal with these functions Themis Palpanas - U of Toronto
Our Solution • selective recomputation • no longer enough to compute delta • must recompute some aggregation groups • minimize work to be done • choose which groups to recompute • optimize query plan Themis Palpanas - U of Toronto
Propagate phase Apply phase base tables insert/update/delete Our Solution (cont’d) combine old and new values recompute affected groups AST delta AST definition
Our Solution (cont’d) • the 5 steps • compute new aggregate values • change column derivation • recompute only affected groups • eliminate unnecessary operations • optimize for special cases Themis Palpanas - U of Toronto
UDI LOJ AST prop Initial Query Plan • Query Graph Model (QGM) Themis Palpanas - U of Toronto
UDI LOJ AST LOJ prop AST 1. Compute New Aggregate Values • compute delta for distributive functions • recompute non-distributive functions • get those values only for affected groups • duplicate computation for distributive functions! Themis Palpanas - U of Toronto
2. Change Column Derivation UDI • change column derivation • rewrite phase projects out unused columns • entire AST gets recomputed! LOJ AST LOJ non-distributive only prop AST distributive only Themis Palpanas - U of Toronto
2. Change Column Derivation • example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id • result of COUNT() computed from old propagate phase • results of MAX() and STDDEV() from AST definition Themis Palpanas - U of Toronto
3. Recompute Affected Groups • push join predicate down in AST • only affected groups are recomputed • special rules for super-aggregates • GROUPING SETS • ROLLUP • CUBE UDI non-distributive only LOJ AST LOJ distributive only AST* J J prop … T1 Tk Themis Palpanas - U of Toronto
3. Recompute Affected Groups • special treatment for ASTs with super-aggregates • predicates not pushdownable • caution not to compute totals of totals • build special join predicate • ensure correct aggregations • change rewrite rules • allow predicate pushdown through super aggregates • applicable only for special join predicate Themis Palpanas - U of Toronto
4. Remove Unnecessary Operations • outerjoin not always needed • when changes are only inserts • reroute columns from propagate phase through AST • remove outerjoin operator • same for updates not referencing AST grouping columns and predicates UDI LOJ all columns AST distributive only AST J J prop … T1 Tk Themis Palpanas - U of Toronto
4. Remove Unnecessary Operations • example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id • modification on base tables: UPDATE employees SET salary=10 WHERE age>40 • outerjoin operation will not be built • update does not refer to grouping column (dept_id), and no predicate in AST refers to updated column (salary) • certain that no tuples in AST will be deleted • only STDDEV() will be recomputed • the rest are not affected by changes Themis Palpanas - U of Toronto
5. Optimize for Special Cases • recomputation step not needed when • only insertions and only MIN/MAX functions • build predicate in apply phase • check if new min/max should replace old values • only deletions referring only to grouping columns of AST • can only cause entire groups to be deleted • handled in apply phase Themis Palpanas - U of Toronto
5. Optimize for Special Cases • example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id • modification on base tables: DELETE FROM employees WHERE dept_id>40 • selective recomputation step not needed • deletion refers only to grouping column (dept_id) • certain that entire groups will be deleted from AST • no other groups will be affected Themis Palpanas - U of Toronto
Experimental Evaluation • prototype implementation in IBM DB2 UDB • star schema database • sales of products over 5 year time period • fact table: 10 million tuples • AST with non-distributive aggregate function • 240,000 tuples • workload simulates nightly updates • add/delete data for first day of month • add/delete data for second day of month • add/delete data for full month Themis Palpanas - U of Toronto
Experimental Evaluation (cont’d) • deletions require 40-60% of full refresh time • optimized deletions require 1-4% of full refresh time Themis Palpanas - U of Toronto
Experimental Evaluation (cont’d) • insertions/updates require 20-25% of full refresh time Themis Palpanas - U of Toronto
Related Work • incremental view maintenance • differential refresh algorithms • Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991 • deferred incremental maintenance • Colby et al. 1996, Salem et al. 2000 • views with aggregation • Quass 1996, Mumick et al. 1997 Themis Palpanas - U of Toronto
Conclusions • incremental maintenance for ASTs with non-distributive aggregate functions • support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, … • efficient selective recomputation • recompute only affected groups • optimize query plan • customize for special cases • significant performance improvements Themis Palpanas - U of Toronto
Future Work • examine use of work areas • temporary storage space • store intermediate values • maintenance without recomputation • STDDEV, MIN/MAX(?), … • very helpful for ASTs defined with super-aggregates • ASTs with HAVING clauses • do not know when groups will enter/leave AST Themis Palpanas - U of Toronto