320 likes | 424 Views
Achieving Scalability in OLAP Materialized View Selection. Thomas P. Nadeau Toby J. Teorey University of Michigan DOLAP 2002. Topics. Overview of OLAP Exponentiality in View Selection Our Polynomial Greedy Algorithm (PGA) Test Results Conclusions Current Work. Customer. CustID. Name.
E N D
Achieving Scalability in OLAP Materialized View Selection Thomas P. Nadeau Toby J. Teorey University of Michigan DOLAP 2002
Topics • Overview of OLAP • Exponentiality in View Selection • Our Polynomial Greedy Algorithm (PGA) • Test Results • Conclusions • Current Work
Customer CustID Name Fact Table City Calendar CustID State/Prov DateID DateID Bind Style BindID Month Cost BindID Quarter Sell Desc Year Example Star Schema
CustID Name City State/Prov DateID Month Quarter Year 1/1/98 Jan 1 1998 00001 U of M Ann Arbor MI 1/2/98 Jan 1 1998 00002 Smith & Co. Toronto Ont 12/31/00 Dec 4 2000 CustID DateID BindID Cost Sell 00002 12/31/00 PB $500 $600 BindID Desc 00222 1/1/99 HC $1100 $1300 PB Paper Back Many Rows HC Hard Cover Star Schema Viewed with Data Customer Calendar Fact Table Bind Style
Combinatorial Explosion d i = 1 • Possible views = ℓi, where d = |dimensions| ℓi = |levels| in dimension i • Book database example • 2 dimensions, 42 = 16 views • 4 dimensions, 44 = 256 views • 6 dimensions, 46 = 4,096 views • 8 dimensions, 48 = 65,536 views
Recap • Materialized views quicken query responses • Disk space limits view materialization • Update window is a constraint • Solution: Select strategic views
Fact Table View Size Estimation Sample Data Estimate Request Estimated View Size Completed Work Initial Data View Selection Strategic Views View Maintenance Incremental Data Update Current Work Current Views Queries Users Query Optimization Quick Responses Our OLAP Optimization Approach
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 View Selection:Example of Hypercube Lattice [HRU96] p = Part s = Supplier c = Customer
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Iteration 1 {p, s} {c, s} {c, p} {s} {p} {c} {} 5.2M x 4 = 20.8M 0 x 4 = 0 0 x 4 = 0 5.99M x 2 = 11.98M 5.8M x 2 = 11.6M 5.9M x 2 = 11.8M 6M - 1 Example of HRU Algorithm [HRU96] p = Part s = Supplier c = Customer Benefits of Possible Materialization Choices
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Iteration 1 {p, s} {c, s} {c, p} {s} {p} {c} {} 5.2M x 4 = 20.8M 0 x 4 = 0 0 x 4 = 0 5.99M x 2 = 11.98M 5.8M x 2 = 11.6M 5.9M x 2 = 11.8M 6M - 1 Example of HRU p = Part s = Supplier c = Customer Benefits of Possible Materialization Choices Iteration 2 0 x 4 = 0 0 x 4 = 0 0.79M x 2 = 1.58M 0.6M x 2 = 1.2M 5.9M x 2 = 11.8M 0.8M - 1
Exponentiality in HRU • O(kn2) time, where k = |views to select|, n = |possible views| • n = 2d in non-hierarchical database, where d = |dimensions| • HRU algorithm is O(k22d) time • Two sources of exponentiality • Each possible view is evaluated • Each view evaluation considers the effect of materialization on every descendent
Polynomial Greedy Algorithm (PGA) Nomination Selection For each candidate Select fact table Evaluate benefit Start new path [more candidates] [path ended] [else] Select view greedily [continuing path] Nominate smallest child view [else] [termination condition met]
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Example of PGA [NT02] p = Part s = Supplier c = Customer
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Candidates {p, s} {s} {} Example of PGA p = Part s = Supplier c = Customer Nomination
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Candidates Iteration 1 {p, s} {s} {} 5.2M x 4 = 20.8M 5.99M x 2 = 11.98M 6M - 1 Example of PGA p = Part s = Supplier c = Customer Nomination Selection
Candidates Iteration 1 {p, s} {s} {} 5.2M x 4 = 20.8M 5.99M x 2 = 11.98M 6M - 1 Example of PGA {c, p, s} 6M p = Part s = Supplier c = Customer {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Nomination Selection Nomination Candidates {c, s} {s} {c} {}
{c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Candidates Iteration 1 Candidates Iteration 2 {c, s} {s} {c} {} 0 x 2 = 0 0.79M x 2 = 1.58M 5.9M x 2 = 11.8M 6M - 1 {p, s} {s} {} 5.2M x 4 = 20.8M 5.99M x 2 = 11.98M 6M - 1 Example of PGA p = Part s = Supplier c = Customer Nomination Selection Nomination Selection
Nomination Complexity • Maximum swatch width is d. • Maximum path length is d. • Finding one path is O(d2) time • Our strategy nominates a path each time a view is selected, complexity is O(d2k) time
Evaluating Views in PGA • Polynomial time evaluation requires approximating materialization benefits • Account for smallest ancestor • Account for materialized view with largest overlap in descendants • Complexity of our algorithm is O(d2k2)
Near Optimal Selection Query Costs (rows) d=2, ℓ = 4 Materialization Costs (rows)
HRU PGA Query Costs at Four Dimensions Query Costs (thousands of rows) Materialization Costs (thousands of rows)
HRU PGA Query Costs at Six Dimensions Query Costs (millions of rows) Materialization Costs (thousands of rows)
HRU PGA Query Costs at Eight Dimensions Query Costs (millions of rows) Materialization Costs (thousands of rows)
HRU PGA Performance at Four Dimensions Processing Time (seconds) Materialization Costs (thousands of rows)
HRU PGA Performance at Six Dimensions Processing Time (minutes) Materialization Costs (thousands of rows)
HRU PGA Performance at Eight Dimensions Processing Time (minutes) Materialization Costs (thousands of rows)
Conclusions • PGA finds a good set of views for materialization, when HRU fails due to algorithm complexity • PGA extends the usefulness of OLAP systems into higher dimensionality
Fact Table View Size Estimation Sample Data Estimate Request Estimated View Size Completed Work Initial Data View Selection Strategic Views View Maintenance Incremental Data Update Current Work Current Views Queries Users Query Optimization Quick Responses Current Work
Current Work • Design alternative data structures for materialized views in OLAP • Test impact of new data structures on update and query costs. • Integrate our work into an OLAP system
References • [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205 - 216, Montreal, Canada. • [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp 1 – 13, Toronto, Canada. • [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html.