1 / 50

Online Analytical Processing (OLAP)

Understand OLAP, its data model, motivation, research areas, optimizations, and materialization strategies. Learn to express N-dimensional problems in 2D tables and enhance SQL queries with OLAP functionalities.

philliph
Download Presentation

Online Analytical Processing (OLAP)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3rd 2006

  2. Overview • Motivation • Multi-Dimensional Data Model • Research Areas • Optimizations • Materializing multiple aggregates simultaneously • Materialization strategy

  3. Motivation • Aggregation, summarization and exploration • Of historical data • To help management make informed decisions

  4. Different Goal • Aggregation, summarization and exploration • Of historical data • To help management make informed decisions • Find the total sales for each product and month • Find the percentage change in the total monthly sales for each product

  5. Different Requirements • OLTP – On-Line Transaction Processing • OLAP – On-Line Analytical Processing

  6. Overview • Motivation • Multi-Dimensional Data Model • Research Areas • Optimizations • Materializing multiple aggregates simultaneously • Materialization strategy

  7. Query Language Extensions • In the real world, data is stored in RDBs.

  8. Query Language Extensions • In the real world, data is stored in RDBs. • How to express N-dimensional problems using 2D tables?

  9. Query Language Extensions • In the real world, data is stored in RDBs. • How to express N-dimensional problems using 2D tables? • Can we combine OLAP and SQL queries? • Jim Gray et al:Data Cube: A Relational Aggregation Operator1997

  10. Query Language Extensions • histograms Problems with GROUP BY SELECT sales, prod_name, population FROM sales_history GROUP BY Population(City, State) as population

  11. Query Language Extensions • histograms • rollup/drilldown Problems with GROUP BY non relational representation

  12. Query Language Extensions • histograms • rollup/drilldown Problems with GROUP BY relational, but the rollup is huge

  13. Query Language Extensions • histograms • rollup/drilldown • cross tabulations Problems with GROUP BY 2-D aggregation is more compact and more natural:

  14. Query Language Extensions Reducing the number of attributes

  15. Query Language Extensions • introduce a new value: “ALL” Reducing the number of attributes “ALL” = the set over which we aggregate

  16. Query Language Extensions • GROUP BY (1D) General approach

  17. Query Language Extensions • GROUP BY (1D) • Cross Tab (2D) General approach the corresponding relation:

  18. Query Language Extensions • GROUP BY (1D) • Cross Tab (2D) • Cube (3D) General approach By cat. andname (does it make sense?) By cat.andmonth By monthand name

  19. Query Language Extensions • GROUP BY (1D) • Cross Tab (2D) • Cube (3D) • Any hypercube can be represented as a relation! General approach

  20. Query Language Extensions • a CUBE relation, with aggregation function f(.) (x1, x2, …, xn-1, xn, f() ) …………………………… (x1, xn-1, …, xn, ALL, f() ) …………………………… (x1, x2, …, ALL, xn, f() ) …………………………… • after ROLLUP , reduce to a linear # of tuples (x1, x2, …, xn-1, xn, f() ) ………………………………… (x1, xn-1, …, xn, ALL, f() ) ………………………………… (x1, x2, …, ALL, ALL, f() ) ………………………………… (ALL, ALL, …, ALL, ALL, f() ) General approach

  21. Query Language Extensions The new operators: CUBE, ROLLUP SELECT prod_category, prod_name, month, SUM(sales) AS sales FROM sales_history GROUP BY CUBE prod_category, prod_name, month Idea: Group by the CUBE list. Union the aggregates. Introduce the ALL values.

  22. Query Language Extensions The new operators: CUBE, ROLLUP SELECT prod_category, month, day, state, prod_name, SUM(sales) AS sales FROM sales_history GROUP BY prod_category ROLLUP month, day CUBE city, state

  23. Overview • Motivation • Multi-Dimensional Data Model • Research Areas • Optimizations • Materializing multiple aggregates simultaneously • Materialization strategy

  24. Research Areas • SQL language extensions • Server architecture • Parallel processing • Index structures • Materialized views

  25. Overview • Motivation • Multi-Dimensional Data Model • Research Areas • Optimizations • Materializing multiple aggregates simultaneously • Materialization strategy

  26. Simultaneous Multi-Dimensional • Optimization to calculate multiple aggregates simultaneously • Useful for materialization of aggregate views Aggregates • Y. Zhao, P. Deshpande, J. NaughtonAn Array-Based Algorithm for Simultaneous Multidimensional AggregatesSIGMOD 1997

  27. Multiple Aggregates Aggregate on…

  28. Multiple Aggregates Aggregate on…

  29. Multiple Aggregates Aggregate on… • Sales by Product / City • Sales by Product / Month • Sales by Month / City • Sales by Product • Sales by City • Sales by Month • Sales (Total) • Is it possible to • make a single pass over the transactional table? • calculate multiple aggregates simultaneously?

  30. Chunking 64 Partition transactional data into array chunks 14 15 16 13 12 9 10 11 42 Dimension B City 36 5 8 7 6 Array Chunk 20 1 2 3 4 1 Dimension C 12 Month Dimension A Product

  31. Naïve Algorithm 64 14 15 16 13 Dimension A 12 9 10 11 42 Dimension B 36 5 8 7 6 20 4 1 2 3 Dimension C Pivot on ABaggregate on all C Dimension A

  32. Naïve Algorithm 64 13 14 15 16 9 12 10 11 42 Dimension B 36 5 8 7 6 20 1 2 3 4 Dimension C Pivot on ABaggregate on all C Dimension A Pivot on ACaggregate on all B Pivot on BCaggregate on all A

  33. Single Pass Algorithm 64 AB 14 15 16 13 AC 12 9 10 11 42 B 36 5 8 7 6 20 1 2 3 4 Dimension C BC Dimension A Make a single pass over data

  34. Single Pass Algorithm 64 AB 14 15 16 13 AC 12 9 10 11 42 B 36 5 8 7 6 20 1 2 3 4 Dimension C BC Dimension A Simultaneously maintain multiple aggregates

  35. Single Pass Algorithm 64 AB 14 15 16 13 AC 12 9 10 11 42 B 36 5 8 7 6 20 1 2 3 4 Dimension C BC Dimension A Write out completed aggregates

  36. Single Pass Algorithm 64 AB 14 15 16 13 AC 12 9 10 11 42 B 36 5 8 7 6 20 1 2 3 4 Dimension C BC Dimension A Only allocate memory that is necessary

  37. Single Pass Algorithm AB Array Chunk ABC 4 x 4 x 4 AC AB 16 x 4 x 4 BC 4 x 4 AC 4 x 4 x 4 B 4 C 4 A 4 x 4 BC Minimummemory spanning tree all 1

  38. Multi Pass Algorithm Recursively aggregate ABCD ABC ACD ABD BCD AB AC BC BD CD AD D A C B all

  39. Overview • Motivation • Multi-Dimensional Data Model • Research Areas • Optimizations • Materializing multiple aggregates simultaneously • Materialization strategy

  40. Implementing Data Cubes • Biggest problem for data warehouses: the size • Space / time trade-off:accelerate queries by materializing the cube

  41. Implementing Data Cubes • Biggest problem for data warehouses: the size • Space / time trade-off:accelerate queries by materializing the cube • The size of the relations gets even bigger!

  42. Implementing Data Cubes • Biggest problem for data warehouses: the size • Space / time trade-off:accelerate queries by materializing the cube • The size of the relations gets even bigger! • M(ultidimensional)OLAP: good query performance, but bad scalability • R(elational)OLAP: very scalable; query performance improved by materializing (partial) results

  43. Implementing Data Cubes • V. Harinarayan, A. Rajaraman, J.D. Ullman:Implementing Data Cubes EfficientlySIGMOD 1996Presents a materialization strategy for the cells of the cube.

  44. Implementing Data Cubes

  45. Implementing Data Cubes • casted as particular case of the rewriting using views problem • what cells to materialize  what SQL views to materialize

  46. Implementing Data Cubes ptc pc pt tc t p c none • casted as particular case of the rewriting using views problem • what cells to materialize  what SQL views to materialize p = product t = time c = city • simple idea: Q1 depends on Q2(Q1≤Q2) if Q1 can be fully answered using the results of Q2

  47. Implementing Data Cubes city product_name day state product_category week month none year none none ptc pt tc pc … pwc pmc pts pcatt … … … ps pyc … • but cube dimensions are usually hierarchical X X p = product t = time c = city • direct-product lattice

  48. Implementing Data Cubes • Def. cost of answering Q = # of rows in the table of ancestor(Q) • It can be estimated w/o materializing the views • Assume that all queries are identical to some view in the lattice

  49. Implementing Data Cubes • For a set S and a view vB(v,S) = ∑w≤v, (w not in S) max{cost(w)-cost(v), 0} • Greedy algorithm for selecting k views to materialize from the lattice: • S := {top view} • For i=1 to k, add v to S s.t. B(v,S) is maximized • The greedy algorithm is an (e-1)/e ≈ 0.63 approx. of the optimum.

  50. Discussion • Questions from the audience…

More Related