130 likes | 286 Views
20.5 Data Cubes. Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222. Outline. Introduction The Cube Operator Aggregation Cube implementation by Materialized Views Materialized Views Queries using materialized view The Lattice of Views Lattice of a materialized view. Introduction.
E N D
20.5Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222
Outline • Introduction • The Cube Operator • Aggregation • Cube implementation by Materialized Views • Materialized Views • Queries using materialized view • The Lattice of Views • Lattice of a materialized view
Introduction • A formal data cube precomputes all possible aggregates in a systematic way. • Amount of extra storage required is tolerable. • If the warehoused data does not change, there is no penalty incurred to keep the aggregates up-to-date.
The Cube Operator • Given : a fact table F, an augmented table would be CUBE(F) and it adds a new value ‘*’, to each dimension. • * means any. It aggregates dimension in which it appears. • Border represents only a small addition to the volume of the CUBE. • A tuple of CUBE(F) that has ‘*’ in one or more dimension will have sum of values of that dependent attribute in all tuples in that dimension.
Aggregation Example of aggregation : Given fact table: • Sales (serialNo, date, dealer, price) • serialNo is a key, so aggregating over all dates, dealers or prices will still give just one result with that serialNo • Replace serialNo by two attributes : model and color (both non-key) • It could give total sales price for • a given model • a given color • a given dealer • a given date • To get average sales, add a new dimension cnt and replace price with val i.e. sum of prices over a category • Relation Sales now becomes: Sales (model, color, date, dealer, val, cnt)
Example of a tuple in both Sales and CUBE(Sales) would be: (‘Gobi’, ‘red’, ‘2001-05-21’, ‘Friendly-Fred’, 45000, 2) • Aggregating over all colors: (‘Gobi’, *, ‘2001-05-21’, ‘Friendly-Fred’, 152000, 7) • Aggregating over all colors and dealers: (‘Gobi’, *, ‘2001-05-21’, *, 2348000, 100) • Aggregating over all colors, times and dealers: (‘Gobi’, *, *, *, 1339800000, 58000) • Aggregating over all colors, times, dealers and models: (*, *, *, *, 3521727000, 198000) Aggregating over all dimensions tells us that the total sales of all models in all colors, over all time at all dealers is 198000 cars for a total price of $3521727000
Query and Tuple: • If the query specifies a value v for attribute a, then the tuple t has v in its component for a. • If the query groups by an attribute a, then t has a non-* value in its component for a. • If the query neither groups by an attribute a nor specifies a value a, then t has a * value in its component for a. • Example: • Query: SELECT color, AVG(price) FROM Sales WHERE model = ‘Gobi’ GROUP BY color; • Tuple: (‘Gobi’, c, *, *, v, n)
Cube implementation by Materialized Views • Until now, we assumed that queries choose to aggregate completely in a dimension or not to aggregate at all. • Some dimensions can have many degrees of granularity. Example: • days can be aggregated by weeks, months, quarters or years • As number of choices of grouping increases, cost of storing the results increases too. • Answer to this is : chose some materialized views of the data cube.
Materialized Views • A materialized view is the result of some query, which we choose to store in the database, rather than reconstructing it as needed in response to queries. • Coarser the partition, the less space materialized view takes. • To use a view to answer a query, the view must not partition any more coarser than the query does. • Example: Group dates by months and dealers by city • INSERT INTO SalesV1 SELECT model, color, month, city, SUM(val) AS val, SUM(cnt) AS cnt FROM Sales JOIN Dealers on dealer = name GROUP BY model, color, month, city; • Example: Group dates by weeks and dealers by state and over all colors • INSERT INTO SalesV2 SELECT model, week, state, SUM(val) AS val, SUM(cnt) AS cnt FROM Sales JOIN Dealers on dealer = name GROUP BY model, week, state;
Queries using materialized view Q1 : SELECT model, SUM(val) FROM Sales GROUP BY model; can be answered by: SELECT model, SUM(val) FROM SalesV1 GROUP BY model; or SELECT model, SUM(val) FROM SalesV2 GROUP BY model; Q2 : SELECT model, year, state, SUM(val) FROM Sales JOIN Dealers ON dealer = name GROUP BY model, year, state; can only be answered by: SELECT model, year, state, SUM(val) FROM SalesV1 GROUP BY model, year, state; Q3 : SELECT model, color, date, SUM(val) FROM Sales GROUP BY model, color, date; can be answered by neither SalesV1 nor SalesV2.
The Lattice of Views • Lattice arranges groupings of similar dimensions in a hierarchical order. • Points of lattice are the ways we can partition a dimension. • If partition P1 is below partition P2 (P1≤P2), if each group of P1 is contained within some group of P2. • Examples of lattice: Lattice of partitions on time interval Lattice of partitions on dealers
Lattice of a materialized view • If V1 and V2 are two different views, then V1≤ V2 means in each dimension, the partition P1 from view V1 is at least as fine as partition P2 from view V2; that is P1≤P2. • General rule: We can answer a query Q using view V if and only if V ≤ Q. Example of view lattice: