440 likes | 616 Views
Quotient Cube: How to Summarize the Semantics of a Data Cube. Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign) + * The work is partially supported by NSERC and NCE/IRIS
E N D
Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia)* Jian Pei (State Univ. of New York at Buffalo)* Jiawei Han (Univ. of Illinois at Urbana-Champaign)+ * The work is partially supported by NSERC and NCE/IRIS + The work is partially supported by NSF, UI, and Microsoft Research
Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Data Cube Base table Aggregation Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Previous Work: Efficient Cube Computation • Compute a cube from a base table: e.g. (Agarwal et al. 98), (Zhao et al. 97) • View materialization with space constraint: e.g. Harinarayann et al. 96 • Handling scarcity (Ross & Srivastava 97) • Cube compression: e.g. (Sismanis et al. 02), (Shanmugasundaram et al. 99), (Want et al. 02) • Approximation: e.g. (Barbara & Sullivan 97), (Barbara & Xu 00), (Vitter et al. 98) • Constrained cube construction: e.g. (Beyer & Ramakrishnan 99) Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Previous Work: Extracting Semantics From Cubes • General contexts of patterns (Sathe & Sarawagi 01) • Generalize association rules (Imielinski et al. 00) • Cube gradient analysis (Dong et al. 01) Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Cube (Cell) Lattice • Many cells have same aggregate values • Can we summarize the semantics of the cube by grouping cells by aggregate values? Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 C1 C2 C3 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9 C4 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 A Naïve Attempt • Put all cells having same aggregate value in a class Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
C1 C2 C3 (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9 C4 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Problems w/ the Naïve Attempt • The result is not a lattice anymore! • Anomaly • The rollup/drilldown semantics is lost Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
C1 C2 C3 (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 C4 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 C5 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 A Better Partitioning • Quotient cube: partitioning reserving the rollup/drilldown semantics Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Problem Statement • Given a cube, characterize a good way (quotient cube) of partitioning its cells into classes such that • The partition generates a reduced lattice preserving the rollup/drilldown semantics • The partition is optimal: # classes as small as possible • Compute quotient cubes efficiently Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Why A Quotient Cube Useful? • Semantic compression • Semantic OLAP browsing C3 C1 C2 C4 C5 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Why A Quotient Cube Useful? • Semantic compression • Semantic OLAP browsing C1 C2 C4 C5 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
C1 C2 C3 (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 C4 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 C5 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Convex Partitions • A convex partition retains semantics Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
C1 C2 C3 (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9 C4 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 A Non-convex Partition • Anomaly • The rollup/drilldown semantics is lost Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Connected Partitions • Cells c1 and c2 are connected if a series of rollup/drilldown operation starting from c1 can touch c2 • Intuitively, (each class of) a partition should be connected Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Cover Partition • For a cell c, a tuple t in base table is in c’s cover if t can be rolled up to c • E.g., Cov(S1,*,spring)={(S1,P1,spring), (S1,P2,spring)} Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Cover Partitions Are Convex • All cells having the same cover are in a class • (S1,P2,s) and (*,P2,*) cover same tuples in the base table (S1,P2,*) and (*,P2,s) are in the same class. Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Cover Partitions Are Connected • Cells c1 and c2 have the same cover there must be some common ancestor c3 of c1 and c2 st c3 has the same cover • Cells c1 and c2 are in the same class and connected Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Cover Partitions & Aggregates • All cells in a cover partition carry the same aggregate value w.r.t. any aggregate function • But cells in a class of MIN() may have different covers • For COUNT() and SUM() (positive), cover equivalence coincides with aggregate equivalence Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Weak Congruence • Weak congruence preserves semantics Class 1 = Class 2 Class 1 c c’ c c’ rollup rollup imply rollup rollup Class 2 d d’ d d’ Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Weak Congruence = Convex • Convex no “hole” in the class weak congruence • They preserve the rollup/drilldown semantics • Quotient cube lattice is the lattice of convex classes • How to derive the coarsest quotient cube? Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Monotone Aggregate Functions • Monotone functions • S T f(S) f(T) • S T f(S) f(T) • MIN(), MAX(), COUNT(), PSUM(), … • The aggregate function f is monotone f is the unique coarsest partition • MIN(): put all cells having the same MIN() value into a class Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Non-monotone Functions • Bad news: f may or may not be a convex/weak congruence. • Good news: cover partition is convex (I.e., weak congruence) and always yields a quotient cube w.r.t. any aggregate function! Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
How to Compute A QC • Aggregate functions • Monotone functions • Non-monotone functions • Settings • The cube is available • Only the base table is available Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Monotone Functions • The cube is available grab all cells with the same aggregate value and put them into a class • Only the base table is available bottom-up, depth-first search • For a cell, compute its cover, find the upper bound having the same aggregate value • Group lower bounds by upper bounds Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Example: Cover QC Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Non-monotone Functions • Class merging • Find cover partition classes • Merge classes as long as convexity is retained Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Example: AVG QC Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Reduction Ratio vs. Dimensionality # base tuples = 200k Zipf factor = 2.0 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Reduction Ratio vs. Zipf Factor # base tuples = 200k # dimensions = 6 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Reduction Ratio vs. Base Table Size Zipf factor = 2.0 # dimensions = 6 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Runtime Zipf factor = 2.0 # dimensions = 6 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Compression Ratio on Weather Data Set Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Semantic Cube Exploration • Theoretical foundation for semantic summarization in data cube • concept and properties of quotient cubes • Efficient algorithms for quotient cube construction • Quotient cubes can be computed directly from base tables Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Ongoing Research • Efficient implementation of quotient cube-based OLAP system • Data warehouse built using quotient cubes • Hierarchies and constraints • Incremental maintenance • Semantics based OLAP and mining • Efficient query answering Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
References (1) • R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994 • S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB, 1996. • D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12--17, 1997. • D. Barbara and X. Wu. Using loglinear models to compress datacube. In WAIM'2000}, pages 311--322, 2000. • K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99. Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Reference (2) • G. Birkhoff, Lattice Theory, 2nd edition, New York, American Mathematical Society (Colloquium Publications, vol. 25), 1948. • S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE'99. • Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE'96. • C.-T. Ho, J. Bruck, and R. Agrawal. Partial-sum queries in data cubes using covering codes. In PODS'97. • J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In SIGMOD'01. Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Reference (3) • V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96. • T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing Association Rules. Technical Report, Rutgers University, August 2000. • H. V. Jagadish, J. Madar, R.T. Ng. Semantic Compression and Pattern Extraction with Fascicles. VLDB'99. • K. Ross and D. Srivastava. Fast computation of sparse datacubes. In VLDB'97. • G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. VLDB'01. Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube
Reference (4) • J. Shanmugasundaram, U.M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. SIGKDD’99. • J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and historgrams via wavelets. In CIKM'98. • W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed cube: An effective approach to reducing data cube size. In ICDE'02. • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97. • G.K. Zipf. Human Behavior and The Principle of Least Effort Addison-Wesley, 1949. Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube