990 likes | 1.01k Views
VLDB 2006. CURE for Cubes: C ubing U sing a R OLAP E ngine. Konstantinos Morfonios Yannis Ioannidis. University of Athens. Introduction. Execution Plan. External Partitioning. Storage Format. Experimental Evaluation. Conclusions. Introduction. Execution Plan. External Partitioning.
E N D
VLDB 2006 CURE for Cubes:Cubing Using a ROLAP Engine Konstantinos Morfonios Yannis Ioannidis University of Athens
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
GrayOnData-warehousing: CUBE Introduction SELECT region, sum(revenue) FROM SALES WHERE month = ‘September’ GROUP BY region
CUBE Introduction SELECT A, B, C, SUM(M) FROM R GROUP BY A, B, C SELECT A, B, SUM(M) FROM R GROUP BY A, B SELECT SUM(M) FROM R
Introduction • Problems • Construction algorithm • Storage scheme • Focusing on ROLAP techniques (MVs) • Stressed to limits? • Complete solution? Unclear (not finished with efficient storage) Unclear (not focused on hierarchies)
Number of nodes: often CURE Introduction Challenges of hierarchies: Efficient execution plan • Small domains in the higher levels of dimension hierarchies New partitioning algorithm • Number of tuples increases Novel storage scheme
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Execution Plan • Extend BUC (Bottom-Up-Cube) [BR99] • Efficient pipelining • Cheap identification of some kinds of redundancy • Inherent support for iceberg cubes and holistic functions • Existing “BUC-based” methods: BU-BST [WLFY02] and QC-Tables [LPH02]
Execution Plan Dimensions:A, B, C ABC AB AC BC A B C
Execution Plan Dimensions:A0→A1→A2, B0→B1, C0
A0B0C0 A0B1C0 A1B0C0 A1B1C0 A2B0C0 A2B1C0 A0B0 A0B1 A0C0 A1B0 A1B1 A1C0 A2B0 A2B1 A2C0 B0C0 B1C0 A0 A1 A2 B0 B1 C0 Execution Plan Dimensions:A0, A1, A2, B0, B1, C0
A0B0C0 A0B1C0 A1B0C0 A1B1C0 A2B0C0 A2B1C0 A0B0 A0B1 A0C0 A1B0 A1B1 A1C0 A2B0 A2B1 A2C0 B0C0 B1C0 A0 A1 A2 B0 B1 C0 Execution Plan Dimensions:A0, A1, A2, B0, B1, C0
A0B0C0 A0B1C0 A1B0C0 A1B1C0 A2B0C0 A2B1C0 A0B0 A0B1 A0C0 A1B0 A1B1 A1C0 A2B0 A2B1 A2C0 B0C0 B1C0 A0 A1 A2 B0 B1 C0 Execution Plan Dimensions:A0, A1, A2, B0, B1, C0 Height: 3
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 Execution Plan Dimensions:A0→A1→A2, B0→B1, C0
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 Execution Plan Dimensions:A0→A1→A2, B0→B1, C0
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 Execution Plan Dimensions:A0→A1→A2, B0→B1, C0 Height: 6
ABC AB AC BC A B C Execution Plan • Important properties of BUC-based cubing: • Recursive calls at higher levels tend to be cheaper • Benefits from early pruning recursion at some node N increase with the number of ancestors of N in the execution plan • Advantage of taller execution plans ABC AB AC A
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 Execution Plan CURE’s Plan:
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Memory External Partitioning R
A0B0C0 A0B0C0 A0B0 A0B0 A0B1C0 A0B1C0 A1B0C0 A1B0C0 A0B1 A0B1 A0C0 A0C0 A1B0 A1B0 A1B1C0 A1B1C0 A2B0C0 A2B0C0 A0 A0 A1B1 A1B1 A1C0 A1C0 A2B0 A2B0 A2B1C0 A2B1C0 B0C0 B0C0 A1 A1 A2B1 A2B1 A2C0 A2C0 B0 B0 B1C0 B1C0 A2 A2 B1 B1 C0 C0 External Partitioning Memory R
External Partitioning Memory R
Partitions External Partitioning Memory R
Partitions External Partitioning Sound Memory R
External Partitioning • For sound partitioning |Biggest partition|≤|M| • In flat datasets this holds in general • In hierarchical datasets…
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2(5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1(500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5) |A0|/|A2| times smaller than R |A2B0C0| ≈ 50 MB
A0B0C0 A0B0 A0B1C0 A1B0C0 A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 External Partitioning |R| = 500 GB, |M| = 1 GB |R|/|M| = 500 A0 (50,000)→A1 (500)→A2 (5)
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Introduction Execution Plan External Partitioning Storage Format Experimental Evaluation Conclusions
Storage Format • Two types of redundancy • Dimensional Redundancy (DR) • Aggregational Redundancy (AR)
A0B0C0 A0B0 A0B1C0 A1B0C0 ABC AB AC BC A0B1 A0C0 A1B0 A1B1C0 A2B0C0 A B C A0 A1B1 A1C0 A2B0 A2B1C0 B0C0 A1 A2B1 A2C0 B0 B1C0 A2 B1 C0 Storage Format Example with flat cube only for simplicity
t1 t2 t t’ Storage Format CUBE with DR CUBE’ without DR
t1 t2 t t’ Storage Format CUBE with DR CUBE’ without DR
t1 t2 t t’ Storage Format CUBE with DR CUBE’ without DR
Storage Format CUBE with DR CUBE’ without DR
Storage Format CUBE with DR CUBE’ without DR
Storage Format Classify tuples according to AR into: • Normal Tuples (NTs) • Trivial Tuples (TTs) • Common Aggregate • Tuples (CATs) CUBE with DR CUBE’ without DR