Computing the cube Abhinandan Das CS 632 Mar 8 2001
On the Computation of Multidimensional Aggregates • Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey Naughton, Raghu Ramakrishnan & Sunita Sarawagi -- VLDB 1996
Motivation • OLAP / Multidimensional data analysis • Eg: Transactions(Prod,Date,StoreId,Cust,Sales) • Sum of sales by: (P,SId) ; (P) ; (P,D,SId) • Computing multidimensional aggregates is a performance bottleneck • Efficient computation of several related group-bys
What is a CUBE? • n-dimensional generalization of the group-by operator • Group-bys corresponding to all possible subsets (2^n) of a given set of attributes • Eg: SELECT P, D, C, Sum(Sales) FROM Transactions CUBE-BY P, D, C • Computes 8 group-bys: ALL, P, D, C, PD, PC, DC, PDC
Approaches • Basic group-by methods: • Sort based • Hash based • Naïve approach
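A minimal sketch of the naïve approach (illustrative Python, names assumed): each of the 2^n group-bys is computed independently from the base table, so the base data is scanned 2^n times.

```python
# Naive approach sketch: one independent hash aggregation, and hence one full
# scan of the base table, per group-by.
from collections import defaultdict
from itertools import combinations

def naive_cube(rows, attrs):
    """rows: dicts holding the CUBE-BY attributes plus a 'sales' measure."""
    cube = {}
    for k in range(len(attrs) + 1):
        for group in combinations(attrs, k):       # all 2^n attribute subsets
            agg = defaultdict(float)
            for r in rows:                          # one scan per group-by
                agg[tuple(r[a] for a in group)] += r['sales']
            cube[group] = dict(agg)
    return cube

# Eg: naive_cube(transactions, ['P', 'D', 'C']) computes all 8 group-bys.
```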
Possible optimizations • Smallest parent • Cache results • Amortize scans • Share sorts • Share partitions • Often contradictory • Assumption: Distributive aggregate function (sum, count, min, max; average via sum & count) • Non-distributive: median
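A small worked check (assumed example data) of why the distributivity assumption matters: partial sums from two partitions combine into the total sum, but partial medians do not combine into the true median.

```python
import statistics

part1, part2 = [1, 2, 9], [3, 4]
# sum is distributive: totals of the parts combine into the total of the whole
assert sum(part1) + sum(part2) == sum(part1 + part2)
# median is not: the median of the part-medians (2.75) != the true median (3)
assert statistics.median([statistics.median(part1), statistics.median(part2)]) \
       != statistics.median(part1 + part2)
```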
Sort based methods • Algorithm PipeSort • Share-sorts vs. smallest-parent • Optimize to get minimum total cost • Cache-results & amortize-scans • Pipelining: ABCD → ABC → AB → A
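A minimal sketch of the pipelining idea (illustrative Python, not the paper's code): one pass over data sorted on (A,B,C,D) emits all four prefix group-bys, which is what the ABCD → ABC → AB → A pipeline achieves, since each prefix's groups are contiguous in that sort order.

```python
def pipe_prefix_aggregates(sorted_tuples, n=4):
    """sorted_tuples: ((a, b, c, d), measure) pairs, sorted on (A, B, C, D)."""
    results = {k: [] for k in range(1, n + 1)}            # k = prefix length
    current = {k: None for k in range(1, n + 1)}          # (prefix, running sum)
    for key, measure in sorted_tuples:
        for k in range(1, n + 1):
            prefix = key[:k]
            if current[k] is not None and current[k][0] != prefix:
                results[k].append(current[k])             # group finished: emit
                current[k] = None
            prev = current[k][1] if current[k] is not None else 0.0
            current[k] = (prefix, prev + measure)
    for k in range(1, n + 1):                             # flush the last groups
        if current[k] is not None:
            results[k].append(current[k])
    return results
```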
PipeSort • Assumption: Have an estimate of the number of distinct values for each group-by • Input: Search lattice • Graph where each vertex represents a group-by of the cube • Edge i → j if |i| = |j| + 1, i.e. j can be computed from i by aggregating away one attribute
Search lattice (contd...) • Each edge eij associated with two costs: • S(eij): Cost of computing j from i when i is pre-sorted • U(eij): Cost of computing j from i when i is not already sorted • Idea: If attribute order of a group-by j is a prefix of parent i, compute j without sorting (Cost S(eij)) else first sort (Cost U(eij))
PipeSort (contd...) • Proceed level by level, k=0 to k=N-1 • Find best way of computing level k from level k+1 • Weighted bipartite matching: • Make k additional replicas of each node at level k+1 • Cost S(eij) on original node, U(eij) on replicas • Find minimum cost maximal matching
Algorithm PipeSort • For level k = 0 to N-1 • Generate_plan(k+1 → k) • Fix sort order of level k+1 nodes • Generate_plan(k+1 → k): • Make k additional replicas of level k+1 nodes, assign appropriate edge costs • Find min-cost matching
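A minimal sketch of Generate_plan(k+1 → k) as a min-cost bipartite matching. The cost estimates S and U are assumed to be given as dictionaries keyed by (parent, child), group-bys are tuples of attribute names, and SciPy's linear_sum_assignment stands in for the matching step; none of these choices come from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def generate_plan(level_k, level_k1, S, U, BIG=1e18):
    """Match each level-k group-by to a parent at level k+1.

    Each level-(k+1) node gets k additional replicas: the original carries the
    'sorted' cost S, the replicas carry the 'unsorted' cost U.  Returns
    {child: (parent, computed_without_sorting?)}.
    """
    k = len(level_k[0]) if level_k else 0
    cols = [(p, r) for p in level_k1 for r in range(k + 1)]   # k+1 copies each
    cost = np.full((len(level_k), len(cols)), BIG)
    for ci, child in enumerate(level_k):
        for pj, (parent, r) in enumerate(cols):
            if set(child) <= set(parent):                     # valid parent
                cost[ci, pj] = S[(parent, child)] if r == 0 else U[(parent, child)]
    rows, chosen = linear_sum_assignment(cost)                # min-cost matching
    return {level_k[ci]: (cols[pj][0], cols[pj][1] == 0)
            for ci, pj in zip(rows, chosen)}
```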
Tweaks • Aggregate and remove duplicates whilst sorting • Use partial sorting order to reduce sort costs • Eg: ABC → AC
Hash based methods • Algorithm PipeHash • Can include all stated optimizations (if memory available): • For k = N-1 downto 0 • For each group-by g at level k+1 • Compute in 1 scan of g all children (at level k) for which g is smallest parent • Save g and deallocate hash table of g
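A minimal sketch of the inner step (illustrative names, Python dicts standing in for the hash tables): one scan of a group-by g fills the hash tables of all children for which g is the smallest parent.

```python
from collections import defaultdict

def compute_children(g_tuples, g_attrs, children):
    """g_tuples: (key, measure) pairs of group-by g; children: attribute lists,
    each a subset of g_attrs, with g as their smallest parent."""
    tables = {tuple(c): defaultdict(float) for c in children}
    pos = {a: i for i, a in enumerate(g_attrs)}
    for key, measure in g_tuples:                  # single scan of g
        for c in children:
            tables[tuple(c)][tuple(key[pos[a]] for a in c)] += measure
    return tables                                  # afterwards, g can be deallocated
```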
PipeHash • Limited memory → use partitioning • Optimization share-partitions: • When data is partitioned on attribute A, all group-bys containing A can be computed independently on each partition • No recombination required
PipeHash: Overview • First choose smallest parent for each group-by (gives MST) • Optimize for memory constraints: • Decide what group-bys to compute together • Choice of attribute for data partitioning • Minimizing overall disk scan cost: NP-Hard!
Heuristics • Optimizations cache-results and amortize-scans favoured by choosing large subtrees of MST: Can compute multiple group-bys together • However, partitioning attribute limits subtree • Hence choose the partitioning attribute that allows choice of largest subtree
Algorithm: • Worklist w=MST • While w not empty • Choose any tree T from w • T’ = select_subtree(T) • Compute_subtree(T’)
Select_subtree(T) • If mem reqd by T < M, return T • Else: for each possible partitioning attribute a, get subtree Ta • Let Pa = max # of partitions of root(T) possible if a used for partitioning • Choose a s.t. (mem reqd Ta)/Pa < M and Ta is largest subtree over all a • Add forest T-Ta to w, return Ta
Compute_subtree(T’) • numParts = (mem reqd T’)* fudge_factor/M • Partition root(T’) into numParts • For each partition of root(T’) • For each node n in T’ (breadth first) • Compute all children of n in 1 scan • Release memory occupied by hash table of n
Notes on PipeHash • PipeHash biased towards smallest-parent optimization • Eg: Compute BC from BCD (fig) • In practice, saving on sequential disk scans less important than reducing CPU cost of aggregation by choosing smallest parent!
Overlap method • Sort based • Minimize disk accesses by overlapping computation of “cuboids” • Focus: Exploit partially matching sort orders to reduce sorting costs • Uses smallest parent optimization
Sorted runs • B = (A1,A2,...,Aj) ; S = (A1,...,Al-1,Al+1,...,Aj) • A sorted run of S in B is a maximal run of tuples in B whose ordering is consistent with the sort order in S • Eg: B = [(a,1,2),(a,1,3),(a,2,2),(b,1,3),(b,3,2),(c,3,1)]; S keeps the 1st & 3rd attributes: S = [(a,2),(a,3),(b,3),(b,2),(c,1)] • Sorted runs for S: [(a,2),(a,3)], [(a,2)], [(b,3)], [(b,2)] and [(c,1)]
Partitions • B, S have common prefix A1,...,Al-1 • A partition of a cuboid S in B is the union of sorted runs s.t. the first (l-1) columns (ie common prefix) have the same value • Previous eg: Partitions for S in B are: [(a,2),(a,3)], [(b,3),(b,2)] & [(c,1)] • Tuples from different partitions need not be merged for aggregation • Partition is independent unit of computation
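A minimal sketch (assumed helper, Python) of splitting the projection of B onto S into partitions: consecutive tuples sharing the common prefix A1,...,Al-1 form one partition, and each partition can be aggregated independently of the others.

```python
from itertools import groupby

def partitions_of(b_tuples, drop_col, prefix_len):
    """b_tuples: sorted in B's order; drop column `drop_col` to project onto S."""
    projected = [t[:drop_col] + t[drop_col + 1:] for t in b_tuples]
    # Tuples with the same (l-1)-column prefix form one partition of S in B.
    return [list(g) for _, g in groupby(projected, key=lambda t: t[:prefix_len])]

# Slide's example: B sorted on (A1, A2, A3), S drops A2, common prefix is A1.
B = [('a', 1, 2), ('a', 1, 3), ('a', 2, 2), ('b', 1, 3), ('b', 3, 2), ('c', 3, 1)]
print(partitions_of(B, drop_col=1, prefix_len=1))
# -> [[('a',2), ('a',3), ('a',2)], [('b',3), ('b',2)], [('c',1)]]
# Aggregating within each partition merges the duplicate ('a',2) tuples.
```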
Overview • Begin by sorting base cuboid • All other cuboids computed w/o re-sorting • Sort order of base cuboid determines sort order of all other cuboids • To maximize overlap across cuboid computations, reduce memory requirements of individual cuboids • Since partition is unit of computation, while computing one sorted cuboid from another, just need mem sufficient to hold a partition
Overview (contd...) • When a partition is computed, its tuples can be pipelined to descendants; the same memory is used by the next partition • Significant saving: PartSize << CuboidSize • Eg: Computing ABC and ABD from ABCD: PartSize(ABC) = 1 ; PartSize(ABD) = |D|
Choosing parent cuboids • Goal: Choose tree that minimizes size of partitions • Eg: Better to compute AC from ACD than ABC • Heuristic: Maximize size of common prefix
Choosing overlapping cuboids • To compute a cuboid in memory, need memory = PartSize • If required memory is available, cuboid is in Partition state • Else allocate 1 memory page for the cuboid, and mark as SortRun state • Only tuples of a Partition cuboid can be pipelined to descendants
Heuristics • Which cuboids to compute, and in what state? Optimal allocation is NP-hard! • Heuristic: Traverse tree in BFS order • Intuition: • Cuboids to the left have smaller partition sizes • So require less memory
Cuboid computation • For each tuple t of B (parent) • If (state==partition) process_partition(t) • Else process_sorted_run(t) • Process_partition(t): • 3 cases: • Tuple starts new partition • Tuple matches existing tuple in partition • New tuple: Insert at appropriate place
Cuboid computation (contd...) • Process_sorted_run(t): • 3 cases • Tuple starts new sorted run • Tuple matches last tuple in current run • New tuple: Append tuple to end of current run • Cuboid in Partition state fully computed in 1 pass • Cuboid in SortRun state: Combine merge step with computation of descendant cuboids
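A minimal sketch of the Partition-state case from the two slides above (illustrative Python, not the paper's code): tuples of one partition are aggregated in memory and, when the common prefix changes, the finished partition is emitted in S's sort order to the descendant cuboids, freeing the same memory for the next partition.

```python
def compute_partition_cuboid(parent_tuples, drop_col, prefix_len, emit):
    """parent_tuples: (key, measure) pairs in the parent's sort order."""
    part, prefix = {}, None
    for key, measure in parent_tuples:
        s_key = key[:drop_col] + key[drop_col + 1:]
        if prefix is not None and s_key[:prefix_len] != prefix:
            for k in sorted(part):                 # tuple starts a new partition:
                emit((k, part[k]))                 #   flush the previous one, in order
            part.clear()
        prefix = s_key[:prefix_len]
        part[s_key] = part.get(s_key, 0.0) + measure   # match existing, or insert
    for k in sorted(part):                         # flush the last partition
        emit((k, part[k]))
```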
Example CUBE computation (M=25 ; 9 sorted runs of BCD, CD to merge)
An array-based algorithm for simultaneous multidimensional aggregates • Yihong Zhao, Prasad Deshpande, Jeffrey Naughton -- SIGMOD '97
ROLAP Vs MOLAP • CUBE central to OLAP operations • ROLAP: Relational OLAP systems • PipeSort, PipeHash, Overlap • MOLAP: Multidimensional OLAP systems • Store data in sparse arrays instead of relational tables
MOLAP systems • Relational tuple: (jacket, K-mart, 1996, $40) • MOLAP representation: • Stores only ‘$40’ in a sparse array • Position in array encodes (jacket,K-mart,1996) • Arrays are “chunked” and compressed for efficient storage
Problem • No concept of “reordering” to bring together related tuples • Goal: order cell visits so as to simultaneously compute several aggregates whilst minimizing memory requirements and # of cell visits
Efficient array storage: Issues • Array too large to fit in memory: Split into “chunks” that fit in memory • Even with chunking, array may be sparse: Compression needed • Standard PL technique of storing arrays in row/column major form inefficient • Creates asymmetry amongst dimensions, favoring one over the other
Chunking • Divide n-D array into small n-D chunks and store each chunk as independent object on disk • Keep size of chunk = disk block size • We shall use chunks having same size along each dimension
Compressing sparse arrays • Store “dense” chunks as is (>40% occ.) • Already a significant compression over a relational table • Sparse arrays: Use chunk-offset compression – (offsetInChunk,data) • Better than LZW etc. because: • Uses domain knowledge • LZW data requires decompression before use
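A minimal sketch of chunk-offset compression (function and threshold names are illustrative, NumPy used for the chunk): dense chunks are stored as is, sparse chunks as (offsetInChunk, data) pairs.

```python
import numpy as np

def compress_chunk(chunk, density_threshold=0.4):
    """chunk: an n-D NumPy array holding one disk chunk; 0 marks an empty cell."""
    flat = chunk.ravel()                          # row-major offset within the chunk
    filled = np.flatnonzero(flat)
    if filled.size / flat.size > density_threshold:
        return ('dense', chunk)                   # "dense" chunk: store as is
    # Sparse chunk: (offsetInChunk, data) pairs; usable without decompression
    return ('sparse', list(zip(filled.tolist(), flat[filled].tolist())))
```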
Loading arrays from tables • Input: Table, dim sizes, chunksize • If M < array size, partition sets of chunks into groups which fit in memory. Eg: with 16 chunks and 2 partitions, group chunks 0-7 & 8-15 • Scan table. For each tuple, calculate & store (chunk#, offset, data) into the buffer page for the corresponding partition • 2nd pass: For each partition, read tuples and assign to chunks in memory. Compress.
Basic algo (No overlapping) • Eg: 3-D array ABC; to compute AB: if the array fits in memory, sweep a plane of size |A|*|B| along dimension C, aggregating as you go • If the array is chunked: sweep a plane of size |Ac|*|Bc| through the upper-left chunks; store the result, then move to the chunks on the right • Each chunk is read in only once • Mem: 1 chunk + |Ac|*|Bc| plane
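A minimal sketch (assumed chunk layout, NumPy) of computing AB from a chunked ABC array: one |Ac|×|Bc| plane is kept in memory per chunk column, and every chunk is read exactly once.

```python
import numpy as np

def aggregate_ab(chunks, nA, nB, nC, c):
    """chunks[(ia, ib, ic)]: a c x c x c NumPy array; returns the full AB result."""
    AB = np.zeros((nA, nB))
    for ia in range(nA // c):
        for ib in range(nB // c):
            plane = np.zeros((c, c))              # memory: one chunk-sized plane
            for ic in range(nC // c):             # sweep the plane along dimension C
                plane += chunks[(ia, ib, ic)].sum(axis=2)
            AB[ia*c:(ia+1)*c, ib*c:(ib+1)*c] = plane
    return AB
```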
Generalization • Sweep k-1 dimensional subarrays through a k-dimensional array • Multiple group-bys: Use smallest parent optimization in cube lattice • Advantage over ROLAP: Since dimension & chunk sizes are known, exact node sizes can be computed • Minimum Size Spanning Tree (MSST): the parent of each node n is its lattice parent n' of minimum size
Basic array cubing algorithm: • First construct the MSST of the group-bys • Compute a group-by Di1Di2...Dik from its minimum-size parent Di1...Dik+1: • Read in each chunk of Di1...Dik+1 along dimension Dik+1 and aggregate it into a chunk of Di1...Dik. Once a chunk of Di1...Dik is complete, flush it and reuse the memory
Example • ABC – 16x16x16 array • Chunk size: 4x4x4 • Dimension order: ABC • Eg: Computing BC: read chunks in order 1..64; after every 4 chunks, flush the completed BC chunk to disk and reuse the memory
Multi-Way algorithm • Single pass version: Assume enough memory to compute all group-bys in 1 scan • Reduce memory requirements using a special order to scan input array, called dimension order • A dimension order of the array chunks is a row major order of the chunks with n dimensions D1...Dn in some order O = (Di1,Di2,...Din)