1 / 21

Cube Computation

Cube Computation. Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani. Cuboids Corresponding to the Cube. all. 0- D(apex) cuboid. country. product. date. 1- D cuboids. product,date. product,country. date, country. 2- D cuboids. 3- D(base) cuboid.

bruno
Download Presentation

Cube Computation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cube Computation Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani

  2. Cuboids Corresponding to the Cube all 0-D(apex) cuboid country product date 1-D cuboids product,date product,country date, country 2-D cuboids 3-D(base) cuboid product, date, country

  3. Efficient Data Cube Computation • Data cube can be viewed as a lattice of cuboids • The bottom-most cuboid is the base cuboid • The top-most cuboid (apex) contains only one cell • How many cuboids in an n-dimensional cube with L levels? • Materialization of data cube • Materialize every (cuboid) (full materialization), none (no materialization), or some (partial materialization) • Selection of which cuboids to materialize • Based on size, sharing, access frequency, etc.

  4. Cube Computation: ROLAP-Based Method • Efficient cube computation methods • ROLAP-based cubing algorithms (Agarwal et al’96) • Array-based cubing algorithm (Zhao et al’97) • Bottom-up computation method (Bayer & Ramarkrishnan’99) • ROLAP-based cubing algorithms • Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples • Grouping is performed on some subaggregates as a “partial grouping step” • Aggregates may be computed from previously computed aggregates, rather than from the base fact table

  5. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Partition arrays into chunks (a small subcube which fits in memory). • Compressed sparse array addressing: (chunk_id, offset) • Compute aggregates in “multiway” by visiting cube cells in the order which minimizes the # of times to visit each cell, and reduces memory access and storage cost. What is the best traversing order to do multi-way aggregation?

  6. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation Example:3-D data array containing 3 dimensions A, B, & C • Array is partitioned into small, memory-based chunks • 64 chunks • Dimension A is organized into 4 equi-sized partitions a0-a3 • Same for B & C • Chunks 1, 2,…,64 correspond to the subcubes a0b0c0, a1b0c0,…a3b3c3.

  7. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation Example:3-D data array containing 3 dimensions A, B, & C • Cardinality of the dimensions: A-40, B-400, & C-4000 • Size of each partition in A, B, & C is therefore 10, 100, 1000.

  8. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation Example:3-D data array containing 3 dimensions A, B, & C • Many possible orderings with which chunks can be read into memory • Suppose we want to computer b0c0 chunk of the BC cuboid • By scanning chunks 1-4 of ABC, the b0c0 chunk is computed • Cells for b0c0 are aggregated over a0 to a3.

  9. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Chunk memory can now be assigned to the next chunk b1c0, which completes its aggregation after scanning the next 4 chunks of ABC: 5-8 • Continuing in this manner, the entire BC cuboid can be computed • ONLY 1 chunk of BC needs to be in memory for the computation of all chunks of BC

  10. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • In computing entire BC cubiod, we will have examined all the 64 chunks • Is there a way to avoid having to rescan all of these chunks for the computation of other cuboids? • YES • MULTIWAY COMPUTATION OR SIMULTANEOUS AGGREGATION

  11. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • For example, when chunk 1 (a0b0c0) is being scanned (say for the computation of the 2D chunk b0c0 of BC), all of the other 2D chunks relating to a0b0c0 can be simultaneoulsy computed. • That is, when a0b0c0 is being scanned, each of the three chunks, b0c0, a0c0, & a0b0, on the three 2D aggregation planes, BC, AC, & AB, should be computed then as well

  12. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Multiway computation simultaneously aggregates to each of the 2D planes while a 3D chunk is in memory. • Largest 2D plane is BC (400*4000=1600000), then AC (40*4000=160000) & finally AB (40*400=16000). • Scan the chunks in the order shown below.

  13. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation B b0c0 chunk

  14. Multi-way Array Aggregation for Cube Computation C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 B 56 9 b2 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A

  15. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Suppose chunks are scanned in the order shown below • One chunk of the largest 2D plane BC is fully computed for each row scanned • b0co is fully aggregated after scanning 1-4 • Similarly b1co is fully aggregated after scanning 5-8 & so on • Complete computation of 1 chunk of 2nd largest plane AC, requires 13 chunks, given the ordering 1-64

  16. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Complete computation of 1 chunk of 2nd largest plane AC, requires 13 chunks, given the ordering 1-64 • a0c0 is fully aggregated after scanning of 1,5,9, & 13. • For smallest plane AB, the chunk a0b0 requires scanning 49 chunks. ( 1,17,33, & 49) • NOTE that AB requires the longest scan of chunks

  17. Multi-Way Array Aggregation for Cube Computation (Cont.) • Method: the planes should be sorted and computed according to their size in ascending order. • Idea: keep the smallest plane in the main memory, fetch and compute only one chunk at a time for the largest plane • Limitation of the method: computing well only for a small number of dimensions • If there are a large number of dimensions, “bottom-up computation” and iceberg cube computation methods can be explored

  18. Multi-Way Array Aggregation for Cube Computation (Cont.) BEST (156000 memory units) WORST (1641000 memory units)

  19. References • S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 Int. Conf. Very Large Data Bases, 506-521, Bombay, India, Sept. 1996. • K. Beyer and R. Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), 359-370, Philadelphia, PA, June 1999. • V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, pages 205-216, Montreal, Canada, June 1996. • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In Proc. 1997 ACM-SIGMOD Int. Conf. Management of Data, 159-170, Tucson, Arizona, May 1997.

  20. Q & A

  21. Thank You

More Related