Using Tiling to Scale Parallel Datacube Implementation Ruoming Jin Karthik Vaidyanathan Ge Yang Gagan Agrawal The Ohio State University
Part I Introduction to Data Cube Construction • Data cube construction involves computing aggregates over every possible subset of the dimensions. • If the original dataset is n-dimensional, data cube construction includes computing and storing C(n, m) m-dimensional arrays for each m. Three-dimensional data cube construction involves computing the arrays AB, AC, BC, A, B, C, and a scalar value all.
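To make this concrete, here is a small sketch (not from the original slides; it assumes numpy, sum as the aggregate operator, and a random input array named data) that materializes all 2^3 aggregates of a three-dimensional array directly from the raw data:

```python
# A minimal sketch (assumed: numpy, sum as the aggregate operator) that
# materializes all 2^3 aggregates of a dense three-dimensional array.
from itertools import combinations
import numpy as np

data = np.random.rand(4, 3, 2)          # hypothetical D1 x D2 x D3 input
dims = {0: "D1", 1: "D2", 2: "D3"}

cube = {}
for m in range(len(dims), -1, -1):
    for kept in combinations(range(len(dims)), m):
        dropped = tuple(d for d in dims if d not in kept)   # axes aggregated away
        name = "".join(dims[d] for d in kept) or "all"
        cube[name] = data.sum(axis=dropped) if dropped else data

# cube now holds D1D2D3, D1D2, D1D3, D2D3, D1, D2, D3 and the scalar 'all'
print(sorted(cube), float(cube["all"]))
```

This naive version computes every aggregate from the original array; the algorithms discussed later compute each child from a (minimal) parent instead.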
Part I Motivation • Datasets for off-line processing are becoming larger. • A data warehouse is a system that stores such datasets and supports analysis on them. • Frequent queries on data warehouses require aggregation along one or more dimensions. • Data cube construction performs all of these aggregations in advance to enable fast responses to such queries. • Data cube construction is a compute- and data-intensive problem. • Memory requirements become the bottleneck for sequential algorithms. Construct data cubes in parallel in cluster environments!
Our Earlier Work • Parallel Algorithms for Small Dimensional Cases and Use of a Cluster Middleware (CCGRID 2002, FGCS 2003) • Parallel algorithms and theoretical results (ICPP 2003, HiPC 2003) • Evaluating parallel algorithms (IPDPS 2003)
Using Tiling • One important issue: memory requirements for intermediate results • From a sparse m-dimensional array, we compute m dense (m-1)-dimensional arrays • Tiling can help scale sequential and parallel datacube algorithms • Two important issues: • Algorithms for using tiling • How to tile so as to incur minimum overhead
Outline • Main issues and data structures • Parallel algorithms without tiling • Tiling for sequential datacube construction • Theoretical analysis • Tiling for parallel datacube construction • Experimental evaluation
Part I Main Issues • Cache and Memory Reuse • Each portion of the parent array is read only once to compute its children. Corresponding portions of each child should be updated simultaneously. • Using Minimal Parents • If a child has more than one potential parent, it is computed from its minimal parent, the parent that requires the least computation. • Memory Management • Write an output array back to disk once no child remains to be computed from it. • Manage available main memory effectively. • Communication Volume • Partition appropriately along one or more dimensions to guarantee minimal communication volume.
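As an illustration of the minimal-parent rule, the following sketch (with assumed dimension sizes and helper names of our own) picks, for each child, the candidate parent with the fewest cells:

```python
# Hypothetical sketch: choose the minimal parent (smallest array, hence least
# computation) for each child node of a datacube lattice.
from itertools import combinations

sizes = {"D1": 1024, "D2": 256, "D3": 64}   # assumed sizes, |D1| >= |D2| >= |D3|
dims = list(sizes)

def cells(node):
    # Number of cells in the aggregate over the given set of dimensions.
    n = 1
    for d in node:
        n *= sizes[d]
    return n

for m in range(len(dims) - 1, -1, -1):
    for child in combinations(dims, m):
        parents = [tuple(sorted(set(child) | {d})) for d in dims if d not in child]
        best = min(parents, key=cells)
        print("".join(child) or "all", "<-", "".join(best))
```

With |D1| >= |D2| >= |D3|, this reproduces the choices made in the later example, e.g. D1 is computed from D1D3 rather than D1D2.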
Part III Aggregation Tree [Figure: prefix lattice, prefix tree, and aggregation tree] Given a set X = {1, 2, …, n} and a prefix tree P(n), the corresponding aggregation tree A(n) is constructed by complementing every node in P(n) with respect to X.
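A hedged sketch of this construction (the function names prefix_tree and aggregation_tree are ours; nodes are represented as sorted tuples):

```python
# Build the prefix tree of subsets of X = {1, ..., n}, then complement
# every node with respect to X to obtain the aggregation tree.
def prefix_tree(n):
    X = tuple(range(1, n + 1))
    tree = {}
    def expand(node):
        last = node[-1] if node else 0
        children = [node + (j,) for j in X if j > last]
        tree[node] = children
        for child in children:
            expand(child)
    expand(())
    return tree

def aggregation_tree(n):
    X = set(range(1, n + 1))
    comp = lambda node: tuple(sorted(X - set(node)))
    return {comp(node): [comp(c) for c in children]
            for node, children in prefix_tree(n).items()}

# For n = 3 the root is (1, 2, 3) and its children are (2, 3), (1, 3), (1, 2).
print(aggregation_tree(3))
```

For n = 3 the root (1, 2, 3) corresponds to D1D2D3 and its children (2, 3), (1, 3), (1, 2) to D2D3, D1D3, D1D2; the node (1, 3) has the single child (1), which matches the later example in which D1 is computed from D1D3.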
Part III Theoretical Results • For data cube construction using the aggregation tree: • The total memory requirement for holding the results is bounded. • The total communication volume is bounded. • It is guaranteed that all arrays are computed from their minimal parents. • A procedure exists for partitioning the input dataset so as to minimize interprocessor communication.
Part III Level One Parallel Algorithm Main ideas • Each processor computes a portion of each child at the first level. • Lead processors have the final results after interprocessor communication. • If the output is not used to compute other children, write it back; otherwise compute children on lead processors.
Part III Example [Figure: aggregation tree for the three-dimensional array D1D2D3 (dimension sizes |D1|, |D2|, |D3|), with nodes D1D2D3; D2D3, D1D3, D1D2; D3, D2, D1; all] • Assumptions • 8 processors • Each of the three dimensions is partitioned in half • Initially • Each processor computes partial results for each of D1D2, D1D3, and D2D3
Part III Example (cont.) • Lead processors for D1D2 are (l1, l2, 0); each receives from its partner (l1, l2, 1): (0, 0, 0) from (0, 0, 1), (0, 1, 0) from (0, 1, 1), (1, 0, 0) from (1, 0, 1), (1, 1, 0) from (1, 1, 1) • Write back D1D2 on the lead processors
Part III Example (cont.) • Lead processors for D1D3 are (l1, 0, l3); each receives from (l1, 1, l3): (0, 0, 0) from (0, 1, 0), (0, 0, 1) from (0, 1, 1), (1, 0, 0) from (1, 1, 0), (1, 0, 1) from (1, 1, 1) • Compute D1 from D1D3 on the lead processors; write back D1D3 on the lead processors • Lead processors for D1 are (l1, 0, 0); each receives from (l1, 0, 1): (0, 0, 0) from (0, 0, 1), (1, 0, 0) from (1, 0, 1) • Write back D1 on the lead processors
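The reduction pattern in this example can be sketched with mpi4py (a sketch only; the authors' implementation builds on a cluster middleware, not on this code). Each of the 8 processes holds a chunk of D1D2D3, computes a partial D1D2, and reduces it onto the lead processors (l1, l2, 0):

```python
# Hedged mpi4py sketch of the level-one reduction for D1D2 on a 2 x 2 x 2 grid.
# Run with: mpiexec -n 8 python level_one_sketch.py   (assumed file name)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
l1, l2, l3 = rank // 4, (rank // 2) % 2, rank % 2   # assumed rank -> grid mapping

local_chunk = np.random.rand(8, 8, 8)               # hypothetical local portion of D1D2D3
partial_d1d2 = local_chunk.sum(axis=2)              # aggregate away D3 locally

# Group the two processes that share (l1, l2); order them by l3 so that the
# lead processor (l3 == 0) is the root of the reduction.
subcomm = comm.Split(color=l1 * 2 + l2, key=l3)
d1d2 = np.empty_like(partial_d1d2) if l3 == 0 else None
subcomm.Reduce(partial_d1d2, d1d2, op=MPI.SUM, root=0)

if l3 == 0:
    pass  # the lead processor now holds its portion of D1D2 and can write it back
```

The reductions for D1D3, D2D3, and the lower-level arrays follow the same pattern with different groupings of the processor grid.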
Part IV Tiling-based Approach • Motivation • Parallel machines are not always available • The memory of an individual machine is limited • Tiling-based Approaches • Sequential: tile along the dimensions on one processor • Parallel: partition among processors and, on each processor, tile along the dimensions
Part IV Sequential Tiling-based Algorithm • Main Idea • A portion of a node in the aggregation tree is expandable (can be used to compute its children) once enough tiles covering that portion have been processed. • Main Mechanism • Each tile is given a label. With 4 tiles, tiling along D2 and D3, the labels are of the form (0, l2, l3): Tile 0: (0, 0, 0), Tile 1: (0, 0, 1), Tile 2: (0, 1, 0), Tile 3: (0, 1, 1)
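A minimal numpy sketch of this mechanism (our own names; only two of the three first-level children are shown for brevity): D2 and D3 are each cut in half, giving the four tiles labelled (0, l2, l3), and the children are accumulated tile by tile:

```python
# Tile D2 and D3 in half, label tiles (0, l2, l3), and accumulate children
# tile by tile; portions of a child become complete (expandable) as soon as
# all tiles covering them have been processed.
import numpy as np

data = np.random.rand(8, 6, 4)                    # hypothetical D1D2D3 array
n1, n2, n3 = data.shape
d1d2 = np.zeros((n1, n2))
d2d3 = np.zeros((n2, n3))

for l2 in range(2):
    for l3 in range(2):
        s2 = slice(l2 * n2 // 2, (l2 + 1) * n2 // 2)
        s3 = slice(l3 * n3 // 2, (l3 + 1) * n3 // 2)
        tile = data[:, s2, s3]                    # tile labelled (0, l2, l3)
        d1d2[:, s2] += tile.sum(axis=2)           # partial results for D1D2
        d2d3[s2, s3] += tile.sum(axis=0)          # this portion of D2D3 is complete
        # Once both tiles sharing s2 have been processed, the corresponding
        # portion of D1D2 is complete and could be written back to disk.

assert np.allclose(d1d2, data.sum(axis=2))
assert np.allclose(d2d3, data.sum(axis=0))
```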
Part IV Example [Figure: walk-through of the tiling-based algorithm on the aggregation tree for the three-dimensional array D1D2D3]
Tiling Overhead • The tiling-based algorithm requires writing back and rereading portions of intermediate results • We want to tile so as to minimize this overhead • Suppose dimension Di is tiled 2^k_i times • The total tiling overhead can then be computed as shown below
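The overhead expression itself is not recoverable from the text. A plausible reconstruction (an assumption on our part), based on the observation that tiling Di into 2^{k_i} tiles forces each result that excludes Di to be written back and reread 2^{k_i} - 1 times, is:

% Assumed reconstruction of the total tiling overhead (in array cells):
\[
  \text{Tiling overhead} \;=\; \sum_{i=1}^{n} \left(2^{k_i} - 1\right)\,
  \frac{|D_1|\,|D_2|\cdots|D_n|}{|D_i|}
\]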
Minimizing Tiling Overhead • Tile the largest dimension first and update its effective size • Keep choosing the currently largest dimension until the memory requirement drops below the available memory (a sketch of this greedy heuristic follows)
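A hedged sketch of this greedy heuristic (choose_tiling and the toy memory model below are our assumptions, not the paper's code):

```python
# Greedy heuristic: repeatedly tile the currently largest dimension (halving
# its effective size, i.e. incrementing k_i) until the estimated memory
# requirement fits. memory_needed() is a placeholder for the real estimate,
# which would come from the aggregation-tree analysis.
from math import prod

def choose_tiling(sizes, available_memory, memory_needed):
    effective = list(sizes)          # effective size of each dimension after tiling
    k = [0] * len(sizes)             # dimension D_i is split into 2^{k_i} tiles
    while memory_needed(effective) > available_memory:
        i = max(range(len(effective)), key=effective.__getitem__)
        effective[i] = (effective[i] + 1) // 2    # tile the largest dimension once more
        k[i] += 1
    return k

# Toy memory model (an assumption): 8 bytes per cell, with the in-memory
# intermediate results approximated by the m (m-1)-dimensional children.
sizes = [1024, 512, 256, 128]
memory_needed = lambda eff: 8 * sum(prod(eff) // d for d in eff)
print(choose_tiling(sizes, available_memory=64 * 2**20, memory_needed=memory_needed))
```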
Part IV Parallel Tiling-based Algorithm [Figure: four-dimensional aggregation tree for D1D2D3D4 (dimension sizes |D1|, |D2|, |D3|, |D4|), with nodes D1D2D3, D1D2D4, D1D3D4, D2D3D4; D1D2, D1D3, D2D3, D1D4, D2D4, D3D4; D1, D2, D3, D4; all] • Assumptions • Three-dimensional partition (0, 1, 1, 1) • Two-dimensional tiling (0, 0, 1, 1) • Solutions • Apply the tiling-based approach to the first-level nodes only • Apply the Level One Parallel Algorithm to the other nodes
Part IV Choosing Tiling Parameters • Tiling introduces overhead. • Tiling along multiple dimensions can reduce this overhead.
Part IV Parallel Tiling-based Algorithm Results • The algorithm for choosing tiling parameters to reduce tiling overhead remains effective in parallel environments!
Conclusions • Tiling can help scale parallel datacube construction • Our work provides the corresponding algorithms and analytical results