Parallel OLAP. Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University. Joint work with F. Dehne, T. Eavis, and S. Hambrusch. Decision Support Systems. A time-oriented analysis of scientific or organizational data. Data Mining. Online Analytical Processing (OLAP).
Parallel OLAP
Andrew Rau-Chaplin
Faculty of Computer Science
Dalhousie University
Joint Work with F. Dehne, T. Eavis, S. Hambrusch
Decision Support Systems
• A time-oriented analysis of scientific or organizational data
• Data Mining
• Online Analytical Processing (OLAP)
• Information Processing
Data Warehousing for Decision Support
• Operational data collected into DW
• DW used to support multi-dimensional views
• Views form the basis of OLAP processing
• Our focus: the OLAP server
Data Cube Generation
• Proposed by Gray et al. in 1995
• Can be generated from a relational DB but…
[Figure: lattice of cuboids over dimensions A, B, and C (ABC; AB, AC, BC; A, B, C; ALL), with the cuboid ABC (or CAB) shown with sample cell values]
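The lattice in the figure can be enumerated directly: a cube over d dimensions has one cuboid per subset of the dimensions, so 2^d in total. The following is a minimal sketch of that enumeration; the function names `cuboids` and `label` are illustrative and do not come from the slides.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions, largest first.

    The empty subset stands for the ALL cuboid (a single grand total).
    """
    dims = tuple(dimensions)
    lattice = []
    for size in range(len(dims), -1, -1):
        for subset in combinations(dims, size):
            lattice.append(subset)
    return lattice

def label(cuboid):
    """Render a cuboid the way the slide names it, e.g. ('A', 'C') -> 'AC'."""
    return "".join(cuboid) if cuboid else "ALL"

labels = [label(c) for c in cuboids("ABC")]
print(labels)

# The count doubles with every added dimension.
for d in (3, 10, 20):
    print(d, len(cuboids(range(d))))
```

For three dimensions this gives 8 cuboids; at 10 dimensions it is already 1024, which is why the number of group-bys, rather than any single one, dominates the cost.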
Core OLAP Operations
• Five fundamental OLAP operations: roll-up, drill-down, slice, dice, and pivot
• Range Queries
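As a rough illustration of these operations, the sketch below applies them to a six-row sales table like the one used later in the deck. The function names and the tuple layout are assumptions made for this example, not an API from the talk. Drill-down is simply a roll-up that keeps more dimensions.

```python
from collections import defaultdict

# Illustrative fact table: (model, year, colour, sales)
FACTS = [
    ("Chevy", 1990, "Red", 5),
    ("Chevy", 1990, "Blue", 87),
    ("Ford", 1990, "Green", 64),
    ("Ford", 1990, "Blue", 99),
    ("Ford", 1991, "Red", 8),
    ("Ford", 1991, "Blue", 7),
]
DIMS = ("model", "year", "colour")

def roll_up(facts, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

def slice_(facts, dim, value):
    """Fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in facts if r[i] == value]

def dice(facts, **allowed):
    """Keep rows whose value in each named dimension is in the allowed set."""
    return [r for r in facts
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]

def pivot(facts, row_dim, col_dim):
    """Cross-tabulate sales with one dimension as rows and another as columns."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for row in facts:
        table[row[r]][row[c]] += row[3]
    return {k: dict(v) for k, v in table.items()}

def range_query(facts, dim, low, high):
    """Total sales for rows with low <= value <= high in one dimension."""
    i = DIMS.index(dim)
    return sum(r[3] for r in facts if low <= r[i] <= high)

by_model = roll_up(FACTS, ("model",))               # roll-up to model
by_model_year = roll_up(FACTS, ("model", "year"))   # drill-down to model, year
print(by_model, by_model_year)
```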
The Challenge
• Design and build a parallel ROLAP system
  • Full cube generation
  • Partial cube generation
  • Indexing and query resolution
• For
  • High dimensionality: 10 – 30 D
  • Large input data sizes: Gigabytes
  • Large output data sizes: Terabytes
• Implications
  • Parallel + external memory
  • Shared disk + Shared nothing
The Architectural Model
[Figure: two diagrams, each showing processors p1 through pn attached to a communication fabric]
• Shared Disk
  • A set of p processors connected via an interconnection fabric
  • Standard-sized local memory
  • Concurrent access to a shared disk array
• Shared Nothing
  • A set of p processors connected via an interconnection fabric
  • Standard-sized local memory
  • Independent local disk(s)
• Algorithm Design
  • CGM (Coarse Grained Multicomputer)
Coarse Grained Multicomputer
• A set of p processors
• Arbitrary communication topology or shared memory
• m memory per processor, m >> p
• A communication round consists of an h-relation in which all processors send and receive O(m) data
[Figure: processors attached to a communication fabric]
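One way to picture a CGM communication round is as a single all-to-all exchange in which no processor sends or receives more than h items. The sketch below simulates that on one machine with plain lists; `communication_round` and `h_of` are hypothetical helper names, and routing by `x % p` is just an assumed example of a redistribution rule.

```python
def communication_round(outboxes):
    """Simulate one h-relation.

    outboxes[src][dst] is the list of items processor src sends to dst.
    Returns inboxes, where inboxes[dst] holds everything dst received,
    ordered by sender.
    """
    p = len(outboxes)
    inboxes = [[] for _ in range(p)]
    for src in range(p):
        for dst in range(p):
            inboxes[dst].extend(outboxes[src][dst])
    return inboxes

def h_of(outboxes):
    """Largest number of items any single processor sends or receives."""
    p = len(outboxes)
    sent = [sum(len(outboxes[s][d]) for d in range(p)) for s in range(p)]
    received = [sum(len(outboxes[s][d]) for s in range(p)) for d in range(p)]
    return max(sent + received)

# Three simulated processors, each holding m = 4 items locally.
p = 3
local = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# Assumed routing rule: item x goes to processor x % p.
outboxes = [[[x for x in items if x % p == dst] for dst in range(p)]
            for items in local]

inboxes = communication_round(outboxes)
print(inboxes, h_of(outboxes))
```

Here every processor sends 4 items and receives 4 items, so the round is an h-relation with h = 4, which stays within the O(m) bound of the model.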
MOLAP vs. ROLAP

Model  Year  Colour  Sales
Chevy  1990  Red         5
Chevy  1990  Blue       87
Ford   1990  Green      64
Ford   1990  Blue       99
Ford   1991  Red         8
Ford   1991  Blue        7

Model  Year  Colour  Sales
Chevy  1990  Blue       87
Chevy  1990  Red         5
Chevy  1990  ALL        92
Chevy  ALL   Blue       87
Chevy  ALL   Red         5
Chevy  ALL   ALL        92
Ford   1990  Blue       99
Ford   1990  Green      64
Ford   1990  ALL       163
Ford   1991  Blue        7
Ford   1991  Red         8
Ford   1991  ALL        15
Ford   ALL   Blue      106
Ford   ALL   Green      64
Ford   ALL   Red         8
ALL    1990  Blue      186
ALL    1990  Green      64
ALL    1991  Blue        7
ALL    1991  Red         8
Ford   ALL   ALL       178
ALL    1990  ALL       255
ALL    1991  ALL        15
ALL    ALL   Blue      193
ALL    ALL   Green      64
ALL    ALL   Red        13
ALL    ALL   ALL       270
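The second table can be derived from the first by letting every input row contribute to each combination of kept and aggregated dimensions. The following is a minimal in-memory sketch of that idea, not the parallel algorithm of the talk; `data_cube` is an illustrative name.

```python
from collections import defaultdict
from itertools import product

# The six input rows from the first table: (Model, Year, Colour, Sales)
FACTS = [
    ("Chevy", 1990, "Red", 5),
    ("Chevy", 1990, "Blue", 87),
    ("Ford", 1990, "Green", 64),
    ("Ford", 1990, "Blue", 99),
    ("Ford", 1991, "Red", 8),
    ("Ford", 1991, "Blue", 7),
]
ALL = "ALL"

def data_cube(facts, num_dims=3):
    """Aggregate the measure over all 2^d group-bys of the dimension columns."""
    cube = defaultdict(int)
    for row in facts:
        key, measure = row[:num_dims], row[num_dims]
        # Each fact falls into 2^d cells: every dimension is either
        # kept as-is or replaced by ALL.
        for mask in product((False, True), repeat=num_dims):
            cell = tuple(ALL if hide else value
                         for value, hide in zip(key, mask))
            cube[cell] += measure
    return dict(cube)

cube = data_cube(FACTS)
print(cube[("Ford", ALL, "Blue")])   # Ford, all years, Blue
print(cube[(ALL, ALL, ALL)])         # grand total
```

The result reproduces the totals in the table, for example 106 for (Ford, ALL, Blue) and 270 for (ALL, ALL, ALL). It also contains the cell (ALL, 1990, Red) = 5, which the six input rows imply. Touching 2^d cells per input row is only workable for small d, which is part of why the deck turns to parallel and external-memory methods.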
Existing Parallel Results
• Goil & Choudhary
  • MOLAP
• Approach
  • Parallelize the generation of each cuboid
• Challenge
  • > 2^d communication rounds
Parallelizing the Data Cube
• Generating Data Cubes (Shared Disk)
• Generating Data Cubes (Shared Nothing)
• Generating Partial Data Cubes
• Parallel Multi-dimensional Indexing
• Conclusions and Future Work