Optimizing Reduction Computations In a Distributed Environment Tahsin Kurc, Feng Lee, Gagan Agrawal, Umit Catalyurek, Renato Ferreira, Joel Saltz Biomedical Informatics Department and Computer and Information Science Department Ohio State University
Roadmap • Data Intensive Computation • Generalized Reduction Operations • Range Aggregation Queries • Runtime Environment: DataCutter • Execution Strategies • Replicated Filter State • Partitioned Filter State • Hybrid • Experimental Results • Conclusions and Future Work
Data Intensive, Distributed Computing • Large data collections • Multi-resolution, multi-dimensional • data elements correspond to points in multi-dimensional attribute space • medical images, satellite data, hydrodynamics data, oil reservoir simulation, seismic data, etc. • Data exploration and analysis • subsets of one or more datasets • Data subset is defined by a multi-dimensional window • A spatial index can be used to speed up data lookup (e.g., R-tree, quad-tree, etc.) • A data product is generated by processing the data subset; generally results in data reduction • Such queries are referred to as range aggregation queries • Data generated and processed in a distributed environment
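As a concrete illustration of a range aggregation query, the minimal Python sketch below selects the data elements whose coordinates fall inside a multi-dimensional window and reduces them to a single value. The helper names (in_window, range_aggregate) and the averaging aggregate are illustrative assumptions, not details from the slides.

    # Illustrative sketch of a range aggregation query (not the authors' code).
    # Each data element is a point in a multi-dimensional attribute space plus a value.
    def in_window(point, window):
        # window is a list of (low, high) bounds, one per dimension
        return all(lo <= x <= hi for x, (lo, hi) in zip(point, window))

    def range_aggregate(elements, window):
        # Select the subset inside the window, then reduce it to a data product.
        selected = [value for point, value in elements if in_window(point, window)]
        return sum(selected) / len(selected) if selected else None

    elements = [((0.2, 0.7), 3.0), ((0.5, 0.5), 5.0), ((0.9, 0.1), 7.0)]
    print(range_aggregate(elements, window=[(0.0, 0.6), (0.4, 1.0)]))  # averages the two points inside -> 4.0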
Generalized Reduction Operations
// Selection = range query
DU = Output; DI = Select(Input, R);
// Initialization
for (ue in DU) { get ue; ae = Initialize(ue); A = A ∪ {ae}; }
// Reduction
for (ie in DI) { read ie; it = Transform(ie); SA = Map(it, A); for (ae in SA) { ae = Aggregate(it, ae); } }
// Output
for (ae in A) { ue = Finalize(ae); output ue; }
• A is the accumulator
• Aggregation involves per-element operations defined by some spatial relationship; it can be expensive
• Aggregation is order independent
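Rendered as runnable Python, the loop above could look like the sketch below. The placeholder callables (select, initialize, transform, map_to_accs, aggregate, finalize) and the toy binning example are assumptions for illustration only.

    # Executable rendering of the generalized-reduction loop (a sketch).
    def generalized_reduction(input_elems, output_keys, select, initialize,
                              transform, map_to_accs, aggregate, finalize):
        # Initialization: one accumulator element per output element
        A = {ue: initialize(ue) for ue in output_keys}
        # Reduction: per-element, order-independent aggregation
        for ie in select(input_elems):
            it = transform(ie)
            for key in map_to_accs(it, A):      # accumulator elements this input contributes to
                A[key] = aggregate(it, A[key])
        # Output: finalize each accumulator element
        return {ue: finalize(ae) for ue, ae in A.items()}

    # Toy use: sum input values into the output bin given by their integer part.
    result = generalized_reduction(
        input_elems=[0.2, 0.7, 1.5, 2.1],
        output_keys=[0, 1, 2],
        select=lambda xs: (x for x in xs if x < 2.0),   # stands in for the range query
        initialize=lambda ue: 0.0,
        transform=lambda ie: ie,
        map_to_accs=lambda it, A: [int(it)],
        aggregate=lambda it, ae: ae + it,
        finalize=lambda ae: ae,
    )
    print(result)   # approximately {0: 0.9, 1: 1.5, 2: 0.0}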
Many Applications • Water Contamination Studies • Satellite Data Processing • Pathology • Reservoir Simulation and Seismic Data Analysis • Visualization
Generalized Reductions • Dataset is partitioned into data chunks • Chunks are distributed across disks • [Figure: source data elements → reduction function → intermediate data elements (accumulator elements) → result data elements]
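A minimal sketch of chunking and declustering, assuming a fixed chunk size and round-robin placement across disks (both are illustrative choices, not details from the slides):

    def make_chunks(elements, chunk_size):
        # split the dataset into fixed-size data chunks
        return [elements[i:i + chunk_size] for i in range(0, len(elements), chunk_size)]

    def distribute(chunks, num_disks):
        # round-robin declustering of chunks across disks
        disks = [[] for _ in range(num_disks)]
        for i, chunk in enumerate(chunks):
            disks[i % num_disks].append(chunk)
        return disks

    chunks = make_chunks(list(range(12)), chunk_size=4)
    print(distribute(chunks, num_disks=2))
    # [[[0, 1, 2, 3], [8, 9, 10, 11]], [[4, 5, 6, 7]]]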
Runtime Environment: DataCutter • Component Framework for Combined Task/Data Parallelism • User defines sequence of pipelined components (filters and filter groups) • User directive tells preprocessor/runtime system to generate and instantiate copies of filters • Stream based communication • Multiple filter groups can be active simultaneously • Flow control between transparent filter copies • Replicated individual filters • Transparent: single stream illusion
DataCutter-based Implementation • Implement each operation as a filter • Components can be merged • Order of some components can be changed • [Figure: two filter pipelines, each Read → Transform → Map → Aggregate → Merge → Output]
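The filter-pipeline idea can be illustrated with plain Python generators. This is not the DataCutter API, just a sketch of filters connected by streams, with placeholder transform and aggregate operations.

    def read_filter(chunks):
        # read: stream the elements of each chunk downstream
        for chunk in chunks:
            for element in chunk:
                yield element

    def transform_filter(stream):
        # transform: placeholder per-element operation
        for element in stream:
            yield element * 2

    def aggregate_filter(stream):
        # aggregate: placeholder order-independent reduction into one accumulator value
        total = 0
        for element in stream:
            total += element
        yield total

    pipeline = aggregate_filter(transform_filter(read_filter([[1, 2], [3, 4]])))
    print(list(pipeline))   # [20]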
Roadmap • Data Intensive Computation • Generalized Reduction Operations • Range Aggregation Queries • Runtime Environment: DataCutter • Execution Strategies • Replicated Filter State • Partitioned Filter State • Hybrid • Experimental Results • Conclusions and Future Work
Execution Strategies: Partitioned Filter State • Partition the accumulator across computing nodes • Retrieve and send the input data elements (data chunks) to the corresponding partitions • Parallel computation of accumulator pieces • Good use of aggregate memory space • Drawbacks: load imbalance and communication overhead
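A minimal sketch of the partitioned strategy, assuming a hypothetical hash-style ownership function (owner) that maps each accumulator key to the node holding that slice:

    def owner(acc_key, num_nodes):
        # hypothetical ownership function: which node holds this accumulator entry
        return acc_key % num_nodes

    def partitioned_reduction(inputs, num_nodes):
        local_acc = [dict() for _ in range(num_nodes)]
        for acc_key, value in inputs:
            node = owner(acc_key, num_nodes)          # "send" the element to its owner
            local_acc[node][acc_key] = local_acc[node].get(acc_key, 0) + value
        # the slices are disjoint, so no merge phase is needed
        result = {}
        for piece in local_acc:
            result.update(piece)
        return result

    print(partitioned_reduction([(0, 1), (1, 2), (4, 3), (1, 5)], num_nodes=2))   # {0: 1, 4: 3, 1: 7}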
Partitioned Filter State • Partition in one dimension • Jagged 2D partition • Recursive Bisection • Graph/Hypergraph partitioning
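As one illustration of the first option, a one-dimensional partition can be built by cutting the accumulator into row strips with roughly equal load; the per-row load values in the sketch below are made up.

    def partition_1d(row_loads, num_parts):
        # cut the rows of the accumulator into num_parts strips with roughly equal load
        target = sum(row_loads) / num_parts
        parts, current, acc = [], [], 0.0
        for row, load in enumerate(row_loads):
            current.append(row)
            acc += load
            if acc >= target and len(parts) < num_parts - 1:
                parts.append(current)
                current, acc = [], 0.0
        parts.append(current)
        return parts

    print(partition_1d([5, 1, 1, 5, 2, 2, 4], num_parts=3))   # [[0, 1, 2], [3, 4], [5, 6]]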
Execution Strategies: Replicated Filter State • Replicate the accumulator on each computing node • Each node retrieves data and performs local aggregation • Data is assigned to nodes in a demand-driven fashion • Merge the partial results • Drawbacks: merge overheads; poor use of distributed memory, since the accumulator can be very large
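A minimal sketch of the replicated strategy; the even, strided split of the input below stands in for demand-driven assignment, which would be dynamic in practice.

    def replicated_reduction(inputs, num_nodes):
        # each node holds a full accumulator copy and aggregates its share of the input
        partials = []
        for node in range(num_nodes):
            local = {}
            for acc_key, value in inputs[node::num_nodes]:   # stand-in for demand-driven assignment
                local[acc_key] = local.get(acc_key, 0) + value
            partials.append(local)
        # merge phase: combine the partial copies into the final accumulator
        merged = {}
        for partial in partials:
            for acc_key, value in partial.items():
                merged[acc_key] = merged.get(acc_key, 0) + value
        return merged

    print(replicated_reduction([(0, 1), (1, 2), (0, 3), (1, 5)], num_nodes=2))   # {0: 4, 1: 7}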
Replicated Filter State: Merge Phase • Partitioned Merge • Hierarchical Merge
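A hierarchical merge can be sketched as pairwise combination of partial accumulator copies over roughly log2(P) rounds; merge_pair is an illustrative helper, not part of the slides.

    def merge_pair(a, b):
        # combine two partial accumulator copies entry by entry
        out = dict(a)
        for key, value in b.items():
            out[key] = out.get(key, 0) + value
        return out

    def hierarchical_merge(partials):
        # merge pairwise until a single accumulator remains
        while len(partials) > 1:
            merged = [merge_pair(partials[i], partials[i + 1])
                      for i in range(0, len(partials) - 1, 2)]
            if len(partials) % 2:
                merged.append(partials[-1])   # odd copy advances to the next round
            partials = merged
        return partials[0]

    print(hierarchical_merge([{0: 1}, {0: 2, 1: 1}, {1: 4}, {2: 3}]))   # {0: 3, 1: 5, 2: 3}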
Execution Strategies: Hybrid • A combination of the two extreme cases • Partition the accumulator • Replicate some of the sub-accumulator regions • More adaptable to load and environment
Several Ways to Hybrid • Hybrid is the most flexible of the strategies; there are many ways to implement it • How to combine partitioning and replication • How to place partitioned and replicated pieces • Partition into N, replicate each piece by M • N × M = number of processors • Choice of N • N is small (approaching RFS) – e.g., if input is much larger than output • N is large (approaching PFS) – e.g., if input is comparable to output • Placement of replicated pieces • Assign pieces to nodes to minimize input communication (more suitable for machines with low communication-computation ratio) • Assign pieces to nodes to achieve load balance (more suitable for configurations with low computing power)
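A minimal sketch of the N-by-M layout, assuming the processor count is divisible by the number of partitions and using a simple block assignment of (partition, replica) pairs to processors; both assumptions are illustrative only.

    def hybrid_layout(num_procs, n_partitions):
        # split the accumulator into n_partitions pieces and replicate each piece
        # on num_procs / n_partitions processors
        assert num_procs % n_partitions == 0
        m_replicas = num_procs // n_partitions
        layout = {}
        for proc in range(num_procs):
            partition = proc // m_replicas   # which accumulator piece this processor holds
            replica = proc % m_replicas      # which copy of that piece
            layout[proc] = (partition, replica)
        return layout

    print(hybrid_layout(num_procs=8, n_partitions=2))
    # piece 0 is replicated on processors 0-3, piece 1 on processors 4-7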
Several Ways to Hybrid • Partition nodes into groups • Each group has sufficient aggregate memory space for the accumulator • Replicate accumulator to each group • Partition within a group • Partition into N • Adaptively replicate the most loaded pieces • Assign them to least loaded processors
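The adaptive variant can be sketched as repeatedly replicating the most loaded piece onto the least loaded processor; the load numbers and the assumption that a new replica takes over half of a piece's remaining load are illustrative only.

    def adaptive_replication(piece_loads, proc_loads, num_extra_replicas):
        # repeatedly give the most loaded piece an extra replica on the least loaded processor
        assignments = []
        for _ in range(num_extra_replicas):
            piece = max(piece_loads, key=piece_loads.get)   # most loaded accumulator piece
            proc = min(proc_loads, key=proc_loads.get)      # least loaded processor
            assignments.append((piece, proc))
            piece_loads[piece] /= 2            # assume the new replica takes half the load
            proc_loads[proc] += piece_loads[piece]
        return assignments

    print(adaptive_replication({"A": 10.0, "B": 4.0}, {0: 1.0, 1: 2.0, 2: 0.5}, 2))
    # [('A', 2), ('A', 0)]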
Experimental Results • Hybrid • Partition into N, replicate by M • N is small if input is large • N is larger if input is smaller than or comparable to output • N is typically 2 in our case • Place replicated pieces • Sort the storage nodes based on how much data they retrieve for each of the N pieces • Place the pieces in the sorted order (sketched below) • Goals • The performance of the three strategies as the volume of input data and the size of the accumulator are scaled proportionately • The relative performance of the techniques in a distributed environment, where data is hosted on one set of machines and another cluster is used for processing the data • The scalability of the techniques in a cluster environment
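The placement step described above might be sketched as follows: for each of the N pieces, rank the nodes by how much input they retrieve for that piece and place that piece's replicas on the top-ranked nodes. The retrieval volumes and node names below are made up for illustration.

    def place_pieces(retrieved, replicas_per_piece):
        # retrieved[piece][node] = volume of input data that node retrieves for that piece
        placement = {}
        for piece, per_node in retrieved.items():
            ranked = sorted(per_node, key=per_node.get, reverse=True)
            placement[piece] = ranked[:replicas_per_piece]   # keep the top-ranked nodes
        return placement

    retrieved = {
        "piece0": {"node0": 120, "node1": 30, "node2": 80},
        "piece1": {"node0": 10, "node1": 90, "node2": 60},
    }
    print(place_pieces(retrieved, replicas_per_piece=2))
    # {'piece0': ['node0', 'node2'], 'piece1': ['node1', 'node2']}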
Experimental Results • Application Emulators • Satellite data processing • Skewed mapping of input to output • More compute intensive • Virtual Microscope • Data intensive • Regular mapping • Water Contamination Studies • More balanced • Regular mapping • Hardware Configuration • Pentium III cluster • 16 nodes • 300 GB disk space per node • 512 MB memory per node
Scalability Results: Titan [Charts: Small Query/Large Accumulator, Large Query/Small Accumulator]
Scalability Results: VM [Charts: Small Query/Large Accumulator, Large Query/Small Accumulator]
Distributed Execution: Titan [Charts: Small Query/Large Accumulator, Large Query/Small Accumulator]
Distributed Execution: VM [Charts: Small Query/Large Accumulator, Large Query/Small Accumulator]
Conclusions • Performance of the strategies depends on application and platform characteristics • The runtime environment should support multiple strategies • The replicated strategy works well when sufficient memory is available and the aggregation operation is expensive enough to offset the cost of merging • Hybrid is attractive: it achieves the best performance, or close to it, and it is more flexible • It may not be possible to estimate the relative performance of the different strategies in advance • Future work: automated selection of strategies and of different hybrid approaches • Dynamic adaptation to the characteristics of the environment • Dynamic replication or partitioning of the accumulator as the data is processed