An Evaluation of Partitioners for Parallel SAMR Applications
Sumir Chandra & Manish Parashar
ECE Dept., Rutgers University
Submitted to: Euro-Par 2001: European Conference on Parallel Computing
Introduction
• AMR: Adaptive Mesh Refinement
• AMR is used for solving PDEs in dynamic applications
• Challenges involved:
  • Dynamic resource allocation
  • Dynamic data distribution and load balancing
  • Communication and coordination
  • Partitioning of the adaptive grid hierarchy
• Evaluation of dynamic domain-based partitioning strategies with an application-centric approach
Motivation & Goal
• Even for a single application, the most suitable partitioning technique depends on the input parameters and on the application's run-time state
• Application-centric characterization of partitioners as a function of number of processors, problem size, and granularity
• Enable run-time selection of partitioners based on input parameters and application state
Partitioning Adaptive Grid Hierarchies
• Adaptive Mesh Refinement
  • Start with a base coarse grid with minimum acceptable resolution
  • Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters
  • Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions
  • Resulting grid structure is a dynamic adaptive grid hierarchy

The Berger-Oliger Algorithm

Recursive Procedure Integrate(level)
    If (RegridTime) Regrid
    Step Δt on all grids at level “level”
    If (level + 1 exists)
        Integrate(level + 1)
        Update(level, level + 1)
    End if
End Recursion

level = 0
Integrate(level)
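The recursion above can be sketched in Python as follows. This is only an illustration of the control flow, not the authors' implementation: the function names, the `trace` list, and the `regrid_due` callback are hypothetical, and the sub-cycling loop (each finer level taking `refine_factor` steps per coarser step) is an assumption based on the factor-2 space-time refinement mentioned in the experimental setup.

```python
def integrate(level, num_levels, refine_factor, trace, regrid_due=None):
    """Record the order of operations for one time step at `level` (sketch)."""
    # Regrid first if the (hypothetical) callback says it is time to.
    if regrid_due is not None and regrid_due(level):
        trace.append(("regrid", level))
    # Advance every grid at this level by one step of this level's size.
    trace.append(("step", level))
    if level + 1 < num_levels:
        # Assumption: with space-time refinement the finer level takes
        # `refine_factor` smaller steps to catch up with this level.
        for _ in range(refine_factor):
            integrate(level + 1, num_levels, refine_factor, trace, regrid_due)
        # Project the fine solution back onto the coarser level.
        trace.append(("update", level, level + 1))


def run_coarse_step(num_levels=3, refine_factor=2):
    """Run one coarse-level step and return the recorded operations."""
    trace = []
    integrate(0, num_levels, refine_factor, trace)
    return trace


if __name__ == "__main__":
    for op in run_coarse_step():
        print(op)
```

Under these assumptions, one coarse step on a three-level hierarchy triggers two steps at level 1 and four at level 2, which is why the finest levels dominate the workload that the partitioner has to balance.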
[Figure: SAMR 2-D grid hierarchy at time steps 0, 40, 80, 120, 160, and 182; the legend distinguishes refinement levels 0 through 4.]
Partitioning Techniques
• Static or dynamic techniques
• Geometric or non-geometric
• Dynamic partitioning: global or local approaches
• Partitioners for SAMR grid applications
  • Patch-based
  • Domain-based
  • Hybrid
Partitioners Evaluated
• SFC: Space Filling Curve based partitioning
• G-MISP: Geometric Multi-level Inverse Space filling curve Partitioning
• G-MISP+SP: Geometric Multi-level Inverse Space filling curve Partitioning with Sequence Partitioning
• pBD-ISP: p-way Binary Dissection Inverse Space filling curve Partitioning
• SP-ISP: “Pure” Sequence Partitioning with Inverse Space filling curve Partitioning
• WD: Wavefront Diffusion based on global work load
SFC
• Recursive linear representation of the multi-dimensional grid hierarchy using space-filling mappings (N-D to 1-D mapping)
• Computational load determined by segment length and recursion level
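As a rough illustration of an N-D to 1-D mapping, the sketch below orders 2-D grid blocks along a Morton (Z-order) curve. The choice of the Morton curve is an assumption made for brevity; the slides do not say which space-filling curve the SFC partitioner uses. The function names and the `(x, y, load)` block layout are likewise hypothetical.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y to get a position on a Z-order curve."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key


def linearize(blocks):
    """Sort (x, y, load) blocks along the curve and return their loads in order.

    The result is the one-dimensional workload list that a sequence-based
    partitioner can then cut into contiguous pieces, one per processor.
    """
    ordered = sorted(blocks, key=lambda b: morton_index(b[0], b[1]))
    return [load for _, _, load in ordered]


if __name__ == "__main__":
    # A 2x2 patch of blocks with illustrative (made-up) loads.
    blocks = [(1, 1, 40), (0, 0, 10), (0, 1, 30), (1, 0, 20)]
    print(linearize(blocks))
```

Because nearby indices on the curve tend to correspond to nearby blocks in space, cutting the 1-D list into contiguous segments tends to keep each processor's blocks spatially close.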
G-MISP & G-MISP+SP
G-MISP
• Multi-level algorithm that views the matrix of workloads from the SAMR grid hierarchy as a one-vertex graph, which is refined recursively
• Gains speed at the expense of load balance
G-MISP+SP
• “Smarter” variant of G-MISP: uses sequence partitioning to assign consecutive portions of the one-dimensional list to processors
• Load balance improves, but the scheme is computationally more expensive
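Sequence partitioning, in general, means cutting a one-dimensional list of workloads into contiguous pieces so that the heaviest piece is as light as possible. The sketch below is a generic textbook approach (binary search on the bottleneck with a greedy feasibility check), not the algorithm used inside G-MISP+SP, which the slides do not describe. It assumes non-negative integer loads; all names are hypothetical.

```python
def fits(loads, p, cap):
    """Return True if `loads` can be cut into at most p contiguous parts,
    each with total load <= cap."""
    parts, current = 1, 0
    for w in loads:
        if w > cap:
            return False
        if current + w > cap:
            parts += 1       # start a new part with this block
            current = w
        else:
            current += w
    return parts <= p


def sequence_partition(loads, p):
    """Return (bottleneck, segments) for a contiguous partition into <= p parts."""
    lo, hi = max(loads), sum(loads)
    while lo < hi:                       # binary search for the smallest feasible cap
        mid = (lo + hi) // 2
        if fits(loads, p, mid):
            hi = mid
        else:
            lo = mid + 1
    segments, current, total = [], [], 0
    for w in loads:                      # rebuild the segments greedily under that cap
        if current and total + w > lo:
            segments.append(current)
            current, total = [w], w
        else:
            current.append(w)
            total += w
    segments.append(current)
    return lo, segments


if __name__ == "__main__":
    print(sequence_partition([4, 2, 7, 1, 3, 5], 3))
```

The extra search is one way to see why a sequence-partitioning step costs more than a plain recursive split while producing a better balance. The greedy rebuild may return fewer than p segments when that already achieves the optimal bottleneck.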
pBD-ISP
• Generalization of binary dissection: the domain is partitioned into p partitions
• Each split divides the load as evenly as possible, taking into account the number of processors assigned to each side
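A minimal sketch of p-way binary dissection on a one-dimensional workload list is shown below. It assumes that the domain has already been linearized, and that each split hands p // 2 processors to the left part and the rest to the right part, with the cut placed so the left load is as close as possible to its proportional share. These details and the function name are assumptions for illustration, not the pBD-ISP implementation.

```python
def binary_dissection(loads, p):
    """Recursively split `loads` into up to p contiguous parts (sketch)."""
    if p == 1 or len(loads) <= 1:
        return [list(loads)]
    p_left = p // 2
    # Load the left side should receive, in proportion to its processor count.
    target = sum(loads) * p_left / p
    best_cut, best_err, running = 1, None, 0
    for cut in range(1, len(loads)):
        running += loads[cut - 1]            # load of loads[:cut]
        err = abs(running - target)
        if best_err is None or err < best_err:
            best_cut, best_err = cut, err
    left = binary_dissection(loads[:best_cut], p_left)
    right = binary_dissection(loads[best_cut:], p - p_left)
    return left + right


if __name__ == "__main__":
    print(binary_dissection([4, 2, 7, 1, 3, 5], 3))
```

Each level of the recursion needs only one pass over its part of the list, which is consistent with this family of partitioners being fast. If a part holds fewer blocks than processors, the sketch returns fewer than p parts.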
SP-ISP
• Domain sub-divided into p*b equally sized blocks
• Dual-level algorithm: parameter settings for each level
• Fine granularity scheme: good load balance but increased overhead, communication and computational cost
WD
• Part of the ParMetis suite, based on global workload
• Used for repartitioning graphs with scattered refinements
• Results in fine-grain partitionings with jagged boundaries and increased communication costs and overheads
• Metis integration was extremely expensive; dedicated SAMR partitioners performed much better
• Two extra steps were needed for Metis in our interface
  • A Metis graph is generated from the grid before partitioning
  • Clustering is used to regenerate grid blocks from the graph partitions after partitioning
Experimental Setup
• Application: RM3D
  • 3-D “real world” compressible turbulence application solving the Richtmyer-Meshkov instability
  • Fingering instability which occurs at a material interface accelerated by a shock wave
• Machine: NPACI IBM SP2 Blue Horizon at SDSC
  • Teraflop-scale Power3 based SMP cluster
  • 1152 processors and 512 GB of main memory
  • AIX operating system
  • Peak bi-directional data transfer rate of approx. 115 MBps
Experimental Setup (contd.)
• Base coarse grid: 128 * 32 * 32
• 3 levels of factor 2 space-time refinements
• Application ran for 150 coarse level time-steps
• Experiments consisted of varying:
  • Partitioner (from the set of evaluated partitioners)
  • Number of processors (16 to 128)
  • Granularity, i.e. the atomic unit (2*2*2 to 8*8*8)
• Metrics used: total run-time, maximum load imbalance, AMR efficiency
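The slides do not give a formula for maximum load imbalance. One common definition, assumed here purely for illustration, is the percentage by which the most heavily loaded processor exceeds the average load. The function name and the sample loads are hypothetical.

```python
def max_load_imbalance(processor_loads):
    """Percentage by which the heaviest processor exceeds the mean load.

    Assumed definition: 100 * (max - mean) / mean. The evaluation in the
    slides may use a different formula.
    """
    mean = sum(processor_loads) / len(processor_loads)
    return 100.0 * (max(processor_loads) - mean) / mean


if __name__ == "__main__":
    # Made-up per-processor loads, not measurements from the experiments.
    print(max_load_imbalance([10, 10, 10, 30]))
```

Under this definition a perfectly balanced distribution scores 0, and a value of 100 means the slowest processor carries twice the average work.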
Experimental Evaluation
• RM3D needs rapid refinement and efficient redistribution
• pBD-ISP, G-MISP+SP, and SFC are best suited for RM3D: they are fast partitioners that produce low imbalance and maintain good communication patterns
• pBD-ISP is the fastest but yields only average load imbalance
• G-MISP+SP and SFC generate the lowest imbalance but are relatively slower
• The evaluated partitioning techniques scale reasonably well
Evaluation (contd.)
• Coarse granularity produces high load imbalance
• Fine granularity leads to greater synchronization and coordination overheads and higher execution times
• Optimal partitioning granularity requires a trade-off between execution speed and load imbalance
• For the RM3D application, a granularity of 4 (a 4*4*4 atomic unit) gives the lowest execution time with acceptable load imbalance
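To make the granularity trade-off concrete, the sketch below counts how many atomic units the 128 * 32 * 32 base grid splits into for each granularity in the experiments. It considers only the base grid and ignores the refined levels, so it is an illustration of the trend and not a reproduction of the experiments; the function name is hypothetical.

```python
def base_grid_blocks(grid_shape, unit):
    """Number of unit*unit*unit atomic blocks needed to cover the grid."""
    blocks = 1
    for extent in grid_shape:
        blocks *= -(-extent // unit)   # ceiling division per dimension
    return blocks


if __name__ == "__main__":
    base_grid = (128, 32, 32)
    for unit in (2, 4, 8):
        blocks = base_grid_blocks(base_grid, unit)
        # Blocks available per processor at the largest processor count used.
        print(unit, blocks, blocks / 128)
```

At granularity 8 the base grid yields only 256 blocks, or two per processor on 128 processors, which leaves little room to even out the load. At granularity 2 it yields 16384 blocks, which allows a finer balance but means many more units to manage and communicate.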
Conclusions
• Experimental evaluation of dynamic domain-based partitioning and load-balancing techniques
• RM3D compressible turbulence application
• Effect of choice of partitioner and granularity on execution time
• Formulation of application-centric characterization of the partitioners as a function of number of processors, problem size, and partitioning granularity