Characterization of Domain-Based Partitioners for Parallel SAMR Applications

Johan Steensland, Michael Thuné
IT, Dept. of Scientific Computing, Uppsala University, Uppsala, Sweden

Sumir Chandra, Manish Parashar
Dept. of Electrical & Computer Engg., Rutgers, The State University of NJ, Piscataway, NJ, USA

This research was supported by the National Science Foundation and the Swedish Foundation for Strategic Research.

Overview
• Structured AMR
• Partitioning Adaptive Grid Hierarchies
• Grid Structures
• Characterizing Partitioning Schemes
• SAMR Applications
• Partitioning Techniques
• Experimental Evaluation
• Partitioner Performance
• Octant Approach
• Partitioning Prescriptions
• Towards ARMaDA
• Conclusions

Structured AMR
• Adaptive Mesh Refinement
  • Start with a base coarse grid with minimum acceptable resolution
  • Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters
  • Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions
  • Resulting grid structure is a dynamic adaptive grid hierarchy

The Berger-Oliger Algorithm

  Recursive Procedure Integrate(level)
    If (RegridTime) Regrid
    Step Δt on all grids at level “level”
    If (level + 1 exists)
      Integrate(level + 1)
      Update(level, level + 1)
    End if
  End Recursion

  level = 0
  Integrate(level)

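The recursion above can be sketched in Python. This is only an illustrative skeleton: the event log, the `regrid_interval` parameter, and the `substeps` parameter (how many fine steps are taken per coarse step) are assumptions added for the example. With `substeps=1` the control flow mirrors the pseudocode on the slide exactly.

```python
def integrate(level, num_levels, step, regrid_interval, log, substeps=1):
    """Berger-Oliger style recursion; records events in `log` instead of solving anything."""
    # Regrid when this level's step counter hits the (assumed) regridding interval.
    if step % regrid_interval == 0:
        log.append(("regrid", level))
    # Advance all grids on this level by one time step.
    log.append(("step", level))
    if level + 1 < num_levels:
        # Recurse into the next finer level; substeps=1 matches the slide.
        for k in range(substeps):
            integrate(level + 1, num_levels, step * substeps + k,
                      regrid_interval, log, substeps)
        # Update the coarser level from the finer one.
        log.append(("update", level, level + 1))


events = []
integrate(0, num_levels=3, step=1, regrid_interval=4, log=events)
for event in events:
    print(event)
```

In a real SAMR code each logged event would be replaced by the corresponding solver, regridding, or restriction routine.
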
Partitioning Adaptive Grid Hierarchies
• Balance load and…
  • Expose available parallelism
  • Minimize communication overheads
    • Inter-level prolongations/restrictions
    • Intra-level “ghost” communications
  • Enable dynamic load redistribution with minimum overheads
• Parallel AMR costs
  • Communications
    • intra-level “ghost” communication: along the surface of each block
    • inter-level prolongation/restriction communications: gather/scatter between parents/children
  • Grid recomposition
    • grid refinement/coarsening
    • redistribution and load-balancing
    • prolongation
    • data-movement

Grid Structures
• 2-D Grid Structure
[Figure: 2-D grid structure at time steps 0, 40, 80, 120, 160, and 182; the legend distinguishes refinement levels 0 through 4]

Grid Structures (contd.)
• 3-D Grid Hierarchy

Characterizing Partitioning Schemes
• PAC Tuple
  • Run-time selection of partitioning schemes based on system/application parameters
• Evaluation Metrics
  • Communication Requirement: inter-level/intra-level communication & memory copies
  • Load Imbalance: amount of imbalance, effort required
  • Data Migration: consider existing distribution
  • Partitioning Time
  • Partitioning Induced Overhead: number of grid components, quality of grid components (size, aspect ratio)
• Overview of Distribution Schemes
  • Space-Filling Curves (SFC)
  • Sequence Partitioning (SP)
  • Multi-level Inverse SFC (Vampire)
  • Geometric, binary dissection, parameterized binary dissection
  • Binary Dissection (BD)
  • Wavefront Diffusion (WD - ParMetis)
  • Iterative Tree Balancing (ITB)
  • Combined Grid Distribution (CGD)
  • Independent Grid Distribution (IGD)
  • Independent Level Distribution (ILD)
  • Weighted Distribution

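Two of the evaluation metrics named above lend themselves to a short numerical sketch. The slide does not give formulas, so the definitions below are assumptions chosen for illustration: load imbalance as the percentage by which the busiest processor exceeds the mean load, and data migration as the total weight of cells whose owner changes between two consecutive partitionings. The function names are hypothetical.

```python
def load_imbalance_percent(loads):
    """Assumed definition: how far the most loaded processor is above the mean, in percent."""
    mean = sum(loads) / len(loads)
    return (max(loads) / mean - 1.0) * 100.0


def data_migration(old_owner, new_owner, weights):
    """Assumed definition: total weight of cells that move to a different processor."""
    return sum(w for old, new, w in zip(old_owner, new_owner, weights) if old != new)


# Four processors, one of them overloaded.
print(load_imbalance_percent([12, 8, 10, 10]))

# Four cells; the second and fourth change owner after repartitioning.
print(data_migration([0, 0, 1, 1], [0, 1, 1, 0], [5, 3, 2, 4]))
```

Other definitions (for example, imbalance relative to the minimum load) would be equally plausible; the point is only that each metric reduces to a simple function of the partitioning.
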
SAMR Applications
• Suite of 5 real-world SAMR application kernels
• Scientific and engineering domains
  • Numerical relativity: scalarwave 2-D & 3-D
  • Oil reservoir simulations: Buckley-Leverett 2-D & 3-D
  • Computational fluid dynamics:
    • Compressible turbulence: rm 2-D
    • Supersonic flows: enoamr 2-D
  • Transport equation: Transport 2-D
• Applications use 3 levels of factor 2 refinements
• Refinements performed every 4 time-steps
• Applications executed for 100 time-steps

Partitioning Techniques
• SFC (ISP)
  • Recursive linear representation of multi-dimensional grid hierarchy generated using space-filling mappings (N-to-1 dimensional mapping)
  • Computational load determined by segment length and recursion level
• G-MISP
  • Multi-level algorithm viewing matrix of workloads from SAMR grid hierarchy as a one-vertex graph, refined recursively
  • Favors speed at expense of load balance
• G-MISP + SP
  • “Smarter” variant of G-MISP – uses sequence partitioning to assign consecutive portions of one-dimensional list to processors
  • Load balance improves but scheme is computationally more expensive

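The idea behind space-filling-curve partitioning can be sketched as follows: map each block to a position on a one-dimensional curve, then cut the resulting list into consecutive pieces of roughly equal load. The sketch below uses a Morton (Z-order) curve as a stand-in for whichever space-filling mapping the actual partitioners use, works on a single 2-D level, and assumes positive loads. All names are hypothetical.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y to get a position on a Z-order curve."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx


def sfc_partition(blocks, num_procs):
    """blocks: list of (x, y, load) tuples. Returns one owner per block, in input order."""
    # Order the blocks along the curve (the N-to-1 dimensional mapping).
    order = sorted(range(len(blocks)),
                   key=lambda i: morton_index(blocks[i][0], blocks[i][1]))
    total = sum(load for _, _, load in blocks)
    owners = [0] * len(blocks)
    running = 0.0
    for i in order:
        load = blocks[i][2]
        # Assign each block by where the midpoint of its load falls along the curve.
        owners[i] = min(num_procs - 1, int((running + load / 2) * num_procs / total))
        running += load
    return owners


# A 4x4 grid of unit-load blocks split among 4 processors.
grid = [(x, y, 1) for y in range(4) for x in range(4)]
owners = sfc_partition(grid, 4)
for y in range(4):
    print([owners[y * 4 + x] for x in range(4)])
```

Because consecutive curve positions tend to be spatially close, each processor receives a compact region. A sequence partitioner, as in G-MISP + SP, would replace the simple midpoint rule with a more careful search for the cut points.
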
Partitioning Techniques (contd.)
• pBD-ISP
  • Generalization of binary dissection – domain partitioned into p partitions
  • Each split divides load as evenly as possible, considering processors
• SP
  • Domain sub-divided into p*b equally sized blocks
  • Dual-level algorithm enabling different parameter settings for each level
  • Fine granularity scheme – good load balance but increased overhead, communication and computational cost
• WD
  • Part of ParMetis suite based on global workload and specializes in repartitioning graphs where refinements are scattered
  • Scheme results in fine grain partitionings with jagged boundaries and increased communication costs and overheads

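A minimal sketch of binary dissection generalized to p partitions, in the spirit of the pBD-ISP description above: each split cuts the region along its longer axis and divides the load in proportion to the number of processors assigned to each side. The data layout (a 2-D list `load[x][y]`), the tie-breaking, and the function name are assumptions for illustration, not the actual pBD-ISP implementation.

```python
def binary_dissection(load, x0, x1, y0, y1, procs, first_proc=0, out=None):
    """Assign cells in [x0, x1) x [y0, y1) to `procs` processors; returns {(x, y): proc}."""
    if out is None:
        out = {}
    # Stop when one processor remains or the region cannot be split further.
    if procs == 1 or (x1 - x0) * (y1 - y0) == 1:
        for x in range(x0, x1):
            for y in range(y0, y1):
                out[(x, y)] = first_proc
        return out
    left_procs = procs // 2
    split_x = (x1 - x0) >= (y1 - y0)  # cut across the longer axis
    lo, hi = (x0, x1) if split_x else (y0, y1)
    # Load of each one-cell-thick slab perpendicular to the cut direction.
    if split_x:
        slabs = [sum(load[k][y] for y in range(y0, y1)) for k in range(lo, hi)]
    else:
        slabs = [sum(load[x][k] for x in range(x0, x1)) for k in range(lo, hi)]
    # The left side should carry a share of the load proportional to its processors.
    target = sum(slabs) * left_procs / procs
    best_cut, best_err, running = lo + 1, None, 0
    for k in range(lo, hi - 1):
        running += slabs[k - lo]
        err = abs(running - target)
        if best_err is None or err < best_err:
            best_cut, best_err = k + 1, err
    if split_x:
        binary_dissection(load, x0, best_cut, y0, y1, left_procs, first_proc, out)
        binary_dissection(load, best_cut, x1, y0, y1, procs - left_procs,
                          first_proc + left_procs, out)
    else:
        binary_dissection(load, x0, x1, y0, best_cut, left_procs, first_proc, out)
        binary_dissection(load, x0, x1, best_cut, y1, procs - left_procs,
                          first_proc + left_procs, out)
    return out


# Three processors on a 4x1 strip whose first cell is three times as heavy.
strip = [[3], [1], [1], [1]]
print(binary_dissection(strip, 0, 4, 0, 1, 3))
```

Because the processor count is split along with the domain, p does not need to be a power of two, which is the main difference from plain binary dissection.
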
Experimental Evaluation
• Normalized results for Scalarwave and Buckley-Leverett applications

Experimental Evaluation (contd.)
Legend: (+) significantly better, (o) average, (-) significantly worse

Partitioner Performance
Performance summary of observed results
• G-MISP
  • Fast, load balance not optimized, average overall performance
• G-MISP+SP
  • Similar to G-MISP, better load balance, higher computational costs
• pBD-ISP
  • Good overall performance, very fast, small communications and data movement, average load balance
• SP
  • Computationally very expensive, unpredictable behavior, worse load balance than G-MISP+SP

Partitioner Performance (contd.)
• ISP
  • Very fast, generates low overhead; below-average load balance and higher communication, similar to those of G-MISP
• WD
  • Metis integration extremely expensive; dedicated SAMR partitioners performed much better
  • Even though Metis is known to produce high-quality partitionings at a low cost, two extra steps were needed in our interface
  • Metis graph generated from grid before partitioning; clustering used to regenerate grid blocks from graph partitions after partitioning

Octant Approach
• Used to classify the state of the SAMR application with respect to
  • Adaptation pattern (scattered or localized)
  • Whether run-time is dominated by computation or communication
  • Activity dynamics in the solution

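Three two-valued axes give eight possible application states, hence "octants". The sketch below shows one way such a classification could be encoded. The field names, the treatment of activity dynamics as simply high or low, and the numbering of the octants are all assumptions made for the example; the slide does not specify how octants are numbered or how each axis is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppState:
    scattered: bool       # adaptation pattern: scattered (True) or localized (False)
    comm_dominated: bool  # run time dominated by communication (True) or computation (False)
    high_dynamics: bool   # activity dynamics in the solution: high (True) or low (False); assumed binary


def octant_index(state):
    """Pack the three axes into an arbitrary index 0..7 (hypothetical numbering)."""
    return ((int(state.scattered) << 2)
            | (int(state.comm_dominated) << 1)
            | int(state.high_dynamics))


state = AppState(scattered=True, comm_dominated=False, high_dynamics=True)
print(octant_index(state))
```

A run-time system would recompute the state periodically and use the resulting octant to look up a suitable partitioner.
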
Partitioning Prescriptions
• Association of partitioning techniques to application state octants

Towards ARMaDA
• ARMaDA – Adaptive Runtime Management of Dynamic Applications
• “Best” partitioning depends on application/system configuration and current application/system state
• Application Sensitive Adaptation
  • Partitioning Scheme: Vampire (MISP), GrACE (SFC), ParMetis (WD), RSB, ITB, etc.
  • Granularity: Patch size: AMR efficiency, comm./comp. ratio, overhead, node-performance, load-balance, etc.
  • Number of Processors/Load on Processors: Dynamic allocations/configuration/management (1000+ processors from the beginning or “on-demand”, hierarchical decomposition using dynamic processor groups)
• System Sensitive Adaptation
  • Availability of system resources
  • State of system resources: SNMP, NWS, REMOS
  • Heterogeneity

Towards ARMaDA (contd.)
• Adaptive meta-partitioner
• Dynamic PAC tuple

Conclusions
• Application-centric characterization of domain-based partitioners
  • Partitioning quality determined by a 5-component metric
  • 6 partitioning schemes evaluated using 5 application kernels
• Mapping of partitioners onto application state octants
  • Octant approach and dynamic PAC tuple
• Overall goal
  • Support the formulation of policies required to drive a dynamically adaptive meta-partitioner for SAMR grid hierarchies
  • Selection of most appropriate partitioning strategy at run-time, based on current application and system state
  • Decrease in overall execution time
• ARMaDA: Adaptive Run-time Management of Dynamic Applications