Parallelization Strategies Laxmikant Kale
Overview • OpenMP Strategies • Need for adaptive strategies • Object migration based dynamic load balancing • Minimal modification strategies • Thread based techniques: ROCFLO, .. • Some future plans
OpenMP • Motivation: • Shared memory model often easy to program • Incremental optimization possible
ROCFLO via OpenMP • Parallelization of ROCFLO using a loop-parallel paradigm via OpenMP • Poor speedup compared with MPI version • Was locality the culprit? • Study conducted by Jay Hoeflinger • In collaboration with Fady Najjar
The Methodology • Do OpenMP/MPI comparison experiments. • Write an OpenMP version of ROCFLO • Start with the MPI version of ROCFLO, • Duplicate the structure of the MPI code exactly (including message passing calls). • This removes locality as a problem. • Measure performance • If any parts do not scale well, determine why.
So Locality was not the whole problem! • The other problems turned out to be: • I/O, which doesn’t scale • ALLOCATE, which doesn’t scale • our non-scaling reduction implementation • our first-cut messaging infrastructure, which could be improved • Conclusion • An efficient loop-parallel version may be feasible if it avoids ALLOCATE calls and uses scalable I/O
Need for adaptive strategies • Computation structure changes over time: • Combustion • Adaptive techniques in application codes: • Adaptive refinement in structures or even fluid • Other codes such as crack propagation • Can affect the load balance dramatically • One can go from 90% efficiency to less than 25%
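The drop in efficiency quoted above can be illustrated with simple arithmetic. The sketch below assumes a bulk-synchronous step in which every processor waits for the slowest one, so efficiency is the mean load divided by the maximum load. The load values and the 8x growth from refinement are made-up illustrative numbers, not measurements from any of the codes mentioned here.

```python
def parallel_efficiency(loads):
    """Mean load divided by max load: the busy fraction of the machine
    when every step waits for the most heavily loaded processor."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical per-processor work (arbitrary units) on 8 processors.
before = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8]

# Assumption: the sub-domain on processor 0 is refined by 2x in each of
# 3 dimensions, so its work grows by a factor of 8.
after = [8.0] + before[1:]

print(f"before refinement: {parallel_efficiency(before):.0%}")
print(f"after refinement:  {parallel_efficiency(after):.0%}")
```

With these assumed numbers the efficiency falls from about 90% to about 22%, even though only one of the eight sub-domains changed.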
Multi-partition decompositions • Idea: decompose the problem into a number of partitions, • independent of the number of processors • # Partitions > # Processors • The system maps partitions to processors • The system should be able to map and re-map objects as needed
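A minimal sketch of the mapping step, assuming the system knows a load estimate for each partition. It uses a greedy "largest first, onto the least-loaded processor" heuristic, which is one common choice and not necessarily the strategy used by the framework described here. The function names and the load values are hypothetical.

```python
import heapq


def map_partitions(partition_loads, num_procs):
    """Greedy mapping: take partitions from heaviest to lightest and
    place each one on the processor with the least load so far."""
    heap = [(0.0, proc) for proc in range(num_procs)]  # (load, processor)
    heapq.heapify(heap)
    mapping = {}
    by_load = sorted(partition_loads.items(), key=lambda kv: -kv[1])
    for part, load in by_load:
        total, proc = heapq.heappop(heap)
        mapping[part] = proc
        heapq.heappush(heap, (total + load, proc))
    return mapping


def processor_loads(mapping, partition_loads, num_procs):
    """Sum the partition loads assigned to each processor."""
    loads = [0.0] * num_procs
    for part, proc in mapping.items():
        loads[proc] += partition_loads[part]
    return loads


# 16 partitions on 4 processors (# partitions > # processors).
# Partitions 3 and 7 are assumed to be four times as expensive.
loads = {part: 1.0 for part in range(16)}
loads[3] = 4.0
loads[7] = 4.0

mapping = map_partitions(loads, num_procs=4)
print(processor_loads(mapping, loads, 4))
```

Because there are several partitions per processor, the two expensive partitions can be offset by giving their processors fewer of the cheap ones. With one partition per processor, no mapping could fix the imbalance.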
Load Balancing Framework • Aimed at handling ... • Continuous (slow) load variation • Abrupt load variation (refinement) • Workstation clusters in multi-user mode • Measurement based • Exploits temporal persistence of computation and communication structures • Very accurate (compared with estimation) • instrumentation possible via Charm++/Converse
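The measurement-based idea can be sketched as follows: record how long each object actually ran during recent steps, assume the near future will look the same (temporal persistence), and move objects off the most heavily loaded processor while that lowers the maximum. This is an illustrative sketch only. `LoadDatabase` and `rebalance` are hypothetical names and do not reflect the Charm++/Converse interface.

```python
from collections import defaultdict


class LoadDatabase:
    """Accumulates measured execution time per object (hypothetical)."""

    def __init__(self):
        self.measured = defaultdict(float)

    def record(self, obj_id, seconds):
        self.measured[obj_id] += seconds


def processor_loads(mapping, measured, num_procs):
    """Sum the measured object times assigned to each processor."""
    loads = [0.0] * num_procs
    for obj, proc in mapping.items():
        loads[proc] += measured[obj]
    return loads


def rebalance(mapping, measured, num_procs):
    """Move objects from the heaviest to the lightest processor as long
    as the move leaves the receiver below the sender's current load."""
    mapping = dict(mapping)
    while True:
        loads = processor_loads(mapping, measured, num_procs)
        heavy = max(range(num_procs), key=loads.__getitem__)
        light = min(range(num_procs), key=loads.__getitem__)
        candidates = [
            obj for obj, proc in mapping.items()
            if proc == heavy and loads[light] + measured[obj] < loads[heavy]
        ]
        if not candidates:
            return mapping
        # Prefer the largest object that still helps, to limit migrations.
        obj = max(candidates, key=measured.__getitem__)
        mapping[obj] = light


# Made-up measurements: object 0 became 5x as expensive after refinement.
db = LoadDatabase()
for obj in range(8):
    db.record(obj, 5.0 if obj == 0 else 1.0)

initial = {obj: (0 if obj < 4 else 1) for obj in range(8)}
balanced = rebalance(initial, db.measured, num_procs=2)

print(processor_loads(initial, db.measured, 2))
print(processor_loads(balanced, db.measured, 2))
```

The strategy needs no model of the application: it reacts to whatever the instrumentation reports, which is why it can handle both slow drift and abrupt changes such as refinement.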
Charm++ • A parallel C++ library • supports data-driven objects • many objects per processor, with method execution scheduled by the availability of data • system supports automatic instrumentation and object migration • Works with other paradigms: MPI, OpenMP, ..
Load balancing demonstration • To test the abilities of the framework • A simple problem: Gauss-Jacobi iterations • Refine selected sub-domains • AppSpector: web based tool • Submit parallel jobs • Monitor performance and application behavior • Interact with running jobs via GUI interfaces
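For reference, a serial sketch of the kind of Jacobi iteration used as the test problem. The grid size, boundary values, and tolerance are assumptions chosen for illustration, and the parallel decomposition and refinement of sub-domains are not shown.

```python
def jacobi_sweep(u):
    """One Jacobi sweep: every interior point becomes the average of its
    four neighbours, all read from the previous iterate."""
    n = len(u)
    new = [row[:] for row in u]
    max_change = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
            max_change = max(max_change, abs(new[i][j] - u[i][j]))
    return new, max_change


def solve(n=9, tol=1e-8, max_iters=10000):
    """Iterate until the largest pointwise change drops below tol."""
    u = [[0.0] * n for _ in range(n)]
    u[0] = [1.0] * n  # assumed boundary condition: top edge held at 1
    for iteration in range(1, max_iters + 1):
        u, change = jacobi_sweep(u)
        if change < tol:
            return u, iteration
    return u, max_iters


grid, iterations = solve()
print(f"converged in {iterations} sweeps, centre value {grid[4][4]:.4f}")
```

Each sweep reads only the previous iterate, so sub-domains can be updated independently once their boundary values are exchanged. Refining a sub-domain multiplies its point count, which is what makes this a convenient test for a load balancer.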
Adaptivity with minimal modification • Current code base is parallel (MPI) • But doesn’t support adaptivity directly • Rewrite the code with objects?... • Idea: support adaptivity with minimal changes to F90/MPI codes • Work by: • Milind Bhandarkar, Jay Hoeflinger, Eric de Sturler
Migratable threads approach • Change required: • Encapsulate global variables in modules • Dynamically allocatable • Intercept MPI calls • Implement them in a multithreaded layer • Run each original MPI process as a thread • User level thread • Migrate threads as needed by load balancing • Trickier problem than object migration
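The reason for encapsulating global variables can be sketched with an analogy. Once everything a former MPI process needs lives in one self-contained record, that record can be packed, shipped to another processor, and unpacked there. The sketch below is Python, not Fortran 90, and it shows only the packing of state. It does not show user-level threads or stack migration. `RankState`, `pack`, and `unpack` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RankState:
    """Everything one former MPI process would otherwise keep in globals."""
    rank: int
    step: int = 0
    field_values: list = field(default_factory=list)


def advance(state):
    """Stand-in for one time step of the application on this rank."""
    state.field_values = [value + 1.0 for value in state.field_values]
    state.step += 1


def pack(state):
    """Serialize the state so it could be sent to another processor."""
    return json.dumps(asdict(state))


def unpack(buffer):
    """Rebuild the state on the receiving side."""
    return RankState(**json.loads(buffer))


state = RankState(rank=3, field_values=[1.0, 2.0])
advance(state)

moved = unpack(pack(state))  # stands in for migration to another processor
advance(moved)
print(moved)
```

If part of the state were left in true globals, it would stay behind on the old processor and the resumed computation would be wrong. That is why the encapsulation step comes first in the list of required changes.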
Progress: • Test Fortran-90 - C++ interface • Encapsulation feasibility: • Thread migration mechanics • ROCFLO study: • Test code implementation • ROCFLO implementation
Another approach to adaptivity • Cleanly separate parallel and sequential code: • All parallel code in C++ • All application code in Fortran 90 sequential subroutines • Needs more restructuring of application codes • But is feasible, especially for new codes • Much easier to migrate • Improves modularity