1 / 18

Adaptive MPI

Adaptive MPI. Chao Huang, Orion Lawlor, L. V. Kal é Parallel Programming Lab Department of Computer Science University of Illinois at Urbana-Champaign. Motivation. Challenges New generation parallel applications are: Dynamically varying: load shifting, adaptive refinement

violet
Download Presentation

Adaptive MPI

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adaptive MPI Chao Huang, Orion Lawlor, L. V. Kalé Parallel Programming Lab Department of Computer Science University of Illinois at Urbana-Champaign

  2. Motivation • Challenges • New generation parallel applications are: • Dynamically varying: load shifting, adaptive refinement • Typical MPI implementations are: • Not naturally suitable for dynamic applications • Set of available processors: • May not match those required by the algorithm • Adaptive MPI • Virtual MPI Processor (VP) • Solutions and optimizations Adaptive MPI

  3. Outline • Motivation • Implementations • Features and Experiments • Current Status • Future Work Adaptive MPI

  4. System implementation User View Processor Virtualization • Basic idea of processor virtualization • User specifies interaction between objects (VP’s) • RTS maps VP’s onto physical processors • Typically, # virtual processors > # processors Adaptive MPI

  5. MPI “processes” Implemented as virtual processes (user-level migratable threads) Real Processors AMPI: MPI with Virtualization • Each virtual process implemented as a user-level thread embedded in a Charm++ object MPI processes Adaptive MPI

  6. Adaptive Overlap p = 8 vp = 8 p = 8vp = 64 Problem setup: 3D stencil calculation of size 2403 run on Lemieux. Run with virtualization ratio 1 and 8. Adaptive MPI

  7. Benefit of Adaptive Overlap Problem setup: 3D stencil calculation of size 2403 run on Lemieux. Shows AMPI with virtualization ratio of 1 and 8. Adaptive MPI

  8. Comparison with Native MPI • Performance • Slightly worse w/o optimization • Being improved • Flexibility • Big runs on a few processors • Fits the nature of algorithms Problem setup: 3D stencil calculation of size 2403 run on Lemieux. AMPI runs on any # of PE’s (eg 19, 33, 105). Native MPI needs cube #. Adaptive MPI

  9. Automatic Load Balancing • Problems • Dynamically varying applications • Load imbalance impacts overall performance • Difficult to move jobs between processors • Implementation • Load balancing by migrating objects (VP’s) • RTS collects CPU and network usage of VP’s • New mapping based on collected information • Threads are packed up and shipped as needed • Different variations of strategies available Adaptive MPI

  10. Load Balancing Example AMR applicationRefinement happens at step 25Load balancer is activated at time steps 20, 40, 60, and 80. Adaptive MPI

  11. Collective Operations • Problem with collective operations • Complex: involving many processors • Time consuming: designed as blocking calls in MPI Time breakdown of 2D FFT benchmark [ms] Adaptive MPI

  12. Collective Communication Optimization Time breakdown of an all-to-all operation using Mesh library • Computation is only a small proportion of the elapsed time • A number of optimization techniques are developed to improve collective communication performance Adaptive MPI

  13. Asynchronous Collectives Time breakdown of 2D FFT benchmark [ms] • VP’s implemented as threads • Overlapping computation with waiting time of collective operations • Total completion time reduced Adaptive MPI

  14. Shrink/Expand • Problem: Availability of computing platform may change • Fitting applications on the platform by object migration Time per step for the million-row CG solver on a 16-node cluster Additional 16 nodes available at step 600 Adaptive MPI

  15. Summary of Advantages • Run MPI programs on any # of PE’s • Overlap computation with communication • Automatic load balancing • Asynchronous collective operations • Shrink/expand • Automatic checkpoint/restart mechanism • Cross communicators • Interoperability Adaptive MPI

  16. Gridgen Rocflu-MP Rocman Makeflo Rocflo-MP Tetmesh Rocface Rocsolid Metis Roccom Rocpanda Rocfrac Truegrid MPI/AMPI Rocburn2D Rocblas A P N ZN PY Application Example: GEN2 Adaptive MPI

  17. Future Work • Performance prediction via direct simulation • Performance tuning w/o continuous access to large machine • Support for visualization • Facilitating debugging and performance tuning • Support for MPI-2 standard • ROMIO as parallel I/O library • One-sided communications Adaptive MPI

  18. Thank You! Free download of AMPI is available at:http://charm.cs.uiuc.edu/ Parallel Programming Lab at University of Illinois Adaptive MPI

More Related