MPICH2 on DCMF
Pavan Balaji, Argonne National Laboratory
About MPICH2
• High-performance and widely portable implementation of the Message Passing Interface (MPI)
• Supports MPI-1, MPI-2 and MPI-2.1
• Used directly by several users
• Used by vendors and collaborators to build MPI stacks for their systems (e.g., IBM, Intel, Microsoft, OSU, Myricom, SiCortex, Cray)
• Available with many software distributions
• Integrated with the ROMIO MPI-IO implementation and the MPE profiling library
• Frequent release cycle
  • 1.0.8 released in October; 1.1a2 released on Monday
Pavan Balaji, Argonne National Laboratory (SC '08 DCMF BoF: 11/17/2008)
MPICH2 on DCMF: Current Status
• Support for DCMF integrated starting with the 1.1 release series
  • Provided by IBM (thank you!)
  • First released in August (1.1a1); continued in 1.1a2, …
• Provides initial support for MPICH2 on BG/P
• Tested for correctness, performance and scalability
• Deployed on the 40-rack (163,840-core) BG/P system at Argonne (ranked 3rd in the June 2008 Top 500 list)
Performance on BG/P
“Non-Data-Communication Overheads in MPI: Analysis on BG/P”, P. Balaji, A. Chan, W. Gropp, R. Thakur and E. Lusk. In the Proceedings of the European PVM/MPI Users’ Group Meeting (EuroPVM/MPI 2008): Outstanding Paper Award
MPICH2 on DCMF: Future Plans
• Fine-grained Threads
  • Currently MPICH2 uses a global lock for threads
  • The lock is acquired on entry to an MPI call and released on exit
  • This serializes communication across threads
  • Solution: acquire locks only when required, use different locks for different objects, and use lock-free operations where possible to improve performance
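The contrast between a global lock and per-object locks can be sketched with POSIX threads. This is only an illustration under stated assumptions, not MPICH2 code: `queue_t`, `op_global`, `op_fine`, `run_threads` and `NUM_QUEUES` are hypothetical names, and a simple counter stands in for whatever internal state a real MPI call would update. Each thread works on its own object, so with the global lock the threads still serialize, while with per-object locks they never contend.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_QUEUES 4  /* hypothetical: number of independent internal objects */

/* Hypothetical stand-in for an internal library object. */
typedef struct {
    pthread_mutex_t lock;  /* per-object lock, used by the fine-grained scheme */
    long count;            /* state that must be updated atomically */
} queue_t;

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static queue_t queues[NUM_QUEUES];

/* Coarse-grained: one global lock is taken on entry and released on exit,
 * so calls touching unrelated objects still serialize. */
void op_global(int q) {
    pthread_mutex_lock(&global_lock);
    queues[q].count++;
    pthread_mutex_unlock(&global_lock);
}

/* Fine-grained: only the object being modified is locked. */
void op_fine(int q) {
    pthread_mutex_lock(&queues[q].lock);
    queues[q].count++;
    pthread_mutex_unlock(&queues[q].lock);
}

typedef struct {
    int q;             /* which object this thread works on */
    int iters;         /* how many operations to perform */
    void (*op)(int);   /* locking scheme under test */
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    for (int i = 0; i < a->iters; i++)
        a->op(a->q);
    return NULL;
}

/* Starts one thread per object, each performing `iters` operations with the
 * given scheme, and returns the total number of operations recorded. */
long run_threads(void (*op)(int), int iters) {
    pthread_t tid[NUM_QUEUES];
    worker_arg_t args[NUM_QUEUES];
    long total = 0;

    for (int q = 0; q < NUM_QUEUES; q++) {
        pthread_mutex_init(&queues[q].lock, NULL);
        queues[q].count = 0;
    }
    for (int q = 0; q < NUM_QUEUES; q++) {
        args[q].q = q;
        args[q].iters = iters;
        args[q].op = op;
        int rc = pthread_create(&tid[q], NULL, worker, &args[q]);
        assert(rc == 0);
        (void)rc;
    }
    for (int q = 0; q < NUM_QUEUES; q++)
        pthread_join(tid[q], NULL);
    for (int q = 0; q < NUM_QUEUES; q++) {
        total += queues[q].count;
        pthread_mutex_destroy(&queues[q].lock);
    }
    return total;
}
```

Calling `run_threads(op_global, n)` and `run_threads(op_fine, n)` gives the same total, `NUM_QUEUES * n`; the two schemes differ only in how much the threads wait on each other. The sketch does not cover the lock-free option mentioned above.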
MPICH2 on DCMF: Future Plans
• One-sided Communication (Get, Put, Lock, Unlock, …)
  • Utilize DCMF_Get and Put operations for true one-sided communication
  • Locking operations might still need to be two-sided, though DCMF could probably be extended to do these in a one-sided manner
• Fault Tolerance
  • Error reporting is not trivial on network fabrics with many hops (e.g., torus)
  • E.g., a send call might succeed once the data is handed over to the network, which does not mean the destination has received it