150 likes | 161 Views
Open MPI exists to maximize MPI expertise in research/academia, industry, and more. The design features formalized interfaces, diverse implementations, and the flexibility to compose different systems on the fly.
E N D
Open MPI on the Cray XT Richard L. Graham Tech Integration National Center for Computational Sciences
Why does Open MPI exist? • Maximize all MPI expertise: • research/academia, • industry, • …elsewhere. • Capitalize on (literally) years of MPI research and implementation experience. • The sum is greater than the parts. Research/ academia Industry
Key design feature: Components Formalized interfaces • Specifies “black box” implementation • Different implementations available at run-time • Can compose different systems on the fly Caller Interface 1 Interface 2 Interface 3
Point-to-point architecture MPI PML-OB1/DR PML-CM BML-R2 MTL-MX (Myrinet) MTL- Portals MTL-PSM (QLogic) BTL-GM BTL-OpenIB MPool-GM MPool-OpenIB Rcache Rcache
Portals port: OB1 vs. CM CM • Matching maybe on NIC • Short message: eager, buffer on receive • Long message: eager • Send all data • If Match: deliver directly to user buffer • No Match: discard payload, and get() user data after match OB1 • Matching in main-memory • Short message: eager, buffer on receive • Long message: rendezvous • Rendezvous packet: 0 byte payload • Get message after match
Collective communications component structure User application MPI API Collective MPI Component Architecture (MCA) PML BTL Topology MTL Allocator I/O Basic Bucket OB1 CM DR CRCPW TCP Shared Mem. Infiband Myrinet MX Portals PSM Portals Basic Utility Basic Tuned Hierarchical Intercomm. Shared Mem. Non-blocking
2000 1800 1600 1400 1200 1000 800 600 400 200 0 0.0001 0.001 0.01 0.1 1 10 100 1000 10000 NetPipe bandwidth data (MB/sec) Open MPI—CM Open MPI—OB1 Cray MPI Bandwidth (MBytes/sec) Data Size (KBytes)
VH1—Total runtime 250 Open MPI—CM Open MPI—OB1 Cray MPI 240 230 VH-1 Wall Clock Time (sec) 220 210 200 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 Log 2 Processor Count
GTC—Total runtime 1150 Open MPI—CM Open MPI—OB1 Cray MPI 1100 1050 1000 GTC Wall Clock Time (sec) 950 900 850 800 1 2 3 4 5 6 7 8 9 10 11 Log 2 Processor Count
POP—Step runtime Open MPI—CM Open MPI—OB1 Cray MPI 2048 1024 POP Time Step Wall Clock Time (sec) 512 256 128 3 4 5 6 7 8 9 10 11 Log 2 Processor Count
Summary and future directions • Support for XT (Catamount and Compute Node Linux) within standard distribution • Performance (application and micro-benchmarks) comparable to that of Cray MPI • Support for recovery from process failure is being added
Contact • Richard L. Graham • Tech Integration • National Center for Computational Sciences • (865) 356-3469 • rlgraham@ornl.gov www.open-mpi.org 15 Graham_OpenMPI_SC07