1 / 15

Open MPI on the Cray XT

Open MPI exists to maximize MPI expertise in research/academia, industry, and more. The design features formalized interfaces, diverse implementations, and the flexibility to compose different systems on the fly.

runderwood
Download Presentation

Open MPI on the Cray XT

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Open MPI on the Cray XT Richard L. Graham Tech Integration National Center for Computational Sciences

  2. Why does Open MPI exist? • Maximize all MPI expertise: • research/academia, • industry, • …elsewhere. • Capitalize on (literally) years of MPI research and implementation experience. • The sum is greater than the parts. Research/ academia Industry

  3. Current membership

  4. Key design feature: Components Formalized interfaces • Specifies “black box” implementation • Different implementations available at run-time • Can compose different systems on the fly Caller Interface 1 Interface 2 Interface 3

  5. Point-to-point architecture MPI PML-OB1/DR PML-CM BML-R2 MTL-MX (Myrinet) MTL- Portals MTL-PSM (QLogic) BTL-GM BTL-OpenIB MPool-GM MPool-OpenIB Rcache Rcache

  6. Portals port: OB1 vs. CM CM • Matching maybe on NIC • Short message: eager, buffer on receive • Long message: eager • Send all data • If Match: deliver directly to user buffer • No Match: discard payload, and get() user data after match OB1 • Matching in main-memory • Short message: eager, buffer on receive • Long message: rendezvous • Rendezvous packet: 0 byte payload • Get message after match

  7. Collective communications component structure User application MPI API Collective MPI Component Architecture (MCA) PML BTL Topology MTL Allocator I/O Basic Bucket OB1 CM DR CRCPW TCP Shared Mem. Infiband Myrinet MX Portals PSM Portals Basic Utility Basic Tuned Hierarchical Intercomm. Shared Mem. Non-blocking

  8. Benchmark Results

  9. 2000 1800 1600 1400 1200 1000 800 600 400 200 0 0.0001 0.001 0.01 0.1 1 10 100 1000 10000 NetPipe bandwidth data (MB/sec) Open MPI—CM Open MPI—OB1 Cray MPI Bandwidth (MBytes/sec) Data Size (KBytes)

  10. Zero byte ping-pong latency

  11. VH1—Total runtime 250 Open MPI—CM Open MPI—OB1 Cray MPI 240 230 VH-1 Wall Clock Time (sec) 220 210 200 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 Log 2 Processor Count

  12. GTC—Total runtime 1150 Open MPI—CM Open MPI—OB1 Cray MPI 1100 1050 1000 GTC Wall Clock Time (sec) 950 900 850 800 1 2 3 4 5 6 7 8 9 10 11 Log 2 Processor Count

  13. POP—Step runtime Open MPI—CM Open MPI—OB1 Cray MPI 2048 1024 POP Time Step Wall Clock Time (sec) 512 256 128 3 4 5 6 7 8 9 10 11 Log 2 Processor Count

  14. Summary and future directions • Support for XT (Catamount and Compute Node Linux) within standard distribution • Performance (application and micro-benchmarks) comparable to that of Cray MPI • Support for recovery from process failure is being added

  15. Contact • Richard L. Graham • Tech Integration • National Center for Computational Sciences • (865) 356-3469 • rlgraham@ornl.gov www.open-mpi.org 15 Graham_OpenMPI_SC07

More Related