
New Progress in Open MPI p2p communication: Elan and Sicortex

Teng Ma, George Bosilca, 2008 ICL retreat.


Presentation Transcript


  1. New Progress in Open MPI p2p communication: Elan and Sicortex. Teng Ma, George Bosilca, 2008 ICL retreat

  2. P2p communication in Open MPI (layers, top to bottom)
     • MPI application
     • MPI level
     • PML (p2p management layer): OB1 or DR
     • BML (BTL management layer)
     • BTLs: MX, Elan, UDAPL, SM, OFUD, SCTP, GM, Openib, TCP
     • Will come soon: Sicortex BTL, Xensocket BTL

  3. Recap of the first elan btl version
     • Uses elan Tport to implement the btl's send interface and elan RDMA to implement the btl's put and get interfaces.
     • Provides bandwidth comparable to the vendor's Quadrics MPI, but still has a latency problem.

  4. Latency Problem

  5. Memory copy issue
     [Diagram comparing the Open MPI elan btl with Quadrics MPI. Labels: Elan system buffer, Btl buffer (Open MPI elan btl only), User buffer; three copy arrows in total.]

  6. Elan queue send/recv
     • It does not need pre-registered buffers to receive. The message is stored in an elan system buffer (in the elan queue).
     • The elan queue performs better than elan tport for message sizes <= 2KB. 2KB is the size of one elan queue slot.
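
The size rule on this slide can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the names `QUEUE_SLOT_SIZE`, `xport_t` and `choose_transport` are hypothetical and are not Open MPI or libelan identifiers, and the sketch ignores any header bytes that might share the slot with the payload.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one elan queue slot holds 2KB, as stated on the slide. */
#define QUEUE_SLOT_SIZE 2048u

/* Hypothetical transport labels, used only for this illustration. */
typedef enum { XPORT_QUEUE, XPORT_TPORT } xport_t;

/* Use the queue for messages that fit in one slot, tport otherwise. */
static xport_t choose_transport(size_t msg_len)
{
    return (msg_len <= QUEUE_SLOT_SIZE) ? XPORT_QUEUE : XPORT_TPORT;
}
```

With this rule, a 2048-byte message goes through the queue and a 2049-byte message falls back to tport.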

  7. Queue and tport

  8. The latest elan btl

  9. elan btl’s status now…
     • Fixed the backward rank initialization bug and the finalization bug (no bugs now).
     • Supports multi-rail on a single node.
     • Uses elan’s queue, tport and RDMA to implement the Open MPI send and put protocols.
     • Small-message latency has improved considerably.
     • Provides multi-thread support.

  10. Elan btl’s roadmap

  11. Sicortex machine

  12. p2p Performance provided by Sicortex

  13. Programming environment
      • MPI library (libscmpi.a)
      • Slurm
      • DMA library (libscdma.a)

  14. An example of doing a “get” with the Sicortex DMA engine

          recvbuf = (char *) (((uintptr_t) &bigbuf[65536]) & (~65535ULL)); // 64KB alignment
          ret = scdma_map_bds(ctx, 3, recvbuf, rs->bd_count);             // map into dma buffer
          void *cmd = (void *) scdma_cq_head_spinwait(ctx);               // find a cmd header
          uint64_t segmentComplete = 0;
          scdma_build_s_bf_bf_cmdend_put(cmd,
              peers[client->serverRank].route_handles[0],
              peers[client->serverRank].ports[0],
              client->returnRank,
              rs->bd_base + i, 0,             // source
              3 + i, 0,                       // destination
              sysconf(_SC_PAGESIZE),          // size of transfer
              0,
              (uintptr_t) &segmentComplete);
          __asm__ volatile("sync");           /* force those out to memory */
          scdma_cq_post(ctx);                 // issue the command to dma engine
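
The first line of the example relies on a common alignment trick: step `align` bytes into an over-allocated buffer, then clear the low address bits. The sketch below isolates that step. The helper name `align_within` is hypothetical and is not part of libscdma. It assumes `align` is a power of two and that the buffer is larger than `align` bytes, so that the returned address still has usable space behind it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return an address inside buf that is a multiple of align.
 * The result lies in the range [buf + 1, buf + align], so the caller
 * must allocate at least align bytes more than the payload it needs.
 */
static char *align_within(char *buf, uintptr_t align)
{
    /* The mask below only works for power-of-two alignments. */
    assert(align != 0 && (align & (align - 1)) == 0);
    return (char *)(((uintptr_t)(buf + align)) & ~(align - 1));
}
```

Calling `align_within(bigbuf, 65536)` corresponds to the 64KB alignment used on the slide.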

  15. Future work
      • Improve elan latency when using tport to send.
      • Finish the development of the Sicortex btl.
