
OFI Shared Memory


srebecca

Presentation Transcript


  1. OFI Shared Memory OFIWG

  2. Overview
     • SHM Support Options
     • SHM Primitives
       • Region / cmd / resp / addr / map
     • SHM Utilities
       • Initialization / mapping
     • SHM Provider
       • Requirements / status
       • Message protocols: inline / inject / iov
       • Address exchange protocol
     www.openfabrics.org


  4. SHM Support Options
     • SHM support
       • SHM primitives are provided in utility code, without a protocol
       • A provider adapts the primitives for local shm communication using its own protocol
     • SHM provider
       • Native provider built on the SHM primitives
       • Assumes all communication is local
     [Diagram: layer stack of Provider, SHM Utilities, SHM Primitives]


  6. SHM Primitives: smr_region
     • Control fields: version, flags, pid, lock, map, total_size, cmd_queue_offset, resp_queue_offset, inject_pool_offset, peer_addr_offset
       • lock: synchronizes shared memory access
       • map: stores peer addresses and pointers to peer smr_regions
     • Shared-access areas: cmd_queue, resp_queue, inject_pool, peer_addr
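
The offset fields suggest that the shared areas are located relative to the start of the region, which would let each process resolve them against its own mapping address. The following is a minimal sketch of that idea, not the provider's actual layout: the field types, the 64-byte alignment, the omitted lock and map members, and every `sketch_` name are assumptions, and heap memory stands in for a real shared mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified control header. Field names follow the slide;
 * field types are assumptions, and lock/map are left out for brevity. */
struct sketch_region {
    uint8_t version;
    uint8_t flags;
    int     pid;
    size_t  total_size;
    size_t  cmd_queue_offset;
    size_t  resp_queue_offset;
    size_t  inject_pool_offset;
    size_t  peer_addr_offset;
};

/* Round n up to a multiple of 64 (assumed cache-line alignment). */
static size_t align64(size_t n)
{
    return (n + 63) & ~(size_t)63;
}

/* Allocate a region and lay the four shared areas out back to back
 * after the header. calloc stands in for creating a shared mapping. */
static struct sketch_region *sketch_create(size_t cmd_bytes, size_t resp_bytes,
                                           size_t inject_bytes, size_t addr_bytes)
{
    struct sketch_region hdr = {0};

    hdr.cmd_queue_offset   = align64(sizeof(hdr));
    hdr.resp_queue_offset  = hdr.cmd_queue_offset + align64(cmd_bytes);
    hdr.inject_pool_offset = hdr.resp_queue_offset + align64(resp_bytes);
    hdr.peer_addr_offset   = hdr.inject_pool_offset + align64(inject_bytes);
    hdr.total_size         = hdr.peer_addr_offset + align64(addr_bytes);

    struct sketch_region *r = calloc(1, hdr.total_size);
    if (r)
        *r = hdr;
    return r;
}

/* Resolve an offset against wherever this process holds the region. */
static void *sketch_at(struct sketch_region *r, size_t offset)
{
    assert(offset < r->total_size);
    return (char *)r + offset;
}
```

A caller would use `sketch_at(r, r->inject_pool_offset)` to reach the inject pool; storing offsets instead of pointers keeps the header valid no matter where a peer maps the region.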

  7. SHM Primitives: smr_cmd
     • hdr { op { addr, op, op_src, size, data }, msg_id }
       • op: msg / tagged / rma / atomic
       • op_src: smr_src_inline / smr_src_inject / smr_src_iov
       • data: used for smr_src_inject (inject offset) and smr_src_iov (response offset)
     • union data { msg, iov, rma_iov, rma_ioc }
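
To make the header-plus-union shape concrete, here is a rough sketch of a command that carries a small payload inline. It is not the real smr_cmd definition: the header is flattened, the field widths and the iov shape are assumptions, the rma members are omitted, and the 128-byte capacity is a placeholder borrowed from the figure on slide 14, which does not say whether it counts payload only or the whole command.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum sketch_op  { SK_OP_MSG, SK_OP_TAGGED, SK_OP_RMA, SK_OP_ATOMIC };
enum sketch_src { SK_SRC_INLINE, SK_SRC_INJECT, SK_SRC_IOV };

#define SK_INLINE_MAX 128 /* placeholder capacity, see the note above */

/* Hypothetical command: a header followed by a union selected by op_src. */
struct sketch_cmd {
    struct {
        uint64_t addr;
        uint32_t op;     /* enum sketch_op */
        uint32_t op_src; /* enum sketch_src */
        uint64_t size;   /* payload length in bytes */
        uint64_t data;   /* inject offset or response offset, unused inline */
        uint64_t msg_id;
    } hdr;
    union {
        uint8_t msg[SK_INLINE_MAX];            /* inline payload */
        struct { uint64_t base, len; } iov[4]; /* assumed iov shape */
    } data;
};

/* Tx side: build an inline message command.
 * Returns 0, or -1 if the payload does not fit inline. */
static int sketch_format_inline(struct sketch_cmd *cmd, uint64_t addr,
                                uint64_t msg_id, const void *buf, size_t len)
{
    if (len > SK_INLINE_MAX)
        return -1;
    memset(cmd, 0, sizeof(*cmd));
    cmd->hdr.addr   = addr;
    cmd->hdr.op     = SK_OP_MSG;
    cmd->hdr.op_src = SK_SRC_INLINE;
    cmd->hdr.size   = len;
    cmd->hdr.msg_id = msg_id;
    memcpy(cmd->data.msg, buf, len);
    return 0;
}

/* Rx side: copy the payload straight out of the command.
 * Returns the number of bytes copied (truncated to cap). */
static size_t sketch_recv_inline(const struct sketch_cmd *cmd, void *buf, size_t cap)
{
    assert(cmd->hdr.op_src == SK_SRC_INLINE);
    size_t n = cmd->hdr.size < cap ? (size_t)cmd->hdr.size : cap;
    memcpy(buf, cmd->data.msg, n);
    return n;
}
```

For the inject and iov sources, the same header would carry an offset in `hdr.data` instead of a payload in the union.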

  8. SHM Primitives: Other
     • smr_resp { msg_id, status }
       • Used by the Rx side to signal completion of large CMA (smr_src_iov) operations
     • smr_inject_buf { data[SMR_INJECT_SIZE] }
       • 4096-byte buffer for medium (smr_src_inject) message transfers
     • smr_addr { name, addr }
       • Used to exchange remote endpoint information
     • smr_map { smr_peers { smr_addr, smr_region } peers[SMR_MAX_PEERS] }
       • List of all peers and pointers to peer memory regions


  10. SHM Utilities: init
     • int smr_create(const struct fi_provider *prov, struct smr_map *map, const struct smr_attr *attr, struct smr_region **smr)
       • Creates an smr_region and initializes all of its components
     • int smr_map_create(const struct fi_provider *prov, int peer_count, struct smr_map **map)
       • Creates an smr_map to store peer addresses and pointers to peer regions

  11. SHM Utilities: mapping
     • int smr_map_add(const struct fi_provider *prov, struct smr_map *map, const char *name, int id)
       • Adds a peer by name into the map with a specific id
       • Note: adding to the map does not ensure that the peer region is mapped and accessible
     • int smr_map_to_region(const struct fi_provider *prov, struct smr_peer *peer_buf)
       • Tries to map the peer region
       • Returns -FI_EAGAIN if the peer's region is not initialized yet
     • void smr_map_to_endpoint(struct smr_region *region, int index)
       • Exchanges addressing information with a specific peer
       • Finds this region in the peer to see whether it has been mapped yet, and updates addresses
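
Because adding a peer to the map does not guarantee that its region is accessible, a caller presumably has to map lazily and retry. The sketch below illustrates that pattern with stand-ins only: `sketch_map_to_region` is a hypothetical substitute for smr_map_to_region, `SK_EAGAIN` is a placeholder for FI_EAGAIN, and the two boolean flags are assumptions about how readiness might be tracked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_EAGAIN EAGAIN /* placeholder for FI_EAGAIN */

/* Hypothetical peer record. */
struct sketch_peer {
    const char *name;
    bool initialized; /* set by the peer once its region is ready */
    bool mapped;      /* set locally once we have mapped the region */
};

/* Hypothetical stand-in for smr_map_to_region: succeeds only once the
 * peer has finished initializing its region. */
static int sketch_map_to_region(struct sketch_peer *peer)
{
    if (!peer->initialized)
        return -SK_EAGAIN;
    peer->mapped = true;
    return 0;
}

/* Map on first use. A -SK_EAGAIN result is passed back so the caller
 * can retry the send later instead of blocking. */
static int sketch_send_to(struct sketch_peer *peer)
{
    assert(peer != NULL);
    if (!peer->mapped) {
        int ret = sketch_map_to_region(peer);
        if (ret)
            return ret;
    }
    /* A real sender would write a command into the peer's queue here. */
    return 0;
}
```

Returning the retry code to the caller keeps the send path non-blocking, at the cost of requiring the application or progress loop to reissue the operation.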


  13. SHM Provider
     • Endpoint address defaults to info->src_addr if provided, or to the default endpoint name format if no src_addr is given, which requires out-of-band endpoint name exchange
       • EP address = pid:domain_idx:endpoint_idx
     • Only messaging and tagged support
     • Currently only DGRAM
       • Inherently supports RDM
       • RMA and atomic implementation in progress
     • Aim to get full support working for MPI
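
As an illustration of the default name format, the following sketch builds and parses a pid:domain_idx:endpoint_idx string. The function names and integer types are assumptions made for the example; the provider's own helpers are not shown in the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "pid:domain_idx:endpoint_idx" into buf.
 * Returns 0 on success, -1 if the name would be truncated. */
static int sketch_ep_name(char *buf, size_t len, long pid, int dom_idx, int ep_idx)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%ld:%d:%d", pid, dom_idx, ep_idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse a name back into its three parts.
 * Returns 0 on success, -1 if the name is malformed. */
static int sketch_parse_ep_name(const char *name, long *pid, int *dom_idx, int *ep_idx)
{
    return sscanf(name, "%ld:%d:%d", pid, dom_idx, ep_idx) == 3 ? 0 : -1;
}
```

Calling `sketch_ep_name(buf, sizeof(buf), 11111, 0, 0)` yields the string `11111:0:0`, the same shape as the endpoint names used in the address exchange slides.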

  14. Small Message Example (smr_src_inline)
     • Tx side writes an smr_cmd into the peer's cmd_queue
       • Only very small messages that can fit inline into the smr_cmd (128 bytes)
     • Rx side decodes the header and processes the msg
       • Data is retrieved directly from the cmd
     [Diagram: Tx, CMD, Rx command queue]

  15. Medium Message Example (smr_src_inject)
     • Tx side writes the data into the Rx side inject buffer
     • Tx side writes the msg header to the Rx cmd
       • Header includes the inject buffer offset
     • Rx side decodes the header and processes the msg
       • Data is retrieved from the Rx inject buffer
     [Diagram: Tx, CMD, Rx command queue, inject buffer]

  16. Large Message Example (smr_src_iov)
     • Tx side writes the msg header to the Rx cmd
       • Header includes the smr_resp offset (for the ACK)
     • Rx side decodes the header and processes the msg, reading from the Tx process using CMA
     • Rx side writes an ACK msg back to the Tx side
     [Diagram: Tx, CMD, Rx command queue, Tx RESP, CMA buffer]
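
Taken together, the three examples imply a size-based choice of source type on the Tx side. A minimal sketch of that choice follows. The thresholds are assumptions: 4096 comes from the inject buffer size on slide 8, while using 128 as the inline payload limit is a guess, since slide 14 does not say whether 128 bytes is the payload or the whole command.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_src { SK_SRC_INLINE, SK_SRC_INJECT, SK_SRC_IOV };

#define SK_INLINE_MAX  128  /* assumed inline payload limit */
#define SK_INJECT_SIZE 4096 /* inject buffer size from slide 8 */

/* Pick the cheapest path whose buffer can hold the payload. */
static enum sketch_src sketch_pick_src(size_t len)
{
    if (len <= SK_INLINE_MAX)
        return SK_SRC_INLINE; /* payload travels inside the command */
    if (len <= SK_INJECT_SIZE)
        return SK_SRC_INJECT; /* payload goes through the Rx inject buffer */
    return SK_SRC_IOV;        /* Rx pulls the payload with CMA, then ACKs */
}

/* Human-readable label, handy when tracing which path a send took. */
static const char *sketch_src_name(enum sketch_src src)
{
    switch (src) {
    case SK_SRC_INLINE: return "inline";
    case SK_SRC_INJECT: return "inject";
    case SK_SRC_IOV:    return "iov";
    }
    assert(!"unknown source type");
    return "unknown";
}
```

With these placeholder thresholds, a 64-byte send would take the inline path, a 1 KiB send the inject path, and a 1 MiB send the iov path.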

  17. Completion Handling
     • Small to medium messages
       • Tx completes immediately after the send
     • Large messages
       • Delivery-complete semantics
       • Tx does not complete until it has processed an ACK from the Rx side
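
The difference between the two completion rules can be sketched as a small state machine. Everything here is illustrative: the `valid` flag, the single outstanding large send, and the `sketch_` names are assumptions, and only the msg_id and status fields come from the smr_resp description on slide 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_src { SK_SRC_INLINE, SK_SRC_INJECT, SK_SRC_IOV };

/* Response slot written by the Rx side. The valid flag is an assumption. */
struct sketch_resp {
    uint64_t msg_id;
    int      status;
    bool     valid;
};

/* Tx bookkeeping; this sketch tracks one outstanding large send. */
struct sketch_tx {
    uint64_t pending_id;
    bool     pending;
    int      completions;
};

/* Post a send. Returns true if it completed immediately. */
static bool sketch_tx_post(struct sketch_tx *tx, enum sketch_src src, uint64_t msg_id)
{
    if (src == SK_SRC_IOV) {
        assert(!tx->pending);
        tx->pending = true; /* the buffer stays in use until the ACK */
        tx->pending_id = msg_id;
        return false;
    }
    tx->completions++; /* data was already copied out of the user buffer */
    return true;
}

/* Rx side: after pulling the data, write the ACK into the Tx resp slot. */
static void sketch_rx_ack(struct sketch_resp *resp, uint64_t msg_id, int status)
{
    resp->msg_id = msg_id;
    resp->status = status;
    resp->valid  = true;
}

/* Tx progress: consume a matching ACK and report the completion. */
static bool sketch_tx_progress(struct sketch_tx *tx, struct sketch_resp *resp)
{
    if (!tx->pending || !resp->valid || resp->msg_id != tx->pending_id)
        return false;
    resp->valid = false;
    tx->pending = false;
    tx->completions++;
    return true;
}
```

The large-message path has to wait because the Rx side reads directly from the sender's buffer, so the buffer cannot be handed back to the application until the ACK arrives.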

  18. Portability
     • SHM is disabled on non-Linux platforms (no support for CMA)
     • SHM can be extended later to avoid using CMA
       • Add bounce buffering
       • Make SMR_INJECT_SIZE an environment variable
       • Max message size = SMR_INJECT_SIZE
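
The proposed extension could look roughly like the sketch below: read the inject size from the environment and reject messages that do not fit in one bounce buffer. This is speculative. The variable name `SKETCH_SMR_INJECT_SIZE`, the 4096-byte default, the parsing rules, and the use of -EMSGSIZE are all assumptions, since the slides only state the intent.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SK_DEFAULT_INJECT_SIZE 4096

/* Parse a decimal size. Falls back to the default when the value is
 * missing, malformed, zero, or out of range. */
static size_t sketch_parse_inject_size(const char *value)
{
    char *end = NULL;
    unsigned long long v;

    if (!value || value[0] < '0' || value[0] > '9')
        return SK_DEFAULT_INJECT_SIZE;
    errno = 0;
    v = strtoull(value, &end, 10);
    if (errno || *end != '\0' || v == 0 || v > (size_t)-1)
        return SK_DEFAULT_INJECT_SIZE;
    return (size_t)v;
}

/* Hypothetical variable name; the slides do not name one. */
static size_t sketch_inject_size(void)
{
    return sketch_parse_inject_size(getenv("SKETCH_SMR_INJECT_SIZE"));
}

/* Without CMA, a message has to fit in a single inject buffer. */
static int sketch_check_len(size_t len, size_t inject_size)
{
    assert(inject_size > 0);
    return len <= inject_size ? 0 : -EMSGSIZE;
}
```

A send path on a platform without CMA would call `sketch_check_len(len, sketch_inject_size())` before choosing the inject route.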

  19. Address / Name Exchange
     [Diagram: four endpoints, none of which has inserted any peers yet]
     • Endpoint name: 11111:0:0
     • Endpoint name: 22222:0:0
     • Endpoint name: 33333:0:0
     • Endpoint name: 44444:0:0

  20. Address / Name Exchange
     [Diagram: same four endpoints; one fi_av_insert has been issued. Each row reads: index, peer, third column as shown on the slide]
     • Endpoint 11111:0:0 (fi_av_insert)
       0  22222  UNSPEC
       1  33333  UNSPEC
       2  44444  UNSPEC
     • Endpoints 22222:0:0, 33333:0:0, 44444:0:0: no table shown

  21. Address / Name Exchange
     [Diagram: same four endpoints; two fi_av_insert calls have been issued. Each row reads: index, peer, third column as shown on the slide]
     • Endpoint 11111:0:0 (fi_av_insert)
       0  22222  UNSPEC
       1  33333  UNSPEC
       2  44444  0
     • Endpoint 44444:0:0 (fi_av_insert)
       0  11111  2
       1  22222  UNSPEC
       2  33333  UNSPEC
     • Endpoints 22222:0:0, 33333:0:0: no table shown

  22. Address / Name Exchange
     [Diagram: same four endpoints; three fi_av_insert calls have been issued. Each row reads: index, peer, third column as shown on the slide]
     • Endpoint 11111:0:0 (fi_av_insert)
       0  22222  UNSPEC
       1  33333  0
       2  44444  0
     • Endpoint 33333:0:0 (fi_av_insert)
       0  11111  1
       1  22222  UNSPEC
       2  44444  2
     • Endpoint 44444:0:0 (fi_av_insert)
       0  11111  2
       1  22222  UNSPEC
       2  33333  2
     • Endpoint 22222:0:0: no table shown

  23. Address / Name Exchange
     [Diagram: same four endpoints; all four have issued fi_av_insert. Each row reads: index, peer, third column as shown on the slide]
     • Endpoint 11111:0:0 (fi_av_insert)
       0  22222  0
       1  33333  0
       2  44444  0
     • Endpoint 22222:0:0 (fi_av_insert)
       0  11111  0
       1  33333  1
       2  44444  1
     • Endpoint 33333:0:0 (fi_av_insert)
       0  11111  1
       1  22222  1
       2  44444  2
     • Endpoint 44444:0:0 (fi_av_insert)
       0  11111  2
       1  22222  2
       2  33333  2
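
One reading of the slide sequence is that the third value in each entry is the index at which the peer stores the local endpoint, and that it stays UNSPEC until the peer has inserted the local endpoint too. The simulation below follows that interpretation. It is an inference from the slide data, not the provider's code: the structures, the direct access to the peer's table (standing in for a lookup in the peer's shared region), and all `sketch_` names are assumptions.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_PEERS 8
#define SK_UNSPEC    (-1)

/* One table entry: the peer's name and our index in the peer's table. */
struct sketch_peer_entry {
    const char *name;
    int         peer_id; /* SK_UNSPEC until the peer has inserted us */
};

struct sketch_ep {
    const char *name;
    struct sketch_peer_entry peers[SK_MAX_PEERS];
    int count;
};

/* Return the index of name in ep's table, or SK_UNSPEC if absent. */
static int sketch_lookup(const struct sketch_ep *ep, const char *name)
{
    for (int i = 0; i < ep->count; i++)
        if (strcmp(ep->peers[i].name, name) == 0)
            return i;
    return SK_UNSPEC;
}

/* Insert peer into self's table. If the peer already lists self,
 * resolve both directions at once. Returns the new local index. */
static int sketch_av_insert(struct sketch_ep *self, struct sketch_ep *peer)
{
    assert(self->count < SK_MAX_PEERS);
    int idx = self->count++;
    int ours_in_peer = sketch_lookup(peer, self->name);

    self->peers[idx].name = peer->name;
    self->peers[idx].peer_id = ours_in_peer;
    if (ours_in_peer != SK_UNSPEC)
        peer->peers[ours_in_peer].peer_id = idx;
    return idx;
}
```

Replaying the inserts in the order the slides suggest (11111 first, then 44444, 33333, and 22222) reproduces the tables on slides 20 through 23 under this interpretation.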

  24. Summary
     • Shared memory support is available through primitives, utilities, and the shm provider
     • 3 types of messages: inline, inject, iov
     • Currently EP_DGRAM, but RMA and atomics are in progress to support EP_RDM
     • Focusing on the provider and on integrating it into MPI
