OFI Shared Memory OFIWG
Overview
• SHM Support Options
• SHM Primitives
  • Region / cmd / resp / addr / map
• SHM Utilities
  • Initialization / mapping
• SHM Provider
  • Requirements / status
  • Message protocols: inline / inject / iov
  • Address exchange protocol
www.openfabrics.org
SHM Support Options
• SHM support
  • SHM primitives are provided in utility code, without a protocol
  • A provider adapts the primitives for local shared-memory communication using its own protocol
• SHM provider
  • Native provider built on the SHM primitives
  • Assumes all communication is local
[Figure: layered stack. Provider / SHM Utilities / SHM Primitives]
SHM Primitives: smr_region
• smr_region fields: version, flags, pid, lock, map, total_size, cmd_queue_offset, resp_queue_offset, inject_pool_offset, peer_addr_offset
  • lock: synchronizes shared memory access
  • map: stores peer addresses and pointers to peer smr_regions
[Figure: smr_region layout. Labels: Control, cmd_queue, resp_queue, Shared access, inject_pool, peer_addr]
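The region stores offsets rather than pointers, presumably because each process maps the region at a different virtual address. A minimal sketch of that idea follows. The struct is a simplified stand-in, not the real smr_region: field types, the 64-byte alignment, and the helper names (sk_layout, sk_at) are assumptions, and the lock and map fields are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ALIGN 64  /* assumed alignment between areas */

/* Simplified stand-in for the region header; field names follow the slide,
 * types are assumed, and lock/map are left out. */
struct sk_region {
    uint32_t version;
    uint32_t flags;
    int32_t  pid;
    size_t   total_size;
    size_t   cmd_queue_offset;
    size_t   resp_queue_offset;
    size_t   inject_pool_offset;
    size_t   peer_addr_offset;
};

/* Round n up to the next multiple of SK_ALIGN. */
static size_t sk_align_up(size_t n)
{
    return (n + SK_ALIGN - 1) & ~(size_t)(SK_ALIGN - 1);
}

/* Place the four areas back to back after the header and record their offsets. */
static void sk_layout(struct sk_region *r, size_t cmd_bytes, size_t resp_bytes,
                      size_t inject_bytes, size_t addr_bytes)
{
    assert(r);
    size_t off = sk_align_up(sizeof(*r));
    r->cmd_queue_offset = off;
    off = sk_align_up(off + cmd_bytes);
    r->resp_queue_offset = off;
    off = sk_align_up(off + resp_bytes);
    r->inject_pool_offset = off;
    off = sk_align_up(off + inject_bytes);
    r->peer_addr_offset = off;
    r->total_size = sk_align_up(off + addr_bytes);
}

/* Turn an offset into a pointer that is valid in the calling process's mapping. */
static void *sk_at(struct sk_region *r, size_t offset)
{
    assert(r && offset < r->total_size);
    return (char *)r + offset;
}
```

A peer that maps the same region would call sk_at on its own mapping with the same offsets, so no pointer ever has to be shared between processes.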
SHM Primitives: smr_cmd
• smr_cmd
  • hdr { op { addr, op, op_src, size, data }, msg_id }
  • union data { msg, iov, rma_iov, rma_ioc }
• op: msg / tagged / rma / atomic
• op_src: smr_src_inline / smr_src_inject / smr_src_iov
• data (in hdr): used for smr_src_inject (inject offset) and smr_src_iov (response offset)
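The command is a header plus a union whose active member depends on op_src. A rough sketch of that shape is below. It is not the real smr_cmd definition: the field widths, SK_INLINE_MAX, SK_IOV_MAX, and the sk_ names are assumptions, the rma_iov and rma_ioc members are left out, and treating hdr.data as unused for inline commands is only inferred from the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec (POSIX) */

enum sk_op  { SK_OP_MSG, SK_OP_TAGGED, SK_OP_RMA, SK_OP_ATOMIC };
enum sk_src { SK_SRC_INLINE, SK_SRC_INJECT, SK_SRC_IOV };

#define SK_INLINE_MAX 128  /* inline size quoted later in the deck */
#define SK_IOV_MAX    4    /* assumed iov limit */

/* Hypothetical command layout modelled on the slide. */
struct sk_cmd {
    struct {
        uint64_t addr;
        uint32_t op;       /* enum sk_op */
        uint32_t op_src;   /* enum sk_src */
        uint64_t size;
        uint64_t data;     /* inject offset or response offset */
        uint64_t msg_id;
    } hdr;
    union {
        uint8_t      msg[SK_INLINE_MAX];  /* inline payload */
        struct iovec iov[SK_IOV_MAX];     /* sender buffers for the iov path */
    } data;
};

enum sk_data_use { SK_DATA_UNUSED, SK_DATA_INJECT_OFFSET, SK_DATA_RESP_OFFSET };

/* Report how the receiver should interpret hdr.data for this command. */
static enum sk_data_use sk_hdr_data_use(const struct sk_cmd *cmd)
{
    assert(cmd);
    switch (cmd->hdr.op_src) {
    case SK_SRC_INJECT: return SK_DATA_INJECT_OFFSET;
    case SK_SRC_IOV:    return SK_DATA_RESP_OFFSET;
    default:            return SK_DATA_UNUSED;
    }
}
```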
SHM Primitives: Other
• smr_resp { msg_id, status }
  • Used by the Rx side to signal completion for large CMA (smr_src_iov) operations
• smr_inject_buf { data[SMR_INJECT_SIZE] }
  • 4096-byte buffer for medium (smr_src_inject) message transfers
• smr_addr { name, addr }
  • Used to exchange remote endpoint information
• smr_map { smr_peers { smr_addr, smr_region } peers[SMR_MAX_PEERS] }
  • List of all peers and pointers to peer memory regions
SHM Utilities: init
• int smr_create(const struct fi_provider *prov, struct smr_map *map, const struct smr_attr *attr, struct smr_region **smr)
  • Creates an smr_region and initializes all of its components
• int smr_map_create(const struct fi_provider *prov, int peer_count, struct smr_map **map)
  • Creates an smr_map to store peer addresses and pointers to peer regions
SHM Utilities: mapping
• int smr_map_add(const struct fi_provider *prov, struct smr_map *map, const char *name, int id)
  • Adds a peer by name into the map with a specific id
  • Note: adding to the map does not ensure the peer region is mapped and accessible
• int smr_map_to_region(const struct fi_provider *prov, struct smr_peer *peer_buf)
  • Tries to map the peer region
  • Returns -FI_EAGAIN if the peer's region is not initialized yet
• void smr_map_to_endpoint(struct smr_region *region, int index)
  • Exchanges addressing information with a specific peer
  • Finds this region in the peer to see whether it has been mapped yet, and updates addresses
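Because mapping can return -FI_EAGAIN while the peer is still starting up, a caller has to retry. The sketch below shows one bounded retry loop. It does not call the real utility code: sk_map_to_region is a hypothetical stand-in for smr_map_to_region, struct sk_peer is invented for the example, and -EAGAIN is used on the assumption that FI_EAGAIN carries the same value as the errno constant.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical peer whose region becomes available after a few polls. */
struct sk_peer {
    int polls_until_ready;  /* failed attempts remaining before the region exists */
    int mapped;             /* set once mapping has succeeded */
};

/* Stand-in for smr_map_to_region(): -EAGAIN until the peer is ready, then 0. */
static int sk_map_to_region(struct sk_peer *peer)
{
    assert(peer);
    if (peer->polls_until_ready > 0) {
        peer->polls_until_ready--;
        return -EAGAIN;
    }
    peer->mapped = 1;
    return 0;
}

/* Try to map up to max_tries times; returns 0 on success, or the last error. */
static int sk_map_with_retry(struct sk_peer *peer, int max_tries)
{
    int ret = -EAGAIN;
    for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = sk_map_to_region(peer);
    return ret;
}
```

A provider would more likely retry from its progress function than spin in a loop; the bounded loop here only illustrates the return-code handling.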
SHM Provider
• Endpoint address is info->src_addr if provided, or a default endpoint name format if no src_addr is given; endpoint names must be exchanged out of band
  • EP address = pid:domain_idx:endpoint_idx
• Only messaging and tagged support
  • Currently only DGRAM
  • Inherently supports RDM
• RMA and atomic implementation in progress
  • Aim to get full support working for MPI
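The default name format pid:domain_idx:endpoint_idx can be illustrated with a small formatter and parser. This is only a sketch of the string format given above; the function names and the integer types are assumptions, not the provider's code.

```c
#include <assert.h>
#include <stdio.h>

/* Write "pid:domain_idx:endpoint_idx" into buf.
 * Returns 0 on success, -1 if the name did not fit. */
static int sk_ep_name(char *buf, size_t len, long pid, int domain_idx, int ep_idx)
{
    assert(buf && len > 0);
    int n = snprintf(buf, len, "%ld:%d:%d", pid, domain_idx, ep_idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Split a name back into its three parts. Returns 0 on success, -1 otherwise. */
static int sk_ep_parse(const char *name, long *pid, int *domain_idx, int *ep_idx)
{
    assert(name && pid && domain_idx && ep_idx);
    return sscanf(name, "%ld:%d:%d", pid, domain_idx, ep_idx) == 3 ? 0 : -1;
}
```

For pid 11111 with domain and endpoint index 0, this yields the name 11111:0:0 used in the address exchange slides.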
Small Message Example (smr_src_inline)
• Tx side writes an smr_cmd into the peer's cmd_queue
  • Only very small messages that can fit inline in the smr_cmd (128 bytes)
• Rx side decodes the header and processes the msg
  • Data is retrieved directly from the cmd
[Figure: Tx, CMD, Rx Command Queue]
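The inline path can be sketched as a copy into a command slot followed by a copy out of it. The code below is a single-process illustration under several assumptions: the queue is a plain ring of eight entries, 128 bytes is taken as the maximum inline payload, there is no locking, and all sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_INLINE_MAX 128  /* inline limit quoted on the slide */
#define SK_QUEUE_LEN  8    /* assumed queue depth */

struct sk_cmd {
    size_t size;
    unsigned char msg[SK_INLINE_MAX];
};

struct sk_cmd_queue {
    struct sk_cmd entries[SK_QUEUE_LEN];
    size_t head;  /* next entry the Rx side reads */
    size_t tail;  /* next entry the Tx side writes */
};

/* Tx side: copy the payload into the next free command.
 * Returns 0, or -1 if the payload is too large or the queue is full. */
static int sk_send_inline(struct sk_cmd_queue *q, const void *buf, size_t len)
{
    assert(q && buf);
    if (len > SK_INLINE_MAX || q->tail - q->head == SK_QUEUE_LEN)
        return -1;
    struct sk_cmd *cmd = &q->entries[q->tail % SK_QUEUE_LEN];
    cmd->size = len;
    memcpy(cmd->msg, buf, len);
    q->tail++;
    return 0;
}

/* Rx side: copy the data straight out of the command.
 * Returns the byte count, or -1 if the queue is empty or buf is too small. */
static long sk_recv_inline(struct sk_cmd_queue *q, void *buf, size_t cap)
{
    assert(q && buf);
    if (q->head == q->tail)
        return -1;
    struct sk_cmd *cmd = &q->entries[q->head % SK_QUEUE_LEN];
    if (cmd->size > cap)
        return -1;
    memcpy(buf, cmd->msg, cmd->size);
    q->head++;
    return (long)cmd->size;
}
```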
Medium Message Example (smr_src_inject)
• Tx side writes the data into an Rx side inject buffer
• Tx side writes the msg header to an Rx cmd
  • Header includes the inject buffer offset
• Rx side decodes the header and processes the msg
  • Data is retrieved from the Rx inject buffer
[Figure: Tx, CMD, Rx Command Queue, Inject Buffer]
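The two-step inject path (data first, then a header carrying the buffer offset) might look roughly like this. The sketch runs in one process and makes simplifying assumptions: a single command slot stands in for the command queue, the pool holds four 4096-byte buffers tracked by an in_use array, and every sk_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_INJECT_SIZE 4096  /* buffer size quoted in the deck */
#define SK_POOL_BUFS   4     /* assumed pool size */

/* Receiver-side region: one command slot plus an inject pool. */
struct sk_rx_region {
    int    cmd_valid;
    size_t cmd_size;
    size_t cmd_inject_offset;  /* offset of the data from the region start */
    unsigned char in_use[SK_POOL_BUFS];
    unsigned char inject_pool[SK_POOL_BUFS][SK_INJECT_SIZE];
};

/* Tx side: copy data into a free inject buffer, then publish the header.
 * Returns 0, or -1 if the message is too large or no slot/buffer is free. */
static int sk_send_inject(struct sk_rx_region *rx, const void *buf, size_t len)
{
    assert(rx && buf);
    if (len > SK_INJECT_SIZE || rx->cmd_valid)
        return -1;
    for (size_t i = 0; i < SK_POOL_BUFS; i++) {
        if (rx->in_use[i])
            continue;
        rx->in_use[i] = 1;
        memcpy(rx->inject_pool[i], buf, len);  /* step 1: data */
        rx->cmd_size = len;                    /* step 2: header */
        rx->cmd_inject_offset = offsetof(struct sk_rx_region, inject_pool)
                                + i * SK_INJECT_SIZE;
        rx->cmd_valid = 1;
        return 0;
    }
    return -1;
}

/* Rx side: follow the offset in the header, copy out, release the buffer.
 * Returns the byte count, or -1 if nothing is pending or buf is too small. */
static long sk_recv_inject(struct sk_rx_region *rx, void *buf, size_t cap)
{
    assert(rx && buf);
    if (!rx->cmd_valid || rx->cmd_size > cap)
        return -1;
    size_t off = rx->cmd_inject_offset;
    memcpy(buf, (unsigned char *)rx + off, rx->cmd_size);
    rx->in_use[(off - offsetof(struct sk_rx_region, inject_pool)) / SK_INJECT_SIZE] = 0;
    rx->cmd_valid = 0;
    return (long)rx->cmd_size;
}
```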
Large Message Example (smr_src_iov)
• Tx side writes the msg header to an Rx cmd
  • Header includes the smr_resp offset (for the ACK)
• Rx side decodes the header and processes the msg, reading from the Tx process using CMA
• Rx side writes an ACK msg back to the Tx side
[Figure: Tx, CMD, Rx Command Queue, Tx RESP, CMA, Buffer]
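Taken together, the three examples suggest a size-based choice of protocol on the Tx side. A minimal sketch of that choice follows. The thresholds are assumptions read off the slides (128 bytes treated as the largest inline payload, 4096 bytes as SMR_INJECT_SIZE), and the enum and function names are hypothetical. On Linux, CMA (cross-memory attach) is exposed through process_vm_readv and process_vm_writev; the sketch does not call them.

```c
#include <assert.h>
#include <stddef.h>

#define SK_INLINE_MAX  128   /* assumed largest inline payload */
#define SK_INJECT_SIZE 4096  /* inject buffer size quoted in the deck */

enum sk_src { SK_SRC_INLINE, SK_SRC_INJECT, SK_SRC_IOV };

/* Pick the transfer protocol for a message of len bytes. */
static enum sk_src sk_pick_src(size_t len)
{
    if (len <= SK_INLINE_MAX)
        return SK_SRC_INLINE;   /* payload travels inside the command */
    if (len <= SK_INJECT_SIZE)
        return SK_SRC_INJECT;   /* payload is copied into an Rx inject buffer */
    return SK_SRC_IOV;          /* Rx reads the Tx buffer via CMA, then ACKs */
}

/* Only the iov path needs a response entry for the ACK. */
static int sk_needs_ack(enum sk_src src)
{
    assert(src == SK_SRC_INLINE || src == SK_SRC_INJECT || src == SK_SRC_IOV);
    return src == SK_SRC_IOV;
}
```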
Completion Handling
• For small to medium sized messages
  • Tx completes immediately after send
• For large messages
  • Delivery-complete semantics
  • Tx does not complete until it has processed an ACK from the Rx side
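The two completion rules can be sketched as a small piece of Tx-side bookkeeping. The structures below are illustrative only: sk_resp mirrors the msg_id and status fields of smr_resp, while sk_tx_entry and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the two fields the deck lists for smr_resp. */
struct sk_resp {
    uint64_t msg_id;
    int      status;
};

/* Hypothetical Tx-side record of one outstanding send. */
struct sk_tx_entry {
    uint64_t msg_id;
    int      needs_ack;  /* 1 for the large (iov) path */
    int      completed;
    int      status;
};

/* Record a posted send: small and medium messages complete right away,
 * large messages stay pending until the ACK arrives. */
static void sk_tx_posted(struct sk_tx_entry *e, uint64_t msg_id, int is_large)
{
    assert(e);
    e->msg_id = msg_id;
    e->needs_ack = is_large;
    e->completed = !is_large;
    e->status = 0;
}

/* Process an ACK written by the Rx side.
 * Returns 1 if it completed this entry, 0 if it did not apply. */
static int sk_tx_process_ack(struct sk_tx_entry *e, const struct sk_resp *resp)
{
    assert(e && resp);
    if (e->completed || !e->needs_ack || resp->msg_id != e->msg_id)
        return 0;
    e->status = resp->status;
    e->completed = 1;
    return 1;
}
```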
Portability
• SHM is disabled on non-Linux platforms (no support for CMA)
• SHM can be extended later to avoid using CMA
  • Add bounce buffering
  • Make SMR_INJECT_SIZE an environment variable
  • Max message size = SMR_INJECT_SIZE
Address / Name Exchange
[Figure: five animation frames showing four endpoints named 11111:0:0, 22222:0:0, 33333:0:0 and 44444:0:0. The endpoints call fi_av_insert one after another. Each AV row reads: AV index, peer, this endpoint's index in the peer's AV (UNSPEC until the peer has inserted).]
• Frame 1: no endpoint has called fi_av_insert yet
• Frame 2: 11111 calls fi_av_insert
  • 11111: (0, 22222, UNSPEC) (1, 33333, UNSPEC) (2, 44444, UNSPEC)
• Frame 3: 44444 calls fi_av_insert
  • 11111: (0, 22222, UNSPEC) (1, 33333, UNSPEC) (2, 44444, 0)
  • 44444: (0, 11111, 2) (1, 22222, UNSPEC) (2, 33333, UNSPEC)
• Frame 4: 33333 calls fi_av_insert
  • 11111: (0, 22222, UNSPEC) (1, 33333, 0) (2, 44444, 0)
  • 33333: (0, 11111, 1) (1, 22222, UNSPEC) (2, 44444, 2)
  • 44444: (0, 11111, 2) (1, 22222, UNSPEC) (2, 33333, 2)
• Frame 5: 22222 calls fi_av_insert
  • 11111: (0, 22222, 0) (1, 33333, 0) (2, 44444, 0)
  • 22222: (0, 11111, 0) (1, 33333, 1) (2, 44444, 1)
  • 33333: (0, 11111, 1) (1, 22222, 1) (2, 44444, 2)
  • 44444: (0, 11111, 2) (1, 22222, 2) (2, 33333, 2)
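The exchange in the figure can be simulated in a few lines. The sketch below is one reading of the frames, not the provider's implementation: each AV entry keeps the peer's pid and this endpoint's index in the peer's AV, an insert fills in both sides when the peer has already inserted, and otherwise leaves UNSPEC. All sk_ names, the UNSPEC value of -1, and the use of pids as names are assumptions.

```c
#include <assert.h>

#define SK_PEERS  3
#define SK_UNSPEC (-1)

struct sk_av_entry {
    long peer_pid;
    int  peer_id;  /* our index in the peer's AV, or SK_UNSPEC */
};

struct sk_ep {
    long pid;
    int  inserted;  /* set once this endpoint has called sk_av_insert */
    struct sk_av_entry av[SK_PEERS];
};

/* Index of pid in ep's AV, or SK_UNSPEC if ep has not inserted it yet. */
static int sk_av_find(const struct sk_ep *ep, long pid)
{
    if (!ep->inserted)
        return SK_UNSPEC;
    for (int i = 0; i < SK_PEERS; i++)
        if (ep->av[i].peer_pid == pid)
            return i;
    return SK_UNSPEC;
}

/* Insert the peers into self's AV and update both sides where possible. */
static void sk_av_insert(struct sk_ep *self, struct sk_ep *peers[SK_PEERS])
{
    assert(self && peers);
    for (int i = 0; i < SK_PEERS; i++) {
        int ours = sk_av_find(peers[i], self->pid);
        self->av[i].peer_pid = peers[i]->pid;
        self->av[i].peer_id = ours;
        if (ours != SK_UNSPEC)
            peers[i]->av[ours].peer_id = i;  /* tell the peer our index for it */
    }
    self->inserted = 1;
}

/* Replay the figure: 11111, 44444, 33333, then 22222 call insert. */
static void sk_run_figure(struct sk_ep ep[4])
{
    static const long pids[4] = { 11111, 22222, 33333, 44444 };
    for (int i = 0; i < 4; i++) {
        ep[i].pid = pids[i];
        ep[i].inserted = 0;
    }
    struct sk_ep *p0[SK_PEERS] = { &ep[1], &ep[2], &ep[3] };
    struct sk_ep *p3[SK_PEERS] = { &ep[0], &ep[1], &ep[2] };
    struct sk_ep *p2[SK_PEERS] = { &ep[0], &ep[1], &ep[3] };
    struct sk_ep *p1[SK_PEERS] = { &ep[0], &ep[2], &ep[3] };
    sk_av_insert(&ep[0], p0);
    sk_av_insert(&ep[3], p3);
    sk_av_insert(&ep[2], p2);
    sk_av_insert(&ep[1], p1);
}
```

After sk_run_figure, every entry has a resolved peer index, which corresponds to the last frame of the figure.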
Summary
• Shared memory support is available through primitives, utilities, and the shm provider
• Three types of messages: inline, inject, iov
• Currently EP_DGRAM; RMA and atomics are in progress to support EP_RDM
• Focusing on the provider and on integrating it into MPI