Implementing SRC in MPI
Ishai Rabinovitz
19/7/07
SRC with two (and three) machines
[Figure: diagrams of the two-machine and three-machine configurations. Labels: Core 1, Core 2, send QP, rcv QP, SRQ, SHM, SRC domain.]
Notations
• (m, c) indicates core c in machine m.
• SendQP((m1, c1), m2) is the QP that sends from core (m1, c1) to any core in machine m2.
• RecvQP(m1, (m2, c2)) is the RecvQP in m1 that should get messages from core (m2, c2).
• SRQ(m1, c1) is the SRQ in core (m1, c1).
Data structures
• Each core (m, c) has:
  • Its own SRQ, SRQ(m, c).
  • A SendQPs table that maps each machine to the SendQP used to send to that machine.
    • Each entry has the form m’ -> SendQP((m, c), m’).
  • A Ranks table that maps each rank to its (m’, c’) and to the SRQ id (and SendQP) that should be used when sending messages to that rank.
    • Each entry has the form r’ -> ((m’, c’), SRQ(m’, c’), SendQP((m, c), m’)).
• The shared memory has:
  • An RC (reference count) that counts the number of cores working with the shared entities. (We may use the SRC domain RC for this purpose.)
  • A RecvQP for each remote rank.
  • A RecvQPs table that maps remote cores/ranks to the RecvQP number that those cores should use to send messages to the current machine.
    • Each entry has the form r’ -> RecvQP(m, (m’, c’)).
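The tables above can be sketched as plain Python dataclasses. This is only an illustration: the class and field names (`RankEntry`, `CoreState`, `SharedState`) and the use of integers for QP and SRQ ids are assumptions, and the slides do not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Core = Tuple[int, int]  # (m, c): core c in machine m


@dataclass
class RankEntry:
    """One row of the per-core Ranks table: r' -> ((m', c'), SRQ, SendQP)."""
    core: Core
    srq_id: Optional[int] = None   # SRQ(m', c'); unknown until connected
    send_qp: Optional[int] = None  # SendQP((m, c), m'); unknown until connected


@dataclass
class CoreState:
    """Private state of one core (m, c)."""
    core: Core
    srq_id: int                                                # SRQ(m, c)
    send_qps: Dict[int, int] = field(default_factory=dict)     # m' -> SendQP((m, c), m')
    ranks: Dict[int, RankEntry] = field(default_factory=dict)  # r' -> RankEntry


@dataclass
class SharedState:
    """State shared by all cores of one machine."""
    ref_count: int = 0                                         # cores using the shared entities
    recv_qps: Dict[int, int] = field(default_factory=dict)     # r' -> RecvQP(m, (m', c'))


# Example: core (0, 1) knows where rank 5 lives but is not connected to it yet.
shared = SharedState()
me = CoreState(core=(0, 1), srq_id=17)
me.ranks[5] = RankEntry(core=(1, 0))
```

Leaving `srq_id` and `send_qp` empty until a connection exists matches the later slides, where a missing entry is what triggers a connection request.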
On initialization
• Each core runs the following protocol:
  • Lock(file)
  • Try to create the SRC domain.
    • If this succeeds, you are the owner of the domain.
    • If it fails, the SRC domain already exists; connect to it.
    • In either case, increase the RC.
  • Unlock(file)
  • Create the SRQ, connect it to the SRC domain and save its number.
  • Fill the Ranks table with the (m, c) for all ranks.
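The lock / create-or-connect / increase-RC sequence can be sketched as follows. This is a simulation under stated assumptions: the SRC domain is stood in for by a small JSON file, the function names `join_domain` and `ref_count` are hypothetical, and `fcntl.flock` (Unix only) plays the role of Lock(file). No InfiniBand calls are made.

```python
import fcntl
import json
import os
import tempfile


def join_domain(lock_path: str, domain_path: str) -> bool:
    """Create or attach to the simulated SRC domain; return True if we created it."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)       # Lock(file)
        try:
            try:
                # Try to create the domain; O_EXCL fails if it already exists.
                fd = os.open(domain_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                state, owner = {"ref_count": 0}, True
            except FileExistsError:
                # The domain already exists: connect to it.
                with open(domain_path) as f:
                    state = json.load(f)
                owner = False
            state["ref_count"] += 1                 # in either case, increase the RC
            with open(domain_path, "w") as f:
                json.dump(state, f)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)   # Unlock(file)
    return owner


def ref_count(domain_path: str) -> int:
    """Read the current reference count of the simulated domain."""
    with open(domain_path) as f:
        return json.load(f)["ref_count"]


# Two cores of the same machine initialize one after the other.
workdir = tempfile.mkdtemp()
lock_path = os.path.join(workdir, "src.lock")
domain_path = os.path.join(workdir, "src_domain.json")
first = join_domain(lock_path, domain_path)
second = join_domain(lock_path, domain_path)
```

The first caller becomes the owner and the second only attaches; both increments happen while the lock is held, so the count cannot be lost to a race.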
Connection (main idea)
• When core1 wants to connect to core2 on another machine, it sends a connection request.
• This connection request carries only the information core2 needs to create a connection for sending messages to core1.
• After core2 establishes a connection to core1, it sends a reconnection request to core1 with the data that allows core1 to establish a connection on which it can send messages to core2.
• This reconnection message can be sent on the first connection that was established.
Our SRQ(m1, ca) id. (We will create the SRQ on initialization) • The SendQP number
Creating a connection
• Core (m1, c1) that wants to connect to another rank r2 (in core (m2, c2)) does the following:
  • Checks whether there is an entry for r2 in the shared RecvQPs table (looking for RecvQP(m1, (m2, c2))).
  • If not:
    • Sets a lock on the shared RecvQPs table (we can use lock(file)).
    • Rechecks that there is no entry for r2 in the RecvQPs table.
    • Creates this RecvQP and saves its info in the RecvQPs table.
    • Unlocks.
  • Sends (using Ethernet or UD) a connection request with the following information to the other rank:
    • Its rank
    • Its details (m1, c1)
    • Its SRQ id
    • The RecvQP(m1, (m2, c2)) number
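The check / lock / recheck / create sequence is a double-checked locking pattern. A minimal sketch, assuming threads in one process stand in for the cores of a machine, a `threading.Lock` stands in for lock(file), and QP numbers come from a hypothetical counter rather than from the HCA:

```python
import itertools
import threading

recv_qps = {}                        # shared RecvQPs table: remote rank -> RecvQP number
table_lock = threading.Lock()        # stands in for lock(file)
_qp_numbers = itertools.count(100)   # hypothetical QP number allocator


def get_or_create_recv_qp(remote_rank: int) -> int:
    """Return the RecvQP for remote_rank, creating it at most once."""
    qp = recv_qps.get(remote_rank)              # first check, without the lock
    if qp is None:
        with table_lock:
            qp = recv_qps.get(remote_rank)      # recheck under the lock
            if qp is None:
                qp = next(_qp_numbers)          # "create" the RecvQP
                recv_qps[remote_rank] = qp      # save its info in the table
    return qp


def build_connection_request(my_rank, my_core, my_srq_id, remote_rank):
    """Assemble the four fields listed on the slide."""
    return {
        "type": "connect",
        "rank": my_rank,
        "core": my_core,
        "srq_id": my_srq_id,
        "recv_qp": get_or_create_recv_qp(remote_rank),
    }


# Eight "cores" race to connect to the same remote rank 7.
results = []
threads = [
    threading.Thread(target=lambda: results.append(get_or_create_recv_qp(7)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

request = build_connection_request(my_rank=0, my_core=(1, 1), my_srq_id=17, remote_rank=7)
```

The recheck under the lock is what guarantees that two cores racing on the same remote rank end up sharing one RecvQP instead of creating two.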
Handling connection and reconnection requests
• When a rank (m1, c1) gets a connection request from (m2, c2), it runs the following protocol:
  • Checks in the SendQPs table whether there is already a SendQP to m2.
    • If there is such a SendQP, increases its reference count.
    • If there is no such SendQP:
      • Creates this SendQP.
      • Connects it to the RecvQP it got in the connection message.
      • Updates the SendQPs table with this SendQP.
  • Updates the SRQs table with the SRQ it got in the connection request and with the SendQP from the previous step.
  • If it is a connection request (and not a reconnection request), runs the same protocol as in the previous slide and sends a reconnection request back to (m2, c2). This reconnection request can be sent over the IB connection that was just established.
  • If a message to this rank is waiting in the wait queue, sends it.
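The handler above, together with the reconnection reply, can be simulated in memory. In this sketch the class name `Core`, the dictionary layout of a request, the `outbox` list and the counter that hands out QP numbers are all assumptions made for illustration; a real implementation would create QPs through the verbs API and take RecvQP numbers from the shared RecvQPs table.

```python
import itertools
from collections import defaultdict, deque

_qp_numbers = itertools.count(1)   # hypothetical QP number allocator


class Core:
    def __init__(self, rank, machine, core, srq_id):
        self.rank, self.machine, self.core, self.srq_id = rank, machine, core, srq_id
        self.send_qps = {}                 # m' -> {"qp", "peer" (remote RecvQP), "refs"}
        self.srqs = {}                     # r' -> (SRQ id, SendQP number)
        self.waiting = defaultdict(deque)  # r' -> messages queued until connected
        self.sent = []                     # (SendQP, SRQ id, payload), for inspection
        self.outbox = []                   # control messages: (destination rank, request)

    def make_request(self, kind, recv_qp):
        return {"type": kind, "rank": self.rank, "core": (self.machine, self.core),
                "srq_id": self.srq_id, "recv_qp": recv_qp}

    def handle_request(self, req):
        m2 = req["core"][0]
        entry = self.send_qps.get(m2)
        if entry is not None:
            entry["refs"] += 1             # a SendQP to m2 exists: count one more user
        else:
            # Create the SendQP and connect it to the RecvQP named in the request.
            entry = {"qp": next(_qp_numbers), "peer": req["recv_qp"], "refs": 1}
            self.send_qps[m2] = entry
        self.srqs[req["rank"]] = (req["srq_id"], entry["qp"])
        if req["type"] == "connect":
            # Stand-in for the shared RecvQPs table lookup of the previous slide.
            recv_qp = next(_qp_numbers)
            self.outbox.append((req["rank"], self.make_request("reconnect", recv_qp)))
        queue = self.waiting[req["rank"]]
        while queue:                       # flush messages that waited for this rank
            self.sent.append((entry["qp"], req["srq_id"], queue.popleft()))


# Rank 0 on machine 1 connects to rank 1 on machine 2 while a message is waiting.
a = Core(rank=0, machine=1, core=1, srq_id=11)
b = Core(rank=1, machine=2, core=1, srq_id=21)
a.waiting[1].append("hello")
shared_recv_qp = next(_qp_numbers)         # RecvQP(1, (2, 1)), shared by machine 1's cores
b.handle_request(a.make_request("connect", shared_recv_qp))
dest, reply = b.outbox.pop(0)
a.handle_request(reply)

# A second core of machine 1 connects to the same rank: b reuses its SendQP to machine 1.
c = Core(rank=2, machine=1, core=2, srq_id=12)
b.handle_request(c.make_request("connect", shared_recv_qp))
```

Only a `connect` request triggers a reply, so the exchange ends after the reconnection request and cannot loop.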
Message from (m1, c1) to (m2, c2)
• To send a message from (m1, c1) to (m2, c2), do the following:
  • Look in the SRQs table for the rank.
  • If it does not exist in the table:
    • Translate the rank to the (m2, c2) tuple using the Ranks table filled in at initialization.
    • Create a connection to (m2, c2).
    • Move the message to the wait queue.
  • If it exists in the table:
    • Take sendQP((m1, c1), m2) and SRQ(m2, c2) from the table.
    • Send the message from sendQP((m1, c1), m2), using the SRQ(m2, c2) id in the message.
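The send path can be sketched with a few in-memory tables. The names `send`, `on_connected`, `connect_requests` and `wire` are hypothetical, and one detail is an assumption the slide does not state: the sketch requests a connection only for the first queued message to a rank, so that repeated sends before the connection completes do not trigger duplicate requests.

```python
from collections import defaultdict, deque

ranks = {3: (2, 1)}               # rank -> (m2, c2), filled at initialization
srqs = {}                         # rank -> (SRQ(m2, c2) id, SendQP((m1, c1), m2))
wait_queue = defaultdict(deque)   # rank -> messages waiting for a connection
connect_requests = []             # (m2, c2) targets we asked to connect to
wire = []                         # (SendQP, SRQ id, payload) actually posted


def send(rank, payload):
    """Post the message if connected; otherwise queue it. Returns True if posted."""
    entry = srqs.get(rank)
    if entry is None:
        if not wait_queue[rank]:
            # Assumption: request the connection once, for the first queued message.
            connect_requests.append(ranks[rank])
        wait_queue[rank].append(payload)
        return False
    srq_id, send_qp = entry
    wire.append((send_qp, srq_id, payload))   # the SRQ id travels with the message
    return True


def on_connected(rank, srq_id, send_qp):
    """Record the new connection and flush the messages that waited for it."""
    srqs[rank] = (srq_id, send_qp)
    while wait_queue[rank]:
        send(rank, wait_queue[rank].popleft())


first = send(3, "a")              # not connected yet: queued
second = send(3, "b")             # still queued, no second connection request
on_connected(3, srq_id=21, send_qp=4)
third = send(3, "c")              # connected: posted directly
```

Flushing the queue in FIFO order keeps the messages to a rank in the order in which they were submitted.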
Cleaning
• Each rank that finishes its execution should:
  • Close all sendQPs of this rank.
  • Free the SRQ of this rank.
  • Put back the SRC domain.
  • Decrease the RC.
  • If the RC reaches 0 (or the SRC domain is free), close all shared RecvQPs and free the shared tables.
• Maybe we should release a RecvQP once we get a disconnect in the CM, rather than after all cores have finished.
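A minimal sketch of the reference-counted teardown, assuming dictionaries stand in for the per-core and shared state and a `threading.Lock` protects the shared count (the slide does not say which lock is used at cleanup). The function name `finish` and the `tables_freed` flag are hypothetical.

```python
import threading

shared = {"ref_count": 2, "recv_qps": {7: 100, 8: 101}, "tables_freed": False}
shared_lock = threading.Lock()


def finish(core, shared, lock):
    """Tear down one rank; the last rank also releases the shared entities."""
    core["send_qps"].clear()           # close all SendQPs of this rank
    core["srq_id"] = None              # free the SRQ of this rank
    with lock:
        shared["ref_count"] -= 1       # put back the SRC domain / decrease the RC
        if shared["ref_count"] == 0:
            shared["recv_qps"].clear()     # close all shared RecvQPs
            shared["tables_freed"] = True  # free the shared tables
        return shared["ref_count"]


core_a = {"send_qps": {2: 4}, "srq_id": 11}
core_b = {"send_qps": {2: 6}, "srq_id": 12}

after_a = finish(core_a, shared, shared_lock)
recv_qps_after_a = dict(shared["recv_qps"])   # still in use by core_b
after_b = finish(core_b, shared, shared_lock)
```

Decrementing and testing the count under one lock ensures that exactly one rank, the last to finish, releases the shared RecvQPs.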