MPI Introduction to MPI Commands
Basics – Send and Receive • MPI is a message passing environment. The processors do NOT share information via shared memory; instead, they send messages to each other • This is done via a send-receive pairing. The originating processor can send whenever it wants to, but the message is delivered only when the destination processor executes a matching receive
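Send Function - Form • int MPI_Send(buf, count, datatype, dest, tag, MPI_COMM_WORLD) • Full C prototype: int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) • buf – the address of the data to be sent • count – how many elements to send • datatype – the MPI type of the elements being sent • dest – the rank of the process to send it to • tag – message type • MPI_COMM_WORLD – the communicator (comm) argument – info about the parallel system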
Send Arguments Discussion • buf – the address of the information to send – can be any data type. • datatype – must be a data type defined in MPI (ex. MPI_INT, MPI_FLOAT, MPI_DOUBLE, etc.). The user can create data types and “register” them with MPI (later). • Count – how many values of type datatype are to be sent starting from the address buf (not the byte size of buf)
Send Args Discussion (cont.) • Destination – which process to send the message to. Type – int • Tag – indicator about what kind of message is being sent. Programmer determined. Allows a process to send a variety of types of messages. Type - int • MPI_COMM_WORLD – communicator – information about the parallel system configuration to map destination (int) to a particular processor. There will be ways to change and/or create new communicators (later), for example to partition the system into groups of processors doing independent work.
More Discussion and Notes • It is more efficient to send a few big blocks of data than many small blocks, because every message carries a fixed sending overhead. • MPI uses its own data types so that communication between heterogeneous machines is possible. • The datatype argument must match the C type of the data being sent (e.g., int data with MPI_INT, double data with MPI_DOUBLE) • MPI has MANY named constants for certain values (for example, MPI_INT might be 3 in one implementation; the actual values are implementation-specific, so always use the names). Get to know these constants.
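Discussion and notes (cont.) • This send is a blocking send. The next instructions in the program will NOT be executed until the send is done, meaning buf can safely be reused. Depending on the implementation and the message size, MPI may copy the data into a system buffer and return right away, or wait until the matching receive has started; either way, returning from MPI_Send does NOT guarantee that the data has been received.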
Receive • MPI_Recv(buf, count, datatype, source, tag, MPI_COMM_WORLD, status) • buf – where to put the message • count – the maximum number of items buf can hold (a longer incoming message is an error) • datatype – an MPI type for the count items in buf • source – accept the message from this process (can be a wildcard for any process) • tag – which type of message to accept (can be a wildcard for any type) • status – pointer to an MPI_Status that receives the actual source and tag, useful if the tag and/or source args were wildcards; pass MPI_STATUS_IGNORE if it is not needed.
Minimal MPI • A minimal MPI program uses the following 6 calls: • MPI_Init(&argc, &argv) – initialize MPI – set up the MPI_COMM_WORLD communicator • MPI_Comm_size(MPI_COMM_WORLD, &p) – number of processes into p • MPI_Comm_rank(MPI_COMM_WORLD, &rank) – which process am I? • MPI_Send • MPI_Recv • MPI_Finalize() – terminate MPI
MPI Philosophy • One program for all processes • Starts with init • Get my process number (rank) • Process 0 is usually the “Master” node (One process to bind them all – apologies to J.R.R. Tolkien.) • Big if/else statement on the rank to do master stuff versus slave stuff • The master could also do some slave stuff • Load balancing issues
C MPI at WU on Herot • #include "mpi.h" • int main(int argc, char *argv[]) • MPI_Init(&argc, &argv) • The number of processes in MPI_COMM_WORLD is typically set at launch with -np # • mpicc – to compile MPI programs • mpirun -np # executable
Bcast • MPI_Bcast(buf, count, datatype, root, MPI_COMM_WORLD) • EVERY PROCESS executes this function. It is BOTH a send and receive. • Root is the “sender”, all other processes are receivers.
Reduce • MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, MPI_COMM_WORLD) • Executed by ALL processes (somewhat of a send and receive). • EVERYONE contributes its sendbuf; op is applied element by element across all the contributions, and the result appears only in recvbuf of process root. • op is specified by one of many constants (ex. MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN)
Timing MPI Programs • double MPI_Wtime() • Wall-clock time in seconds since some arbitrary point in time • Call it twice, once at the beginning and once at the end of the code to time • The difference is the elapsed time • double MPI_Wtick() • Resolution, in seconds, of the MPI_Wtime clock
Receive revisited • Recall • MPI_Recv(buf, count, datatype, source, tag, MPI_COMM_WORLD, status) • Source and/or tag can be a wildcard (MPI_ANY_SOURCE, MPI_ANY_TAG) • status has type MPI_Status; its fields tell who actually sent the message: • status.MPI_SOURCE • status.MPI_TAG • status.MPI_ERROR
Send/Receive Issues – Deadlock • One necessary condition for deadlock is mutual (cyclic) waiting • Process 0 does a send to p1 and then a receive from p1 • Process 1 does a send to p0 and then a receive from p0 • If there are no buffers (or they are too small), p0’s send waits until p1 posts its receive, but p1 is itself stuck in its send waiting for p0’s receive, so neither process ever reaches its receive
More Deadlock • If instead p0 sends to p1 then receives from p1, while p1 receives from p0 then sends to p0, there is no deadlock. • Ring solution • If we have a ring network and want each processor to send its value to the “next” processor, having everyone do a send then a receive could cause deadlock • Instead, have even processors do send then receive, and odd processors do receive then send
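Sendrecv • MPI_Sendrecv(sendBuf, sendCount, sendType, dest, sendTag, recBuf, recCount, recType, source, recTag, comm, status) • No need to worry about send/receive order; MPI schedules the two halves itself, so a data shift in which every process calls MPI_Sendrecv does not deadlock • Good when every node gets someone else’s data (data shift) • If the same buffer holds both the outgoing and the incoming data (same count and type), can use • MPI_Sendrecv_replace(buf, count, type, dest, sTag, source, rTag, comm, status)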
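Non Blocking • MPI_Isend(buf, count, type, dest, tag, comm, request) • MPI_Irecv(buf, count, type, source, tag, comm, request) • Both return immediately; buf must not be modified (Isend) or read (Irecv) until the operation is done • int MPI_Test(request, flag, status) • Sets flag to true (nonzero) if the operation associated with request is done, false (0) if not • status is filled in if flag is true • MPI_Wait(request, status) • Blocks until the operation associated with request is done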