Lecture 6: Message Passing Interface (MPI)
Parallel Programming Models

Message Passing Model
• Used on distributed-memory MIMD architectures
• Multiple processes execute in parallel, asynchronously
• Process creation may be static or dynamic
• Processes communicate by using send and receive primitives
Parallel Programming Models

Example: Pi calculation

π = ∫₀¹ f(x) dx = ∫₀¹ 4/(1+x²) dx ≈ w ∑ f(xᵢ)

where f(x) = 4/(1+x²), n = 10, w = 1/n, xᵢ = w(i − 0.5)

[Figure: plot of f(x) approximated by n rectangles of width w centered at the points xᵢ]
Parallel Programming Models

Sequential Code

    #include <stdio.h>
    #define f(x) (4.0/(1.0+(x)*(x)))

    int main(void){
      int n, i;
      float w, x, sum, pi;
      printf("n?\n");
      scanf("%d", &n);
      w = 1.0/n;
      sum = 0.0;
      for (i = 1; i <= n; i++){
        x = w*(i-0.5);
        sum += f(x);
      }
      pi = w*sum;
      printf("%f\n", pi);
      return 0;
    }

π ≈ w ∑ f(xᵢ), with f(x) = 4/(1+x²), n = 10, w = 1/n, xᵢ = w(i − 0.5)
Message-Passing Interface (MPI) http://www.mpi-forum.org
SPMD Parallel MPI Code

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #define f(x) (4.0/(1.0+(x)*(x)))

    int main(int argc, char *argv[]){
      int myid, nproc, root, err;
      int n, i, start, end;
      float w, x, sum, pi;
      FILE *f1;

      err = MPI_Init(&argc, &argv);
      if (err != MPI_SUCCESS) {
        fprintf(stderr, "initialization error\n");
        exit(1);
      }
      MPI_Comm_size(MPI_COMM_WORLD, &nproc);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      root = 0;
      if (myid == root) {
        f1 = fopen("indata", "r");
        fscanf(f1, "%d", &n);
        fclose(f1);
      }
      MPI_Bcast(&n, 1, MPI_INT, root, MPI_COMM_WORLD);
      w = 1.0/n;
      sum = 0.0;
      start = myid*(n/nproc) + 1;      /* each process handles n/nproc consecutive iterations */
      end   = (myid+1)*(n/nproc);
      for (i = start; i <= end; i++){
        x = w*(i-0.5);
        sum += f(x);
      }
      sum = w*sum;                     /* scale the partial sum by w, as in the sequential pi = w*sum */
      MPI_Reduce(&sum, &pi, 1, MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD);
      if (myid == root) {
        f1 = fopen("outdata", "w");
        fprintf(f1, "pi=%f\n", pi);
        fclose(f1);
      }
      MPI_Finalize();
      return 0;
    }
Message-Passing Interface (MPI)

• MPI_INIT (int *argc, char ***argv): Initiate an MPI computation.
• MPI_FINALIZE (): Terminate a computation.
• MPI_COMM_SIZE (comm, size): Determine number of processes.
• MPI_COMM_RANK (comm, pid): Determine my process identifier.
• MPI_SEND (buf, count, datatype, dest, tag, comm): Send a message.
• MPI_RECV (buf, count, datatype, source, tag, comm, status): Receive a message.
  • tag: message tag, or MPI_ANY_TAG
  • source: process id of source process, or MPI_ANY_SOURCE
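As a minimal sketch of these primitives (not from the slides; the buffer name, message length, and tag are illustrative), rank 0 sends ten integers to rank 1, which receives them with a matching source and tag:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]){
      int rank, i, A[10];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {                                   /* assumes at least 2 processes */
        for (i = 0; i < 10; i++) A[i] = i;               /* fill the send buffer */
        MPI_Send(A, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* dest = 1, tag = 0 */
      } else if (rank == 1) {
        MPI_Recv(A, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* source = 0, tag = 0 */
        printf("rank 1 received A[9]=%d from rank %d\n", A[9], status.MPI_SOURCE);
      }
      MPI_Finalize();
      return 0;
    }

On the receiving side, status.MPI_SOURCE and status.MPI_TAG can be inspected, as listed above.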
Message-Passing Interface (MPI)

Deadlock
• MPI_SEND and MPI_RECV are blocking. Consider the program where the two processes exchange data:

      ...
      if (rank .eq. 0) then
        call mpi_send( abuf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr )
        call mpi_recv( buf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, status, ierr )
      else if (rank .eq. 1) then
        call mpi_send( abuf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr )
        call mpi_recv( buf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr )
      endif

• Both processes call mpi_send first; if the messages cannot be buffered, each send blocks waiting for a matching receive that the other process never reaches, so the program deadlocks.
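One deadlock-free ordering, shown here as a minimal C sketch (buffer names and sizes are illustrative, not from the slides): reversing the call order on one rank guarantees that a receive is already posted when the other rank's blocking send must complete.

    #include <stdio.h>
    #include <mpi.h>

    #define N 4

    int main(int argc, char *argv[]){
      int rank, abuf[N] = {1, 2, 3, 4}, buf[N];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                 /* rank 0: send first, then receive */
        MPI_Send(abuf, N, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf,  N, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
      } else if (rank == 1) {          /* rank 1: receive first, then send */
        MPI_Recv(buf,  N, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send(abuf, N, MPI_INT, 0, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
    }

MPI_Sendrecv, which performs a send and a receive in one call, is another standard way to express this exchange without risking deadlock.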
Message-Passing Interface (MPI)

Communicators
• If two processes use different contexts for communication, there can be no danger of their communication being confused.
• Each MPI communicator contains a separate communication context; this defines a separate virtual communication space.
• Communicator handle: identifies the process group and context with respect to which the operation is to be performed.
• MPI_COMM_WORLD: contains all the processes in a parallel computation.
Message-Passing Interface (MPI)

Collective Operations
These operations are all executed in a collective fashion, meaning that each process in a process group calls the communication routine.
• Barrier: Synchronize all processes.
• Broadcast: Send data from one process to all processes.
• Gather: Gather data from all processes to one process.
• Scatter: Scatter data from one process to all processes.
• Reduction operations: addition, multiplication, etc. of distributed data.
Message-Passing Interface (MPI)

Collective Operations
• MPI_BARRIER (comm): Synchronize all processes.
Message-Passing Interface (MPI)

Collective Operations
• MPI_BCAST (inbuf, incnt, intype, root, comm): 1-to-all
Ex: MPI_BCAST(A, 5, MPI_INT, 0, MPI_COMM_WORLD);
[Figure: the root P0 holds A0 A1 A2 A3 A4; after the broadcast, P0, P1, P2, and P3 all hold A0 A1 A2 A3 A4]
Message-Passing Interface (MPI)

Collective Operations
• MPI_SCATTER (inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm): 1-to-all
Ex: int A[100], B[25];
    MPI_SCATTER(A, 25, MPI_INT, B, 25, MPI_INT, 0, MPI_COMM_WORLD);
[Figure: the root P0's array A is split into blocks A0 A1 A2 A3 of 25 elements each; block Ai is delivered into B on process Pi]
Message-Passing Interface (MPI)

Collective Operations
• MPI_GATHER (inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm): all-to-1
Ex: int A[100], B[25];
    MPI_GATHER(B, 25, MPI_INT, A, 25, MPI_INT, 0, MPI_COMM_WORLD);
[Figure: each process Pi contributes its block B (= Bi); on the root P0 the blocks B0 B1 B2 B3 are collected into A in rank order]
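A sketch of the usual scatter/compute/gather pattern (not from the slides; the array sizes and the four-process assumption are illustrative): the root scatters 25-element chunks of A, each process doubles its chunk, and the results are gathered back.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]){
      int myid, nproc, i;
      int A[100], B[25];                       /* assumes exactly 4 processes: 4 * 25 = 100 */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &nproc);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);

      if (myid == 0)
        for (i = 0; i < 100; i++) A[i] = i;    /* the root fills the full array */

      /* each process receives 25 consecutive elements of A in B */
      MPI_Scatter(A, 25, MPI_INT, B, 25, MPI_INT, 0, MPI_COMM_WORLD);

      for (i = 0; i < 25; i++) B[i] *= 2;      /* independent local work on the chunk */

      /* the root collects the modified chunks back into A, in rank order */
      MPI_Gather(B, 25, MPI_INT, A, 25, MPI_INT, 0, MPI_COMM_WORLD);

      if (myid == 0) printf("A[99] = %d\n", A[99]);   /* prints 198 */

      MPI_Finalize();
      return 0;
    }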
Message-Passing Interface (MPI)

Collective Operations
• Reduction operations: combine the values in the input buffer of each process using an operator.
Operations:
• MPI_MAX, MPI_MIN
• MPI_SUM, MPI_PROD
• MPI_LAND, MPI_LOR, MPI_LXOR (logical)
• MPI_BAND, MPI_BOR, MPI_BXOR (bitwise)
Message-Passing Interface (MPI)

Collective Operations
• MPI_REDUCE (inbuf, outbuf, count, type, op, root, comm)
• Returns the combined value to the output buffer of a single root process.
Ex: int A[2], B[2];
    MPI_REDUCE(A, B, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
[Figure: with A = {2,4} on P0, {5,7} on P1, {0,3} on P2, and {6,2} on P3, the elementwise MPI_MIN gives B = {0,2} on the root P0]
Message-Passing Interface (MPI)

Collective Operations
• MPI_ALLREDUCE (inbuf, outbuf, count, type, op, comm)
• Returns the combined value to the output buffers of all processes.
Ex: int A[2], B[2];
    MPI_ALLREDUCE(A, B, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
[Figure: with the same inputs as above ({2,4}, {5,7}, {0,3}, {6,2}), every process P0–P3 ends up with B = {0,2}]
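A typical use, sketched here (the variable names and the dot-product computation are illustrative, not from the slides): each process reduces its own slice of two vectors, and MPI_Allreduce gives every process the global dot product without a separate broadcast.

    #include <stdio.h>
    #include <mpi.h>

    #define NLOCAL 1000                        /* elements owned by each process */

    int main(int argc, char *argv[]){
      int i, myid;
      double x[NLOCAL], y[NLOCAL], local = 0.0, global;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);

      for (i = 0; i < NLOCAL; i++) { x[i] = 1.0; y[i] = 2.0; }   /* illustrative data */

      for (i = 0; i < NLOCAL; i++) local += x[i] * y[i];         /* local partial sum */

      /* sum the partial results; every process receives the global value */
      MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

      if (myid == 0) printf("dot product = %f\n", global);

      MPI_Finalize();
      return 0;
    }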
Message-Passing Interface (MPI)

Asynchronous Communication
• Data is distributed among processes, which must then poll periodically for pending read and write requests.
• Local computation may interleave with the processing of incoming messages.

Non-blocking send/receive
• MPI_ISEND (buf, count, datatype, dest, tag, comm, request): Start a non-blocking send.
• MPI_IRECV (buf, count, datatype, source, tag, comm, request): Start a non-blocking receive.
• MPI_WAIT (request, status): Complete a non-blocking operation.
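A sketch of overlapping communication with computation (illustrative buffer names and a two-process assumption, not from the slides): the receive is posted early, the send is started, independent work proceeds, and MPI_WAIT is called only when the operations must be complete.

    #include <stdio.h>
    #include <mpi.h>

    #define N 100

    int main(int argc, char *argv[]){
      int rank, peer, i, work = 0;
      int inbuf[N], outbuf[N];
      MPI_Request reqs[2];
      MPI_Status stats[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      peer = 1 - rank;                     /* assumes exactly 2 processes exchanging arrays */

      for (i = 0; i < N; i++) outbuf[i] = rank;

      MPI_Irecv(inbuf, N, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);   /* post the receive early */
      MPI_Isend(outbuf, N, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);  /* start the send; do not modify outbuf yet */

      for (i = 0; i < N; i++) work += outbuf[i];   /* overlap: local computation that only reads outbuf */

      MPI_Wait(&reqs[1], &stats[1]);       /* send complete: outbuf may be reused */
      MPI_Wait(&reqs[0], &stats[0]);       /* receive complete: inbuf is now valid */

      printf("rank %d got inbuf[0]=%d, local work=%d\n", rank, inbuf[0], work);

      MPI_Finalize();
      return 0;
    }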
Message-Passing Interface (MPI)

Asynchronous Communication
• MPI_IPROBE (source, tag, comm, flag, status): Polls for a pending message without receiving it, and sets a flag. The message can then be received by using MPI_RECV.
• MPI_PROBE (source, tag, comm, status): Blocks until the message is available.
• MPI_GET_COUNT (status, datatype, count): Determines the size of the message.
• status (must be set by a previous probe):
  • status.MPI_SOURCE
  • status.MPI_TAG
Message-Passing Interface (MPI)

Asynchronous Communication
Ex: int count, *buf, source;
    MPI_Status status;

    MPI_PROBE (MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
    source = status.MPI_SOURCE;
    MPI_GET_COUNT (&status, MPI_INT, &count);
    buf = malloc(count*sizeof(int));
    MPI_RECV (buf, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
Message-Passing Interface (MPI)

Communicators
• Communicator handle: identifies the process group and context with respect to which the operation is to be performed.
• MPI_COMM_WORLD: contains all the processes in a parallel computation (default).
• New communicators are formed by either including or excluding processes from an existing communicator.
• MPI_COMM_SIZE (): Determine number of processes.
• MPI_COMM_RANK (): Determine my process identifier.
Message-Passing Interface (MPI)

Communicators
• MPI_COMM_DUP (comm, newcomm): creates a new handle for the same process group.
• MPI_COMM_SPLIT (comm, color, key, newcomm): creates a new handle for a subset of a given process group.
• MPI_INTERCOMM_CREATE (comm, leader, peer, rleader, tag, inter): links processes in two groups.
• MPI_COMM_FREE (comm): destroys a handle.
Message-Passing Interface (MPI)

Communicators
Ex: Two processes communicating with a new handle

    MPI_Comm newcomm;
    MPI_Status status;
    int A[100];
    ...
    MPI_COMM_DUP (MPI_COMM_WORLD, &newcomm);
    if (myid == 0)
      MPI_SEND (A, 100, MPI_INT, 1, 0, newcomm);
    else
      MPI_RECV (A, 100, MPI_INT, 0, 0, newcomm, &status);
    MPI_COMM_FREE (&newcomm);
Message-Passing Interface (MPI)

Communicators
Ex: Creating a new group with 4 members

    MPI_Comm comm, newcomm;
    int myid, color;
    ...
    MPI_COMM_RANK (comm, &myid);
    if (myid < 4) color = 1;
    else color = MPI_UNDEFINED;
    MPI_COMM_SPLIT (comm, color, myid, &newcomm);
    if (newcomm != MPI_COMM_NULL)
      MPI_SCATTER (A, 10, MPI_INT, B, 10, MPI_INT, 0, newcomm);

    Processes:         P0 P1 P2 P3 P4 P5 P6 P7
    Ranks in comm:      0  1  2  3  4  5  6  7
    Color:              1  1  1  1  -- MPI_UNDEFINED --
    Ranks in newcomm:   0  1  2  3  (P4–P7 receive MPI_COMM_NULL)
Message-Passing Interface (MPI)

Communicators
Ex: Splitting processes into 3 independent groups

    MPI_Comm comm, newcomm;
    int myid, color;
    ...
    MPI_COMM_RANK (comm, &myid);
    color = myid % 3;
    MPI_COMM_SPLIT (comm, color, myid, &newcomm);

    Processes:          P0 P1 P2 P3 P4 P5 P6 P7
    Ranks in comm:       0  1  2  3  4  5  6  7
    Color (myid % 3):    0  1  2  0  1  2  0  1
    Ranks in newcomm:    0  0  0  1  1  1  2  2
Message-Passing Interface (MPI)

Communicators
MPI_INTERCOMM_CREATE (comm, local_leader, peer_comm, remote_leader, tag, intercomm): links processes in two groups
• comm: intracommunicator (within the group)
• local_leader: leader within the group
• peer_comm: parent communicator
• remote_leader: the other group's leader within the parent communicator
Message-Passing Interface (MPI)

Communicators
Ex: Communication of processes in two different groups

    MPI_Comm newcomm, intercomm;
    MPI_Status status;
    int myid, newid, color, count;
    ...
    MPI_COMM_SIZE (MPI_COMM_WORLD, &count);
    if (count % 2 == 0){
      MPI_COMM_RANK (MPI_COMM_WORLD, &myid);
      color = myid % 2;
      MPI_COMM_SPLIT (MPI_COMM_WORLD, color, myid, &newcomm);
      MPI_COMM_RANK (newcomm, &newid);
      if (color == 0){    // group 0: even ranks of MPI_COMM_WORLD
        MPI_INTERCOMM_CREATE (newcomm, 0, MPI_COMM_WORLD, 1, 99, &intercomm);
        MPI_SEND (msg, 1, type, newid, 0, intercomm);
      } else {            // group 1: odd ranks of MPI_COMM_WORLD
        MPI_INTERCOMM_CREATE (newcomm, 0, MPI_COMM_WORLD, 0, 99, &intercomm);
        MPI_RECV (msg, 1, type, newid, 0, intercomm, &status);
      }
      MPI_COMM_FREE (&intercomm);
      MPI_COMM_FREE (&newcomm);
    }

[Figure: P0 is group 0's local_leader and the remote_leader seen from group 1; P1 is group 1's local_leader and the remote_leader seen from group 0; each sender in group 0 targets the process with the same rank in group 1]
Message-Passing Interface (MPI)

Communicators
Ex: Communication of processes in two different groups

    Processes:               P0 P1 P2 P3 P4 P5 P6 P7
    Rank in MPI_COMM_WORLD:   0  1  2  3  4  5  6  7

    Group 0 (newcomm): P0 P2 P4 P6  (ranks in MPI_COMM_WORLD: 0 2 4 6; ranks in newcomm: 0 1 2 3)
    Group 1 (newcomm): P1 P3 P5 P7  (ranks in MPI_COMM_WORLD: 1 3 5 7; ranks in newcomm: 0 1 2 3)

    local_leader of group 0 = P0 (the remote_leader seen from group 1)
    local_leader of group 1 = P1 (the remote_leader seen from group 0)
Message-Passing Interface (MPI)

Derived Types
Allow noncontiguous data elements to be grouped together in a message.

Constructor functions:
• MPI_TYPE_CONTIGUOUS (): constructs a data type from contiguous elements.
• MPI_TYPE_VECTOR (): constructs a data type from blocks separated by a stride.
• MPI_TYPE_INDEXED (): constructs a data type with variable indices and sizes.
• MPI_TYPE_COMMIT (): commits a data type so that it can be used in communication.
• MPI_TYPE_FREE (): used to reclaim storage.
Message-Passing Interface (MPI)

Derived Types
• MPI_TYPE_CONTIGUOUS (count, oldtype, newtype): constructs a data type from contiguous elements.
Ex: MPI_TYPE_CONTIGUOUS (10, MPI_REAL, &newtype);

• MPI_TYPE_VECTOR (count, blocklength, stride, oldtype, newtype): constructs a data type from blocks separated by a stride.
Ex: MPI_TYPE_VECTOR (5, 1, 4, MPI_FLOAT, &floattype);
[Figure: in the memory layout of array A, floattype selects 5 blocks of 1 float each, the start of each block separated by a stride of 4 floats]
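A common use of MPI_TYPE_VECTOR, sketched here (the 4x4 matrix and the two-process setup are illustrative, not from the slides): in a row-major C matrix, one column consists of 4 blocks of 1 element with a stride of 4, so a single vector type describes the whole column.

    #include <stdio.h>
    #include <mpi.h>

    #define N 4

    int main(int argc, char *argv[]){
      int rank, i, j;
      float A[N][N], col[N];
      MPI_Datatype coltype;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* N blocks of 1 float, each N floats apart: one column of a row-major N x N matrix */
      MPI_Type_vector(N, 1, N, MPI_FLOAT, &coltype);
      MPI_Type_commit(&coltype);

      if (rank == 0) {
        for (i = 0; i < N; i++)
          for (j = 0; j < N; j++) A[i][j] = 10*i + j;
        MPI_Send(&A[0][1], 1, coltype, 1, 0, MPI_COMM_WORLD);       /* send column 1 */
      } else if (rank == 1) {
        MPI_Recv(col, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status); /* arrives as 4 contiguous floats */
        printf("column 1: %.0f %.0f %.0f %.0f\n", col[0], col[1], col[2], col[3]);
      }

      MPI_Type_free(&coltype);
      MPI_Finalize();
      return 0;
    }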
Message-Passing Interface (MPI)

Derived Types
• MPI_TYPE_INDEXED (count, blocklengths, indices, oldtype, newtype): constructs a data type with variable indices and sizes.
Ex: MPI_TYPE_INDEXED (3, Blengths, Indices, MPI_INT, &newtype);

    Data indices:  0  1  2  3  4  5  6  7  8  9  10
    Blengths = {2, 3, 1}    Indices = {1, 5, 10}
    Block 0: elements 1–2   Block 1: elements 5–7   Block 2: element 10
Message-Passing Interface (MPI)

Derived Types
• MPI_TYPE_COMMIT (type): commits a data type so that it can be used in communication.
• MPI_TYPE_FREE (type): used to reclaim storage.
Message-Passing Interface (MPI)

Derived Types
Ex: int A[11];
    int Blengths[3] = {2, 3, 1};
    int Indices[3]  = {1, 5, 10};
    MPI_Datatype newtype;

    MPI_TYPE_INDEXED (3, Blengths, Indices, MPI_INT, &newtype);
    MPI_TYPE_COMMIT (&newtype);
    MPI_SEND (A, 1, newtype, dest, 0, MPI_COMM_WORLD);
    MPI_TYPE_FREE (&newtype);

    Sent elements of A: block 0 = A[1]–A[2], block 1 = A[5]–A[7], block 2 = A[10]