Message Passing Interface (MPI)
Jonathan Carroll-Nellenback
CIRC Summer School
Review
• Global Communicator
  • MPI_COMM_WORLD
• Global Communication Routines:
  • [All]Gather[v]
  • Scatter[v]
  • [All]Reduce[v]
  • Alltoall[v]
  • Bcast
  • Barrier
• Reduction Operators
  • MPI_[MAX,MIN,SUM,PROD], MPI_[B,L][AND,OR,XOR]
• Basic Data Types (put MPI_ in front of the name of the data type)
  • Fortran – MPI_[CHARACTER,INTEGER,REAL,LOGICAL,...]
  • C – MPI_[CHAR,SHORT,INT,LONG,...]
Types of MPI Arguments
• Send Buffer – The starting address of the data to be sent
• Send Count – The number of elements in the send buffer
• Send Type – The type of the elements in the send buffer
• Recv Buffer – The starting address of the recv buffer
• Recv Count – The number of elements to recv
• Recv Type – The type of the elements to recv
• Displacements – The offsets for Gatherv, Scatterv, etc.
• Tag – A message identifier
• Root – The '1' in all-to-1 or 1-to-all communication
• Dest – The destination for a point-to-point send
• Source – The source for a point-to-point recv
• Communicator – An independent collection of MPI tasks
• Request – A handle to keep track of non-blocking sends or receives
• Status – The status of a non-blocking send or any receive
• (See the annotated call below for how these line up in practice.)
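To make the argument roles concrete, here is a minimal sketch of an MPI_Gatherv call with each argument annotated. It is illustrative only: the program name, variable names, and the choice of per-rank counts are assumptions, not part of the course examples.

program gatherv_args_sketch
   use mpi
   implicit none
   integer :: err, rank, nprocs, i
   integer, allocatable :: counts(:), displs(:)
   real, allocatable :: sendbuf(:), recvbuf(:)
   integer, parameter :: root = 0                     ! Root - the '1' in all-to-1 communication

   call MPI_Init(err)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)

   ! each rank contributes rank+1 values (an arbitrary, uneven distribution)
   allocate(sendbuf(rank+1))
   sendbuf = real(rank)

   allocate(counts(nprocs), displs(nprocs))
   counts = (/ (i, i=1,nprocs) /)                     ! Recv Counts - elements expected from each rank
   displs = (/ 0, (sum(counts(1:i)), i=1,nprocs-1) /) ! Displacements - offsets into the recv buffer
   allocate(recvbuf(sum(counts)))

   call MPI_Gatherv(sendbuf, rank+1, MPI_REAL,  &     ! send buffer, send count, send type
                    recvbuf, counts, displs,    &     ! recv buffer, recv counts, displacements
                    MPI_REAL, root,             &     ! recv type, root
                    MPI_COMM_WORLD, err)              ! communicator, error return

   if (rank == root) print *, 'gathered', size(recvbuf), 'values'
   call MPI_Finalize(err)
end program gatherv_args_sketch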
Various ways to parallelize DO loops

! serial loop
DO i=1,n
  a(i)=f(i)
END DO

! cyclic distribution of iterations, combined with an all-reduce
a=0
DO i=rank+1,n,procs
  a(i)=f(i)
END DO
CALL MPI_Allreduce(MPI_IN_PLACE, a, n, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, err)

! block distribution of iterations, combined with an all-gather (assumes procs divides n)
m=n/procs
ALLOCATE(b(m))
DO i=1,m
  b(i)=f(m*rank+i)
END DO
CALL MPI_Allgather(b, m, MPI_REAL, a, m, MPI_REAL, MPI_COMM_WORLD, err)
DEALLOCATE(b)
Gather vs. Gatherv

! Gather/Allgather: every rank contributes the same number of elements
! (assumes procs divides n)
m=n/procs
ALLOCATE(b(m))
DO i=1,m
  b(i)=f(m*rank+i)
END DO
CALL MPI_Allgather(b, m, MPI_REAL, a, m, MPI_REAL, MPI_COMM_WORLD, err)
DEALLOCATE(b)

! Gatherv/Allgatherv: per-rank counts and displacements, so procs need not divide n
! (ranks are 0-based, so this rank's entries are sizes(rank+1) and displacements(rank+1))
m=n/procs
rem=mod(n,procs)
ALLOCATE(sizes(procs), displacements(procs+1))
sizes=(/(m+1,i=1,rem),(m,i=rem+1,procs)/)
displacements=(/0,(sum(sizes(1:i)), i=1,procs)/)
ALLOCATE(b(sizes(rank+1)))
DO i=1,sizes(rank+1)
  b(i)=f(displacements(rank+1)+i)
END DO
CALL MPI_Allgatherv(b, sizes(rank+1), MPI_REAL, a, sizes, displacements, &
     MPI_REAL, MPI_COMM_WORLD, err)
DEALLOCATE(b,sizes,displacements)
Amdahl's Law (The Hard Truth)
The speedup S expected for a program run on n processors, where P is the fraction of the program that runs in parallel:

S = 1 / ((1 - P) + P/n)

Glass half full: speedup keeps growing as you add processors. Glass half empty: it can never exceed 1/(1 - P), no matter how many processors you use.
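As a quick worked example (the numbers are chosen for illustration, not taken from the slide), suppose P = 0.95 and n = 16:

S(n) = \frac{1}{(1-P) + P/n}, \qquad
S(16) = \frac{1}{0.05 + 0.95/16} \approx 9.1, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1-P} = 20

So even a code that is 95% parallel gets only about a 9x speedup on 16 processors and tops out at 20x no matter how many are added.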
Measuring Performance
• Fortran:
  • time = MPI_WTIME()
• C:
  • time = MPI_Wtime();
• MPI_Wtime is a function (not a subroutine) in both bindings; it returns wall-clock time in seconds as a double.
• Measure the performance of exercise2p.f90 or exercise2p.c
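A minimal sketch of timing a code region with MPI_Wtime; the program name and the loop being timed are placeholders, not part of the course examples.

program timing_sketch
   use mpi
   implicit none
   integer :: err, rank, i
   double precision :: t0, t1
   real :: s

   call MPI_Init(err)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)

   call MPI_Barrier(MPI_COMM_WORLD, err)   ! start all ranks from the same point
   t0 = MPI_Wtime()

   s = 0.0                                 ! placeholder work to be timed
   do i = 1, 10000000
      s = s + sin(real(i))
   end do

   t1 = MPI_Wtime()
   if (rank == 0) print *, 'elapsed seconds:', t1 - t0, ' (s =', s, ')'

   call MPI_Finalize(err)
end program timing_sketch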
Exercise 3
• Parallelize exercise3.f90 using MPI_Reduce and measure the scaling with N=512 and N=1024 on 1, 4, and 16 procs.
Basic Sending and Receiving
• /public/jcarrol5/mpi/example4.f90
• Tags – additional identifiers on messages
• MPI_Send
• MPI_Recv
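A minimal point-to-point sketch showing the calling pattern (this is an illustration, not the contents of example4.f90): rank 0 sends one integer to rank 1, which receives it with a matching tag. It assumes at least two ranks.

program sendrecv_sketch
   use mpi
   implicit none
   integer :: err, rank, nprocs, val
   integer :: status(MPI_STATUS_SIZE)
   integer, parameter :: tag = 17          ! arbitrary message identifier

   call MPI_Init(err)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)

   if (rank == 0) then
      val = 42
      call MPI_Send(val, 1, MPI_INTEGER, 1, tag, MPI_COMM_WORLD, err)
   else if (rank == 1) then
      call MPI_Recv(val, 1, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, status, err)
      print *, 'rank 1 received', val
   end if

   call MPI_Finalize(err)
end program sendrecv_sketch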
Exercise 4
• Modify your program from exercise 2 to use only point-to-point communication routines. (You can start with exercise2p.f90 or exercise2p.c.)
Sending Modes
• Blocking vs. non-blocking
  • Non-blocking sends and receives return control to the calling routine immediately. However, they usually require buffering and a later test to see whether the send/recv has completed.
  • Good for overlapping communication with computation
  • May lead to extra buffering
• Synchronous vs. asynchronous
  • Synchronous sends require a matching recv to be posted before returning. Blocking only if the recv has not been posted. Does not require any additional buffering.
• Buffered vs. non-buffered
  • Buffered sends explicitly buffer the data to be sent so that the calling routine can release the memory.
• Ready send
  • Assumes that the receiver has already posted the recv.
Send Routines
• /public/jcarrol5/mpi/example4.f90
• MPI_Send – May or may not block
• MPI_Bsend – May buffer – returns immediately
• MPI_Ssend – Synchronous send (returns after the matching recv is posted)
• MPI_Rsend – Ready send (matching recv must already be posted)
• MPI_Isend – Non-blocking send (must check for completion)
• MPI_Ibsend – Non-blocking buffered send
• MPI_Issend – Non-blocking synchronous send
• MPI_Irsend – Non-blocking ready send
• MPI_Recv – Blocking receive
• MPI_Irecv – Non-blocking receive
• (A non-blocking exchange is sketched below.)
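A minimal non-blocking sketch (illustrative, not example4.f90): each of two ranks posts an MPI_Irecv and an MPI_Isend for a neighbor exchange, then waits on both requests before using the data. It assumes exactly two ranks.

program nonblocking_sketch
   use mpi
   implicit none
   integer :: err, rank, other
   integer :: requests(2), statuses(MPI_STATUS_SIZE, 2)
   real :: sendval, recvval

   call MPI_Init(err)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)

   other = 1 - rank                        ! partner rank (assumes exactly 2 ranks)
   sendval = real(rank)

   ! post the receive first, then the send; neither call blocks
   call MPI_Irecv(recvval, 1, MPI_REAL, other, 0, MPI_COMM_WORLD, requests(1), err)
   call MPI_Isend(sendval, 1, MPI_REAL, other, 0, MPI_COMM_WORLD, requests(2), err)

   ! ... independent computation could overlap with the communication here ...

   call MPI_Waitall(2, requests, statuses, err)   ! both buffers are now safe to reuse
   print *, 'rank', rank, 'received', recvval

   call MPI_Finalize(err)
end program nonblocking_sketch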
Exercise 5
• Rewrite exercise 3 using ready sends (Rsend), synchronous sends (Ssend), and non-blocking sends (Isend) and see if it is any faster.
Communicators and Groups
• /public/jcarrol5/mpi/example5.f90
• MPI starts with one communicator (MPI_COMM_WORLD).
• Separate communicators can be formed using MPI_Comm_split (sketched below),
• or you can extract the group belonging to MPI_COMM_WORLD and create subgroups through various routines.
• Multiple communicators can use the same group.
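A minimal MPI_Comm_split sketch (illustrative, not the contents of example5.f90): split MPI_COMM_WORLD into two sub-communicators by even/odd rank and report each rank's position within its new communicator.

program split_sketch
   use mpi
   implicit none
   integer :: err, world_rank, color, newcomm, new_rank, new_size

   call MPI_Init(err)
   call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, err)

   color = mod(world_rank, 2)     ! 0 = even ranks, 1 = odd ranks
   ! ranks with the same color land in the same new communicator;
   ! the key (here world_rank) determines their ordering within it
   call MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, newcomm, err)

   call MPI_Comm_rank(newcomm, new_rank, err)
   call MPI_Comm_size(newcomm, new_size, err)
   print *, 'world rank', world_rank, '-> color', color, &
            'is rank', new_rank, 'of', new_size

   call MPI_Comm_free(newcomm, err)
   call MPI_Finalize(err)
end program split_sketch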