
MPI Collective Communications Overview

Learn about MPI broadcast, reduce, scatter, and gather operations in parallel programming using MPI. Explore examples and detailed explanations.



Presentation Transcript


  1. Message Passing Interface (MPI) 3 • Amit Majumdar, Scientific Computing Applications Group, San Diego Supercomputer Center • Tim Kaiser (now at Colorado School of Mines)

  2. MPI 3 Lecture Overview • Collective Communications • Advanced Topics • “V” Operations • Derived Data Types • Communicators

  3. Broadcast Operation: MPI_Bcast • All nodes call MPI_Bcast • One node (the root) sends a message; all other nodes receive it • C • MPI_Bcast(&buffer, count, datatype, root, COMM); • Fortran • call MPI_Bcast(buffer, count, datatype, root, COMM, ierr) • root is the rank of the node that sends the message

  4. Broadcast Example • Write a parallel program to broadcast data using MPI_Bcast • Initialize MPI • Have processor 0 broadcast an integer • Have all processors print the data • Quit MPI

  5. /************************************************************
      This is a simple broadcast program in MPI
      ************************************************************/
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char *argv[])
      {
          int i, myid, numprocs;
          int source, count;
          int buffer[8];   /* must hold all 8 values that are broadcast */
          MPI_Init(&argc, &argv);
          MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
          MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  6.     source = 0;
          count = 8;
          if (myid == source) {
              for (i = 0; i < count; i++)
                  buffer[i] = i;
          }
          MPI_Bcast(buffer, count, MPI_INT, source, MPI_COMM_WORLD);
          for (i = 0; i < count; i++)
              printf("%d ", buffer[i]);
          printf("\n");
          MPI_Finalize();
      }

  7. Output of broadcast program
      ds100 % more LL_out.478780
      0:0 1 2 3 4 5 6 7
      1:0 1 2 3 4 5 6 7
      2:0 1 2 3 4 5 6 7
      3:0 1 2 3 4 5 6 7
      4:0 1 2 3 4 5 6 7
      5:0 1 2 3 4 5 6 7
      6:0 1 2 3 4 5 6 7
      7:0 1 2 3 4 5 6 7

  8. Reduction Operations • Used to combine partial results from all processors • Result returned to root processor • Several types of operations available • Works on single elements and arrays

  9. MPI_Reduce • C • int MPI_Reduce(&sendbuf, &recvbuf, count, datatype, operation,root, communicator) • Fortran • call MPI_Reduce(sendbuf, recvbuf, count, datatype, operation,root, communicator, ierr) • Parameters • Like MPI_Bcast, a root is specified. • Operation is a type of mathematical operation

  10. Operations for MPI_Reduce
      MPI_MAX      Maximum
      MPI_MIN      Minimum
      MPI_PROD     Product
      MPI_SUM      Sum
      MPI_LAND     Logical and
      MPI_LOR      Logical or
      MPI_LXOR     Logical exclusive or
      MPI_BAND     Bitwise and
      MPI_BOR      Bitwise or
      MPI_BXOR     Bitwise exclusive or
      MPI_MAXLOC   Maximum value and location
      MPI_MINLOC   Minimum value and location
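The slides note that a reduction works on single elements and on arrays. As a rough illustration of what that means for arrays, here is a serial sketch that makes no MPI calls: it treats each row of an array as one rank's send buffer and combines the rows element by element. The names `sketch_op`, `SKETCH_SUM`, `SKETCH_MAX` and `sketch_reduce` are hypothetical stand-ins chosen for this example, not part of MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation tags standing in for MPI_SUM and MPI_MAX. */
typedef enum { SKETCH_SUM, SKETCH_MAX } sketch_op;

/*
 * Serial model of a reduction (assumption: no real MPI involved).
 * contrib holds nranks rows of count ints; row r plays the role of
 * rank r's send buffer. result[j] combines element j of every row,
 * which is what the root would receive in its receive buffer.
 */
void sketch_reduce(const int *contrib, int nranks, int count,
                   sketch_op op, int *result)
{
    assert(nranks > 0 && count > 0);
    for (int j = 0; j < count; j++) {
        int acc = contrib[j];                      /* rank 0's element j */
        for (int r = 1; r < nranks; r++) {
            int v = contrib[(size_t)r * count + j];
            if (op == SKETCH_SUM)
                acc += v;
            else if (v > acc)                      /* SKETCH_MAX */
                acc = v;
        }
        result[j] = acc;
    }
}
```

With three ranks contributing {1, 5}, {4, 2} and {3, 9}, the sum comes out as {8, 16} and the maximum as {4, 9}: each output position is reduced independently of the others.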

  11. Global Sum with MPI_Reduce
      C
          double sum_partial, sum_global;
          sum_partial = ...;
          ierr = MPI_Reduce(&sum_partial, &sum_global, 1, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);
      Fortran
          double precision sum_partial, sum_global
          sum_partial = ...
          call MPI_Reduce(sum_partial, sum_global, 1, MPI_DOUBLE_PRECISION, MPI_SUM, root, MPI_COMM_WORLD, ierr)

  12. Global Sum with MPI_Reduce: 2d array spread across processors

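  13. Broadcast, Scatter and Gather (figure: data movement among processes p0 to p3) • broadcast: A on p0 is copied to every process • scatter: A B C D on p0 is split so that each process receives one piece • gather: the pieces A, B, C, D held by the processes are collected into A B C D on p0 • all gather: every process ends up with the full A B C D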

  14. Scatter Operation using MPI_Scatter • Similar to Broadcast, but sends a different section of an array to each processor • Data in an array on root node: A(0) A(1) A(2) . . . A(N-1) • Goes to processors: P0 P1 P2 . . . Pn-1

  15. MPI_Scatter • C • int MPI_Scatter(&sendbuf, sendcnts, sendtype, &recvbuf, recvcnts, recvtype, root, comm ); • Fortran • MPI_Scatter(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,root,comm,ierror) • Parameters • sendbuf is an array of size (number processors*sendcnts) • sendcnts number of elements sent to each processor • recvcnts number of element(s) obtained from the root processor • recvbuf contains element(s) obtained from the root processor, may be an array

  16. Scatter Operation using MPI_Scatter • Scatter with sendcnts = 2 • Data in an array on root node, in consecutive pairs: A(0) A(1) | A(2) A(3) | A(4) A(5) | . . . | A(2N-2) A(2N-1) • Goes to processors: P0 P1 P2 . . . Pn-1 • Each processor receives its pair as B(0) B(1)
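The layout on this slide comes down to one index calculation: rank r receives elements r*sendcnts through r*sendcnts + sendcnts - 1 of the root's array. The following serial sketch, which makes no MPI calls, copies out the block one rank would receive. The function name `sketch_scatter_block` is hypothetical.

```c
#include <assert.h>

/*
 * Serial model of MPI_Scatter's layout (assumption: no real MPI involved).
 * Copies into recvbuf the block of sendcnt elements that the given rank
 * would receive from the root's sendbuf.
 */
void sketch_scatter_block(const int *sendbuf, int sendcnt, int rank,
                          int *recvbuf)
{
    assert(sendcnt > 0 && rank >= 0);
    for (int j = 0; j < sendcnt; j++)
        recvbuf[j] = sendbuf[rank * sendcnt + j];
}
```

For a root array holding 0 through 7 and sendcnt = 2, rank 2 would end up with B(0) = 4 and B(1) = 5.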

  17. Global Sum Example with MPI_Reduce • Example program to sum data from all processors

  18. #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      /*
      ! This program shows how to use MPI_Scatter and MPI_Reduce
      ! Each processor gets different data from the root processor
      ! by way of MPI_Scatter. The data is summed and then sent back
      ! to the root processor using MPI_Reduce. The root processor
      ! then prints the global sum.
      */
      /* globals */
      int numnodes,myid,mpi_err;
      #define mpi_root 0
      /* end globals */

  19. void init_it(int *argc, char ***argv);
      void init_it(int *argc, char ***argv) {
          mpi_err = MPI_Init(argc,argv);
          mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
          mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      }
      int main(int argc,char *argv[]){
          int *myray,*send_ray,*back_ray;
          int count;
          int size,mysize,i,k,j,total,gtotal;
          init_it(&argc,&argv);
          /* each processor will get count elements from the root */
          count=4;
          myray=(int*)malloc(count*sizeof(int));

  20.     /* create the data to be sent on the root */
          if(myid == mpi_root){
              size=count*numnodes;
              send_ray=(int*)malloc(size*sizeof(int));
              for(i=0;i<size;i++)
                  send_ray[i]=i;
          }
          /* send different data to each processor */
          mpi_err = MPI_Scatter(send_ray, count, MPI_INT,
                                myray, count, MPI_INT,
                                mpi_root, MPI_COMM_WORLD);
          /* each processor does a local sum */
          total=0;
          for(i=0;i<count;i++)
              total=total+myray[i];
          printf("myid= %d total= %d\n ",myid,total);

  21.     /* send the local sums back to the root */
          mpi_err = MPI_Reduce(&total, &gtotal, 1, MPI_INT, MPI_SUM,
                               mpi_root, MPI_COMM_WORLD);
          /* the root prints the global sum */
          if(myid == mpi_root){
              printf("results from all processors= %d \n ",gtotal);
          }
          mpi_err = MPI_Finalize();
      }

  22. Gather Operation using MPI_Gather • Used to collect data from all processors to the root, inverse of scatter • Data is collected into an array on root processor Data from various Processors: P0 P1 P2 . . . Pn-1 A0 A1 A2 . . . An-1 Goes to an array on root node: A(0) A(1) A(2) . . . A(N-1)

  23. MPI_Gather • C • int MPI_Gather(&sendbuf,sendcnts, sendtype, &recvbuf, recvcnts,recvtype,root, comm ); • Fortran • MPI_Gather(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,root,comm,ierror) • Parameters • sendcnts number of elements sent from each processor • sendbuf is an array of size sendcnts • recvcnts number of elements obtained from each processor • recvbuf of size recvcnts*number of processors

  24. Code for Scatter and Gather • A parallel program to scatter data using MPI_Scatter • Each processor sums the data • Use MPI_Gather to get the data back to the root processor • Root processor prints the global data • See attached Fortran and C code

  25. module mpi
        include "mpif.h"
      end module
      ! This program shows how to use MPI_Scatter and MPI_Gather
      ! Each processor gets different data from the root processor
      ! by way of mpi_scatter. The data is summed and then sent back
      ! to the root processor using MPI_Gather. The root processor
      ! then prints the global sum.
      module global
        integer numnodes,myid,mpi_err
        integer, parameter :: mpi_root=0
      end module
      subroutine init
        use mpi
        use global
        implicit none
      ! do the mpi init stuff
        call MPI_INIT( mpi_err )
        call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
        call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)

  26. end subroutine init
      program test1
        use mpi
        use global
        implicit none
        integer, allocatable :: myray(:),send_ray(:),back_ray(:)
        integer count
        integer size,mysize,i,k,j,total
        call init
      ! each processor will get count elements from the root
        count=4
        allocate(myray(count))
      ! create the data to be sent on the root
        if(myid == mpi_root)then
          size=count*numnodes
          allocate(send_ray(0:size-1))
          allocate(back_ray(0:numnodes-1))
          do i=0,size-1
            send_ray(i)= i
          enddo
        endif

  27.   call MPI_Scatter( send_ray, count, MPI_INTEGER, &
                          myray, count, MPI_INTEGER, &
                          mpi_root, MPI_COMM_WORLD,mpi_err)
      ! each processor does a local sum
        total=sum(myray)
        write(*,*)"myid= ",myid," total= ",total
      ! send the local sums back to the root
        call MPI_Gather( total, 1, MPI_INTEGER, &
                         back_ray, 1, MPI_INTEGER, &
                         mpi_root, MPI_COMM_WORLD,mpi_err)
      ! the root prints the global sum
        if(myid == mpi_root)then
          write(*,*)"results from all processors= ",sum(back_ray)
        endif
        call mpi_finalize(mpi_err)
      end program

  28. #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      /*
      ! This program shows how to use MPI_Scatter and MPI_Gather
      ! Each processor gets different data from the root processor
      ! by way of mpi_scatter. The data is summed and then sent back
      ! to the root processor using MPI_Gather. The root processor
      ! then prints the global sum.
      */
      /* globals */
      int numnodes,myid,mpi_err;
      #define mpi_root 0
      /* end globals */
      void init_it(int *argc, char ***argv);
      void init_it(int *argc, char ***argv) {
          mpi_err = MPI_Init(argc,argv);
          mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
          mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      }

  29. int main(int argc,char *argv[]){
          int *myray,*send_ray,*back_ray;
          int count;
          int size,mysize,i,k,j,total;
          init_it(&argc,&argv);
          /* each processor will get count elements from the root */
          count=4;
          myray=(int*)malloc(count*sizeof(int));
          /* create the data to be sent on the root */
          if(myid == mpi_root){
              size=count*numnodes;
              send_ray=(int*)malloc(size*sizeof(int));
              back_ray=(int*)malloc(numnodes*sizeof(int));
              for(i=0;i<size;i++)
                  send_ray[i]=i;
          }
          /* send different data to each processor */

  30.     mpi_err = MPI_Scatter(send_ray, count, MPI_INT,
                                myray, count, MPI_INT,
                                mpi_root, MPI_COMM_WORLD);
          /* each processor does a local sum */
          total=0;
          for(i=0;i<count;i++)
              total=total+myray[i];
          printf("myid= %d total= %d\n ",myid,total);
          /* send the local sums back to the root */
          mpi_err = MPI_Gather(&total, 1, MPI_INT,
                               back_ray, 1, MPI_INT,
                               mpi_root, MPI_COMM_WORLD);
          /* the root prints the global sum */
          if(myid == mpi_root){
              total=0;
              for(i=0;i<numnodes;i++)
                  total=total+back_ray[i];
              printf("results from all processors= %d \n ",total);
          }
          mpi_err = MPI_Finalize();
      }

  31. Output of previous code on 4 procs
      myid= 1 total= 22
      myid= 2 total= 38
      myid= 3 total= 54
      myid= 0 total= 6
      results from all processors= 120
      (0 through 15 added up = (15)(15 + 1)/2 = 120)
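The totals in this output can be checked by hand, or with a small serial sketch that makes no MPI calls: the root's array holds 0 through 15, rank r receives the 4 elements starting at index 4*r, and each rank sums its own block. The function name `sketch_local_sum` is hypothetical.

```c
#include <assert.h>

/*
 * Serial model of one rank's work in the scatter-and-sum example
 * (assumption: no real MPI involved). Returns the sum of the block of
 * count elements that the given rank would receive from send_ray.
 */
int sketch_local_sum(const int *send_ray, int count, int rank)
{
    assert(count > 0 && rank >= 0);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += send_ray[rank * count + i];
    return total;
}
```

Calling it for ranks 0 to 3 with count = 4 gives 6, 22, 38 and 54, and adding those four values gives the 120 reported by the root.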

  32. MPI_Allgather and MPI_Allreduce • Gather and Reduce come in an "ALL" variation • Results are returned to all processors • The root parameter is missing from the call • Similar to a gather or reduce followed by a broadcast

  33. All to All communication with MPI_Alltoall • Each processor sends data to, and receives data from, all others • C • int MPI_Alltoall(&sendbuf, sendcnts, sendtype, &recvbuf, recvcnts, recvtype, comm); • Fortran • call MPI_Alltoall(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,comm,ierror)

  34. MPI_Alltoall
            Send buffer (before)    Receive buffer (after)
      P0    a0 a1 a2 a3             a0 b0 c0 d0
      P1    b0 b1 b2 b3             a1 b1 c1 d1
      P2    c0 c1 c2 c3             a2 b2 c2 d2
      P3    d0 d1 d2 d3             a3 b3 c3 d3

  35. All to All with MPI_Alltoall • Parameters • sendcnts number of elements sent to each processor • sendbuf is an array of size sendcnts * number of processors • recvcnts number of elements obtained from each processor • recvbuf is an array of size recvcnts * number of processors • Note that both the send buffer and the receive buffer must hold one block for every processor • See attached Fortran and C codes

  36. module mpi
        include "mpif.h"
      end module
      ! This program shows how to use MPI_Alltoall. Each processor
      ! send/rec a different random number to/from other processors.
      module global
        integer numnodes,myid,mpi_err
        integer, parameter :: mpi_root=0
      end module
      subroutine init
        use mpi
        use global
        implicit none
      ! do the mpi init stuff
        call MPI_INIT( mpi_err )
        call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
        call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)
      end subroutine init

  37. program test1
        use mpi
        use global
        implicit none
        integer, allocatable :: scounts(:),rcounts(:)
        integer ssize,rsize,i,k,j
        real z
        call init
      ! counts and displacement arrays
        allocate(scounts(0:numnodes-1))
        allocate(rcounts(0:numnodes-1))
        call seed_random
      ! find data to send
        do i=0,numnodes-1
          call random_number(z)
          scounts(i)=nint(10.0*z)+1
        enddo
        write(*,*)"myid= ",myid," scounts= ",scounts

  38. ! send the data
        call MPI_alltoall( scounts,1,MPI_INTEGER, &
                           rcounts,1,MPI_INTEGER, &
                           MPI_COMM_WORLD,mpi_err)
        write(*,*)"myid= ",myid," rcounts= ",rcounts
        call mpi_finalize(mpi_err)
      end program
      subroutine seed_random
        use global
        implicit none
        integer the_size,j
        integer, allocatable :: seed(:)
        real z
        call random_seed(size=the_size)   ! how big is the intrinsic seed?
        allocate(seed(the_size))          ! allocate space for seed
        do j=1,the_size                   ! create the seed
          seed(j)=abs(myid*10)+(j*myid*myid)+100   ! abs is generic
        enddo
        call random_seed(put=seed)        ! assign the seed
        deallocate(seed)
      end subroutine

  39. #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      /*
      ! This program shows how to use MPI_Alltoall. Each processor
      ! send/rec a different random number to/from other processors.
      */
      /* globals */
      int numnodes,myid,mpi_err;
      #define mpi_root 0
      /* end globals */
      void init_it(int *argc, char ***argv);
      void seed_random(int id);
      void random_number(float *z);
      void init_it(int *argc, char ***argv) {
          mpi_err = MPI_Init(argc,argv);
          mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
          mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      }

  40. int main(int argc,char *argv[]){
          int *sray,*rray;
          int *scounts,*rcounts;
          int ssize,rsize,i,k,j;
          float z;
          init_it(&argc,&argv);
          scounts=(int*)malloc(sizeof(int)*numnodes);
          rcounts=(int*)malloc(sizeof(int)*numnodes);
          /*
          ! seed the random number generator with a
          ! different number on each processor
          */
          seed_random(myid);
          /* find data to send */
          for(i=0;i<numnodes;i++){
              random_number(&z);
              scounts[i]=(int)(10.0*z)+1;
          }
          printf("myid= %d scounts=",myid);
          for(i=0;i<numnodes;i++)
              printf("%d ",scounts[i]);
          printf("\n");

  41.     /* send the data */
          mpi_err = MPI_Alltoall(scounts,1,MPI_INT,
                                 rcounts,1,MPI_INT,
                                 MPI_COMM_WORLD);
          printf("myid= %d rcounts=",myid);
          for(i=0;i<numnodes;i++)
              printf("%d ",rcounts[i]);
          printf("\n");
          mpi_err = MPI_Finalize();
      }
      void seed_random(int id){
          srand((unsigned int)id);
      }
      void random_number(float *z){
          int i;
          i=rand();
          /* scale by RAND_MAX, which is not 32767 on every system */
          *z=(float)i/(float)RAND_MAX;
      }

  42. Output of previous code on 4 procs
      myid= 1 scounts= 6 2 4 6
      myid= 1 rcounts= 7 2 7 3
      myid= 2 scounts= 1 7 4 4
      myid= 2 rcounts= 4 4 4 4
      myid= 3 scounts= 6 3 4 3
      myid= 3 rcounts= 7 6 4 3
      myid= 0 scounts= 1 7 4 7
      myid= 0 rcounts= 1 6 1 6
      --------------------------------------------
      Rows are ranks 0 to 3:
      scounts      rcounts
      1 7 4 7      1 6 1 6
      6 2 4 6      7 2 7 3
      1 7 4 4      4 4 4 4
      6 3 4 3      7 6 4 3
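Read as matrices with one row per rank, the rcounts values above are the transpose of the scounts values: what rank i sends to rank j is what rank j receives from rank i. A serial sketch of that data movement, with one element per destination and no MPI calls, might look like this. The function name `sketch_alltoall` is hypothetical.

```c
#include <assert.h>

/*
 * Serial model of MPI_Alltoall with one element per destination
 * (assumption: no real MPI involved).
 * send[i*n + j] is the element rank i sends to rank j;
 * recv[j*n + i] is where rank j stores the element it got from rank i.
 */
void sketch_alltoall(const int *send, int n, int *recv)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            recv[j * n + i] = send[i * n + j];
}
```

Feeding it the four scounts rows from the output (ranks 0 to 3) reproduces the four rcounts rows.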

  43. The variable or “V” operators • A collection of very powerful but difficult to set up global communication routines • MPI_Gatherv: Gather different amounts of data from each processor to the root processor • MPI_Alltoallv: Send and receive different amounts of data from all processors • MPI_Allgatherv: Gather different amounts of data from each processor and send all data to each • MPI_Scatterv: Send different amounts of data to each processor from the root processor • We discuss MPI_Gatherv and MPI_Alltoallv

  44. MPI_Gatherv • C • int MPI_Gatherv (&sendbuf, sendcnts, sendtype, &recvbuf, recvcnts, rdispls, recvtype, root, comm); • Fortran • call MPI_Gatherv (sendbuf, sendcnts, sendtype, recvbuf, recvcnts, rdispls, recvtype, root, comm, ierror) • Parameters: • recvcnts is now an array, with one count per processor • rdispls is an array of displacements: the offset in recvbuf at which each processor's data is placed • See attached codes

  45. MPI_Gatherv (figure): rank 0 (the root), rank 1 and rank 2 send 1, 2 and 3 elements respectively
      rank    sendbuf    recvcnts    rdispls
      0       1          1           0
      1       2 2        2           1
      2       3 3 3      3           3
      recvbuf on the root (positions 0 to 5): 1 2 2 3 3 3
      For comparison, a regular gather collects one equal-sized block (A, B, C, D) from each of p0 to p3.
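The displacements in this figure are a running total of the counts: each block starts where the previous one ended. The serial sketch below, which makes no MPI calls, computes displacements from counts and then places one rank's block into the receive buffer. The function names `sketch_displacements` and `sketch_place_block` are hypothetical, and contiguous packing is an assumption of this sketch; MPI_Gatherv itself also allows gaps between blocks.

```c
#include <assert.h>

/*
 * Fill displs with the running total of counts, so that the blocks are
 * packed back to back. Returns the total number of elements, which is
 * the size the receive buffer needs.
 */
int sketch_displacements(const int *counts, int nranks, int *displs)
{
    assert(nranks > 0);
    int total = 0;
    for (int r = 0; r < nranks; r++) {
        displs[r] = total;
        total += counts[r];
    }
    return total;
}

/*
 * Serial model of where one rank's data lands on the root
 * (assumption: no real MPI involved).
 */
void sketch_place_block(const int *sendbuf, int count, int displ,
                        int *recvbuf)
{
    for (int j = 0; j < count; j++)
        recvbuf[displ + j] = sendbuf[j];
}
```

With counts of 1, 2 and 3 the displacements come out as 0, 1 and 3 and the total as 6, matching the figure; placing the blocks {1}, {2, 2} and {3, 3, 3} at those offsets fills the receive buffer with 1 2 2 3 3 3.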

  46. MPI_Gatherv code
      Sample program:
          include 'mpif.h'
          integer isend(3), irecv(6)
          integer ircnt(0:2), idisp(0:2)
          data ircnt/1,2,3/ idisp/0,1,3/
          call mpi_init(ierr)
          call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
          call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
          do i = 1,myrank+1
            isend(i) = myrank+1
          enddo
          iscnt = myrank + 1
          call MPI_GATHERV(isend, iscnt, MPI_INTEGER, irecv, ircnt, idisp, MPI_INTEGER, &
                           0, MPI_COMM_WORLD, ierr)
          if (myrank .eq. 0) then
            print *, 'irecv =', irecv
          endif
          call MPI_FINALIZE(ierr)
          end
      Sample execution:
          % bsub -q hpc -m ultra -I -n 3 ./a.out
          % 0: irecv = 1 2 2 3 3 3

  47. #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      /*
      ! This program shows how to use MPI_Gatherv. Each processor sends a
      ! different amount of data to the root processor. We use MPI_Gather
      ! first to tell the root how much data is going to be sent.
      */
      /* globals */
      int numnodes,myid,mpi_err;
      #define mpi_root 0
      /* end of globals */
      void init_it(int *argc, char ***argv);
      void init_it(int *argc, char ***argv) {
          mpi_err = MPI_Init(argc,argv);
          mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
          mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      }

  48. int main(int argc,char *argv[]){
          int *will_use;
          int *myray,*displacements,*counts,*allray;
          int size,mysize,i;
          init_it(&argc,&argv);
          mysize=myid+1;
          myray=(int*)malloc(mysize*sizeof(int));
          for(i=0;i<mysize;i++)
              myray[i]=myid+1;
          /* counts and displacement arrays are only required on the root */
          if(myid == mpi_root){
              counts=(int*)malloc(numnodes*sizeof(int));

  49.         displacements=(int*)malloc(numnodes*sizeof(int));
          }
          /* we gather the counts to the root; each processor sends mysize, */
          /* the number of elements it will contribute */
          mpi_err = MPI_Gather((void*)&mysize,1,MPI_INT,
                               (void*)counts, 1,MPI_INT,
                               mpi_root,MPI_COMM_WORLD);
          /* calculate displacements and the size of the recv array */
          if(myid == mpi_root){
              displacements[0]=0;
              for(i=1;i<numnodes;i++){
                  displacements[i]=counts[i-1]+displacements[i-1];
              }
              size=0;
              for(i=0;i<numnodes;i++)
                  size=size+counts[i];
              allray=(int*)malloc(size*sizeof(int));
          }

  50.     /* different amounts of data from each processor */
          /* is gathered to the root */
          mpi_err = MPI_Gatherv(myray, mysize, MPI_INT,
                                allray,counts,displacements,MPI_INT,
                                mpi_root, MPI_COMM_WORLD);
          if(myid == mpi_root){
              for(i=0;i<size;i++)
                  printf("%d ",allray[i]);
              printf("\n");
          }
          mpi_err = MPI_Finalize();
      }
      ultra% bsub -q hpc -m ultra -I -n 3 ./a.out
      1 2 2 3 3 3
