Introduction to MPI programming

Learn about Message Passing Interface (MPI) for parallel programming, MPI environment, basic functions, types of communication, and how to get started with MPI. Includes example code and compilations.


  1. Introduction to MPI programming Morris Law, SCID May 18/25, 2013

  2. What is Message Passing Interface (MPI)? • Portable standard for communication • Processes can communicate through messages. • Each process is a separable program • All data is private

  3. Multi-core programming • Most current CPUs have multiple cores, which can be used easily by compiling with OpenMP support • Programmers no longer need to rewrite sequential code; they add directives that instruct the compiler to parallelize the code with OpenMP.

  4. OpenMP example

      /*
       * Sample program to test runtime of simple matrix multiply
       * with and without OpenMP on gcc-4.3.3-tdm1 (mingw)
       * compile with gcc -fopenmp
       * (c) 2009, Rajorshi Biswas
       */
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      #include <assert.h>
      #include <omp.h>

      int main(int argc, char **argv)
      {
          int i, j, k;
          int n;
          double temp;
          double start, end, run;

          printf("Enter dimension ('N' for 'NxN' matrix) (100-2000): ");
          scanf("%d", &n);
          assert( n >= 100 && n <= 2000 );

          /* Allocate three n x n integer matrices, row by row */
          int **arr1 = malloc( sizeof(int*) * n);
          int **arr2 = malloc( sizeof(int*) * n);
          int **arr3 = malloc( sizeof(int*) * n);
          for(i=0; i<n; ++i) {
              arr1[i] = malloc( sizeof(int) * n );
              arr2[i] = malloc( sizeof(int) * n );
              arr3[i] = malloc( sizeof(int) * n );
          }

          printf("Populating array with random values...\n");
          srand( time(NULL) );
          for(i=0; i<n; ++i) {
              for(j=0; j<n; ++j) {
                  arr1[i][j] = (rand() % n);
                  arr2[i][j] = (rand() % n);
              }
          }
          printf("Completed array init.\n");

          /* Serial matrix multiply */
          printf("Crunching without OMP...");
          fflush(stdout);
          start = omp_get_wtime();
          for(i=0; i<n; ++i) {
              for(j=0; j<n; ++j) {
                  temp = 0;
                  for(k=0; k<n; ++k) {
                      temp += arr1[i][k] * arr2[k][j];
                  }
                  arr3[i][j] = temp;
              }
          }
          end = omp_get_wtime();
          printf(" took %f seconds.\n", end-start);

          /* Same multiply, with the outer loop split across threads */
          printf("Crunching with OMP...");
          fflush(stdout);
          start = omp_get_wtime();
          #pragma omp parallel for private(i, j, k, temp)
          for(i=0; i<n; ++i) {
              for(j=0; j<n; ++j) {
                  temp = 0;
                  for(k=0; k<n; ++k) {
                      temp += arr1[i][k] * arr2[k][j];
                  }
                  arr3[i][j] = temp;
              }
          }
          end = omp_get_wtime();
          printf(" took %f seconds.\n", end-start);

          return 0;
      }

  5. Compiling for OpenMP support

      • GCC
          gcc -fopenmp -o foo foo.c
          gfortran -fopenmp -o foo foo.f
      • Intel Compiler
          icc -openmp -o foo foo.c
          ifort -openmp -o foo foo.f
      • PGI Compiler
          pgcc -mp -o foo foo.c
          pgf90 -mp -o foo foo.f

  6. What is Message Passing Interface (MPI)? • This is a library, not a language! • Different compilers can be used, but all processes must use the same MPI library, e.g. MPICH, LAM/MPI, Open MPI, etc. • There are two versions now, MPI-1 and MPI-2 • Use a standard sequential language: Fortran, C, C++, etc.

  7. Basic Idea of Message Passing Interface (MPI) • MPI Environment • Initialize, manage, and terminate communication among processes • Communication between processes • Point to point communication, e.g. send, receive, etc. • Collective communication, e.g. broadcast, gather, etc. • Complicated data structures • Communicate the data effectively • e.g. matrices and memory

  8. Is MPI Large or Small? • MPI is large • More than one hundred functions • But not necessarily a measure of complexity • MPI is small • Many parallel programs can be written with just 6 basic functions • MPI is just right • One can access flexibility when it is required • One need not master all MPI functions

  9. When to Use MPI? • You need a portable parallel program • You are writing a parallel library • You care about performance • You have a problem that can be solved in parallel ways

  10. F77/F90, C/C++ MPI library calls • Fortran 77/90 uses subroutines • CALL is used to invoke the library call • Nothing is returned, the error code variable is the last argument • All variables are passed by reference • C/C++ uses functions • Just the name is used to invoke the library call • The function returns an integer value (an error code) • Variables are passed by value, unless otherwise specified

  11. Types of Communication • Point to Point Communication • communication involving only two processes. • Collective Communication • communication that involves a group of processes.

  12. Implementation of MPI

  13. Getting started with MPI • Create a file called “machines” • The content of “machines” (8 nodes): compute-0-0 compute-0-1 compute-0-2 … compute-0-7

  14. MPI Commands • mpicc - compiles an mpi program mpicc -o foo foo.c mpif77 -o foo foo.f mpif90 -o foo foo.f90 • mpirun - start the execution of mpi programs mpirun -v -np 2 -machinefile machines foo

  15. Basic MPI Functions

  16. MPI Environment • Initialize • initialize environment • Finalize • terminate environment • Communicator • create default communication group for all processes • Version • query the version of MPI

  17. MPI Environment • Total processes • query the total number of processes • Rank/Process ID • get the identifier assigned to each process • Timing Functions • MPI_Wtime, MPI_Wtick

  18. MPI_INIT • Initializes the MPI environment • Assigns all spawned processes to MPI_COMM_WORLD, the default communicator • C • int MPI_Init(argc, argv) • int *argc; • char ***argv; • Input Parameters • argc - Pointer to the number of arguments • argv - Pointer to the argument vector • Fortran • CALL MPI_INIT(error_code) • INTEGER error_code – variable that gets set to an error code

  19. MPI_FINALIZE • Terminates the MPI environment • C • int MPI_Finalize() • Fortran • CALL MPI_FINALIZE(error_code) • INTEGER error_code – variable that gets set to an error code

  20. MPI_ABORT • This routine makes a “best attempt” to abort all tasks in the group of comm. • Usually used in error handling. • C • int MPI_Abort(comm, errorcode) • MPI_Comm comm • int errorcode • Input Parameters • comm - communicator of tasks to abort • errorcode - error code to return to invoking environment • Fortran • CALL MPI_ABORT(COMM, ERRORCODE, IERROR) • INTEGER COMM, ERRORCODE, IERROR

  21. MPI_GET_VERSION • Get the version of currently used MPI • C • int MPI_Get_version(int *version, int *subversion) • Output Parameters • version – version of MPI • subversion – subversion of MPI • Fortran • CALL MPI_GET_VERSION(version, subversion, error_code) • INTEGER error_code – variable that gets set to an error code

  22. MPI_COMM_SIZE • This finds the number of processes in a communication group • C • int MPI_Comm_size (comm, size) • MPI_Comm comm – MPI communication group; • int *size; • Input Parameter • comm - communicator (handle) • Output Parameter • size - number of processes in the group of comm (integer) • Fortran • CALL MPI_COMM_SIZE(comm, size, error_code) • INTEGER error_code – variable that gets set to an error code • Using MPI_COMM_WORLD as comm will return the total number of processes started

  23. MPI_COMM_RANK • This gives the rank/identification number of a process in a communication group • C • int MPI_Comm_rank ( comm, rank ) • MPI_Comm comm; • int *rank; • Input Parameter • comm - communicator (handle) • Output Parameter • rank – rank/id number of the process that made the call (integer) • Fortran • CALL MPI_COMM_RANK(comm, rank, error_code) • INTEGER error_code – variable that gets set to an error code • Using MPI_COMM_WORLD as comm will return the rank of the process in relation to all processes that were started

  24. Timing Functions – MPI_WTIME • MPI_Wtime() - returns a floating point number of seconds, representing elapsed wall-clock time. • C • double MPI_Wtime(void) • Fortran • DOUBLE PRECISION MPI_WTIME() • The times returned are local to the node/process that made the call.

  25. Timing Functions – MPI_WTICK • MPI_Wtick() - returns a double precision number of seconds between successive clock ticks. • C • double MPI_Wtick(void) • Fortran • DOUBLE PRECISION MPI_WTICK() • The times returned are local to the node/process that made the call.

  26. Hello World 1 • Echo the MPI version • MPI Functions Used • MPI_Init • MPI_Get_version • MPI_Finalize

  27. Hello World 1 (C)

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char *argv[])
      {
          int version, subversion;

          MPI_Init(&argc, &argv);
          MPI_Get_version(&version, &subversion);
          printf("Hello world!\n");
          printf("Your MPI Version is: %d.%d\n", version, subversion);
          MPI_Finalize();
          return(0);
      }

  28. Hello World 1 (Fortran)

      program main
      include 'mpif.h'
      integer ierr, version, subversion
      call MPI_INIT(ierr)
      call MPI_GET_VERSION(version, subversion, ierr)
      print *, 'Hello world!'
      print *, 'Your MPI Version is: ', version, '.', subversion
      call MPI_FINALIZE(ierr)
      end

  29. Hello World 2 • Echo the process rank and the total number of processes in the group • MPI Functions Used • MPI_Init • MPI_Comm_rank • MPI_Comm_size • MPI_Finalize

  30. Hello World 2 (C)

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char *argv[])
      {
          int rank, size;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          printf("Hello world! I am %d of %d\n", rank, size);
          MPI_Finalize();
          return(0);
      }

  31. Hello World 2 (Fortran)

      program main
      include 'mpif.h'
      integer rank, size, ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      print *, 'Hello world! I am ', rank, ' of ', size
      call MPI_FINALIZE(ierr)
      end

  32. MPI C Datatypes

  33. MPI C Datatypes

  34. MPI Fortran Datatypes

  35. Parallelization example 1: serial-pi.c

      #include <stdio.h>

      static long num_steps = 10000000;
      double step;

      int main ()
      {
          int i;
          double x, pi, sum = 0.0;

          step = 1.0/(double) num_steps;
          for (i=0;i< num_steps; i++){
              x = (i+0.5)*step;
              sum = sum + 4.0/(1.0+x*x);
          }
          pi = step * sum;
          printf("Est Pi= %f\n",pi);
      }

  36. Parallelizing serial-pi.c into mpi-pi.c: Step 1: Adding MPI environment

      #include "mpi.h"
      #include <stdio.h>

      static long num_steps = 10000000;
      double step;

      int main (int argc, char *argv[])   /* MPI_Init needs argc and argv */
      {
          int i;
          double x, pi, sum = 0.0;

          MPI_Init(&argc,&argv);
          step = 1.0/(double) num_steps;
          for (i=0;i< num_steps; i++){
              x = (i+0.5)*step;
              sum = sum + 4.0/(1.0+x*x);
          }
          pi = step * sum;
          printf("Est Pi= %f\n",pi);
          MPI_Finalize();
          return 0;
      }

  37. Parallelizing serial-pi.c into mpi-pi.c: Step 2: Adding variables to print ranks

      #include "mpi.h"
      #include <stdio.h>

      static long num_steps = 10000000;
      double step;

      int main (int argc, char *argv[])
      {
          int i;
          double x, pi, sum = 0.0;
          int rank, size;

          MPI_Init(&argc,&argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          step = 1.0/(double) num_steps;
          /* Every process still computes the full sum at this stage */
          for (i=0;i< num_steps; i++){
              x = (i+0.5)*step;
              sum = sum + 4.0/(1.0+x*x);
          }
          pi = step * sum;
          printf("Est Pi= %f, Processor %d of %d \n",pi, rank, size);
          MPI_Finalize();
          return 0;
      }

  38. Parallelizing serial-pi.c into mpi-pi.c: Step 3: divide the workload

      #include "mpi.h"
      #include <stdio.h>

      static long num_steps = 10000000;
      double step;

      int main (int argc, char *argv[])
      {
          int i;
          double x, mypi, pi, sum = 0.0;
          int rank, size;

          MPI_Init(&argc,&argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          step = 1.0/(double) num_steps;
          /* Each process takes iterations rank, rank+size, rank+2*size, ... */
          for (i=rank;i< num_steps; i+=size){
              x = (i+0.5)*step;
              sum = sum + 4.0/(1.0+x*x);
          }
          mypi = step * sum;
          /* mypi is only this process's share of the estimate */
          printf("Est Pi= %f, Processor %d of %d \n",mypi, rank, size);
          MPI_Finalize();
          return 0;
      }

  39. Parallelizing serial-pi.c into mpi-pi.c: Step 4: collect partial results

      #include "mpi.h"
      #include <stdio.h>

      static long num_steps = 10000000;
      double step;

      int main (int argc, char *argv[])
      {
          int i;
          double x, mypi, pi, sum = 0.0;
          int rank, size;

          MPI_Init(&argc,&argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          step = 1.0/(double) num_steps;
          for (i=rank;i< num_steps; i+=size){
              x = (i+0.5)*step;
              sum = sum + 4.0/(1.0+x*x);
          }
          mypi = step * sum;
          /* Sum every process's mypi into pi on rank 0 */
          MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank==0) printf("Est Pi= %f, \n",pi);
          MPI_Finalize();
          return 0;
      }

  40. Compile and run mpi program

      $ mpicc -o mpi-pi mpi-pi.c
      $ mpirun -np 4 -machinefile machines mpi-pi

  41. Parallelization example 2: serial-mc-pi.c

      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>    /* sqrt */
      #include <time.h>

      int main(int argc, char *argv[])
      {
          long in,i,n;
          double x,y,q;
          time_t now;

          in = 0;
          srand(time(&now));
          printf("Input no of samples : ");
          scanf("%ld",&n);
          /* Count random points in the unit square that fall inside the quarter circle */
          for (i=0;i<n;i++) {
              x = rand()/(RAND_MAX+1.0);
              y = rand()/(RAND_MAX+1.0);
              if ((x*x + y*y) < 1) {
                  in++;
              }
          }
          q = ((double)4.0)*in/n;
          printf("pi = %.20lf\n",q);
          printf("rmse = %.20lf\n",sqrt(( (double) q*(4-q))/n));
          return 0;
      }

  42. Parallelization example 2: mpi-mc-pi.c

      #include "mpi.h"
      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>    /* sqrt */
      #include <time.h>

      int main(int argc, char *argv[])
      {
          long in,i,n;
          double x,y,q,Q;
          time_t now;
          int rank,size;

          MPI_Init(&argc, &argv);
          in = 0;
          MPI_Comm_size(MPI_COMM_WORLD,&size);
          MPI_Comm_rank(MPI_COMM_WORLD,&rank);
          /* Offset the seed by rank so processes do not draw identical samples */
          srand(time(&now)+rank);
          if (rank==0) {
              printf("Input no of samples : ");
              scanf("%ld",&n);
          }
          /* Rank 0 read n; send it to every process */
          MPI_Bcast(&n,1,MPI_LONG,0,MPI_COMM_WORLD);
          for (i=0;i<n;i++) {
              x = rand()/(RAND_MAX+1.0);
              y = rand()/(RAND_MAX+1.0);
              if ((x*x + y*y) < 1) {
                  in++;
              }
          }
          q = ((double)4.0)*in/n;
          /* Sum the per-process estimates on rank 0 */
          MPI_Reduce(&q,&Q,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
          if (rank==0) {
              Q = Q / size;   /* Q is only defined on rank 0 after the reduce */
              printf("pi = %.20lf\n",Q);
              printf("rmse = %.20lf\n",sqrt(( (double) Q*(4-Q))/n/size));
          }
          MPI_Finalize();
          return 0;
      }
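The arithmetic behind the final estimate and the printed rmse in mpi-mc-pi.c can be written out as follows. This is a sketch of the reasoning, not something stated on the slides: P stands for the number of processes (`size`), in_r for the hit count on rank r, and the samples on different ranks are assumed to be independent, which seeding with time plus rank does not strictly guarantee.

```latex
\begin{align*}
q_r &= \frac{4\,\mathrm{in}_r}{n}, \qquad r = 0, \dots, P-1 \\
Q &= \frac{1}{P}\sum_{r=0}^{P-1} q_r
   = \frac{4\sum_{r=0}^{P-1} \mathrm{in}_r}{nP} \\
\mathrm{rmse} &\approx \sqrt{\frac{Q\,(4 - Q)}{nP}}
\end{align*}
```

Because every rank draws the same number of samples n, averaging the per-rank estimates is the same as pooling all nP samples. Each sample is a hit with probability pi/4, so four times the hit fraction has variance pi(4 - pi)/(nP); the code substitutes Q for pi, which gives the expression under the square root.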

  43. Compile and run mpi-mc-pi

      $ mpicc -o mpi-mc-pi mpi-mc-pi.c -lm
      $ mpirun -np 4 -machinefile machines mpi-mc-pi

  44. The End
