Introduction to MPI programming Morris Law, SCID May 18/25, 2013
What is Message Passing Interface (MPI)? • A portable standard for communication between processes • Processes communicate by sending and receiving messages • Each process is a separate program • All data is private to each process
Multi-core programming • Most CPUs now have multiple cores that can be utilized easily by compiling with OpenMP support • Programmers no longer need to rewrite sequential code; they only add directives that instruct the compiler to parallelize it with OpenMP.
OpenMP example
/*
 * Sample program to test runtime of simple matrix multiply
 * with and without OpenMP on gcc-4.3.3-tdm1 (mingw)
 * compile with gcc -fopenmp
 * (c) 2009, Rajorshi Biswas
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int i, j, k;
    int n;
    double temp;
    double start, end, run;

    printf("Enter dimension ('N' for 'NxN' matrix) (100-2000): ");
    scanf("%d", &n);
    assert(n >= 100 && n <= 2000);

    int **arr1 = malloc(sizeof(int*) * n);
    int **arr2 = malloc(sizeof(int*) * n);
    int **arr3 = malloc(sizeof(int*) * n);
    for (i = 0; i < n; ++i) {
        arr1[i] = malloc(sizeof(int) * n);
        arr2[i] = malloc(sizeof(int) * n);
        arr3[i] = malloc(sizeof(int) * n);
    }

    printf("Populating array with random values...\n");
    srand(time(NULL));
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            arr1[i][j] = (rand() % n);
            arr2[i][j] = (rand() % n);
        }
    }
    printf("Completed array init.\n");

    printf("Crunching without OMP...");
    fflush(stdout);
    start = omp_get_wtime();
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            temp = 0;
            for (k = 0; k < n; ++k) {
                temp += arr1[i][k] * arr2[k][j];
            }
            arr3[i][j] = temp;
        }
    }
    end = omp_get_wtime();
    printf(" took %f seconds.\n", end - start);

    printf("Crunching with OMP...");
    fflush(stdout);
    start = omp_get_wtime();
    #pragma omp parallel for private(i, j, k, temp)
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            temp = 0;
            for (k = 0; k < n; ++k) {
                temp += arr1[i][k] * arr2[k][j];
            }
            arr3[i][j] = temp;
        }
    }
    end = omp_get_wtime();
    printf(" took %f seconds.\n", end - start);
    return 0;
}
Compiling for OpenMP support
• GCC
gcc -fopenmp -o foo foo.c
gfortran -fopenmp -o foo foo.f
• Intel Compiler
icc -openmp -o foo foo.c
ifort -openmp -o foo foo.f
• PGI Compiler
pgcc -mp -o foo foo.c
pgf90 -mp -o foo foo.f
What is Message Passing Interface (MPI)? • MPI is a library, not a language! • Different compilers may be used, but all processes must link against the same MPI implementation, e.g. MPICH, LAM, OpenMPI, etc. • Two major versions exist at present: MPI-1 and MPI-2 • Programs are written in a standard sequential language: Fortran, C, C++, etc.
Basic Idea of Message Passing Interface (MPI) • MPI Environment • Initialize, manage, and terminate communication among processes • Communication between processes • Point to point communication, i.e. send, receive, etc. • Collective communication, i.e. broadcast, gather, etc. • Complicated data structures • Communicate the data effectively • e.g. matrices and memory
Is MPI Large or Small? • MPI is large • More than one hundred functions • But not necessarily a measure of complexity • MPI is small • Many parallel programs can be written with just 6 basic functions • MPI is just right • One can access flexibility when it is required • One need not master all MPI functions
When to Use MPI? • You need a portable parallel program • You are writing a parallel library • You care about performance • You have a problem that can be solved in parallel
F77/F90, C/C++ MPI library calls • Fortran 77/90 uses subroutines • CALL is used to invoke the library call • Nothing is returned, the error code variable is the last argument • All variables are passed by reference • C/C++ uses functions • Just the name is used to invoke the library call • The function returns an integer value (an error code) • Variables are passed by value, unless otherwise specified
Types of Communication • Point to Point Communication • communication involving only two processes. • Collective Communication • communication that involves a group of processes.
Getting started with MPI • Create a file called “machines” • The content of “machines” (8 nodes): compute-0-0 compute-0-1 compute-0-2 … compute-0-7
MPI Commands • mpicc - compiles an MPI program mpicc -o foo foo.c mpif77 -o foo foo.f mpif90 -o foo foo.f90 • mpirun - starts the execution of MPI programs mpirun -v -np 2 -machinefile machines foo
MPI Environment • Initialize • initialize environment • Finalize • terminate environment • Communicator • create default communication group for all processes • Version • establish version of MPI
MPI Environment • Total processes • spawn total processes • Rank/Process ID • assign identifier to each process • Timing Functions • MPI_Wtime, MPI_Wtick
MPI_INIT • Initializes the MPI environment • Assigns all spawned processes to MPI_COMM_WORLD, the default communicator • C • int MPI_Init(int *argc, char ***argv) • Input Parameters • argc - pointer to the number of arguments • argv - pointer to the argument vector • Fortran • CALL MPI_INIT(error_code) • INTEGER error_code - variable that gets set to an error code
MPI_FINALIZE • Terminates the MPI environment • C • int MPI_Finalize() • Fortran • CALL MPI_FINALIZE(error_code) • INTEGER error_code - variable that gets set to an error code
MPI_ABORT • This routine makes a “best attempt” to abort all tasks in the group of comm. • Usually used in error handling. • C • int MPI_Abort(comm, errorcode) • MPI_Comm comm • int errorcode • Input Parameters • comm - communicator of tasks to abort • errorcode - error code to return to invoking environment • Fortran • CALL MPI_ABORT(COMM, ERRORCODE, IERROR) • INTEGER COMM, ERRORCODE, IERROR
MPI_GET_VERSION • Gets the version of the MPI implementation in use • C • int MPI_Get_version(int *version, int *subversion) • Output Parameters • version - major version of MPI • subversion - minor version (subversion) of MPI • Fortran • CALL MPI_GET_VERSION(version, subversion, error_code) • INTEGER error_code - variable that gets set to an error code
MPI_COMM_SIZE • This finds the number of processes in a communication group • C • int MPI_Comm_size(comm, size) • MPI_Comm comm - MPI communication group; • int *size; • Input Parameter • comm - communicator (handle) • Output Parameter • size - number of processes in the group of comm (integer) • Fortran • CALL MPI_COMM_SIZE(comm, size, error_code) • INTEGER error_code - variable that gets set to an error code • Using MPI_COMM_WORLD as comm will return the total number of processes started
MPI_COMM_RANK • This gives the rank/identification number of a process in a communication group • C • int MPI_Comm_rank(comm, rank) • MPI_Comm comm; • int *rank; • Input Parameter • comm - communicator (handle) • Output Parameter • rank - rank/id number of the process that made the call (integer) • Fortran • CALL MPI_COMM_RANK(comm, rank, error_code) • INTEGER error_code - variable that gets set to an error code • Using MPI_COMM_WORLD as comm will return the rank of the process in relation to all processes that were started
Timing Functions – MPI_WTIME • MPI_Wtime() - returns a floating point number of seconds, representing elapsed wall-clock time. • C • double MPI_Wtime(void) • Fortran • DOUBLE PRECISION MPI_WTIME() • The times returned are local to the node/process that made the call.
Timing Functions – MPI_WTICK • MPI_Wtick() - returns a double precision number of seconds between successive clock ticks. • C • double MPI_Wtick(void) • Fortran • DOUBLE PRECISION MPI_WTICK() • The times returned are local to the node/process that made the call.
Hello World 1 • Echo the MPI version • MPI Functions Used • MPI_Init • MPI_Get_version • MPI_Finalize
Hello World 1 (C)
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int version, subversion;

    MPI_Init(&argc, &argv);
    MPI_Get_version(&version, &subversion);
    printf("Hello world!\n");
    printf("Your MPI Version is: %d.%d\n", version, subversion);
    MPI_Finalize();
    return 0;
}
Hello World 1 (Fortran)
      program main
      include 'mpif.h'
      integer ierr, version, subversion

      call MPI_INIT(ierr)
      call MPI_GET_VERSION(version, subversion, ierr)
      print *, 'Hello world!'
      print *, 'Your MPI Version is: ', version, '.', subversion
      call MPI_FINALIZE(ierr)
      end
Hello World 2 • Echo the process rank and the total number of process in the group • MPI Functions Used • MPI_Init • MPI_Comm_rank • MPI_Comm_size • MPI_Finalize
Hello World 2 (C)
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world! I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Hello World 2 (Fortran)
      program main
      include 'mpif.h'
      integer rank, size, ierr

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      print *, 'Hello world! I am ', rank, ' of ', size
      call MPI_FINALIZE(ierr)
      end
Parallelization example 1: serial-pi.c
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main()
{
    int i;
    double x, pi, sum = 0.0;

    step = 1.0/(double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n", pi);
    return 0;
}
Parallelizing serial-pi.c into mpi-pi.c - Step 1: Adding the MPI environment
#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, pi, sum = 0.0;

    MPI_Init(&argc, &argv);
    step = 1.0/(double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n", pi);
    MPI_Finalize();
    return 0;
}
Parallelizing serial-pi.c into mpi-pi.c - Step 2: Adding variables to print ranks
#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, pi, sum = 0.0;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f, Processor %d of %d\n", pi, rank, size);
    MPI_Finalize();
    return 0;
}
Parallelizing serial-pi.c into mpi-pi.c - Step 3: Dividing the workload
#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, mypi, pi, sum = 0.0;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i = rank; i < num_steps; i += size) {
        x = (i + 0.5) * step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    mypi = step * sum;
    printf("Est Pi= %f, Processor %d of %d\n", mypi, rank, size);
    MPI_Finalize();
    return 0;
}
Parallelizing serial-pi.c into mpi-pi.c - Step 4: Collecting partial results
#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, mypi, pi, sum = 0.0;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i = rank; i < num_steps; i += size) {
        x = (i + 0.5) * step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    mypi = step * sum;
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Est Pi= %f\n", pi);
    MPI_Finalize();
    return 0;
}
Compile and run the MPI program
$ mpicc -o mpi-pi mpi-pi.c
$ mpirun -np 4 -machinefile machines mpi-pi
Parallelization example 2: serial-mc-pi.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

int main(int argc, char *argv[])
{
    long in, i, n;
    double x, y, q;
    time_t now;

    in = 0;
    srand(time(&now));
    printf("Input no of samples : ");
    scanf("%ld", &n);
    for (i = 0; i < n; i++) {
        x = rand()/(RAND_MAX + 1.0);
        y = rand()/(RAND_MAX + 1.0);
        if ((x*x + y*y) < 1) {
            in++;
        }
    }
    q = ((double) 4.0) * in / n;
    printf("pi = %.20lf\n", q);
    printf("rmse = %.20lf\n", sqrt(((double) q*(4-q))/n));
    return 0;
}
Parallelization example 2: mpi-mc-pi.c
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

int main(int argc, char *argv[])
{
    long in, i, n;
    double x, y, q, Q;
    time_t now;
    int rank, size;

    MPI_Init(&argc, &argv);
    in = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    srand(time(&now) + rank);
    if (rank == 0) {
        printf("Input no of samples : ");
        scanf("%ld", &n);
    }
    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    for (i = 0; i < n; i++) {
        x = rand()/(RAND_MAX + 1.0);
        y = rand()/(RAND_MAX + 1.0);
        if ((x*x + y*y) < 1) {
            in++;
        }
    }
    q = ((double) 4.0) * in / n;
    MPI_Reduce(&q, &Q, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    Q = Q / size;
    if (rank == 0) {
        printf("pi = %.20lf\n", Q);
        printf("rmse = %.20lf\n", sqrt(((double) Q*(4-Q))/n/size));
    }
    MPI_Finalize();
    return 0;
}
Compile and run mpi-mc-pi
$ mpicc -o mpi-mc-pi mpi-mc-pi.c
$ mpirun -np 4 -machinefile machines mpi-mc-pi