
CS 484

This article explores the basics of Message Passing Interface (MPI) in multi-processor systems, covering concepts such as point-to-point and collective communication operations, communicators, and the use of MPI functions. C code examples are provided.

marroquin


Presentation Transcript


  1. CS 484

  2. Message Passing • Based on multi-processor • Set of independent processors • Connected via some communication net • All communication between processes is done via a message sent from one to the other

  3. MPI • Message Passing Interface • Computation is made of: • One or more processes • Communicate by calling library routines • MIMD programming model • SPMD most common.

  4. MPI • Processes use point-to-point communication operations • Collective communication operations are also available. • Communication can be modularized by the use of communicators. • MPI_COMM_WORLD is the base. • Used to identify subsets of processors

  5. MPI • Complex, but most problems can be solved using the 6 basic functions. • MPI_Init • MPI_Finalize • MPI_Comm_size • MPI_Comm_rank • MPI_Send • MPI_Recv

  6. MPI Basics • Almost all calls require a communicator handle as an argument. • MPI_COMM_WORLD • MPI_Init and MPI_Finalize • don’t require a communicator handle • used to begin and end an MPI program • MUST be called to begin and end

  7. MPI Basics • MPI_Comm_size • determines the number of processors in the communicator group • MPI_Comm_rank • determines the integer identifier assigned to the current process • zero based

  8. MPI Basics

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char *argv[])
      {
          int iproc, nproc;

          MPI_Init(&argc, &argv);                  /* start MPI */
          MPI_Comm_size(MPI_COMM_WORLD, &nproc);   /* number of processes */
          MPI_Comm_rank(MPI_COMM_WORLD, &iproc);   /* this process's rank */
          printf("I am processor %d of %d\n", iproc, nproc);
          MPI_Finalize();                          /* shut MPI down */
          return 0;
      }

  9. MPI Communication • MPI_Send • Sends an array of a given type • Requires a destination node, size, and type • MPI_Recv • Receives an array of a given type • Same requirements as MPI_Send • Extra parameter • MPI_Status variable.

  10. MPI Basics • Made for both FORTRAN and C • Standards for C • MPI_ prefix to all calls • First letter of function name is capitalized • Returns MPI_SUCCESS or error code • MPI_Status structure • MPI data types for each C type • OUT parameters passed using & operator

  11. Using MPI • Based on rsh or ssh • requires a .rhosts file or ssh key setup • hostname login • Path to compiler (CS open labs) • MPI_HOME /users/faculty/snell/mpich • MPI_CC MPI_HOME/bin/mpicc • Marylou5 • Use mpicc • mpicc hello.c -o hello

  12. Using MPI • Write program • Compile using mpicc • Write process file (linux cluster) • host nprocs full_path_to_prog • 0 for nprocs on first line, 1 for all others • Run program (linux cluster) • prog -p4pg process_file args • mpirun -np #procs -machinefile machines prog • Run program (scheduled on marylou5 using pbs) • mpirun -np #procs -machinefile $PBS_NODEFILE prog • mpiexec prog

  13.

      #include "mpi.h"
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #define MAXSIZE 1000

      int main(int argc, char *argv[])
      {
          int myid, numprocs;
          int data[MAXSIZE], i, x, low, high, myresult = 0, result;
          char fn[255];
          FILE *fp;

          MPI_Init(&argc, &argv);
          MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
          MPI_Comm_rank(MPI_COMM_WORLD, &myid);
          if (myid == 0) {
              /* Open input file and initialize data */
              strcpy(fn, getenv("HOME"));
              strcat(fn, "/MPI/rand_data.txt");
              if ((fp = fopen(fn, "r")) == NULL) {
                  printf("Can't open the input file: %s\n\n", fn);
                  /* Abort every process; the others are waiting in MPI_Bcast */
                  MPI_Abort(MPI_COMM_WORLD, 1);
              }
              for (i = 0; i < MAXSIZE; i++)
                  fscanf(fp, "%d", &data[i]);
              fclose(fp);
          }
          /* Broadcast data from process 0 to everyone */
          MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
          /* Add my portion of data */
          x = MAXSIZE / numprocs;
          low = myid * x;
          high = low + x;
          for (i = low; i < high; i++)
              myresult += data[i];
          printf("I got %d from %d\n", myresult, myid);
          /* Compute global sum on process 0 */
          MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
          if (myid == 0)
              printf("The sum is %d.\n", result);
          MPI_Finalize();
          return 0;
      }
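
The summation example on slide 13 gives every process the same number of elements by integer division, so any leftover elements are skipped when the array size is not a multiple of the process count. The sketch below shows one common way to spread the remainder over the first few ranks. The helper names `block_low` and `block_high` are hypothetical and are not part of MPI or of the original slides.

```c
/* Hypothetical helper (not an MPI call): first index owned by `rank`
 * when n items are split over nprocs processes as evenly as possible. */
int block_low(int rank, int nprocs, int n)
{
    int base = n / nprocs;    /* every rank gets at least this many items */
    int extra = n % nprocs;   /* the first `extra` ranks get one more */
    return rank * base + (rank < extra ? rank : extra);
}

/* One past the last index owned by `rank`. */
int block_high(int rank, int nprocs, int n)
{
    return block_low(rank + 1, nprocs, n);
}
```

In the summation loop, `low = block_low(myid, numprocs, MAXSIZE)` and `high = block_high(myid, numprocs, MAXSIZE)` would replace the fixed-size chunk, assuming the rest of the program stays as it is.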

  14. MPI • Message Passing programs are non-deterministic because of concurrency • Consider 2 processes sending messages to third • MPI only guarantees that 2 messages sent from a single process to another will arrive in order. • It is the programmer's responsibility to ensure computation determinism

  15. MPI & Determinism • MPI • A Process may specify the source of the message • A Process may specify the type of message • Non-Determinism • MPI_ANY_SOURCE or MPI_ANY_TAG

  16. Example

      for (n = 0; n < nproc/2; n++) {
          MPI_Send(buff, BSIZE, MPI_FLOAT, rnbor, 1, MPI_COMM_WORLD);
          MPI_Recv(buff, BSIZE, MPI_FLOAT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status);
          /* Process the data */
      }

  17. Global Operations • Coordinated communication involving multiple processes. • Can be implemented by the programmer using sends and receives • For convenience, MPI provides a suite of collective communication functions. • All participating processes must call the same function.

  18. Collective Communication • Barrier • Synchronize all processes • Broadcast • Gather • Gather data from all processes to one process • Scatter • Reduction • Global sums, products, etc.

  19. Collective Communication

  20. • Distribute Problem Size • Distribute Input data • Exchange Boundary values • Find Max Error • Collect Results

  21. MPI_Reduce MPI_Reduce(inbuf, outbuf, count, type, op, root, comm)

  22. MPI_Reduce

  23. MPI_Allreduce MPI_Allreduce(inbuf, outbuf, count, type, op, comm)

  24. MPI Collective Routines • Several routines: MPI_ALLGATHER, MPI_ALLGATHERV, MPI_BCAST, MPI_ALLTOALL, MPI_ALLTOALLV, MPI_REDUCE, MPI_GATHER, MPI_GATHERV, MPI_SCATTER, MPI_REDUCE_SCATTER, MPI_SCAN, MPI_SCATTERV, MPI_ALLREDUCE • “All” versions deliver results to all participating processes • “V” versions allow the chunks to have different sizes • MPI_ALLREDUCE, MPI_REDUCE, MPI_REDUCE_SCATTER, and MPI_SCAN take both built-in and user-defined combination functions

  25. Built-In Collective Computation Operations

  26. Example: PI in C - 1

      #include "mpi.h"
      #include <stdio.h>
      #include <math.h>

      int main(int argc, char *argv[])
      {
          int done = 0, n, myid, numprocs, i, rc;
          double PI25DT = 3.141592653589793238462643;
          double mypi, pi, h, sum, x, a;

          MPI_Init(&argc, &argv);
          MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
          MPI_Comm_rank(MPI_COMM_WORLD, &myid);
          while (!done) {
              if (myid == 0) {
                  printf("Enter the number of intervals: (0 quits) ");
                  scanf("%d", &n);
              }
              /* Everyone learns n from process 0 */
              MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
              if (n == 0) break;

  27. Example: PI in C - 2

              h = 1.0 / (double) n;
              sum = 0.0;
              /* Each process takes every numprocs-th interval */
              for (i = myid + 1; i <= n; i += numprocs) {
                  x = h * ((double)i - 0.5);
                  sum += 4.0 / (1.0 + x*x);
              }
              mypi = h * sum;
              /* Add the partial results together on process 0 */
              MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
              if (myid == 0)
                  printf("pi is approximately %.16f, Error is %.16f\n",
                         pi, fabs(pi - PI25DT));
          }
          MPI_Finalize();
          return 0;
      }
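
To see how the PI example divides its work without needing an MPI installation, the per-process loop can be pulled into a plain C function and the reduction imitated with a serial loop. This is only a sketch: `partial_pi` and `combined_pi` are hypothetical names, and `combined_pi` stands in for what MPI_Reduce with MPI_SUM would deliver to the root.

```c
/* Partial midpoint-rule sum for the integral of 4/(1+x^2) over [0,1].
 * Process `myid` handles intervals myid+1, myid+1+numprocs, ... up to n,
 * the same cyclic distribution the PI example uses. */
double partial_pi(int myid, int numprocs, int n)
{
    double h = 1.0 / (double)n;
    double sum = 0.0;
    for (int i = myid + 1; i <= n; i += numprocs) {
        double x = h * ((double)i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for MPI_Reduce with MPI_SUM: adds every rank's share. */
double combined_pi(int numprocs, int n)
{
    double pi = 0.0;
    for (int rank = 0; rank < numprocs; rank++)
        pi += partial_pi(rank, numprocs, n);
    return pi;
}
```

Because every interval is assigned to exactly one rank, the combined result should match the single-process result up to floating-point rounding, whatever the process count.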

  28. Some other things

  29. MPI Datatypes • Data in messages are described by: • Address, Count, Datatype • MPI predefines many datatypes • MPI_INT, MPI_FLOAT, MPI_DOUBLE, etc. • There is an analog for each primitive type • Can also construct custom data types for structured data

  30. MPI_Recv • Blocks until message is received • Message is matched based on source & tag • The MPI_Status argument gets filled with information about the message • Source & Tag • Receiving fewer elements than specified is OK • Receiving more elements is an error • Use MPI_Get_count to get number of elements received

  31. MPI_Recv

      int recvd_tag, recvd_from, recvd_count;
      MPI_Status status;

      MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
      recvd_tag = status.MPI_TAG;
      recvd_from = status.MPI_SOURCE;
      MPI_Get_count(&status, datatype, &recvd_count);

  32. Non-blocking communication • MPI_Send and MPI_Recv are blocking • MPI_Send does not complete until the buffer is available to be modified • MPI_Recv does not complete until the buffer is filled • Blocking communication can lead to deadlocks

      for (int p = 0; p < nproc; p++) {
          MPI_Send(… p ….)
          MPI_Recv(… p ….)
      }

  33. Non-blocking communication • MPI_Isend & MPI_Irecv return immediately (non-blocking)

      MPI_Request request;
      MPI_Status status;

      MPI_Isend(start, count, datatype, dest, tag, comm, &request);
      MPI_Irecv(start, count, datatype, src, tag, comm, &request);
      MPI_Wait(&request, &status);

  • Used to overlap communication with computation • Anywhere you use MPI_Send or MPI_Recv, you can use the pair of MPI_Isend/MPI_Wait or MPI_Irecv/MPI_Wait • Also can use MPI_Waitall, MPI_Waitany, MPI_Waitsome • Can also check to see if you have any messages without actually receiving them: MPI_Probe & MPI_Iprobe • MPI_Probe blocks until there is a message; MPI_Iprobe sets a flag

  34. Communicators • All MPI communication is based on a communicator which contains a context and a group • Contexts define a safe communication space for message-passing • Contexts can be viewed as system-managed tags • Contexts allow different libraries to co-exist • The group is just a set of processes • Processes are always referred to by unique rank in group

  35. Uses of MPI_COMM_WORLD • Contains all processes available at the time the program was started • Provides initial safe communication space • Simple programs communicate with MPI_COMM_WORLD • Even complex programs will use MPI_COMM_WORLD for most communications • Complex programs duplicate and subdivide copies of MPI_COMM_WORLD • Provides a global communicator for forming smaller groups or subsets of processors for specific tasks [Figure: MPI_COMM_WORLD containing processes 0 through 7]

  36. Subdividing a Communicator with MPI_COMM_SPLIT • MPI_COMM_SPLIT partitions the group associated with the given communicator into disjoint subgroups • Each subgroup contains all processes having the same value for the argument color • Within each subgroup, processes are ranked in the order defined by the value of the argument key, with ties broken according to their rank in old communicator • C: int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm) • Fortran: MPI_COMM_SPLIT(COMM, COLOR, KEY, NEWCOMM, IERR) with INTEGER COMM, COLOR, KEY, NEWCOMM, IERR

  37. Subdividing a Communicator • To divide a communicator into two non-overlapping groups • color = (rank < size/2) ? 0 : 1 ; • MPI_Comm_split(comm, color, 0, &newcomm) ; [Figure: comm with ranks 0 through 7 split into two newcomm groups, each with ranks 0 through 3]

  38. Subdividing a Communicator • To divide a communicator such that • all processes with even ranks are in one group • all processes with odd ranks are in the other group • maintain the reverse order by rank • color = (rank % 2 == 0) ? 0 : 1 ; • key = size - rank ; • MPI_Comm_split(comm, color, key, &newcomm) ; [Figure: comm with ranks 0 through 7 split into an even-rank newcomm and an odd-rank newcomm, each with new ranks 0 through 3 assigned in reverse order of the old rank]
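
The ranking rule used by MPI_Comm_split (same color means same subgroup, ordered by key, ties broken by old rank) can be checked on paper or with a few lines of plain C. The function below is a serial simulation of that rule for a single process; it makes no MPI calls, and the name `split_rank` is hypothetical.

```c
/* Simulates the ranking rule of MPI_Comm_split for process `me`:
 * its new rank is the number of processes with the same color that sort
 * ahead of it by (key, old rank). color[] and key[] hold the values each
 * old rank would pass to MPI_Comm_split. */
int split_rank(int me, int size, const int color[], const int key[])
{
    int new_rank = 0;
    for (int p = 0; p < size; p++) {
        if (color[p] != color[me])
            continue;                    /* p lands in a different subgroup */
        if (key[p] < key[me] || (key[p] == key[me] && p < me))
            new_rank++;                  /* p is ordered ahead of me */
    }
    return new_rank;
}
```

With 8 processes, color = rank % 2 and key = size - rank, old rank 6 has the smallest key among the even ranks and so becomes rank 0 of its new communicator, while old rank 0 becomes rank 3.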

  39.

      program main
      include 'mpif.h'
      integer ierr, row_comm, col_comm
      integer myrank, size, P, Q, myrow, mycol
      P = 4
      Q = 3
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      ! Determine row and column position
      myrow = myrank/Q
      mycol = mod(myrank,Q)
      ! Split comm into row and column comms
      call MPI_Comm_split(MPI_COMM_WORLD, myrow, mycol, row_comm, ierr)
      call MPI_Comm_split(MPI_COMM_WORLD, mycol, myrow, col_comm, ierr)
      print*, "My coordinates are[",myrank,"] ",myrow, mycol
      call MPI_Finalize(ierr)
      stop
      end

  40. [Figure: the 12 processes of MPI_COMM_WORLD (ranks 0 through 11) arranged on a 4 x 3 grid with coordinates (0,0) through (3,2), grouped into row_comm and col_comm]
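
The coordinate arithmetic from the Fortran program on slide 39 is easy to verify separately. The sketch below restates it in C for a grid with Q columns; `GridCoord`, `grid_coord`, and `grid_rank` are hypothetical names introduced only for this illustration.

```c
/* Row-major placement of a rank on a grid with Q columns, matching
 * myrow = myrank/Q and mycol = mod(myrank,Q) in the Fortran program. */
typedef struct {
    int row;
    int col;
} GridCoord;

GridCoord grid_coord(int rank, int Q)
{
    GridCoord c = { rank / Q, rank % Q };
    return c;
}

/* Inverse mapping: the rank that sits at (row, col). */
int grid_rank(int row, int col, int Q)
{
    return row * Q + col;
}
```

With Q = 3, rank 7 sits at (2,1). Because the program passes mycol as the key when building row_comm and myrow as the key when building col_comm, a process's rank inside row_comm equals its column and its rank inside col_comm equals its row.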

  41. Debugging

  42. An ounce of prevention… • Defensive programming • Check function return codes • Verify send and receive sizes • Incremental programming • Modular programming • Test modules – keep test code in place • Identify all shared data and think carefully about how it is accessed • Correctness first – then speed

  43. Debugging • Characterize the bug • Run code serially • Run in parallel on one core (2-4 processes) • Run in parallel (2-4 processes on 2-4 cores) • Play around with inputs and other data and data sizes • Find smallest data size that exposes the bug • Remove as much non-determinism as you can • Print statements – use stderr (non buffered) • Before and after communication or shared variable access • Print all information – source, sizes, data, tag, etc. • Identify process number – first thing in print (helps sorting) • Leave the prints in your code - #ifdef

  44. Debugging • Learn about C constructs __FILE__, __LINE__, and __FUNCTION__ • Make one logical change at a time and then test • Learn how to attach debuggers • You will probably need some sort of stall code, i.e., wait for input on the master and then do a barrier, while all other processes just do the barrier
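
Several of the debugging tips (print to stderr, put the process number first, include file and line, keep the prints behind #ifdef) fit into one small macro. The sketch below is one possible arrangement, not a standard facility: `debug_prefix`, `DEBUG_PRINT`, and the `DEBUG` symbol are hypothetical names. It uses `__func__`, the standard C99 spelling; `__FUNCTION__` is a compiler extension with the same meaning on many compilers.

```c
#include <stdio.h>

/* Hypothetical helper: writes "[rank] file:line (func): " into buf.
 * Zero-padding the rank keeps sorted output grouped by process. */
int debug_prefix(char *buf, size_t len, int rank,
                 const char *file, int line, const char *func)
{
    return snprintf(buf, len, "[%03d] %s:%d (%s): ", rank, file, line, func);
}

#ifdef DEBUG
/* Prints the prefix and a printf-style message to stderr. */
#define DEBUG_PRINT(rank, ...)                                      \
    do {                                                            \
        char prefix_[256];                                          \
        debug_prefix(prefix_, sizeof prefix_, (rank),               \
                     __FILE__, __LINE__, __func__);                 \
        fprintf(stderr, "%s", prefix_);                             \
        fprintf(stderr, __VA_ARGS__);                               \
        fputc('\n', stderr);                                        \
    } while (0)
#else
/* Compiled out unless DEBUG is defined, so the calls can stay in place. */
#define DEBUG_PRINT(rank, ...) do { } while (0)
#endif
```

A call such as `DEBUG_PRINT(myid, "sending %d items to %d", count, dest);` placed before and after each communication call costs nothing in a normal build and can be switched on by compiling with `-DDEBUG`.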

  45. Common problems • Not all processes make the collective call • Be very careful about putting collective calls inside conditionals • Be sure the communicator is correct • Deadlock (everybody on recv) • Use non-blocking calls • Use MPI_Sendrecv • Process waiting for data that is never sent • Use collective calls where you can • Use simple communication patterns

  46. Best Advice • Program incrementally and modularly • Characterize the bug and leave yourself time to walk away from it and think about it • Never underestimate the value of a second set of eyes • Sometimes just explaining your code to someone else helps you help yourself
