
An Introduction to Parallel Programming with MPI


Presentation Transcript


  1. An Introduction to Parallel Programming with MPI March 22, 24, 29, 31 2005 David Adams – daadams3@vt.edu http://research.cs.vt.edu/lasca/schedule

  2. MPI and Classical References • MPI • M. Snir, W. Gropp, MPI: The Complete Reference (2-volume set), MIT Press, MA, (1998). • Parallel Computing • D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation, Prentice-Hall, Englewood Cliffs, NJ, (1989). • M. J. Quinn, Designing Efficient Algorithms for Parallel Computers, McGraw-Hill, NY, (1987).

  3. Outline • Disclaimers • Overview of basic parallel programming on a cluster with the goals of MPI • Batch system interaction • Startup procedures • Quick review • Blocking message passing • Non-blocking message passing • Collective communications

  4. Review • Messages are the only way processors can pass information. • MPI hides the low-level details of message transport, leaving the user to specify only the message logic. • Parallel algorithms are built by identifying the concurrency opportunities in the problem itself, not in the serial algorithm. • Communication is slow. • Partitioning and pipelining are two primary methods for exploiting concurrency. • To make good use of the hardware we want to balance the computational load across all processors and maintain a compute-bound process rather than a communication-bound process.

  5. More Review • MPI messages specify a starting point, a length, and data type information. • MPI messages are read from contiguous memory. • These functions will generally appear in all MPI programs: • MPI_INIT MPI_FINALIZE • MPI_COMM_SIZE MPI_COMM_RANK • MPI_COMM_WORLD is the global communicator available at the start of all MPI runs.

  6. Hello World (Fortran 90)

  PROGRAM Hello_World
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: ierr_p, rank_p, size_p
  INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status_p
  CALL MPI_INIT(ierr_p)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank_p, ierr_p)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size_p, ierr_p)
  IF (rank_p==0) THEN
     WRITE(*,*) 'Hello world! I am process 0 and I am special!'
  ELSE
     WRITE(*,*) 'Hello world! I am process', rank_p
  END IF
  CALL MPI_FINALIZE(ierr_p)
  END PROGRAM Hello_World

  7. Hello World (C; case sensitive)

  #include <stdio.h>
  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int rank_p, size_p;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank_p);
      MPI_Comm_size(MPI_COMM_WORLD, &size_p);
      if (rank_p == 0) {
          printf("%d: Hello World! I am special!\n", rank_p);
      }
      else {
          printf("%d: Hello World!\n", rank_p);
      }
      MPI_Finalize();
      return 0;
  }

  8. MPI Messages • Messages are non-overtaking. • All MPI messages are completed in two parts: • Send • Can be blocking or non-blocking. • Identifies the destination, data type and length, and a message type identifier (tag). • Identifies to MPI a space in memory specifically reserved for the sending of this message. • Receive • Can be blocking or non-blocking. • Identifies the source, data type and length, and a message type identifier (tag). • Identifies to MPI a space in memory specifically reserved for the completion of this message.

  9. Message Semantics (Modes) • Standard • The completion of the send does not necessarily mean that the matching receive has started, and no assumption should be made in the application program about whether the outgoing data is buffered. • All buffering is made at the discretion of your MPI implementation. • Completion of an operation simply means that the message buffer space can now be modified safely again. • Buffered • Synchronous • Ready

  10. Message Semantics (Modes) • Standard • Buffered (not recommended) • The user can guarantee that a certain amount of buffer space is available. • The catch is that the space must be explicitly provided by the application program. • Making sure the buffer space does not become full is completely the user's responsibility. • Synchronous • Ready
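
A minimal C sketch of buffered mode, assuming an already-initialized MPI environment (the helper name bsend_example and the one-message buffer sizing are illustrative; each pending message needs MPI_BSEND_OVERHEAD extra bytes, and error checking is omitted):

  #include <stdlib.h>
  #include <mpi.h>

  void bsend_example(int dest)
  {
      int payload[100] = {0};
      /* Room for one buffered message plus MPI's bookkeeping. */
      int bufsize = (int)sizeof(payload) + MPI_BSEND_OVERHEAD;
      void *buf = malloc(bufsize);
      MPI_Buffer_attach(buf, bufsize);    /* the user supplies the space   */
      MPI_Bsend(payload, 100, MPI_INT, dest, 0, MPI_COMM_WORLD);
      MPI_Buffer_detach(&buf, &bufsize);  /* blocks until buffered sends drain */
      free(buf);
  }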

  11. Message Semantics (Modes) • Standard • Buffered (not recommended) • Synchronous • A rendezvous semantic between sender and receiver is used. • Completion of a send signals that the receive has at least started. • Ready

  12. Message Semantics (Modes) • Standard • Buffered (not recommended) • Synchronous • Ready (not recommended) • Allows the user to exploit extra knowledge to simplify the protocol and potentially achieve higher performance. • In a ready-mode send, the user asserts that the matching receive has already been posted.

  13. Blocking Message Passing (SEND) • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • Performs a standard-mode, blocking send. • Blocking means that the code cannot continue until the send has completed. • Completion of the send means that the data has been buffered, either locally or non-locally, and that the message buffer is now free to modify. • Completion implies nothing about the matching receive.
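
As a concrete illustration, a standard-mode blocking send in C might look like the following (values are illustrative; note that in C the error code is the function's return value rather than an IERROR argument):

  double work[64];              /* BUF: contiguous data to send     */
  MPI_Send(work,
           64,                  /* COUNT: entries of type DATATYPE  */
           MPI_DOUBLE,          /* DATATYPE                         */
           1,                   /* DEST: destination rank in COMM   */
           42,                  /* TAG: message identifier          */
           MPI_COMM_WORLD);     /* COMM                             */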

  14. Buffer • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • BUF is an array. It can be an array of one object, but it must be an array. • The definition • INTEGER :: X • DOES NOT EQUAL • INTEGER :: X(1)

  15. Buffer • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • BUF is the parameter from which MPI determines the starting point of the memory space belonging to this message. • Recall that this memory space must be contiguous. Pointer arrays in Fortran 90 are not necessarily contiguous, and array sections are certainly not contiguous in general.

  16. Buffer • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • Until the send completes, the contents of BUF must be treated as off-limits. • Any attempt to change the data in BUF before the send completes is an erroneous operation (though nothing prevents you from trying). • Once a send operation begins, it is the user's job to see that no modifications to BUF are made. • Completion of the send ensures the user that it is safe to modify the contents of BUF again.

  17. DATATYPE • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • DATATYPE is an MPI-specific data type corresponding to the type of data stored in BUF. • An array of integers would be sent using the MPI_INTEGER data type. • An array of logical variables would be sent using the MPI_LOGICAL data type. • etc….

  18. MPI Types in Fortran 77 • MPI_INTEGER – INTEGER • MPI_REAL – REAL • MPI_DOUBLE_PRECISION – DOUBLE PRECISION • MPI_COMPLEX – COMPLEX • MPI_LOGICAL – LOGICAL • MPI_CHARACTER – CHARACTER(1) • MPI_BYTE • MPI_PACKED

  19. MPI types in C • MPI_CHAR – signed char • MPI_SHORT – signed short int • MPI_INT – signed int • MPI_LONG – signed long int • MPI_UNSIGNED_CHAR – unsigned char • MPI_UNSIGNED_SHORT – unsigned short int • MPI_UNSIGNED – unsigned int • MPI_UNSIGNED_LONG – unsigned long int • MPI_FLOAT – float • MPI_DOUBLE – double • MPI_LONG_DOUBLE – long double • MPI_BYTE • MPI_PACKED

  20. COUNT • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • COUNT specifies the number of entries of type DATATYPE in the buffer BUF. • From the combined information of COUNT, DATATYPE, and BUF, MPI can determine the starting point in memory for the message and the number of bytes to move.

  21. Communicator • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • COMM provides MPI with the reference point for the communication domain applied to this send. • For most MPI programs, MPI_COMM_WORLD will be sufficient as the argument for this parameter.

  22. DESTINATION • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • DEST is an integer representing the rank of the process I am trying to send a message to. • The rank value is with respect to the communicator in the COMM parameter. • For MPI_COMM_WORLD, the value in DEST is the absolute rank of the processor you are trying to reach.

  23. TAG • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • The TAG parameter is an integer between 0 and some upper bound, where the upper bound is machine dependent. The value of the upper bound is found in the attribute MPI_TAG_UB. • This integer value can be used to distinguish messages, since send-receive pairs will only match if their TAG values also match.
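
In C, the upper bound can be read as an attribute of the communicator; a small sketch (MPI_Comm_get_attr is the modern name of this call, and the attribute value is returned as a pointer to the integer):

  /* Assumes <stdio.h>, <mpi.h>, and an initialized MPI environment. */
  int *tag_ub, flag;
  MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
  if (flag)
      printf("Largest tag supported here: %d\n", *tag_ub);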

  24. IERROR • MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM • OUT INTEGER IERROR • Assuming everything is working as planned, the value of IERROR on exit will be MPI_SUCCESS. • Values not equal to MPI_SUCCESS indicate some error, but these values are implementation specific.

  25. Send Modes • Standard • MPI_SEND • Buffered (not recommended) • MPI_BSEND • Synchronous • MPI_SSEND • Ready (not recommended) • MPI_RSEND

  26. Blocking Message Passing (RECEIVE) • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • Performs a standard-mode, blocking receive. • Blocking means that the code cannot continue until the receive has completed. • Completion of the receive means that the data has been placed into the message buffer locally and that the message buffer is now safe to modify or use. • Completion implies nothing about the completion of the matching send (except that the send has started).
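
The matching C receive for the send sketched earlier, again with illustrative values:

  double work[64];
  MPI_Status status;
  MPI_Recv(work, 64, MPI_DOUBLE,
           0,                   /* SOURCE: rank we accept from       */
           42,                  /* TAG: must match the sender's tag  */
           MPI_COMM_WORLD,
           &status);            /* STATUS: envelope info on return   */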

  27. BUFFER, DATATYPE, COMM, and IERROR • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • The parameters BUF, DATATYPE, COMM, and IERROR follow the same rules as those of the send. • Send-receive pairs will only match if their SOURCE/DEST, TAG, and COMM information match.

  28. COUNT • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • As in the send operation, the COUNT parameter indicates the number of entries of type DATATYPE in BUF. • The COUNT values of a send-receive pair, however, do not need to match. • It is the user's responsibility to see that the buffer on the receiving end is big enough to store the incoming message. An overflow error is returned in IERROR when BUF is too small.

  29. SOURCE • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • SOURCE is an integer representing the rank of the process I am willing to receive a message from. • The rank value is with respect to the communicator in the COMM parameter. • For MPI_COMM_WORLD, the value in SOURCE is the absolute rank of the processor you are willing to receive from. • The receiver can specify a wildcard value for SOURCE (MPI_ANY_SOURCE), indicating that any source is acceptable as long as the TAG and COMM parameters match.

  30. TAG • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • The TAG value is an integer that must be matched with the TAG value of the corresponding send. • The receiver can specify a wildcard value for TAG (MPI_ANY_TAG), indicating that it is willing to receive any tag value as long as the SOURCE and COMM values match.

  31. STATUS • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • The STATUS parameter is a returned parameter that contains information about the completion of the message. • When using wildcards you may need to find out who it was that sent you a message, what it was about, and how long the message was before continuing to process. This is the type of information found in STATUS.

  32. STATUS (continued) • MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR) • IN <type> BUF(*) • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM • OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR • In FORTRAN77, STATUS is an array of integers of size MPI_STATUS_SIZE. • The three constants MPI_SOURCE, MPI_TAG, and MPI_ERROR are the indices of the entries that store the source, tag, and error fields, respectively. • In C, STATUS is a structure of type MPI_Status that contains three fields named MPI_SOURCE, MPI_TAG, and MPI_ERROR. • Notice that the length of the message doesn't appear to be included…
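
A C sketch of a wildcard receive followed by reading the envelope out of STATUS:

  /* Assumes <stdio.h>, <mpi.h>, and an initialized MPI environment. */
  int data[256];
  MPI_Status st;
  MPI_Recv(data, 256, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
           MPI_COMM_WORLD, &st);
  printf("Message from rank %d with tag %d\n",
         st.MPI_SOURCE, st.MPI_TAG);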

  33. Questions/Answers • Question: What is the purpose of having the error returned in the STATUS data structure? It seems redundant. • Answer: It is possible for a single function such as MPI_WAITALL() to complete multiple messages in a single call. In these cases each individual message may produce its own error code, and that code is what is returned in the STATUS data structure.

  34. MPI_GET_COUNT • MPI_GET_COUNT(STATUS, DATATYPE, COUNT, IERROR) • IN INTEGER STATUS(MPI_STATUS_SIZE), DATATYPE • OUT INTEGER COUNT, IERROR • MPI_GET_COUNT will allow you to determine the number of entries of type DATATYPE that were received in the message. • For advanced users, see also MPI_GET_ELEMENTS.
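
Continuing the wildcard-receive sketch above, the length is recovered from the status in C like this:

  int n;
  MPI_Get_count(&st, MPI_INT, &n);  /* st was filled in by MPI_Recv */
  printf("Message contained %d ints\n", n);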

  35. Six Powerful Functions • MPI_INIT • MPI_FINALIZE • MPI_COMM_RANK • MPI_COMM_SIZE • MPI_SEND • MPI_RECV
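
A complete C program using only these six functions, in which rank 0 sends one integer to every other rank (a minimal sketch; error checking omitted):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (rank == 0) {
          /* Send each worker its own rank as the payload. */
          for (int dest = 1; dest < size; dest++)
              MPI_Send(&dest, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
      } else {
          int msg;
          MPI_Status st;
          MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
          printf("Rank %d received %d\n", rank, msg);
      }
      MPI_Finalize();
      return 0;
  }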

  36. Deadlock • MPI does not enforce a safe programming style. • It is the user’s responsibility to ensure that it is impossible for the program to fall into a deadlock condition. • Deadlock occurs when a process blocks to wait for an event that, given the current state of the system, can never happen.

  37. Deadlock examples

  ...
  CALL MPI_COMM_RANK(comm, rank, ierr)
  IF (rank .EQ. 0) THEN
     CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
     CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
  ELSE IF (rank .EQ. 1) THEN
     CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
     CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
  END IF
  ...

This program will always deadlock: both processes block in MPI_RECV, each waiting for a message the other has not yet sent.

  38. Deadlock examples

  ...
  CALL MPI_COMM_RANK(comm, rank, ierr)
  IF (rank .EQ. 0) THEN
     CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
     CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
  ELSE IF (rank .EQ. 1) THEN
     CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
     CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
  END IF
  ...

This program is unsafe. Why? In standard mode nothing guarantees the sends are buffered; if the implementation chooses not to buffer them (say, because the messages are large), both processes block in MPI_SEND and neither ever reaches its receive.

  39. Safe Way

  ...
  CALL MPI_COMM_RANK(comm, rank, ierr)
  IF (rank .EQ. 0) THEN
     CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
     CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
  ELSE IF (rank .EQ. 1) THEN
     CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
     CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
  END IF
  ...

This is a silly example…no one would ever try to do it the other ways…right?
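
MPI also offers a way to sidestep the ordering problem entirely: MPI_SENDRECV pairs the send and the receive inside a single call. A two-process C sketch (the variable rank is assumed to have been set by MPI_Comm_rank):

  double out[64], in[64];
  int partner = 1 - rank;          /* assumes exactly two processes */
  MPI_Sendrecv(out, 64, MPI_DOUBLE, partner, 0,   /* send half    */
               in,  64, MPI_DOUBLE, partner, 0,   /* receive half */
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);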

  40.–50. Motivating Example for Deadlock (diagram-only slides animating a message exchange through Timesteps 1–10)
