Parallel Computing: Message Passing Interface EE8603 - SPECIAL TOPICS ON PARALLEL COMPUTING Professor: Nagi Mekhiel Presented by: Leili, Sanaz, Mojgan, Reza
Outline • Overview • Messages and Point-to-Point Communication • Non-blocking communication • Collective communication • Derived data types • Other MPI-1 features • Installing and Utilizing MPI • Experimental results • Conclusion • References January 4, 2020 Ryerson University
Overview [Figure: cluster layout showing a workstation, other servers (web server, mail server), a head node, a private network, and compute nodes joined by an MPI interconnect ring]
Overview Parallel Computing • A task is broken down into subtasks, performed by separate workers or processes • Processes interact by exchanging information • What do we basically need? • The ability to start the tasks • A way for them to communicate
Overview What is MPI? • A message passing library specification • Message-passing model • Not a compiler specification (i.e. not a language) • Not a specific product • Designed for parallel computers, clusters, and heterogeneous networks
Overview Synchronous Communication • A synchronous communication does not complete until the message has been received. • Analogy: a fax or registered mail.
Overview Asynchronous Communication • An asynchronous communication completes as soon as the message is on the way. • Analogy: a postcard or email.
Overview Collective Communications • Point-to-point communications involve pairs of processes. • Many message passing systems provide operations which allow larger numbers of processes to participate. Types of Collective Transfers • Barrier • Synchronizes processes. • No data is exchanged, but the barrier blocks until all processes have called the barrier routine. • Broadcast (sometimes multicast) • A broadcast is a one-to-many communication. • One process sends one message to several destinations. • Reduction • A many-to-one communication: values contributed by all processes are combined into a single result (e.g. a sum).
Overview What’s in a Message? • An MPI message is an array of elements of a particular MPI datatype. • All MPI messages are typed. • The type of the contents must be specified in both the send and the receive. Basic C Datatypes in MPI [table not recovered; it maps MPI datatypes to C types, e.g. MPI_INT to int and MPI_DOUBLE to double]
Overview • MPI Handles • MPI maintains internal data structures which are referenced by the user through handles. • Handles can be returned by and passed to MPI procedures. • Handles can be copied by the usual assignment operation. • MPI Errors • MPI routines return an int that can contain an error code. • The default action on the detection of an error is to cause the parallel operation to abort. • The default can be changed to return an error code.
Overview Initializing MPI • The first MPI routine called in any MPI program must be the initialization routine MPI_INIT. • MPI_INIT is called once by every process, before any other MPI routines. int MPI_Init( int *argc, char ***argv );
Overview Skeleton MPI Program
#include <mpi.h>
int main( int argc, char **argv )
{
    MPI_Init( &argc, &argv );   /* must precede any other MPI call */
    /* main part of the program */
    MPI_Finalize();             /* last MPI call in the program */
    return 0;
}
Overview Point-to-point Communication • Always involves exactly two processes • The destination is identified by its rank within the communicator • There are four communication modes provided by MPI (these modes refer to sending, not receiving) • Standard • Synchronous • Buffered • Ready
Overview Standard Send MPI_Send( buf, count, datatype, dest, tag, comm ) where • buf is the address of the data to be sent • count is the number of elements of the MPI datatype which buf contains • datatype is the MPI datatype • dest is the destination process for the message, specified by the rank of the destination within the group associated with the communicator comm • tag is a marker used by the sender to distinguish between different types of messages • comm is the communicator shared by the sender and the receiver
Overview Synchronous Send MPI_Ssend( buf, count, datatype, dest, tag, comm ) • Can be started whether or not a matching receive was posted. • Will complete successfully only if a matching receive is posted and the receive operation has started to receive the message sent by the synchronous send. • Provides synchronous communication semantics: a communication does not complete at either end before both processes rendezvous at the communication. • Has non-local completion semantics.
Overview Buffered Send • A buffered-mode send • Can be started whether or not a matching receive has been posted. It may complete before a matching receive is posted. • Has local completion semantics: its completion does not depend on the occurrence of a matching receive. • In order to complete the operation, it may be necessary to buffer the outgoing message locally. For that purpose, buffer space is provided by the application.
Overview Ready Mode Send • A ready-mode send • may be started only if the matching receive has already been posted; otherwise the send is erroneous and its outcome is undefined. • otherwise has the same semantics as a standard-mode send. • can save overhead by avoiding handshaking and buffering.
Point-to-Point Communication • Communication between two processes. • Source process sends message to destination process. • Communication takes place within a communicator, e.g., MPI_COMM_WORLD. • Processes are identified by their ranks in the communicator. [Figure: a communicator containing processes with ranks 0 to 6; a message travels from the source process to the destination process]
Point-to-Point Communication For a communication to succeed: • Sender must specify a valid destination rank. • Receiver must specify a valid source rank. • The communicator must be the same. • Tags must match. • Message datatypes must match. • Receiver’s buffer must be large enough.
Point-to-Point Communication Communication Modes • Send communication modes: • synchronous send: MPI_SSEND • buffered [asynchronous] send: MPI_BSEND • standard send: MPI_SEND • ready send: MPI_RSEND • Receiving, all modes: MPI_RECV
Point-to-Point Communication Communication Modes: Definitions [table not recovered]
Point-to-Point Communication Message Order Preservation • Rule for messages on the same connection, i.e., same communicator, source, and destination rank: • Messages do not overtake each other. • This is true even for non-synchronous sends. • If both receives match both messages, then the order is preserved.
Non-Blocking Communication Meaning of Blocking and Non-Blocking: • Blocking: the program will not return from the subroutine call until the copy to/from the system buffer has finished. • Non-blocking: the program returns immediately from the subroutine call. The copy to/from the system buffer is not guaranteed to have completed, so the user must make sure it has completed before reusing the buffer.
Non-Blocking Communication Characteristics: separate communication into three phases: • Initiate non-blocking communication: returns immediately; routine names start with MPI_I… • Do some work (“latency hiding”) • Wait for the non-blocking communication to complete
Non-Blocking Communication Non-Blocking Examples • Non-blocking send: MPI_Isend(...), doing some other work, MPI_Wait(...) • Non-blocking receive: MPI_Irecv(...), doing some other work, MPI_Wait(...) • MPI_Wait = waiting until the operation has locally completed
Non-Blocking Communication Non-blocking Synchronous Send: • C: MPI_Issend(buf, count, datatype, dest, tag, comm, OUT &request_handle); MPI_Wait(INOUT &request_handle, &status); • Fortran: CALL MPI_ISSEND(buf, count, datatype, dest, tag, comm, OUT request_handle, ierror) CALL MPI_WAIT(INOUT request_handle, status, ierror) • buf must not be used between Issend and Wait (in all programming languages). • “Issend + Wait directly after Issend” is equivalent to the blocking call (Ssend). • status is not used in Issend, but in Wait (with send: nothing returned).
Non-Blocking Communication Non-blocking Receive: • C: MPI_Irecv(buf, count, datatype, source, tag, comm, OUT &request_handle); MPI_Wait(INOUT &request_handle, &status); • Fortran: CALL MPI_IRECV(buf, count, datatype, source, tag, comm, OUT request_handle, ierror) CALL MPI_WAIT(INOUT request_handle, status, ierror) • buf must not be used between Irecv and Wait (in all programming languages).
Collective Communication Characteristics: • Communication involves a group of processes. • Must be called by all processes in a communicator. • Synchronization may or may not occur. • In MPI-1, all collective operations are blocking. • Receive buffers must have exactly the same size as send buffers.
Collective Communication Barrier Synchronization: • A process waits for the other group members; when all of them have reached the barrier, they can continue. • C: int MPI_Barrier(MPI_Comm comm); • Fortran: MPI_BARRIER(COMM, IERROR) INTEGER COMM, IERROR • Apart from explicit barriers, synchronization is done automatically by the data communication: a process cannot continue before it has the data that it needs.
Collective Communication • Broadcast: • The root process sends the data to all members of the group given by a communicator. • C: int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm); • Example: the first process that finds the solution in a competition informs everyone to stop.
Collective Communication • Gather: • Collects information from all participating processes. • C: int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm); • Example: each process computes some part of the solution, which shall now be assembled by one process.
Collective Communication • Scatter: • Distribution of data among processes. • C: int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm); • Example: two vectors are distributed in order to prepare a parallel computation of their scalar product.
Derived Data Types MPI Data types • Description of the memory layout of the buffer • for sending • for receiving • Basic types • Derived types • Vectors, structs, others • Built from existing data types
Derived Data Types Data Layout and the Describing Datatype Handle
struct buff_layout { int i_val[3]; double d_val[5]; } buffer;
array_of_types[0] = MPI_INT;    array_of_blocklengths[0] = 3; array_of_displacements[0] = 0;
array_of_types[1] = MPI_DOUBLE; array_of_blocklengths[1] = 5; array_of_displacements[1] = …;
MPI_Type_struct(2, array_of_blocklengths, array_of_displacements, array_of_types, &buff_datatype);
MPI_Type_commit(&buff_datatype);
MPI_Send(&buffer, 1, buff_datatype, …)
• The datatype handle describes the data layout • &buffer = the start address of the data [Figure: memory layout of the int and double blocks as arranged by the compiler]
Derived Data Types Type Maps • A derived data type is logically a pointer to a list of entries: • basic data type at displacement
Derived Data Types Example: derived data type handle • Type map (basic datatype, displacement): (MPI_CHAR, 0), (MPI_INT, 4), (MPI_INT, 8), (MPI_DOUBLE, 16) • A derived data type describes the memory layout of, e.g., structures, common blocks, subarrays, some variables in the memory [Figure: memory from byte 0 to byte 24 holding the values c, 11, 22, and 6.36324d+107]
Derived Data Types Contiguous Data • The simplest derived data type • Consists of a number of contiguous items of the same data type • C: int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype) • Fortran: MPI_TYPE_CONTIGUOUS(COUNT, OLDTYPE, NEWTYPE, IERROR) INTEGER COUNT, OLDTYPE INTEGER NEWTYPE, IERROR
Derived Data Types Vector Datatype • C: int MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype) • Fortran: MPI_TYPE_VECTOR(COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE, IERROR) INTEGER COUNT, BLOCKLENGTH, STRIDE INTEGER OLDTYPE, NEWTYPE, IERROR • Example from the figure: count = 2 blocks, blocklength = 3 elements per block, stride = 5 (element stride between blocks); the holes between blocks are not transferred
Derived Data Types MPI_TYPE_VECTOR: An example • Sending the first row of an N*M matrix • C • Fortran • Sending the first column of an N*M matrix • C • Fortran
Derived Data Types Sending a row using MPI_TYPE_vector • C • MPI_Type_vector(1, 5, 1, MPI_INT, &MPI_ROW) • Fortran • MPI_Type_vector(1, 5, 1, MPI_INT, MPI_ROW) • MPI_Type_commit(&MPI_ROW) • MPI_Send(&buf …, MPI_ROW…) • MPI_Recv(&buf …, MPI_ROW…)
Derived Data Types Sending a column using MPI_TYPE_vector • C • MPI_Type_vector(4, 1, 5, MPI_INT, &MPI_COL) • Fortran • MPI_Type_vector(4, 1, 5, MPI_INT, MPI_COL) • MPI_Type_commit(&MPI_COL) • MPI_Send(buf …, MPI_COL…) • MPI_Recv(buf …, MPI_COL…)
Derived Data Types Sending a sub-matrix using MPI_TYPE_vector • C • MPI_Type_vector(2, 3, 5, MPI_INT, &MPI_SUBMAT) • Fortran • MPI_Type_vector(2, 3, 5, MPI_INT, MPI_SUBMAT) • MPI_Type_commit(&MPI_SUBMAT) • MPI_Send(&buf …, MPI_SUBMAT…) • MPI_Recv(&buf …, MPI_SUBMAT…)
Other MPI features (1) • Point-to-point • MPI_Sendrecv & MPI_Sendrecv_replace • Null processes, MPI_PROC_NULL (see Chap. 7??, slide on MPI_Cart_shift) • MPI_Pack & MPI_Unpack • MPI_Probe: check length (tag, source rank) before calling MPI_Recv • MPI_Iprobe: check whether a message is available • MPI_Request_free, MPI_Cancel • Persistent requests • MPI_BOTTOM (in point-to-point and collective communication) • Collective Operations • MPI_Allgather, MPI_Alltoall, MPI_Reduce_scatter • MPI_…v variants (Gatherv, Scatterv, Allgatherv, Alltoallv) • Topologies • MPI_DIMS_CREATE [Figure: data movement diagrams for the collective operations]
Other MPI features (2) Error Handling • the communication should be reliable • if the MPI program is erroneous: • by default: abort, if the error is detected by the MPI library; otherwise, unpredictable behavior • Fortran: call MPI_Errhandler_set(comm, MPI_ERRORS_RETURN, ierr) • C: MPI_Errhandler_set(comm, MPI_ERRORS_RETURN); • then an error code is returned by each MPI routine (ierror in Fortran) • undefined state after an erroneous MPI call has occurred (only MPI_ABORT(…) should still be callable)