MPI Programming • Hamid Reza Tajozzakerin • Sharif University of Technology
Introduction • Message Passing Interface (MPI) • A library of functions and macros • Objectives: define an international, long-term standard API for portable parallel applications and involve all hardware vendors in implementing this standard; define a target system for parallelizing compilers • Can be used from C, C++, and Fortran • The MPI Forum (http://www.mpi-forum.org/) brings together all contributing parties
Programming with MPI • General MPI programs • Include the header file mpi.h (or whatever it is called on your system) in the source code • Initialize the MPI environment: • MPI_Init(&argc, &argv) • Must be called exactly once, before any other MPI function • At the end of the program: • MPI_Finalize(); • Cleans up any unfinished business left by MPI
Programming with MPI (cont.) • Get your own process ID (rank): • MPI_Comm_rank(MPI_Comm comm, int *rank) • The first argument is a communicator • Communicator: a collection of processes that can send messages to each other • Get the number of processes (including oneself): • MPI_Comm_size(MPI_Comm comm, int *size) • size: number of processes in comm
What is a message? • Message: data + envelope • Envelope: • Additional information needed for the message to be communicated successfully • The envelope contains: • Rank of the sender (who sends the message) • Can be a wildcard on the receiving side: MPI_ANY_SOURCE • Rank of the receiver (who receives the message) • No wildcard for dest • A tag: • Used to distinguish messages received from a single process • Can be a wildcard: MPI_ANY_TAG • Communicator
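The matching rule implied by the envelope can be sketched in plain C, without an MPI installation. This is only a model: the types `Envelope` and `RecvPattern`, the function `envelope_matches`, and the constants `ANY_SOURCE` and `ANY_TAG` are hypothetical stand-ins (the real MPI_ANY_SOURCE and MPI_ANY_TAG values are implementation-defined), and an integer `comm_id` stands in for the communicator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for MPI_ANY_SOURCE and MPI_ANY_TAG (illustrative values only). */
enum { ANY_SOURCE = -1, ANY_TAG = -2 };

/* Envelope of a message in flight (hypothetical model type). */
typedef struct {
    int source;   /* rank of the sender */
    int dest;     /* rank of the receiver */
    int tag;      /* user-chosen message tag */
    int comm_id;  /* stands in for the communicator */
} Envelope;

/* What a receive asks for; source and tag may be wildcards. */
typedef struct {
    int source;
    int tag;
    int comm_id;
} RecvPattern;

/* True if a receive posted by rank `me` matches the message envelope. */
bool envelope_matches(const Envelope *msg, int me, const RecvPattern *want)
{
    assert(msg != NULL && want != NULL);
    if (msg->dest != me) return false;                /* no wildcard for dest */
    if (msg->comm_id != want->comm_id) return false;  /* communicators must agree */
    if (want->source != ANY_SOURCE && want->source != msg->source) return false;
    if (want->tag != ANY_TAG && want->tag != msg->tag) return false;
    return true;
}
```

For example, a message with source 2, dest 0, tag 7 matches a receive on rank 0 that asks for `ANY_SOURCE` and `ANY_TAG` in the same communicator, but not one that asks for tag 8.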
Point-to-Point Communication • A send command can be • Blocking: the call returns only after the message has been handed over to the communication system (the buffer can then be reused) • Non-blocking: the call returns immediately (the program must check whether the message has been sent before the buffer can be reused)
Point-to-Point Communication(Cont.) • Four types of point-to-point send operations, each of them available in a blocking and a non-blocking variant • Standard (regular) send: MPI_SEND or MPI_ISEND • Asynchronous; the system decides whether or not to buffer messages to be sent • Successful completion may depend on matching receive • Buffered send: MPI_BSEND or MPI_IBSEND • Asynchronous, but buffering of messages to be sent by the system is enforced • Synchronous send: MPI_SSEND or MPI_ISSEND • Synchronous, i.e. the send operation is not completed before the receiver has started to receive the message
Point-to-Point Communication (cont.) • Ready send: MPI_RSEND or MPI_IRSEND • The send may be started only if a matching receive has already been posted; if no matching receive has been posted, the result is undefined • Could be replaced by a standard send with no effect other than on performance • Meaning of blocking and non-blocking (the variants with 'I'): • Blocking: the call does not return until the send buffer can be reused • Non-blocking: the call returns immediately, and the user has to make sure that the buffer is not modified until the send has completed
Point-to-Point Communication (cont.) • One receive function, in two variants: • Blocking MPI_Recv: • The receive operation is completed when the message has been completely written into the receive buffer • Non-blocking MPI_Irecv: • Returns immediately after the receive has been started • Either variant can be combined with any of the four send modes
Point-to-Point Communication (cont.) • Syntax: • MPI_SEND(buf, count, datatype, dest, tag, comm) • MPI_RECV(buf, count, datatype, source, tag, comm, status) • where • void *buf: pointer to the start of the buffer • int count: number of data objects • int source: rank of the sending process • int dest: rank of the destination process • int tag: ID of the message • MPI_Datatype datatype: data type of the data objects • MPI_Comm comm: communicator (see later) • MPI_Status *status: object containing message information • The non-blocking versions take one additional argument, request, which is used to check whether the communication has completed (in the non-blocking receive it takes the place of status).
Testing Whether a Message Has Arrived • MPI_Buffer_attach(...): • gives MPI a user-allocated buffer to use for buffered sends • MPI_Probe(...) / MPI_Iprobe(...): • Blocking / non-blocking test for whether a message has arrived, without actually receiving it • MPI_Test(...): • checks whether a send or receive operation has completed • MPI_Wait(...): • makes the process wait until a send or receive operation has completed • MPI_Get_count(...): • returns the number of elements in a received message
Data Types • Standard MPI data types: • MPI_CHAR • MPI_SHORT • MPI_INT • MPI_LONG • MPI_UNSIGNED • MPI_FLOAT • MPI_DOUBLE • MPI_LONG_DOUBLE • MPI_BYTE (8 binary digits) • MPI_PACKED
Grouping Data • Why? • The fewer messages sent, the better the overall performance • Three mechanisms: • Count parameter: • group data of the same basic type as an array • Derived types • Pack/Unpack
Building Derived Types • Specify the types of the members of the derived type • Specify the number of elements of each type • Calculate the addresses of the members • Calculate the displacements: locations relative to the start of the structure • Create the derived type • MPI_Type_struct(…) (named MPI_Type_create_struct in MPI-2 and later) • Commit it • MPI_Type_commit(…)
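The first steps of this recipe (member counts and displacements) can be sketched in plain C. The record `Particle` and the helper `particle_layout` are hypothetical names chosen for illustration, and the sketch uses `offsetof` rather than subtracting member addresses; `ptrdiff_t` stands in for MPI's address integer type. The resulting arrays, together with a matching list of member types, are the kind of input a struct type constructor expects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record that should travel as one message. */
typedef struct {
    int    id;
    double coords[3];
    char   label[8];
} Particle;

enum { PARTICLE_MEMBERS = 3 };

/* Fills the per-member element counts and the byte displacements of each
   member relative to the start of a Particle. */
void particle_layout(int counts[PARTICLE_MEMBERS],
                     ptrdiff_t displs[PARTICLE_MEMBERS])
{
    assert(counts != NULL && displs != NULL);
    counts[0] = 1;  displs[0] = (ptrdiff_t)offsetof(Particle, id);
    counts[1] = 3;  displs[1] = (ptrdiff_t)offsetof(Particle, coords);
    counts[2] = 8;  displs[2] = (ptrdiff_t)offsetof(Particle, label);
}
```

Taking displacements from the structure layout, instead of hard-coding them, keeps the description correct even when the compiler inserts padding between `id` and `coords`.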
Other Derived Data Type Constructors • MPI_Type_contiguous(...): • constructs an array of count elements of type oldtype stored in contiguous memory • MPI_Type_vector(...): • constructs an MPI array of equally sized blocks whose starting points are stride elements apart • MPI_Type_indexed(...): • constructs an MPI array with blocks of different lengths
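Which elements a vector-style layout selects can be illustrated without MPI. The function `gather_vector` below is a hypothetical helper, not an MPI call: it copies `count` blocks of `blocklength` elements, with block starts `stride` elements apart, into a contiguous array. The restriction `blocklength <= stride` is an assumption of this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the elements selected by a (count, blocklength, stride) layout
   from `src` into the contiguous array `dst`.
   Returns the number of elements copied. */
size_t gather_vector(const double *src, size_t count, size_t blocklength,
                     size_t stride, double *dst)
{
    assert(src != NULL && dst != NULL);
    assert(blocklength <= stride);  /* sketch-only rule: blocks do not overlap */
    size_t n = 0;
    for (size_t b = 0; b < count; ++b)
        for (size_t k = 0; k < blocklength; ++k)
            dst[n++] = src[b * stride + k];
    return n;
}
```

With a 3 x 4 row-major matrix `m`, the call `gather_vector(m + 1, 3, 1, 4, col)` extracts column 1: three blocks of one element, four elements apart.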
Packing and Unpacking • Elements of a complex data structure can be packed, sent, and unpacked again element by element: expensive and error-prone • Pack: store noncontiguous data in contiguous memory locations • Unpack: copy data from a contiguous buffer into noncontiguous memory locations • MPI functions for explicit packing and unpacking: • MPI_Pack(...): • packs data into a buffer • MPI_Unpack(...): • unpacks data from the buffer
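The idea behind packing, a contiguous buffer plus a position cursor that advances with every item, can be sketched with `memcpy`. The helpers `pack_bytes` and `unpack_bytes` are hypothetical and work on raw bytes within one machine; they only mirror the cursor-style interface and do not perform the datatype handling of the MPI calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends `nbytes` from `in` to `buf` at *position and advances the cursor.
   Returns 0 on success, -1 if the buffer has too little room left. */
int pack_bytes(const void *in, size_t nbytes,
               void *buf, size_t bufsize, size_t *position)
{
    assert(in && buf && position);
    if (*position > bufsize || nbytes > bufsize - *position) return -1;
    memcpy((char *)buf + *position, in, nbytes);
    *position += nbytes;
    return 0;
}

/* Copies `nbytes` from `buf` at *position into `out` and advances the cursor.
   Returns 0 on success, -1 if fewer than `nbytes` remain in the buffer. */
int unpack_bytes(const void *buf, size_t bufsize, size_t *position,
                 void *out, size_t nbytes)
{
    assert(buf && position && out);
    if (*position > bufsize || nbytes > bufsize - *position) return -1;
    memcpy(out, (const char *)buf + *position, nbytes);
    *position += nbytes;
    return 0;
}
```

A sender would pack, say, an `int` length followed by a `double` array, and the receiver would unpack the items in the same order, starting from position 0.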
Collective Communication • Why? • Many applications require not only point-to-point communication but also collective communication operations • Collective communication: • Broadcast • Gather • Scatter • All-to-All • Reduce
Collective Communication (Cont.) • Important application scenario: • distribute the elements of vectors or matrices among several processors • Some functions offered by MPI • MPI_Barrier(...): • synchronization barrier: process waits for the other group members; when all of them have reached the barrier, they can continue • MPI_Bcast(...): • sends the data to all members of the group given by a communicator (hence more a multicast than a broadcast) • MPI_Gather(...): • collects data from the group members
Collective Communication (cont.) • MPI_Allgather(...): • gather-to-all: data are collected from all processes, and all of them receive the collection • MPI_Scatter(...): • classical scatter operation: distribution of data among processes • MPI_Reduce(...): • executes a reduce operation • MPI_Allreduce(...): • executes a reduce operation whose result is delivered to all processes • MPI_Op_create(...) and MPI_Op_free(...): • define a new reduce operation or remove it, respectively • Note that all of the functions above operate with respect to a communicator (hence not necessarily a global communication)
Process Groups and Communicators • Messages are tagged for identification – message tag is message ID! • Again: process groups for restricted message exchange and restricted collective communication • Process groups are ordered sets of processes • Each process is locally uniquely identified via its local (group-related) process ID or rank • Ordering starts with zero, successive numbering • Global identification of a process via the pair (process group, rank)
Process Groups and Communicators • MPI communicators: concept for working with contexts • Communicator = process group + message context • MPI offers intra-communicators for collective communication within a process group and inter-communicators for (point-to-point) communication between two process groups • Default (including all processes): MPI_COMM_WORLD • MPI provides a lot of functions for working with process groups and communicators
Working with Communicators • To create a new communicator: • Make a list of the processes in the new communicator • Get the group of processes of the existing communicator • MPI_Comm_group(…) • Create the new group from the list • MPI_Group_incl(…) • Create the actual communicator • MPI_Comm_create(…) • Note: to create several communicators simultaneously • MPI_Comm_split(…)
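How a split partitions and renumbers processes can be modelled in plain C. The sketch assumes the usual split semantics: processes that pass the same color end up in the same new communicator, ordered by key, with ties broken by the old rank. The function `split_new_rank` and its array-based interface are hypothetical, and the special "undefined" color is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Rank that process `me` would receive inside its new communicator, given
   the color and key every one of the `nprocs` processes passes to the split. */
int split_new_rank(const int *color, const int *key, int nprocs, int me)
{
    assert(color != NULL && key != NULL);
    assert(0 <= me && me < nprocs);
    int new_rank = 0;
    for (int r = 0; r < nprocs; ++r) {
        if (r == me || color[r] != color[me]) continue;  /* other group */
        /* r precedes me if its key is smaller, or equal with a lower old rank */
        if (key[r] < key[me] || (key[r] == key[me] && r < me)) ++new_rank;
    }
    return new_rank;
}
```

For six processes arranged as a 2 x 3 grid, choosing color = rank / 3 and key = rank % 3 yields one communicator per grid row, with new ranks 0, 1, 2 in each.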
Process Topologies • Provide a convenient naming mechanism for the processes of a group • Assist the runtime system in mapping processes onto hardware • Only for intra-communicators • Virtual topology: • A set of processes represented by a graph • Most common topologies: meshes and tori
Some Useful Functions • MPI_Comm_rank(…) • Returns the rank of the calling process • MPI_Comm_size(…) • Returns the size of the group • MPI_Comm_dup(…) • Creates a new communicator with the same attributes as the input communicator • MPI_Comm_free(MPI_Comm *comm) • Marks the communicator for deallocation and sets the handle to MPI_COMM_NULL
An example of a Cartesian graph • Upper number is the rank • Lower pair is the (row, col) coordinates
Cartesian Topology Functions • MPI_Cart_create(…) • Returns a handle to a new communicator to which the Cartesian topology information is attached • MPI_Dims_create(…) • Selects a balanced distribution of processes over the dimensions • MPI_Cartdim_get(…) • Returns the number of dimensions • MPI_Cart_get(…) • Returns information on the topology • MPI_Cart_sub(…) • Partitions a Cartesian topology into Cartesian grids of lower dimension • MPI_Cart_coords(…), MPI_Cart_rank(…) • Translate a rank into coordinates and coordinates into a rank, respectively
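The translation between ranks and coordinates can be sketched in plain C, assuming the row-major numbering that MPI uses for Cartesian topologies (last coordinate varies fastest). The functions `grid_coords` and `grid_rank` are hypothetical stand-ins that ignore periodic wrap-around; they are not the MPI calls themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Rank -> coordinates in an ndims-dimensional grid with extents dims[],
   row-major numbering (last coordinate varies fastest). */
void grid_coords(int rank, int ndims, const int *dims, int *coords)
{
    assert(dims != NULL && coords != NULL && ndims > 0 && rank >= 0);
    for (int d = ndims - 1; d >= 0; --d) {
        assert(dims[d] > 0);
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* Coordinates -> rank, the inverse of grid_coords. */
int grid_rank(int ndims, const int *dims, const int *coords)
{
    assert(dims != NULL && coords != NULL && ndims > 0);
    int rank = 0;
    for (int d = 0; d < ndims; ++d) {
        assert(0 <= coords[d] && coords[d] < dims[d]);
        rank = rank * dims[d] + coords[d];
    }
    return rank;
}
```

In a 3 x 3 grid, rank 5 maps to (row, col) = (1, 2), and the coordinates (2, 1) map back to rank 7.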
Preliminary • DCT: Discrete Cosine Transform • 2D DCT: apply a 1D DCT twice • 2D-DCT equation • X: N×N matrix • C: N×N matrix defined as: • Y contains the DCT coefficients • The main operation is matrix multiplication
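One standard formulation consistent with the bullets above, given here as an assumption rather than as the slide's own equation, is the orthonormal DCT-II written in matrix form (indices i and j run from 0 to N-1):

```latex
Y = C\,X\,C^{T}, \qquad
C_{ij} =
\begin{cases}
\sqrt{\dfrac{1}{N}}, & i = 0,\\[6pt]
\sqrt{\dfrac{2}{N}}\,\cos\dfrac{(2j+1)\,i\,\pi}{2N}, & 1 \le i \le N-1,
\end{cases}
\qquad 0 \le j \le N-1 .
```

Under this assumption, multiplying by C on the left applies the 1D DCT to every column of X, and multiplying by C^T on the right applies it to every row, so the 2D transform reduces to two matrix multiplications.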
FOX’s Algorithm • Multiplies two square matrices • Assume two matrices: A = (aij) and B = (bij) • Both matrices are of order n • Assume the number of processes p is a perfect square: p = q^2 • n_bar = n/q is an integer • Each process holds one block of A and one block of B, each a matrix of order n/q
FOX’s Algorithm (Cont.) • For example: p=9 and n=6
FOX’s Algorithm (cont.) • The submatrix chosen in the r-th row is A(r,u), where u = (r + step) mod q • Example: at step = 0 these multiplications are done • r=0: A00B00, A00B01, A00B02 • r=1: A11B10, A11B11, A11B12 • r=2: A22B20, A22B21, A22B22 • The other multiplications are done in the other steps • Processes communicate with each other so that the full product of the two matrices is obtained
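The block schedule can be checked with a serial walk-through in plain C. This is a sketch of the schedule only, not a parallel implementation: the loops over r and col play the role of the q x q processes, the blocks are indexed directly inside full row-major matrices instead of being communicated, and the names `fox_multiply` and `block_multiply_add` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C(r,col) += A(r,u) * B(u,col) for nb x nb blocks that live inside
   n x n row-major matrices. */
static void block_multiply_add(const double *a, const double *b, double *c,
                               int n, int nb, int r, int u, int col)
{
    for (int i = 0; i < nb; ++i)
        for (int j = 0; j < nb; ++j) {
            double sum = 0.0;
            for (int k = 0; k < nb; ++k)
                sum += a[(r * nb + i) * n + (u * nb + k)] *
                       b[(u * nb + k) * n + (col * nb + j)];
            c[(r * nb + i) * n + (col * nb + j)] += sum;
        }
}

/* Serial walk through Fox's schedule on a q x q grid of blocks.
   The caller must zero-initialise `c`. */
void fox_multiply(const double *a, const double *b, double *c, int n, int q)
{
    assert(a != NULL && b != NULL && c != NULL);
    assert(q > 0 && n > 0 && n % q == 0);
    int nb = n / q;  /* n_bar: order of each block */
    for (int step = 0; step < q; ++step)
        for (int r = 0; r < q; ++r) {
            int u = (r + step) % q;  /* block column of A chosen for row r */
            for (int col = 0; col < q; ++col)
                block_multiply_add(a, b, c, n, nb, r, u, col);
        }
}
```

After q steps every block C(r,col) has accumulated A(r,u)B(u,col) for all u, which is the ordinary matrix product; with n = 6 and q = 3 the result can be compared element by element against a plain triple loop.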
Implementation of the Algorithm • Treat each row of processes as a communicator • Treat each column of processes as a communicator • MPI_Cart_sub(grid->Com, var_coor, &row_com); with var_coor marking the column coordinate as varying • MPI_Cart_sub(grid->Com, var_coor, &col_com); with var_coor marking the row coordinate as varying • Other, more general communicator construction functions can also be used: • MPI_Group_incl(group, q, ranks, &row_group) • MPI_Comm_create(comm, row_group, &row_comm)
Implementation of MPI • An MPI implementation consists of • a subroutine library with all MPI functions • include files for the calling application program • some startup script (usually called mpirun, but not standardized) • MPICH • Supports both Linux and Microsoft Windows • Many other MPI implementations are available, e.g.: • LAM • Supports MPI programming on networks of Unix workstations • For other implementations and their features, see: • http://www.lam-mpi.org/mpi/implementations/fulllist.php
Implementation of MPI (Cont.) • IMPI: Interoperable MPI • A protocol specification to allow multiple MPI implementations to cooperate on a single MPI job. • Any correct MPI program will run correctly under IMPI • Divided into four parts: • Startup/shutdown protocols • Data transfer protocol • Collective algorithm • A centralized IMPI conformance testing methodology
Extensions to MPI • External Interfaces • One-sided Communication • Dynamic Resource Management • Extended Collective • Bindings • Real Time • Some of these features are still subject to change