Introduction to MPI Programming – Ganesh C.N.
Introduction • MPI – Message Passing Interface. • A programming paradigm used widely in parallel computers (especially scalable ones). • A standard and portable message-passing system that defines the syntax and semantics of a set of core library routines. • Provides bindings for Fortran 77 and C. • Goal – allow efficient communication, avoiding memory-to-memory copying and allowing overlap of computation and communication.
Introduction
• What is included in MPI
 - Point-to-point communication
 - Communication domains
 - Processor topologies
• What is not included in MPI
 - Shared-memory operations
 - Interrupt-driven message passing
 - Debugging tools
 - I/O support
General MPI Programs ….
• Every MPI program must contain the preprocessor directive #include "mpi.h"
• The file "mpi.h" contains all the definitions, macros and function prototypes needed to compile an MPI program.
• Typical layout of an MPI program:

  ………
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      ……      /* No MPI functions called before this */
      MPI_Init(&argc, &argv);
      ………
      MPI_Finalize();   /* No MPI functions called after this */
  }
MPI_Init & MPI_Finalize …. • MPI_Init should be called before any other MPI function, and should be called only once. • Arguments – pointers to argc and argv. • Allows the system to do the startup needed so that the MPI library can be used. • Conceptually, the MPI runtime places a copy of the executable on each processor.
MPI_Init & MPI_Finalize …. • Each processor begins execution of its copy of the executable. • Different statements can be executed by branching within the program. Typically based on rank. • After all MPI functions, MPI_Finalize must be called. This cleans up any unfinished activities left by MPI.
Programming Paradigm …. • The paradigm used by MPI is the SPMD model of parallel programming. • The same program is loaded in all the processors and each processor runs its copy. • But we can obtain the effect of different programs running on different processors by executing branches based on process rank. • Most common method of writing MIMD programs.
MPI_Comm_rank and MPI_Comm_size • MPI_Comm_rank returns the rank of the calling process. • Syntax: int MPI_Comm_rank(MPI_Comm comm, int *rank); • The first argument is a communicator – a collection of processes that can send messages to each other. • For basic programs, the communicator is MPI_COMM_WORLD. It is predefined in MPI and consists of all the processes running. • MPI_Comm_size returns the number of processes. • Syntax: int MPI_Comm_size(MPI_Comm comm, int *size);
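A minimal complete program putting these calls together (this example is illustrative and not from the original slides):

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size;

      MPI_Init(&argc, &argv);                 /* start up MPI              */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my rank within the world  */
      MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

      printf("Hello from process %d of %d\n", rank, size);

      MPI_Finalize();                         /* shut down MPI             */
      return 0;
  }

Launched with a typical command such as mpirun -np 4 ./a.out, each of the four copies prints its own rank.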
Point-to-point communication • The actual message passing is carried out by MPI_Send and MPI_Recv. • The system appends some bookkeeping information to each message, called the envelope. • The envelope consists of the ranks of the sender and the receiver, a tag and the communicator. • Syntax: int MPI_Send(void *msg, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm);
MPI_Send (contd…) • The message to be passed is stored in msg. • count and type allow the system to identify the end of the message. • type is NOT a C type, but the predefined MPI datatypes correspond to standard C types: MPI_INT, MPI_CHAR, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, etc. • The receive buffer size and the send buffer size need not be equal; the receive buffer only has to be large enough for the message.
MPI_Recv • int MPI_Recv(void *msg, int count, MPI_Datatype type, int src, int tag, MPI_Comm comm, MPI_Status *status); • src can be a 'wildcard' called MPI_ANY_SOURCE, which accepts a message from any sending process. • There is no such wildcard for the destination. • The communicator used in MPI_Send and MPI_Recv must be the same. • status returns information on the data that was received; it is a structure whose public fields include the source (status.MPI_SOURCE) and the tag (status.MPI_TAG).
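A sketch of rank-based branching with MPI_Send/MPI_Recv: rank 0 sends a string and rank 1 receives it (the tag value 0 and the message text are arbitrary choices; the program assumes at least two processes):

  #include <stdio.h>
  #include <string.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank;
      char msg[64];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          strcpy(msg, "greetings from rank 0");
          MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          /* the receive buffer may be larger than the message actually sent */
          MPI_Recv(msg, 64, MPI_CHAR, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
          printf("rank 1 received \"%s\" from rank %d\n", msg, status.MPI_SOURCE);
      }

      MPI_Finalize();
      return 0;
  }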
Process synchronization - Barrier • Barrier – for synchronizing all processes. Each process blocks until every process reaches the same point. • Syntax: int MPI_Barrier(MPI_Comm comm);
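A common use of the barrier is consistent timing of a phase across all ranks; a sketch, assuming a hypothetical do_local_work() routine standing in for real computation:

  #include <stdio.h>
  #include "mpi.h"

  static void do_local_work(void) { /* placeholder for real computation */ }

  int main(int argc, char *argv[]) {
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);        /* all ranks start the phase together */
      double t0 = MPI_Wtime();
      do_local_work();
      MPI_Barrier(MPI_COMM_WORLD);        /* wait until every rank has finished */
      double elapsed = MPI_Wtime() - t0;

      if (rank == 0)
          printf("phase took %f seconds\n", elapsed);

      MPI_Finalize();
      return 0;
  }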
Collective Communication - Broadcast • Involves all the processes in a communicator. • Broadcast – a single process sends the same data to every process. • Syntax: int MPI_Bcast(void *msg, int count, MPI_Datatype type, int root, MPI_Comm comm); • Should be called by all processes with same arguments for root and comm. • A broadcast message CANNOT be received by calling MPI_Recv.
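A sketch of a broadcast: rank 0 sets a parameter and every process, including the root, makes the same MPI_Bcast call (the value 100 is arbitrary):

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, n = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0)
          n = 100;                        /* value known only at the root */

      /* same call on every process: root 0 sends, all the others receive */
      MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

      printf("rank %d now has n = %d\n", rank, n);

      MPI_Finalize();
      return 0;
  }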
Collective Communication – Scatter • Scatter – distributes a block of memory among the processes. • Syntax: MPI_Scatter(void *send_buf, int send_count, MPI_Datatype send_type, void *recv_buf, int recv_count, MPI_Datatype recv_type, int root, MPI_Comm comm); • The contents of send_buf are split into one segment per process, each consisting of send_count items, and sent to the processes in rank order. The send arguments are significant only in the root process.
Collective Communication – Gather • Gather – each process sends a block of data, and the root process concatenates the blocks in rank order. • Syntax: MPI_Gather(void *send_buf, int send_count, MPI_Datatype send_type, void *recv_buf, int recv_count, MPI_Datatype recv_type, int root, MPI_Comm comm); • [Diagram: scatter splits the root's data a0, a1, a2 across the processes; gather collects a0, a1, a2 from the processes back onto the root.]
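A sketch combining the two calls: the root scatters one integer to each process, each process scales its value, and the root gathers the results back in rank order (buffer names are illustrative):

  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size, mine;
      int *send = NULL, *recv = NULL;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0) {                    /* send/recv buffers matter only at the root */
          send = malloc(size * sizeof(int));
          recv = malloc(size * sizeof(int));
          for (int i = 0; i < size; i++)
              send[i] = i;
      }

      MPI_Scatter(send, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
      mine *= 10;                         /* each process works on its own piece */
      MPI_Gather(&mine, 1, MPI_INT, recv, 1, MPI_INT, 0, MPI_COMM_WORLD);

      if (rank == 0) {
          for (int i = 0; i < size; i++)
              printf("recv[%d] = %d\n", i, recv[i]);
          free(send);
          free(recv);
      }

      MPI_Finalize();
      return 0;
  }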
Collective Comm. – vector variants • Varying count of data from each process • Can also specify displacements in root • MPI_Gatherv ( void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *displs, MPI_Datatype recvtype, int root, MPI_Comm comm ) • MPI_Scatterv (void *sendbuf, int *sendcnts, int *displs, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm )
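A sketch of MPI_Gatherv in which process i contributes i + 1 integers; the root builds the recvcnts and displs arrays itself (all names are illustrative):

  #include <stdlib.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int mycount = rank + 1;             /* varying amount of data per process */
      int *mydata = malloc(mycount * sizeof(int));
      for (int i = 0; i < mycount; i++)
          mydata[i] = rank;

      int *recvcnts = NULL, *displs = NULL, *recvbuf = NULL;
      if (rank == 0) {
          recvcnts = malloc(size * sizeof(int));
          displs   = malloc(size * sizeof(int));
          int total = 0;
          for (int i = 0; i < size; i++) {
              recvcnts[i] = i + 1;        /* count expected from rank i         */
              displs[i]   = total;        /* offset where rank i's block starts */
              total += recvcnts[i];
          }
          recvbuf = malloc(total * sizeof(int));
      }

      MPI_Gatherv(mydata, mycount, MPI_INT,
                  recvbuf, recvcnts, displs, MPI_INT, 0, MPI_COMM_WORLD);

      free(mydata);
      MPI_Finalize();
      return 0;
  }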
Collective comm. – Allgather / Alltoall • MPI_Allgather(void *send_buf, int send_count, MPI_Datatype send_type, void *recv_buf, int recv_count, MPI_Datatype recv_type, MPI_Comm comm); • Can be thought of as Gather except that all processes receive the result (instead of just the root). • MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm); • Like Allgather except that each process sends distinct data to each of the receivers. • The jth block sent from process i is received by process j and placed in the ith block of recvbuf.
[Diagram: after allgather, every process holds the blocks a0, b0, c0 contributed by all processes; after alltoall, process j holds the jth blocks aj, bj, cj sent by each process.]
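A sketch of MPI_Allgather: each rank contributes one integer and every rank ends up with the complete array (printed here on rank 0 only):

  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int mine = rank * rank;                  /* this rank's contribution     */
      int *all = malloc(size * sizeof(int));   /* receive buffer on EVERY rank */

      MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

      if (rank == 0)
          for (int i = 0; i < size; i++)
              printf("all[%d] = %d\n", i, all[i]);

      free(all);
      MPI_Finalize();
      return 0;
  }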
Collective Communication – Reduce • Reduce – all the processes contribute data, which is combined using a binary operation. • Typical binary operations – addition, max, min, logical and, etc. • Syntax: MPI_Reduce(void *operand, void *result, int count, MPI_Datatype type, MPI_Op op, int root, MPI_Comm comm); • op – binary operator, e.g. MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, etc. • All processes must call MPI_Reduce with the same values for count, type, op, root and comm.
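A sketch of a reduction: each process computes a local value and the root combines them with MPI_SUM (the local value rank + 1 is just a stand-in for a real partial result):

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int local  = rank + 1;              /* stand-in for a locally computed result */
      int global = 0;                     /* meaningful only at the root            */

      MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (rank == 0)
          printf("sum over %d processes = %d\n", size, global);  /* size*(size+1)/2 */

      MPI_Finalize();
      return 0;
  }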
Other reduce operations • MPI_Allreduce (void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm) • Same as MPI_Reduce except the result appears in the recvbuf of all processes in the group. • Other reduction operations • MPI_Reduce_scatter • MPI_Scan
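A sketch of MPI_Allreduce computing a global dot product that every rank needs (the local vector length N = 4 is an arbitrary choice):

  #include <stdio.h>
  #include "mpi.h"

  #define N 4                             /* local vector length (arbitrary) */

  int main(int argc, char *argv[]) {
      int rank;
      double x[N], y[N], local = 0.0, global = 0.0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      for (int i = 0; i < N; i++) {       /* fill the local pieces of x and y */
          x[i] = rank + 1.0;
          y[i] = 2.0;
          local += x[i] * y[i];
      }

      /* every rank receives the full dot product in 'global' */
      MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

      printf("rank %d: dot product = %f\n", rank, global);

      MPI_Finalize();
      return 0;
  }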
Derived datatypes • MPI can also handle user-defined datatypes. • Type constructor functions – describe the member types and their relative locations in memory. • MPI_Type_struct(int count, int *blocklengths, MPI_Aint *displacements, MPI_Datatype *types, MPI_Datatype *newtype) • Must always call MPI_Type_commit before using a constructed datatype.
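A sketch of building a datatype for a simple C struct. Newer MPI versions spell the constructor MPI_Type_create_struct (the older MPI-1 name MPI_Type_struct shown above was deprecated and later removed), and offsetof supplies the displacements; the struct and the ranks used are illustrative, and at least two processes are assumed:

  #include <stddef.h>                     /* offsetof */
  #include "mpi.h"

  typedef struct {
      int    id;
      double value;
  } particle_t;                           /* illustrative application struct */

  int main(int argc, char *argv[]) {
      int rank;
      MPI_Datatype particle_type;

      int          blocklens[2] = { 1, 1 };
      MPI_Aint     displs[2]    = { offsetof(particle_t, id),
                                    offsetof(particle_t, value) };
      MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
      MPI_Type_commit(&particle_type);    /* must commit before first use */

      particle_t p = { 7, 3.14 };
      if (rank == 0)
          MPI_Send(&p, 1, particle_type, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(&p, 1, particle_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      MPI_Type_free(&particle_type);
      MPI_Finalize();
      return 0;
  }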
Derived datatypes – pack/unpack • Explicitly store non-contiguous data into a contiguous buffer for transmission, then unpack it at the other end. • When sending/receiving packed messages, must use MPI_PACKED datatype in send/receive calls. • MPI_Pack (void *inbuf, int incount, MPI_Datatype datatype, void *outbuf, int outsize, int *position, MPI_Comm comm) • MPI_Unpack (void *inbuf, int insize, int *position, void *outbuf, int outcount, MPI_Datatype datatype, MPI_Comm comm)
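A sketch of pack/unpack: rank 0 packs an int and a double into one buffer, sends it with the MPI_PACKED datatype, and rank 1 unpacks in the same order (the 100-byte buffer size is an arbitrary but sufficient choice; at least two processes are assumed):

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int    rank, n = 42, position;
      double x = 2.5;
      char   buf[100];                    /* packing buffer */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          position = 0;                   /* pack the items back to back */
          MPI_Pack(&n, 1, MPI_INT,    buf, 100, &position, MPI_COMM_WORLD);
          MPI_Pack(&x, 1, MPI_DOUBLE, buf, 100, &position, MPI_COMM_WORLD);
          MPI_Send(buf, position, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(buf, 100, MPI_PACKED, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          position = 0;                   /* unpack in the same order */
          MPI_Unpack(buf, 100, &position, &n, 1, MPI_INT,    MPI_COMM_WORLD);
          MPI_Unpack(buf, 100, &position, &x, 1, MPI_DOUBLE, MPI_COMM_WORLD);
          printf("rank 1 unpacked n = %d, x = %f\n", n, x);
      }

      MPI_Finalize();
      return 0;
  }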
Subcommunicators • Mechanism for treating a subset of processes as a collection • Consists of : • Group – ordered collection of processes • Context – a tag attached to a group • MPI_Group_incl ( MPI_Group group, int n, int *ranks, MPI_Group *group_out) • MPI_Comm_create ( MPI_Comm comm, MPI_Group group, MPI_Comm *comm_out )
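A sketch that builds a communicator containing only the even-numbered ranks (names such as even_comm are illustrative); note that MPI_Comm_create is collective, so every process in the parent communicator must call it, and non-members get MPI_COMM_NULL back:

  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size;
      MPI_Group world_group, even_group;
      MPI_Comm  even_comm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int nev = (size + 1) / 2;               /* ranks 0, 2, 4, ... join the group */
      int *even_ranks = malloc(nev * sizeof(int));
      for (int i = 0; i < nev; i++)
          even_ranks[i] = 2 * i;

      MPI_Comm_group(MPI_COMM_WORLD, &world_group);            /* group behind the communicator */
      MPI_Group_incl(world_group, nev, even_ranks, &even_group);
      MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm); /* collective over MPI_COMM_WORLD */

      if (even_comm != MPI_COMM_NULL) {                        /* only members get a valid comm  */
          int newrank;
          MPI_Comm_rank(even_comm, &newrank);
          printf("world rank %d is rank %d among the evens\n", rank, newrank);
          MPI_Comm_free(&even_comm);
      }

      MPI_Group_free(&even_group);
      MPI_Group_free(&world_group);
      free(even_ranks);
      MPI_Finalize();
      return 0;
  }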
Topologies • Linear process ranks often do not reflect the logical communication pattern of the processes. • Optional mapping mechanism between processes and hardware. • Grid (Cartesian) and graph topologies are supported. • MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart) – dims gives the number of processes per dimension, periods marks which dimensions are circular.
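A sketch that arranges the processes in a two-dimensional, non-periodic grid and lets each process look up its coordinates; MPI_Dims_create chooses the grid shape from the number of processes:

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, size;
      int dims[2]    = { 0, 0 };          /* let MPI choose the grid shape    */
      int periods[2] = { 0, 0 };          /* non-circular in both dimensions  */
      int coords[2];
      MPI_Comm grid_comm;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      MPI_Dims_create(size, 2, dims);     /* e.g. 6 processes -> a 3 x 2 grid */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);

      MPI_Comm_rank(grid_comm, &rank);    /* rank may be reordered in the grid */
      MPI_Cart_coords(grid_comm, rank, 2, coords);
      printf("rank %d sits at grid position (%d, %d)\n", rank, coords[0], coords[1]);

      MPI_Comm_free(&grid_comm);
      MPI_Finalize();
      return 0;
  }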