MPI Message Passing Interface Yvon Kermarrec
More readings • “Parallel programming with MPI”, Peter Pacheco, Morgan Kaufmann Publishers • LAM/MPI User Guide: http://www.lam-mpi.org/tutorials/lam/ • The MPI standard is available from http://www.mpi-forum.org/
Agenda • Part 0 – the context • Slides extracted from a lecture from Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application
Serial Computing • A 1,000-piece puzzle, assembled by one person • Takes 10 hours
Parallelism on Shared Memory • Orange and brown share the puzzle on the same table • Takes 6 hours (not 5, due to communication & contention)
The more, the better?? • Lack of seats (Resource limit) • More contention among people
Parallelism on Distributed Systems • Scalable seats (Scalable Resource) • Less contention from private memory spaces
How to share the puzzle? • DSM (Distributed Shared Memory) • Message Passing
DSM (Distributed Shared Memory) • Provides shared memory physically or virtually • Pros - Easy to use • Cons - Limited Scalability, High coherence overhead
Message Passing • Pros – Scalable, Flexible • Cons – Generally considered harder to program than DSM
Agenda • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application
Agenda • Part 0 – the context • Slides extracted from a lecture from Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application
We need more computational power • The weather forecast example by P. Pacheco: • Suppose we wish to predict the weather over the United States and Canada for the next 48 hours • Also suppose that we want to model the atmosphere from sea level to an altitude of 20 km • We use a cubical grid to model the atmosphere, with each cube measuring 0.1 km on a side, i.e. 10^3 cubes per km^3: 2.0 x 10^7 km^2 x 20 km x 10^3 cubes per km^3 = 4 x 10^11 grid points • Suppose we need to compute 100 instructions for each point to advance the forecast by one hour, for 48 hours: we need 4 x 10^13 x 48 operations • If our computer executes 10^9 operations/sec, we need about 23 days
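A quick check of these figures, as a minimal sketch in plain C (an added illustration; it only reproduces the back-of-the-envelope numbers above):

#include <stdio.h>

int main(void)
{
    double area_km2      = 2.0e7;  /* surface of the US and Canada */
    double height_km     = 20.0;   /* atmosphere modelled up to 20 km */
    double cubes_per_km3 = 1.0e3;  /* 0.1 km cubes -> 10^3 cubes per km^3 */
    double ops_per_point = 100.0;  /* operations per grid point per forecast hour */
    double hours         = 48.0;
    double ops_per_sec   = 1.0e9;  /* speed of the sequential machine */

    double grid_points = area_km2 * height_km * cubes_per_km3;  /* 4 x 10^11 */
    double total_ops   = grid_points * ops_per_point * hours;   /* ~1.9 x 10^15 */
    double days        = total_ops / ops_per_sec / 86400.0;     /* ~22-23 days */

    printf("grid points: %.1e, operations: %.1e, runtime: %.1f days\n",
           grid_points, total_ops, days);
    return 0;
}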
The need for parallel programming • We face numerous challenges in science (biology, simulation, earthquakes, …) and we cannot build fast enough computers… • Data can be big (big data…) and memory is rather limited • Processors can do a lot… but to address figures such as those above we can program smarter, yet that is not enough
The need for parallel machines • We can build parallel machines, but there is still a huge amount of work to be done: • Decide on and implement an interconnection network for the processors and memory modules • Design and implement system software for the hardware • Design algorithms and data structures to solve our problem • Divide the algorithms and data structures into subproblems • Identify the communications and data exchanges • Assign subproblems to processors
The need for parallel machines • Flynn's taxonomy (or how to work more!) • SISD: Single Instruction – Single Data: the common and classical machine… • SIMD: Single Instruction – Multiple Data: the same instruction is carried out simultaneously on multiple data items • MIMD: Multiple Instructions – Multiple Data • SPMD: Single Program – Multiple Data: the same program is replicated and run on different data (the style MPI programs typically follow; see the sketch below)
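A minimal sketch of the SPMD style (an added illustration, not taken from the original slides): every process runs the same executable and simply branches on its rank:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("rank 0: I could coordinate the work\n");            /* one branch of ... */
    else
        printf("rank %d: I work on my share of the data\n", rank);  /* ... the same program */

    MPI_Finalize();
    return 0;
}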
The need for parallel machines • We can build one parallel computer… but that would be very expensive, time and energy consuming… and hard to maintain • We may want to integrate what is available in the labs – to aggregate the available computing resources and reuse ordinary machines: • US Department of Energy and the PVM project (Parallel Virtual Machine), from '89
MPI : Message Passing Interface ? • MPI: an interface • A message-passing library specification • An extended message-passing model • Not a language or compiler specification • Not a specific implementation or product • For parallel computers, clusters, and heterogeneous networks • A rich set of features • Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers
MPI ? • An international product • Early vendor systems (Intel's NX, IBM's EUI, TMC's CMMD) were not portable • Early portable systems (PVM, p4, TCGMSG, Chameleon) were mainly research efforts • They were rather limited… and lacked vendor support • They were not implemented at the most efficient level • The MPI Forum was organized in 1992 with broad participation by: • Vendors: IBM, Intel, TMC, SGI, Convex… • Users: application scientists and library writers
How big is the MPI library? • Huge (125 functions)… • Basic (6 functions) • But only a subset is needed to program a distributed application
Environments for parallel programming • Upshot, Jumpshot, and the MPE tools: http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/ • Pallas VAMPIR: http://www.vampir.eu/ • ParaGraph: http://www.ncsa.uiuc.edu/Apps/MCS/ParaGraph/ParaGraph.html
A Minimal MPI Program in C
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}
Finding Out About the Environment • Two important questions that arise early in a parallel program are: • How many processes are participating in this computation? • Which one am I? • MPI provides functions to answer these questions: • MPI_Comm_size reports the number of processes. • MPI_Comm_rank reports the rank, a number between 0 and size-1, identifying the calling process
Better Hello (C)
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
Some Basic Concepts • Processes can be collected into groups. • Each message is sent in a context, and must be received in the same context. • A group and context together form a communicator. • A process is identified by its rank in the group associated with a communicator. • There is a default communicator whose group contains all initial processes, called MPI_COMM_WORLD.
MPI Datatypes • The data in a message to be sent or received is described by a triple (address, count, datatype), where • an MPI datatype is recursively defined as: • predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION) • a contiguous array of MPI datatypes • an indexed array of blocks of datatypes • an arbitrary structure of datatypes • There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs, or a row of a matrix stored columnwise
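For instance, a minimal sketch (an added illustration; the helper name and buffer are assumptions) of a derived datatype for one row of an n x m matrix of doubles stored columnwise: in column-major order the elements of a row are a stride of n apart, which MPI_Type_vector describes directly:

#include "mpi.h"

/* Build a datatype describing one row of an n x m matrix stored
   column by column: m elements, each 1 double, stride n apart. */
MPI_Datatype make_row_type(int n, int m)
{
    MPI_Datatype row;
    MPI_Type_vector(m, 1, n, MPI_DOUBLE, &row);  /* count, blocklength, stride */
    MPI_Type_commit(&row);                       /* must be committed before use */
    return row;
}

/* Hypothetical usage, sending row i of the matrix held in 'a' to rank 1:
   MPI_Send(&a[i], 1, make_row_type(n, m), 1, tag, MPI_COMM_WORLD); */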
Basic MPI types (MPI datatype – C datatype)
• MPI_CHAR – char
• MPI_SIGNED_CHAR – signed char
• MPI_UNSIGNED_CHAR – unsigned char
• MPI_SHORT – signed short
• MPI_UNSIGNED_SHORT – unsigned short
• MPI_INT – signed int
• MPI_UNSIGNED – unsigned int
• MPI_LONG – signed long
• MPI_UNSIGNED_LONG – unsigned long
• MPI_FLOAT – float
• MPI_DOUBLE – double
• MPI_LONG_DOUBLE – long double
MPI Tags • Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message. • Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive. • Some non-MPI message-passing systems have called tags “message types”. MPI calls them tags to avoid confusion with datatypes.
MPI blocking send MPI_Send(void *start, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) • The message buffer is described by (start, count, datatype). • dest is the rank of the target process in the given communicator. • tag is the message identification number.
MPI Basic (Blocking) Receive MPI_RECV(start, count, datatype, source, tag, comm, status) • Waits until a matching (on source and tag) message is received from the system, and the buffer can be used. • source is the rank in the communicator specified by comm, or MPI_ANY_SOURCE. • status contains further information. • Receiving fewer than count occurrences of datatype is OK, but receiving more is an error.
Retrieving Further Information • Status is a data structure allocated in the user’s program. • In C:
int recvd_tag, recvd_from, recvd_count;
MPI_Status status;
MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
recvd_tag  = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count( &status, datatype, &recvd_count );
More info • A receive operation may accept messages from an arbitrary sender, but a send operation must specify a unique receiver. • Source equals destination is allowed, that is, a process can send a message to itself.
Why is MPI simple? • Many parallel programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV
Simple full example
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include <mpi.h>

int main(int argc, char *argv[])
{
  const int tag = 42;             /* Message tag */
  int id, ntasks, source_id, dest_id, err, i;
  MPI_Status status;
  int msg[2];                     /* Message array */

  err = MPI_Init(&argc, &argv);   /* Initialize MPI */
  if (err != MPI_SUCCESS) {
    printf("MPI initialization failed!\n");
    exit(1);
  }
  err = MPI_Comm_size(MPI_COMM_WORLD, &ntasks);  /* Get nr of tasks */
  err = MPI_Comm_rank(MPI_COMM_WORLD, &id);      /* Get id of this process */
  if (ntasks < 2) {
    printf("You have to use at least 2 processors to run this program\n");
    MPI_Finalize();               /* Quit if there is only one processor */
    exit(0);
  }
Simple full example (Cont.)
  if (id == 0) {                  /* Process 0 (the receiver) does this */
    for (i = 1; i < ntasks; i++) {
      err = MPI_Recv(msg, 2, MPI_INT, MPI_ANY_SOURCE, tag,
                     MPI_COMM_WORLD, &status);   /* Receive a message */
      source_id = status.MPI_SOURCE;             /* Get id of sender */
      printf("Received message %d %d from process %d\n",
             msg[0], msg[1], source_id);
    }
  } else {                        /* Processes 1 to N-1 (the senders) do this */
    msg[0] = id;                  /* Put own identifier in the message */
    msg[1] = ntasks;              /* and total number of processes */
    dest_id = 0;                  /* Destination address */
    err = MPI_Send(msg, 2, MPI_INT, dest_id, tag, MPI_COMM_WORLD);
  }
  err = MPI_Finalize();           /* Terminate MPI */
  if (id == 0) printf("Ready\n");
  return 0;
}
Agenda • Part 0 – the context • Slides extracted from a lecture from Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application
Collective communications • A single call handles the communication between all the processes in a communicator • There are 3 types of collective communications • Data movement (e.g. MPI_Bcast) • Reduction (e.g. MPI_Reduce) • Synchronization (e.g. MPI_Barrier)
Broadcast • int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm); • One process (root) sends data to all the other processes in the same communicator • Must be called by all the processes with the same arguments
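A minimal usage sketch (an added illustration; the value 100 and the variable names are assumptions): rank 0 sets a parameter and broadcasts it; every rank passes the same root, count, and datatype:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) n = 100;                        /* e.g. a problem size read on the root */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* afterwards every rank has n == 100 */

    printf("rank %d sees n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}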
Gather • int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm) • One process (root) collects data from all the processes in the same communicator • Must be called by all the processes with the same arguments
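A minimal usage sketch (an added illustration, assumed variable names): each rank contributes one int; note that recvcnt is the count received from each process, not the total, and recvbuf is only significant on the root:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, mine, *all = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    mine = rank * rank;                    /* this process's contribution */
    if (rank == 0)
        all = malloc(size * sizeof(int));  /* only the root needs the receive buffer */

    MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("from rank %d: %d\n", i, all[i]);
        free(all);
    }
    MPI_Finalize();
    return 0;
}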
Gather to All • int MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm) • Every process gathers the data from all the other processes in the same communicator (no root: everyone ends up with the full result) • Must be called by all the processes with the same arguments
Reduction • int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm) • One process (root) collects data from all the processes in the same communicator and combines it with an operation • MPI_SUM, MPI_MIN, MPI_MAX, MPI_PROD, logical AND, OR, XOR, and a few more • MPI_Op_create(): user-defined operator
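A minimal usage sketch (an added illustration): each rank contributes one int and the root obtains their sum with MPI_SUM:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, mine, total = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    mine = rank + 1;                 /* each rank contributes rank + 1 */
    MPI_Reduce(&mine, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)                   /* only the root holds the combined result */
        printf("sum of 1..%d = %d\n", size, total);

    MPI_Finalize();
    return 0;
}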
Synchronization • int MPI_Barrier(MPI_Comm comm)
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}
Examples…. • Master and slaves
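A minimal master/slaves sketch (an added illustration in the spirit of this slide, not code from the deck): the master (rank 0) sends one work item to each slave and collects the results:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    const int tag = 1;
    int rank, size, i, work, result;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* master: distribute then collect */
        for (i = 1; i < size; i++) {
            work = 10 * i;                 /* hypothetical work item */
            MPI_Send(&work, 1, MPI_INT, i, tag, MPI_COMM_WORLD);
        }
        for (i = 1; i < size; i++) {
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, tag,
                     MPI_COMM_WORLD, &status);
            printf("master got %d from rank %d\n", result, status.MPI_SOURCE);
        }
    } else {                               /* slave: receive, compute, reply */
        MPI_Recv(&work, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        result = work * work;              /* the "computation" */
        MPI_Send(&result, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}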
For more functions… • http://www.mpi-forum.org • http://www.llnl.gov/computing/tutorials/mpi/ • http://www.nersc.gov/nusers/help/tutorials/mpi/intro/ • http://www-unix.mcs.anl.gov/mpi/tutorial/ • MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/) • Open MPI (http://www.open-mpi.org/) • http://w3.pppl.gov/~ethier/MPI_OpenMP_2011.pdf