
MPI Message Passing Interface



Presentation Transcript


  1. MPI Message Passing Interface Yvon Kermarrec

  2. More readings • “Parallel programming with MPI”, Peter Pacheco, Morgan Kaufmann Publishers • LAM/MPI User Guide: http://www.lam-mpi.org/tutorials/lam/ • The MPI standard is available from http://www.mpi-forum.org/

  3. Agenda • Part 0 – the context • Slides extracted from a lecture from Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application

  4. Serial Computing • A 1000-piece puzzle • Takes 10 hours

  5. Parallelism on Shared Memory • Orange and brown share the puzzle on the same table • Takes 6 hours (not 5, due to communication & contention)

  6. The more, the better?? • Lack of seats (Resource limit) • More contention among people

  7. Parallelism on Distributed Systems • Scalable seats (Scalable Resource) • Less contention from private memory spaces

  8. How to share the puzzle? • DSM (Distributed Shared Memory) • Message Passing

  9. DSM (Distributed Shared Memory) • Provides shared memory physically or virtually • Pros - Easy to use • Cons - Limited Scalability, High coherence overhead

  10. Message Passing • Pros – Scalable, Flexible • Cons – Generally considered more difficult to program than DSM

  11. Agenda • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application

  12. Agenda • Part 0 – the context • Slides extracted from a lecture from Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application

  13. We need more computational power • The weather forecast example by P. Pacheco: • Suppose we wish to predict the weather over the United States and Canada for the next 48 hours • Also suppose that we want to model the atmosphere from sea level to an altitude of 20 km • We use a cubical grid, with each cube measuring 0.1 km on a side, to model the atmosphere: 2.0 x 10^7 km^2 x 20 km x 10^3 cubes per km^3 = 4 x 10^11 grid points • Suppose we need to compute 100 instructions for each point for each of the next 48 hours: we need 4 x 10^13 x 48 operations • If our computer executes 10^9 operations/sec, we need about 23 days

  14. The need for parallel programming • We face numerous challenges in science (biology, simulation, earthquakes, …) and we cannot build fast enough computers… • Data can be big (big data…) and memory is rather limited • Processors can do a lot… but to address figures like those mentioned, we can program smarter, and even that is not enough

  15. The need for parallel machines • We can build parallel machines, but there is still a huge amount of work to be done: • decide on and implement an interconnection network for the processors and memory modules • design and implement system software for the hardware • design algorithms and data structures to solve our problem • divide the algorithms and data structures into subproblems • identify the communications and data exchanges • assign subproblems to processors

  16. The need for parallel machines • Flynn’s taxonomy (or how to work more!) • SISD: Single Instruction – Single Data: the common and classical machine… • SIMD: Single Instruction – Multiple Data: the same instruction is carried out simultaneously on multiple data items • MIMD: Multiple Instructions – Multiple Data • SPMD: Single Program – Multiple Data: the same version of the program is replicated and run on different data

  17. The need for parallel machines • We could build one big parallel computer… but that would be very expensive, time- and energy-consuming, … and hard to maintain • We may want to integrate what is available in the labs – to aggregate the available computing resources and reuse ordinary machines: • US Dept. of Energy and the PVM project (Parallel Virtual Machine) from ’89

  18. MPI: Message Passing Interface? • MPI: an interface • A message-passing library specification • extended message-passing model • not a language or compiler specification • not a specific implementation or product • For parallel computers, clusters, and heterogeneous networks • A rich set of features • Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers

  19. MPI? • An international product • Early vendor systems (Intel’s NX, IBM’s EUI, TMC’s CMMD) were not portable • Early portable systems (PVM, p4, TCGMSG, Chameleon) were mainly research efforts • were rather limited… and lacked vendor support • were not implemented at the most efficient level • The MPI Forum organized in 1992 with broad participation by: • vendors: IBM, Intel, TMC, SGI, Convex … • users: application scientists and library writers

  20. How big is the MPI library? • Huge (125 functions)… • Basic (6 functions) • But only a subset is needed to program a distributed application

  21. Environments for parallel programming • Upshot, Jumpshot, and MPE tools: http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/ • Pallas VAMPIR: http://www.vampir.eu/ • Paragraph: http://www.ncsa.uiuc.edu/Apps/MCS/ParaGraph/ParaGraph.html

  22. A Minimal MPI Program in C

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}

  23. Finding Out About the Environment • Two important questions that arise early in a parallel program are: • How many processes are participating in this computation? • Which one am I? • MPI provides functions to answer these questions: • MPI_Comm_size reports the number of processes. • MPI_Comm_rank reports the rank, a number between 0 and size-1, identifying the calling process

  24. Better Hello (C)

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

  25. Some Basic Concepts • Processes can be collected into groups. • Each message is sent in a context, and must be received in the same context. • A group and context together form a communicator. • A process is identified by its rank in the group associated with a communicator. • There is a default communicator whose group contains all initial processes, called MPI_COMM_WORLD.

  26. MPI Datatypes • The data in a message to be sent or received is described by a triple (address, count, datatype) • An MPI datatype is recursively defined as: • predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION) • a contiguous array of MPI datatypes • an indexed array of blocks of datatypes • an arbitrary structure of datatypes • There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs, or a row of a matrix stored columnwise.

  27. Basic MPI types

MPI datatype        C datatype
MPI_CHAR            char
MPI_SIGNED_CHAR     signed char
MPI_UNSIGNED_CHAR   unsigned char
MPI_SHORT           signed short
MPI_UNSIGNED_SHORT  unsigned short
MPI_INT             signed int
MPI_UNSIGNED        unsigned int
MPI_LONG            signed long
MPI_UNSIGNED_LONG   unsigned long
MPI_FLOAT           float
MPI_DOUBLE          double
MPI_LONG_DOUBLE     long double

  28. MPI Tags • Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message. • Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive. • Some non-MPI message-passing systems have called tags “message types”. MPI calls them tags to avoid confusion with datatypes.

  29. MPI blocking send

int MPI_Send(void *start, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

• The message buffer is described by (start, count, datatype). • dest is the rank of the target process in the given communicator. • tag is the message identification number.

  30. MPI Basic (Blocking) Receive

int MPI_Recv(void *start, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

• Waits until a matching (on source and tag) message is received from the system, and the buffer can be used. • source is a rank in the communicator specified by comm, or MPI_ANY_SOURCE. • status contains further information • Receiving fewer than count occurrences of datatype is OK, but receiving more is an error.

  31. Retrieving Further Information • status is a data structure allocated in the user’s program. • In C:

int recvd_tag, recvd_from, recvd_count;
MPI_Status status;
MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
recvd_tag  = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count(&status, datatype, &recvd_count);

  32. More info • A receive operation may accept messages from an arbitrary sender, but a send operation must specify a unique receiver. • Source equals destination is allowed, that is, a process can send a message to itself.

  33. Why is MPI simple? • Many parallel programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV

  34. Simple full example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int tag = 42;            /* Message tag */
    int id, ntasks, source_id, dest_id, err, i;
    MPI_Status status;
    int msg[2];                    /* Message array */

    err = MPI_Init(&argc, &argv);  /* Initialize MPI */
    if (err != MPI_SUCCESS) {
        printf("MPI initialization failed!\n");
        exit(1);
    }
    err = MPI_Comm_size(MPI_COMM_WORLD, &ntasks);  /* Get nr of tasks */
    err = MPI_Comm_rank(MPI_COMM_WORLD, &id);      /* Get id of this process */
    if (ntasks < 2) {
        printf("You have to use at least 2 processors to run this program\n");
        MPI_Finalize();            /* Quit if there is only one processor */
        exit(0);
    }

  35. Simple full example (Cont.)

    if (id == 0) {        /* Process 0 (the receiver) does this */
        for (i = 1; i < ntasks; i++) {
            err = MPI_Recv(msg, 2, MPI_INT, MPI_ANY_SOURCE, tag,
                           MPI_COMM_WORLD, &status);  /* Receive a message */
            source_id = status.MPI_SOURCE;            /* Get id of sender */
            printf("Received message %d %d from process %d\n",
                   msg[0], msg[1], source_id);
        }
    } else {              /* Processes 1 to N-1 (the senders) do this */
        msg[0] = id;      /* Put own identifier in the message */
        msg[1] = ntasks;  /* and total number of processes */
        dest_id = 0;      /* Destination address */
        err = MPI_Send(msg, 2, MPI_INT, dest_id, tag, MPI_COMM_WORLD);
    }

    err = MPI_Finalize(); /* Terminate MPI */
    if (id == 0)
        printf("Ready\n");
    return 0;
}

  36. Agenda • Part 0 – the context • Slides extracted from a lecture from Hanjun Kin, Princeton U. • Part 1 – Introduction • Basics of Parallel Computing • Six-function MPI • Point-to-Point Communications • Part 2 – Advanced features of MPI • Collective Communication • Part 3 – examples and how to program an MPI application

  37. Collective communications • A single call handles the communication between all the processes in a communicator • There are 3 types of collective communications • Data movement (e.g. MPI_Bcast) • Reduction (e.g. MPI_Reduce) • Synchronization (e.g. MPI_Barrier)

  38. Broadcast • int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm); • One process (root) sends data to all the other processes in the same communicator • Must be called by all the processes with the same arguments [diagram: MPI_Bcast from the root to P1–P4]

  39. Gather • int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm) • One process (root) collects data from all the other processes in the same communicator • Must be called by all the processes with the same arguments [diagram: MPI_Gather from P1–P4 to the root]

  40. Gather to All • int MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm) • All processes collect data from all the other processes in the same communicator • Must be called by all the processes with the same arguments [diagram: MPI_Allgather among P1–P4]

  41. Reduction • int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm) • One process (root) collects data from all the other processes in the same communicator and performs an operation on the data • MPI_SUM, MPI_MIN, MPI_MAX, MPI_PROD, logical AND, OR, XOR, and a few more • MPI_Op_create(): user-defined operator [diagram: MPI_Reduce from P1–P4 to the root]

  42. Synchronization • int MPI_Barrier(MPI_Comm comm)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}

  43. Examples…. • Master and slaves

  44. For more functions… • http://www.mpi-forum.org • http://www.llnl.gov/computing/tutorials/mpi/ • http://www.nersc.gov/nusers/help/tutorials/mpi/intro/ • http://www-unix.mcs.anl.gov/mpi/tutorial/ • MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/) • Open MPI (http://www.open-mpi.org/) • http://w3.pppl.gov/~ethier/MPI_OpenMP_2011.pdf
