Lecture 2: Part II Message Passing Programming: MPI • Introduction to MPI • MPI programming • Running an MPI program • Architecture of MPICH
What is MPI? • A message passing library specification • message-passing model • not a compiler specification • not a specific product • For parallel computers, clusters and heterogeneous networks. • Full-featured
Why use MPI? (1) Message passing is now a mature programming paradigm • well understood • efficient match to hardware • many applications
Why use MPI? (2) • Full range of desired features • modularity • access to peak performance • portability • heterogeneity • subgroups • topologies • performance measurement tools
Who Designed MPI? • Vendors • IBM, Intel, TMC, SGI, Meiko, Cray, Convex, nCUBE, … • Library writers • PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda, DP (HKU), PM (Japan), AM (Berkeley), FM (HPVM at Illinois) • Application specialists and consultants
Vendor-Supported MPI • HP-MPI Hewlett-Packard; Convex SPP • MPI-F IBM SP1/SP2 • Hitachi/MPI Hitachi • SGI/MPI SGI PowerChallenge series • MPI/DE NEC • INTEL/MPI Intel Paragon (iCC lib) • T.MPI Telmat Multinode • Fujitsu/MPI Fujitsu AP1000 • EPCC/MPI Cray & EPCC, T3D/T3E Cho-Li Wang
Public-Domain MPI • MPICH Argonne National Lab. & Mississippi State Univ. • LAM Ohio Supercomputer Center • MPICH/NT Mississippi State University • MPI-FM Illinois (Myrinet) • MPI-AM UC Berkeley (Myrinet) • MPI-PM RWCP, Japan (Myrinet) • MPI-CCL California Institute of Technology
Public-Domain MPI • CRI/EPCC MPI Cray Research and Edinburgh Parallel Computing Centre (Cray T3D/E) • MPI-AP Australian National University- CAP Research Program (AP1000) • W32MPI Illinois, Concurrent Systems • RACE-MPI Hughes Aircraft Co. • MPI-BIP INRIA, France (Myrinet)
Communicator Concept in MPI • Identifies the process group and context with respect to which the operation is to be performed
Communicator within Communicator • Four communicators • Processes in different communicators cannot communicate • The same process can exist in different communicators
Features of MPI (1) go • General • Communicators combine context and group for message security • Predefined communicator MPI_COMM_WORLD
Features of MPI (2) Point-to-point communication • Structured buffers and derived data types, heterogeneity • Modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered
Features of MPI (3) Collective Communication • Both built-in and user-defined collective operations • Large number of data movement routines • Subgroups defined directly or by topology • E.g., broadcast, barrier, reduce, scatter, gather, all-to-all, …
Writing MPI programs • MPI comprises 125 functions • Many parallel programs can be written with just 6 basic functions
Six basic functions (1) • MPI_INIT: Initiate an MPI computation. int MPI_Init ( int *argc, char ***argv ) • MPI_FINALIZE: Terminate a computation. int MPI_Finalize ( void )
Six basic functions (2) • MPI_COMM_SIZE: Determine the number of processes in a communicator • MPI_COMM_RANK: Determine the identifier (rank) of a process in a specific communicator
int MPI_Comm_size ( comm, size ) MPI_Comm comm; int *size; • int MPI_Comm_rank ( comm, rank ) MPI_Comm comm; int *rank;
Six basic functions (3) • MPI_SEND: Send a message from one process to another • MPI_RECV: Receive a message sent by another process
int MPI_Send( buf, count, datatype, dest, tag, comm ) void *buf; int count, dest, tag; MPI_Datatype datatype; MPI_Comm comm; tag distinguishes different types of messages dest is a rank in comm
int MPI_Recv( buf, count, datatype, source, tag, comm, status ) void *buf; int count, source, tag; MPI_Datatype datatype; MPI_Comm comm; MPI_Status *status;
A simple program

Program main
begin
  MPI_INIT()                                 Initiate computation
  MPI_COMM_SIZE(MPI_COMM_WORLD, count)       Find the number of processes
  MPI_COMM_RANK(MPI_COMM_WORLD, myid)        Find the process ID of the current process
  print(“I am ”, myid, “ of ”, count)        Each process prints its output
  MPI_FINALIZE()                             Shut down
end
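The same program written in C with the six-function subset (a minimal sketch; assumes an MPI installation and compilation with mpicc):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int count, myid;

    MPI_Init(&argc, &argv);                  /* initiate the computation */
    MPI_Comm_size(MPI_COMM_WORLD, &count);   /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);    /* rank of this process */

    printf("I am %d of %d\n", myid, count);  /* each process prints its output */

    MPI_Finalize();                          /* shut down */
    return 0;
}
```

Run with 4 processes, each rank prints one line; the lines may appear in any order, as the result slide below shows.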
I’m 3 of 4 I’m 1 of 4 I’m 0 of 4 I’m 2 of 4 Result Process 3 Process 1 Process 0 Process 2
Send Receive Transmission Point-to-Point Communication The basic point-to-point communication operators are send and receive. Buffer Buffer Sender Receiver
Another simple program (2 nodes)

……
MPI_COMM_RANK(MPI_COMM_WORLD, myid)
if myid = 0                                  I’m process 0!
  MPI_SEND(“Zero”,…,…,1,…,…)
  MPI_RECV(words,…,…,1,…,…,…)
else                                         I’m process 1!
  MPI_RECV(words,…,…,0,…,…,…)
  MPI_SEND(“One”,…,…,0,…,…)
end if
print(“Received from ”, words)
……
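A C sketch of this exchange (buffer size and tag value are illustrative choices; assumes an MPI installation and exactly 2 processes):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myid;
    char words[16];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        /* process 0 sends first, then waits for process 1's reply */
        MPI_Send("Zero", 5, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(words, 16, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
    } else {
        /* process 1 receives first, then sends its own message */
        MPI_Recv(words, 16, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send("One", 4, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
    printf("Received from %s\n", words);

    MPI_Finalize();
    return 0;
}
```

Note the send/receive ordering mirrors the slide: pairing a send on one side with a receive on the other avoids the deadlock that two simultaneous blocking receives would cause.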
Process 0: sends “Zero” to process 1, then sets up a buffer (words) and waits for the message from process 1. Process 1: sets up a buffer (words) and waits for the message from process 0, then sends “One” to process 0. Each process then executes print(“Received from ”, words).
Received from One Received from Zero Result Process 0 Process 1
Collective Communication (1) • Communication that involves a group of processes: one sender’s buffer is transmitted to the buffers of several receivers
Collective Communication (2) Three Types • Barrier • MPI_BARRIER • Data movement • MPI_BCAST • MPI_GATHER • MPI_SCATTER • Reduction operations • MPI_REDUCE
MPI_BARRIER • Used to synchronize execution of a group of processes: each process waits at the barrier until all processes have reached it, then the barrier disappears and all processes continue together
int MPI_Barrier ( comm ) MPI_Comm comm; • int MPI_Bcast ( buffer, count, datatype, root, comm ) void *buffer; int count; MPI_Datatype datatype; int root; MPI_Comm comm;
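A short C sketch combining the two calls above (the value 99 and root rank 0 are illustrative; assumes an MPI installation):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 99;              /* only the root has the data initially */

    /* every process calls MPI_Bcast: root 0 sends, all others receive */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process %d now has value %d\n", rank, value);

    MPI_Barrier(MPI_COMM_WORLD); /* synchronize before shutting down */
    MPI_Finalize();
    return 0;
}
```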
Data movement (1) MPI_BCAST • A single process sends the same data to all other processes, itself included (e.g., the root’s buffer “FACE” is copied to every process)
Data movement (2) MPI_GATHER • Every process (including the root) sends its data to one process, which stores the pieces in rank order (e.g., “F”, “A”, “C”, “E” from processes 0–3 are gathered at the root as “FACE”)
int MPI_Gather ( sendbuf, sendcnt, sendtype, recvbuf, recvcount, recvtype, root, comm ) void *sendbuf; int sendcnt; MPI_Datatype sendtype; void *recvbuf; int recvcount; MPI_Datatype recvtype; int root; MPI_Comm comm;
Examples 1) Gather 100 ints from every process in the group to the root (at the root, rbuf holds gsize blocks of 100 ints, in rank order):

MPI_Comm comm;
int root, myrank, *rbuf, gsize, sendbuf[100];
……
MPI_Comm_size (comm, &gsize);
rbuf = (int *) malloc (gsize*100*sizeof(int));
MPI_Gather (sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);
2) As above, but allocate the receive buffer only at the root (rbuf is significant only there):

MPI_Comm comm;
int root, myrank, *rbuf, gsize, sendbuf[100];
……
MPI_Comm_rank (comm, &myrank);
if (myrank == root) {
  MPI_Comm_size (comm, &gsize);
  rbuf = (int *) malloc (gsize*100*sizeof(int));
}
MPI_Gather (sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);
Data movement (3) MPI_SCATTER • A process sends out a message that is split into several equal parts, and the ith portion is sent to the ith process (e.g., the root’s “FACE” is scattered as “F”, “A”, “C”, “E” to processes 0–3)
int MPI_Scatter ( sendbuf, sendcnt, sendtype, recvbuf, recvcnt, recvtype, root, comm ) void *sendbuf; int sendcnt; MPI_Datatype sendtype; void *recvbuf; int recvcnt; MPI_Datatype recvtype; int root; MPI_Comm comm;
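A C sketch of MPI_Scatter distributing one int to each process (the values 10*i and root rank 0 are illustrative; assumes an MPI installation with at most 64 processes):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, recvbuf;
    int sendbuf[64];             /* significant only at the root */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            sendbuf[i] = 10 * i; /* the ith element goes to rank i */

    /* split the root's buffer into equal parts: one int per process */
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process %d received %d\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
}
```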
Data movement (4) MPI_REDUCE (e.g., find the maximum value) • Combines the values of all processes, using a specified operation, and returns the combined value to one process (e.g., values 8, 9, 3, 7 held by processes 0–3 reduce under max to 9 at the root)
int MPI_Reduce ( sendbuf, recvbuf, count, datatype, op, root, comm ) void *sendbuf; void *recvbuf; int count; MPI_Datatype datatype; MPI_Op op; int root; MPI_Comm comm;
Predefined operations • MPI_MAX maximum • MPI_MIN minimum • MPI_SUM sum • MPI_PROD product • MPI_LAND logical and • MPI_BAND bit-wise and • MPI_LOR logical or • MPI_BOR bit-wise or • MPI_LXOR logical exclusive or • MPI_BXOR bit-wise exclusive or • MPI_MAXLOC maximum and its location • MPI_MINLOC minimum and its location
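A C sketch of MPI_Reduce with MPI_MAX, matching the slide’s example (per-process values are illustrative; assumes an MPI installation):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value, maximum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = (rank * 7) % 10;     /* some per-process value */

    /* combine every process's value with MPI_MAX; result lands on rank 0 */
    MPI_Reduce(&value, &maximum, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Maximum value is %d\n", maximum);

    MPI_Finalize();
    return 0;
}
```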
Example program (1) Calculating the value of π by numerical integration: π = ∫₀¹ 4/(1 + x²) dx, approximated by summing the areas of n small intervals
Example program (2)

……
MPI_BCAST(numprocs, …, …, 0, …)              Broadcast the number of processes
for (i = myid + 1; i <= n; i += numprocs)    Each process calculates its assigned areas:
  compute the area for each interval         compute the area for each interval and
  accumulate the result in sum               accumulate the result in the process’s data (sum)
MPI_REDUCE(&sum, …, …, …, MPI_SUM, 0, …)     Sum up all the areas
if (myid == 0)                               Print the result
  Output result
……
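The pseudocode above corresponds to the classic parallel pi program; a C sketch (variable names n, myid, numprocs follow the slide; assumes an MPI installation):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int n = 1000000, myid, numprocs, i;
    double h, x, sum = 0.0, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* broadcast the number of intervals chosen by process 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = 1.0 / n;
    for (i = myid + 1; i <= n; i += numprocs) { /* every numprocs-th interval */
        x = h * (i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    sum *= h;                                    /* this process's partial area */

    /* add up the partial sums; the total arrives at process 0 */
    MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("pi is approximately %.10f\n", pi);

    MPI_Finalize();
    return 0;
}
```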
Result: the intervals are calculated in round-robin order by processes 0, 1, 2, and 3; the reduced sum gives π = 3.141…
MPICH - A Portable Implementation of MPI Argonne National Laboratory
What is MPICH? • The first complete and portable implementation of the full MPI standard. • ‘CH’ stands for “Chameleon”, a symbol of adaptability and portability. • It contains a programming environment for working with MPI programs. • It includes a portable startup mechanism and libraries.
How can I install it? • Unpack the package mpich.tar.gz into a directory • Use ‘./configure’ and ‘make >& make.log’ to choose the appropriate architecture and device and compile • Syntax: ./configure -device=DEVICE -arch=ARCH_TYPE • ARCH_TYPE: specifies the type of machine to be configured • DEVICE: specifies what kind of communication device the system will use, e.g. ch_p4 (TCP/IP)
How to run an MPI Program • Edit mpich/util/machines/machines.xxxx to contain the names of machines of architecture xxxx, one per line. For example, for the computers mercury, venus, earth, and mars, the file should be in the format:
mercury
venus
earth
mars
How to run an MPI Program • #include “mpi.h” in the source program. • Compile the program with the ‘mpicc’ command, e.g. mpicc -c foo.c • Use ‘mpirun’ to run the MPI program; mpirun determines the environment for the program to run.
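A hypothetical session tying these steps together (the file name foo.c and process count are illustrative; assumes MPICH is installed and the machines file is set up):

```shell
# compile and link with the MPI wrapper compiler
mpicc -o foo foo.c

# launch 4 processes on the machines listed in the machines file
mpirun -np 4 foo
```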