Message Passing Interface (MPI) and Parallel Algorithm Design
What is MPI?
• A message passing library specification
  • message-passing model
  • not a compiler specification
  • not a specific product
• For parallel computers, clusters and heterogeneous networks
• Full-featured
Why use MPI? (1)
• Message passing is now mature as a programming paradigm
  • well understood
  • efficient match to hardware
  • many applications
Who Designed MPI?
• Vendors
  • IBM, Intel, Sun, SGI, Meiko, Cray, Convex, Ncube, …
• Research labs
  • PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda, PM (Japan RWCP), AM (Berkeley), FM (HPVM at Illinois)
Vendor-Supported MPI
• HP-MPI: HP; Convex SPP
• MPI-F: IBM SP1/SP2
• Hitachi/MPI: Hitachi
• SGI/MPI: SGI PowerChallenge series
• MPI/DE: NEC
• INTEL/MPI: Intel Paragon (iCC lib)
• T.MPI: Telmat Multinode
• Fujitsu/MPI: Fujitsu AP1000
• EPCC/MPI: Cray & EPCC, T3D/T3E
Research MPI
• MPICH: Argonne National Lab. & Mississippi State U.
• LAM: Ohio Supercomputer Center
• MPICH/NT: Mississippi State U.
• MPI-FM: Illinois (Myrinet)
• MPI-AM: UC Berkeley (Myrinet)
• MPI-PM: RWCP, Japan (Myrinet)
• MPI-CCL: Calif. Tech.
Research MPI (continued)
• CRI/EPCC MPI: Cray Research and Edinburgh Parallel Computing Centre (Cray T3D/E)
• MPI-AP: Australian National U., CAP Research Program (AP1000)
• W32MPI: Illinois, Concurrent Systems
• RACE-MPI: Hughes Aircraft Co.
• MPI-BIP: INRIA, France (Myrinet)
Language Binding
• MPI 1: C, Fortran (for MPICH-based implementation)
• MPI 2: C, C++, Fortran
• Java:
  • Through the Java Native Interface (JNI): mpiJava, JavaMPI
  • MPI package implemented in pure Java: MPIJ (DOGMA project)
  • JMPI (by MPI Software Technology)
“Communicator”
• Identifies the process group and context with respect to which an operation is to be performed
• In a parallel environment, processes need to know each other (“naming”: machine name, IP address, process ID)
Communicator (2)
• Communicators within a communicator (the figure shows four communicators)
• Processes in different communicators cannot communicate with each other
• The same process can exist in several communicators
[Figure: a set of processes grouped into four communicators, with some processes belonging to more than one.]
Point-to-point Communication
• The basic point-to-point communication operations are send and receive.
• Communication modes:
  • standard mode (blocking and non-blocking)
  • synchronous mode
  • ready mode (to allow access to fast protocols)
  • buffered mode
  • …
Collective Communication
• Communication that involves a group of processes
• E.g., broadcast, barrier, reduce, scatter, gather, all-to-all, …
Writing MPI programs
• MPI comprises 125 functions
• Many parallel programs can be written with just 6 basic functions
Six basic functions (1)
1. MPI_INIT: Initiate an MPI computation
2. MPI_FINALIZE: Terminate a computation
3. MPI_COMM_SIZE: Determine the number of processes in a communicator
4. MPI_COMM_RANK: Determine the identifier (rank) of the calling process in a specific communicator
5. MPI_SEND: Send a message from one process to another process
6. MPI_RECV: Receive a message sent by another process
A simple program
Program main
begin
  MPI_INIT()                                // Initiate computation
  MPI_COMM_SIZE(MPI_COMM_WORLD, count)      // Find the number of processes
  MPI_COMM_RANK(MPI_COMM_WORLD, myid)       // Find the process ID of the current process
  print("I am ", myid, " of ", count)       // Each process prints out its output
  MPI_FINALIZE()                            // Shut down
end
Result (run with 4 processes; the order in which the lines appear is not guaranteed)
• Process 0: I am 0 of 4
• Process 1: I am 1 of 4
• Process 2: I am 2 of 4
• Process 3: I am 3 of 4
Another program (2 nodes)
…..
MPI_COMM_RANK(MPI_COMM_WORLD, myid)
if myid=0
  MPI_SEND("Zero",…,…,1,…,…)
  MPI_RECV(words,…,…,1,…,…,…)
else
  MPI_RECV(words,…,…,0,…,…,…)
  MPI_SEND("One",…,…,0,…,…)
END IF
print("Received from %s",words)
……
[Figure: process 0 ("I'm process 0!") runs the if branch, sending "Zero" to process 1 and then receiving; process 1 ("I'm process 1!") runs the else branch, receiving first and then sending "One" to process 0.]
Result
• Process 0: Received from One
• Process 1: Received from Zero
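The exchange above can be imitated without an MPI installation. The following is a minimal sketch in plain Python, assuming one thread per process and one queue per process as a mailbox. The helpers `send`, `recv`, `program` and `run` are hypothetical stand-ins for illustration, not MPI calls, and `send` here never blocks, which corresponds most closely to a buffered send.

```python
import queue
import threading

# One mailbox per simulated process (rank). These helpers only imitate
# MPI_SEND / MPI_RECV for two ranks; they are not part of any MPI library.
mailboxes = {0: queue.Queue(), 1: queue.Queue()}
received = {}


def send(data, dest):
    """Deliver data to the destination rank's mailbox (never blocks)."""
    mailboxes[dest].put(data)


def recv(myid):
    """Block until a message arrives in this rank's own mailbox."""
    return mailboxes[myid].get(timeout=5)


def program(myid):
    """Same control flow as the two-node pseudocode above."""
    if myid == 0:
        send("Zero", dest=1)
        words = recv(myid)
    else:
        words = recv(myid)
        send("One", dest=0)
    received[myid] = words
    print(f"Process {myid}: Received from {words}")


def run():
    """Start both simulated processes and wait for them to finish."""
    threads = [threading.Thread(target=program, args=(rank,)) for rank in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    run()
```

Process 1 receives before it sends, so the two branches pair up: each receive has a matching send on the other side. With only two ranks a mailbox can only hold messages from the other rank, which is why this sketch omits the source argument that MPI_RECV takes.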
Collective Communication
Three Types of Collective Operations
• Barrier
  • for process synchronization
  • MPI_BARRIER
• Data movement
  • moving data among processes
  • no computation
  • MPI_BCAST, MPI_GATHER, MPI_SCATTER
• Reduction operations
  • involve computation
  • MPI_REDUCE, MPI_SCAN
Barrier
MPI_BARRIER
• Used to synchronize execution of a group of processes
• All members reach the same point before any can proceed
[Figure: processes 1, 2, …, p compute and then perform the barrier; a process that arrives early blocks (wait time) until all have arrived, then all continue execution.]
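The barrier rule can be observed with threads standing in for processes. This is a small sketch using Python's `threading.Barrier`, which is only an analogy for MPI_BARRIER. The function name `run_with_barrier` and the event log are illustrative assumptions.

```python
import threading


def run_with_barrier(num_procs=4):
    """Run num_procs simulated processes that meet at one barrier.

    Returns the log of ("before" | "after", rank) events in the order
    they happened.
    """
    barrier = threading.Barrier(num_procs)
    events = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            events.append(("before", rank))  # work done before the barrier
        barrier.wait()                        # block until all ranks arrive
        with lock:
            events.append(("after", rank))   # work done after the barrier

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for phase, rank in run_with_barrier():
        print(phase, rank)
```

Whatever the scheduling, every "before" entry appears in the log ahead of every "after" entry, because no thread leaves `barrier.wait()` until all of them have called it.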
Data Movement
• Broadcast:
  • one member sends the same message to all members
• Scatter:
  • one member sends a different message to each member
• Gather:
  • every member sends a message to a single member
• All-to-all broadcast:
  • every member performs a broadcast
• All-to-all scatter-gather (Total Exchange):
  • every member performs a scatter (and gather)
MPI Collective Communications
• Broadcast (MPI_Bcast)
• Reduce, or combine-to-one (MPI_Reduce)
• Scatter (MPI_Scatter)
• Gather (MPI_Gather)
• Collect (MPI_Allgather)
• Combine-to-all (MPI_Allreduce)
• Scan (MPI_Scan)
• All-to-all (MPI_Alltoall)
Data movement (1) FACE FACE FACE MPI_BCAST • One single process sends the same data to all other processes, itself included BCAST BCAST BCAST BCAST FACE FACE Process 0 Process 1 Process 2 Process 3
Data movement (2)
MPI_GATHER
• All processes (including the root process) send their data to one process, which stores them in rank order
[Figure: processes 0 to 3 hold "F", "A", "C" and "E"; after all call GATHER, the root process holds "FACE".]
Data movement (3)
MPI_SCATTER
• A process sends out a message, which is split into several equal parts, and the ith portion is sent to the ith process
[Figure: one process holds "FACE"; after all call SCATTER, processes 0 to 3 hold "F", "A", "C" and "E".]
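The three data-movement patterns can be written down as plain list operations. In this sketch a list indexed by rank stands for "what each process holds". The functions `bcast`, `gather` and `scatter` are hypothetical models of the semantics described above, not bindings to an MPI library.

```python
def bcast(values, root):
    """Broadcast: every process ends up with the root's value."""
    return [values[root]] * len(values)


def gather(values, root):
    """Gather: the root collects one item per process, in rank order.

    Non-root processes receive nothing, modeled here as None.
    """
    return [list(values) if rank == root else None for rank in range(len(values))]


def scatter(message, num_procs):
    """Scatter: split message into num_procs equal parts; part i goes to rank i."""
    if len(message) % num_procs != 0:
        raise ValueError("message length must be divisible by num_procs")
    size = len(message) // num_procs
    return [message[i * size:(i + 1) * size] for i in range(num_procs)]


if __name__ == "__main__":
    print(bcast(["FACE", None, None, None], root=0))
    print(gather(["F", "A", "C", "E"], root=0))
    print(scatter("FACE", 4))
```

Using the "FACE" data from the figures, scatter followed by gather at the same root returns the original letters in rank order, which is one way to see that the two operations are inverses of each other.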
Data movement (4)
MPI_REDUCE (e.g., find maximum value)
• Combines the values from all processes, using a specified operation, and returns the combined value to one process
[Figure: processes 0 to 3 hold 8, 9, 3 and 7; after all call REDUCE with max, one process holds 9.]
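MPI_SCAN
• Scan (parallel prefix): "partial" reduction based upon relative process number
• Each process receives the reduction of the inputs of all processes up to and including itself
• Scan op: +
  Input:  2 1 4 1 1
  Result: 2 3 7 8 9
[Figure: one input value per process, combined left to right with +.]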
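The difference between reduce and scan is easy to check on the numbers from the two figures. This sketch models both with Python's standard library. `reduce_like` and `scan_like` are illustrative names, and a list indexed by rank again stands for one value per process.

```python
import operator
from functools import reduce
from itertools import accumulate


def reduce_like(values, op):
    """Combine one value per process into a single result (as MPI_REDUCE does)."""
    return reduce(op, values)


def scan_like(values, op):
    """Inclusive prefix reduction: entry i combines values[0..i] (as MPI_SCAN does)."""
    return list(accumulate(values, op))


if __name__ == "__main__":
    print(reduce_like([8, 9, 3, 7], max))             # maximum over all processes
    print(scan_like([2, 1, 4, 1, 1], operator.add))   # running sums per process
```

The last entry of a scan equals the full reduction with the same operation, so a reduce can be thought of as a scan in which only the final process's result is kept.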
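Example program (1)
Calculating the value of π by numerical integration
[Formula: shown as an image in the original slide and not recoverable from this transcript.]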
Example program (2)
……
MPI_BCAST(numprocs, …, …, 0, …)
for (i = myid + 1; i <= n; i += numprocs)
    compute the area for each interval
    accumulate the result in processes’ program data (sum)
MPI_REDUCE(&sum, …, …, …, MPI_SUM, 0, …)
if (myid == 0)
    Output result
……
[Figure: process 0 announces "Start calculation!"; the areas under the curve are labeled as calculated by processes 0, 1, 2 and 3; each process reports "OK!" and the combined result is π = 3.141...]
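The loop structure of the example can be tried out serially. The sketch below assumes the integrand is 4/(1+x²) on [0, 1] evaluated with the midpoint rule, which is the usual choice for this exercise. That formula is an assumption, because the slide's own formula did not survive. The function names are hypothetical, and the "processes" are simply loop iterations whose partial sums are added, standing in for MPI_REDUCE with MPI_SUM.

```python
def partial_sum(myid, numprocs, n):
    """Area contributed by one process.

    Uses the same interval assignment as the slide's loop:
    for (i = myid + 1; i <= n; i += numprocs)
    """
    h = 1.0 / n                      # width of each interval
    total = 0.0
    for i in range(myid + 1, n + 1, numprocs):
        x = h * (i - 0.5)            # midpoint of interval i
        total += 4.0 / (1.0 + x * x)  # assumed integrand, see lead-in
    return h * total


def estimate_pi(numprocs=4, n=1000):
    """Add up the partial sums of all simulated processes."""
    partials = [partial_sum(rank, numprocs, n) for rank in range(numprocs)]
    return sum(partials)


if __name__ == "__main__":
    print(estimate_pi(numprocs=4, n=1000))
```

Because the loop starts at `myid + 1` and steps by `numprocs`, the intervals are dealt out to the processes in round-robin fashion, and every interval is handled by exactly one process regardless of how many processes take part.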