PARALLEL COMPUTING WITH MPI

Anne Weill-Zrahia, with acknowledgments to Cornell Theory Center

Introduction to Parallel Computing • Parallel computer : a set of processors that work cooperatively to solve a computational problem • Distributed computing : a number of processors communicating over a network • Metacomputing : use of several parallel computers

Why parallel computing • Single processor performance – limited by physics • Multiple processors – break down problem into simple tasks or domains • Plus – obtain same results as in sequential program, faster. • Minus – need to rewrite code

Parallel classification • Parallel architectures : Shared Memory / Distributed Memory • Programming paradigms : Data parallel / Message passing

Shared memory (diagram: four processors P sharing one Memory)

Shared Memory • Each processor can access any part of the memory • Access times are uniform (in principle) • Easier to program (no explicit message passing) • Bottleneck when several tasks access same location

Data-parallel programming • Single program defining operations • Single memory • Loosely synchronous (completion of loop) • Parallel operations on array elements

Distributed Memory • Processor can only access local memory • Access times depend on location • Processors must communicate via explicit message passing

Distributed Memory (diagram: processors, each with its own memory, joined by an interconnection network)

Message Passing Programming • Separate program on each processor • Local Memory • Control over distribution and transfer of data • Additional complexity of debugging due to communications

Performance issues • Concurrency – ability to perform actions simultaneously • Scalability – performance is not impaired by increasing number of processors • Locality – high ratio of local to remote memory accesses (i.e. low communication)

SP2 Benchmark • Goal : checking performance of real-world applications on the SP2 • Execution time (seconds) : CPU time for applications • Speedup = (execution time for 1 processor) / (execution time for p processors)

WHAT is MPI? • A message-passing library specification • Extended message-passing model • Not specific to implementation or computer

BASICS of MPI PROGRAMMING • MPI is a message-passing library • Assumes : a distributed memory architecture • Includes : routines for performing communication (exchange of data and synchronization) among the processors.

Message Passing • Data transfer + synchronization • Synchronization : the act of bringing one or more processes to known points in their execution • Distributed memory: memory split up into segments, each may be accessed by only one process.

Message Passing May I send? yes Send data

MPI STANDARD • Standard by consensus, designed in an open forum • Introduced by the MPI FORUM in May 1994, updated in June 1995 • MPI-2 (1997) provides extensions to the MPI standard

IS MPI Large or Small? • A large number of features has been included (blocking/non-blocking, collective vs. point-to-point, efficiency features) • However, a small subset of functions is sufficient

Why use MPI ? • Standardization • Portability • Performance • Richness • Designed to enable libraries

Writing an MPI Program • If there is a serial version, make sure it is debugged • If not, try to write a serial version first • When debugging in parallel, start with a few nodes

Format of MPI routines

Six useful MPI functions

Communication routines

End MPI part of program

#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    int tag = 100;
    int rank, size, i;
    MPI_Status status;
    char message[12];
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    strcpy(message, "Hello,world");
    if (rank == 0) {
        /* Rank 0 sends the message to every other process. */
        for (i = 1; i < size; i++) {
            MPI_Send(message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
        }
    } else {
        /* Every other process receives the message from rank 0. */
        MPI_Recv(message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    }
    printf("node %d : %s \n", rank, message);
    MPI_Finalize();
    return 0;
}

MPI Messages • DATA – data to be sent • ENVELOPE – information to route the data

Description of MPI_Send (MPI_Recv)

Some useful remarks • Source = MPI_ANY_SOURCE in a receive means that any source is acceptable • Tags specified by sender and receiver must match, unless the receiver specifies MPI_ANY_TAG, in which case any tag is acceptable • Communicator must be the same for send/receive. Usually : MPI_COMM_WORLD

POINT-TO-POINT COMMUNICATION • Transmission of a message between one pair of processes • Programmer can choose mode of transmission

MODE of TRANSMISSION • Can be chosen by programmer : Synchronous mode, Ready mode, Buffered mode • …or let the system decide : Standard mode

BLOCKING /NON-BLOCKING COMMUNICATIONS

BLOCKING STANDARD SEND (Size > threshold) • Sender (S) : MPI_SEND; task waits until data transfer from source is complete • Transfer begins when MPI_RECV has been posted • Receiver (R) : MPI_RECV; task waits, then continues when data transfer to buffer is complete

NON BLOCKING STANDARD SEND (Size > threshold) • Sender (S) : MPI_ISEND, then MPI_WAIT; task waits at MPI_WAIT until data transfer from source is complete • Transfer begins when MPI_IRECV has been posted • Receiver (R) : MPI_IRECV, then MPI_WAIT; no interruption if wait is late enough

BLOCKING STANDARD SEND (Size <= threshold) • Sender (S) : MPI_SEND; data transfer from source complete • Transfer to buffer on receiver • Receiver (R) : MPI_RECV; task continues when data transfer to user's buffer is complete

NON BLOCKING STANDARD SEND (Size <= threshold) • Sender (S) : MPI_ISEND, then MPI_WAIT; data transfer from source complete, no delay even though message is not yet in buffer on R • Transfer to buffer can be avoided if MPI_IRECV posted early enough • Receiver (R) : MPI_IRECV, then MPI_WAIT; no delay if wait is late enough

BLOCKING COMMUNICATION

NON-BLOCKING

Deadlock program (cont)
      if ( irank.EQ.0 ) then
         idest = 1
         isrc = 1
         isend_tag = ITAG_A
         irecv_tag = ITAG_B
      else if ( irank.EQ.1 ) then
         idest = 0
         isrc = 0
         isend_tag = ITAG_B
         irecv_tag = ITAG_A
      end if
C ----------------------------------------------------------------
C     send and receive messages
C ----------------------------------------------------------------
      print *, " Task ", irank, " has sent the message"
      call MPI_Send ( rmessage1, MSGLEN, MPI_REAL, idest, isend_tag,
     .                MPI_COMM_WORLD, ierr )
      call MPI_Recv ( rmessage2, MSGLEN, MPI_REAL, isrc, irecv_tag,
     .                MPI_COMM_WORLD, istatus, ierr )
      print *, " Task ", irank, " has received the message"
      call MPI_Finalize (ierr)
      end

DEADLOCK example (diagram: tasks A and B each call MPI_SEND, then MPI_RECV)

Deadlock example • SP2 implementation : no receive has been posted yet, so both processes block • Solutions : different ordering, non-blocking calls, MPI_Sendrecv

Determining Information about Messages • Wait • Test • Probe

MPI_WAIT • Useful for both sender and receiver of non-blocking communications • Receiving process blocks until message is received, under programmer control • Sending process blocks until send operation completes, at which time the message buffer is available for re-use

MPI_WAIT (diagram: sender S, receiver R; labels: compute, transmit, MPI_WAIT)

MPI_TEST (diagram: sender S, receiver R; labels: MPI_Isend, compute, transmit, MPI_TEST)

MPI_TEST • Used for both sender and receiver of non-blocking communication • Non-blocking call • Receiver checks to see if a specific sender has sent a message that is waiting to be delivered ... messages from all other senders are ignored

MPI_TEST (cont.) • Sender can find out whether the message buffer can be re-used; it has to wait until the operation is complete before re-using it

MPI_PROBE • Receiver is notified when messages from potentially any sender arrive and are ready to be processed. • Blocking call
