Parallel Programming on the SGI Origin2000

Parallel Programming on the SGI Origin2000 Taub Computer Center Technion Moshe Goldberg, mgold@tx.technion.ac.il With thanks to Igor Zacharov / Benoit Marchand, SGI Mar 2004 (v1.2)

Parallel Programming on the SGI Origin2000 • Parallelization Concepts • SGI Computer Design • Efficient Scalar Design • Parallel Programming -OpenMP • Parallel Programming- MPI

2) Parallel Programming-MPI

Parallel classification • Parallel architectures Shared Memory / Distributed Memory • Programming paradigms Data parallel / Message passing

Shared Memory • Each processor can access any part of the memory • Access times are uniform (in principle) • Easier to program (no explicit message passing) • Bottleneck when several tasks access same location

Distributed Memory • Processor can only access local memory • Access times depend on location • Processors must communicate via explicit message passing

Distributed Memory Processor Memory Processor Memory Interconnection network

Message Passing Programming • Separate program on each processor • Local Memory • Control over distribution and transfer of data • Additional complexity of debugging due to communications

Performance issues • Concurrency – ability to perform actions simultaneously • Scalability – performance is not impaired by increasing number of processors • Locality – high ration of local memory accesses/remote memory accesses (or low communication)

SP2 Benchmark • Goal : Checking performance of real world applications on the SP2 • Execution time (seconds):CPU time for applications • Speedup Execution time for 1 processor = ------------------------------------ Execution time for p processors

WHAT is MPI? • A message- passing library specification • Extended message-passing model • Not specific to implementation or computer

BASICS of MPI PROGRAMMING • MPI is a message-passing library • Assumes : a distributed memory architecture • Includes : routines for performing communication (exchange of data and synchronization) among the processors.

Message Passing • Data transfer + synchronization • Synchronization : the act of bringing one or more processes to known points in their execution • Distributed memory: memory split up into segments, each may be accessed by only one process.

Message Passing May I send? yes Send data

MPI STANDARD • Standard by consensus, designed in an open forum • Introduced by the MPI FORUM in May 1994, updated in June 1995. • MPI-2 (1998) produces extensions to the MPI standard

Why use MPI ? • Standardization • Portability • Performance • Richness • Designed to enable libraries

Writing an MPI Program • If there is a serial version , make sure it is debugged • If not, try to write a serial version first • When debugging in parallel , start with a few nodes first.

Format of MPI routines

Six useful MPI functions

Communication routines

End MPI part of program

program hello include ’mpif.h’ status(MPI_STATUS_SIZE) character*12 messagecall MPI_INIT(ierror) call MPI_COMM_SIZE(MPI_COMM_WORLD, size,ierror) call MPI_COMM_RANK(MPI_COMM_WORLD, rank,ierror)tag = 100 if(rank .eq. 0) then message = 'Hello, world' do i=1, size-1call MPI_SEND(message, 12, MPI_CHARACTER , i, & tag,MPI_COMM_WORLD,ierror) enddo else call MPI_RECV(message, 12, MPI_CHARACTER, 0,tag,MPI_COMM_WORLD, status, ierror) endif print*, 'node', rank, ':', message call MPI_FINALIZE(ierror) end

int main( int argc, char *argv[]){ int tag=100; int rank,size,i; MPI_Status * statuschar message[12]; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&size); MPI_Comm_rank(MPI_COMM_WORLD,&rank); strcpy(message,"Hello,world"); if (rank==0) for (i=1;i<size;i++){ MPI_Send(message,12,MPI_CHAR,i,tag,MPI_COMM_WORLD); } } elseMPI_Recv(message,12,MPI_CHAR,0,tag,MPI_COMM_WORLD,&status); printf("node %d : %s \n",rank,message); MPI_Finalize(); return 0; }

MPI Messages • DATA data to be sent • ENVELOPE – information to route the data.

Description of MPI_Send (MPI_Recv)

Some useful remarks • Source= MPI_ANY_SOURCE means that any source is acceptable • Tags specified by sender and receiver must match, or MPI_ANY_TAG : any tag is acceptable • Communicator must be the same for send/receive. Usually : MPI_COMM_WORLD

POINT-TO-POINT COMMUNICATION • Transmission of a message between one pair of processes • Programmer can choose mode of transmission • Programmer can choose mode of transmission

Can be chosen by programmer …or let the system decide Synchronous mode Ready mode Buffered mode Standard mode MODE of TRANSMISSION

BLOCKING /NON-BLOCKING COMMUNICATIONS

BLOCKING STANDARD SEND Date transfer from source complete MPI_SEND Size>threshold Task waits S R wait Transfer begins when MPI_RECV has been posted MPI_RECV Task continues when data transfer to buffer is complete

NON BLOCKING STANDARD SEND Date transfer from source complete MPI_ISEND MPI_WAIT Size>threshold Task waits S R wait Transfer begins when MPI_IRECV has been posted MPI_IRECV MPI_WAIT No interruption if wait is late enough

BLOCKING STANDARD SEND MPI_SEND Size<=threshold Data transfer from source complete S R Transfer to buffer on receiver MPI_RECV Task continues when data transfer to user’sbuffer is complete

NON BLOCKING STANDARD SEND Date transfer from source complete MPI_ISEND MPI_WAIT Size<=threshold No delay even though message is not yet in buffer on R S R Transfer to buffer can be avoided if MPI_IRECV posted early enough MPI_IRECV MPI_WAIT No delay if wait is late enough

BLOCKING COMMUNICATION

NON-BLOCKING

Deadlock program (cont) if ( irank.EQ.0 ) then idest = 1 isrc = 1 isend_tag = ITAG_A irecv_tag = ITAG_B else if ( irank.EQ.1 ) then idest = 0 isrc = 0 isend_tag = ITAG_B irecv_tag = ITAG_A end if C ----------------------------------------------------------------C send and receive messagesC ------------------------------------------------------------- print *, " Task ", irank, " has sent the message" call MPI_Send ( rmessage1, MSGLEN, MPI_REAL, idest, isend_tag, . MPI_COMM_WORLD, ierr ) call MPI_Recv ( rmessage2, MSGLEN, MPI_REAL, isrc, irecv_tag, . MPI_COMM_WORLD, istatus, ierr ) print *, " Task ", irank, " has received the message" call MPI_Finalize (ierr) end

DEADLOCK example MPI_RECV MPI_SEND A B MPI_SEND MPI_RECV

Deadlock example • SP2 implementation:No Receive has been posted yet,so both processes block • Solutions Different ordering Non-blocking calls MPI_Sendrecv

Determining Information about Messages • Wait • Test • Probe

MPI_WAIT • Useful for both sender and receiver of non-blocking communications • Receiving process blocks until message is received, under programmer control • Sending process blocks until send operation completes, at which time the message buffer is available for re-use

MPI_WAIT compute transmit S R MPI_WAIT

MPI_TEST MPI_TEST compute transmit S MPI_Isend R

MPI_TEST • Used for both sender and receiver of non-blocking communication • Non-blocking call • Receiver checks to see if a specific sender has sent a message that is waiting to be delivered ... messages from all other senders are ignored

MPI_TEST (cont.) Sender can find out if the message-buffer can be re-used ... have to wait until operation is complete before doing so

MPI_PROBE • Receiver is notified when messages from potentially any sender arrive and are ready to be processed. • Blocking call

Programming recommendations • Blocking calls are needed when: • Tasks must synchronize • MPI_Wait immediately follows communication call

Collective Communication • Establish a communication pattern within a group of nodes. • All processes in the group call the communication routine, with matching arguments. • Collective routine calls can return when their participation in the collective communication is complete.

Parallel Programming on the SGI Origin2000