Parallel Programming on the SGI Origin2000. Taub Computer Center, Technion. Moshe Goldberg, mgold@tx.technion.ac.il. With thanks to Igor Zacharov / Benoit Marchand, SGI. Mar 2004 (v1.2).
Parallel Programming on the SGI Origin2000 • Parallelization Concepts • SGI Computer Design • Efficient Scalar Design • Parallel Programming: OpenMP • Parallel Programming: MPI
Parallel classification • Parallel architectures: shared memory / distributed memory • Programming paradigms: data parallel / message passing
Shared Memory • Each processor can access any part of the memory • Access times are uniform (in principle) • Easier to program (no explicit message passing) • Contention becomes a bottleneck when several tasks access the same location
Distributed Memory • Each processor can access only its own local memory • Access times depend on where the data resides • Processors must communicate via explicit message passing
Distributed Memory [Figure: processor/memory pairs connected through an interconnection network]
Message Passing Programming • A separate program (process) runs on each processor • Each process works on its own local memory • The programmer controls how data is distributed and transferred • Debugging is more complex because of the communication
Performance issues • Concurrency – the ability to perform actions simultaneously • Scalability – performance is not impaired as the number of processors increases • Locality – a high ratio of local to remote memory accesses (i.e., low communication)
SP2 Benchmark • Goal: check the performance of real-world applications on the SP2 • Execution time (seconds): CPU time for the applications • Speedup: $S(p) = \dfrac{T(1)}{T(p)}$, where $T(1)$ is the execution time on 1 processor and $T(p)$ is the execution time on $p$ processors
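As a small illustration of the speedup definition, the C sketch below computes $S(p) = T(1)/T(p)$. It also computes parallel efficiency $E(p) = S(p)/p$, a common companion metric for the scalability point above that is an addition here, not something taken from the slides. The timings in the usage note are made-up numbers, not SP2 measurements.

```c
#include <assert.h>

/* Speedup S(p) = T(1) / T(p), as defined on the slide. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency E(p) = S(p) / p (an added metric, not from the slide).
 * 1.0 means perfectly linear scaling; lower values mean lost performance. */
double efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return speedup(t1, tp) / p;
}
```

For example, a hypothetical run taking 120 s on 1 processor and 20 s on 8 processors has a speedup of 6.0 and an efficiency of 0.75.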
WHAT is MPI? • A message-passing library specification • Extended message-passing model • Not specific to any implementation or computer
BASICS of MPI PROGRAMMING • MPI is a message-passing library • Assumes a distributed-memory architecture • Includes routines for performing communication (exchange of data and synchronization) among the processors
Message Passing • Data transfer + synchronization • Synchronization: the act of bringing one or more processes to known points in their execution • Distributed memory: memory is split into segments, each of which may be accessed by only one process
Message Passing [Figure: the sender asks "May I send?", the receiver answers "yes", then the data is sent]
MPI STANDARD • Standard by consensus, designed in an open forum • Introduced by the MPI Forum in May 1994, updated in June 1995 • MPI-2 (1998) adds extensions to the MPI standard
Why use MPI? • Standardization • Portability • Performance • Richness • Designed to enable libraries
Writing an MPI Program • If there is a serial version, make sure it is debugged • If not, try to write a serial version first • When debugging in parallel, start with a few nodes
      program hello
      include 'mpif.h'
      integer status(MPI_STATUS_SIZE)
      integer size, rank, tag, i, ierror
      character*12 message

      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      tag = 100
C     Rank 0 sends the greeting to every other rank
      if (rank .eq. 0) then
         message = 'Hello, world'
         do i = 1, size-1
            call MPI_SEND(message, 12, MPI_CHARACTER, i,
     &                    tag, MPI_COMM_WORLD, ierror)
         enddo
      else
C     All other ranks receive it from rank 0
         call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
     &                 MPI_COMM_WORLD, status, ierror)
      endif
      print *, 'node', rank, ':', message
      call MPI_FINALIZE(ierror)
      end
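#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int tag = 100;
    int rank, size, i;
    MPI_Status status;
    char message[13];              /* "Hello, world" plus the terminating '\0' */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 sends the greeting to every other rank */
        strcpy(message, "Hello, world");
        for (i = 1; i < size; i++)
            MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
    } else {
        /* All other ranks receive it from rank 0 */
        MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    }
    printf("node %d : %s\n", rank, message);
    MPI_Finalize();
    return 0;
}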
MPI Messages • DATA: the data to be sent • ENVELOPE: information used to route the data (source, destination, tag and communicator)
Some useful remarks • Source = MPI_ANY_SOURCE means that any source is acceptable • Tags specified by sender and receiver must match, or the receiver can use MPI_ANY_TAG to accept any tag • The communicator must be the same for send and receive; usually MPI_COMM_WORLD
POINT-TO-POINT COMMUNICATION • Transmission of a message between one pair of processes • Programmer can choose the mode of transmission
MODE of TRANSMISSION • Can be chosen by the programmer: synchronous mode, ready mode, buffered mode • …or let the system decide: standard mode
BLOCKING STANDARD SEND (message size > threshold) • S calls MPI_SEND and the task waits • Transfer begins when R has posted MPI_RECV • S continues once data transfer from the source is complete • R continues when data transfer to its buffer is complete
NON BLOCKING STANDARD SEND (message size > threshold) • S calls MPI_ISEND, later MPI_WAIT; the task waits in MPI_WAIT until data transfer from the source is complete • Transfer begins when R has posted MPI_IRECV • R calls MPI_WAIT: no interruption if the wait is late enough
BLOCKING STANDARD SEND (message size <= threshold) • MPI_SEND on S returns as soon as data transfer from the source is complete; the data goes to a buffer on the receiver • R calls MPI_RECV; the task continues when data transfer to the user's buffer is complete
NON BLOCKING STANDARD SEND (message size <= threshold) • S calls MPI_ISEND, later MPI_WAIT: no delay, even though the message is not yet in the buffer on R • Transfer to the buffer can be avoided if MPI_IRECV is posted early enough • R calls MPI_IRECV, later MPI_WAIT: no delay if the wait is late enough
Deadlock program (cont)
      if ( irank.EQ.0 ) then
         idest = 1
         isrc = 1
         isend_tag = ITAG_A
         irecv_tag = ITAG_B
      else if ( irank.EQ.1 ) then
         idest = 0
         isrc = 0
         isend_tag = ITAG_B
         irecv_tag = ITAG_A
      end if
C ----------------------------------------------------------------
C     send and receive messages
C ----------------------------------------------------------------
      print *, " Task ", irank, " has sent the message"
      call MPI_Send ( rmessage1, MSGLEN, MPI_REAL, idest, isend_tag,
     .                MPI_COMM_WORLD, ierr )
      call MPI_Recv ( rmessage2, MSGLEN, MPI_REAL, isrc, irecv_tag,
     .                MPI_COMM_WORLD, istatus, ierr )
      print *, " Task ", irank, " has received the message"
      call MPI_Finalize (ierr)
      end
DEADLOCK example [Figure: tasks A and B, each calling MPI_SEND and MPI_RECV with the other as partner]
Deadlock example • SP2 implementation: no receive has been posted yet, so both processes block in MPI_Send • Solutions: different ordering (one task receives first) • non-blocking calls • MPI_Sendrecv
Determining Information about Messages • Wait • Test • Probe
MPI_WAIT • Useful for both sender and receiver of non-blocking communications • Receiving process blocks until message is received, under programmer control • Sending process blocks until send operation completes, at which time the message buffer is available for re-use
MPI_WAIT [Figure: timeline for S and R; computation overlaps the transmission until MPI_WAIT is called]
MPI_TEST [Figure: S calls MPI_Isend, keeps computing while the message is transmitted to R, and calls MPI_TEST]
MPI_TEST • Used for both sender and receiver of non-blocking communication • Non-blocking call • Receiver checks to see if a specific sender has sent a message that is waiting to be delivered ... messages from all other senders are ignored
MPI_TEST (cont.) • The sender can find out whether the message buffer can be reused ... but must wait until the operation is complete before doing so
MPI_PROBE • Receiver is notified when messages from potentially any sender arrive and are ready to be processed. • Blocking call
Programming recommendations • Blocking calls are needed when: • Tasks must synchronize • MPI_Wait would immediately follow the communication call anyway
Collective Communication • Establish a communication pattern within a group of nodes. • All processes in the group call the communication routine, with matching arguments. • Collective routine calls can return when their participation in the collective communication is complete.