MPI Parallel Programming Information Technology Services The University of Hong Kong By HPC/Grid Team, Email: hpc@cc.hku.hk
Must Know Before Programming…
• HKU HPC Facilities
  • hpcpower2: 64-bit Linux cluster consisting of 24 nodes
    • each node has two 64-bit quad-core Intel Xeon CPUs running at 3 GHz
  • hpcpower: 32-bit Linux cluster consisting of 178 nodes
    • 128 nodes with dual 2.8 GHz Xeon processors, and
    • 50 nodes with dual 3.06 GHz Xeon processors
How to get an account • Download the form at http://www.hku.hk/cc/home/services/forms.htm CF-137e : High Performance Computing Cluster (hpcpower) Account Application (for Non-TOSI Staff/Students) • Return the application form to CC office.
How to log in to HPCPOWER
• From any PC within the HKU campus network:
  • Linux: ssh hpcpower (ssh is bundled with Linux)
  • Windows: download PuTTY to use SSH http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
• From a PC outside the HKU campus network: you must use HKUVPN to get an HKU campus network IP
File Transfer
• From/to any PC within the HKU campus network:
  • Linux: scp/sftp hpcpower (scp and sftp are bundled with Linux)
  • Windows: download WinSCP to transfer files over SSH http://winscp.net/eng/download.php
• From a PC outside the HKU campus network: you must use HKUVPN to get an HKU campus network IP
Program Editing
• You can use the commands vi, emacs or pico to edit programs. Please refer to the UNIX user's guide for details. http://www.hku.hk/cc/handbook/unix/unix_toc.html
• Microsoft Windows users: do not use a standard Microsoft Windows editor such as Notepad/Wordpad to edit files that will be used on the Linux system.
• To convert a Windows file to a UNIX file, enter:
    dos2unix file.txt
Resource Management System
• After logging in to the cluster, the user is on the master node. When a program is run, it runs immediately on the master node. This is the "interactive mode", which is convenient for running simple commands like ls, vi, etc. or for editing/compiling a program.
• Any interactive program running on the master node will be killed without further notice.
PBS (Portable Batch System) • Long computing jobs should be submitted through the batch system. • The submitted job will be in a queue waiting for its turn, then will be sent to one or more compute node(s), which the job will have dedicated access to until it finishes. • With PBS, the job will run faster and the cluster will be more efficiently utilized.
Sample PBS job script (pbs.cmd)
    #!/bin/sh
    #PBS -N test
    #PBS -m ea
    #PBS -q qdev
    #PBS -l walltime=02:00:00
    #PBS -l nodes=1:ppn=2
    ### Define number of processors used
    NP=`wc -l < $PBS_NODEFILE`
    cd $PBS_O_WORKDIR
    mpirun -np $NP ./a.out
Sample PBS job script
• A line beginning with # is a comment;
• A line beginning with #PBS is a PBS directive;
• The submit script pbs.cmd specifies the name of the job (test), which queue to use (qdev), that it needs both processors on a single node (nodes=1:ppn=2), that it will run for at most 2 hours (walltime=02:00:00), and that TORQUE should email the user when the job ends or aborts (-m ea). Finally, it specifies where and what to execute: it changes to the directory the job was submitted from ($PBS_O_WORKDIR) and runs ./a.out with mpirun.
Submit a PBS job
    $ qsub pbs.cmd
    2234.hpcpower
• PBS returns a job identifier of the form jobid.hpcpower, where jobid (2234) is an integer assigned by PBS.
• The job identifier is needed for any action involving the job, such as checking the job status or deleting the job.
qsub options
• Resources requested on the qsub command line take precedence over the directive lines in the script file.
• e.g. if the job is submitted with the command qsub -l nodes=2:ppn=8 pbs.cmd, it will run on 2 compute nodes with 8 processors each instead of what is stated in the script file pbs.cmd.
List the status of all submitted jobs
    $ qstat -a
  or
    $ qa
Job information provided:
• Username: Job owner
• NDS: Number of nodes requested
• Req'd Time: Requested amount of wallclock time
• Elap Time: Elapsed time in the current job state
• S: Job state (E: Exiting; R: Running; Q: Queued)
Delete a Job $ qdel 2234 where 2234 is the job id.
MPI (Message Passing Interface)
• Is a standardized communication protocol for parallel programming
• Is a library of functions, not a language
• MPI software is available for free
• MPICH and Open MPI are the most common free implementations
• Many vendors have their own optimized versions of MPI: IBM, Sun, SGI, Myrinet …
What is MPI
• Process: a set of executable instructions (a program) which runs on a processor. For maximum performance, each CPU (or each core in a multicore machine) is assigned just a single process.
• Message passing: the method by which data from one processor's memory is copied to the memory of another processor.
[Figure: processes P0, P1, P2 and P3 exchange data through the Message Passing Interface over a communication network]
What is MPI
• The same program runs on many processors at the same time.
• Single-Program-Multiple-Data (SPMD) style.
    if (my_rank == 0) then
        call Master(....)
    else
        call Worker(....)
    endif
What is MPI
• MPI codes can run on parallel machines with distributed-memory and shared-memory architectures.
• High portability.
• All variables are private to each process.
• Processes communicate via specific communication calls.
General MPI Program Structure
    #include <mpi.h>                         /* MPI include file */

    int main (int argc, char *argv[])
    {
        int np, rank, ierr;                  /* variable declarations */

        ierr = MPI_Init(&argc, &argv);       /* Initialize MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&np);

        /* Do work and make message passing calls */

        ierr = MPI_Finalize();               /* Terminate MPI environment */
        return 0;
    }
MPI Header File
• C : #include "mpi.h"
• FORTRAN : include 'mpif.h'
MPI Function Format
• All MPI functions have names that begin with the prefix MPI_ to avoid name collisions.
• C function names are case sensitive but Fortran function names are not.
• In C the error code is the return value; in Fortran it is returned in the last argument.
    C       : ierror = MPI_Xxxx(parameter...) ;
    FORTRAN : call MPI_Xxxx(argument,..., ierror)
Initialising MPI environment • C : int MPI_Init (int *argc, char*** argv); • FORTRAN : MPI_Init(ierror) INTEGER ierror
Exit MPI environment
• C : int MPI_Finalize()
• FORTRAN : MPI_Finalize(ierror)
    INTEGER ierror
Communicator
• A communicator is a collection of processes that can send messages to each other.
• MPI_COMM_WORLD is predefined and consists of all the processes.
[Figure: the communicator MPI_COMM_WORLD containing processes 0 to 5]
Size
• How many processes are contained within a communicator?
• C : int MPI_Comm_size(MPI_Comm comm, int *size)
• FORTRAN : MPI_Comm_size(comm,size,ierror)
    INTEGER comm,size,ierror
Rank
• How do we identify different processes?
• C : int MPI_Comm_rank(MPI_Comm comm, int *rank)
• FORTRAN : MPI_Comm_rank(comm,rank,ierror)
    INTEGER comm,rank,ierror
• Rank values range from 0 to N-1, where N is the number of processes in the communicator.
A First MPI Program : Helloworld
C
    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char **argv)
    {
        int nproc,myrank,ierr;

        ierr=MPI_Init(&argc,&argv);
        MPI_Comm_size(MPI_COMM_WORLD,&nproc);
        MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
        printf("Hello World! I'm process %d of %d\n",myrank,nproc);
        ierr=MPI_Finalize();
        return 0;
    }
FORTRAN
      Program Helloworld
      INCLUDE 'mpif.h'
      INTEGER nproc,myrank,ierr
      CALL MPI_Init(ierr)
      CALL MPI_Comm_size(MPI_COMM_WORLD,nproc,ierr)
      CALL MPI_Comm_rank(MPI_COMM_WORLD,myrank,ierr)
      Print *, "Hello World! I'm process ",myrank," of ",nproc
      CALL MPI_Finalize(ierr)
      End
Compile MPI in HPCPOWER
• C
    pgcc -Mmpi program.c
• FORTRAN (FIXED FORM)
    pgf90 -Mmpi program.f
• FORTRAN (FREE FORM)
    pgf90 -Mmpi -Mfree program.f90
Execute MPI in HPCPOWER
• Running with 4 processes on the interactive node:
    mpirun -np 4 ./a.out
    Hello World! I'm process 0 of 4
    Hello World! I'm process 1 of 4
    Hello World! I'm process 2 of 4
    Hello World! I'm process 3 of 4
What is in a message • An MPI message is an array of elements of a particular MPI datatype.
Point-to-point communication
A point-to-point communication always involves exactly two processes. One process acts as the sender and the other acts as the receiver.
[Figure: a communicator containing processes 0 to 5, with a message going from the source process to the dest process]
Point-to-point communication
• Communication between 2 processes.
• The source process sends a message to the destination process.
• Communication takes place within a communicator.
• The destination process is identified by its rank in the communicator.
Sending Messages
• C : int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
• FORTRAN : MPI_Send(buf, count, datatype, dest, tag, comm, ierror)
    <type> BUF(*)
    INTEGER count,datatype,dest,tag,comm
    INTEGER ierror
Receiving Messages
• C : int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
• FORTRAN : MPI_Recv(buf, count, datatype, source, tag, comm, status, ierror)
    <type> BUF(*)
    INTEGER count,datatype,source,tag,comm
    INTEGER status(MPI_STATUS_SIZE),ierror
Messages
• Message = envelope + message body
• Envelope
  • Source: the sending process
  • Destination: the receiving process
  • Communicator: specifies the group of processes to which both source and destination belong
  • Tag: used to classify messages
• Message Body
  • Buffer: the message data
  • Datatype: the type of the message data
  • Count: the number of items of type datatype in the buffer
Tag
• The tag is a number specified by the programmer for each message. It is attached to the message being sent.
• When the destination process receives the message, it can check the tag and act accordingly.
Message Matching Rules • Sender must specify a valid destination RANK. • Receiver must specify a valid source RANK. • Sender and receiver must use the same COMMUNICATOR. • TAG must match. • DATATYPE must match. • Receiver's buffer must be large enough.
Example : Greetings(Fortran)
      PROGRAM greetings
      include 'mpif.h'
      integer my_rank
      integer p
      integer source
      integer dest
      integer tag
      character*100 message
      character*10 digit_string
      integer size
      integer status(MPI_STATUS_SIZE)
      integer ierr

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
Example : Greetings(Fortran)
      if (my_rank /= 0) then
         write(digit_string,FMT="(I3)") my_rank
         message = 'Greetings from process ' // trim(digit_string) // '!'
         dest = 0
         tag = 0
         call MPI_Send(message, len_trim(message), &
     &        MPI_CHARACTER, dest, tag, MPI_COMM_WORLD, ierr)
      else ! my_rank == 0
         do source = 1, p-1
            tag = 0
            ! blank the buffer so that no stale characters are printed
            message = ' '
            call MPI_Recv(message, len(message), &
     &           MPI_CHARACTER, source, tag, &
     &           MPI_COMM_WORLD, status, ierr)
            write(6,FMT="(A)") message
         enddo
      endif

      call MPI_Finalize(ierr)
      END PROGRAM greetings
Example : Greetings (C)
    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char* argv[])
    {
        int my_rank;          /* rank of process */
        int p;                /* number of processes */
        int source;           /* rank of sender */
        int dest;             /* rank of receiver */
        int tag = 0;          /* tag for messages */
        char message[100];    /* storage for message */
        MPI_Status status;    /* return status for receive */

        /* Start up MPI */
        MPI_Init(&argc, &argv);

        /* Find out process rank */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        /* Find out number of processes */
        MPI_Comm_size(MPI_COMM_WORLD, &p);
Example : Greetings (C)
        if (my_rank != 0) {
            /* Create message */
            sprintf(message, "Greetings from process %d!", my_rank);
            dest = 0;
            /* Use strlen+1 so that '\0' gets transmitted */
            MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        } else { /* my_rank == 0 */
            for (source = 1; source < p; source++) {
                MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
                printf("%s\n", message);
            }
        }

        /* Shut down MPI */
        MPI_Finalize();
        return 0;
    } /* main */
Example : Greetings
    % mpirun -np 4 ./a.out
    Greetings from process 1 !
    Greetings from process 2 !
    Greetings from process 3 !
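MPI_Sendrecv
• Combines a send to dest and a receive from source in a single call.
• C : int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag, void *recvbuf, int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status *status)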
Wildcard
• MPI_Recv can use MPI_ANY_SOURCE and MPI_ANY_TAG as arguments.
• There is no wildcard for the communicator.
    call MPI_Recv(message, 1, MPI_INTEGER, &
                  MPI_ANY_SOURCE, MPI_ANY_TAG, &
                  MPI_COMM_WORLD, status, ierror)
• MPI_Send cannot use wildcards.