Mpirun command (script)
• Primary job launcher for the MPI implementation
• Can run an application on the local host only or distribute it to run on any number of hosts specified by the user
• Since it is not part of the MPI standard, mpirun is implementation dependent, and each implementation's mpirun differs both in syntax and functionality
Synopsis: SGI MPT for IRIX
mpirun [global options] entry [:entry ...]
• Global options apply to all MPI executables on all specified hosts and must precede local options in the entry
• Each entry describes a host on which to run a program and the local options for that host
Global options (examples)
-d pathname    specifies the working directory for all hosts
-f filename    specifies a text file that contains mpirun arguments
-h             displays a list of options
-v             displays comments on what mpirun is doing when launching the MPI application
Entry operand
Each entry has the following components:
• One or more host names (not needed if run on the local host)
• Number of processes to start on each host
• Name of the executable program
• Arguments to the executable program (optional)
Entry format & local options
Format: HostList LocalOptions Program Arguments
Local options:
-f filename    same as in global options
-np nodes      number of processes
Examples
• 5 instances of program 'test' on the local host:
mpirun -np 5 test
• Same, but use /tmp/mydir as the working directory:
mpirun -d /tmp/mydir -np 5 test
• Different numbers of instances, different programs, different hosts (-np can be omitted):
mpirun host1 5 test1 : host2 7 test2
Synopsis: DEC UNIX MPICH
mpirun [mpirun_options] <prog> [options]
• Options for mpirun (mpirun_options) must come before the program and must be spelled out completely
• Unrecognized options are silently ignored
• Options that are not directed to MPICH (do not begin with -mpi or -p4) are passed through to all processes of the application
Mpirun_options (examples)
-p4wd path              specifies the working directory; should come after <prog>
-machinefile filename   takes the list of machines to run on from a file
-np nodes               specifies the number of processes
-nolocal                does not run on the local machine
-h                      displays help
-v                      displays comments
-t                      tests only; does not actually run
Examples
• Program 'test' on an 8-machine cluster:
mpirun -np 8 -machinefile mf test
• Same, but use /tmp as the working directory:
mpirun -np 8 -machinefile mf test -p4wd /tmp
• Exclude the local host:
mpirun -nolocal -np 8 -machinefile mf test
Synopsis: MPI/Pro for NT
mpirun [mpirun_options] <prog> [options]
• The options for mpirun (mpirun_options) must come before the program you want to run
Mpirun_options (examples)
-r                      registers your password
-d name                 selects a domain other than the default
-np nodes               specifies the number of processes
-wd path                assigns the working directory
-mach_file filename     takes the list of possible machines to run on from a file
-help                   displays help
-version                displays the MPI/Pro version
Examples
• Program 'test' on an 8-machine cluster:
mpirun -d IG -np 8 -mach_file mf test
• Same, but use \tmp as the working directory:
mpirun -d IG -wd \tmp -np 8 -mach_file mf test
• Use the default machines file in the MPI_HOME directory or in the directory where 'test' is:
mpirun -d IG -np 8 test
Exercise 1
• Problem: detect active nodes
• Task: write an MPI program and a MYP program that open each node and write the node id to the screen
• Expected output (3 nodes):
reports node = 0
reports node = 1
reports node = 2
Diagram: the local host launches the job; node 0 reports 0, node 1 reports 1, node 2 reports 2.
MPI solution

PROGRAM mpi_exc1
INCLUDE 'mpif.h'
INTEGER iamnode, ier
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
WRITE(*,*) 'reports node = ', iamnode
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
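For example, assuming the program is compiled into an executable named mpi_exc1, it could be launched on three nodes with one of the mpirun commands described earlier, e.g. mpirun -np 3 mpi_exc1, giving the expected output of Exercise 1.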
MYP solution

PROGRAM myp_exc1
INCLUDE 'mypf.h'
INTEGER iamnode, ier
CALL myp_open(iamnode, ier)
WRITE(*,*) 'reports node = ', iamnode
CALL myp_close(ier)
END
Exercise 2
• Problem: send/receive data
• Task: write an MPI/MYP program that sends the node id to the neighbor, receives a node id, and prints its own node id and the data received
• Expected output (3 nodes):
node = 0 received 2
node = 1 received 0
node = 2 received 1
Diagram: the local host launches the job; node 0 sends 0 to node 1 and reports 2, node 1 sends 1 to node 2 and reports 0, node 2 sends 2 to node 0 and reports 1.
Table
To   = MOD(iamnode+1, nodes)
From = MOD(iamnode+nodes-1, nodes)
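For example, with nodes = 3: node 0 sends to node 1 and receives from node 2, node 1 sends to node 2 and receives from node 0, and node 2 sends to node 0 and receives from node 1, matching the diagram above.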
MPI solution

PROGRAM mpi_exc2
INCLUDE 'mpif.h'
INTEGER iamnode, ier, sbuf, rbuf, to, from
INTEGER stag, rtag, nodes
INTEGER status(mpi_status_size)
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
CALL mpi_comm_size(mpi_comm_world, nodes, ier)
sbuf = iamnode; icount = 1
to = MOD(iamnode+1,nodes); stag = 1000+to
CALL mpi_send(sbuf, icount, mpi_integer, to, stag, &
     mpi_comm_world, ier)
from = MOD(iamnode+nodes-1,nodes)
rtag = 1000+iamnode
CALL mpi_recv(rbuf, icount, mpi_integer, from, rtag, &
     mpi_comm_world, status, ier)
WRITE(*,*) ' node = ', iamnode, ' received ', rbuf
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
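Every node here posts a blocking mpi_send before its mpi_recv; this is fine for a single integer, but a ring of blocking sends can deadlock for large messages. A minimal alternative sketch (not part of the original exercise) replaces the send/receive pair with the combined call mpi_sendrecv, reusing the variables declared above:

CALL mpi_sendrecv(sbuf, icount, mpi_integer, to, stag, &
     rbuf, icount, mpi_integer, from, rtag, &
     mpi_comm_world, status, ier)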
MYP solution

PROGRAM myp_exc2
INCLUDE 'mypf.h'
INTEGER iamnode, ier
REAL x, temp
CALL myp_open(iamnode, ier)
incnode=1; x=iamnode; icount=1
CALL myptoken(x, temp, icount, iamnode, incnode, ier)
WRITE(*,*) ' node = ', iamnode, ' received ', INT(x)
CALL myp_close(ier)
END
Exercise 3
• Problem: add local and received data and send the result to the neighbor; the first node (id = 0) reports the final result
• Task: write an MPI/MYP program in which each node receives data from its neighbor, adds its own id, and sends the result to the next neighbor
• Expected output (3 nodes):
node = 0 result = 3
node = 1 partial result = 1
node = 2 partial result = 3
Diagram: the local host launches the job; node 1 sends 1 to node 2, node 2 sends 3 to node 0, and node 0 reports 3.
Table
To   = MOD(iamnode+1, nodes)
From = MOD(iamnode+nodes-1, nodes)
MPI solution

PROGRAM mpi_exc3
INCLUDE 'mpif.h'
COMMON nodes
INTEGER status(mpi_status_size)
INTEGER iamnode, ier, sbuf, rbuf, to, from
INTEGER stag, rtag
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
CALL mpi_comm_size(mpi_comm_world, nodes, ier)
sbuf = iamnode; icount = 1
to = MOD(iamnode+1,nodes); stag = 1000+to
from = MOD(iamnode+nodes-1,nodes)
rtag = 1000+iamnode
IF (iamnode.ne.1) THEN
   CALL mpi_recv(rbuf, icount, mpi_integer, from, &
        rtag, mpi_comm_world, status, ier)
   sbuf = sbuf + rbuf
ENDIF
IF (iamnode.ne.0) CALL mpi_send(sbuf, icount, &
     mpi_integer, to, stag, mpi_comm_world, ier)
IF (iamnode.ne.0) WRITE(*,*) ' node = ', iamnode, &
     'partial result=', sbuf
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, &
     'result=', sbuf
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
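The same total (the sum of all node ids, delivered to node 0) could also be computed with a single collective call. A minimal sketch, using an illustrative result variable isum rather than the chain of sends above:

isum = 0
CALL mpi_reduce(iamnode, isum, 1, mpi_integer, mpi_sum, 0, &
     mpi_comm_world, ier)
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, 'result=', isum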
MYP solution

PROGRAM myp_exc3
INCLUDE 'myp.h'
INTEGER iamnode, ier
REAL x, temp
CALL pvmopen(iamnode, ier)
icount=1; x=iamnode
CALL pvmg1sum(x, temp, icount, iamnode, ier)
IF (iamnode.ne.0) WRITE(*,*) ' node = ', iamnode, &
     'partial result=', INT(temp)
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, &
     'result=', INT(x)
CALL pvmclose(ier)
END
Exercise 4
• Problem: compute a global sum using a power-of-2 (tree) pattern
• Task: write an MPI/MYP program in which each node sums the ids of two adjacent nodes and sends the result toward node 0 over power-of-2 distances, repeating until node 0 holds the total; node 0 reports the result
• Expected output (8 nodes):
node = 0 result = 28
• Expected output (7 nodes):
node = 0 result = 21
Diagram (case nodes = 2**m, 8 nodes):
Step 1: node 1 -> node 0, node 3 -> node 2, node 5 -> node 4, node 7 -> node 6
Step 2: node 2 -> node 0, node 6 -> node 4
Step 3: node 4 -> node 0
Table – step 1 (j=1)
To   = iamnode - 2**(j-1) = iamnode - 1
From = iamnode + 2**(j-1) = iamnode + 1
Table – step 2 (j=2)
To   = iamnode - 2**(j-1) = iamnode - 2
From = iamnode + 2**(j-1) = iamnode + 2
Table – step 3 (j=3)
To   = iamnode - 2**(j-1) = iamnode - 4
From = iamnode + 2**(j-1) = iamnode + 4
Diagram (case nodes ≠ 2**m, 7 nodes): pairwise sums as above, except that node 6 has no partner in step 1; its value is added at node 4 in step 2 before node 4 sends to node 0 in step 3.
Table – step 1 (j=1)
To   = iamnode - 2**(j-1) = iamnode - 1
From = iamnode + 2**(j-1) = iamnode + 1
Table – step 2 (j=2)
To   = iamnode - 2**(j-1) = iamnode - 2
From = iamnode + 2**(j-1) = iamnode + 2
Table – step 3 (j=3)
To   = iamnode - 2**(j-1) = iamnode - 4
From = iamnode + 2**(j-1) = iamnode + 4
MPI solution

PROGRAM mpi_exc4
INCLUDE 'mpif.h'
COMMON nodes
INTEGER status(mpi_status_size)
INTEGER iamnode, ier, sbuf, rbuf, to, from
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
CALL mpi_comm_size(mpi_comm_world, nodes, ier)
icount = 1; sbuf = iamnode
ndim = LOG(FLOAT(nodes))/LOG(2.)
IF (nodes.ne.2**ndim) ndim = ndim+1
node = iamnode; itag = 1000
DO j = 1, ndim
   IF (MOD(node,2) .eq. 0) THEN
      from = iamnode + 2**(j-1)
      IF (from.le.(nodes-1)) THEN
         CALL mpi_recv(rbuf, icount, mpi_integer, &
              from, itag, mpi_comm_world, status, ier)
         sbuf = sbuf + rbuf
      ENDIF
   ELSE
      to = iamnode - 2**(j-1)
      CALL mpi_send(sbuf, icount, mpi_integer, &
           to, itag, mpi_comm_world, ier)
      EXIT
   ENDIF
   node = node/2
ENDDO
IF (iamnode.ne.0) WRITE(*,*) ' node = ', &
     iamnode, 'partial result=', sbuf
IF (iamnode.eq.0) WRITE(*,*) ' node = ', &
     iamnode, 'result=', sbuf
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
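Note how ndim is obtained: LOG(FLOAT(nodes))/LOG(2.) is truncated on assignment to the integer ndim, and the following IF raises it by one when nodes is not a power of two, so ndim ends up as the ceiling of log2(nodes). For example, nodes = 7 truncates to 2 and is raised to 3, while nodes = 8 yields 3 (the IF also guards against floating-point round-off in the logarithm); three communication steps are performed in both cases.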
MYP solution

PROGRAM myp_exc4
INCLUDE 'myp.h'
INTEGER iamnode, ier
REAL x, temp
CALL myp_open(iamnode, ier)
icount=1; x=iamnode
CALL myp_g2sum(x, temp, icount, iamnode, ier)
IF (iamnode.ne.0) WRITE(*,*) ' node = ', iamnode, &
     'partial result=', INT(x)
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, &
     'result=', INT(x)
CALL myp_close(ier)
END
Local host & cluster setup
• Local host:
• Launch directory: contains job description parameter files read by the executable(s) at start
• Working directory: contains input data files, run-time message files (log files), and the job's final output
• Each cluster node:
• Data directory: for temporary data files, scratch files, etc. (say, '/local/data')
Diagram (3 nodes): the local host (file systems /usr/…, /home/…) holds the launch directory /disk1/myjobs/launch_job1 and the working directory /disk2/myjobs/job1 (logs & result); nodes 0, 1, and 2 each have a data directory /local/data.
Data distribution: the seismic cube (file xtsalt.dir) has dimensions nshotx shots × noffx offsets × nt samples per trace.
Data direct-access file (record length = mbytes*nt):
Records 1 … noffx hold shot 1, records noffx+1 … 2*noffx hold shot 2, and so on up to record nshotx*noffx for shot nshotx; within each shot the records are the offsets 1 … noffx.
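A minimal sketch of reading one trace from this layout, assuming RECL is given in the units expected by the compiler (bytes on most systems) and using illustrative variables lun, jshot, joff, and trace:

      lun = 10
c     record number of offset joff within shot jshot
      irec = (jshot-1)*noffx + joff
      OPEN(lun, FILE='xtsalt.dir', ACCESS='DIRECT', RECL=mbytes*nt,
     &     STATUS='OLD')
      READ(lun, REC=irec) trace
      CLOSE(lun)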
Shot-wise distribution (3 nodes), nshotx = 15, noffx = 5, nshots_per_node = 5:
Node 0, file data0.dir: records 1-5 hold shot 1, 6-10 shot 4, 11-15 shot 7, 16-20 shot 10, 21-25 shot 13 (local shots 1-5)
Node 1, file data1.dir: records 1-5 hold shot 2, 6-10 shot 5, 11-15 shot 8, 16-20 shot 11, 21-25 shot 14 (local shots 1-5)
Node 2, file data2.dir: records 1-5 hold shot 3, 6-10 shot 6, 11-15 shot 9, 16-20 shot 12, 21-25 shot 15 (local shots 1-5)
Data distribution block

c Data distribution block
c nshotx  total number of shots
c noffx   total number of offsets
c nodes   total number of nodes
c number of shots to be processed by each node
      nshots_per_node = nshotx/nodes + 1
c initialize counter for shots
      num_sp = 0
c start main loop over shots
      do jspx = 1, nshotx
c increment counter for shots
        num_sp = num_sp + 1
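The loop above is only a fragment. A minimal sketch of how it might continue, assuming the round-robin mapping shown in the node tables (global shot jspx is handled by node MOD(jspx-1, nodes)) and using an illustrative local index jloc and subroutine process_shot:

c process only the shots assigned to this node (round-robin)
        if (MOD(jspx-1,nodes) .eq. iamnode) then
c local shot (record group) index within this node's data file
          jloc = (jspx-1)/nodes + 1
          call process_shot(jloc, noffx)
        endif
c end of main loop over shots
      enddo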