Mpirun command (script)
• Primary job launcher for the MPI implementation
• Can run an application on the local host only or distribute it to run on any number of hosts specified by the user
• Since it is not part of the MPI standard, mpirun is implementation dependent, and each implementation's mpirun differs both in syntax and functionality
Synopsis: SGI MPT for IRIX
mpirun [global options] entry [:entry ...]
• Global options apply to all MPI executables on all specified hosts and must precede local options in the entry
• Each entry describes a host on which to run a program and the local options for that host
Global options (examples)
-d pathname    specifies the working directory for all hosts
-f filename    specifies a text file that contains mpirun arguments
-h             displays a list of options
-v             displays comments on what mpirun is doing when launching the MPI application
Entry operand
Each entry has the following components:
• One or more host names (not needed if run on the local host)
• Number of processes to start on each host
• Name of the executable program
• Arguments to the executable program (optional)
Entry format & local options
Format: HostList LocalOptions Program Arguments
Local options:
-f filename    same as in global options
-np nodes      number of processes
Examples
• 5 instances of program 'test' on the local host:
mpirun -np 5 test
• Same, but use /tmp/mydir as the working directory:
mpirun -d /tmp/mydir -np 5 test
• Different numbers of instances, different programs, different hosts (-np can be omitted):
mpirun host1 5 test1 : host2 7 test2
Synopsis: DEC UNIX MPICH
mpirun [mpirun_options] <prog> [options]
• Options for mpirun (mpirun_options) must come before the program and must be spelled out completely
• Unrecognized options are silently ignored
• Options that are not directed to MPICH (do not begin with -mpi or -p4) are passed through to all processes of the application
Mpirun_options (examples)
-p4wd path              specifies the working directory; should come after <prog>
-machinefile filename   takes the list of machines to run on from a file
-np nodes               specifies the number of processes
-nolocal                does not run on the local machine
-h                      displays help
-v                      displays comments
-t                      tests only; does not actually run
Examples
• Program 'test' on an 8-machine cluster:
mpirun -np 8 -machinefile mf test
• Same, but use /tmp as the working directory:
mpirun -np 8 -machinefile mf test -p4wd /tmp
• Exclude the local host:
mpirun -nolocal -np 8 -machinefile mf test
Synopsis: MPI/Pro for NT
mpirun [mpirun_options] <prog> [options]
• The options for mpirun (mpirun_options) must come before the program you want to run
Mpirun_options (examples)
-r                      registers your password
-d name                 selects a domain other than the default
-np nodes               specifies the number of processes
-wd path                assigns the working directory
-mach_file filename     takes the list of possible machines to run on from a file
-help                   displays help
-version                displays the MPI/Pro version
Examples
• Program 'test' on an 8-machine cluster:
mpirun -d IG -np 8 -mach_file mf test
• Same, but use \tmp as the working directory:
mpirun -d IG -wd \tmp -np 8 -mach_file mf test
• Use the default machines file in the MPI_HOME directory or in the directory where 'test' is:
mpirun -d IG -np 8 test
Exercise 1
• Problem: detect active nodes
• Task: write an MPI program and a MYP program that open each node and write the node id to the screen
• Expected output (3 nodes):
reports node = 0
reports node = 1
reports node = 2
Diagram: the local host launches the job; node 0 reports 0, node 1 reports 1, node 2 reports 2.
MPI solution

PROGRAM mpi_exc1
INCLUDE 'mpif.h'
INTEGER iamnode, ier
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
WRITE(*,*) 'reports node = ', iamnode
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
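For example, assuming the program is compiled into an executable named mpi_exc1, it could be launched on three nodes with one of the mpirun commands described earlier, e.g. mpirun -np 3 mpi_exc1, giving the expected output of Exercise 1.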
MYP solution

PROGRAM myp_exc1
INCLUDE 'mypf.h'
INTEGER iamnode, ier
CALL myp_open(iamnode, ier)
WRITE(*,*) 'reports node = ', iamnode
CALL myp_close(ier)
END
Exercise 2
• Problem: send/receive data
• Task: write an MPI/MYP program that sends the node id to the neighbor, receives a node id, and prints its own node id and the data received
• Expected output (3 nodes):
node = 0 received 2
node = 1 received 0
node = 2 received 1
Diagram: the local host launches the job; node 0 sends 0 to node 1 and reports 2, node 1 sends 1 to node 2 and reports 0, node 2 sends 2 to node 0 and reports 1.
Table
To   = MOD(iamnode+1, nodes)
From = MOD(iamnode+nodes-1, nodes)
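For example, with nodes = 3: node 0 sends to node 1 and receives from node 2, node 1 sends to node 2 and receives from node 0, and node 2 sends to node 0 and receives from node 1, matching the diagram above.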
MPI solution

PROGRAM mpi_exc2
INCLUDE 'mpif.h'
INTEGER iamnode, ier, sbuf, rbuf, to, from
INTEGER stag, rtag, nodes
INTEGER status(mpi_status_size)
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
CALL mpi_comm_size(mpi_comm_world, nodes, ier)
sbuf = iamnode; icount = 1
to = MOD(iamnode+1,nodes); stag = 1000+to
CALL mpi_send(sbuf, icount, mpi_integer, to, stag, &
     mpi_comm_world, ier)
from = MOD(iamnode+nodes-1,nodes)
rtag = 1000+iamnode
CALL mpi_recv(rbuf, icount, mpi_integer, from, rtag, &
     mpi_comm_world, status, ier)
WRITE(*,*) ' node = ', iamnode, ' received ', rbuf
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
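Every node here posts a blocking mpi_send before its mpi_recv; this is fine for a single integer, but a ring of blocking sends can deadlock for large messages. A minimal alternative sketch (not part of the original exercise) replaces the send/receive pair with the combined call mpi_sendrecv, reusing the variables declared above:

CALL mpi_sendrecv(sbuf, icount, mpi_integer, to, stag, &
     rbuf, icount, mpi_integer, from, rtag, &
     mpi_comm_world, status, ier)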
MYP solution

PROGRAM myp_exc2
INCLUDE 'mypf.h'
INTEGER iamnode, ier
REAL x, temp
CALL myp_open(iamnode, ier)
incnode=1; x=iamnode; icount=1
CALL myptoken(x, temp, icount, iamnode, incnode, ier)
WRITE(*,*) ' node = ', iamnode, ' received ', INT(x)
CALL myp_close(ier)
END
Exercise 3
• Problem: add local and received data and send the result to the neighbor; the first node (id = 0) reports the final result
• Task: write an MPI/MYP program in which each node receives data from its neighbor, adds its own id, and sends the result to the next neighbor
• Expected output (3 nodes):
node = 0 result = 3
node = 1 partial result = 1
node = 2 partial result = 3
Diagram: the local host launches the job; node 1 sends 1 to node 2, node 2 sends 3 to node 0, and node 0 reports 3.
Table
To   = MOD(iamnode+1, nodes)
From = MOD(iamnode+nodes-1, nodes)
MPI solution

PROGRAM mpi_exc3
INCLUDE 'mpif.h'
COMMON nodes
INTEGER status(mpi_status_size)
INTEGER iamnode, ier, sbuf, rbuf, to, from
INTEGER stag, rtag
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
CALL mpi_comm_size(mpi_comm_world, nodes, ier)
sbuf = iamnode; icount = 1
to = MOD(iamnode+1,nodes); stag = 1000+to
from = MOD(iamnode+nodes-1,nodes)
rtag = 1000+iamnode
IF (iamnode.ne.1) THEN
   CALL mpi_recv(rbuf, icount, mpi_integer, from, &
        rtag, mpi_comm_world, status, ier)
   sbuf = sbuf + rbuf
ENDIF
IF (iamnode.ne.0) CALL mpi_send(sbuf, icount, &
     mpi_integer, to, stag, mpi_comm_world, ier)
IF (iamnode.ne.0) WRITE(*,*) ' node = ', iamnode, &
     'partial result=', sbuf
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, &
     'result=', sbuf
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
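The same total (the sum of all node ids, delivered to node 0) could also be computed with a single collective call. A minimal sketch, using an illustrative result variable isum rather than the chain of sends above:

isum = 0
CALL mpi_reduce(iamnode, isum, 1, mpi_integer, mpi_sum, 0, &
     mpi_comm_world, ier)
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, 'result=', isum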
MYP solution

PROGRAM myp_exc3
INCLUDE 'myp.h'
INTEGER iamnode, ier
REAL x, temp
CALL pvmopen(iamnode, ier)
icount=1; x=iamnode
CALL pvmg1sum(x, temp, icount, iamnode, ier)
IF (iamnode.ne.0) WRITE(*,*) ' node = ', iamnode, &
     'partial result=', INT(temp)
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, &
     'result=', INT(x)
CALL pvmclose(ier)
END
Exercise 4
• Problem: compute a global sum using a power-of-2 (tree) pattern
• Task: write an MPI/MYP program in which each node sums the ids of two adjacent nodes and sends the result toward node 0 over power-of-2 distances, repeating until node 0 holds the total; node 0 reports the result
• Expected output (8 nodes):
node = 0 result = 28
• Expected output (7 nodes):
node = 0 result = 21
Diagram (case nodes = 2**m, 8 nodes):
Step 1: node 1 -> node 0, node 3 -> node 2, node 5 -> node 4, node 7 -> node 6
Step 2: node 2 -> node 0, node 6 -> node 4
Step 3: node 4 -> node 0
Table – step 1 (j=1)
To   = iamnode - 2**(j-1) = iamnode - 1
From = iamnode + 2**(j-1) = iamnode + 1
Table – step 2 (j=2)
To   = iamnode - 2**(j-1) = iamnode - 2
From = iamnode + 2**(j-1) = iamnode + 2
Table – step 3 (j=3)
To   = iamnode - 2**(j-1) = iamnode - 4
From = iamnode + 2**(j-1) = iamnode + 4
Diagram (case nodes ≠ 2**m, 7 nodes): pairwise sums as above, except that node 6 has no partner in step 1; its value is added at node 4 in step 2 before node 4 sends to node 0 in step 3.
Table – step 1 (j=1)
To   = iamnode - 2**(j-1) = iamnode - 1
From = iamnode + 2**(j-1) = iamnode + 1
Table – step 2 (j=2)
To   = iamnode - 2**(j-1) = iamnode - 2
From = iamnode + 2**(j-1) = iamnode + 2
Table – step 3 (j=3)
To   = iamnode - 2**(j-1) = iamnode - 4
From = iamnode + 2**(j-1) = iamnode + 4
MPI solution

PROGRAM mpi_exc4
INCLUDE 'mpif.h'
COMMON nodes
INTEGER status(mpi_status_size)
INTEGER iamnode, ier, sbuf, rbuf, to, from
CALL mpi_init(ier)
CALL mpi_comm_rank(mpi_comm_world, iamnode, ier)
CALL mpi_comm_size(mpi_comm_world, nodes, ier)
icount = 1; sbuf = iamnode
ndim = LOG(FLOAT(nodes))/LOG(2.)
IF (nodes.ne.2**ndim) ndim = ndim+1
node = iamnode; itag = 1000
DO j = 1, ndim
   IF (MOD(node,2) .eq. 0) THEN
      from = iamnode + 2**(j-1)
      IF (from.le.(nodes-1)) THEN
         CALL mpi_recv(rbuf, icount, mpi_integer, &
              from, itag, mpi_comm_world, status, ier)
         sbuf = sbuf + rbuf
      ENDIF
   ELSE
      to = iamnode - 2**(j-1)
      CALL mpi_send(sbuf, icount, mpi_integer, &
           to, itag, mpi_comm_world, ier)
      EXIT
   ENDIF
   node = node/2
ENDDO
IF (iamnode.ne.0) WRITE(*,*) ' node = ', &
     iamnode, 'partial result=', sbuf
IF (iamnode.eq.0) WRITE(*,*) ' node = ', &
     iamnode, 'result=', sbuf
CALL mpi_barrier(mpi_comm_world, ier)
CALL mpi_finalize(ier)
END
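Note how ndim is obtained: LOG(FLOAT(nodes))/LOG(2.) is truncated on assignment to the integer ndim, and the following IF raises it by one when nodes is not a power of two, so ndim ends up as the ceiling of log2(nodes). For example, nodes = 7 truncates to 2 and is raised to 3, while nodes = 8 yields 3 (the IF also guards against floating-point round-off in the logarithm); three communication steps are performed in both cases.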
MYP solution

PROGRAM myp_exc4
INCLUDE 'myp.h'
INTEGER iamnode, ier
REAL x, temp
CALL myp_open(iamnode, ier)
icount=1; x=iamnode
CALL myp_g2sum(x, temp, icount, iamnode, ier)
IF (iamnode.ne.0) WRITE(*,*) ' node = ', iamnode, &
     'partial result=', INT(x)
IF (iamnode.eq.0) WRITE(*,*) ' node = ', iamnode, &
     'result=', INT(x)
CALL myp_close(ier)
END
Local host & cluster setup
• Local host:
• Launch directory: contains job description parameter files read by the executable(s) at start
• Working directory: contains input data files, run-time message files (log files), and the job's final output
• Each cluster node:
• Data directory: for temporary data files, scratch files, etc. (say, '/local/data')
Diagram (3 nodes): the local host (file systems /usr/…, /home/…) holds the launch directory /disk1/myjobs/launch_job1 and the working directory /disk2/myjobs/job1 (logs & result); nodes 0, 1, and 2 each have a data directory /local/data.
Data distribution: the seismic cube (file xtsalt.dir) has dimensions nshotx shots × noffx offsets × nt samples per trace.
Data direct-access file (record length = mbytes*nt):
Records 1 … noffx hold shot 1, records noffx+1 … 2*noffx hold shot 2, and so on up to record nshotx*noffx for shot nshotx; within each shot the records are the offsets 1 … noffx.
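A minimal sketch of reading one trace from this layout, assuming RECL is given in the units expected by the compiler (bytes on most systems) and using illustrative variables lun, jshot, joff, and trace:

      lun = 10
c     record number of offset joff within shot jshot
      irec = (jshot-1)*noffx + joff
      OPEN(lun, FILE='xtsalt.dir', ACCESS='DIRECT', RECL=mbytes*nt,
     &     STATUS='OLD')
      READ(lun, REC=irec) trace
      CLOSE(lun)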
Shot-wise distribution (3 nodes), nshotx = 15, noffx = 5, nshots_per_node = 5:
Node 0, file data0.dir: records 1-5 hold shot 1, 6-10 shot 4, 11-15 shot 7, 16-20 shot 10, 21-25 shot 13 (local shots 1-5)
Node 1, file data1.dir: records 1-5 hold shot 2, 6-10 shot 5, 11-15 shot 8, 16-20 shot 11, 21-25 shot 14 (local shots 1-5)
Node 2, file data2.dir: records 1-5 hold shot 3, 6-10 shot 6, 11-15 shot 9, 16-20 shot 12, 21-25 shot 15 (local shots 1-5)
Data distribution block

c Data distribution block
c nshotx  total number of shots
c noffx   total number of offsets
c nodes   total number of nodes
c number of shots to be processed by each node
      nshots_per_node = nshotx/nodes + 1
c initialize counter for shots
      num_sp = 0
c start main loop over shots
      do jspx = 1, nshotx
c increment counter for shots
        num_sp = num_sp + 1
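The loop above is only a fragment. A minimal sketch of how it might continue, assuming the round-robin mapping shown in the node tables (global shot jspx is handled by node MOD(jspx-1, nodes)) and using an illustrative local index jloc and subroutine process_shot:

c process only the shots assigned to this node (round-robin)
        if (MOD(jspx-1,nodes) .eq. iamnode) then
c local shot (record group) index within this node's data file
          jloc = (jspx-1)/nodes + 1
          call process_shot(jloc, noffx)
        endif
c end of main loop over shots
      enddo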