Hands on Special Jobs Wu Wenjing IHEP – Beijing Grid tutorial for users Beijing, 25-26 Nov 2006
Outline
• MPI job
• Job with data
You can copy the source code of the jobs from the /tmp directory:
[gilda07] /home/beijing03 > cp -r /tmp/hello/ .
[gilda07] /home/beijing03 > ls hello/
hello_mpi.c hello_mpi.jdl hello_mpi.o hello_mpi.sh hello_mpi.X makefile
[gilda07] /home/beijing01 > cp -r /tmp/data .
[gilda07] /home/beijing01 > ls data/
data.jdl WMS-text2.conf WMS-text.conf
Good luck! Have a try!
An example of an MPI job
hello_mpi.c (compiled to the executable program hello_mpi.X)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* gethostname() */
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size;
    char host_name[20];
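An example of an MPI job (cont.)
    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* index of this process */
    /* host_name holds only 20 bytes: longer names are truncated
       and may lack a terminating '\0' */
    gethostname(host_name, 20);
    printf("I am processor: %d at %s\n", rank, host_name);
    MPI_Finalize();
    return 0;
}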
An example of an MPI job (cont.)
An MPI job usually consists of three parts:
• A C program, or the executable compiled from it
• A wrapper script that invokes the MPI application by calling the mpirun command
• The JDL file
An example of an MPI job (cont.)
hello_mpi.sh
#!/bin/sh -x
# Binary to execute
EXE=$1
# Number of processes to start with mpirun
CPU_NEEDED=$2
echo "***********************************************************************"
echo "Running on: $HOSTNAME"
echo "As: " `whoami`
# TEST_LSF is non-empty only when the batch system is LSF
if [ -f "$PWD/.BrokerInfo" ] ; then
    TEST_LSF=`edg-brokerinfo getCE | cut -d/ -f2 | grep lsf`
else
    TEST_LSF=`ps -ef | grep sbatchd | grep -v grep`
fi
An example of an MPI job (cont.)
if [ "x$TEST_LSF" = "x" ] ; then
    # prints the name of the file containing the nodes allocated for parallel execution
    echo "PBS Nodefile: $PBS_NODEFILE"
    # print the names of the nodes allocated for parallel execution
    cat $PBS_NODEFILE
    echo "*************************************"
    HOST_NODEFILE=$PBS_NODEFILE
else
    # print the names of the nodes allocated for parallel execution
    echo "LSF Hosts: $LSB_HOSTS"
    # loops over the nodes allocated for parallel execution
    HOST_NODEFILE=`pwd`/lsf_nodefile.$$
    for host in ${LSB_HOSTS}
    do
        host=`host $host | awk '{ print $1 } '`
        echo $host >> ${HOST_NODEFILE}
    done
fi
An example of an MPI job (cont.)
cat ${HOST_NODEFILE}
echo "*************************************"
# prints the working directory on the master node
echo "Current dir: $PWD"
echo "*************************************"
for i in `cat $HOST_NODEFILE` ; do
    echo "Mirroring via SSH to $i"
    # creates the working directories on all the nodes allocated for parallel execution
    ssh $i mkdir -p `pwd`
    # copies the needed files on all the nodes allocated for parallel execution
    /usr/bin/scp -rp ./* $i:`pwd`
    # checks that all files are present on all the nodes allocated for parallel execution
    echo `pwd`
    ssh $i ls `pwd`
An example of an MPI job (cont.)
    # sets the permissions of the files
    ssh $i chmod 755 `pwd`/$EXE
    ssh $i ls -alR `pwd`
    echo "@@@@@@@@@@@@@@@"
done
echo "***********************************************************************"
echo "Executing $EXE with mpirun"
chmod 755 $EXE
/opt/mpich/bin/mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE
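The machine file passed to mpirun normally holds one host name per allocated slot, so a host that offers several CPUs is listed several times. As a minimal sketch under that assumption, the hypothetical helper count_slots below (not part of hello_mpi.sh) summarises such a file; the host names are made up.

```shell
#!/bin/sh
# count_slots: print "<host> <slots>" for every distinct host in a machine file.
# Hypothetical helper; assumes one host name per line.
count_slots() {
    # $1: path to the machine file
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Demo with invented host names standing in for $HOST_NODEFILE.
demo_nodefile=$(mktemp)
printf '%s\n' wn01.example.org wn01.example.org wn02.example.org > "$demo_nodefile"
count_slots "$demo_nodefile"
rm -f "$demo_nodefile"
```

Inside the wrapper the same call would be count_slots "$HOST_NODEFILE", which makes it easy to compare the number of slots with the value given as CPU_NEEDED.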
An example of an MPI job (cont.)
hello_mpi.jdl
Type = "Job";
JobType = "MPICH";
NodeNumber = 2;
Executable = "hello_mpi.sh";
Arguments = "hello_mpi.X 4 ";
StdOutput = "hello.out";
StdError = "hello.err";
InputSandbox = {"hello_mpi.sh","hello_mpi.X"};
OutputSandbox = {"hello.err","hello.out"};
Requirements = (other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF");
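The files listed in InputSandbox are shipped from the submitting machine, so they need to exist there when the job is submitted. A rough sketch of a pre-submission check follows; sandbox_files and check_sandbox are hypothetical helpers, and the parsing assumes that the whole InputSandbox attribute sits on one line and that the file names contain no spaces, as in hello_mpi.jdl.

```shell
#!/bin/sh
# sandbox_files: print the file names listed in the InputSandbox attribute of a JDL file.
# Assumes the attribute is written on a single line.
sandbox_files() {
    sed -n 's/^[[:space:]]*InputSandbox[[:space:]]*=[[:space:]]*{\(.*\)}.*$/\1/p' "$1" |
        tr ',' '\n' | sed 's/^[[:space:]]*"//; s/"[[:space:]]*$//'
}

# check_sandbox: report files missing from the current directory.
# Returns non-zero if at least one file is missing.
check_sandbox() {
    missing=0
    for f in $(sandbox_files "$1"); do
        [ -f "$f" ] || { echo "missing: $f" >&2; missing=1; }
    done
    return "$missing"
}

# Demo in a scratch directory with a cut-down JDL and only one of the two files.
demo_dir=$(mktemp -d)
printf '%s\n' 'InputSandbox = {"hello_mpi.sh","hello_mpi.X"};' > "$demo_dir/demo.jdl"
: > "$demo_dir/hello_mpi.sh"
( cd "$demo_dir" && check_sandbox demo.jdl ) || echo "do not submit yet"
rm -rf "$demo_dir"
```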
How to submit an MPI job
[gilda07] /home/liuag/EuChinaGrid/hello > edg-job-submit --vo gilda hello_mpi.jd
Selected Virtual Organisation name (from --vo option): gilda
Connecting to host gilda05.ihep.ac.cn, port 7772
Logging to host gilda05.ihep.ac.cn, port 9002
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
********************************************************************************
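The job identifier printed at submission is needed by every later command, so it is convenient to capture it instead of copying it by hand. A minimal sketch, assuming the identifier appears on its own line in the form "- https://...", as the transcript above suggests; extract_job_id is a hypothetical helper name.

```shell
#!/bin/sh
# extract_job_id: read the submit command's output on stdin and print the first job identifier.
# Assumes the identifier line looks like "- https://host:port/id".
extract_job_id() {
    sed -n 's|^[[:space:]]*-[[:space:]]*\(https://[^[:space:]]*\).*$|\1|p' | head -n 1
}

# Demo on text taken from the transcript above.
extract_job_id <<'EOF'
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Your job identifier (edg_jobId) is:
- https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
EOF
```

On the user interface machine the helper would sit at the end of a pipe that starts with the edg-job-submit command, and its output could be stored in a shell variable for the status and output commands.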
Get the status of the submitted job
[gilda07] /home/liuag/EuChinaGrid/hello > edg-job-status https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: grid-ce.bio.dist.unige.it:2119/jobmanager-lcgpbs-long
reached on: Tue Nov 21 07:27:03 2006
*************************************************************
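When a script has to wait for a job, the useful part of the status report is the "Current Status:" line. The sketch below pulls that line out and decides whether the job has reached a final state. job_status and is_finished are hypothetical helpers, and the list of final states (Done, Aborted, Cancelled, Cleared) is an assumption to adjust to your middleware.

```shell
#!/bin/sh
# job_status: read a status report on stdin and print the text after "Current Status:".
job_status() {
    sed -n '/Current Status:/{s/^.*Current Status:[[:space:]]*//;s/[[:space:]]*$//;p;}' | head -n 1
}

# is_finished: succeed when the status string names a state that will not change again.
# The list of final states is an assumption.
is_finished() {
    case "$1" in
        Done*|Aborted*|Cancelled*|Cleared*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo on text taken from the transcript above.
status=$(job_status <<'EOF'
BOOKKEEPING INFORMATION:
Status info for the Job : https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
Current Status: Done (Success)
Exit code: 0
EOF
)
if is_finished "$status"; then
    echo "finished: $status"
else
    echo "still waiting: $status"
fi
```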
Get the output of the finished job
[gilda07] /home/liuag/EuChinaGrid/hello > edg-job-get-output https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
Retrieving files from host: gilda05.ihep.ac.cn ( for https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/liuag_xxu2ktFg3ChY2K-KTnxrIQ
********************************************************************************
The output of the job
[gilda07] /home/liuag/EuChinaGrid/hello > more /tmp/jobOutput/liuag_xxu2ktFg3ChY2K-KTnxrIQ/hello.out
.........
***************************************************************
Executing hello_mpi.X with mpirun
I am processor: 2 at grid-wn03.bio.dist.u
I am processor: 3 at grid-wn03.bio.dist.u
I am processor: 1 at grid-wn03.bio.dist.u
I am processor: 0 at grid-wn03.bio.dist.u
An example of a job with data
VirtualOrganisation = "gilda";
Executable = "/bin/echo";
Arguments = "Hello everyone";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
DataCatalog = "http://lfc-gilda.ct.infn.it:8085";
InputData = {"lfn:/grid/gilda/provaK"};
DataAccessProtocol = {"gridftp","rfio","gsiftp"};
An example of a job with data
VirtualOrganisation = "gilda";
Executable = "/bin/echo";
Arguments = "hello everyone";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
DataCatalog = "http://lfc-gilda.ct.infn.it:8085";
InputData = {"lfn:/grid/gilda/scardaci/BoxData.txt"};
DataAccessProtocol = {"gridftp","rfio","gsiftp"};
Submit the job
Because an LCG-flavor RB cannot accept a job with data, we need to send our job to a gLite-flavor RB. Submitting the job then involves two steps:
• Prepare a configuration file that specifies the RB
• Submit the job
# more WMS-text2.conf
[
VirtualOrganisation = "gilda";
NSAddresses = "glite-rb2.ct.infn.it:7772";
LBAddresses = "glite-rb2.ct.infn.it:9000";
]
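If you switch between several RBs, the configuration file can be generated rather than edited by hand. A small sketch, assuming the same three attributes and the port numbers 7772 and 9000 shown in WMS-text2.conf; write_rb_config and its argument order are hypothetical.

```shell
#!/bin/sh
# write_rb_config: write a configuration file in the layout of WMS-text2.conf.
# Hypothetical helper. $1 = VO name, $2 = RB host name, $3 = output file.
# The ports 7772 (NS) and 9000 (LB) are copied from the tutorial's file.
write_rb_config() {
    cat > "$3" <<EOF
[
VirtualOrganisation = "$1";
NSAddresses = "$2:7772";
LBAddresses = "$2:9000";
]
EOF
}

# Demo: regenerate the tutorial's configuration in a temporary file.
conf=$(mktemp)
write_rb_config gilda glite-rb2.ct.infn.it "$conf"
cat "$conf"
rm -f "$conf"
```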
Submit the job (we are using a gLite RB)
Submit the job with the glite-job-submit command and its --config-vo option:
# glite-job-submit --config-vo WMS-text2.conf newdata.jdl
Selected Virtual Organisation name (from proxy certificate extension): gilda
Connecting to host glite-rb2.ct.infn.it, port 7772
Logging to host glite-rb2.ct.infn.it, port 9002
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use glite-job-status command to check job current status. Your job identifier is:
- https://glite-rb2.ct.infn.it:9000/6H6MxCv4gDHQh6To0y32fA
*********************************************************************************************
Query the status of the job
# glite-job-status https://glite-rb2.ct.infn.it:9000/6H6MxCv4gDHQh6To0y32fA
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://glite-rb2.ct.infn.it:9000/6H6MxCv4gDHQh6To0y32fA
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: grid010.ct.infn.it:2119/jobmanager-lcgpbs-long
Submitted: Tue Nov 21 18:04:06 2006 CST
*************************************************************
Get the output
# glite-job-output https://glite-rb2.ct.infn.it:9000/6H6MxCv4gDHQh6To0y32fA
Retrieving files from host: glite-rb2.ct.infn.it ( for https://glite-rb2.ct.infn.it:9000/6H6MxCv4gDHQh6To0y32fA )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://glite-rb2.ct.infn.it:9000/6H6MxCv4gDHQh6To0y32fA
have been successfully retrieved and stored in the directory:
/tmp/liuag_6H6MxCv4gDHQh6To0y32fA
*********************************************************************************
The output of the job
# more /tmp/liuag_6H6MxCv4gDHQh6To0y32fA/std.out
hello everyone