Special Jobs: MPI
Wu Wenjing
IHEP – Beijing
Grid tutorial for users
Beijing, 25-16 Nov 2006
Outline
• Overview of MPI
  - What is an MPI job
  - An example of an MPI job
  - How to submit an MPI job
• Overview of jobs with data
Two kinds of special job
• MPI jobs (Message Passing Interface)
• Jobs with data requirements
About MPI
• Running parallel jobs is essential for many modern computing applications.
• The most widely used library for supporting parallel jobs is MPI (Message Passing Interface).
• With the current middleware, a parallel job can run only inside a single Computing Element (CE).
• Several projects are studying how to run parallel jobs on Worker Nodes (WNs) that belong to different CEs.
Requirements & Settings
To guarantee that an MPI job can run, the following requirements MUST be satisfied:
• The MPICH software must be installed on all the WNs of the CE, and its binaries must be in the PATH environment variable. For example:
  [root@gilda02 root]# rpm -qa|grep mpi
  mpiexec-0.77-3.sl3
  mpich-1.2.6-1.sl3.cl
• All the WNs must trust one another for SSH, so that commands and file copies between nodes work without a password.
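One way to check the PATH requirement from a program running on a WN is to walk the PATH directories and look for mpirun. The sketch below makes some assumptions. find_in_path is a hypothetical helper, not part of MPICH or the grid middleware. It treats an empty PATH entry as the current directory, following the usual shell convention.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPICH or the grid middleware): search a
 * colon-separated PATH string for a regular, executable file called `name`.
 * On success the full path is written to `out` and 1 is returned;
 * otherwise 0 is returned. */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *dir = path;

    for (;;) {
        const char *sep = strchr(dir, ':');
        size_t len = sep ? (size_t)(sep - dir) : strlen(dir);
        struct stat st;
        int n;

        if (len == 0)   /* an empty PATH entry means the current directory */
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)len, dir, name);

        /* Skip candidates that did not fit in `out` */
        if (n > 0 && (size_t)n < outlen &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 1;

        if (sep == NULL)
            return 0;   /* last entry checked, nothing found */
        dir = sep + 1;
    }
}
```

On a WN, you could call it as find_in_path("mpirun", getenv("PATH"), buf, sizeof buf). Check first that getenv did not return NULL.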
An example of an MPI job
In GILDA, an MPI job usually consists of three parts:
• a C program, or the executable compiled from it;
• a wrapper script that invokes the MPI application by calling the mpirun command;
• the JDL file.
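An example of an MPI job (cont.)
hello_mpi.c (compiled to the executable program hello_mpi.X, for example with mpicc -o hello_mpi.X hello_mpi.c):
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* gethostname() */
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size;
    /* Room for a fully qualified host name; a 20-byte buffer cuts it short */
    char host_name[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank, 0..size-1 */

    gethostname(host_name, sizeof(host_name));
    host_name[sizeof(host_name) - 1] = '\0'; /* a truncated name may lack '\0' */

    printf("I am processor: %d at %s\n", rank, host_name);

    MPI_Finalize();
    return 0;
}
```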
An example of an MPI job (cont.)
hello_mpi.sh detects whether the CE uses PBS or LSF and builds the list of allocated nodes. It then mirrors the working directory to each node with ssh and scp, and finally starts the executable with mpirun:
```sh
#!/bin/sh -x

# Binary to execute
EXE=$1
CPU_NEEDED=$2

echo "***********************************************************************"
echo "Running on: $HOSTNAME"
echo "As: " `whoami`

# Detect LSF: from the CE name in .BrokerInfo if present,
# otherwise by looking for the LSF sbatchd daemon
if [ -f "$PWD/.BrokerInfo" ] ; then
  TEST_LSF=`edg-brokerinfo getCE | cut -d/ -f2 | grep lsf`
else
  TEST_LSF=`ps -ef | grep sbatchd | grep -v grep`
fi

if [ "x$TEST_LSF" = "x" ] ; then
  # prints the name of the file containing the nodes allocated for parallel execution
  echo "PBS Nodefile: $PBS_NODEFILE"
  # prints the names of the nodes allocated for parallel execution
  cat $PBS_NODEFILE
  echo "*************************************"
  HOST_NODEFILE=$PBS_NODEFILE
else
  # prints the names of the nodes allocated for parallel execution
  echo "LSF Hosts: $LSB_HOSTS"
  # builds a node file by resolving each allocated host name
  HOST_NODEFILE=`pwd`/lsf_nodefile.$$
  for host in ${LSB_HOSTS}
  do
    host=`host $host | awk '{ print $1 } '`
    echo $host >> ${HOST_NODEFILE}
  done
fi

cat ${HOST_NODEFILE}
echo "*************************************"
# prints the working directory on the master node
echo "Current dir: $PWD"
echo "*************************************"

for i in `cat $HOST_NODEFILE` ; do
  echo "Mirroring via SSH to $i"
  # creates the working directory on each node allocated for parallel execution
  ssh $i mkdir -p `pwd`
  # copies the needed files to each node allocated for parallel execution
  /usr/bin/scp -rp ./* $i:`pwd`
  # checks that all files are present on each node allocated for parallel execution
  echo `pwd`
  ssh $i ls `pwd`
  # sets the permissions of the files
  ssh $i chmod 755 `pwd`/$EXE
  ssh $i ls -alR `pwd`
  echo "@@@@@@@@@@@@@@@"
done

echo "***********************************************************************"
echo "Executing $EXE with mpirun"
chmod 755 $EXE
/opt/mpich/bin/mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE
```
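In hello_mpi.sh, the mirroring loop runs once per line of the node file. A PBS node file usually lists a host once for every CPU allocated on it, so the same WN may be mirrored several times.

The minimal C sketch below reduces the node list to unique hosts while still counting the slots. It assumes the node list has already been read into one string. unique_hosts and MAX_NAME are illustrative names and are not part of the original job.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 256   /* assumed upper bound on a host name, including '\0' */

/* Hypothetical helper: split a whitespace-separated host list (the format of
 * $LSB_HOSTS, or the contents of $PBS_NODEFILE) into unique host names,
 * kept in first-seen order. Returns the number of unique names stored in
 * `hosts` (at most `max_hosts`); `*slots` receives the total number of
 * entries, i.e. the number of CPU slots listed. Input longer than 4095
 * bytes is truncated. */
int unique_hosts(const char *list, char hosts[][MAX_NAME], int max_hosts,
                 int *slots)
{
    char buf[4096];
    int count = 0;

    *slots = 0;
    snprintf(buf, sizeof buf, "%s", list);   /* strtok modifies its input */

    for (char *tok = strtok(buf, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        int seen = 0;

        (*slots)++;
        for (int i = 0; i < count; i++) {
            if (strcmp(hosts[i], tok) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen && count < max_hosts) {
            snprintf(hosts[count], MAX_NAME, "%s", tok);
            count++;
        }
    }
    return count;
}
```

The unique names would drive the ssh/scp mirroring. mpirun should still receive the full node file, because each line there stands for one slot.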
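An example of an MPI job (cont.)
hello_mpi.jdl:
```
Type = "Job";
JobType = "MPICH";
NodeNumber = 2;
Executable = "hello_mpi.sh";
Arguments = "hello_mpi.X 4";
StdOutput = "hello.out";
StdError = "hello.err";
InputSandbox = {"hello_mpi.sh","hello_mpi.X"};
OutputSandbox = {"hello.err","hello.out"};
Requirements = (other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF");
```
NodeNumber is the number of CPUs requested for the job. The two Arguments are passed to hello_mpi.sh as EXE and CPU_NEEDED, so mpirun is started with -np 4. The Requirements expression restricts matching to CEs whose batch system (LRMS) is PBS or LSF, the two cases the wrapper script handles.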
How to submit an MPI job
```
[gilda07] /home/liuag/EuChinaGrid/hello > edg-job-submit --vo gilda hello_mpi.jd
Selected Virtual Organisation name (from --vo option): gilda
Connecting to host gilda05.ihep.ac.cn, port 7772
Logging to host gilda05.ihep.ac.cn, port 9002
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
********************************************************************************
```
Get the status of the submitted job
```
[gilda07] /home/liuag/EuChinaGrid/hello > edg-job-status https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: grid-ce.bio.dist.unige.it:2119/jobmanager-lcgpbs-long
reached on: Tue Nov 21 07:27:03 2006
*************************************************************
```
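When several jobs are being followed, their state can be read from the report programmatically. The minimal sketch below makes two assumptions. First, the output of edg-job-status has already been captured into a string, for example through popen(). Second, the state is printed on a line starting with "Current Status:", as in the report above. job_status is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: copy the value of the "Current Status:" line of a
 * captured edg-job-status report (e.g. "Done (Success)") into `out`.
 * Returns 1 if the line was found, 0 otherwise. */
int job_status(const char *report, char *out, size_t outlen)
{
    const char *key = "Current Status:";
    const char *p = strstr(report, key);
    size_t len;

    if (p == NULL || outlen == 0)
        return 0;

    p += strlen(key);
    while (*p == ' ' || *p == '\t')      /* skip padding after the label */
        p++;
    len = strcspn(p, "\r\n");            /* the value ends at the line break */
    while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
        len--;                           /* drop trailing blanks */
    if (len >= outlen)
        len = outlen - 1;                /* truncate to fit `out` */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```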
Get the output of the finished job
```
[gilda07] /home/liuag/EuChinaGrid/hello > edg-job-get-output https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
Retrieving files from host: gilda05.ihep.ac.cn ( for https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://gilda05.ihep.ac.cn:9000/xxu2ktFg3ChY2K-KTnxrIQ
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/liuag_xxu2ktFg3ChY2K-KTnxrIQ
********************************************************************************
```
The output of the job
```
[gilda07] /home/liuag/EuChinaGrid/hello > more /tmp/jobOutput/liuag_xxu2ktFg3ChY2K-KTnxrIQ/hello.out
.........
***************************************************************
Executing hello_mpi.X with mpirun
I am processor: 2 at grid-wn03.bio.dist.u
I am processor: 3 at grid-wn03.bio.dist.u
I am processor: 1 at grid-wn03.bio.dist.u
I am processor: 0 at grid-wn03.bio.dist.u
```
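The ranks appear in no particular order (2, 3, 1, 0 above) because each process writes its line independently. A check on hello.out should therefore look for every rank rather than compare the file line by line. The minimal sketch below assumes the message format printed by hello_mpi.c. all_ranks_reported and MAX_RANKS are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RANKS 1024   /* assumed upper bound on the number of processes */

/* Hypothetical check: return 1 if each rank 0..nprocs-1 printed its
 * "I am processor: <rank> at <host>" line somewhere in `text` (the
 * contents of hello.out), in any order; 0 otherwise. */
int all_ranks_reported(const char *text, int nprocs)
{
    const char *key = "I am processor:";
    char seen[MAX_RANKS] = {0};
    int found = 0;

    if (nprocs <= 0 || nprocs > MAX_RANKS)
        return 0;

    for (const char *p = strstr(text, key); p != NULL;
         p = strstr(p + 1, key)) {
        int rank;

        if (sscanf(p + strlen(key), "%d", &rank) == 1 &&
            rank >= 0 && rank < nprocs && !seen[rank]) {
            seen[rank] = 1;
            found++;
        }
    }
    return found == nprocs;
}
```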
Conclusion
With the current middleware, the Resource Broker (RB) schedules MPI jobs on a single CE only. This dramatically reduces the number of WNs available for parallel computation. The limitation can be overcome by using the CrossGrid testbed, which is based on LCG middleware. For more information, contact Jesus Marco (marco@ifca.unican.es) or Harald Kornmayer (harald.kornmayer@iwr.fzk.de).
Job with data requirement
A job with data is a job whose input data does not come from the local disk of the User Interface (UI) but from a Storage Element (SE).
An example of a job with data
```
VirtualOrganisation = "gilda";
Executable = "/bin/echo";
Arguments = "Hello everyone";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
DataCatalog = "http://lfc-gilda.ct.infn.it:8085";
InputData = {"lfn:/grid/gilda/scardaci/BoxData.txt"};
DataAccessProtocol = {"gridftp","rfio","gsiftp"};
```
Important requirements
• The InputData attribute is a string or a list of strings. Each entry is a Logical File Name (LFN), a Grid Unique Identifier (GUID), a Logical Dataset (LDS) and/or a generic query.
• The DataAccessProtocol attribute is a string or a list of strings. It names the protocol, or protocols, that the application can "speak" to access the files listed in InputData on a given SE.
• The user has to specify the input data in the JDL file by LFN.
• All the data specified in InputData must already be stored on SEs and registered in the data catalogs.
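Because the input data must be named by LFN in the JDL, a job-preparation tool could check this before submission. The minimal sketch below does this with a hypothetical helper, all_lfns. It checks only the "lfn:" prefix used in the example JDL. It does not check whether the file is actually registered in the catalog.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check: return 1 if every InputData entry is written as a
 * Logical File Name, i.e. starts with the "lfn:" prefix used in the example
 * JDL ("lfn:/grid/gilda/scardaci/BoxData.txt"); 0 otherwise. */
int all_lfns(const char *const entries[], int n)
{
    for (int i = 0; i < n; i++) {
        if (entries[i] == NULL || strncmp(entries[i], "lfn:", 4) != 0)
            return 0;
    }
    return 1;
}
```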
Output of the job
```
more /tmp/liuag_6H6MxCv4gDHQh6To0y32fA/std.out
hello everyone
```