Advanced services in gLite
Gergely Sipos and Peter Kacsuk
MTA SZTAKI

Outline
• Advanced job types
  • Interactive jobs
  • Checkpointing jobs
  • MPI jobs
• Workflows
  • Condor DAGMan
  • gLite workflow
Grid Computing School, 10-12 July 2006, Rio de Janeiro

Normal job
• So far we have talked about normal jobs:
  • a sequential program
  • takes input
  • performs computation
  • writes output
• The user gets the output after the execution
• Other options:
  • Interactive jobs
  • Logical checkpointing jobs
  • MPI jobs
  • Workflows

Interactive Job (I)
• An interactive job is a job whose standard streams are forwarded to the submitting client
• The user has to set the JDL JobType attribute to Interactive
• When an interactive job is submitted, the edg-job-submit command
  • starts a Grid console shadow process in the background that listens on a port assigned by the operating system
    • the port can be forced through the ListenerPort attribute in the JDL
  • opens a new window to which the incoming job streams are forwarded
    • the DISPLAY environment variable has to be set correctly, because an X window is opened
• The user can specify the --nogui option, which makes the command provide a simple, non-graphical interaction with the running job
• It is not necessary to specify the OutputSandbox attribute in the JDL, because the output is sent to the interactive window

Interactive jobs (II)
• Specified by setting JobType = "Interactive" in the JDL
• When an interactive job is executed, a window for the stdin, stdout and stderr streams is opened
  • the user can send stdin to the job
  • the user can see the stderr and stdout of the job while it is running
• A window for the standard streams of a previously submitted interactive job can be started with the edg-job-attach command

Logical Checkpointing Job
• A checkpointing job is a job that can be decomposed into several steps
• In every step the job state can be saved in the Logging and Bookkeeping (LB) service and retrieved later in case of failures
• The job state is a set of <key, value> pairs defined by the user
• The job can start running from a previously saved state instead of starting from the beginning again
• The user has to set the JDL JobType attribute to checkpointable

Logical Checkpointing Job: submission and job states
• When a checkpointable job is submitted and starts from the beginning, the user simply runs the edg-job-submit command
• The job phases are declared with the JobSteps attribute, which accepts either
  • the number of steps, e.g. JobSteps = 2;
  • or a list of labels, e.g. JobSteps = {"january", "february"};
• The latest job state can be obtained with the edg-job-get-chkpt <jobid> command
• A specific job state can be obtained with the edg-job-get-chkpt --cs <state_num> <jobid> command
• When a checkpointable job has to start from an intermediate job state, the user runs the edg-job-submit command with the --chkpt <state_jdl> option, where <state_jdl> is a valid job state file in which the state of a previously submitted job was saved

Job checkpointing example
Example of an application (e.g. HEP Monte Carlo simulation):

    int main () {
        …
        for (int i=event; i < EVMAX; i++) {
            < process event i>;
        }
        ...
        exit(0);
    }

Job checkpointing example
User code must be instrumented in order to exploit the checkpointing framework; the changes are easy to make:

    …
    #include "checkpointing.h"

    int main () {
        JobState state(JobState::job);
        event = state.getIntValue("first_event");
        PFN_of_file_on_SE = state.getStringValue("filename");
        ….
        var_n = state.getBoolValue("var_n");
        < copy file_on_SE locally>;
        …
        for (int i=event; i < EVMAX; i++) {
            < process event i>;
            ...
            state.saveValue("first_event", i+1);
            < save intermediate file on a SE>;
            state.saveValue("filename", PFN_of_file_on_SE);
            ...
            state.saveValue("var_n", value_n);
            state.saveState();
        }
        …
        exit(0);
    }

Job checkpointing example: what the instrumented code does
• The user defines what a state is
  • a state is a set of <var, value> pairs
  • it must be "enough" to restart a computation from a previously saved state
• The user can save the state of the job from time to time (the state.saveValue(...) and state.saveState() calls inside the loop)
• The last saved state is retrieved at start-up (the state.getIntValue(...), state.getStringValue(...) and state.getBoolValue(...) calls), so the job can restart from that point

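The save-and-resume pattern of this example can be tried outside the Grid. The following is a minimal Python sketch, not the gLite checkpointing API: it assumes the state is kept in a local JSON file rather than in the LB, and `load_state`, `save_state`, `run` and the `fail_at` parameter are hypothetical names introduced only for illustration. Only the key `first_event` is taken from the slides.

```python
import json
import os
import tempfile

EVMAX = 10  # total number of events; illustrative value only


def load_state(path):
    """Return the last saved state, or an empty state on a fresh start."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_state(path, state):
    """Write the state to a temporary file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, fail_at=None):
    """Process events starting at the saved position; optionally simulate a crash."""
    state = load_state(path)
    processed = []
    for i in range(state.get("first_event", 0), EVMAX):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at event {i}")
        processed.append(i)            # stands in for <process event i>
        state["first_event"] = i + 1   # next event to process after a restart
        save_state(path, state)
    return processed


# First run crashes at event 4; the second run resumes from the saved state.
path = os.path.join(tempfile.mkdtemp(), "state.json")
try:
    run(path, fail_at=4)
except RuntimeError:
    pass
resumed = run(path)
print(resumed)  # [4, 5, 6, 7, 8, 9]
```

The state here holds a single pair; as the slides note, a real job has to save every value it needs in order to restart, for example the name of an intermediate file.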
MPI Job
• There are a lot of libraries supporting parallel jobs, but we decided to support MPICH
• An MPI job runs in parallel on several processors
• The user has to set the JDL JobType attribute to MPICH and specify the NodeNumber attribute, which is the required number of CPUs
• When an MPI job is submitted, the UI adds
  • to the Requirements attribute:
      Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
        (the MPICH runtime environment must be installed on the CE)
      other.GlueCEInfoTotalCPUs >= NodeNumber
        (the number of CPUs must be at least equal to the required number of nodes)
  • to the Rank attribute:
      other.GlueCEStateFreeCPUs
        (the CE with the largest number of free CPUs is chosen)

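The effect of the added Requirements and Rank expressions can be pictured as a filter followed by a maximum. The following Python sketch is only an illustration of that logic, not the WMS matchmaker: the CE records and their values are invented, and `matches` and `select_ce` are hypothetical helper names. The attribute names are the Glue attributes quoted above.

```python
# Hypothetical snapshot of what four CEs publish; all values are invented.
CES = [
    {"id": "ce-a",
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["MPICH"],
     "GlueCEInfoTotalCPUs": 16, "GlueCEStateFreeCPUs": 3},
    {"id": "ce-b",
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["MPICH", "OTHER"],
     "GlueCEInfoTotalCPUs": 8, "GlueCEStateFreeCPUs": 6},
    {"id": "ce-c",  # no MPICH runtime environment
     "GlueHostApplicationSoftwareRunTimeEnvironment": [],
     "GlueCEInfoTotalCPUs": 64, "GlueCEStateFreeCPUs": 40},
    {"id": "ce-d",  # fewer CPUs than the job needs
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["MPICH"],
     "GlueCEInfoTotalCPUs": 2, "GlueCEStateFreeCPUs": 2},
]


def matches(ce, node_number):
    """Requirements: MPICH is installed and the CE has enough CPUs in total."""
    return ("MPICH" in ce["GlueHostApplicationSoftwareRunTimeEnvironment"]
            and ce["GlueCEInfoTotalCPUs"] >= node_number)


def select_ce(ces, node_number):
    """Rank: among the matching CEs, prefer the one with most free CPUs."""
    candidates = [ce for ce in ces if matches(ce, node_number)]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce["GlueCEStateFreeCPUs"])


best = select_ce(CES, node_number=4)
print(best["id"])  # ce-b
```

Note that ce-c has the most free CPUs but is never considered, because the requirements are applied before the rank.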
MPI Job: example JDL

    [
      JobType = "MPICH";
      NodeNumber = 2;
      Executable = "MPItest.sh";
      Arguments = "cpi 2";
      InputSandbox = {"MPItest.sh", "cpi"};
      OutputSandbox = "executable.out";
      Requirements = other.GlueCEInfoLRMSType == "PBS" ||
                     other.GlueCEInfoLRMSType == "LSF";
    ]

• The NodeNumber entry is the number of CPUs (parallel processes) requested for the MPI job
• The MPItest.sh script only works if PBS or LSF is the local job manager

MPI Job
• Snapshot of MPItest.sh:

    # $HOST_NODEFILE contains names of hosts allocated for MPI job
    for i in `cat $HOST_NODEFILE` ; do
      echo "Mirroring via SSH to $i"
      # creates the working directories on all the nodes allocated for parallel execution
      ssh $i mkdir -p `pwd`
      # copies the needed files on all the nodes allocated for parallel execution
      /usr/bin/scp -rp ./* $i:`pwd`
      # sets the permissions of the files
      ssh $i chmod 755 `pwd`/$EXE
      ssh $i ls -alR `pwd`
    done
    # execute the parallel job with mpirun
    mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE > executable.out

• Important: you need shared keys between worker nodes
  • this avoids sharing of home directories
  • enforced in GILDA
  • NOT enforced in LCG2: the VO needs to negotiate on a site-by-site basis

Condor DAGMan
• Directed Acyclic Graph Manager
• DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you
  • e.g. "Don't run job B until job A has completed successfully."

What is a DAG?
[Figure: a DAG with the nodes Job A, Job B, Job C and Job D]
• A DAG is the data structure used by DAGMan to represent these dependencies
• Each job is a "node" in the DAG
• Each node can have any number of "parent" or "children" nodes, as long as there are no loops!

Defining a Condor DAG
[Figure: a DAG with the nodes Job A, Job B, Job C and Job D]
• A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D

• Each node will run the Condor job specified by its accompanying Condor submit file

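To see what the two Parent/Child lines imply, the file can be read into a graph and put into a valid execution order. The Python sketch below is an illustration only and is not how DAGMan itself is implemented: `parse_dag` is a hypothetical helper that understands just the `Job` and `Parent ... Child ...` lines shown here, and the ordering uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

DIAMOND_DAG = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return (jobs, parents): submit file per node, and parent set per node."""
    jobs = {}
    parents = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].lower()
        if keyword == "job":
            name, submit_file = words[1], words[2]
            jobs[name] = submit_file
            parents.setdefault(name, set())
        elif keyword == "parent":
            # Everything before "Child" is a parent, everything after a child.
            split = [w.lower() for w in words].index("child")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return jobs, parents


jobs, parents = parse_dag(DIAMOND_DAG)
# TopologicalSorter takes a mapping from each node to its predecessors.
order = list(TopologicalSorter(parents).static_order())
print(order)  # A first, D last; B and C may appear in either order
```

If the file contained a loop, `static_order()` would raise `graphlib.CycleError`, which mirrors the "no loops" rule for DAGs.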
Submitting a Condor DAG
• To start your DAG, just run condor_submit_dag with your .dag file, and Condor will start a personal DAGMan daemon to begin running your jobs:

    % condor_submit_dag diamond.dag

• condor_submit_dag submits a Scheduler Universe job with DAGMan as the executable
• Thus the DAGMan daemon itself runs as a Condor job, so you don't have to baby-sit it

Running a Condor DAG
• DAGMan acts as a "meta-scheduler", managing the submission of your jobs to Condor based on the DAG dependencies
[Figure: DAGMan, the .dag file with nodes A, B, C and D, and the Condor job queue containing job A]

Running a Condor DAG (cont'd)
• DAGMan holds jobs and submits them to the Condor queue at the appropriate times
[Figure: DAGMan with nodes A, B, C and D; jobs B and C are in the Condor job queue]

Running a Condor DAG (cont'd)
• In case of a job failure, DAGMan continues until it can no longer make progress, and then creates a "rescue" file with the current state of the DAG
[Figure: DAGMan with nodes A, B and D and one failed node marked X; a rescue file is written next to the Condor job queue]

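The "continue until no progress, then remember the state" behaviour can be sketched in a few lines. This Python example is a simplified model and not DAGMan code: `run_dag` is a hypothetical function, jobs are replaced by a callback that reports success or failure, and the saved state is just the set of finished nodes rather than a real rescue file.

```python
def run_dag(parents, succeeds, done=()):
    """Run every node whose parents are done; stop when nothing more can run.

    parents:  maps each node to the set of nodes it depends on
    succeeds: callable(node) -> bool, stands in for running the Condor job
    done:     nodes already completed, e.g. restored from an earlier run
    Returns the sets (done, failed).
    """
    done, failed = set(done), set()
    progress = True
    while progress:
        progress = False
        for node in sorted(parents):
            if node in done or node in failed:
                continue
            if parents[node] <= done:  # all parents finished successfully
                progress = True
                if succeeds(node):
                    done.add(node)
                else:
                    failed.add(node)
    return done, failed


diamond = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

# First run: C fails. B still runs, D cannot start.
done, failed = run_dag(diamond, succeeds=lambda node: node != "C")
print(sorted(done), sorted(failed))  # ['A', 'B'] ['C']

# Recovery: restart from the saved state; only C and D are run again.
rerun = []
done_after, failed_after = run_dag(
    diamond, succeeds=lambda node: rerun.append(node) or True, done=done)
print(rerun)  # ['C', 'D']
```

The second call shows why saving the state matters: A and B are not executed a second time.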
Recovering a Condor DAG
• Once the failed job is ready to be re-run, the rescue file can be used to restore the prior state of the DAG
[Figure: DAGMan restores nodes A, B, C and D from the rescue file; job C is in the Condor job queue]

Recovering a Condor DAG (cont'd)
• Once that job completes, DAGMan will continue the DAG as if the failure never happened
[Figure: DAGMan with nodes A, B, C and D; job D is in the Condor job queue]

Finishing a Condor DAG
• Once the DAG is complete, the DAGMan job itself is finished, and exits
[Figure: DAGMan with nodes A, B, C and D, and the Condor job queue]

Additional DAGMan Features
• DAGMan provides other handy features for job management:
  • nodes can have PRE and POST scripts
  • failed nodes can be automatically retried a configurable number of times

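The retry feature amounts to a bounded loop around a node. The Python sketch below only models that idea and does not use or reproduce DAGMan; `run_with_retries` and `make_flaky_job` are hypothetical names, and a "job" is any callable that returns True on success.

```python
def run_with_retries(job, retries):
    """Run job() once, then up to `retries` more times while it keeps failing.

    Returns (succeeded, attempts).
    """
    attempts = 0
    while attempts <= retries:
        attempts += 1
        if job():
            return True, attempts
    return False, attempts


def make_flaky_job(failures_before_success):
    """Build a stand-in job that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def job():
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return job


ok, attempts = run_with_retries(make_flaky_job(2), retries=3)
print(ok, attempts)  # True 3

gave_up, tries = run_with_retries(make_flaky_job(5), retries=3)
print(gave_up, tries)  # False 4
```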
DAG Job in EGEE
• A DAG job is a Directed Acyclic Graph job
• The user has to set in the JDL
  • JobType = "dag",
  • the nodes attribute (containing the description of the nodes), and
  • the dependencies attribute
NOTE:
• A plug-in has been implemented to map an EGEE DAG submission to a Condor DAG submission
• Some improvements have been applied to the ClassAd API to better address the needs of the WMS

DAG Job in EGEE
[Figure: a DAG with the nodes cmkin1, cmkin2, cmkin3, cmkin4, cmkin5 and cmkinN]

    nodes = {
      cmkin1 = [ file = "bckg_01.jdl" ;],
      cmkin2 = [ file = "bckg_02.jdl" ;],
      ……
      cmkinN = [ file = "bckg_0N.jdl" ;]
    };
    dependencies = {
      {cmkin1, cmkin2},
      {cmkin2, cmkin3},
      {cmkin2, cmkin5},
      {{cmkin4, cmkin5}, cmkinN}
    }

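The nested entry {{cmkin4, cmkin5}, cmkinN} is easier to read once the dependencies are expanded into single edges. The Python sketch below assumes that each entry reads {before, after}, meaning the second element may start only when the first has finished, and that either side may be a group of nodes. That reading and the helper name `expand` are assumptions made for illustration; the sketch does not parse JDL.

```python
def expand(dependencies):
    """Turn (before, after) entries into a flat list of (parent, child) edges.

    Either side of an entry may be a single node name or a group of names.
    """
    edges = []
    for before, after in dependencies:
        befores = before if isinstance(before, (list, tuple)) else [before]
        afters = after if isinstance(after, (list, tuple)) else [after]
        edges.extend((p, c) for p in befores for c in afters)
    return edges


# The dependencies attribute from the slide, transcribed as Python tuples.
dependencies = [
    ("cmkin1", "cmkin2"),
    ("cmkin2", "cmkin3"),
    ("cmkin2", "cmkin5"),
    (("cmkin4", "cmkin5"), "cmkinN"),
]

edges = expand(dependencies)
for parent, child in edges:
    print(f"{parent} -> {child}")
# cmkin1 -> cmkin2
# cmkin2 -> cmkin3
# cmkin2 -> cmkin5
# cmkin4 -> cmkinN
# cmkin5 -> cmkinN
```

Under this reading, cmkinN has two parents (cmkin4 and cmkin5), while cmkin4 has none and can start immediately.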