Workflow Management in Condor Gökay Gökçay
DAGMan Meta-Scheduler
• The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler for Condor jobs. DAGMan is responsible for submitting batch jobs in a predefined order and processing the results.
• DAGMan reads the Condor log file generated by each Condor job to find out which jobs are unsubmitted, submitted, or complete.
• DAGMan also guarantees that a DAG is recoverable, even if the machine running DAGMan goes down during execution.
DAG File Example
    # Filename: diamond.dag
    Job A A.condor
    Job B B.condor
    Job C C.condor
    Job D D.condor
    PARENT A CHILD B C
    PARENT B C CHILD D
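The diamond shape means A must finish before B and C start, and both B and C must finish before D starts. The following Python sketch illustrates that ordering only; it is not how DAGMan is implemented. The names parse_dag and submission_order are hypothetical helpers, and the parser handles just the Job and PARENT ... CHILD lines shown on the slide.

```python
from collections import defaultdict, deque

DIAMOND_DAG = """\
# Filename: diamond.dag
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""

def parse_dag(text):
    """Return (jobs, edges): jobs maps node name -> submit file,
    edges is a set of (parent, child) pairs."""
    jobs, edges = {}, set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        keyword = tokens[0].upper()
        if keyword == "JOB":
            jobs[tokens[1]] = tokens[2]
        elif keyword == "PARENT":
            split = [t.upper() for t in tokens].index("CHILD")
            parents, children = tokens[1:split], tokens[split + 1:]
            edges.update((p, c) for p in parents for c in children)
    return jobs, edges

def submission_order(jobs, edges):
    """Kahn's algorithm: a node becomes ready once all its parents are done."""
    pending = {name: 0 for name in jobs}
    children = defaultdict(list)
    for parent, child in sorted(edges):
        pending[child] += 1
        children[parent].append(child)
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(jobs):
        raise ValueError("cycle detected: the file does not describe a DAG")
    return order

jobs, edges = parse_dag(DIAMOND_DAG)
print(submission_order(jobs, edges))
```

B and C have no dependency on each other, so any order that places A first and D last is equally valid; the sketch simply picks the alphabetical one.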
Submitting the DAG to Condor
• To guarantee recoverability, the DAGMan program itself is run as a Condor job.
• "condor_submit_dag diamond.dag"
• This script generates the Condor command file diamond.dag.condor.sub for the DAG and submits it to Condor.
Essentials
• Prepare Jobs
  Each Condor command file can submit only one job. Multi-job clusters (multiple queue lines) are not supported.
  The log= entry in all Condor command files must point to the same Condor log file; otherwise, DAGMan will not see the Condor log entries for every job in the DAG.
• Write DAG File
  Write the DAG file so that the JOB entries refer to the Condor command files you wrote in the previous step.
• Submit the DAG
  Finally, submit the DAG written in the previous step using the condor_submit_dag script.
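The two "Prepare Jobs" rules (one queued job per file, one shared log file) can be checked before submission. The Python sketch below is a hypothetical pre-flight check, not a Condor tool: parse_submit and check_dag_jobs are made-up names, the submit-file contents are invented for illustration, and the parser only understands simple "key = value" lines plus "queue" or "queue N".

```python
def parse_submit(text):
    """Parse simple 'key = value' lines and count the jobs queued."""
    attrs, queued = {}, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        if tokens[0].lower() == "queue":
            # "queue" queues one job; "queue N" queues N jobs.
            has_count = len(tokens) > 1 and tokens[1].isdigit()
            queued += int(tokens[1]) if has_count else 1
        elif "=" in line:
            key, value = line.split("=", 1)
            attrs[key.strip().lower()] = value.strip()
    return attrs, queued

def check_dag_jobs(submit_files):
    """Return a list of human-readable problems (empty if none found)."""
    problems, logs = [], set()
    for name, text in submit_files.items():
        attrs, queued = parse_submit(text)
        if queued != 1:
            problems.append(f"{name}: expected exactly one job, found {queued}")
        if "log" not in attrs:
            problems.append(f"{name}: no log= entry")
        else:
            logs.add(attrs["log"])
    if len(logs) > 1:
        problems.append("log= differs across files: " + ", ".join(sorted(logs)))
    return problems

# Hypothetical submit files: B.condor points at a different log file.
SUBMIT_FILES = {
    "A.condor": "executable = /bin/true\nlog = diamond.log\nqueue\n",
    "B.condor": "executable = /bin/true\nlog = other.log\nqueue\n",
}
for problem in check_dag_jobs(SUBMIT_FILES):
    print(problem)
```

Comparing the raw log= strings is a simplification: two different relative paths could still name the same file, so a stricter check would resolve paths first.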
Complications
• Setup, cleanup, or interpretation of a node (scripts), e.g. decompression, compression, serialization
• Throttling (too many scripts)
• Unreliable applications or subsystems
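The slide does not show how these complications are expressed. As a rough sketch, the DAG file keywords SCRIPT PRE, SCRIPT POST, and RETRY from the DAGMan documentation address the first and third items, and throttling is usually requested through condor_submit_dag options such as -maxjobs, -maxpre, and -maxpost. The Python helper below only assembles the corresponding DAG lines; dag_node and the script names decompress.sh and compress.sh are hypothetical.

```python
def dag_node(name, submit_file, pre=None, post=None, retries=0):
    """Build the DAG-file lines for one node.

    pre/post are commands run before/after the job (setup and cleanup);
    retries is how many times the node is retried if it fails.
    """
    lines = [f"JOB {name} {submit_file}"]
    if pre:
        lines.append(f"SCRIPT PRE {name} {pre}")
    if post:
        lines.append(f"SCRIPT POST {name} {post}")
    if retries:
        lines.append(f"RETRY {name} {retries}")
    return lines

# Hypothetical node: decompress input first, compress output afterwards,
# and retry up to 3 times because the application is unreliable.
node_lines = dag_node(
    "B", "B.condor",
    pre="decompress.sh input.gz",
    post="compress.sh output",
    retries=3,
)
print("\n".join(node_lines))
```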
Stork
• Stork is an emerging Condor technology for managing data placement.
• Stork provides a fault-tolerant framework for scheduling data allocation and data transfer jobs. The architecture is modular and extensible, with support for many popular storage systems and data transfer protocols.
• Modules: ftp, gsiftp (GridFTP), http, nest (Condor NeST network storage), srb (SDSC Storage Resource Broker), csrm (CASTOR SRM), srm (dCache SRM), unitree (NCSA UniTree), diskrouter
Condor submit file
    $ cat process.condor
    universe = vanilla
    executable = /bin/sort
    arguments = /tmp/stork/index.html /tmp/stork/classad-talk.ps
    output = /tmp/stork/process.results.out
    error = process.results.err
    log = process.results.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    notification = never
    queue
Using Stork with Condor DAGMan
    $ cat transfer.stork
    [
      dap_type = transfer;
      src_url = "file:/tmp/stork/process.results.out";
      dest_url = "nest://turkey.cs.wisc.edu/1.dat";
      alt_protocols = "gsiftp-nest";
      log = "transfer.log";
    ]

    $ cat stork-condor.dag
    DATA INPUT1 alt_protocol.stork
    DATA INPUT2 transfer_ftp-file.stork
    JOB PROCESS process.condor
    DATA OUTPUT transfer.stork
    PARENT INPUT1 INPUT2 CHILD PROCESS
    PARENT PROCESS CHILD OUTPUT
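In stork-condor.dag, the DATA nodes name Stork submit files and the JOB node names a Condor submit file, so the two input transfers can run side by side, followed by the processing job, followed by the output transfer. The Python sketch below groups the nodes into such "waves" to make that visible. It is an illustration under simplifying assumptions, not DAGMan's scheduler: parse_mixed_dag and waves are hypothetical names, and only DATA, JOB, and PARENT ... CHILD lines are handled.

```python
STORK_DAG = """\
DATA INPUT1 alt_protocol.stork
DATA INPUT2 transfer_ftp-file.stork
JOB PROCESS process.condor
DATA OUTPUT transfer.stork
PARENT INPUT1 INPUT2 CHILD PROCESS
PARENT PROCESS CHILD OUTPUT
"""

def parse_mixed_dag(text):
    """Return (nodes, parents_of).

    nodes maps name -> (kind, submit file), where kind is "DATA" or "JOB";
    parents_of maps name -> set of parent node names.
    """
    nodes, parents_of = {}, {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in ("DATA", "JOB"):
            nodes[tokens[1]] = (tokens[0], tokens[2])
            parents_of.setdefault(tokens[1], set())
        elif tokens[0] == "PARENT":
            split = tokens.index("CHILD")
            for child in tokens[split + 1:]:
                parents_of.setdefault(child, set()).update(tokens[1:split])
    return nodes, parents_of

def waves(nodes, parents_of):
    """Group nodes into waves; every node in a wave has all parents done."""
    done, result = set(), []
    while len(done) < len(nodes):
        ready = sorted(
            n for n in nodes if n not in done and parents_of[n] <= done
        )
        if not ready:
            raise ValueError("cycle detected: the file does not describe a DAG")
        result.append(ready)
        done.update(ready)
    return result

nodes, parents_of = parse_mixed_dag(STORK_DAG)
for number, wave in enumerate(waves(nodes, parents_of), start=1):
    # DATA nodes are handed to Stork, JOB nodes to Condor.
    targets = [
        f"{name} ({'Stork' if nodes[name][0] == 'DATA' else 'Condor'})"
        for name in wave
    ]
    print(f"wave {number}: {', '.join(targets)}")
```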
Thanks for Listening
Questions?