
Dynamic DAGMan with ClassAds

Himani Apte


Presentation Transcript


  1. Himani Apte Dynamic DAGMan with ClassAds

  2. Outline • DAGMan workflow management • Motivation for dynamic DAGMan • ClassAds • Putting together: DAGMan + ClassAds • Looking ahead

  3. DAGMan • Directed Acyclic Graph Manager • Meta-scheduler for Condor • DAG: set of jobs with dependencies • Manages submission of DAG jobs • Enforces execution order • DAGMan itself is a Condor job!

  4. Example DAG
       Job A A.condor
       Job B B.condor
       Job C C.condor
       Job D D.condor
       Parent A Child B C
       Parent B C Child D
       Script PRE A input.sh
       Script POST D output.sh
     Diagram: A is the parent of B and C; B and C are the parents of D.
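To make the ordering implied by the Parent/Child lines of the example DAG concrete, here is a minimal Python sketch. It is not DAGMan's implementation; the names `PARENTS` and `execution_waves` are hypothetical, and the sketch ignores the PRE and POST scripts. It groups nodes into "waves" so that every node appears only after all of its parents.

```python
# Edges copied from the example DAG:
#   Parent A Child B C
#   Parent B C Child D
# PARENTS maps each node to the list of its parents (hypothetical name).
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def execution_waves(parents):
    """Group nodes into waves; a node is placed once all its parents are done."""
    remaining = {node: set(ps) for node, ps in parents.items()}
    done = set()
    waves = []
    while remaining:
        # A node is ready when every parent has already been placed in a wave.
        ready = sorted(n for n, ps in remaining.items() if ps <= done)
        if not ready:
            raise ValueError("cycle detected: the graph is not a DAG")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


print(execution_waves(PARENTS))
```

For the example DAG this yields A first, then B and C together, then D, which is the order the diagram on the slide depicts.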

  5. Simplified state diagram of a DAG node • States shown: Waiting, Pre-running, Submitted, Post-running, Done, Failed
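The transcript preserves only the state names from this diagram, not its arrows. The Python sketch below models the states as an enum with a transition table. The transitions are an assumption (a node waits, optionally runs its PRE script, is submitted, optionally runs its POST script, and ends as Done or Failed), and `NodeState`, `TRANSITIONS`, and `advance` are hypothetical names.

```python
from enum import Enum


class NodeState(Enum):
    WAITING = "Waiting"
    PRE_RUNNING = "Pre-running"
    SUBMITTED = "Submitted"
    POST_RUNNING = "Post-running"
    DONE = "Done"
    FAILED = "Failed"


# ASSUMED transitions: the slide transcript lists the states but not the edges.
TRANSITIONS = {
    NodeState.WAITING: {NodeState.PRE_RUNNING, NodeState.SUBMITTED},
    NodeState.PRE_RUNNING: {NodeState.SUBMITTED, NodeState.FAILED},
    NodeState.SUBMITTED: {NodeState.POST_RUNNING, NodeState.DONE, NodeState.FAILED},
    NodeState.POST_RUNNING: {NodeState.DONE, NodeState.FAILED},
    NodeState.DONE: set(),
    NodeState.FAILED: set(),
}


def advance(current, target):
    """Return the target state, or raise if the move is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


# Walk one node with a PRE script through a successful run.
state = NodeState.WAITING
for nxt in (NodeState.PRE_RUNNING, NodeState.SUBMITTED, NodeState.DONE):
    state = advance(state, nxt)
```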

  6. DAGMan: important properties • Monitors job state using Condor logs • Simple and clean recovery model • Rescue DAG: saves state at failure • Restart: reconstruct internal state • Scripts allow “lazy” planning • Throttling parameters

  7. Outline • DAGMan workflow management • Motivation for dynamic DAGMan • ClassAds • Putting together: DAGMan + ClassAds • Looking ahead

  8. Motivation for dynamic DAGMan • DAG: complete execution order • Flexibility to make run-time decisions • Which subset of DAG nodes should execute? • When should node X execute? • Conditional DAGs • Associate a condition with DAG edges • Simplest condition: successful completion of parent nodes

  9. Conditional DAG: examples • Example 1: node A with condition A.x == true; Yes leads to B, No leads to C • Example 2: parents P1 and P2 with condition P1.x OR P2.x; child C

  10. Motivation for dynamic DAGMan • Scripts can be leveraged for lazy planning • For simple conditions • E.g. exit value of job • Modify DAG structure • E.g. convert branch-not-taken to no-op/empty • We want a generic solution • Supported by “Dynamic DAGMan”

  11. Outline • DAGMan workflow management • Motivation for dynamic DAGMan • ClassAds • Putting together: DAGMan + ClassAds • Looking ahead

  12. ClassAds • Classified advertisements • Used extensively in Condor • Define jobs, machines, resources • Define conditions, triggers, requirements • Maintain internal state

  13. ClassAds • List of attribute-value pairs • Simple value types: integers, strings • Complex types: lists, expressions, ClassAds • Matchmaking framework • Tests whether two ClassAds match • Using the “Requirements” expression • Great fit for Dynamic DAGMan
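The matchmaking idea can be illustrated with a small Python sketch. It does not use the Condor ClassAd library or its expression language: ads are plain dictionaries, each optional `Requirements` entry is a Python function of `(my, other)`, and two ads match when each ad's requirements accept the other. The attribute names and values (`ImageSize`, `Memory`, `job-1`, `node-7`, `node-9`) are illustrative assumptions.

```python
def accept_all(my, other):
    """Default requirements used when an ad does not define any."""
    return True


def matches(ad_a, ad_b):
    """Symmetric match: each ad's Requirements must accept the other ad."""
    req_a = ad_a.get("Requirements", accept_all)
    req_b = ad_b.get("Requirements", accept_all)
    return bool(req_a(ad_a, ad_b)) and bool(req_b(ad_b, ad_a))


# Hypothetical job ad: needs a machine with at least as much memory as its image.
job_ad = {
    "Name": "job-1",
    "ImageSize": 512,
    "Requirements": lambda my, other: other.get("Memory", 0) >= my["ImageSize"],
}

# Hypothetical machine ads: one large enough, one too small.
machine_ad = {
    "Name": "node-7",
    "Memory": 2048,
    "Requirements": lambda my, other: other.get("ImageSize", 0) <= my["Memory"],
}
small_machine_ad = {"Name": "node-9", "Memory": 256}

print(matches(job_ad, machine_ad))        # the job fits on node-7
print(matches(job_ad, small_machine_ad))  # the job does not fit on node-9
```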

  14. Outline • DAGMan workflow management • Motivation for dynamic DAGMan • ClassAds • Putting together: DAGMan + ClassAds • Looking ahead

  15. Putting together: DAGMan + ClassAds • Dynamic DAGMan research project • Work-in-progress • Not yet available in Condor • DAG nodes have associated classAds • Basic node attributes • Job identifier, name, type • Status (Waiting, Submitted, Done, etc.)

  16. Dynamic DAGMan: attributes • Execution characteristics of job • Exit value • Wall-clock time • CPU utilization (local and remote) • Network statistics (bytes sent / received) • Information about files transferred (for vanilla universe) • Attributes maintained by Condor for a job

  17. Dynamic DAGMan: conditions • Requirements expression • Defines trigger condition for the node • Arbitrarily complex expression • Defined on the attributes of parent nodes • Use matchmaking to determine if a node can be submitted

  18. Dynamic DAG: example
        Job A A.condor
        Job B B.condor
        Job C C.condor
        Parent A Child B \
            COND [ ( other.job == A && other.x == true ) ]
        Parent A Child C \
            COND [ ( other.job == A && other.x == false ) ]
      Diagram: after A, the condition x == true is tested; Yes leads to B, No leads to C.

  19. Dynamic DAGMan: example
        Job P1 P1.condor
        Job P2 P2.condor
        Job C C.condor
        Parent P1 P2 Child C \
            COND [ (other.job == P1 && other.x == true) ||
                   (other.job == P2 && other.x == true) ]
      Diagram: P1 and P2 are the parents of C; condition: P1.x OR P2.x.
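The two COND examples can be mimicked in Python to show how a trigger condition over parent attributes might be evaluated. This is a sketch under stated assumptions, not the research prototype: parent ClassAds are plain dictionaries, each COND expression is hand-translated into a function of `other`, and a child is assumed to be eligible when its condition holds for at least one finished parent ad. All function names are hypothetical.

```python
def cond_a_to_b(other):
    # COND [ ( other.job == A && other.x == true ) ]
    return other.get("job") == "A" and other.get("x") is True


def cond_a_to_c(other):
    # COND [ ( other.job == A && other.x == false ) ]
    return other.get("job") == "A" and other.get("x") is False


def cond_p1_or_p2(other):
    # COND [ (other.job == P1 && other.x == true) ||
    #        (other.job == P2 && other.x == true) ]
    return (other.get("job") == "P1" and other.get("x") is True) or (
        other.get("job") == "P2" and other.get("x") is True
    )


def can_submit(cond, parent_ads):
    """ASSUMPTION: the child may run if cond holds for at least one parent ad."""
    return any(cond(ad) for ad in parent_ads)


# Branching example: A finished with x == true, so B is chosen and C is not.
a_done = [{"job": "A", "x": True}]
print(can_submit(cond_a_to_b, a_done), can_submit(cond_a_to_c, a_done))

# OR example: P1.x is false but P2.x is true, so C may run.
p_done = [{"job": "P1", "x": False}, {"job": "P2", "x": True}]
print(can_submit(cond_p1_or_p2, p_done))
```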

  20. Dynamic DAGMan • Recovery model is still the same • Rescue DAG: saves node state at failure • ClassAd attribute-values can be re-generated from Condor logs • Flexibility to make run-time decisions • Which subset of nodes in the DAG should be executed? • When should node X be executed?

  21. Outline • DAGMan workflow management • Motivation for dynamic DAGMan • ClassAds • Putting together: DAGMan + ClassAds • Looking ahead

  22. Looking ahead • DAG with only implicit edges • Parent-child relations embedded in classAds • Nodes specify • Trigger condition • Preference for child nodes to run • On-the-fly dependency formation based on previous node execution • DAGMan collaborates with Quill • Getting attributes from persistent storage

  23. Looking ahead • Allow job to modify/add its attributes • Determine what happens after job exits • Global state control • Throttling expression/parameters • Global DAG-classAd • Statistics on running, successful and failed jobs • E.g. if (#failed jobs > N ) run cleanup node
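The cleanup rule on this slide, "if (#failed jobs > N) run cleanup node", could be expressed against a global DAG ad roughly as follows. This is a Python sketch of the proposed idea only; the dictionary layout and the names `record_exit` and `cleanup_needed` are hypothetical, and treating exit value 0 as success is an assumption.

```python
def new_dag_ad():
    """Global DAG-level statistics, kept here as a plain dictionary."""
    return {"running": 0, "succeeded": 0, "failed": 0}


def record_exit(dag_ad, exit_value):
    """Update the counters when a job exits (ASSUMPTION: 0 means success)."""
    if exit_value == 0:
        dag_ad["succeeded"] += 1
    else:
        dag_ad["failed"] += 1


def cleanup_needed(dag_ad, max_failed):
    """Trigger from the slide: run the cleanup node once failures exceed N."""
    return dag_ad["failed"] > max_failed


dag_ad = new_dag_ad()
for exit_value in (0, 1, 0, 2, 3):
    record_exit(dag_ad, exit_value)

# Three jobs failed, so the trigger fires for N = 2 but not for N = 3.
print(cleanup_needed(dag_ad, 2), cleanup_needed(dag_ad, 3))
```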

  24. Thank you • We are interested in hearing your suggestions!
