Juggling Jobs
Jug is a Python-based job management system, borrowing ideas from DAGMan, MOP, BOSS, Hawk, and probably others.
Filling the Jug Database
• MCRunjob “configurator”
  • Inserts a batch of job entries into Jug from a general workflow description.
  • May be driven by RefDB, the CERN assignment database.
• Or native Jug syntax for stand-alone use

    Batch    #child batch
      name = “edde.cmkin”
      seed_low = 120000
      seed_high = seed_low + 400
      software = “/cms/sw/cmkin_edde”
      environment =
        EVENTS_PER_JOB = 250

    Batch    #parent batch
      name = “edde.oscar”
      parent name = “edde.cmkin”
      input_files = “*.ntpl”
      software = “/cms/sw/oscar_3_3_2” “/cms/pool”
      environment =
        DATASET = “edde”
        OWNER = “edde_oscar332”
Batch Management
The “DAG in a database” may be monitored and extended at any time. Users may drill into the aggregate view to inspect details.
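The “DAG in a database” idea can be illustrated with a small sketch. This is not Jug's actual schema: the table names, column names, and job states below are assumptions made for illustration, using Python's standard `sqlite3` module. It shows a batch being added after the database is already in use, an aggregate view, and a drill-down into one batch.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a hypothetical batch/job schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE batch (
            name   TEXT PRIMARY KEY,
            parent TEXT REFERENCES batch(name)   -- DAG edge: parent batch
        );
        CREATE TABLE job (
            id    INTEGER PRIMARY KEY,
            batch TEXT NOT NULL REFERENCES batch(name),
            state TEXT NOT NULL DEFAULT 'waiting'
        );
    """)
    return db

def add_batch(db, name, n_jobs, parent=None):
    """Insert a batch and its job entries; can be called at any time."""
    db.execute("INSERT INTO batch (name, parent) VALUES (?, ?)", (name, parent))
    db.executemany("INSERT INTO job (batch) VALUES (?)", [(name,)] * n_jobs)
    db.commit()

def aggregate(db):
    """Aggregate view: number of jobs per batch and state."""
    return db.execute(
        "SELECT batch, state, COUNT(*) FROM job "
        "GROUP BY batch, state ORDER BY batch, state"
    ).fetchall()

def details(db, batch):
    """Drill-down: individual jobs of one batch."""
    return db.execute(
        "SELECT id, state FROM job WHERE batch = ? ORDER BY id", (batch,)
    ).fetchall()

db = open_db()
add_batch(db, "edde.cmkin", 4)
db.execute("UPDATE job SET state = 'done' WHERE id = 1")
# Extend the DAG later with a batch that depends on the first one.
add_batch(db, "edde.oscar", 4, parent="edde.cmkin")

summary = aggregate(db)
cmkin_jobs = details(db, "edde.cmkin")
```

Keeping the dependency edge as a plain column is one simple way to let monitoring and extension both be ordinary queries; a real system would need more states and bookkeeping than this sketch carries.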
Lazy Scheduling
• Schedule by “competing pull”
  • Load balancing without prediction.
  • Nodes may race on the same job.
  • Storage pulls output and provides two-phase commit of the job.
• Submission of workers to a batch queue or grid may be balanced across multiple machines, including remote submission points.
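The “competing pull” scheme above can be sketched in a few lines of Python. This is an illustrative toy, not Jug's implementation: the `Storage` class, its method names, and the job states are hypothetical, and the worker pushes its output to storage for brevity (the slide describes storage pulling it). Workers take any unfinished job without reserving it, so they may race; the two-phase commit accepts only the first output per job.

```python
import threading

class Storage:
    """Hypothetical job store with a two-phase (prepare, commit) protocol."""

    def __init__(self, job_ids):
        self._lock = threading.Lock()
        self._state = {job: "waiting" for job in job_ids}
        self._staged = {}        # (job, worker) -> output awaiting commit
        self.committed = {}      # job -> (worker, output)

    def pull_job(self):
        """Return any job not yet committed; no reservation is made."""
        with self._lock:
            for job, state in self._state.items():
                if state != "done":
                    return job
        return None

    def prepare(self, job, worker, output):
        """Phase 1: stage the output without marking the job done."""
        with self._lock:
            self._staged[(job, worker)] = output

    def commit(self, job, worker):
        """Phase 2: accept the staged output only if the job is still open."""
        with self._lock:
            output = self._staged.pop((job, worker))
            if self._state[job] == "done":
                return False     # another worker won the race; discard
            self._state[job] = "done"
            self.committed[job] = (worker, output)
            return True

def worker(name, storage, log):
    """Pull jobs until none remain; log only the commits this worker won."""
    while (job := storage.pull_job()) is not None:
        output = f"output-of-{job}"
        storage.prepare(job, name, output)
        if storage.commit(job, name):
            log.append((name, job))

storage = Storage(range(10))
log = []
threads = [threading.Thread(target=worker, args=(f"node{i}", storage, log))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Faster nodes simply pull more often, which is how load balances without any prediction; the cost is occasional duplicated work when two nodes pick the same job.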
Autonomous Execution
• Work-loop is a pipelined queue
  • Output may be queued
  • Pre-staged input may be queued
  • So jobs can keep running in the face of network and service outages.
• Handling loss of contact with workers
  • Write them off but welcome them back.
  • Two-phase commit of output prevents race conditions.
  • Optimistic approach maximizes throughput.
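A minimal sketch of the pipelined work-loop, under stated assumptions: the `Server` and `Worker` classes, the `prestage_depth` parameter, and the step-by-step outage schedule are all hypothetical and exist only to show how queued input and output let a worker keep running while the network is down. Handling of written-off workers is not modelled here.

```python
from collections import deque

class Server:
    """Hypothetical central service: hands out jobs and receives output."""
    def __init__(self, jobs):
        self.pending = deque(jobs)
        self.delivered = []

class Worker:
    """Work-loop with a pre-staged input queue and a queued output buffer."""
    def __init__(self, prestage_depth=2):
        self.inputs = deque()    # jobs fetched ahead of time
        self.outputs = deque()   # results waiting to be shipped
        self.prestage_depth = prestage_depth

    def step(self, server, online):
        if online:
            # Ship any queued output, then top up the pre-staged input.
            while self.outputs:
                server.delivered.append(self.outputs.popleft())
            while len(self.inputs) < self.prestage_depth and server.pending:
                self.inputs.append(server.pending.popleft())
        # Run one job from the local queue, whether or not we are online.
        if self.inputs:
            job = self.inputs.popleft()
            self.outputs.append(f"{job}.out")

server = Server([f"job{i}" for i in range(5)])
node = Worker(prestage_depth=2)
backlog = []   # queued outputs after each step

# Simulated connectivity: an outage during steps 2 and 3.
for online in [True, False, False, True, True, True, True]:
    node.step(server, online)
    backlog.append(len(node.outputs))
```

During the outage the worker finishes its pre-staged job and holds the results locally; once contact returns, the backlog drains and new input is staged again. The pre-stage depth bounds how long a worker can stay productive without the server.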