The Condor JobRouter
aka “schedd on the side”
Dan, Condor Week 2008

Status
• It’s in the current development series: Condor 7.1.0, Unix (Windows soonish)
• Used heavily by the CMS physics experiment for simulation on Open Science Grid (millions of jobs routed)

What is “job routing”?

original (vanilla) job:
    Universe = "vanilla"
    Executable = "sim"
    Arguments = "seed=345"
    Output = "stdout.345"
    Error = "stderr.345"
    ShouldTransferFiles = True
    WhenToTransferOutput = "ON_EXIT"

routed (grid) job:
    Universe = "grid"
    GridType = "gt2"
    GridResource = \
        "cmsgrid01.hep.wisc.edu/jobmanager-condor"
    Executable = "sim"
    Arguments = "seed=345"
    Output = "stdout"
    Error = "stderr"
    ShouldTransferFiles = True
    WhenToTransferOutput = "ON_EXIT"

The JobRouter uses its Routing Table (Site 1 …, Site 2 …) to turn the original job into the routed job; the routed job’s final status is passed back to the original job.
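
The transformation above can be sketched in a few lines of Python. This is only an illustration: job ads are modeled as plain dicts rather than real ClassAds, `route_job` is a hypothetical helper and not part of Condor, and the sketch does not model the Output/Error renaming that appears in the routed job above.

```python
# Illustrative sketch only: ads are plain dicts, not real ClassAds,
# and route_job is a hypothetical helper (not a Condor API).

def route_job(job_ad, route):
    """Return a routed copy of job_ad; the original ad is left untouched."""
    routed = dict(job_ad)                      # copy, so the original job survives
    routed["Universe"] = "grid"                # change an existing attribute
    routed["GridType"] = route["GridType"]     # insert new attributes
    routed["GridResource"] = route["GridResource"]
    return routed


original = {
    "Universe": "vanilla",
    "Executable": "sim",
    "Arguments": "seed=345",
}
site = {
    "GridType": "gt2",
    "GridResource": "cmsgrid01.hep.wisc.edu/jobmanager-condor",
}
routed = route_job(original, site)
```

Keeping the original ad unchanged mirrors the picture above, where the vanilla job stays in the queue while its routed copy goes to the grid.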

Routing is just site-level matchmaking
• With feedback from job queue
    • number of jobs currently routed to site X
    • number of idle jobs routed to site X
    • rate of recent success/failure at site X
• And with power to modify job ad
    • change attribute values (e.g. Universe)
    • insert new attributes (e.g. GridResource)
    • add a “portal” grid proxy if desired
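
A minimal sketch of matchmaking with queue feedback, under stated assumptions: the `Route` class and `pick_route` function are hypothetical, the two limits are named after the MaxIdleJobs and FailureRateThreshold attributes used in the routing table, and treating the threshold as a simple upper bound on a recent failure rate is an assumption. The real JobRouter's selection logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical model of one routing-table entry plus queue feedback."""
    name: str
    max_idle_jobs: int              # limit, as in MaxIdleJobs
    failure_rate_threshold: float   # limit, as in FailureRateThreshold
    idle_jobs: int = 0              # feedback: idle jobs currently routed here
    recent_failure_rate: float = 0.0  # feedback: recent failures at this site


def pick_route(routes):
    """Return the first route with room for another idle job and an
    acceptable recent failure rate, or None if no site qualifies."""
    for route in routes:
        has_room = route.idle_jobs < route.max_idle_jobs
        healthy = route.recent_failure_rate <= route.failure_rate_threshold
        if has_room and healthy:
            return route
    return None


routes = [
    Route("Site 1", max_idle_jobs=10, failure_rate_threshold=0.01, idle_jobs=10),
    Route("Site 2", max_idle_jobs=10, failure_rate_threshold=0.01, idle_jobs=3),
]
chosen = pick_route(routes)  # Site 1 is full, so Site 2 is chosen
```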

Configuring the Routing Table
• JOB_ROUTER_ENTRIES
    • list site ClassAds in configuration file
• JOB_ROUTER_ENTRIES_FILE
    • read site ClassAds periodically from a file
• JOB_ROUTER_ENTRIES_CMD
    • read periodically from a script
    • example: query a collector such as Open Science Grid Resource Selection Service
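
A script for JOB_ROUTER_ENTRIES_CMD might look roughly like the sketch below. It assumes the script is expected to print site ClassAds in the same bracketed syntax shown on the Syntax slide; check the manual for the exact format. The site names and gatekeeper hosts are made-up placeholders, and a real script would query a collector instead of using a hard-coded list.

```python
# Sketch of a script that prints one routing-table ClassAd per site.
# SITES is a hypothetical hard-coded list; the hosts are placeholders.

SITES = [
    ("Example Site A", "gt2 gatekeeper-a.example.org/jobmanager-condor"),
    ("Example Site B", "gt2 gatekeeper-b.example.org/jobmanager-condor"),
]


def format_route(name, grid_resource, max_idle_jobs=10):
    """Format one site as a bracketed ClassAd string."""
    return (
        "[ "
        f'Name = "{name}"; '
        f'GridResource = "{grid_resource}"; '
        f"MaxIdleJobs = {max_idle_jobs}; "
        "]"
    )


def main():
    for name, grid_resource in SITES:
        print(format_route(name, grid_resource))


if __name__ == "__main__":
    main()
```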

Syntax
• Read the 7.1 manual.
• It’s in the chapter on Grid Computing

    [
      Name = "Grid Site 1";
      GridResource = "gt2 gatekeeper…";
      MaxIdleJobs = 10;
      FailureRateThreshold = 0.01;
    ]

What Types of Input Jobs?
• Vanilla Universe
• Self Contained (everything needed is in file transfer list)
• High Throughput (many more jobs than CPUs)

What Target Grid Types?
• Globus, Condor-C work well
    • others untested, but should be fine
• Why only target the grid universe?
    • no reason at all
    • 7.1.1 now allows any destination universe

Grid Gotchas
• Globus gt2
    • no exit status from job (reported as 0)
    • must explicitly list desired output files

JobRouter vs. Glidein
• Glidein: Condor overlays the grid
    • job never waits in remote queue
    • job runs in its normal universe
    • private networks doable, but add to complexity
    • need something to submit glideins on demand
• JobRouter
    • some jobs wait in remote queue (MaxIdleJobs)
    • job must be compatible with target grid semantics
    • simple to set up, fully automatic to run