Schedd On The Side
What is it?
• Specialized scheduler operating on the schedd’s jobs.
[Diagram: a Schedd with its job queue (Job 1, Job 2, Job 3, Job 4, Job 5, …) and the Schedd On The Side holding Job 4*.]
Condor Farm Story
• Now that this is working, how can I use my collaborator’s resources too?
[Diagram: Application, condor_submit, job queue, Schedd, Negotiator, Startd resources.]
Option #1: Merge Farms
• Combine machines with collaborator into one Condor resource pool.
• Everything works just like it did before.
• Excellent option for small to medium clusters.
• Requires bidirectional connectivity to all startds, or equivalent via GCB.
• Requires some administrative coordination (e.g. upgrades, negotiator policy, security).
Option #2: Flocking Together
• full featured (std universe etc.)
• automatic matchmaking
• easy to configure
• requires bidirectional connectivity
• both sites must run Condor
[Diagram: Schedd, two Negotiators, Local Startds, Remote Startds.]
Option #3: Grid Universe
• easier to live with private networks
• may use non-Condor resources
• restricted Condor feature set (e.g. no std universe over grid)
• must pre-allocate jobs between vanilla and grid universe
[Diagram: Schedd, Negotiator, and Startds (vanilla); Gatekeeper X at site X.]
Option #4: Routing Jobs
• dynamic allocation of jobs between vanilla and grid universes.
• not every job is appropriate for transformation into a grid job.
[Diagram: Schedd, Negotiator, and Local Startds (vanilla); Schedd On The Side; Gatekeeper; sites X, Y, and Z.]
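The transformation of a vanilla job into a grid job (Job 4 becoming Job 4*) might be sketched as below. This is only an illustration, not the actual implementation: jobs are modeled as plain dictionaries, the `JobId` and `RoutedFromJobId` keys and the host names are hypothetical, and the universe numbers are the usual Condor values for vanilla (5) and grid (9).

```python
import copy

VANILLA_UNIVERSE = 5  # Condor universe number for vanilla jobs
GRID_UNIVERSE = 9     # Condor universe number for grid jobs


def make_routed_copy(job, grid_resource):
    """Return a grid-universe copy of a vanilla job; the original is untouched."""
    if job.get("JobUniverse") != VANILLA_UNIVERSE:
        raise ValueError("only vanilla jobs are routed in this sketch")
    routed = copy.deepcopy(job)
    routed["JobUniverse"] = GRID_UNIVERSE
    routed["GridResource"] = grid_resource
    # Hypothetical back-reference so the copy can be tied to its source job.
    routed["RoutedFromJobId"] = job["JobId"]
    return routed


# Hypothetical job and Condor-C style destination, for illustration only.
job4 = {"JobId": "4.0", "JobUniverse": VANILLA_UNIVERSE, "Cmd": "/bin/sim"}
job4_star = make_routed_copy(
    job4, "condor schedd.site-x.example pool.site-x.example"
)
```

The original job stays in the schedd's queue unchanged, so it can still be matched locally if the routed copy is withdrawn.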
What About Flow Control?
• May restrict routing to jobs which have been rejected by the negotiator.
• May limit maximum actively routed jobs on a per-site basis.
• May limit maximum idle routed jobs per site.
• Periodic removal of idle routed jobs is possible, but there is no guarantee of optimal rescheduling.
• Routing table may be reconfigured dynamically.
• Multicast? Might be interesting to try.
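The per-site limits above can be pictured with a small sketch. It is an assumption-laden illustration rather than the router's real logic: the `Route` class, its field names, and the first-fit selection order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    max_jobs: int       # cap on jobs routed to this site at any one time
    max_idle_jobs: int  # cap on routed jobs still idle at this site
    routed: int = 0     # jobs currently routed to this site (idle + running)
    idle: int = 0       # routed jobs currently idle at this site

    def has_room(self):
        return self.routed < self.max_jobs and self.idle < self.max_idle_jobs


def pick_route(routes, rejected_by_negotiator, require_rejection=True):
    """Return the first route with spare capacity, or None if none qualifies."""
    if require_rejection and not rejected_by_negotiator:
        return None  # leave the job for local matchmaking
    for route in routes:
        if route.has_room():
            return route
    return None


# Hypothetical routing table: site_x is full, site_y still has room.
routes = [
    Route("site_x", max_jobs=2, max_idle_jobs=1, routed=2, idle=0),
    Route("site_y", max_jobs=10, max_idle_jobs=2, routed=3, idle=1),
]
chosen = pick_route(routes, rejected_by_negotiator=True)
```

Capping idle routed jobs per site keeps jobs from piling up in a remote queue where they can no longer be matched elsewhere, which is why it is modeled separately from the overall cap.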
What About I/O?
• Jobs must be sandboxable (i.e. they specify input and output via the transfer-files mechanism).
• Routing of standard universe jobs is not supported.
• Additional restrictions may apply, depending on site network and disk.
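A rough eligibility check following these rules might look like the sketch below. It assumes jobs are plain dictionaries using the ClassAd attribute names `JobUniverse` and `ShouldTransferFiles`; treating `IF_NEEDED` as sandboxable is an assumption of this sketch, and site-specific network or disk restrictions are not modeled.

```python
STANDARD_UNIVERSE = 1  # Condor universe number for standard universe jobs


def is_routable(job):
    """Return True if the job looks sandboxable and is not standard universe."""
    if job.get("JobUniverse") == STANDARD_UNIVERSE:
        return False  # routing of standard universe is not supported
    # Assumption: a job counts as sandboxable when it opts into file transfer.
    transfer = str(job.get("ShouldTransferFiles", "NO")).upper()
    return transfer in ("YES", "IF_NEEDED")


# Hypothetical job descriptions, for illustration only.
vanilla_with_transfer = {"JobUniverse": 5, "ShouldTransferFiles": "YES"}
standard_job = {"JobUniverse": 1, "ShouldTransferFiles": "YES"}
shared_fs_job = {"JobUniverse": 5}
```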
What Types of Grids?
• Routing table may contain any combination of grid types supported by the grid universe.
• Example: Condor-C
• for two Condor sites, schedd-to-schedd submission requires no additional software
• however, still not as trivial to use as flocking
[Diagram: Schedd, Negotiator, Schedd On The Side, and Schedd X at site X.]
Routing Behind the Scenes
• navigate internal firewalls
• provide custom routes for special users
• improve scalability
• However, keep in mind I/O requirements etc.
[Diagram: Schedd, Schedd On The Side, Schedd X3, Gatekeeper X2, X.]
Future Step: Glidein Factory
• true late binding of jobs to resources
• may run on top of non-Condor sites
• supports full feature set of Condor (e.g. standard universe)
• requires GCB on network boundary (initiated by schedd-on-the-side?)
[Diagram: home (Schedd, Negotiator, Startds, Schedd On The Side); glidein jobs; Gatekeeper X at site X.]
Gliding in the Works
schedd-to-schedd
• hierarchical strategy for scalability and reliability
• better match for private networks
schedd-to-gatekeeper
• may require some additional horsepower from the gatekeeper machine, perhaps a dedicated element for “edge services”.
[Diagram: Schedd, Schedd On The Side, glidein factory, site X.]
Thanks
Interested? Let us know. We are currently using job routing for specific users at UW. Future development will focus on more use-cases.
Dan Bradley, danb@cs.wisc.edu