MPI Scheduling in Condor: An Update • Paradyn/Condor Week, Madison, WI, 2002
Outline • Review of Dedicated/MPI Scheduling in Condor • Dedicated vs. Opportunistic • Backfill • Supported MPI Implementations • Supported Platforms • Future Work
What is MPI? • MPI is the “Message Passing Interface” • A library for writing parallel applications • Runs on a fixed number of nodes • Cannot be preempted • Widely used by scientists for large problems • MPI is a standard with many different implementations
Dedicated Scheduling in Condor • To schedule MPI jobs, Condor must have access to dedicated resources • More and more Condor pools are being formed from dedicated resources • Few schedulers handle both dedicated and non-dedicated resources at the same time
Problems with Dedicated Compute Clusters • Dedicated resources are not really dedicated • Most software for controlling clusters relies on dedicated scheduling algorithms • These algorithms assume constant availability of resources to compute fixed schedules • Due to hardware and software failures, dedicated resources are not always available over the long term
The Condor Solution • Condor overcomes these difficulties by combining aspects of dedicated and opportunistic scheduling into a single system • Opportunistic scheduling involves placing jobs on non-dedicated resources under the assumption that the resources might not be available for the entire duration of the jobs • This is what Condor has been doing for years
The Condor Solution (cont’d) • Condor manages all resources and jobs within a single system • Administrators only have to maintain one system, saving time and money • Users can submit a wide variety of jobs: • Serial or parallel (including PVM and MPI) • Users spend less time learning different scheduling tools and more time doing science
Claiming Resources for Dedicated Jobs • When the dedicated scheduler (DS) has idle jobs, it queries the collector to find all dedicated resources • DS performs matchmaking to decide which resources it wants • DS sends requests to the opportunistic scheduler to claim those resources • DS claims the resources and has exclusive control of them until it releases them
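The four claiming steps above can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not Condor's implementation: `MachineAd`, `MPIJob`, `query_collector`, `match`, `claim_for`, and `release` are hypothetical names, a small dataclass stands in for a machine ClassAd, matchmaking is reduced to a memory check, and the request to the opportunistic scheduler is collapsed into setting a `claimed_by` field.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified stand-in for a machine ClassAd. Real ads carry many
# more attributes, and real matchmaking evaluates full ClassAd expressions.
@dataclass
class MachineAd:
    name: str
    dedicated: bool                   # is this machine configured as dedicated?
    memory_mb: int
    claimed_by: Optional[str] = None  # which scheduler holds the claim, if any


@dataclass
class MPIJob:
    job_id: str
    nodes: int                        # MPI jobs need a fixed number of nodes
    min_memory_mb: int


def query_collector(pool: List[MachineAd]) -> List[MachineAd]:
    """Step 1: find all dedicated resources (here, a filter over the pool)."""
    return [m for m in pool if m.dedicated]


def match(job: MPIJob, candidates: List[MachineAd]) -> List[MachineAd]:
    """Step 2: matchmaking, reduced to 'unclaimed and has enough memory'."""
    return [m for m in candidates
            if m.claimed_by is None and m.memory_mb >= job.min_memory_mb]


def claim_for(job: MPIJob, pool: List[MachineAd],
              scheduler: str = "dedicated-scheduler") -> List[str]:
    """Steps 3-4: claim exactly job.nodes machines, or claim nothing."""
    matches = match(job, query_collector(pool))
    if len(matches) < job.nodes:
        return []                     # an MPI job cannot start on fewer nodes
    chosen = matches[:job.nodes]
    for m in chosen:
        m.claimed_by = scheduler      # exclusive control until released
    return [m.name for m in chosen]


def release(names: List[str], pool: List[MachineAd]) -> None:
    """Give claimed machines back so other jobs can use them."""
    for m in pool:
        if m.name in names:
            m.claimed_by = None
```

Claiming is all-or-nothing in this sketch because an MPI job cannot be started on part of its node count; a partial claim would only hold machines idle.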
Backfilling: The Problem • All dedicated schedulers leave “holes”: nodes that sit idle because no queued job fits before the next scheduled job • The traditional solution is backfilling: • Use lower-priority parallel jobs • Use serial jobs • However, if you can’t checkpoint the serial jobs, and/or you don’t have any parallel jobs of the right size and duration, you’ve still got holes
Backfilling: The Condor Solution • In Condor, we already have an infrastructure for managing non-dedicated nodes with opportunistic scheduling, so we use that to fill the holes in the dedicated schedule • Our opportunistic jobs can be checkpointed and migrated when the dedicated scheduler needs the resources again • Allows dedicated resources to be used for opportunistic jobs as needed
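The backfilling idea above can be illustrated with a toy model. This is a sketch under stated assumptions, not Condor's scheduler: `BackfillCluster` and its methods are hypothetical, time is ignored (a node is simply busy or idle), and "checkpointing" is modeled by recording the job and putting it back in the opportunistic queue so it can migrate to another hole later.

```python
from collections import deque


class BackfillCluster:
    """Toy model: each node runs a dedicated job, an opportunistic job, or nothing."""

    def __init__(self, nodes):
        # node name -> None (a hole) or a ("dedicated" | "opportunistic", job) pair
        self.running = {n: None for n in nodes}
        self.opportunistic_queue = deque()
        self.checkpointed = []  # opportunistic jobs vacated with a checkpoint

    def submit_opportunistic(self, job):
        self.opportunistic_queue.append(job)

    def backfill(self):
        """Fill every idle node with the next queued opportunistic job."""
        for node, current in self.running.items():
            if current is None and self.opportunistic_queue:
                self.running[node] = ("opportunistic", self.opportunistic_queue.popleft())

    def start_dedicated(self, job, nodes):
        """Reclaim nodes for a dedicated job, checkpointing any backfilled work."""
        # Refuse to double-book a node that already runs a dedicated job.
        for node in nodes:
            current = self.running[node]
            if current is not None and current[0] == "dedicated":
                raise RuntimeError(f"{node} already runs dedicated job {current[1]}")
        for node in nodes:
            current = self.running[node]
            if current is not None:
                # Checkpoint and requeue so the job can migrate to another hole.
                self.checkpointed.append(current[1])
                self.opportunistic_queue.append(current[1])
            self.running[node] = ("dedicated", job)
```

Because the opportunistic jobs can be vacated at any time, the dedicated schedule never has to wait for them; the only cost is the checkpoint and the later restart elsewhere.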
Specific MPI Implementations • Supported: • MPICH • Planned: • MPIPro • LAM • Others?
Condor’s MPICH Support • MPICH uses rsh to spawn jobs • Condor provides its own rsh tool • Older versions of MPICH need to be built without a hard-coded path to rsh • Newer versions of MPICH (1.2.2.3 and later) support an environment variable, P4_RSHCOMMAND, which specifies the program to use in place of rsh
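The version rule above can be expressed as a small helper. Only `P4_RSHCOMMAND` and the 1.2.2.3 threshold come from the slide; `CONDOR_RSH` is a placeholder, not the real location of Condor’s rsh tool, and `rsh_setup` is a hypothetical helper that assumes plain dotted numeric version strings.

```python
import os

# Placeholder only: stands in for wherever Condor's rsh tool is installed.
CONDOR_RSH = "/path/to/condor/rsh"

# First MPICH version that honors P4_RSHCOMMAND, per the slide.
FIRST_ENV_VERSION = (1, 2, 2, 3)


def parse_version(version):
    """Turn a dotted numeric version like '1.2.2.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def rsh_setup(mpich_version, base_env=None, condor_rsh=CONDOR_RSH):
    """Return (environment, note) for launching an MPICH job under Condor.

    Newer MPICH: point P4_RSHCOMMAND at Condor's rsh.
    Older MPICH: the environment is left alone; MPICH itself must be rebuilt
    without a hard-coded rsh path.
    """
    env = dict(os.environ if base_env is None else base_env)
    if parse_version(mpich_version) >= FIRST_ENV_VERSION:
        env["P4_RSHCOMMAND"] = condor_rsh
        return env, "uses P4_RSHCOMMAND"
    return env, "rebuild MPICH without a hard-coded rsh path"
```

Tuple comparison handles versions of different lengths: `(1, 2, 4)` compares greater than `(1, 2, 2, 3)`, while `(1, 2, 2)` compares less.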
Condor and MPIPro • We’ve investigated supporting MPIPro jobs with Condor • MPIPro has some issues with selecting a port for the head node in your computation, and we’re looking for a good solution
Condor + LAM = “LAMdor” • LAM's API is better suited for a dynamic environment, where hosts can come and go from your MPI universe • LAM has a different mechanism for spawning jobs than MPICH • Condor is working to support LAM’s spawning methods
LAMdor (Cont’d) • LAM is working to understand, expand, and fully implement the dynamic scheduling calls in its API • LAM is also considering using Condor’s libraries to support checkpointing of MPI computations
Other MPI implementations • What are people using? • Do you want to see Condor support any other MPI implementations? • If so, let us know by sending email to: condor-admin@cs.wisc.edu
Supported Platforms • Condor’s MPI support is now available on all Condor platforms: • Unix • Linux, Solaris, Digital Unix, IRIX, HPUX • Windows (new since last year) • NT, 2000
Future work (short-term) • Implementing more advanced dedicated scheduling algorithms • Integrating Condor’s user priority system with its dedicated scheduling • Adding support for user-specified job priorities (among their own jobs) • Condor-MPI support for the Tool Daemon Protocol
Future work (longer term) • Solving problems with MPI on the Grid • "Flocking" MPI jobs to remote pools, or even spanning pools with a single computation • Solving issues of resource ownership on the Grid (i.e., how do you handle multiple dedicated schedulers on the Grid wanting to control a given resource?)
More Future work • Support for other kinds of dedicated jobs: • Generic dedicated jobs • We gather and schedule the resources, then call your program, give it the list of machines, and let the program spawn itself • Linda (parallel programming interface) • Gaussian (computational chemistry)
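The “generic dedicated job” idea, where Condor hands a program its list of machines and the program spawns itself, might look roughly like this from the program’s side. Since this is future work, everything here is an assumption: the one-host-per-line list format, the `--rank` and `--head` flags, and the helper names `read_machine_list` and `spawn_commands` are all hypothetical. The sketch only builds the remote-start commands; it does not run them.

```python
import shlex


def read_machine_list(text):
    """Parse a hypothetical machine list: one host per line, '#' starts a comment."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def spawn_commands(hosts, program, args=(), rsh="rsh"):
    """Build one remote-start command per host after the first.

    The first host is treated as the head node, where the program is already
    running; every other host gets a copy told its rank and where the head is.
    """
    if not hosts:
        raise ValueError("empty machine list")
    head, *workers = hosts
    commands = []
    for rank, host in enumerate(workers, start=1):
        remote = [program, *args, "--rank", str(rank), "--head", head]
        commands.append([rsh, host, " ".join(shlex.quote(a) for a in remote)])
    return commands
```

Keeping the spawning logic inside the program is what makes this “generic”: the scheduler only needs to gather the machines, not understand how the program starts its peers.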
More Future work • Better support for preempting opportunistic jobs to facilitate running high-priority dedicated ones • “Checkpointing” vanilla jobs to swap space • Checkpointing entire MPI computations • MW using Condor-MPI
How do I start using MPI with Condor? • MPI support has been added and tested in the current development series (6.3.X) • MPI support is a built-in feature of the next stable series of Condor (6.4.X) • 6.4.0 will be released Any Day Now™
Thanks for Listening! • Questions? • Come to the MPI “BoF”, Wednesday, 3/6/02, 11am-noon, 3385 CS • For more information: • www.cs.wisc.edu/condor • condor-admin@cs.wisc.edu