Condor and MPI • Paradyn/Condor Week • Madison, WI, 2001
Overview • MPI and Condor: Why Now? • Dedicated and Opportunistic Scheduling • How Does it All Work? • Specific MPI Implementations • Future Work
What is MPI? • MPI is the “Message Passing Interface” • Basically, a library for writing parallel applications that use message passing for inter-process communication • MPI is a standard with many different implementations
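MPI itself is a C/Fortran library API, so the sketch below only illustrates the message-passing model in plain Python. It assumes one thread per "rank" and a queue as each rank's private mailbox; `send`, `recv` and `ring_sum` are hypothetical stand-ins, not MPI calls. The ranks share no state and cooperate only by passing a token around a ring.

```python
import queue
import threading


def ring_sum(num_ranks):
    """Pass a token around a ring of ranks; each rank adds its own id to it."""
    if num_ranks < 2:
        raise ValueError("need at least two ranks")
    # One mailbox per rank: ranks communicate only through these messages.
    inboxes = [queue.Queue() for _ in range(num_ranks)]
    result = {}

    def send(dest, payload):
        inboxes[dest].put(payload)

    def recv(rank):
        return inboxes[rank].get()  # blocks until a message arrives

    def rank_main(rank):
        if rank == 0:
            send(1, 0)                 # start the token at 0
            result["total"] = recv(0)  # wait for it to come back around
        else:
            token = recv(rank)
            send((rank + 1) % num_ranks, token + rank)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


print(ring_sum(4))  # 0 + 1 + 2 + 3 = 6
```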
MPI and Condor: Why Haven’t We Supported it Until Now? • MPI's model assumes a static world • We always saw the world as dynamic, opportunistic, ever-changing • We focused our parallel support on PVM, which supported a dynamic environment
MPI With Condor: Why Now? • More and more Condor pools are being formed from dedicated resources • MPI's API is also starting to move towards supporting a dynamic world (e.g., LAM, MPI-2) • Few schedulers (if any) handle both opportunistic and dedicated resources at the same time
Dedicated and Opportunistic Scheduling • Resources can move between 'dedicated' and 'opportunistic' status • Users submit jobs that are either dedicated (e.g. Universe = MPI) or opportunistic (e.g. Universe = standard)
Dedicated and Opportunistic (Cont'd) • Condor leaves all resources as opportunistic unless it sees dedicated jobs to service • The Dedicated Scheduler ('DS') claims opportunistic resources and turns them into dedicated ones to schedule into the future
Dedicated and Opportunistic (Cont'd) • When the DS has no more jobs, it releases the resources, which go back to serving opportunistic jobs
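The resource lifecycle on the last three slides can be modeled as a tiny state machine. This is a toy sketch, not Condor code: `Resource`, `DedicatedScheduler`, `reconcile` and `count_status` are hypothetical names, and a queued job is reduced to the number of machines it needs. Resources start opportunistic, the DS claims just enough of them for its largest queued job, and it releases them all once its queue is empty.

```python
from enum import Enum


class Status(Enum):
    OPPORTUNISTIC = "opportunistic"
    DEDICATED = "dedicated"


class Resource:
    def __init__(self, name):
        self.name = name
        self.status = Status.OPPORTUNISTIC  # everything starts out opportunistic


class DedicatedScheduler:
    """Toy DS: holds dedicated resources only while dedicated jobs are queued."""

    def __init__(self, pool):
        self.pool = pool
        self.jobs = []  # each queued job is just the number of machines it needs

    def submit(self, machines_needed):
        self.jobs.append(machines_needed)
        self.reconcile()

    def complete_oldest(self):
        self.jobs.pop(0)
        self.reconcile()

    def reconcile(self):
        demand = max(self.jobs, default=0)
        dedicated = [r for r in self.pool if r.status is Status.DEDICATED]
        if demand == 0:
            # No dedicated work left: release everything back to opportunistic use.
            for r in dedicated:
                r.status = Status.OPPORTUNISTIC
            return
        # Claim opportunistic resources until the largest queued job would fit.
        for r in self.pool:
            if len(dedicated) >= demand:
                break
            if r.status is Status.OPPORTUNISTIC:
                r.status = Status.DEDICATED
                dedicated.append(r)


def count_status(pool, status):
    return sum(r.status is status for r in pool)


pool = [Resource(f"node{i}") for i in range(4)]
ds = DedicatedScheduler(pool)
ds.submit(3)
print(count_status(pool, Status.DEDICATED))  # 3
ds.complete_oldest()
print(count_status(pool, Status.DEDICATED))  # 0
```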
Dedicated Scheduling and “Back-Filling” • There will always be “holes” in the dedicated schedule: sets of resources that can't be filled with dedicated jobs for certain periods of time • The traditional solution is “back-filling” the holes with smaller dedicated jobs • However, these smaller jobs might not be preemptable
Back-Filling (Cont’d) • Instead of back-filling with dedicated jobs, we give the resources to Condor’s opportunistic scheduler • Condor runs preemptable opportunistic jobs until the DS decides it needs the resources again and reclaims them
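A minimal sketch of the idea, assuming the dedicated schedule is a list of `(machine, start, end)` reservations in abstract time units (the names `find_holes` and `place_opportunistic` are hypothetical). It finds the idle holes, then hands each hole to an opportunistic job; because those jobs are preemptable, a job that would overrun its hole is simply preempted when the DS reclaims the machine.

```python
def find_holes(reservations, machines, horizon):
    """Return idle (machine, start, end) intervals in a dedicated schedule.

    reservations: (machine, start, end) tuples, non-overlapping per machine.
    """
    holes = []
    for m in machines:
        t = 0  # earliest time this machine is known to be free
        for _, start, end in sorted(r for r in reservations if r[0] == m):
            if start > t:
                holes.append((m, t, start))
            t = max(t, end)
        if t < horizon:
            holes.append((m, t, horizon))
    return holes


def place_opportunistic(holes, job_lengths):
    """Give each hole to the next opportunistic job.

    A preemptable job may start even if it cannot finish inside the hole;
    it is preempted at the hole's end when the DS reclaims the machine.
    """
    placements = []
    for i, ((m, start, end), length) in enumerate(zip(holes, job_lengths)):
        preempted_at = end if start + length > end else None
        placements.append((i, m, start, preempted_at))
    return placements


reservations = [("a", 0, 4), ("a", 6, 10), ("b", 2, 10)]
holes = find_holes(reservations, ["a", "b"], horizon=10)
print(holes)                                # [('a', 4, 6), ('b', 0, 2)]
print(place_opportunistic(holes, [1, 5]))   # [(0, 'a', 4, None), (1, 'b', 0, 2)]
```

The contrast with traditional back-filling is the `preempted_at` column: a non-preemptable dedicated job would have to fit entirely inside its hole.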
Dedicated Resources are Opportunistic Resources • Even “dedicated” resources are really opportunistic • Hardware failure, software failure, etc. • Condor handles these failures better than traditional dedicated schedulers, since years of opportunistic scheduling have already taught our system to deal with them
How Does MPI Support in Condor Really Work? • Changes to the resource agent (condor_startd) • Changes to the job scheduling agent (condor_schedd) • Changes to the rest of the Condor system
How Do You Make a Resource Dedicated in Condor? • Only a few config file settings need to change; no new startd binary is required • Add an attribute to the classad saying which scheduler, if any, this resource is willing to become dedicated to
Other Configuration Changes for the startd • In addition, you must change the policy expressions: • Must always be willing to run jobs from the DS • While the resource is claimed by the DS, the startd should never suspend or preempt jobs.
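The two startd slides can be read as a machine ad plus two policy decisions. The sketch below models them in Python under stated assumptions: the attribute names `DedicatedScheduler` (machine ad) and `Scheduler` (job ad) and the address `DS_NAME` are assumptions, loosely patterned on how later Condor releases expressed these START/SUSPEND/PREEMPT policies, and the functions are hypothetical. It is not the actual config syntax.

```python
DS_NAME = "DedicatedScheduler@submit.example.edu"  # hypothetical DS address

# Hypothetical machine ad: the attribute names the DS (if any) it will serve.
machine_ad = {"Name": "node1.example.edu", "DedicatedScheduler": DS_NAME}


def willing_to_start(machine_ad, job_ad, opportunistic_start):
    """START policy: always run jobs from this machine's DS; otherwise fall
    back to the machine's ordinary opportunistic policy."""
    ds = machine_ad.get("DedicatedScheduler")
    if ds is not None and job_ad.get("Scheduler") == ds:
        return True
    return opportunistic_start(job_ad)


def may_interrupt(machine_ad, claimed_by, opportunistic_wants_to):
    """SUSPEND/PREEMPT policy: never interrupt while the DS holds the claim."""
    ds = machine_ad.get("DedicatedScheduler")
    if ds is not None and claimed_by == ds:
        return False
    return opportunistic_wants_to


# The owner's opportunistic policy says "no", but the DS is still accepted.
print(willing_to_start(machine_ad, {"Scheduler": DS_NAME}, lambda job: False))  # True
print(may_interrupt(machine_ad, DS_NAME, True))  # False
```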
Submitting Dedicated Jobs • Requires a new "contrib" version of the condor_schedd • Condor "wakes up" the dedicated scheduler logic inside the condor_schedd when MPI jobs are submitted
How Does Your Job Get Resources? • The DS does a query to find all resources that are willing to become dedicated to it • DS sends out "resource request" classads and negotiates for resources with the negotiator (the opportunistic scheduler)
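The first two acquisition steps, sketched with machine ads as plain dictionaries. The ad fields, `willing_machines`, `build_requests` and the one-request-per-machine format are all assumptions for illustration; the slide does not specify what a "resource request" classad contains.

```python
DS_NAME = "DedicatedScheduler@submit.example.edu"  # hypothetical DS address


def willing_machines(pool, ds_name):
    """The DS's query: names of machines willing to become dedicated to it."""
    return [ad["Name"] for ad in pool if ad.get("DedicatedScheduler") == ds_name]


def build_requests(job, ds_name, num_willing):
    """One resource-request ad per machine the job needs (hypothetical format)."""
    if job["machine_count"] > num_willing:
        return []  # this pool can never satisfy the job, so don't negotiate yet
    return [{"Scheduler": ds_name, "JobId": job["id"]} for _ in range(job["machine_count"])]


pool = [
    {"Name": "node1", "DedicatedScheduler": DS_NAME},
    {"Name": "node2", "DedicatedScheduler": DS_NAME},
    {"Name": "desk7"},  # an ordinary opportunistic desktop
]
willing = willing_machines(pool, DS_NAME)
requests = build_requests({"id": 42, "machine_count": 2}, DS_NAME, len(willing))
print(willing, len(requests))  # ['node1', 'node2'] 2
```

The request ads would then go to the negotiator, which matches them against machine ads exactly as it does for opportunistic jobs.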
How Does Your Job Get Resources? (Cont’d) • DS then claims resources directly • Once resources are available, the DS schedules and spawns jobs • When jobs complete, if more MPI jobs can be serviced with the same resources, the DS holds onto them and uses them immediately
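The claim-reuse rule in the last bullet can be sketched as a single decision made each time an MPI job finishes. `after_job_completes` is a hypothetical helper, and releasing any surplus claims is one simple policy choice made here for illustration; the slide only says the DS keeps resources it can use immediately.

```python
def after_job_completes(claimed, queue):
    """Decide which claims the DS keeps when an MPI job finishes.

    claimed: machine names the DS currently holds.
    queue:   remaining MPI jobs, each a dict with a 'machine_count'.
    Returns (kept, released).
    """
    if not queue:
        return [], list(claimed)  # no more dedicated work: hand everything back
    need = queue[0]["machine_count"]
    if need <= len(claimed):
        # Next job fits: reuse the claims immediately, release any surplus.
        return list(claimed[:need]), list(claimed[need:])
    # Next job needs more: keep everything and negotiate for the rest.
    return list(claimed), []


print(after_job_completes(["a", "b", "c"], [{"machine_count": 2}]))  # (['a', 'b'], ['c'])
```

Reusing a held claim avoids another round of negotiation, which is the point of holding onto resources between jobs.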
Changes to the rest of Condor? • Very few other changes required • Users can use all the same tools, interfaces, etc. • Just need a new condor_starter to actually spawn MPI jobs (will also be offered as a contrib module)
Specific MPI Implementations • MPICH • LAM • Others?
Condor and MPICH • Currently we support MPICH on Unix • Working on adding MPICH-NT support • NT’s MPICH has a different mechanism to spawn jobs than the Unix MPICH...
Condor + LAM = “LAMdor” • LAM's API is better suited for a dynamic environment, where hosts can come and go from your MPI universe • Has a different mechanism for spawning jobs than MPICH • Condor is working to support LAM's spawning methods
LAMdor (Cont’d) • LAM working to understand, expand, and fully implement the dynamic scheduling calls in their API • LAM also considering using Condor’s libraries to support checkpointing of MPI computations
MPI-2 Standard • The MPI-2 standard contains calls to handle dynamic resources • Not yet fully implemented by anyone • When it is, we'll support it
Other MPI implementations • What are people using? • Do you want to see Condor support any other MPI implementations? • If so, send email to condor@cs.wisc.edu and let us know
Future work • Implementing more advanced dedicated scheduling algorithms • Support for all sorts of MPI implementations (LAM, MPICH-NT, MPI-2, others)
More Future work • Solving problems with MPI on the Grid • "Flocking" MPI jobs to remote pools, or even spanning pools with a single computation • Solving issues of resource ownership on the Grid (i.e., how do you handle multiple dedicated schedulers on the grid wanting to control a given resource?)
More Future work • Checkpointing entire MPI computations • "MW" implementation on top of Condor-MPI
More Future work • Support for other kinds of dedicated jobs • Generic dedicated jobs (we just gather and schedule the resources, then call your program, give it the list of machines, and let the program spawn itself) • LINDA
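For a generic dedicated job, the user's program would receive the gathered machines and spawn itself. The slide does not say how the list is delivered, so this sketch assumes it arrives as command-line arguments. `assign_tasks` is a hypothetical helper that deals task ids round-robin onto the machines; a list of pairs is used so that a host appearing twice (e.g. two CPUs) keeps two slots.

```python
import sys


def assign_tasks(machines, num_tasks):
    """Round-robin task ids onto the machines handed over by the scheduler."""
    plan = [(m, []) for m in machines]
    for task in range(num_tasks):
        plan[task % len(plan)][1].append(task)
    return plan


if __name__ == "__main__":
    # Assumption: the dedicated scheduler passes the machine list as arguments.
    machines = sys.argv[1:] or ["localhost"]
    for machine, tasks in assign_tasks(machines, 8):
        # A real program would start its own worker on `machine` here.
        print(machine, tasks)
```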
How do I start using MPI with Condor? • MPI support is still alpha, not quite ready for production use • A beta release should be out soon as a contrib module • Check the web site www.cs.wisc.edu/condor
Thanks for Listening! • Questions? • For more information: • http://www.cs.wisc.edu/condor • mailto:condor@cs.wisc.edu