Faucets Queuing System
Presented by Sameer Kumar
Basic Idea
• Queuing system to manage adaptive jobs
• Adaptive jobs
  • Jobs that can shrink and expand at runtime
• Simulations provided encouraging results
• Also intended to be a general-purpose queuing system that supports generic, non-migratable Charm++ and MPI jobs
Adaptive Jobs
• Jobs that can dynamically increase (expand) or decrease (shrink) the number of processors they are running on
• Motivation
  • Improve system utilization
  • Decrease system response time
• Properties
  • minpe: minimum number of processors required for the job, related to the job's memory requirements
  • maxpe: maximum number of processors, related to speedup
  • profit: profit from running the job
  • deadline: time before which the job should be finished
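The four job properties above can be pictured as a small record. The following is only a sketch: the class name `AdaptiveJob`, the `name` field, the `can_run_on` helper, and the units are assumptions made for illustration, not part of Faucets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptiveJob:
    """Hypothetical record holding the properties listed on the slide."""
    name: str
    minpe: int       # fewest processors the job can run on (memory bound)
    maxpe: int       # most processors the job can use (speedup bound)
    profit: float    # profit from running the job
    deadline: float  # time by which the job should finish (seconds, assumed)

    def __post_init__(self):
        # A job description only makes sense if 1 <= minpe <= maxpe.
        if not 1 <= self.minpe <= self.maxpe:
            raise ValueError("need 1 <= minpe <= maxpe")

    def can_run_on(self, pes: int) -> bool:
        """True if `pes` processors is a legal allocation for this job."""
        return self.minpe <= pes <= self.maxpe


# Example values are made up for illustration.
job = AdaptiveJob("A", minpe=32, maxpe=80, profit=10.0, deadline=3600.0)
```

Any allocation between `minpe` and `maxpe` is legal, which is what gives a scheduler room to shrink or expand the job while it runs.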
Adaptive Job Example
• Consider a 128-processor system
• Job A arrives, requests 80 processors, and is started on 80 processors
• Job B then arrives and requests 64 processors
• In traditional systems, Job B is queued and allocated 64 processors only after Job A finishes, while the remaining 48 processors sit idle
• With adaptive jobs, Job A can be shrunk to 64 processors so that Job B can start immediately; after Job A finishes, Job B can expand and use all the processors
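The arithmetic of this example can be traced with a short sketch. The functions `allocate` and `expand` are hypothetical helpers, not the Faucets implementation, and the values minpe = 64 for Job A and maxpe = 128 for Job B are assumptions chosen to match the slide.

```python
def allocate(total_pes, running, arriving):
    """Shrink running jobs toward minpe until the arriving job fits.

    `running` maps job name -> (current_pes, minpe); `arriving` is
    (name, requested_pes). Returns a new name -> pes mapping, or None if
    the job cannot fit even after shrinking (it would then be queued).
    """
    name, request = arriving
    alloc = {job: pes for job, (pes, _) in running.items()}
    free = total_pes - sum(alloc.values())
    for job, (pes, minpe) in running.items():
        if free >= request:
            break
        give = min(pes - minpe, request - free)  # never go below minpe
        alloc[job] -= give
        free += give
    if free < request:
        return None
    alloc[name] = request
    return alloc


def expand(total_pes, alloc, maxpe):
    """Hand idle processors to running jobs, up to each job's maxpe."""
    free = total_pes - sum(alloc.values())
    grown = dict(alloc)
    for job in grown:
        extra = max(0, min(free, maxpe[job] - grown[job]))
        grown[job] += extra
        free -= extra
    return grown


# Job A runs on 80 of 128 processors (minpe of 64 is assumed); B asks for 64.
after_b = allocate(128, {"A": (80, 64)}, ("B", 64))
print(after_b)  # {'A': 64, 'B': 64}

# Job A finishes; B grows to the whole machine (maxpe of 128 is assumed).
after_a_done = expand(128, {"B": after_b["B"]}, {"B": 128})
print(after_a_done)  # {'B': 128}
```

Without shrinking, only 48 processors are free when Job B arrives, so `allocate` would return `None` if Job A's minpe were 80.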
Adaptive Job Scheduler
• The Adaptive Job Scheduler manages adaptive jobs
• Three major components
  • Job Manager
    • Accepts jobs, schedules them on the parallel system, and frees resources when a job is done
  • Scheduling Strategy
    • A pluggable component that decides which jobs to schedule
  • Database
    • Logs all events that occur in the scheduler and can be used for recovery after a crash
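One way to picture how the three components fit together is sketched below. All names here (`Strategy`, `FirstFit`, `JobManager`, `pick`) are hypothetical, the first-fit policy is a stand-in rather than the strategy Faucets uses, and a plain list stands in for the database.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Pluggable scheduling strategy (hypothetical interface)."""

    @abstractmethod
    def pick(self, waiting, free_pes):
        """Return the (name, pes) entries from `waiting` to start now."""


class FirstFit(Strategy):
    """Stand-in policy: start every waiting job that still fits."""

    def pick(self, waiting, free_pes):
        chosen = []
        for name, pes in waiting:
            if pes <= free_pes:
                chosen.append((name, pes))
                free_pes -= pes
        return chosen


class JobManager:
    """Accepts jobs, starts them via the strategy, frees resources at the end."""

    def __init__(self, total_pes, strategy):
        self.free_pes = total_pes
        self.strategy = strategy
        self.waiting = []
        self.running = {}
        self.log = []  # stand-in for the event database

    def submit(self, name, pes):
        self.waiting.append((name, pes))
        self.log.append(("submit", name))
        self._schedule()

    def finish(self, name):
        self.free_pes += self.running.pop(name)
        self.log.append(("finish", name))
        self._schedule()

    def _schedule(self):
        for name, pes in self.strategy.pick(list(self.waiting), self.free_pes):
            self.waiting.remove((name, pes))
            self.running[name] = pes
            self.free_pes -= pes
            self.log.append(("start", name))


mgr = JobManager(128, FirstFit())
mgr.submit("A", 80)
mgr.submit("B", 64)   # does not fit next to A, so it waits
mgr.finish("A")       # frees 80 processors; B starts
```

Because the manager only calls `pick`, a different policy (for example one that shrinks running jobs) could be swapped in without touching the job-handling or logging code.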
Performance
• Simulation results on 64 processors with a mean job execution time of 64.5 sec, for the utilization-maximizing strategy
• Experiments on a Linux cluster with 64 processors and a mean job execution time of 60 sec
• λ = arrival rate
• MRT = mean response time
• Utilization = processor utilization
• Load factor (lf) = execution time × λ
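The load factor definition above is a single product, so it can also be inverted to find the arrival rate for a target load. A minimal sketch, assuming λ is measured in jobs per second and using an arbitrary target load factor of 0.5 that does not come from the slides:

```python
def load_factor(mean_execution_time, arrival_rate):
    """lf = execution time * lambda, as defined on the slide."""
    return mean_execution_time * arrival_rate


def arrival_rate_for(target_lf, mean_execution_time):
    """Invert the definition: lambda = lf / execution time."""
    return target_lf / mean_execution_time


# 60 sec is the mean execution time quoted for the cluster experiments;
# the target load factor of 0.5 is an assumed value for illustration.
lam = arrival_rate_for(0.5, 60.0)
lf = load_factor(60.0, lam)
```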
Features
• Multithreaded for fast response
• Logs all job-related information to a database
  • This helps in crash recovery
  • It also improves the security of the system
• Uses Unix sockets for communication
  • Unix sockets improve the efficiency of the system
  • They also restrict access to the scheduler
• Provides timed termination of jobs
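The point about Unix sockets restricting access can be illustrated with a small sketch: a Unix domain socket is a file, so ordinary file permissions decide who may connect. The socket path, the `0o660` mode, the helper name, and the `fjobs` request line are all assumptions for illustration; the actual Faucets wire protocol is not described in these slides.

```python
import os
import socket
import stat
import tempfile


def open_scheduler_socket(path, mode=0o660):
    """Bind a Unix domain stream socket and limit who may connect to it."""
    if os.path.exists(path):
        os.unlink(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    os.chmod(path, mode)  # only owner and group may connect
    server.listen(5)
    return server


# Hypothetical socket location inside a private temporary directory.
sock_path = os.path.join(tempfile.mkdtemp(), "faucets.sock")
server = open_scheduler_socket(sock_path)
perm = stat.S_IMODE(os.stat(sock_path).st_mode)

# A client on the same machine connects through the filesystem path.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
client.sendall(b"fjobs\n")  # made-up request line

conn, _ = server.accept()
request = conn.recv(64)

for s in (conn, client, server):
    s.close()
os.unlink(sock_path)
```

Since no network port is opened, only processes on the scheduler's host that pass the permission check can reach it.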
Features (continued)
• Accepts both batch and interactive jobs
• MaxPE and MaxTime are system parameters that can be used to prevent unlimited use of the parallel machine
• Tested on the cool Linux cluster at PPL
• Adaptive jobs are currently implemented in Charm++ and MPI
• For more details, see http://charm.cs.uiuc.edu/research/faucets/faucets.html
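Timed termination and a MaxTime limit can be approximated in a few lines. This is a sketch under assumptions, not how Faucets enforces its limits: the helper `run_with_limit` is hypothetical and relies on the `timeout` argument of Python's `subprocess.run`, which kills the child process when the limit expires.

```python
import subprocess
import sys


def run_with_limit(cmd, max_time):
    """Run `cmd`; return its exit code, or None if it exceeded `max_time` sec."""
    try:
        return subprocess.run(cmd, timeout=max_time).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return None


# A job that finishes in time, and one that overruns a 0.5 sec limit.
quick = run_with_limit([sys.executable, "-c", "pass"], max_time=10)
slow = run_with_limit(
    [sys.executable, "-c", "import time; time.sleep(30)"], max_time=0.5
)
```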
Super-user Access
• The queuing system scheduler runs with super-user privileges
• When a new job arrives, it is executed with the permissions of the submitting user
• The code has been checked for stack overflows
• Direct access to the parallel machine is blocked by removing the permissions for rsh, ssh, etc.
• To start a job, the scheduler changes its group id to a queuing system group that can access the cluster
Queuing System Commands
• Similar to current queuing systems
• fsub submits batch jobs to the queuing system
• frun runs jobs interactively
• fjobs lists the jobs
• fkill kills jobs
Conclusions and Future Work
• The queuing system has been tested and is ready to be installed on the Turing cluster
• Make the scheduler manage multiple heterogeneous clusters by supporting the concept of queues
  • Some of the queues could be batch and others interactive
  • Interactive queues would allocate multiple jobs to the same node depending on the utilization of the nodes
• Run the scheduler on SP2 and other multiprocessing architectures
  • One solution would be to run the Faucets scheduler on top of a commercial queuing system