
Faucets Queuing System

Faucets Queuing System, presented by Sameer Kumar. Basic idea: a queuing system to manage adaptive jobs, that is, jobs that can shrink and expand at runtime. Simulations provided encouraging results.



  1. Faucets Queuing System • Presented by Sameer Kumar

  2. Basic Idea • Queuing system to manage adaptive jobs • Adaptive jobs • Jobs that can shrink and expand at runtime • Simulations provided encouraging results • Also intended to be a general-purpose queuing system that supports generic, non-migratable Charm++ and MPI jobs

  3. Adaptive Jobs • Jobs that can dynamically increase (expand) or decrease (shrink) the number of processors they are running on • Motivation • Improve system utilization • Decrease system response time • Properties • minpe, minimum number of processors required for the job, related to the memory requirements of the job • maxpe, maximum number of processors, related to speedup • profit, profit from running the job • deadline, deadline before which the job should be finished
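The four job properties listed on this slide can be pictured as a small record. The following is a minimal sketch, not the Faucets data structure: the class name `AdaptiveJob`, the `clamp` helper, and the choice of seconds for `deadline` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveJob:
    """Hypothetical record holding the properties named on the slide."""
    name: str
    minpe: int        # minimum processors (driven by memory requirements)
    maxpe: int        # maximum processors (driven by speedup)
    profit: float     # profit from running the job
    deadline: float   # time by which the job should finish (seconds, assumed unit)

    def __post_init__(self):
        # A job description is only meaningful if 1 <= minpe <= maxpe.
        if not 1 <= self.minpe <= self.maxpe:
            raise ValueError("require 1 <= minpe <= maxpe")

    def clamp(self, offered: int) -> int:
        """Processors the job would use if offered `offered`; 0 if below minpe."""
        if offered < self.minpe:
            return 0
        return min(offered, self.maxpe)
```

Under these assumptions, a job with `minpe=16` and `maxpe=80` that is offered 128 processors would use 80, and one offered 8 processors could not run at all.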

  4. Adaptive Job Example • Consider a 128-processor system • Job A arrives, requests 80 processors, and is started on 80 processors • Job B then arrives and requests 64 processors • In traditional systems, Job B is queued and allocated 64 processors only after Job A finishes, while the remaining 48 processors sit idle • With adaptive jobs, Job A can be shrunk to 64 processors so that Job B can start; after Job A finishes, Job B can expand and use all the processors
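The 128-processor example above can be replayed with a toy allocator. This is a sketch of one possible policy (admit each job at its minimum, then hand out leftover processors in arrival order), not the strategy Faucets uses. The slide gives only the requested sizes, so the `minpe` of 32 for Job A and the `maxpe` of 128 for Job B are assumed values chosen to reproduce the outcome described.

```python
def allocate(total_pes, jobs):
    """Toy allocator. `jobs` is a list of (name, minpe, maxpe) in arrival order.

    Returns a dict mapping job name to the number of processors assigned.
    """
    alloc = {}
    free = total_pes
    # Pass 1: admit jobs in arrival order at their minimum size, if it fits.
    for name, minpe, maxpe in jobs:
        if minpe <= free:
            alloc[name] = minpe
            free -= minpe
    # Pass 2: expand admitted jobs in arrival order, up to maxpe.
    for name, minpe, maxpe in jobs:
        if name in alloc and free > 0:
            extra = min(maxpe - alloc[name], free)
            alloc[name] += extra
            free -= extra
    return alloc


# Assumed limits: A needs at least 32 PEs, B can use up to 128.
job_a = ("A", 32, 80)
job_b = ("B", 64, 128)

only_a = allocate(128, [job_a])          # A runs alone on its maximum
both = allocate(128, [job_a, job_b])     # A shrinks so B can start
only_b = allocate(128, [job_b])          # A has finished, B expands
```

With these inputs the three calls give A 80 processors, then A and B 64 each, then B all 128, which matches the sequence on the slide.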

  5. Adaptive Job Scheduler • The Adaptive Job Scheduler manages adaptive jobs • Three major components • Job Manager • Accepts jobs, schedules them on the parallel system, and frees resources when a job is done • Scheduling Strategy • A pluggable component that decides which jobs to schedule • Database • Logs all events that occur in the scheduler and can be used for recovery after a crash
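The split between a job manager, a pluggable strategy, and an event log can be sketched as follows. All names here (`SchedulingStrategy`, `FifoStrategy`, `JobManager`, `pick`) are hypothetical, the strategy is a plain non-adaptive FIFO chosen for brevity, and an in-memory list stands in for the database.

```python
from abc import ABC, abstractmethod


class SchedulingStrategy(ABC):
    """Pluggable policy: decides which waiting jobs start now."""

    @abstractmethod
    def pick(self, waiting, free_pes):
        """Return the jobs from `waiting` that should be started."""


class FifoStrategy(SchedulingStrategy):
    def pick(self, waiting, free_pes):
        chosen = []
        for job in waiting:
            if job["pes"] > free_pes:
                break  # strict FIFO: never skip ahead of a blocked job
            chosen.append(job)
            free_pes -= job["pes"]
        return chosen


class JobManager:
    def __init__(self, total_pes, strategy):
        self.free_pes = total_pes
        self.strategy = strategy
        self.waiting = []
        self.running = {}
        self.log = []  # stand-in for the scheduler's database

    def submit(self, name, pes):
        self.waiting.append({"name": name, "pes": pes})
        self.log.append(("submit", name))
        self._schedule()

    def finish(self, name):
        # Free the job's processors, then see whether anything can start.
        self.free_pes += self.running.pop(name)
        self.log.append(("finish", name))
        self._schedule()

    def _schedule(self):
        for job in self.strategy.pick(list(self.waiting), self.free_pes):
            self.waiting.remove(job)
            self.running[job["name"]] = job["pes"]
            self.free_pes -= job["pes"]
            self.log.append(("start", job["name"]))
```

Swapping in a different policy only requires another `SchedulingStrategy` subclass; the manager and the log stay unchanged, which is the point of keeping the strategy pluggable.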

  6. Adaptive Job Scheduler

  7. Performance • Simulation results on 64 processors with a mean job execution time of 64.5 sec, for the utilization-maximizing strategy • Experiments on a Linux cluster on 64 processors with a mean job execution time of 60 sec • λ = arrival rate • MRT = mean response time • Utilization = processor utilization • Load factor (lf) = execution time × λ
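The load factor definition on this slide is a single product, shown here as a small helper. The function name and the arrival rate of 0.01 jobs per second are assumptions for illustration; the transcript does not list the arrival rates used in the experiments.

```python
def load_factor(mean_exec_time_sec, arrival_rate_per_sec):
    """Load factor as defined on the slide: execution time multiplied by lambda."""
    return mean_exec_time_sec * arrival_rate_per_sec


# Hypothetical arrival rate combined with the 64.5 sec mean from the simulations.
lf = load_factor(64.5, 0.01)
```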

  8. Features • Multithreaded for fast response • Logs all job-related information to a database • This helps in crash recovery and • Improves the security of the system • Uses Unix sockets for communication • Unix sockets improve the efficiency of the system • They also restrict access to the scheduler • Provides timed termination of jobs
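One way Unix sockets can restrict access is through file permissions on the socket path. The sketch below assumes the slide means Unix domain sockets and that access is limited by file mode; the function name and the `0o660` mode are illustrative choices, not details taken from Faucets.

```python
import os
import socket


def open_scheduler_socket(path, mode=0o660):
    """Create a listening Unix domain socket whose path has restricted permissions."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    # Only the owner and group may connect; other users are refused by the kernel.
    os.chmod(path, mode)
    server.listen(1)
    return server
```

Because the socket lives in the file system, ordinary ownership and group settings decide who can reach the scheduler, and no network port is exposed.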

  9. Features (continued) • Accepts both batch and interactive jobs • MaxPE and MaxTime are parameters to the system and can be used to restrict unlimited access to the parallel machine • Tested on the cool Linux cluster at PPL • Adaptive jobs currently implemented in Charm++ and MPI • For more details check out http://charm.cs.uiuc.edu/research/faucets/faucets.html

  10. Super-user Access • The queuing system scheduler runs with super-user privileges • When a new job arrives, it is executed with the permissions of the submitting user • The code has been checked for stack overflows • Direct access to the parallel machine is blocked by removing the permissions for rsh, ssh, etc. • To start a job, the scheduler changes its group id to a queuing system group that can access the cluster
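Running a job with the user's permissions and a dedicated group can be sketched with the standard POSIX calls. This is an assumed approach, not the Faucets code: `make_demoter` is a hypothetical helper, and the setter functions are parameters only so that the call order can be exercised without root.

```python
import os


def make_demoter(uid, gid, setgroups=os.setgroups, setgid=os.setgid, setuid=os.setuid):
    """Return a callable that drops privileges to user `uid` and group `gid`."""
    def demote():
        setgroups([gid])  # drop the super-user's supplementary groups
        setgid(gid)       # switch to the (assumed) queuing system group
        setuid(uid)       # must come last: afterwards the ids can no longer be changed
    return demote


# Typical use in a root-owned scheduler (requires root, so shown as a comment):
# subprocess.Popen(job_cmd, preexec_fn=make_demoter(user_uid, queue_gid))
```

The order matters: changing the user id first would remove the privilege needed to change the group afterwards.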

  11. Queuing System Commands • Similar to current queuing systems • fsub is the command to submit batch jobs to the queuing system • frun runs jobs interactively • fjobs lists the jobs • fkill can be used to kill jobs

  12. Conclusions and Future Work • Queuing system has been tested and is ready to be installed on the Turing cluster • Make the scheduler manage multiple heterogeneous clusters by supporting the concept of queues • Some of the queues could be batch and others interactive • Interactive queues would allocate multiple jobs to the same node depending on the utilization of the nodes • Running the scheduler on SP2 and other multiprocessing architectures • One of the solutions would be to run the faucets scheduler on top of a commercial queuing system
