
Scheduling of parallel jobs in a heterogeneous grid environment

Addresses the scheduling of parallel jobs in a heterogeneous grid environment: each site has a homogeneous cluster of processors, but processors at different sites have different speeds.



Presentation Transcript


  1. A Characterization of Approaches to Parallel Job Scheduling
     Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, P. Sadayappan
     Supported in part by Sandia National Laboratory

     Problem Addressed
     • Scheduling of parallel jobs in a heterogeneous grid environment
     • Each site has a homogeneous cluster of processors, but processors at different sites have different speeds
     • Much of the research on scheduling in heterogeneous systems has focussed on independent sequential jobs
     • Research on parallel job scheduling has concentrated primarily on the homogeneous context
     • The algorithms used for scheduling sequential tasks on heterogeneous systems are too computationally complex to extend to parallel jobs
     • We extend the techniques used for parallel job scheduling in a homogeneous context to the heterogeneous context

     Simulation Environment
     • Heterogeneous sites, with a homogeneous cluster of processors at each site
     • 5000-job subset of either the trace from the 430-node Cornell Theory Center (CTC) system or the trace from the 128-node IBM SP2 system at the San Diego Supercomputer Center (SDSC)
     • NAS Parallel Benchmarks 2.0 used to model the heterogeneous runtimes

     Benchmark                SGI Origin 2000   IBM SP (WN/66)   Cray T3E 900   IBM SP (P2SC 160 MHz)
     IS Class B (8 Nodes)     23.3              22.6             16.3*          17.7
     MG Class B (8 Nodes)     35.5              34.3             25.3           17.2*
     MG Class B (256 Nodes)   1.3147            2.2724           1.8            1.1*
     LU Class B (256 Nodes)   20.328*           94.893           35.6           24.2
     * Denotes best runtime for the job

     Metrics
     • We use the following metrics for evaluating the proposed schemes:
         • Average Slowdown
         • Average Turnaround Time
         • Utilization
         • Effective Utilization

     Backfilling
     • Backfilling: a later-arriving job is allowed to leapfrog previously queued jobs
     • Conservative: every job is given a reservation when it enters the system, and a job is allowed to backfill only if it does not violate any of the previous reservations
     • EASY: only the job at the head of the queue is given a reservation, and a job is allowed to backfill if it does not violate this reservation
     [Figure: Aggressive vs. Conservative backfilling; axes: Processors, Time]

     Restricted Multi-Site Reservations
     • Jobs are processed in arrival order by the meta-scheduler
     • Greedy assigns each job to the site with the lowest instantaneous load
     • Greedy-MR (Multiple Requests) submits each job to all sites
     • When the job starts at a site, the other instances are removed
     • We have shown this mechanism to be effective in a homogeneous context (HPDC ’02)
     • However, only a slight improvement is seen in a heterogeneous context
     • When network bandwidth is limited, jobs can be submitted to a smaller number of sites; by making fewer reservations, the multi-reservation scheduler can still realize a substantial fraction of the benefit achievable by scheduling each job at all sites
     • Use a more accurate approach to select the sites to which the job is submitted: instead of using the instantaneous load, we query each site to determine the earliest completion time
     • When using completion time as the criterion, submitting to fewer sites can be almost as effective as submitting to all sites

     Conservative vs. Aggressive
     • Jobs are processed in arrival order by the meta-scheduler
     • In a heterogeneous context, the site where the job starts the earliest may not be the best site
     • To obtain the completion time of a job at a particular site, conservative backfilling has to be employed at the local site
     • Conservative performs better than aggressive in all cases, quite the opposite of a homogeneous context
     • Improved backfilling is caused by holes created by the dynamic removal of replicated jobs at each site, and by an increased number of jobs available to attempt to backfill at each site

     Efficacy Based Queues
     • Explicitly take into account efficacy to improve the effective utilization
     • Use efficacy as the priority order for the jobs in the reserved and idle queue
     • Starvation free
     • Effective utilization increases and turnaround time decreases, despite the decrease in raw utilization

     Conclusions and Future Work
     • Improvements in turnaround time and effective utilization for parallel job scheduling in a heterogeneous grid environment have been demonstrated through simulation
     • Next Steps:
         • Incorporate these changes into the Silver/Maui Scheduler
         • Deploy and evaluate the scheduler first on our research clusters, and then at the Ohio Supercomputer Center
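The two site-selection rules described under Restricted Multi-Site Reservations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Site` class, its fields, the sample numbers, and the helper names `pick_greedy` and `pick_by_completion` are all hypothetical, and the completion estimate is simply assumed to be the queue wait plus the job's runtime on that site's processors.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical view of one site for a single job (illustrative fields only)."""
    name: str
    load: float      # instantaneous load reported by the site
    est_wait: float  # assumed estimate of the wait before the job can start
    runtime: float   # assumed runtime of this job on this site's processors

    def completion_time(self) -> float:
        # Assumed estimate: time spent waiting plus the site-specific runtime.
        return self.est_wait + self.runtime


def pick_greedy(sites):
    """Greedy rule: the single site with the lowest instantaneous load."""
    return min(sites, key=lambda s: s.load)


def pick_by_completion(sites, k):
    """Restricted multi-site rule: the k sites with the earliest completion estimates."""
    return sorted(sites, key=Site.completion_time)[:k]


# Made-up sample data: site A is lightly loaded but slow for this job.
sites = [
    Site("A", load=0.2, est_wait=10.0, runtime=95.0),
    Site("B", load=0.6, est_wait=30.0, runtime=20.0),
    Site("C", load=0.4, est_wait=25.0, runtime=35.0),
]

greedy_choice = pick_greedy(sites).name
restricted_choice = [s.name for s in pick_by_completion(sites, k=2)]
print(greedy_choice, restricted_choice)
```

With these made-up numbers the load-based rule picks site A, the slowest site for this job, while the completion-time rule submits to B and C. This mirrors the poster's observation that in a heterogeneous context the least-loaded or earliest-starting site is not necessarily the best one.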
