Job Scheduling
P. (Saday) Sadayappan
Ohio State University
Problem Statement
• Given a stream of parallel jobs and a set of computing resources, determine when and where to execute each job
• The form in which the job scheduling problem is addressed at most supercomputer centers:
  • Homogeneous set of processors
  • Each job asks for a specific, fixed number of processors
Job Scheduling Today
• Earliest job schedulers (Intel iPSC) used a simple FCFS (first-come, first-served) strategy; low utilization (50%)
• Back-filling was implemented at Argonne
  • Give an earliest-possible reservation to the job at the head of the queue, but allow a later-arriving job to bypass it if the reservation is not violated
  • Utilization improves to ~90%
• Used at most production facilities today
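The back-filling rule above can be sketched as a single scheduling step. This is a minimal illustration, not the Argonne implementation: it assumes each job carries a user-supplied run-time estimate (`est_run`), and the names `Job` and `backfill_step` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int      # fixed number of processors requested
    est_run: float  # ASSUMPTION: user-supplied run-time estimate

def backfill_step(now, free, running, queue):
    """Return the jobs to start at time `now`.

    free    -- idle processors right now
    running -- list of (end_time, procs) for executing jobs
    queue   -- waiting jobs in FCFS (arrival) order
    """
    queue = list(queue)
    running = list(running)
    started = []
    # Plain FCFS part: start jobs from the head while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        running.append((now + job.est_run, job.procs))
        started.append(job)
    if not queue:
        return started
    # Earliest-possible reservation for the job now at the head.
    head, avail, reservation = queue[0], free, None
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            reservation = end
            break
    if reservation is None:  # head asks for more than the machine has
        return started
    extra = avail - head.procs  # processors the head leaves unused
    # A later job may bypass the head if the reservation is not violated.
    for job in queue[1:]:
        if job.procs > free:
            continue
        if now + job.est_run <= reservation:
            free -= job.procs           # done before the head must start
            started.append(job)
        elif job.procs <= extra:
            free -= job.procs           # overlaps, but only on spare processors
            extra -= job.procs
            started.append(job)
    return started
```

For example, with 4 idle processors and a 6-processor job ending at t=10, an 8-processor head job is reserved for t=10; a 4-processor job estimated at 5 time units can start immediately, while a 4-processor job estimated at 20 time units cannot.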
Can Performance be Improved?
• Metrics:
  • System metric: Utilization
  • User metrics: Response time (wait time + run time), Slowdown (response time / run time)
• Over a hundred papers published:
  • Focus mainly on improving user metrics, which have much greater potential for improvement than utilization
• Question: How important is it to squeeze an additional 5-10% utilization out of a system that is already achieving over 85% utilization?
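The metrics above can be written out directly. This is a small sketch: the `JobRecord` fields are hypothetical names, and utilization is taken to be the usual ratio of processor-time used to processor-time available, which the slide does not spell out.

```python
from dataclasses import dataclass

@dataclass
class JobRecord:
    submit: float  # time the job entered the queue
    start: float   # time the job began executing
    run: float     # run time
    procs: int     # processors used

def response_time(job):
    # Response time = wait time + run time
    return (job.start - job.submit) + job.run

def slowdown(job):
    # Slowdown = response time / run time
    return response_time(job) / job.run

def utilization(jobs, total_procs, elapsed):
    # ASSUMPTION: processor-time used / processor-time available
    used = sum(j.run * j.procs for j in jobs)
    return used / (total_procs * elapsed)
```

A job that waits 2 time units and runs for 1 has a response time of 3 and a slowdown of 3, which shows why slowdown penalizes waiting much more heavily for short jobs than for long ones.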
Improving Response Time
• Question: How important is it to evaluate alternatives to standard back-fill scheduling, with a goal of improved user response time?
• Many simulation studies have reported significant improvement in slowdown or response time with new schemes, but most production schedulers simply use aggressive back-fill. Why?
Possible Reasons for Non-Adoption
• Academic studies do not model specific policy issues of a center, e.g. “good citizen rules,” multiple queues, etc.
• Most results are based on job log traces from Feitelson’s archive, with many logs from academic centers exhibiting low system utilization (< 70%)
• Most studies report overall averages over the entire trace, which is insufficient to assess the impact of a change:
  • E.g., using a Shortest-Job-First queue policy instead of the usual FCFS policy improves overall average slowdown by a factor of 4, but increases response time for 24-hour jobs from 26 hours to 50 hours
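One way to see past the overall average is to report slowdown per job class. The sketch below is illustrative only: the class boundary in `run_class` and the function names are assumptions, and any trace fed to it would come from the reader's own logs.

```python
from collections import defaultdict

def slowdown(wait_hours, run_hours):
    # Slowdown = (wait time + run time) / run time
    return (wait_hours + run_hours) / run_hours

def run_class(run_hours):
    # ASSUMPTION: a hypothetical two-way split at one hour
    return "short" if run_hours < 1.0 else "long"

def mean_slowdown_by_class(jobs, class_of=run_class):
    """jobs: iterable of (wait_hours, run_hours) pairs."""
    groups = defaultdict(list)
    for wait, run in jobs:
        groups[class_of(run)].append(slowdown(wait, run))
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

# The 24-hour job from the example above:
# response time 26 h means a 2 h wait; 50 h means a 26 h wait.
fcfs_long = slowdown(2.0, 24.0)   # 26 / 24
sjf_long = slowdown(26.0, 24.0)   # 50 / 24
```

The slowdown of the 24-hour jobs roughly doubles (from 26/24 to 50/24), a change that an average dominated by many short jobs can easily hide.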
QoS for Job Scheduling
• Job schedulers do not provide QoS:
  • No response time guarantees
  • No equitable way of offering different service for urgent versus non-urgent jobs
• Technical and accounting issues:
  • Develop job schedulers that can do deadline-based scheduling
  • Develop accounting models to charge based on urgency of job:
    • Charge = f1(resource-usage) + f2(wait-time-limit)
• Question: How desirable is it to develop job schedulers with QoS functionality?
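The charging model above leaves f1 and f2 unspecified. As a purely hypothetical illustration, the sketch below takes f1 to be linear in processor-hours and f2 to be a premium that shrinks as the requested wait-time limit grows; both forms, and the `rate` and `premium` parameters, are assumptions rather than anything proposed in the slides.

```python
def f1(proc_hours, rate=1.0):
    # ASSUMPTION: resource charge is linear in processor-hours
    return rate * proc_hours

def f2(wait_limit_hours, premium=100.0):
    # ASSUMPTION: urgency charge is inversely proportional to the
    # wait-time limit, so a tighter limit costs more
    if wait_limit_hours <= 0:
        raise ValueError("wait-time limit must be positive")
    return premium / wait_limit_hours

def charge(procs, run_hours, wait_limit_hours):
    # Charge = f1(resource-usage) + f2(wait-time-limit)
    return f1(procs * run_hours) + f2(wait_limit_hours)

urgent = charge(procs=16, run_hours=2.0, wait_limit_hours=1.0)
relaxed = charge(procs=16, run_hours=2.0, wait_limit_hours=10.0)
```

Under these assumed forms, the same 32 processor-hour job costs more when it must start within 1 hour than when it can wait up to 10 hours, which is the kind of differentiated service for urgent versus non-urgent jobs the slide asks about.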