Job Scheduling

P. (Saday) Sadayappan, Ohio State University

Problem Statement
• Given a stream of parallel jobs and a set of computing resources, determine when and where to execute each job
• The form in which the job scheduling problem is addressed at most supercomputer centers:
  • Homogeneous set of processors
  • Each job asks for a specific, fixed number of processors

Job Scheduling Today
• The earliest job schedulers (Intel iPSC) used a simple first-come-first-served (FCFS) strategy, which gave low utilization (50%)
• Back-filling was implemented at Argonne:
  • Give the job at the head of the queue an earliest-possible reservation, but allow a later-arriving job to bypass it if the reservation is not violated
  • Utilization improves to ~90%
  • Used at most production facilities today
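The back-filling rule above can be sketched as a small simulation. This is only an illustration under simplifying assumptions, not the scheduler used at any particular center: the `Job` fields, the `simulate` function, and the use of exactly known run times are all hypothetical (production schedulers work from user-supplied run-time estimates).

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arrival: int   # submission time
    procs: int     # fixed number of processors requested
    runtime: int   # run time, assumed known exactly (a simplification)

def simulate(jobs, total_procs, backfill):
    """Return {job name: start time} for FCFS, optionally with back-filling."""
    assert all(0 < j.procs <= total_procs and j.runtime > 0 for j in jobs)
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, running, starts = [], [], {}   # running holds (end_time, procs)
    now = 0
    while pending or queue:
        # Admit jobs that have arrived and drop jobs that have finished.
        while pending and pending[0].arrival <= now:
            queue.append(pending.pop(0))
        running = [(end, p) for end, p in running if end > now]
        free = total_procs - sum(p for _, p in running)

        # FCFS: start jobs strictly in queue order while the head fits.
        while queue and queue[0].procs <= free:
            job = queue.pop(0)
            starts[job.name] = now
            running.append((now + job.runtime, job.procs))
            free -= job.procs

        if backfill and queue:
            head = queue[0]
            # Reservation: earliest time at which enough processors free up
            # for the head job, and how many are left over at that time.
            avail = free
            for end, p in sorted(running):
                avail += p
                if avail >= head.procs:
                    reserved_at, spare = end, avail - head.procs
                    break
            # A later job may bypass the head if it fits now and either
            # finishes before the reservation or uses only spare processors.
            for job in queue[1:]:
                if job.procs > free:
                    continue
                done_in_time = now + job.runtime <= reserved_at
                if done_in_time or job.procs <= spare:
                    if not done_in_time:
                        spare -= job.procs
                    queue.remove(job)
                    starts[job.name] = now
                    running.append((now + job.runtime, job.procs))
                    free -= job.procs

        # Jump to the next arrival or completion.
        events = [end for end, _ in running]
        if pending:
            events.append(pending[0].arrival)
        if events:
            now = min(events)
    return starts

jobs = [Job("A", 0, 2, 10), Job("B", 1, 4, 5), Job("C", 2, 2, 5)]
fcfs_starts = simulate(jobs, total_procs=4, backfill=False)
backfill_starts = simulate(jobs, total_procs=4, backfill=True)
```

In this made-up three-job example on 4 processors, job C runs in the gap left while B waits for A to finish, so C starts earlier under back-filling while B's start time is unchanged.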

Can Performance be Improved?
• Metrics:
  • System metric: utilization
  • User metrics: response time (wait time + run time), slowdown (response time / run time)
• Over a hundred papers published:
  • Focus mainly on improving user metrics, which have much greater potential for improvement than utilization
• Question: How important is it to squeeze an additional 5-10% utilization out of a system that is already achieving over 85% utilization?
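As a quick illustration of the metrics above, the following sketch computes them for made-up numbers. The function names are hypothetical, and utilization is assumed here to mean the fraction of available processor-time actually used over an interval, which the slide does not spell out.

```python
def response_time(arrival, start, runtime):
    """Wait time plus run time."""
    return (start - arrival) + runtime

def slowdown(arrival, start, runtime):
    """Response time divided by run time; 1.0 means the job never waited."""
    return response_time(arrival, start, runtime) / runtime

def utilization(jobs, total_procs, interval):
    """Used processor-time over available processor-time (assumed definition).

    jobs is an iterable of (procs, runtime) pairs that ran within the interval.
    """
    used = sum(procs * runtime for procs, runtime in jobs)
    return used / (total_procs * interval)

# A job submitted at t=0 that starts at t=30 and runs for 10 time units.
rt = response_time(arrival=0, start=30, runtime=10)
sd = slowdown(arrival=0, start=30, runtime=10)

# Three jobs on a 4-processor machine over an interval of 20 time units.
util = utilization([(2, 10), (4, 5), (2, 5)], total_procs=4, interval=20)
```

Slowdown normalizes the wait by the job's own length, so a 30-unit wait is heavily penalized for a 10-unit job but would barely register for a job that runs for days.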

Improving Response Time
• Question: How important is it to evaluate alternatives to standard back-fill scheduling, with the goal of improving user response time?
• Many simulation studies have reported significant improvement in slowdown or response time with new schemes, but most production schedulers simply use aggressive back-fill. Why?

Possible Reasons for Non-Adoption
• Academic studies do not model the specific policy issues of a center, e.g. “good citizen rules,” multiple queues, etc.
• Most results are based on job log traces from Feitelson’s archive, with many logs from academic centers exhibiting low system utilization (< 70%).
• Most studies report overall averages over the entire trace, which is insufficient to assess the impact of a change:
  • E.g., using a Shortest-Job-First queue policy instead of the usual FCFS policy improves overall average slowdown by a factor of 4, but increases response time for 24-hour jobs from 26 hours to 50 hours.
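The last point can be made concrete by reporting slowdown per run-time class next to the overall average. The sketch below is a hypothetical illustration: the class boundaries, function names, and short-job numbers are invented, and only the 26-hour and 50-hour response times for a 24-hour job are taken from the example above.

```python
from collections import defaultdict
from statistics import mean

def runtime_class(runtime_hours, bounds=(1, 8)):
    """Label a job by run time; the 1h and 8h boundaries are illustrative."""
    for bound in bounds:
        if runtime_hours <= bound:
            return f"<={bound}h"
    return f">{bounds[-1]}h"

def slowdown_report(records):
    """Mean slowdown per run-time class plus the overall mean.

    records is a list of (runtime_hours, response_hours) pairs.
    """
    groups = defaultdict(list)
    for runtime, response in records:
        groups[runtime_class(runtime)].append(response / runtime)
    report = {label: mean(values) for label, values in groups.items()}
    report["overall"] = mean(response / runtime for runtime, response in records)
    return report

# Invented records: two half-hour jobs and one 24-hour job under each policy.
fcfs = slowdown_report([(0.5, 5.0), (0.5, 4.0), (24, 26)])
sjf = slowdown_report([(0.5, 1.0), (0.5, 1.0), (24, 50)])
```

With these invented records the overall mean slowdown is lower under the Shortest-Job-First numbers, while the `>8h` class is clearly worse off, which the overall figure alone would hide.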

QoS for Job Scheduling
• Job schedulers do not provide QoS:
  • No response-time guarantees
  • No equitable way of offering different service for urgent versus non-urgent jobs
• Technical and accounting issues:
  • Develop job schedulers that can do deadline-based scheduling
  • Develop accounting models that charge based on the urgency of a job:
    • Charge = f1(resource-usage) + f2(wait-time-limit)
• Question: How desirable is it to develop job schedulers with QoS functionality?
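One possible shape for the charge formula above is sketched below. The slide leaves f1 and f2 unspecified, so both functions here are assumptions chosen only for illustration: f1 is taken to be linear in processor-hours, and f2 is a premium that grows as the requested wait-time limit shrinks. The rates and names are hypothetical.

```python
def usage_charge(procs, hours, rate_per_proc_hour=1.0):
    """f1: assumed linear in processor-hours consumed."""
    return rate_per_proc_hour * procs * hours

def urgency_charge(wait_limit_hours, premium=100.0):
    """f2: assumed premium inversely proportional to the wait-time limit.

    None means the job asked for no guarantee and pays no premium.
    Limits under one hour are treated as one hour to cap the premium.
    """
    if wait_limit_hours is None:
        return 0.0
    return premium / max(wait_limit_hours, 1.0)

def charge(procs, hours, wait_limit_hours=None):
    """Charge = f1(resource-usage) + f2(wait-time-limit)."""
    return usage_charge(procs, hours) + urgency_charge(wait_limit_hours)

# The same 16-processor, 2-hour job at three urgency levels.
no_guarantee = charge(16, 2)
next_day = charge(16, 2, wait_limit_hours=24)
urgent = charge(16, 2, wait_limit_hours=1)
```

Separating the two terms keeps the resource cost identical for all users and makes the price of urgency explicit, which is one way to address the equity concern raised above.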

Questions?
