A Prediction-based Real-time Scheduling Advisor

A Prediction-based Real-time Scheduling Advisor Peter A. Dinda Carnegie Mellon University

Outline • Real-time scheduling advisor model and interface • Prediction-based implementation • Randomized evaluation using load trace playback

The Problem Solved by the Real-time Scheduling Advisor At time tnow, the application gives you a task with compute requirements tnom, a deadline tnow+tnom(1+slack), a confidence level c, and a list of hosts in a shared, unreserved distributed computing environment. The application can run the task on any of the hosts. Choose a host from the list such that the task, if run on that host, will meet the deadline with probability c or better, if possible.

Model • Task model • Compute-bound • Initiated by user actions (interactive applications) • Arrive aperiodically • Do not overlap • Must be started immediately (tnow) • Application model • Knows task’s compute requirements (tnom) • Knows appropriate slack for task • deadline = tnow + (1+slack)tnom • Can run task on one of a set of hosts • Real-time scheduling advisor recommends the most appropriate host
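The deadline rule in the model can be written out as a small worked example. This is only a sketch: the function name `TaskDeadline` and the choice of seconds as the unit are assumptions for illustration, not part of the advisor's interface.

```cpp
#include <stdexcept>

// Absolute deadline for a task, following the slide's rule:
//   deadline = tnow + (1 + slack) * tnom
// All times are assumed to be in seconds. TaskDeadline is a hypothetical
// helper name, not a function from the RTSA interface.
double TaskDeadline(double tnow, double tnom, double slack) {
    if (tnom < 0.0 || slack < 0.0) {
        throw std::invalid_argument("tnom and slack must be non-negative");
    }
    return tnow + (1.0 + slack) * tnom;
}
```

For instance, a task with tnom = 2 seconds and slack = 0.5 that arrives at tnow = 100 must finish by time 103; with slack = 0 the deadline is simply tnow + tnom.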

RTSA Interface

int RTAdviseTask(RTSchedulingAdvisorRequest &req, RTSchedulingAdvisorResponse &resp);

struct RTSchedulingAdvisorRequest {
    double tnom;    // with slack, sets the deadline: tnow + tnom(1+slack)
    double slack;
    double conf;    // required certainty of meeting deadline
    Host hosts[];   // hosts to choose from
};

struct RTSchedulingAdvisorResponse {
    double tnom;
    double slack;
    double conf;
    Host host;      // most appropriate host
    RunningTimePredictionResponse runningtime;  // confidence interval for running time on host
};

Prediction-based Implementation

Anchoring This Talk • This talk: description and evaluation of the real-time scheduling advisor • Assume this works (later talk) • Built host load prediction system • Developed RPS toolkit for building fast, low overhead resource prediction systems • Found appropriate predictive models for host load signals • Studied statistical properties of host load signals • Developed load trace playback technique for reconstructing load

Scheduling Strategies • Prediction-based (MEAN, LAST, AR(16)) • Operation • Acquire running time predictions for each host • Select host at random from those where confidence interval is below deadline • If none exist, choose host with lowest expected running time • Return host and running time prediction • MEASURE • Return host with current lowest measured load • No running time prediction • RANDOM • Return random host • No running time prediction
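The operation of the prediction-based strategies might be sketched as follows. The types `HostPrediction` and `Advice` and the function `ChooseHost` are hypothetical names introduced here; the sketch also assumes that "confidence interval is below deadline" means the interval's upper bound is no larger than the relative deadline (1+slack)·tnom, and that the running time predictions have already been acquired.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-host running time prediction (all times in seconds).
struct HostPrediction {
    std::string host;
    double expected;  // expected running time on this host
    double upper;     // upper bound of the confidence interval at level conf
};

// Hypothetical result: the chosen host and how many hosts were eligible.
struct Advice {
    std::size_t index;         // index of the chosen host in the input vector
    std::size_t num_possible;  // hosts whose interval fits within the deadline
};

Advice ChooseHost(const std::vector<HostPrediction>& preds,
                  double tnom, double slack, std::mt19937& rng) {
    if (preds.empty()) throw std::invalid_argument("no hosts to choose from");
    const double budget = (1.0 + slack) * tnom;  // deadline relative to tnow

    std::vector<std::size_t> possible;
    std::size_t best = 0;  // host with the lowest expected running time
    for (std::size_t i = 0; i < preds.size(); ++i) {
        if (preds[i].upper <= budget) possible.push_back(i);
        if (preds[i].expected < preds[best].expected) best = i;
    }

    // No host is predicted to meet the deadline: fall back to the fastest.
    if (possible.empty()) return {best, 0};

    // Otherwise pick uniformly at random among the eligible hosts.
    std::uniform_int_distribution<std::size_t> pick(0, possible.size() - 1);
    return {possible[pick(rng)], possible.size()};
}
```

Choosing at random among the eligible hosts, rather than always taking the fastest one, is what spreads independent advisors across hosts; `num_possible` corresponds to the "number of possible hosts" reported in the evaluation.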

Performance Metrics • Fraction of deadlines met • “Will the deadline be met?” • Depends on (at least) strategy, slack, and resource availability • Fraction of deadlines met when possible • “If strategy claims deadline will be met, will the deadline be met?” • Should depend only on strategy • Application can try other tnom, slack • Number of possible hosts • “How much randomness is introduced?” • Helps to avoid disastrous advisor synchronization
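The first two metrics can be computed from per-task records roughly as below. `TaskOutcome`, `Metrics`, and `Summarize` are hypothetical names, and returning 0.0 for an empty denominator is an arbitrary convention chosen for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one scheduled task.
struct TaskOutcome {
    bool predicted_met;  // strategy claimed the deadline would be met
    bool met;            // the deadline was actually met
};

struct Metrics {
    double frac_met;                // fraction of deadlines met
    double frac_met_when_possible;  // met, among tasks the strategy claimed it could meet
};

Metrics Summarize(const std::vector<TaskOutcome>& outcomes) {
    std::size_t met = 0, claimed = 0, claimed_and_met = 0;
    for (const TaskOutcome& o : outcomes) {
        if (o.met) ++met;
        if (o.predicted_met) {
            ++claimed;
            if (o.met) ++claimed_and_met;
        }
    }
    Metrics m{0.0, 0.0};  // 0.0 is reported when a denominator is empty
    if (!outcomes.empty()) m.frac_met = static_cast<double>(met) / outcomes.size();
    if (claimed > 0) m.frac_met_when_possible = static_cast<double>(claimed_and_met) / claimed;
    return m;
}
```

The second metric only makes sense for strategies that return a running time prediction; MEASURE and RANDOM make no claim, so this sketch would need a different treatment for them.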

Methodology • Recreate “scenario” (load on a set of hosts) on manchester testbed using load trace playback • Schedule and run randomized tasks • Random arrival times (5 to 15 seconds apart) • tnom randomly selected from 0.1 to 10 seconds • Slack randomly selected from 0 to 2 • Randomly selected strategy • Data-mine results
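A randomized workload with these ranges could be generated along the following lines. The slide does not state which distributions were used, so uniform distributions are an assumption here, as are the names `Strategy`, `Task`, and `GenerateTasks`.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// The five strategies compared in the talk.
enum class Strategy { Mean, Last, AR16, Measure, Random };

// Hypothetical description of one randomized task.
struct Task {
    double arrival;  // arrival time in seconds from the start of the run
    double tnom;     // nominal compute time in seconds
    double slack;
    Strategy strategy;
};

// ASSUMPTION: every quantity is drawn uniformly from its stated range.
std::vector<Task> GenerateTasks(std::size_t n, std::mt19937& rng) {
    std::uniform_real_distribution<double> gap(5.0, 15.0);   // seconds between arrivals
    std::uniform_real_distribution<double> tnom(0.1, 10.0);  // seconds
    std::uniform_real_distribution<double> slack(0.0, 2.0);
    std::uniform_int_distribution<int> strategy(0, 4);

    std::vector<Task> tasks;
    tasks.reserve(n);
    double now = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        now += gap(rng);  // tasks arrive aperiodically
        tasks.push_back({now, tnom(rng), slack(rng),
                         static_cast<Strategy>(strategy(rng))});
    }
    return tasks;
}
```

With arrivals at least 5 seconds apart, tasks at the short end of the tnom range cannot overlap, but long tasks with large slack could; the model's "do not overlap" property would have to be enforced separately.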

4LS Scenario • Four PSC alpha cluster hosts • axp0 (interactive), axp4, axp5, axp10 (batch) • high load, high variability • Traces start Tuesday, August 12, 1997. • 16,000 tasks run in 36 hours

Terminology I Will Use • Scheduling feasibility • How likely it is that a host exists on which the deadline can be met • Increases with slack, decreases with tnom • Also depends on variation among the hosts • Predictor sensitivity • How likely it is that the deadline will be missed due to a bad prediction • Low when scheduling feasibility is high or low • Highest near critical slack • Critical slack • Slack at which scheduling feasibility is 50%

Overview of Results • AR(16) prediction-based strategy is superior • Fraction of deadlines met at least as good as MEASURE, and much improved at critical slack • Fraction of deadlines met when possible higher than all competitors and most independent of slack and nominal time • Introduces randomness similar to that of the other prediction-based strategies • Performance metrics depend on slack and nominal time

Fraction of Deadlines Met Versus Slack

Fraction of Deadlines Met Versus tnom

Fraction of Deadlines Met Versus tnom (Near Critical Slack)

Fraction of Deadlines Met When Possible Versus Slack

Fraction of Deadlines Met When Possible Versus tnom

Fraction of Deadlines Met When Possible Versus tnom (Near Critical Slack)

Number of Possible Hosts Versus Slack

Number of Possible Hosts Versus tnom

Number of Possible Hosts Versus tnom (Near Critical Slack)

Conclusions • MEASURE greatly increases chance of meeting deadlines compared to RANDOM • AR(16) increases that chance further with minuscule additional overhead • Especially near critical slack and for short tasks • In addition, AR(16) can tell the application, with high accuracy, whether the deadline will be met before the task is run • Gives the application opportunity to negotiate • AR(16) introduces appropriate randomness into its choices, reducing chance of conflict • AR(16) Prediction-based Real-time Scheduling Advisor is a useful tool
