A Prediction-based Real-time Scheduling Advisor
Peter A. Dinda
Carnegie Mellon University
Outline
• Real-time scheduling advisor model and interface
• Prediction-based implementation
• Randomized evaluation using load trace playback
The Problem Solved by the Real-time Scheduling Advisor
At time tnow, the application gives you a task with compute requirements tnom, a deadline tnow + tnom(1 + slack), a confidence level c, and a list of hosts in a shared, unreserved distributed computing environment. The application can run the task on any of the hosts. Choose a host from the list such that the task, if run on that host, will meet the deadline with probability c or better, if possible.
Model
• Task model
  • Compute-bound
  • Initiated by user actions (interactive applications)
  • Arrive aperiodically
  • Do not overlap
  • Must be started immediately (tnow)
• Application model
  • Knows task’s compute requirements (tnom)
  • Knows appropriate slack for task
    • deadline = tnow + (1 + slack)tnom
  • Can run task on one of a set of hosts
• Real-time scheduling advisor recommends the most appropriate host
RTSA Interface
    int RTAdviseTask(RTSchedulingAdvisorRequest &req,
                     RTSchedulingAdvisorResponse &resp);

    struct RTSchedulingAdvisorRequest {
        double tnom;     // deadline: tnow + tnom(1+slack)
        double slack;
        double conf;     // required certainty of meeting deadline
        Host   hosts[];  // hosts to choose from
    };

    struct RTSchedulingAdvisorResponse {
        double tnom;
        double slack;
        double conf;
        Host   host;     // most appropriate host
        RunningTimePredictionResponse runningtime;  // confidence interval for running time on host
    };
Anchoring this talk
• This talk: description and evaluation of the real-time scheduling advisor
• Assume this works (later talk)
• Built host load prediction system
• Developed RPS toolkit for building fast, low overhead resource prediction systems
• Found appropriate predictive models for host load signals
• Studied statistical properties of host load signals
• Developed load trace playback technique for reconstructing load
Scheduling Strategies
• Prediction-based (MEAN, LAST, AR(16))
  • Operation
    • Acquire running time predictions for each host
    • Select a host at random from those whose running time confidence interval is below the deadline
    • If none exist, choose the host with the lowest expected running time
    • Return host and running time prediction
• MEASURE
  • Return host with the lowest currently measured load
  • No running time prediction
• RANDOM
  • Return random host
  • No running time prediction
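The prediction-based operation listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the advisor's actual implementation: `HostPrediction` and `choose_host` are hypothetical names, the predictor (MEAN, LAST, or AR(16)) is assumed to have already produced an expected running time and a confidence interval upper bound per host, and "below the deadline" is read as "upper bound does not exceed tnom(1 + slack)".

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for one host's running time prediction (seconds).
struct HostPrediction {
    std::string host;
    double expected;  // expected running time on this host
    double upper;     // upper end of the confidence interval at level conf
};

// Returns the index of the chosen host. Assumes preds is non-empty.
// feasible_count (optional) receives the number of hosts whose confidence
// interval fits within the deadline; 0 means the fallback rule was used.
std::size_t choose_host(const std::vector<HostPrediction>& preds,
                        double tnom, double slack, std::mt19937& rng,
                        std::size_t* feasible_count = nullptr) {
    const double allowed = tnom * (1.0 + slack);  // deadline relative to tnow
    std::vector<std::size_t> feasible;
    std::size_t best = 0;  // host with the lowest expected running time
    for (std::size_t i = 0; i < preds.size(); ++i) {
        if (preds[i].upper <= allowed) feasible.push_back(i);
        if (preds[i].expected < preds[best].expected) best = i;
    }
    if (feasible_count) *feasible_count = feasible.size();
    if (feasible.empty()) return best;  // no host is predicted to meet the deadline
    // Random choice among feasible hosts keeps advisors from all picking the same host.
    std::uniform_int_distribution<std::size_t> pick(0, feasible.size() - 1);
    return feasible[pick(rng)];
}
```

The count of feasible hosts returned through `feasible_count` is the same quantity a caller could log as the "number of possible hosts".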
Performance Metrics
• Fraction of deadlines met
  • “Will the deadline be met?”
  • Depends on (at least) strategy, slack, and resource availability
• Fraction of deadlines met when possible
  • “If the strategy claims the deadline will be met, will the deadline be met?”
  • Should depend only on strategy
  • Application can try other tnom, slack
• Number of possible hosts
  • “How much randomness is introduced?”
  • Helps to avoid disastrous advisor synchronization
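As a rough illustration of how these three metrics could be computed from logged task runs, consider the sketch below. The record layout (`TaskOutcome`) and the choice to report the number of possible hosts as a mean are assumptions made for this example. Note that MEASURE and RANDOM return no running time prediction, so `claimed_possible` would only be meaningful for the prediction-based strategies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one completed task.
struct TaskOutcome {
    bool claimed_possible;       // advisor predicted the deadline would be met
    bool deadline_met;           // task actually finished by its deadline
    std::size_t possible_hosts;  // hosts the advisor could choose among
};

struct Metrics {
    double frac_met;                // fraction of deadlines met
    double frac_met_when_possible;  // met, among tasks the advisor claimed possible
    double mean_possible_hosts;     // average number of possible hosts
};

Metrics summarize(const std::vector<TaskOutcome>& runs) {
    std::size_t met = 0, claimed = 0, claimed_met = 0, hosts = 0;
    for (const TaskOutcome& r : runs) {
        if (r.deadline_met) ++met;
        if (r.claimed_possible) {
            ++claimed;
            if (r.deadline_met) ++claimed_met;
        }
        hosts += r.possible_hosts;
    }
    Metrics m{0.0, 0.0, 0.0};  // all metrics stay 0 when there is nothing to count
    if (!runs.empty()) {
        m.frac_met = static_cast<double>(met) / runs.size();
        m.mean_possible_hosts = static_cast<double>(hosts) / runs.size();
    }
    if (claimed > 0) {
        m.frac_met_when_possible = static_cast<double>(claimed_met) / claimed;
    }
    return m;
}
```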
Methodology
• Recreate “scenario” (load on a set of hosts) on manchester testbed using load trace playback
• Schedule and run randomized tasks
  • Random arrival times (5 to 15 seconds apart)
  • tnom randomly selected from 0.1 to 10 seconds
  • Slack randomly selected from 0 to 2
  • Randomly selected strategy
• Data-mine results
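The randomized task stream described on this slide could be generated along the following lines. This is only a sketch: the slide gives the ranges but not the distributions, so uniform sampling is an assumption here, and `TaskSpec`, `Strategy`, and `generate_tasks` are hypothetical names.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// The five strategies compared in the talk.
enum class Strategy { Mean, Last, AR16, Measure, Random };

// Hypothetical description of one randomized task.
struct TaskSpec {
    double arrival;     // seconds since the start of the run
    double tnom;        // nominal compute time, seconds
    double slack;       // deadline = arrival + tnom * (1 + slack)
    Strategy strategy;  // strategy used to place this task
};

std::vector<TaskSpec> generate_tasks(std::size_t n, std::mt19937& rng) {
    // Uniform distributions are an assumption; only the ranges come from the slide.
    std::uniform_real_distribution<double> gap(5.0, 15.0);
    std::uniform_real_distribution<double> nominal(0.1, 10.0);
    std::uniform_real_distribution<double> slack(0.0, 2.0);
    std::uniform_int_distribution<int> strat(0, 4);

    std::vector<TaskSpec> tasks;
    tasks.reserve(n);
    double now = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        now += gap(rng);  // next arrival is 5 to 15 seconds after the previous one
        TaskSpec t;
        t.arrival = now;
        t.tnom = nominal(rng);
        t.slack = slack(rng);
        t.strategy = static_cast<Strategy>(strat(rng));
        tasks.push_back(t);
    }
    return tasks;
}
```

Seeding the generator (for example `std::mt19937 rng(42);`) makes a run repeatable, which is convenient when the same scenario is replayed through load trace playback.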
4LS Scenario
• Four PSC Alpha cluster hosts
  • axp0 (interactive), axp4, axp5, axp10 (batch)
  • High load, high variability
• Traces start Tuesday, August 12, 1997
• 16,000 tasks run in 36 hours
Terminology I Will Use
• Scheduling feasibility
  • How likely it is that a host exists on which the deadline can be met
  • Increases with slack, decreases with tnom
  • Also depends on variation among the hosts
• Predictor sensitivity
  • How likely it is that the deadline will be missed due to a bad prediction
  • Low when scheduling feasibility is high or low
  • Highest near critical slack
• Critical slack
  • Slack at which scheduling feasibility is 50%
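One way to estimate critical slack from experimental data is to bin tasks by slack and find the lowest bin in which feasibility reaches 50%. The sketch below assumes each sample already records whether some host could have met the deadline; how that is determined is outside this example, and `Sample` and `estimate_critical_slack` are hypothetical names. The binning approach is an illustrative choice, not necessarily the method used in the talk.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical observation: a task's slack and whether a host existed
// on which its deadline could be met.
struct Sample {
    double slack;
    bool feasible;
};

// Returns the center of the lowest slack bin whose observed feasibility is
// at least 50%, or -1.0 if no bin reaches it. Assumes bins > 0 and
// max_slack > 0. Relies on feasibility increasing with slack.
double estimate_critical_slack(const std::vector<Sample>& samples,
                               double max_slack, std::size_t bins) {
    std::vector<std::size_t> total(bins, 0), ok(bins, 0);
    const double width = max_slack / bins;
    for (const Sample& s : samples) {
        if (s.slack < 0.0 || s.slack > max_slack) continue;  // outside the range
        std::size_t b = static_cast<std::size_t>(s.slack / width);
        if (b >= bins) b = bins - 1;  // slack == max_slack falls in the last bin
        ++total[b];
        if (s.feasible) ++ok[b];
    }
    for (std::size_t b = 0; b < bins; ++b) {
        if (total[b] > 0 && 2 * ok[b] >= total[b]) return (b + 0.5) * width;
    }
    return -1.0;
}
```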
Overview of Results
• AR(16) prediction-based strategy is superior
  • Fraction of deadlines met is at least as good as MEASURE, and much improved at critical slack
  • Fraction of deadlines met when possible is higher than all competitors and the most independent of slack and nominal time
  • Introduces randomness similar to that of the other prediction-based strategies
• Performance metrics depend on slack and nominal time
Fraction of Deadlines Met When Possible Versus tnom (Near Critical Slack)
Conclusions
• MEASURE greatly increases the chance of meeting deadlines compared to RANDOM
• AR(16) increases that chance with minuscule additional overhead
  • Especially near critical slack and for short tasks
• In addition, AR(16) can tell the application, with high accuracy, whether the deadline will be met before the task is run
  • Gives the application the opportunity to negotiate
• AR(16) introduces appropriate randomness into its choices, reducing the chance of conflict
• The AR(16) prediction-based real-time scheduling advisor is a useful tool