
Online Prediction of the Running Time Of Tasks


  1. Online Prediction of the Running Time Of Tasks Peter A. Dinda Department of Computer Science, Northwestern University http://www.cs.northwestern.edu/~pdinda

  2. Overview • Predict running time of a task • Application supplies task size (0.1-10 seconds currently) • Task is compute-bound (current limit) • Prediction is a confidence interval • Expresses prediction error • Statistically valid decision-making in the scheduler • Based on host load prediction • Homogeneous Digital Unix hosts (current limit) • System is portable to many operating systems Everything in this talk is publicly available

  3. Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions

  4. A Universal Challenge in High Performance Distributed Applications • Highly variable resource availability • Shared resources • No reservations • No globally respected priorities • Competition from other users - “background workload” • Running time can vary drastically • Adaptation - example goal: soft real-time for interactivity - example mechanism: server selection • Performance queries

  5. Running Time Advisor (RTA) Application asks: “What will be the running time of this 3 second task if started now?” Advisor answers: “It will be 5.3 seconds” • Nominal time: running time on an empty host (the task size) • Entirely user-level tool • No reservations or admission control • Query result is a prediction

  6. Variability and Prediction [Figure: a predictor turns a highly variable resource availability signal into a low-variability prediction error plus a characterization of that variability, e.g., its ACF] Exchange high resource availability variability for low prediction error variability and a characterization of that variability

  7. Running Time Advisor (RTA) Application asks: “With 95% confidence, what will be the running time of this 3 second task if started now?” Advisor answers: “It will be 4.1 to 6.3 seconds” • The CI captures prediction error to the extent the application is interested in it • Independent of prediction techniques

  8. RTA API
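The API itself is not reproduced in this transcript. The sketch below is a hypothetical C++ rendering of the query described on the previous slides; the struct, class, and method names are illustrative, not the actual RPS/RTA signatures.

#include <string>

// A prediction is returned as a confidence interval, not a point estimate.
struct RunningTimePrediction {
    double expected;   // expected running time (seconds)
    double lower;      // lower bound of the confidence interval (seconds)
    double upper;      // upper bound of the confidence interval (seconds)
    double confidence; // confidence level requested, e.g. 0.95
};

class RunningTimeAdvisor {
public:
    // Predict the running time of a compute-bound task with the given nominal
    // time (its running time on an otherwise empty host) if started now on
    // the named host, at the requested confidence level.
    RunningTimePrediction Predict(const std::string& host,
                                  double nominal_seconds,
                                  double confidence) const;
};

An application would call Predict once per candidate host and compare the returned intervals against its own requirements, such as a deadline.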

  9. Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions

  10. Host Load Traces • DEC Unix 5 second exponential average • Full bandwidth captured (1 Hz sample rate) • Long durations • http://www.cs.northwestern.edu/~pdinda/LoadTraces

  11. Host Load Properties • Self-similarity • long-range dependence • Epochal behavior • non-stationarity • Complex correlation structure [LCR ’98, Scientific Programming, 3:4, 1999]

  12. Host Load Prediction • Fully randomized study on traces • MEAN, LAST, AR, MA, ARMA, ARIMA, ARFIMA models • AR(16) models most appropriate • Covariance matrix for prediction errors • Low overhead: <1% CPU [HPDC ’99, Cluster Computing, 3:4, 2000]
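For concreteness, here is a minimal C++ sketch of how an already-fitted AR(p) model (p = 16 in the study) produces a one-step-ahead load prediction. Fitting the coefficients and propagating the error covariance, which the talk relies on later, are omitted; the function and variable names are illustrative.

#include <vector>
#include <cstddef>

// One-step-ahead prediction from a fitted AR(p) model:
//   z_hat[t+1] = mean + sum_{k=1..p} a[k] * (z[t+1-k] - mean)
// `coeffs` holds a[1..p]; `history` holds recent load samples, newest last.
double ar_predict_next(const std::vector<double>& coeffs,
                       const std::vector<double>& history,
                       double mean) {
    double pred = mean;
    std::size_t p = coeffs.size();
    for (std::size_t k = 0; k < p && k < history.size(); ++k) {
        // history.back() is z_t, history[size-1-k] is z_{t-k}
        pred += coeffs[k] * (history[history.size() - 1 - k] - mean);
    }
    return pred;
}

Multi-step-ahead predictions are obtained by feeding predictions back in as pseudo-observations; the predictor also supplies the error covariance used later for the confidence intervals.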

  13. RPS Toolkit • Extensible toolkit for implementing resource signal prediction systems • Easy “buy-in” for users • C++ and sockets (no threads) • Prebuilt prediction components • Libraries (sensors, time series, communication) • Users have bought in • Incorporated in CMU Remos, BBN QuO [CMU-CS-99-138] http://www.cs.northwestern.edu/~RPS

  14. Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions

  15. A Model of the Unix Scheduler t_act = f(t_nom, background workload) [Figure: a task with nominal running time t_nom enters the Unix scheduler together with the background workload, represented by the actual load signal <z_t>, and emerges with actual running time t_act]

  16. A Model of the Unix Scheduler [Figure: the same model with the actual load replaced by the predicted load <ẑ_t>, yielding a predicted running time t̂_exp] t̂_exp = g(t_nom, <ẑ_t>) = t_act + error

  17. Available Time and Average Load • Available time at(t): CPU time the task receives from 0 to t • Average load from 0 to t, computed from the load signal – replace with a prediction of the load signal • t_act is the minimum t where at(t) = t_nom • Assumes a fluid model: Processor Sharing, Idealized Round-Robin, …
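The slide's equations did not survive extraction. The LaTeX below is a reconstruction of the relation it describes, assuming the standard fluid processor-sharing reading in which a task competing with load z(s) receives a 1/(1 + z(s)) share of the CPU; the integral form is an assumption consistent with the bullets above, not a quote from the slide.

% Available CPU time the task accumulates over [0, t], where z(s) is the load signal:
at(t) = \int_0^{t} \frac{ds}{1 + z(s)}
% The actual running time is the first time the accumulated time reaches the nominal time:
t_{act} = \min \{\, t : at(t) = t_{nom} \,\}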

  18. Discrete Time • No magic here – this is the obvious discretization of the available-time computation • The load signal is sampled at a fixed interval • z_{t+j} is replaced with its prediction
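A hedged sketch of "the obvious discretization"; the slide's symbol for the sample interval was lost in extraction, so τ below is an assumed name for it.

% Discretized available time after j samples, with sample interval \tau:
at(t_j) \approx \sum_{i=1}^{j} \frac{\tau}{1 + z_{t+i}}
% Replacing each z_{t+i} with its prediction \hat{z}_{t+i} gives \widehat{at}(t_j),
% and the predicted running time is the first sample time at which it reaches t_{nom}:
\hat{t}_{exp} = \min \{\, j\,\tau : \widehat{at}(t_j) \ge t_{nom} \,\}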

  19. Confidence Intervals • z_{t+j} is replaced with ẑ_{t+j} in the prediction, giving predicted per-step available times and a predicted at(t) • A confidence interval for at(t) is built from the per-step prediction errors • Since at(t) is a sum, the central limit theorem applies • A 95% confidence interval is then the prediction plus or minus 1.96 times the standard deviation of the summed error

  20. The Variance of the Sum • The per-step prediction errors are not independent • The predictor’s covariance matrix captures this • The predictor therefore makes it possible to compute the variance of the sum, and thus the CI • Important detail: load discounting
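A worked form of the two slides above, as a sketch: because the per-step errors e_{t+i} are correlated, the variance of their sum comes from the full error covariance matrix (written C here); the CLT-based 95% interval on at(t_j) then follows. The symbols e and C are notational assumptions, and the final step of turning this interval on available time into an interval on running time (finding where its bounds reach t_nom) is the part the slides summarize.

\mathrm{Var}\!\left(\sum_{i=1}^{j} e_{t+i}\right)
  = \sum_{i=1}^{j}\sum_{k=1}^{j} \mathrm{Cov}\!\left(e_{t+i},\, e_{t+k}\right)
  = \mathbf{1}^{\top} C\, \mathbf{1}
% By the central limit theorem, a 95% confidence interval for at(t_j) is approximately
\widehat{at}(t_j) \;\pm\; 1.96 \sqrt{\mathbf{1}^{\top} C\, \mathbf{1}}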

  21. Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions

  22. Experimental Setup • Environment • AlphaStation 255s, Digital Unix 4.0 • Workload: host load trace playback [LCR 2000] • Prediction system on each host • AR(16), MEAN, LAST • Tasks • Nominal time ~ U(0.1, 10) seconds • Interarrival time ~ U(5, 15) seconds • 95% confidence level • Methodology • Predict CIs • Run task and measure http://www.cs.northwestern.edu/~pdinda/LoadTraces/playload
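A minimal C++ sketch of how test tasks matching this setup could be drawn; the distribution parameters are from the slide, while the type and function names are illustrative.

#include <random>

struct TestTask {
    double nominal_seconds;      // size of the compute-bound task
    double interarrival_seconds; // gap before the next task is submitted
};

// Draw one test case: nominal time ~ U(0.1, 10) s, interarrival time ~ U(5, 15) s.
TestTask draw_task(std::mt19937& rng) {
    std::uniform_real_distribution<double> nominal(0.1, 10.0);
    std::uniform_real_distribution<double> interarrival(5.0, 15.0);
    return TestTask{nominal(rng), interarrival(rng)};
}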

  23. Metrics • Coverage • Fraction of test cases within the confidence interval • Ideally should equal the target of 95% • Span • Average length of the confidence interval • Ideally as short as possible • R² between t_exp and t_act
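A minimal C++ sketch of the first two metrics, assuming each test case records the measured running time and the predicted interval (the struct and function names are illustrative).

#include <vector>
#include <cstddef>

struct TestCase {
    double t_act;  // measured running time (seconds)
    double lower;  // predicted CI lower bound (seconds)
    double upper;  // predicted CI upper bound (seconds)
};

// Coverage: fraction of test cases whose measured running time falls inside
// the predicted confidence interval (ideally equal to the 95% target).
double coverage(const std::vector<TestCase>& cases) {
    std::size_t hits = 0;
    for (const TestCase& c : cases)
        if (c.t_act >= c.lower && c.t_act <= c.upper) ++hits;
    return cases.empty() ? 0.0 : static_cast<double>(hits) / cases.size();
}

// Span: average length of the predicted confidence interval (shorter is better).
double span(const std::vector<TestCase>& cases) {
    double total = 0.0;
    for (const TestCase& c : cases) total += c.upper - c.lower;
    return cases.empty() ? 0.0 : total / cases.size();
}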

  24. General Picture of Results • Five classes of behavior • I’ll show you two • RTA Works • Coverage near 95% in most cases is possible • Predictor quality matters • Better predictors lead to smaller spans on lightly loaded hosts and to correct coverage on heavily loaded hosts • AR(16) >= LAST >= MEAN • Performance is slightly dependent on nominal time

  25. Most Common Coverage Behavior

  26. Most Common Span Behavior

  27. Uncommon Coverage Behavior

  28. Uncommon Span Behavior

  29. Related Work • Distributed interactive applications • QuakeViz/Dv, Aeschlimann [PDPTA ’99] • Quality of service • QuO, Zinky, Bakken, Schantz [TPOS, April 97] • QRAM, Rajkumar, et al [RTSS ’97] • Distributed soft real-time systems • Lawrence, Jensen [assorted] • Workload studies for load balancing • Mutka, et al [PerfEval ’91] • Harchol-Balter, et al [SIGMETRICS ’96] • Resource signal measurement systems • Remos [HPDC ’98] • Network Weather Service [HPDC ’97, HPDC ’99] • Host load prediction • Wolski, et al [HPDC ’99] (NWS) • Samadani, et al [PODC ’95] • Hailperin [’93] • Application-level scheduling • Berman, et al [HPDC ’96] • Stochastic Scheduling, Schopf [Supercomputing ’99]

  30. Conclusions • Predict running time of compute-bound task • Based on host load prediction • Prediction is a confidence interval • Confidence interval algorithm • Covariance matrix • Load discounting • Effective for domain • Digital Unix, 0.1-10 second tasks, 5-15 second interarrival • Extensions in progress

  31. For More Information • All software and traces are available • RPS + RTA + RTSA http://www.cs.northwestern.edu/~RPS • Load Traces and playback http://www.cs.northwestern.edu/~pdinda/LoadTraces • Prescience Lab • Peter Dinda, Jason Skicewicz, Dong Lu • http://www.cs.northwestern.edu/~plab

  32. Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions

  33. A Universal Problem • Which host should the application send the task to so that its running time is appropriate? • Example: real-time tasks with known resource requirements • “What will the running time be if I...”

  34. Running Time Advisor • Application notifies the advisor of the task’s computational requirements (nominal time) • Advisor predicts the running time on each host • Application assigns the task to the most appropriate host

  35. Real-time Scheduling Advisor • Application specifies the task’s computational requirements (nominal time) and its deadline • Advisor acquires predicted task running times for all hosts • Advisor chooses one of the hosts where the deadline can be met

  36. Confidence Intervals to Characterize Variability • “3 to 5 seconds with 95% confidence” • Application specifies a confidence level (e.g., 95%) • Running time advisor predicts running times as a confidence interval (CI) • Real-time scheduling advisor chooses a host where the whole CI falls below the deadline • The CI captures variability to the extent the application is interested in it
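A hypothetical C++ sketch of the selection rule slides 34-36 describe: obtain a prediction for each host and accept a host only if the whole interval fits under the deadline. Preferring the tightest acceptable interval is one reasonable policy, not necessarily the advisor's actual one, and the names below are illustrative.

#include <string>
#include <vector>
#include <optional>

struct HostPrediction {
    std::string host;
    double ci_lower;  // predicted running time, CI lower bound (seconds)
    double ci_upper;  // predicted running time, CI upper bound (seconds)
};

std::optional<std::string> choose_host(const std::vector<HostPrediction>& preds,
                                       double deadline_seconds) {
    std::optional<std::string> best;
    double best_upper = deadline_seconds;
    for (const HostPrediction& p : preds) {
        // A host is acceptable only if even the upper end of its CI meets the deadline.
        if (p.ci_upper <= best_upper) {
            best = p.host;
            best_upper = p.ci_upper;  // prefer the tightest acceptable bound
        }
    }
    return best;  // nullopt if no host can meet the deadline at this confidence
}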

  37. Prototype System [Figure: prototype system architecture; the highlighted components are the ones described in this paper]

  38. Load Discounting: Motivation • I/O priority boost • Short tasks are less affected by load

  39. Load Discounting • Apply before using the load predictions • t_discount is an estimable machine property
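The transcript does not give the discounting rule itself. The C++ sketch below is only one plausible reading, stated purely as an assumption: treat predicted load as shielded (zeroed) for roughly the first t_discount seconds of a task, reflecting the scheduler's short-lived priority boost; the actual rule and the way t_discount is estimated are not shown here.

#include <vector>
#include <cstddef>

// Illustrative only: zero out predicted load for the first t_discount seconds,
// on the assumption that a freshly started (or I/O-boosted) task is briefly
// shielded from the background workload.
std::vector<double> discount_load(const std::vector<double>& predicted_load,
                                  double sample_interval_seconds,
                                  double t_discount_seconds) {
    std::vector<double> adjusted = predicted_load;
    for (std::size_t j = 0; j < adjusted.size(); ++j) {
        if ((j + 1) * sample_interval_seconds <= t_discount_seconds)
            adjusted[j] = 0.0;
    }
    return adjusted;
}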
