
Peter A. Dinda Carnegie Mellon University cs.cmu/~pdinda

This resource provides a signal-based approach to predict the running time of compute-bound tasks and offers adaptation advice and host selection to meet soft real-time deadlines.





Presentation Transcript


  1. The Running Time Advisor: A Resource Signal-based Approach to Predicting Task Running Time and Its Applications • Peter A. Dinda, Carnegie Mellon University, http://www.cs.cmu.edu/~pdinda

  2. High-Level Goals • Build systems that use statistics to help distributed applications adapt to highly variable resource availability • Focus on information (the subject of this talk) • Application-level performance predictions • Running time of compute-bound tasks • Adaptation advice • Host selection to meet soft real-time deadlines • Resource signal approach • Host load signals

  3. Outline • Bird’s eye view • Adapting to highly variable resource availability • Dv/QuakeViz • Real-time scheduling advisor • Running time advisor • Confidence intervals • Performance results (feasible, practical, useful) • Prototype system • Host load prediction • Traces, structure, linear models, evaluation • RPS Toolkit • Conclusion

  4. A Universal Challenge in High-Performance Distributed Applications: Highly Variable Resource Availability • Shared resources • No reservations • No globally respected priorities • Competition from other users (“background workload”) • Consequence: running time can vary drastically, so applications must adapt

  5. A Universal Problem • Which host should the application send the task to so that its running time is appropriate? [Figure: a task with known resource requirements, asking “What will the running time be if I...”]

  6. DV Framework for Distributed Interactive Visualization • Large datasets (e.g., earthquake simulations) • Distributed VTK visualization pipelines • Active frames • Encapsulate data, computation, path through pipeline • Launched from server by user interaction • Annotated with deadline • Dynamically choose on which host each pipeline stage will execute and what quality settings to use http://www.cs.cmu.edu/~dv

  7. Example DV Pipeline for QuakeViz [Figure: logical and physical views of the pipeline. Stages include reading the simulation output, interpolation, isosurface extraction, morphology reconstruction, scene synthesis, and rendering for the local display and user, with user-controlled resolution, contours, and ROI. In the physical view, active frames n, n+1, and n+2 each carry a deadline, and the host for each stage is yet to be chosen (?).]

  8. Real-time Scheduling Advisor • Distributed interactive applications • Examples: CMU Dv/QuakeViz, BBN OpenMap • Assumptions • Sequential tasks initiated by user actions • Aperiodic arrivals • Resilient deadlines (soft real-time) • Compute-bound tasks • Known computational requirements • Best-effort semantics • Recommend host where deadline is likely to be met • Predict running time on that host • No guarantees

  9. Running Time Advisor • Application notifies advisor of task’s computational requirements (nominal time) • Advisor predicts running time on each host • Application assigns task to most appropriate host [Figure: a task with its nominal time and the predicted running time on each candidate host]
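
The advisor's core step is turning a nominal time into a predicted running time on a loaded host. A minimal sketch, assuming a simple processor-sharing model in which a host with load L gives the task a 1/(1 + L) share of the CPU and the load forecast is given per second; `predict_running_time` is a hypothetical name, and the talk's actual estimator (which also uses load discounting) may differ.

```python
from typing import Sequence


def predict_running_time(nominal: float, load_forecast: Sequence[float]) -> float:
    """Estimate the wall-clock running time of a compute-bound task.

    Assumed processor-sharing model: with predicted load L during a
    one-second step, the task receives a 1 / (1 + L) share of the CPU.
    load_forecast[k] is the predicted load during second k after the start.
    """
    remaining = nominal  # CPU seconds still needed
    elapsed = 0.0        # wall-clock seconds used so far
    for load in load_forecast:
        share = 1.0 / (1.0 + max(load, 0.0))
        if remaining <= share:
            # The task finishes partway through this second.
            return elapsed + remaining / share
        remaining -= share
        elapsed += 1.0
    # Past the forecast horizon: assume the last predicted load persists.
    last = max(load_forecast[-1], 0.0) if load_forecast else 0.0
    return elapsed + remaining * (1.0 + last)
```

For example, a 2-second task on a host predicted to stay at load 1.0 comes out at 4 seconds.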

  10. Real-time Scheduling Advisor • Application notifies advisor of task’s computational requirements (nominal time) and its deadline • Advisor acquires predicted task running times for all hosts • Advisor recommends one of the hosts where the deadline can be met [Figure: a task with its nominal time and deadline, compared against the predicted running time on each host]

  11. Variability and Prediction [Figure: a resource signal with high availability variability over time t is fed to a predictor; the resulting prediction error has low variability, and that variability is characterized by its autocorrelation function (ACF)] • Exchange high resource availability variability for low prediction error variability and a characterization of that variability

  12. Confidence Intervals to Characterize Variability • Example: “3 to 5 seconds with 95% confidence” • Application specifies confidence level (e.g., 95%) • Running time advisor predicts running times as a confidence interval (CI) • Real-time scheduling advisor chooses a host whose CI lies entirely below the deadline • The CI captures variability to the extent the application is interested in it [Figure: a task's nominal time and deadline compared against 95% running-time CIs for each host]
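
One way to realize this advice is sketched below, assuming the running-time prediction error is normally distributed, so that a 95% CI is the predicted mean plus or minus about 1.96 standard deviations. `confidence_interval` and `recommend_host` are hypothetical names, and picking the feasible host with the smallest upper bound is an assumed policy; the slide only says the advisor picks a host whose CI is below the deadline.

```python
from statistics import NormalDist
from typing import Dict, Optional, Tuple


def confidence_interval(mean: float, std: float, level: float = 0.95) -> Tuple[float, float]:
    """Two-sided CI for a running time, assuming normally distributed error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return max(0.0, mean - z * std), mean + z * std


def recommend_host(predictions: Dict[str, Tuple[float, float]],
                   deadline: float, level: float = 0.95) -> Optional[str]:
    """Return a host whose whole CI lies below the deadline, or None.

    predictions maps a host name to (predicted mean, predicted std) of the
    task's running time. Among feasible hosts, this sketch picks the one with
    the smallest CI upper bound (an assumed tie-breaking policy).
    """
    feasible = []
    for host, (mean, std) in predictions.items():
        _, upper = confidence_interval(mean, std, level)
        if upper < deadline:
            feasible.append((upper, host))
    if not feasible:
        return None  # no host is likely to meet the deadline
    return min(feasible)[1]
```

Returning `None` lets the application learn that the deadline is unlikely to be met before it runs the task.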

  13. Confidence Intervals and Predictor Quality [Figure: predicted running-time CIs for each host against the deadline. With a bad predictor the CIs are wide and there is no obvious choice; with a good predictor the CIs are narrow and there are two good choices] • Good predictors provide smaller CIs • Smaller CIs simplify scheduling decisions

  14. Overview of Research Results • Predicting CIs is feasible • Host load prediction using AR(16) models • Running time estimation using host load predictions • Predicting CIs is practical • RPS Toolkit (incorporated in CMU Remos, BBN QuO) • Extremely low-overhead online system • Predicting CIs is useful • Performance of real-time scheduling advisor • Methodology: measured performance of a real system; statistically rigorous analysis and evaluation

  15. Experimental Setup • Environment • AlphaStation 255s, Digital Unix 4.0 • Workload: host load trace playback • Prediction system on each host • Tasks • Nominal time ~ U(0.1,10) seconds • Interarrival time ~ U(5,15) seconds • Methodology • Predict CIs / host recommendations • Run task and measure
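
The task workload in this setup can be reproduced in a few lines. A sketch, assuming independent uniform draws and a clock that starts at zero; `generate_tasks` is a hypothetical helper, not part of the RPS Toolkit.

```python
import random
from typing import List, Optional, Tuple


def generate_tasks(n: int, seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Generate (arrival time, nominal time) pairs for the experiment:
    nominal time ~ U(0.1, 10) s and interarrival time ~ U(5, 15) s.
    Starting the clock at 0 is an assumption of this sketch."""
    rng = random.Random(seed)
    clock = 0.0
    tasks = []
    for _ in range(n):
        clock += rng.uniform(5.0, 15.0)   # interarrival gap
        nominal = rng.uniform(0.1, 10.0)  # CPU seconds on an unloaded host
        tasks.append((clock, nominal))
    return tasks
```

Seeding the generator makes a run repeatable, which helps when comparing host-selection strategies on the same task sequence.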

  16. Predicting CIs is Feasible • Near-perfect CIs on typical hosts (3000 randomized tasks)

  17. Predicting CIs is Practical: RPS System • <2% of CPU at appropriate rate • 1-2 ms latency from measurement to prediction • 2 KB/sec transfer rate

  18. Predicting CIs is Useful: Real-time Scheduling Advisor [Figure: comparison of three host-selection strategies (predicted CI < deadline, host with lowest load, random host) over 16000 tasks]

  19. Predicting CIs is Useful: Real-time Scheduling Advisor [Figure: comparison of three host-selection strategies (predicted CI < deadline, host with lowest load, random host) over 16000 tasks]

  20. Outline • Bird’s eye view • Adapting to highly variable resource availability • Dv/QuakeViz • Real-time scheduling advisor • Running time advisor • Confidence intervals • Performance results (feasible, practical, useful) • Prototype system • Host load prediction • Traces, structure, linear models, evaluation • RPS Toolkit • Conclusion

  21. Design Space • Can the gap between the resources and the application be spanned? Yes!

  22. Resource Signals • Characteristics • Easily measured, time-varying scalar quantities • Strongly correlated with resource availability • Periodically sampled (discrete-time signal) • Examples • Host load (Digital Unix 5 second load average) • Network flow bandwidth and latency Leverage existing statistical signal analysis and prediction techniques

  23. RPS Toolkit • Extensible toolkit for implementing resource signal prediction systems • Easy “buy-in” for users • C++ and sockets (no threads) • Prebuilt prediction components • Libraries (sensors, time series, communication) • Users have bought in • Incorporated in CMU Remos, BBN QuO • Research users: Bruce Lowekamp, Nancy Miller, LeMonte Green http://www.cs.cmu.edu/~pdinda/RPS.html

  24. Prototype System • RPS components can be composed in other ways

  25. Research Results • Host load on real hosts has exploitable structure: strong autocorrelation, self-similarity, epochal behavior; trace database and host load trace playback • Host load is predictable using simple linear models: recommendation is AR(16) models or better for 1-30 sec predictions • RPS Toolkit for low-overhead systems (<2% of CPU): C++, ported to 5 OSes, incorporated in CMU Remos, BBN QuO • Running time CIs can be computed from load predictions: load discounting, error covariances • Effective real-time scheduling advice can be based on CIs: know whether the deadline will be met before running the task
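
Computing running-time CIs from load predictions with error covariances can be illustrated with a simplified model. This sketch assumes running time ≈ nominal time × (1 + average predicted load), normal errors, and a caller-supplied covariance matrix of the 1..m-step-ahead prediction errors; it omits load discounting, and `running_time_ci` is a hypothetical name. Its main point is that the variance of an average of correlated predictions is the sum of all covariance entries divided by m², not just the diagonal terms.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def running_time_ci(nominal: float, load_preds: Sequence[float],
                    err_cov: Sequence[Sequence[float]],
                    level: float = 0.95) -> Tuple[float, float]:
    """CI for running time from host load predictions (simplified model).

    load_preds: 1..m-step-ahead load predictions covering the seconds the
    task is expected to span. err_cov: m x m covariance matrix of their
    prediction errors. Errors are assumed normal; no load discounting.
    """
    m = len(load_preds)
    avg_load = sum(load_preds) / m
    # Var(mean of m correlated errors) = (1/m^2) * sum of all covariance entries.
    var_avg = sum(sum(row) for row in err_cov) / (m * m)
    expected = nominal * (1.0 + avg_load)
    std = nominal * math.sqrt(max(var_avg, 0.0))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return max(0.0, expected - z * std), expected + z * std
```

With positively correlated errors the interval is wider than with independent ones, which is why the full error covariance matters.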

  26. Outline • Bird’s eye view • Adapting to highly variable resource availability • Dv/QuakeViz • Real-time scheduling advisor • Running time advisor • Confidence intervals • Performance results (feasible, practical, useful) • Prototype system • Host load prediction • Traces, structure, linear models, evaluation • RPS Toolkit • Conclusion

  27. Questions • What are the properties of host load? • Is host load predictable? • What predictive models are appropriate? • Are host load predictions useful?

  28. Overview of Answers • Host load exhibits complex behavior • Strong autocorrelation, self-similarity, epochal behavior • Host load is predictable • 1 to 30 second timeframe • Simple linear models are sufficient • Recommend AR(16) or better • Predictions are useful • Can compute effective CIs from them

  29. Host Load Traces • DEC Unix 5 second exponential average • Full bandwidth captured (1 Hz sample rate) • Long durations
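
Host load here is the kernel's exponentially smoothed count of runnable processes. A minimal sketch of that smoothing, assuming one update per second with a 5-second time constant (the real kernel's update schedule may differ); `exponential_load_average` is a hypothetical helper. The smoothing acts as a low-pass filter, which is presumably why a 1 Hz sample rate captures the signal's full bandwidth.

```python
import math
from typing import Iterable, List


def exponential_load_average(run_queue: Iterable[float], tau: float = 5.0,
                             dt: float = 1.0, initial: float = 0.0) -> List[float]:
    """Unix-style exponentially smoothed load average (assumed update rule).

    L_k = a * L_{k-1} + (1 - a) * n_k, with a = exp(-dt / tau), where n_k is
    the instantaneous number of runnable processes at update k.
    """
    a = math.exp(-dt / tau)
    load = initial
    out = []
    for n in run_queue:
        load = a * load + (1.0 - a) * n
        out.append(load)
    return out
```

A single CPU-bound process started on an idle host makes the average climb gradually toward 1.0 rather than jumping there.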

  30. If Host Load Were “Random” (White Noise)... [Figure panels: time domain, autocorrelation, frequency domain, spectrogram]

  31. Host Load Has Exploitable Structure [Figure panels: time domain, autocorrelation, frequency domain, spectrogram]
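
The autocorrelation panels contrast white noise, whose autocorrelation is near zero at every nonzero lag, with host load, which stays strongly correlated across lags. A sketch of the sample autocorrelation function, using a synthetic AR(1) series as a stand-in for a real load trace (the traces themselves are not reproduced here); the function names are hypothetical.

```python
import random
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k / r_0 for lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    r0 = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / r0
            for k in range(max_lag + 1)]


def ar1_series(n: int, phi: float, seed: int) -> List[float]:
    """Synthetic AR(1) signal, a stand-in for a correlated load trace."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    correlated = ar1_series(5000, 0.9, seed=1)
    print("white noise ACF:", [round(v, 2) for v in autocorrelation(white, 3)])
    print("AR(1) ACF:      ", [round(v, 2) for v in autocorrelation(correlated, 3)])
```

Strong autocorrelation at small lags is exactly the structure a linear predictor can exploit.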

  32. Linear Time Series Models • Pole-zero / state-space models capture autocorrelation parsimoniously [Figure: 2000-sample fits, largest models in study, 30-second-ahead predictions]
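
An AR(p) model predicts the next value as a weighted sum of the previous p values, and multi-step predictions feed earlier predictions back in as inputs. A sketch of fitting through the Yule-Walker equations with the Levinson-Durbin recursion; this is a standard fitting method, assumed here rather than confirmed as the one RPS uses, and `fit_ar`/`predict_ar` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def fit_ar(x: Sequence[float], p: int) -> Tuple[float, List[float], float]:
    """Fit AR(p) via Yule-Walker, solved with the Levinson-Durbin recursion.

    Model: (z_t - mean) = sum_{j=1..p} a_j (z_{t-j} - mean) + e_t.
    Returns (mean, [a_1..a_p], innovation variance estimate).
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    r = [sum(d[i] * d[i + k] for i in range(n - k)) / n for k in range(p + 1)]
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    if err == 0.0:       # constant signal: nothing to model
        return mean, a[1:], 0.0
    for m in range(1, p + 1):
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        new = a[:]
        new[m] = k
        for j in range(1, m):
            new[j] = a[j] - k * a[m - j]
        a = new
        err *= 1.0 - k * k
    return mean, a[1:], err


def predict_ar(mean: float, coeffs: Sequence[float],
               history: Sequence[float], steps: int) -> List[float]:
    """1..steps-ahead predictions; history must hold at least p values."""
    d = [v - mean for v in history[-len(coeffs):]] if coeffs else []
    out = []
    for _ in range(steps):
        nxt = sum(c * d[-1 - j] for j, c in enumerate(coeffs))
        out.append(nxt + mean)
        d.append(nxt)  # feed the prediction back in as the newest value
    return out


if __name__ == "__main__":
    rng = random.Random(2)
    x, series = 0.0, []
    for _ in range(2000):
        x = 0.8 * x + rng.gauss(0.0, 1.0)
        series.append(x)
    mean, coeffs, var = fit_ar(series, 16)
    print("a_1..a_3:", [round(c, 2) for c in coeffs[:3]])
    print("30-step-ahead prediction:", round(predict_ar(mean, coeffs, series, 30)[-1], 3))
```

On a 1 Hz load trace, `fit_ar(trace, 16)` followed by `predict_ar(mean, coeffs, trace, 30)` would give 1 to 30 second-ahead predictions, matching the AR(16) recommendation.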

  33. Evaluation Methodology • Ran ~190,000 randomly chosen testcases on the traces • Evaluate models independently of prediction/evaluation framework • No monitoring • ~30 testcases per trace, model class, and parameter set • Data-mine results • Offline and online systems implemented using RPS Toolkit

  34. Testcases • Models • MEAN, LAST/BM(32) • Randomly chosen model from: AR(1..32), MA(1..8), ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8), ARFIMA(1..8,d,1..8)
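
MEAN and LAST/BM(32) are the simple baselines the linear models are compared against. A sketch of one reading of them: MEAN predicts the long-term mean, LAST predicts the most recent value, and BM(p) predicts the mean of the last m values, with m ≤ p chosen on the fit interval. The BM selection rule and all function names here are assumptions.

```python
from typing import Sequence


def fit_mean(fit: Sequence[float]) -> float:
    """MEAN: always predict the long-term mean of the fit interval."""
    return sum(fit) / len(fit)


def predict_last(history: Sequence[float]) -> float:
    """LAST: predict that the next value equals the most recent one."""
    return history[-1]


def fit_bm(fit: Sequence[float], p: int) -> int:
    """BM(p): choose the window m <= p whose windowed mean gave the lowest
    one-step-ahead squared error over the fit interval (assumed rule).
    Requires len(fit) > p."""
    def mse(m: int) -> float:
        errs = [(sum(fit[t - m:t]) / m - fit[t]) ** 2 for t in range(p, len(fit))]
        return sum(errs) / len(errs)
    return min(range(1, p + 1), key=mse)  # first (smallest) m wins ties


def predict_bm(history: Sequence[float], m: int) -> float:
    """Predict the mean of the last m measurements."""
    return sum(history[-m:]) / m
```

Under this reading, LAST is the special case BM with m = 1, which is why the two are grouped.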

  35. Evaluating a Testcase • Modeler: takes the measurements in the fit interval, $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$, and a model type, and produces a model (one-time use) • Load predictor: runs the model over the measurements in the test interval, $z_t, z_{t+1}, \ldots, z_{t+n-1}$, emitting a prediction stream of 1- to $w$-step-ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$ plus error estimates (a characterization of variation) • Evaluator: compares the prediction stream against the measurements (a measurement of variation) and produces error metrics
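
The testcase pipeline (fit once, then stream predictions over the test interval) can be sketched as follows. `prediction_stream`, `fit_last`, and the `Predictor` type are hypothetical; any model type can be plugged in as a function that fits on the fit interval and returns a predictor.

```python
from typing import Callable, List, Sequence

# A fitted predictor maps (measurements so far, horizon w) to w predictions.
Predictor = Callable[[Sequence[float], int], List[float]]


def fit_last(fit: Sequence[float]) -> Predictor:
    """Example model type: LAST (fitting is trivial)."""
    return lambda history, w: [history[-1]] * w


def prediction_stream(trace: Sequence[float], fit_len: int, test_len: int,
                      w: int, fit_model: Callable[[Sequence[float]], Predictor]
                      ) -> List[List[float]]:
    """Fit once on trace[:fit_len], then step through the test interval.

    Row i holds z'_{t+i,t+i+1} .. z'_{t+i,t+i+w}: the 1..w-step-ahead
    predictions made right after measurement z_{t+i} arrives, where
    t = fit_len is the 0-based index of the first test measurement.
    """
    predictor = fit_model(trace[:fit_len])  # one-time model fit
    stream = []
    for i in range(test_len):
        seen = trace[: fit_len + i + 1]     # measurements up to z_{t+i}
        stream.append(predictor(seen, w))
    return stream
```

The model is fitted only once, so the test interval measures how well a fixed model tracks data it has not seen.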

  36. Measured Prediction Variance: Mean Squared Error • From the measurements $z_t, z_{t+1}, \ldots$, the load predictor produces 1-, 2-, ..., $w$-step-ahead predictions $z'_{t+i,t+i+k}$ • Variance of $z$: $\sigma_z^2 = \frac{1}{n}\sum_{i=0}^{n-1} (\mu - z_{t+i})^2$, where $\mu$ is the mean of $z$ • $k$-step-ahead mean squared error ($k = 1, \ldots, w$): $\sigma_{a_k}^2 = \frac{1}{n}\sum_{i=0}^{n-1} (z'_{t+i,t+i+k} - z_{t+i+k})^2$ • Good load predictor: $\sigma_{a_1}^2, \sigma_{a_2}^2, \ldots, \sigma_{a_w}^2 \ll \sigma_z^2$
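
The two quantities on this slide are straightforward to compute from a prediction stream. A sketch with hypothetical function names, assuming `stream[i][k-1]` holds the k-step-ahead prediction made at index t+i.

```python
from typing import List, Sequence


def signal_variance(z: Sequence[float]) -> float:
    """sigma_z^2: mean squared deviation of the measurements from their mean."""
    mu = sum(z) / len(z)
    return sum((mu - v) ** 2 for v in z) / len(z)


def step_ahead_mse(trace: Sequence[float], t: int,
                   stream: Sequence[Sequence[float]]) -> List[float]:
    """sigma_{a_k}^2 for k = 1..w.

    stream[i][k-1] is z'_{t+i,t+i+k}; it is compared with trace[t+i+k].
    Predictions whose target lies past the end of the trace are skipped.
    """
    w = len(stream[0])
    result = []
    for k in range(1, w + 1):
        errs = [(stream[i][k - 1] - trace[t + i + k]) ** 2
                for i in range(len(stream)) if t + i + k < len(trace)]
        result.append(sum(errs) / len(errs) if errs else float("nan"))
    return result
```

A predictor is worth using when every entry of `step_ahead_mse(...)` is well below `signal_variance` of the test-interval measurements.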

  37. Unpaired Box Plot Comparisons [Figure: box plots of mean squared error for Models A, B, and C, showing the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean; annotated as inconsistent low error, consistent high error, and consistent low error] • Good models achieve consistently low error
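
Each box summarizes one model's mean squared errors across testcases. A sketch of those summary statistics, assuming linear interpolation between order statistics (one of several common percentile conventions; the study's exact convention is not stated); the names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 1], on pre-sorted data."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_summary(errors: Sequence[float]) -> Dict[str, float]:
    """Statistics drawn in each box: 2.5/25/50/75/97.5% points and the mean.

    errors holds one mean squared error per testcase for a single model.
    """
    s = sorted(errors)
    summary = {f"{100 * q:g}%": percentile(s, q)
               for q in (0.025, 0.25, 0.5, 0.75, 0.975)}
    summary["mean"] = sum(s) / len(s)
    return summary
```

A model with "consistently low error" has both a low mean and a narrow 2.5% to 97.5% span.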

  38. 1-Second Predictions, All Hosts [Figure: box plots of prediction error per model, showing the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean] • Predictive models clearly worthwhile

  39. 30-Second Predictions, All Hosts [Figure: box plots of prediction error per model, showing the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean] • Predictive models clearly beneficial even at long prediction horizons

  40. 30-Second Predictions, High Load, Dynamic Host [Figure: box plots of prediction error per model, showing the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean] • Predictive models clearly worthwhile • Begin to see differentiation between models

  41. Outline • Bird’s eye view • Adapting to highly variable resource availability • Dv/QuakeViz • Real-time scheduling advisor • Running time advisor • Confidence intervals • Performance results (feasible, practical, useful) • Prototype system • Host load prediction • Traces, structure, linear models, evaluation • RPS Toolkit • Conclusion

  42. Related Work • Distributed interactive applications • QuakeViz/ Dv, Aeschlimann [PDPTA’99] • Quality of service • QuO, Zinky, Bakken, Schantz [TPOS, April 97] • QRAM, Rajkumar, et al [RTSS’97] • Distributed soft real-time systems • Lawrence, Jensen [assorted] • Workload studies for load balancing • Mutka, et al [PerfEval ‘91] • Harchol-Balter, et al [SIGMETRICS ‘96] • Resource signal measurement systems • Remos [HPDC’98] • Network Weather Service [HPDC‘97, HPDC’99] • Host load prediction • Wolski, et al [HPDC’99] (NWS) • Samadani, et al [PODC’95] • Hailperin [‘93] • Application-level scheduling • Berman, et al [HPDC’96] • Stochastic Scheduling, Schopf [Supercomputing ‘99]

  43. Conclusions • Help applications adapt to highly variable resource availability • Resource signal prediction • Predict running times as confidence intervals • Predicting CIs is feasible • Host load prediction using AR(16) models • Running time estimation using host load predictions • Predicting CIs is practical • RPS Toolkit (incorporated in CMU Remos, BBN QuO) • Extremely low-overhead online system • Predicting CIs is useful • Performance of real-time scheduling advisor

  44. Future Work • New resource signals • Network bandwidth and latency (Remos) • New prediction approaches • Wavelets, nonlinearity, cointegration • Resource scheduler models • Better Unix scheduler model • Network models • Adaptation advisors • Applications and workloads • DV/QuakeViz, GIMP, Instrumentation

  45. Tools/Venues for Future Work • Resource signal methodology • RPS Toolkit • Remos • QuakeViz/DV • Grid Forum

  46. Future Work (Long Term) • Experimental computer science research • Application-oriented view • Measurement studies and analysis • Statistical approach • Application services • Systems building: systems × applications × statistics

  47. Teaching • “Signals, systems, and statistics for computer scientists” • “Performance data analysis” • “Introduction to computer systems”

  48. Response of Typical AR(16)

  49. Response of AR(1024)
