Understanding and Predicting Host Load
Peter A. Dinda
Carnegie Mellon University
http://www.cs.cmu.edu/~pdinda
Talk in a Nutshell
Statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction
• Load is self-similar
• Load exhibits epochal behavior
• Load prediction benefits from capturing self-similarity
Why Study Load?
Load partially determines execution time
We want to model and predict load
[Figure: an interactive application submits short tasks with deadlines to an unmodified distributed system; the unknown ("?") is the execution time interval [tmin, tmax]]
Outline
• Measurement methodology
• Load traces
• Load variance
• New Results
  • Self-similarity
  • Epochal behavior
  • Benefits of capturing self-similarity in linear models
• Conclusions
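Measurement Methodology
[Figure: inside the Digital Unix kernel, the ready queue length is sampled every T = 2 seconds (len_t, len_{t-T}, len_{t-2T}, ..., len_{t-29T}, len_{t-30T}) and fed to an exponential average, the 1-minute load "average". A user-level measurement tool reads that average at a 1 Hz sample rate (avg_t, avg_{t-0.5T}, avg_{t-T}, ...); these readings are our measurements.]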
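The measurement pipeline on this slide can be sketched in a few lines. This is only an illustration: the decay factor exp(-T/60) is the classic Unix-style load-average update and is an assumption here, since the slide does not give the exact Digital Unix formula. The function name and the ready-queue trace are hypothetical.

```python
import math

def exponential_load_average(queue_lengths, sample_period_s=2.0, window_s=60.0):
    """Exponentially average ready-queue samples.

    Assumption: the Unix-style update avg = avg*decay + len*(1 - decay),
    with decay = exp(-T / window). The kernel's exact formula may differ.
    """
    decay = math.exp(-sample_period_s / window_s)  # weight kept from the old average
    avg = 0.0
    averages = []
    for length in queue_lengths:
        avg = avg * decay + length * (1.0 - decay)
        averages.append(avg)
    return averages

# Hypothetical ready-queue trace sampled every T = 2 s:
# idle for 10 s, then one runnable process for 2 minutes.
trace = [0] * 5 + [1] * 60
averages = exponential_load_average(trace)
print(f"load average after 2 minutes of one runnable process: {averages[-1]:.3f}")
```

The smoothing means the reported load lags the true queue length, which is one reason to sample the average faster (1 Hz) than the kernel updates it.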
[Figure: three plots of a load trace: load vs. time, autocorrelation vs. lag, and periodogram vs. frequency]
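The autocorrelation and periodogram views of a trace can be computed as in the sketch below. It uses a synthetic AR(1) sequence as a stand-in for a real 1 Hz load trace (an assumption; the talk's data is not reproduced here), and a direct DFT that is only practical for short traces. Function names are illustrative.

```python
import cmath
import random

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    denom = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

def periodogram(x, sample_rate_hz=1.0):
    """Raw periodogram via a direct DFT (O(n^2), fine for short traces)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        freqs.append(k * sample_rate_hz / n)
        power.append(abs(coeff) ** 2 / (n * sample_rate_hz))
    return freqs, power

# Synthetic stand-in for a load trace: a strongly correlated AR(1) sequence.
random.seed(0)
load = [0.0]
for _ in range(511):
    load.append(0.95 * load[-1] + random.gauss(0.0, 1.0))

acf = autocorrelation(load, 20)
freqs, power = periodogram(load)
```

A slowly decaying autocorrelation and a periodogram dominated by low frequencies are the visual signatures the following slides interpret.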
Why is Self-Similarity Important?
• Complex structure
  • Not completely random, nor independent
  • Short-range dependence
    • Excellent for history-based prediction
  • Long-range dependence
    • Possibly a problem
• Modeling Implications
  • Suggests models that can capture this structure
    • ARFIMA, FGN, TAR
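Self-similarity is usually summarized by the Hurst parameter H, where 0.5 < H < 1 indicates long-range dependence. The sketch below shows one common estimator, the aggregated-variance (variance-time) method; it is not necessarily the estimator used in the talk, and the test series are synthetic.

```python
import math
import random

def hurst_aggregated_variance(x, block_sizes):
    """Estimate H from how the variance of block means decays with block size.

    For a self-similar series, Var(block mean) ~ m^(2H - 2), so the slope of
    log(variance) against log(m) is 2H - 2.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        grand = sum(means) / n_blocks
        var = sum((v - grand) ** 2 for v in means) / (n_blocks - 1)
        log_m.append(math.log(m))
        log_var.append(math.log(var))
    # Least-squares slope of log(var) versus log(m).
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_m, log_var))
    slope /= sum((a - mean_x) ** 2 for a in log_m)
    return 1.0 + slope / 2.0

random.seed(1)
white_noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
h_white = hurst_aggregated_variance(white_noise, [1, 2, 4, 8, 16, 32, 64])
print(f"estimated H for independent noise: {h_white:.2f}")
```

Independent noise should give an estimate near 0.5; a trace whose estimate sits well above 0.5 is a candidate for the long-memory models named on the slide.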
Why is Epochal Behavior Important?
• Complex structure
  • Non-stationary
• Modeling Implications
  • Suggests models
    • ARIMA, ARFIMA, etc.
    • Non-parametric spectral methods
  • Suggests problem decomposition
Linear Time Series Models
[Figure: an unpredictable random sequence passes through a fixed linear filter to produce the partially predictable load sequence]
Choose weights ψ_j to minimize σ_a²
σ_a determines the confidence interval for t+1 predictions
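To make "choose weights to minimize σ_a²" concrete, here is a minimal sketch that fits a pure AR(p) model by solving the Yule-Walker equations with the Levinson-Durbin recursion. This is a simpler relative of the models in the talk and not necessarily the fitting procedure the author used; the data is a synthetic AR(1) sequence and all names are illustrative.

```python
import math
import random

def autocovariance(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]

def fit_ar(x, order):
    """Yule-Walker AR fit via Levinson-Durbin.

    Returns (phi_1..phi_p, sigma_a^2), where sigma_a^2 is the variance of the
    one-step-ahead prediction error.
    """
    r = autocovariance(x, order)
    phi = []
    err = r[0]  # prediction error variance with no model is the signal variance
    for k in range(1, order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / err
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)] + [reflection]
        err *= 1.0 - reflection ** 2
    return phi, err

# Synthetic sequence: AR(1) with coefficient 0.9 and unit-variance innovations.
random.seed(2)
load = [0.0]
for _ in range(4999):
    load.append(0.9 * load[-1] + random.gauss(0.0, 1.0))

phi, var_a = fit_ar(load, 4)
sigma_z = math.sqrt(autocovariance(load, 0)[0])  # spread of the raw signal
sigma_a = math.sqrt(var_a)                       # spread of the prediction error
print(f"sigma_z = {sigma_z:.2f}, sigma_a = {sigma_a:.2f}")
```

The gap between σ_z and σ_a is the payoff of the model: the smaller σ_a is relative to σ_z, the tighter the prediction interval.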
Realizable Pole-Zero Models
• ARFIMA(p,d,q): self-similarity, d related to the Hurst parameter
• ARIMA(p,d,q): non-stationarity, d integer
• ARMA(p,q)
• AR(p)
• MA(q)
p, q are numbers of parameters
d is degree of differencing
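The difference between ARIMA and ARFIMA is whether d is an integer or a fraction. The sketch below expands the differencing operator (1 - B)^d into filter weights for either case, and uses the standard ARFIMA relation d = H - 1/2 to connect d to the Hurst parameter. Function names are illustrative, and truncating the filter at a fixed number of weights is a simplification.

```python
def hurst_to_d(hurst):
    """Fractional differencing parameter for a Hurst parameter H (d = H - 1/2)."""
    return hurst - 0.5

def frac_diff_weights(d, count):
    """First `count` coefficients of the binomial expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, count):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def frac_diff(x, d, count=100):
    """Apply the truncated (1 - B)^d filter to the sequence x."""
    weights = frac_diff_weights(d, count)
    return [
        sum(weights[k] * x[t - k] for k in range(min(count, t + 1)))
        for t in range(len(x))
    ]

# Integer d = 1 reduces to ordinary first differencing (the ARIMA case).
print(frac_diff_weights(1, 4))
# Fractional d gives slowly decaying weights (the ARFIMA case).
print(frac_diff_weights(hurst_to_d(0.9), 4))
```

With integer d the weights vanish after d + 1 terms; with fractional d they decay slowly and never vanish, which is how ARFIMA represents long-range dependence.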
Real World Benefits of Models
σ_a determines the confidence interval for t+1 predictions
Map work that would take 100 ms at zero load

axp0: σ_z = 0.54, μ = 1.0, σ_a(ARMA(4,4)) = 0.109, σ_a(ARFIMA(4,d,4)) = 0.108
• no model: 1.0 +/- 1.06 (95%) => 100 to 306 ms
• ARMA: 1.0 +/- 0.22 (95%) => 178 to 222 ms
• ARFIMA: 1.0 +/- 0.21 (95%) => 179 to 221 ms
• ARFIMA reduces σ_a by about 1% relative to ARMA

axp7: σ_z = 0.14, μ = 0.12, σ_a(ARMA(4,4)) = 0.041, σ_a(ARFIMA(4,d,4)) = 0.025
• no model: 0.12 +/- 0.27 (95%) => 100 to 139 ms
• ARMA: 0.12 +/- 0.08 (95%) => 104 to 120 ms
• ARFIMA: 0.12 +/- 0.05 (95%) => 107 to 117 ms
• ARFIMA reduces σ_a by about 40% relative to ARMA
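The running-time intervals on this slide follow from a simple mapping, sketched below. Two assumptions are inferred from the slide's numbers rather than stated on it: running time scales as base time × (1 + load), and the lower end of the load interval is clamped at zero load. The half-widths are taken directly from the slide (they are roughly 1.96 × σ, up to rounding). The function name is illustrative.

```python
def running_time_interval(base_ms, mean_load, half_width):
    """Map a load confidence interval to a running-time interval.

    Assumptions (inferred from the slide's numbers): running time is
    base_ms * (1 + load), and load cannot drop below zero.
    """
    low_load = max(0.0, mean_load - half_width)
    high_load = mean_load + half_width
    return base_ms * (1.0 + low_load), base_ms * (1.0 + high_load)

# (host, predictor, mean load, 95% half-width), all as given on the slide.
cases = [
    ("axp0", "no model", 1.0, 1.06),
    ("axp0", "ARMA", 1.0, 0.22),
    ("axp0", "ARFIMA", 1.0, 0.21),
    ("axp7", "no model", 0.12, 0.27),
    ("axp7", "ARMA", 0.12, 0.08),
    ("axp7", "ARFIMA", 0.12, 0.05),
]
for host, predictor, mean_load, half_width in cases:
    low, high = running_time_interval(100.0, mean_load, half_width)
    print(f"{host} {predictor}: {low:.0f} to {high:.0f} ms")
```

Under these assumptions the sketch reproduces the intervals listed on the slide, which makes the practical point visible: a tighter load prediction translates directly into a tighter bound on task running time.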
Conclusions
• Load has high variance
• Load is self-similar
• Load exhibits epochal behavior
• Capturing self-similarity in linear time series models improves predictability
Load Traces
• Would a web-accessible load trace database be useful?
• Would you like to contribute?