On applying pattern recognition to systems management Moises Goldszmidt
High complexity of current/future systems
• Expensive to come up with a closed-form characterization of
  • Behavior
  • Interrelationships between components
• Dynamic nature of
  • Workload/inputs
  • Infrastructure (software/hardware)
• Opacity
  • Layers of abstraction (virtualization)
  • OEMs
A proposal…
• Problem: expensive closed-form characterization, dynamics, opacity
• Answer: cheap automatic characterization, adaptation, induction of mappings and estimation of state
• Pipeline: Raw data → Features → P(rt|x) → Decisions
  (Observe system → Induce models → Perform inferences)
Issues…
• (Automatic) evaluation of models
  • Accuracy
    • Percentage of patterns captured
    • False positives vs. false negatives
  • Decision-making power
    • Uncertainty and confidence
    • Calibration
  • Amount of data
• Decisions about model parameterization: tradeoffs between complexity and computation, overfitting and generalization
• Uncertainty!
Hope…
• Advances in data mining, machine learning, computational statistics…
  • Representation
  • Computation
• Computational power
  • Search
  • Matrix inversion
  • Numerical techniques
Inducing models of black-box storage arrays
Ira Cohen, Kim Keeton, Terence Kelly
• Problem:
  • Given a trace of I/O response times of an XP512
  • and a specification for "fast" and "slow",
  • forecast the response time (fast or slow) of any individual I/O request
• Obstacle: the array is a black box
• Applications:
  • Scheduling – serving compound web pages
  • Performance monitoring and anomaly detection
Methodology
• Collect data (training set)
• Induce a probabilistic model
  • Priors
  • Mixture of regressions (MOR)
  • Naïve Bayes classifier (NBC)
• Provide a decision procedure
• Evaluate the models on "unseen" data
Priors-based model
• Model = P(rt)
• Decision procedure: given a threshold for fast,
  • if P(rt < fast) > 50%, announce fast; otherwise announce slow
• Note: this forecast is constant, independent of all other characteristics of the input
• Complexity of algorithms and computation: trivial
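The priors baseline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, data, and 10 ms threshold are hypothetical, and the point is simply that the forecast is a constant derived from the marginal P(rt).

```python
# Minimal sketch of the priors-only baseline (hypothetical names and data).
# The model is just the marginal P(rt): it issues the same forecast for every
# I/O request, ignoring all features of the individual request.

def priors_forecast(training_rts, fast_threshold):
    """Return a constant 'fast'/'slow' forecast from the training marginal."""
    p_fast = sum(rt < fast_threshold for rt in training_rts) / len(training_rts)
    return "fast" if p_fast > 0.5 else "slow"

# Example: response times in milliseconds, with 10 ms as the "fast" cutoff.
rts = [2.0, 4.5, 8.0, 12.0, 30.0, 3.1, 6.2, 9.9]
print(priors_forecast(rts, fast_threshold=10.0))  # 6 of 8 below 10 ms -> "fast"
```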
MOR model
Cache simulators + other features → cache state → response time; several linear relationships, depending on the cache state.
• Model: P(rt|cs,of) = Σc P(rt|of,c) · P(c|cs)
• Decision procedure: given the threshold t for fast,
  • if P(rt < t | cs,of) > 50%, announce fast; otherwise announce slow
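The mixture-of-regressions rule can be sketched under simplifying assumptions: two cache outcomes (hit/miss), each with a linear-Gaussian model of response time given one scalar feature, and a cache-simulator estimate P(hit). All parameter values and names here are illustrative, not from the paper.

```python
import math

# Sketch of the MOR decision rule:
#   P(rt < t | cs, of) = sum_c P(rt < t | of, c) * P(c | cs)
# with c in {hit, miss} and rt | of, c ~ N(a_c + b_c * of, sigma_c^2).
# Parameters are hypothetical placeholders.

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_fast(of, p_hit, params, threshold):
    """Mixture probability that the response time falls below the threshold."""
    p = 0.0
    for c, p_c in (("hit", p_hit), ("miss", 1.0 - p_hit)):
        a, b, sigma = params[c]                      # linear-Gaussian component
        p += gauss_cdf(threshold, a + b * of, sigma) * p_c
    return p

params = {"hit": (1.0, 0.1, 0.5), "miss": (20.0, 0.5, 5.0)}
prob = p_fast(of=8.0, p_hit=0.9, params=params, threshold=10.0)
print("fast" if prob > 0.5 else "slow")  # high P(hit) -> almost surely fast
```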
NBC model
• Induce a model based on the threshold t separating fast and slow
• Model: P(t|cs,of) = Πi P(ofi|t) · Πj P(csj|t) · α  (α a normalization constant)
• Decision procedure: announce fast if P(fast|cs,of) > P(slow|cs,of)
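A toy version of the NBC decision rule, assuming discretized features (the feature values and Laplace smoothing are illustrative choices, not details from the paper): each class score is the prior times the product of per-feature likelihoods, and the larger score wins.

```python
from collections import defaultdict

# Toy Naive Bayes classifier over discrete features (illustrative only):
# score(t) = P(t) * prod_i P(feature_i | t), t in {"fast", "slow"}.
# Add-one (Laplace) smoothing avoids zero counts for unseen values.

def train_nbc(examples):
    """examples: list of (feature_tuple, label). Returns count tables."""
    label_counts = defaultdict(int)
    feat_counts = defaultdict(int)      # (position, value, label) -> count
    for feats, label in examples:
        label_counts[label] += 1
        for i, v in enumerate(feats):
            feat_counts[(i, v, label)] += 1
    return label_counts, feat_counts

def classify(feats, label_counts, feat_counts):
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total                               # class prior P(t)
        for i, v in enumerate(feats):
            score *= (feat_counts[(i, v, label)] + 1) / (n + 2)  # smoothed
        scores[label] = score
    return max(scores, key=scores.get)  # announce the more probable label

data = [(("hit", "small"), "fast"), (("hit", "large"), "fast"),
        (("miss", "large"), "slow"), (("miss", "small"), "slow")]
lc, fc = train_nbc(data)
print(classify(("hit", "small"), lc, fc))  # -> "fast"
```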
Evaluating models
• Classification power: did the model + decision procedure capture the patterns accurately?
  • Accuracy = percentage of correct predictions
  • Appropriate for anomaly detection
• As decision makers: what is the confidence/risk of each decision?
  • Utility based: pay according to the confidence of each decision
  • Brier score = Σx (slowx – P(slow|x))²
  • Appropriate for scheduling decisions
• How much data:
  • When can we trust the model?
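The Brier score above is straightforward to compute; a minimal sketch with made-up forecasts (the outcome vector and probabilities are illustrative):

```python
# Brier score: sum_x (slow_x - P(slow|x))^2, where slow_x is 1 if request x
# was actually slow and P(slow|x) is the forecast probability.
# Lower is better; a perfectly confident, correct forecaster scores 0.

def brier_score(outcomes, forecasts):
    """outcomes: 1 if actually slow, else 0; forecasts: predicted P(slow|x)."""
    return sum((o - p) ** 2 for o, p in zip(outcomes, forecasts))

outcomes  = [1, 0, 1, 0]          # which requests were actually slow
forecasts = [0.9, 0.2, 0.6, 0.1]  # the model's P(slow|x) for each request
print(round(brier_score(outcomes, forecasts), 2))  # 0.01+0.04+0.16+0.01 = 0.22
```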
Classifiers as decision makers
• Brier score Σx (slowx – Pm(slow|x))² = calibration + refinement
• Calibration: if Pm(slow|x) = 10%, then E[P(slow | Pm(slow|x))] = 10%
• Refinement: how close the forecast is to being certain
On being calibrated
• We can use P as a measure of confidence
• Refinement establishes a bound on the Bayes error
• Accuracy may improve:
  • The 50% threshold is optimal for the real P
  • Calibration brings the model's estimates closer to the real P
• Calibration procedure (DeGroot):
  • Map the estimated P to the training-set P
Work with Ira Cohen
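A simple binning calibration map in the spirit of the DeGroot-style procedure mentioned above: group forecasts into bins and replace each forecast with the empirical fraction of slow outcomes in its bin. The bin count, fallback rule, and data are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of a binning calibration map (illustrative, not the exact method):
# replace each model forecast Pm(slow|x) with the empirical P(slow) observed
# among training examples whose forecasts fell in the same bin.

def fit_calibration(forecasts, outcomes, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append(o)
    # empirical P(slow) per bin; fall back to the bin midpoint if empty
    return [sum(b) / len(b) if b else (i + 0.5) / n_bins
            for i, b in enumerate(bins)]

def calibrate(p, table, n_bins=5):
    return table[min(int(p * n_bins), n_bins - 1)]

# An overconfident model: it forecasts ~0.9 but only 2 of 3 were slow.
forecasts = [0.92, 0.95, 0.91, 0.10, 0.05]
outcomes  = [1, 1, 0, 0, 0]
table = fit_calibration(forecasts, outcomes)
print(calibrate(0.93, table))  # mapped to the empirical frequency 2/3
```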
After calibration
[Calibration plot. Number training: 686,091; number test: 343,046. RG3, days 27–30.]
A dialogue…
• Sys: Awesome, let's forecast whether a 3-tier system will meet its SLO. What should I measure?
• Pat: Measure everything! We will then establish a search over the measurements according to one of the different scores.
• Sys: Can you tell me whether the system will meet the SLO?
• Pat: I can tell you the probability that the system will meet the SLO. Uncertainty is a fact of your world; I can provide decision procedures to deal with it.
• Sys: What happens if the workload changes, or the metrics change?
• Pat: Then my model P will change.
• Sys: I can characterize the statistics of the workload, and maybe other things…
• Pat: Great! I can incorporate those characterizations into my models and decision-making procedures.
Summary/discussion
• Presented statistical pattern recognition as a worthwhile approach to decision making in the context of current/future infrastructure
• Presented a specific example and provided perspective on the issues
• EVALUATE YOUR MODELS!!!
• Benefits for systems:
  • Deals with characterization issues, dynamics, opacity
• Benefits for SPR:
  • A new application domain forces new developments (e.g., the calibration results)
• Other applications:
  • SLO characterization and diagnosis in 3-tier systems
  • ROC ???
Quote slide
"Seguro está el cielo que no lo caga zamuro." ("The sky is safe: no vulture craps on it.")
Juan Bimba, Venezuela