On applying pattern recognition to systems management Moises Goldszmidt
High complexity of current/future systems
• Expensive to come up with a closed-form characterization of
  • Behavior
  • Interrelationships between components
• Dynamic nature of
  • Workload/inputs
  • Infrastructure (software/hardware)
• Opacity
  • Layers of abstraction (virtualization)
  • OEMs
A proposal…
• Problem: expensive closed-form characterization, dynamics, opacity
• Answer: cheap automatic characterization, adaptation, induction of mappings and estimation of state
• Pipeline: Raw data → Features → P(rt|x) → Decisions
  (Observe system → Induce models → Perform inferences)
Issues…
• (Automatic) evaluation of models
  • Accuracy
    • Percentage of patterns captured
    • False positives vs. false negatives
  • Decision-making power
    • Uncertainty and confidence
    • Calibration
  • Amount of data
• Decisions about model parameterization: tradeoffs between complexity and computation, overfitting and generalization
• Uncertainty!
Hope…
• Advances in data mining, machine learning, computational statistics…
  • Representation
  • Computation
• Computational power
  • Search
  • Matrix inversion
  • Numerical techniques
Inducing models of black-box storage arrays
Ira Cohen, Kim Keeton, Terence Kelly
• Problem:
  • Given a trace of I/O response times of an XP512
  • and a specification for "fast" and "slow",
  • forecast the response time (fast or slow) of any individual I/O request
• Obstacle: the array is a black box
• Applications:
  • Scheduling – serving compound web pages
  • Performance monitoring and anomaly detection
Methodology
• Collect data (training set)
• Induce a probabilistic model
  • Priors
  • Mixture of regressions (MOR)
  • Naïve Bayes classifier (NBC)
• Provide a decision procedure
• Evaluate the models on "unseen" data
Priors-based model
• Model = P(rt)
• Decision procedure: given a threshold for fast,
  • if P(rt < fast) > 50%, announce fast; otherwise announce slow
• Note: this forecast is constant, independent of all other characteristics of the input
• Complexity of algorithms and computation: trivial
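The priors baseline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, data, and 10 ms threshold are hypothetical, and the point is simply that the forecast is a constant derived from the marginal P(rt).

```python
# Minimal sketch of the priors-only baseline (hypothetical names and data).
# The model is just the marginal P(rt): it issues the same forecast for every
# I/O request, ignoring all features of the individual request.

def priors_forecast(training_rts, fast_threshold):
    """Return a constant 'fast'/'slow' forecast from the training marginal."""
    p_fast = sum(rt < fast_threshold for rt in training_rts) / len(training_rts)
    return "fast" if p_fast > 0.5 else "slow"

# Example: response times in milliseconds, with 10 ms as the "fast" cutoff.
rts = [2.0, 4.5, 8.0, 12.0, 30.0, 3.1, 6.2, 9.9]
print(priors_forecast(rts, fast_threshold=10.0))  # 6 of 8 below 10 ms -> "fast"
```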
MOR model
Cache simulators + other features → cache state → response time; several linear relationships, depending on the cache state.
• Model: P(rt|cs,of) = Σc P(rt|of,c) · P(c|cs)
• Decision procedure: given the threshold t for fast,
  • if P(rt < t | cs,of) > 50%, announce fast; otherwise announce slow
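The mixture-of-regressions rule can be sketched under simplifying assumptions: two cache outcomes (hit/miss), each with a linear-Gaussian model of response time given one scalar feature, and a cache-simulator estimate P(hit). All parameter values and names here are illustrative, not from the paper.

```python
import math

# Sketch of the MOR decision rule:
#   P(rt < t | cs, of) = sum_c P(rt < t | of, c) * P(c | cs)
# with c in {hit, miss} and rt | of, c ~ N(a_c + b_c * of, sigma_c^2).
# Parameters are hypothetical placeholders.

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_fast(of, p_hit, params, threshold):
    """Mixture probability that the response time falls below the threshold."""
    p = 0.0
    for c, p_c in (("hit", p_hit), ("miss", 1.0 - p_hit)):
        a, b, sigma = params[c]                      # linear-Gaussian component
        p += gauss_cdf(threshold, a + b * of, sigma) * p_c
    return p

params = {"hit": (1.0, 0.1, 0.5), "miss": (20.0, 0.5, 5.0)}
prob = p_fast(of=8.0, p_hit=0.9, params=params, threshold=10.0)
print("fast" if prob > 0.5 else "slow")  # high P(hit) -> almost surely fast
```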
NBC model
• Induce a model based on the threshold t separating fast and slow
• Model: P(t|cs,of) = Πi P(ofi|t) · Πj P(csj|t) · α  (α a normalization constant)
• Decision procedure: announce fast if P(fast|cs,of) > P(slow|cs,of)
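A toy version of the NBC decision rule, assuming discretized features (the feature values and Laplace smoothing are illustrative choices, not details from the paper): each class score is the prior times the product of per-feature likelihoods, and the larger score wins.

```python
from collections import defaultdict

# Toy Naive Bayes classifier over discrete features (illustrative only):
# score(t) = P(t) * prod_i P(feature_i | t), t in {"fast", "slow"}.
# Add-one (Laplace) smoothing avoids zero counts for unseen values.

def train_nbc(examples):
    """examples: list of (feature_tuple, label). Returns count tables."""
    label_counts = defaultdict(int)
    feat_counts = defaultdict(int)      # (position, value, label) -> count
    for feats, label in examples:
        label_counts[label] += 1
        for i, v in enumerate(feats):
            feat_counts[(i, v, label)] += 1
    return label_counts, feat_counts

def classify(feats, label_counts, feat_counts):
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total                               # class prior P(t)
        for i, v in enumerate(feats):
            score *= (feat_counts[(i, v, label)] + 1) / (n + 2)  # smoothed
        scores[label] = score
    return max(scores, key=scores.get)  # announce the more probable label

data = [(("hit", "small"), "fast"), (("hit", "large"), "fast"),
        (("miss", "large"), "slow"), (("miss", "small"), "slow")]
lc, fc = train_nbc(data)
print(classify(("hit", "small"), lc, fc))  # -> "fast"
```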
Evaluating models
• Classification power: did the model + decision procedure capture the patterns accurately?
  • Accuracy = percentage of correct predictions
  • Appropriate for anomaly detection
• As decision makers: what is the confidence/risk of each decision?
  • Utility based: pay according to the confidence of each decision
  • Brier score = Σx (slowx – P(slow|x))²
  • Appropriate for scheduling decisions
• How much data:
  • When can we trust the model?
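The Brier score above is straightforward to compute; a minimal sketch with made-up forecasts (the outcome vector and probabilities are illustrative):

```python
# Brier score: sum_x (slow_x - P(slow|x))^2, where slow_x is 1 if request x
# was actually slow and P(slow|x) is the forecast probability.
# Lower is better; a perfectly confident, correct forecaster scores 0.

def brier_score(outcomes, forecasts):
    """outcomes: 1 if actually slow, else 0; forecasts: predicted P(slow|x)."""
    return sum((o - p) ** 2 for o, p in zip(outcomes, forecasts))

outcomes  = [1, 0, 1, 0]          # which requests were actually slow
forecasts = [0.9, 0.2, 0.6, 0.1]  # the model's P(slow|x) for each request
print(round(brier_score(outcomes, forecasts), 2))  # 0.01+0.04+0.16+0.01 = 0.22
```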
Classifiers as decision makers
• Brier score Σx (slowx – Pm(slow|x))² = calibration + refinement
• Calibration: if Pm(slow|x) = 10%, then E[P(slow | Pm(slow|x))] = 10%
• Refinement: how close the forecast is to being certain
On being calibrated
• We can use P as a measure of confidence
• Refinement establishes a bound on the Bayes error
• Accuracy may improve:
  • The 50% threshold is optimal for the real P
  • Calibration brings the model's estimates closer to the real P
• Calibration procedure (DeGroot):
  • Map the estimated P to the training-set P
Work with Ira Cohen
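A simple binning calibration map in the spirit of the DeGroot-style procedure mentioned above: group forecasts into bins and replace each forecast with the empirical fraction of slow outcomes in its bin. The bin count, fallback rule, and data are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of a binning calibration map (illustrative, not the exact method):
# replace each model forecast Pm(slow|x) with the empirical P(slow) observed
# among training examples whose forecasts fell in the same bin.

def fit_calibration(forecasts, outcomes, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append(o)
    # empirical P(slow) per bin; fall back to the bin midpoint if empty
    return [sum(b) / len(b) if b else (i + 0.5) / n_bins
            for i, b in enumerate(bins)]

def calibrate(p, table, n_bins=5):
    return table[min(int(p * n_bins), n_bins - 1)]

# An overconfident model: it forecasts ~0.9 but only 2 of 3 were slow.
forecasts = [0.92, 0.95, 0.91, 0.10, 0.05]
outcomes  = [1, 1, 0, 0, 0]
table = fit_calibration(forecasts, outcomes)
print(calibrate(0.93, table))  # mapped to the empirical frequency 2/3
```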
After calibration
[Calibration plot. Number training: 686,091; number test: 343,046. RG3, days 27–30.]
A dialogue…
• Sys: Awesome, let's forecast whether a 3-tier system will meet its SLO. What should I measure?
• Pat: Measure everything! We will then establish a search over the measurements according to one of the different scores.
• Sys: Can you tell me whether the system will meet the SLO?
• Pat: I can tell you the probability that the system will meet the SLO. Uncertainty is a fact of your world; I can provide decision procedures to deal with it.
• Sys: What happens if the workload changes, or the metrics change?
• Pat: Then my model P will change.
• Sys: I can characterize the statistics of the workload, and maybe other things…
• Pat: Great! I can incorporate those characterizations into my models and decision-making procedures.
Summary/discussion
• Presented statistical pattern recognition as a worthwhile approach to decision making in the context of current/future infrastructure
• Presented a specific example and provided perspective on the issues
• EVALUATE YOUR MODELS!!!
• Benefits for systems:
  • Deals with characterization issues, dynamics, opacity
• Benefits for SPR:
  • A new application domain forces new developments (e.g., the calibration results)
• Other applications:
  • SLO characterization and diagnosis in 3-tier systems
  • ROC ???
Quote slide
"Seguro está el cielo que no lo caga zamuro." ("The sky is safe: no vulture craps on it.")
Juan Bimba, Venezuela