The State-Space Approach to Self-Management of Enterprise Systems

Vibhore Kumar, Karsten Schwan, Subu Iyer*, Yuan Chen*, Akhil Sahai* Georgia Institute of Technology, *Hewlett-Packard Labs

Outline • Motivation: Enterprise Complexity • Issues • Solution Overview • Policy-Driven Self-Management • Dynamic SLA Decomposition • Results • Future Work

Enterprise Complexity: Some Facts • From a survey conducted by Forrester Research • Enterprises now devote 80% of their overall IT budget to maintenance and ongoing operations • More than half of the 347 participating companies used at least 3 database vendors • A major banking-industry client had 18 different travel and expense systems in the organization • “VP of IT Governance” - says tons about the state of enterprise IT infrastructure

The Complexity Wall “If we don’t get a handle on complexity, it will stop the expansion” - Paul Horn, Senior Vice President, IBM Research “Our enterprise customers are working with enormous complexity” - Dick Lampman, Former Director, HP Labs

The Complexity Wall @ Worldspan • Worldspan, one of our industry collaborators, provides services to the travel industry • One of their airline ticket pricing/availability services is hosted on a farm of 1400 servers • In 2006 alone, they processed around 9.6 billion messages • Highly varying request rates and request type mix • Several behaviors of their system are not well understood • Effects of Ticket Geography • Effects of Cache Refresh Time • Effects of Time of Day …

To Handle The Complexity… • One must enable self-management of complex enterprise infrastructures driven by high-level goals

Enterprise Self-Management: The Hurdles • Enterprise systems are too big • The problem of Scale • It is tough to relate high-level goals to low-level actions • The problem of Complex System Modeling • The operating environment is very dynamic • The problem of Dynamism • Administrators find it hard to trust black-box solutions • The problem of Trust & Tractability

Solution Overview: System State-Space • System State Space V = (v1, v2, v3, …, vn), built from the monitored system variables and monitored component variables of the enterprise system • Variables of Interest Vø ⊆ V, e.g. Response-Time, QoI • Controllable Variables Vα ⊆ V, e.g. Allocated-Servers, Memory • The aim is to establish a relation between Vø and Vα under current operating conditions
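The split of V into variables of interest, controllable variables, and everything else can be sketched as a small data structure. This is only an illustration: the class name `StateSpace`, its methods, and the variable names are assumptions made for this example and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StateSpace:
    """Hypothetical container for V, Vø and Vα (names are illustrative)."""

    variables: tuple[str, ...]       # V: every monitored variable
    of_interest: frozenset[str]      # Vø, e.g. response time
    controllable: frozenset[str]     # Vα, e.g. allocated servers

    def __post_init__(self) -> None:
        all_vars = set(self.variables)
        # Both Vø and Vα must be drawn from V.
        if not self.of_interest <= all_vars:
            raise ValueError("Vø must be a subset of V")
        if not self.controllable <= all_vars:
            raise ValueError("Vα must be a subset of V")

    def context(self) -> frozenset[str]:
        """V - (Vα ∪ Vø): the variables describing current operating conditions."""
        return frozenset(self.variables) - self.of_interest - self.controllable

    def split(self, state: dict[str, float]) -> tuple[dict, dict, dict]:
        """Split one observed state into (interest, controllable, context) parts."""
        pick = lambda names: {n: state[n] for n in self.variables if n in names}
        return pick(self.of_interest), pick(self.controllable), pick(self.context())


space = StateSpace(
    variables=("request_rate", "bandwidth", "allocated_servers", "response_time"),
    of_interest=frozenset({"response_time"}),
    controllable=frozenset({"allocated_servers"}),
)
interest, control, context = space.split(
    {"request_rate": 30, "bandwidth": 90, "allocated_servers": 1, "response_time": 12}
)
```

Here `context` holds the operating conditions (request rate and bandwidth) under which the relation between the other two groups is to be established.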

Simple Automated Operation • SLO: “Response Time < 10msec” • Event: SLO Violation • Condition: Bandwidth=90Mbps, Request Rate=30 • Action: set Allocated Servers to 3 • Underlying mapping: Vα → Vø, given V – (Vα ∪ Vø) [Figure: state-space plot with axes Request Rate, Bandwidth, Allocated Servers (Vα) and Response Time (Vø)]
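The event/condition/action rule on this slide can be written down as a tiny rule engine. The sketch below is a minimal illustration, not the authors' implementation: the `Policy` class, the `apply_policies` helper, and the key names such as `bandwidth_mbps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical event-condition-action rule."""

    event: str        # event that triggers evaluation
    condition: dict   # observed values that must match exactly
    action: dict      # values to assign to controllable variables

    def applies(self, event: str, state: dict) -> bool:
        return event == self.event and all(
            state.get(name) == value for name, value in self.condition.items()
        )


def apply_policies(policies: list, event: str, state: dict) -> dict:
    """Return a copy of state with the first matching policy's action applied."""
    new_state = dict(state)
    for policy in policies:
        if policy.applies(event, state):
            new_state.update(policy.action)
            break
    return new_state


# The rule from the slide: on an SLO violation at 90 Mbps and request rate 30,
# set the number of allocated servers to 3.
policy = Policy(
    event="slo_violation",
    condition={"bandwidth_mbps": 90, "request_rate": 30},
    action={"allocated_servers": 3},
)
state = {"bandwidth_mbps": 90, "request_rate": 30,
         "allocated_servers": 1, "response_time_ms": 12}
result = apply_policies([policy], "slo_violation", state)
```

Exact-match conditions keep the example short; a real system would more plausibly match ranges of values.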

Solution Overview: The Function • Learn the function from observed system states (v1, v2, …, vn) using machine learning • But there are problems • Different behavior in different sub-spaces, e.g. CPU bottleneck vs. network bottleneck • Large state space, |V| ≈ 10² to 10³

Solution Overview: The Function • We decided to model the system using a set of multiple µ-models • We intelligently partition the set of observed system states (v1, v2, …, vn) • Partitions exhibit homogeneous behavior • Partitions have a reduced number of relevant variables • Partitioning & µ-Modeling solve two problems! • The problem of Scale • The problem of Complex System Modeling [Figure: reduced number of relevant variables in a µ-model]
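The idea of partitioning observed states and keeping only the relevant variables per partition can be illustrated as follows. The slides do not say how the partitioning is done, so this sketch substitutes a fixed threshold rule (`partition_key`) and treats a variable as relevant when its value varies inside a partition. Both choices, and all names and numbers, are assumptions made purely for illustration.

```python
from collections import defaultdict


def partition_key(state: dict) -> str:
    """Hypothetical partitioning rule based on which resource is saturated."""
    if state["cpu_util"] >= 0.9:
        return "cpu_bottleneck"
    if state["net_util"] >= 0.9:
        return "network_bottleneck"
    return "unsaturated"


def partition_states(states: list) -> dict:
    """Group observed states by partition key."""
    parts = defaultdict(list)
    for state in states:
        parts[partition_key(state)].append(state)
    return dict(parts)


def relevant_variables(states: list) -> set:
    """Crude stand-in for relevance: variables that vary within the partition."""
    names = states[0].keys()
    return {n for n in names if len({s[n] for s in states}) > 1}


observed = [
    {"cpu_util": 0.95, "net_util": 0.30, "servers": 1, "response_ms": 14},
    {"cpu_util": 0.92, "net_util": 0.30, "servers": 2, "response_ms": 11},
    {"cpu_util": 0.40, "net_util": 0.95, "servers": 2, "response_ms": 13},
    {"cpu_util": 0.40, "net_util": 0.97, "servers": 3, "response_ms": 13},
]
parts = partition_states(observed)
cpu_vars = relevant_variables(parts["cpu_bottleneck"])
net_vars = relevant_variables(parts["network_bottleneck"])
```

In this toy data, each partition ends up with fewer variables to model than the full state, which is the effect the slide describes.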

Solution Overview: µ-Models • We use a Tree Augmented Naïve Bayes (TAN) classifier to build µ-models • The model returns the following probability: γ = Pr(Vα | Vdesired) • Find the assignment of values to variables in Vα that maximizes the probability of moving the system to the desired state
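Choosing the assignment to Vα that maximizes γ can be sketched with a much simpler estimator than TAN. The example below uses plain frequency counting over observed states as a stand-in for the classifier; it is not a TAN implementation, and the function name `best_assignment` and the sample data are hypothetical.

```python
from collections import Counter


def best_assignment(observations: list, desired, controllable: list):
    """Return (assignment, gamma) maximizing the empirical Pr(Vα = a | desired).

    `desired` is a predicate over an observed state; frequency counting
    stands in for the TAN classifier named on the slide.
    """
    matching = [o for o in observations if desired(o)]
    if not matching:
        return None, 0.0
    counts = Counter(tuple(o[name] for name in controllable) for o in matching)
    values, count = counts.most_common(1)[0]
    return dict(zip(controllable, values)), count / len(matching)


observations = [
    {"allocated_servers": 1, "response_ms": 12},
    {"allocated_servers": 2, "response_ms": 11},
    {"allocated_servers": 3, "response_ms": 9},
    {"allocated_servers": 3, "response_ms": 8},
    {"allocated_servers": 2, "response_ms": 9},
]
assignment, gamma = best_assignment(
    observations,
    desired=lambda o: o["response_ms"] < 10,   # desired state: SLO is met
    controllable=["allocated_servers"],
)
```

Of the three observations that meet the SLO, two used three servers, so the sketch proposes that assignment with γ = 2/3.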

Solution Approach: Dynamism • As the system keeps running, more system states are generated, which can be incorporated into the µ-models • µ-models are easier to update than monolithic system models • A µ-model update can result in • Policy Invalidation • Policy Adaptation • New Policies • This addresses the problem of Dynamism
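How a model update might invalidate or adapt a policy can be sketched with a toy incremental model. Everything here is an assumption for illustration: the `MicroModel` class keeps simple counts rather than a TAN, its score is the observed rate at which the SLO was met for a given setting, and `review_policy` and the threshold value are hypothetical.

```python
from collections import Counter


class MicroModel:
    """Toy incremental model: counts SLO outcomes per server allocation."""

    def __init__(self) -> None:
        self.counts = Counter()

    def update(self, servers: int, slo_met: bool) -> None:
        # Incorporating a newly observed state is a single counter increment.
        self.counts[(servers, slo_met)] += 1

    def success_rate(self, servers: int) -> float:
        met = self.counts[(servers, True)]
        total = met + self.counts[(servers, False)]
        return met / total if total else 0.0


def review_policy(model, policy_servers, candidates, threshold):
    """Re-check a policy after a model update: keep, adapt, or invalidate it."""
    if model.success_rate(policy_servers) > threshold:
        return ("keep", policy_servers)
    best = max(candidates, key=model.success_rate)
    if model.success_rate(best) > threshold:
        return ("adapt", best)
    return ("invalidate", None)


model = MicroModel()
for _ in range(4):
    model.update(3, True)
before = review_policy(model, 3, [2, 3, 4], threshold=0.7)

# Conditions change: 3 servers stop meeting the SLO, 4 servers meet it.
for _ in range(6):
    model.update(3, False)
for _ in range(3):
    model.update(4, True)
after = review_policy(model, 3, [2, 3, 4], threshold=0.7)
```

The policy is kept while its setting still works and adapted once newer observations favor a different setting.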

Solution Approach: Tractability & Trust • Each self-management action that assigns values to variables in Vα is associated with a probability γ = Pr(Vα | V – Vø) • An action is taken only when γ > γthreshold • This can be used to fine-tune self-management • TANs can be easily understood by administrators
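The confidence gate described on this slide amounts to a strict comparison against a threshold. A minimal sketch follows, assuming a hypothetical `decide` function and a made-up default threshold of 0.8; neither comes from the slides.

```python
def decide(assignment, gamma: float, gamma_threshold: float = 0.8) -> dict:
    """Act only when the model's probability strictly exceeds the threshold.

    Otherwise the proposal is deferred, e.g. for an administrator to review.
    """
    if assignment is not None and gamma > gamma_threshold:
        return {"act": True, "assignment": assignment, "gamma": gamma}
    return {"act": False, "assignment": None, "gamma": gamma}


confident = decide({"allocated_servers": 3}, gamma=0.9)
borderline = decide({"allocated_servers": 3}, gamma=0.8)
```

Raising `gamma_threshold` makes the system act less often but with higher confidence, which is one way the threshold can be used to fine-tune self-management.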

Outline • Motivation: Enterprise Complexity • Issues • Solution Overview • Policy-Driven Self-Management • Dynamic SLA Decomposition • Results • Future Work

Policy-Driven Self-Management • SLO: “Response Time < 10msec” • Event: SLO Violation • Condition: Bandwidth=90Mbps, Request Rate=30 • Given the goal state (90,30,9), find the µ-model to use • Action: set Allocated Servers to 3 [Figure: state-space plot with axes Request Rate, Bandwidth, Allocated Servers and Response Time; Current State (90,30,12), Goal State (90,30,9)]

Dynamic SLA Decomposition • Problem: To determine sub-SLAs for components that lead to SLA conformance • Sub-SLAs can be thought of as per-component range of values for controllable variables • If each component adheres to the sub-SLAs then the SLA is not violated • Our techniques can handle SLA decomposition [Figure: System-Level SLA decomposed into SLA1 … SLA5; conformance(SLA1, SLA2, …, SLAn) implies conformance(System SLA)]
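Treating sub-SLAs as per-component ranges for controllable variables leads to a simple conformance check. The sketch below only checks adherence to given ranges; it does not show how the ranges are derived. The component names, variable names, and bounds are invented for the example.

```python
def component_conforms(settings: dict, sub_sla: dict) -> bool:
    """True if every controllable variable lies within its sub-SLA range."""
    return all(low <= settings[var] <= high for var, (low, high) in sub_sla.items())


def system_conforms(component_settings: dict, sub_slas: dict) -> bool:
    """conformance(SLA1, ..., SLAn): every component adheres to its sub-SLA."""
    return all(
        component_conforms(component_settings[name], sub_sla)
        for name, sub_sla in sub_slas.items()
    )


# Hypothetical sub-SLAs: inclusive (low, high) ranges per controllable variable.
sub_slas = {
    "web": {"allocated_servers": (2, 4)},
    "db": {"memory_gb": (8, 16)},
}
ok = system_conforms(
    {"web": {"allocated_servers": 3}, "db": {"memory_gb": 8}}, sub_slas
)
violated = system_conforms(
    {"web": {"allocated_servers": 1}, "db": {"memory_gb": 8}}, sub_slas
)
```

Under the slide's premise, `ok` being true for all components is what guarantees that the system-level SLA is not violated.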

Experimental Results: SOA Simulator [Figure: two panels, Without Self-Management and With Self-Management]

Experimental Results: RUBiS over VMs [Figure: two panels, Without Self-Management and With Self-Management, annotated with Database Perturbation and Partition Change]

Conclusions & Future Work • Our techniques are applicable to a variety of enterprise systems • In our experiments the techniques have proven to be scalable and accurate • Monitoring overheads can be reduced by taking inputs about relevant variables from the state-space partitions • Future work: design and implement techniques that can proactively avoid SLA violations

Thank You! References [1] V. Kumar, K. Schwan, S. Iyer, Y. Chen, A. Sahai. The State-Space Approach to SLA-Based Management. In submission to NOMS 2008. [2] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. iManage: Policy-Driven Self-Management for Enterprise-Scale Systems. Middleware 2007. [3] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. Enabling Policy-Driven Self-Management for Enterprise Systems. PBAC 2007, in conjunction with ICAC 2007. [4] V. Kumar, et al. Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW. ICAC 2006.
