Explore the challenges of complexity in heterogeneous computing environments and their impact on system management. Discover how autonomic computing can provide self-managing systems that adapt, tune, and recover from failures, ultimately simplifying the administration of pervasive computing.
(Slides are taken from the presentations by Alan Ganek, Alfred Spector, Jeff Kephart of IBM)
Dream of Pervasive Computing … or Nightmare! • Trillions of heterogeneous computing devices connected to the Internet
Core of the Problem • Complexity in systems themselves and in the operating environment • As systems become more interconnected and diverse, architects are less able to anticipate and design interactions among components, so those decisions are pushed to run time (late binding), e.g., hot-plug, JVM, JIT compilation, service discovery, mobile agents, … • Complexity management requires human intervention and drives IT costs
Need Complexity Management • But the complexity is beyond what humans can handle, so the human has to come out of the control loop: autonomic • Even though we are moving in this direction, is there any systematic way of addressing this issue? • Autonomic Computing
Industry Trends • Administration of systems is increasingly difficult • 100s of configuration and tuning parameters for DB2 • Heterogeneous systems are increasingly connected • Integration becoming ever more difficult • Architects can't plan interactions among components • Increasingly dynamic; frequently with unanticipated components • More burden must be assumed at run time • But human administrators can't assume the burden • 6:1 cost ratio between storage admin and storage • 40% of outages due to operator error • Need self-managing computing systems • Behavior specified by sys admins via high-level policies • System and its components figure out how to carry out policies
Autonomic Computing Vision • “Intelligent” open systems that… • Manage complexity • “Know” themselves • Continuously tune themselves • Adapt to unpredictable conditions • Prevent and recover from failures • Provide a safe environment • Self-management: • free administrators from details of operations • provide peak performance 24/7 • Concentrate on high-level decisions and policies
Self-managing Systems That … • Business Resiliency: Discover, diagnose, and act to prevent disruptions • Increase Responsiveness: Adapt to dynamically changing environments • Operational Efficiency: Tune resources and balance workloads to maximize use of IT resources • Secure Information and Resources: Anticipate, detect, identify, and protect against attacks • Aware/Proactive
Self-Optimizing: Enterprise Workload Management • Heterogeneous, distributed components working together • Self-tuning, end-to-end performance management • Dynamic allocation of network resources • Workload balancing & routing • Cross platform reporting • Policy-based for various classes of users & applications
Self-Protecting Example: IBM Tivoli Risk Manager • Rapid / automated analysis of complex situations
Evolving to Autonomic Computing (scale runs from Manual to Autonomic)
• Basic (Level 1) • Characteristics: Multiple sources of system generated data • Skills: Requires extensive, highly skilled IT staff • Benefits: Basic requirements met
• Managed (Level 2) • Characteristics: Consolidation of data and actions through management tools • Skills: IT staff analyzes and takes actions • Benefits: Greater system awareness; improved productivity
• Predictive (Level 3) • Characteristics: System monitors, correlates and recommends actions • Skills: IT staff approves and initiates actions • Benefits: Reduced dependency on deep skills; faster/better decision making
• Adaptive (Level 4) • Characteristics: System monitors, correlates and takes action • Skills: IT staff manages performance against SLAs • Benefits: Balanced human/system interaction; IT agility and resiliency
• Autonomic (Level 5) • Characteristics: Integrated components dynamically managed by business rules/policies • Skills: IT staff focuses on enabling business needs • Benefits: Business policy drives IT management; business agility and resiliency
IBM’s Architecture Model • Intelligent control loop: • Implementing self-managing attributes involves an intelligent control loop
Control Loops Delivered in 2 Ways • Combinations of Management Tools • Resource Provider
Autonomic Element - Structure • Fundamental atom of the architecture • Managed element(s) • Database, storage • Autonomic manager • Responsible for: • Providing its service • Managing own behavior in accordance with policies • Interacting with other autonomic elements • [Figure: An Autonomic Element. The Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge; Sensors and Effectors) above the Managed Element (Sensors and Effectors)]
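The monitor, analyze, plan, execute loop around shared knowledge can be sketched in a few lines. This is only an illustration under stated assumptions, not IBM's implementation: the class names, the `max_load` policy, and the worker-count scaling rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Hypothetical managed resource, e.g. a database worker pool."""
    load: float = 0.0   # average load per worker (1.0 = fully busy)
    workers: int = 2

    def sense(self) -> dict:
        # Sensor: expose current state to the autonomic manager.
        return {"load": self.load, "workers": self.workers}

    def effect(self, workers: int) -> None:
        # Effector: apply the change requested by the manager.
        # Assumption: the same total work is spread over the new worker count.
        self.load = self.load * self.workers / workers
        self.workers = workers


@dataclass
class AutonomicManager:
    element: ManagedElement
    # Knowledge: a (hypothetical) policy plus the history of observations.
    knowledge: dict = field(
        default_factory=lambda: {"max_load": 0.8, "history": []}
    )

    def monitor(self) -> dict:
        reading = self.element.sense()
        self.knowledge["history"].append(reading)
        return reading

    def analyze(self, reading: dict) -> bool:
        # Is the policy violated?
        return reading["load"] > self.knowledge["max_load"]

    def plan(self, reading: dict) -> int:
        # Smallest worker count that brings per-worker load within policy.
        total = reading["load"] * reading["workers"]
        workers = reading["workers"]
        while total / workers > self.knowledge["max_load"]:
            workers += 1
        return workers

    def execute(self, workers: int) -> None:
        self.element.effect(workers)

    def run_once(self) -> bool:
        """One pass of the control loop; returns True if it acted."""
        reading = self.monitor()
        if not self.analyze(reading):
            return False
        self.execute(self.plan(reading))
        return True
```

Calling `run_once()` repeatedly (for example on a timer) gives the closed loop; the administrator only changes `max_load`, and the manager works out the worker count.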
Autonomic Manager Substructure • Interfaces: Alerts, events & problem analysis request interface; SLA/Policy interface, interprets & translates into "control logic" • Sensors, Effectors • Monitor, Analyze, Plan, Execute, Knowledge • Components shown: Policy Interpreter, Analysis Engines, Policy Validations, Policy Transforms, Policy Resolution, Plan Generators, Rules Engines, Workflow Engine, Filters, Service Dispatcher, Simple Correlators, Topology, Calendar, Scheduler Engine, Metric Managers, Recent Activity Log, Policy Distribution Engine
Autonomic Elements - Interaction • Relationships • Dynamic, ephemeral • Formed by agreement • May be negotiated • Full spectrum • Peer-to-peer • Hierarchical • Subject to policies
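A relationship that is "formed by agreement", "may be negotiated", and is "subject to policies" could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Element` and `Agreement` types, the capacity units, and the single-round negotiation are hypothetical and much simpler than a real negotiation protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agreement:
    provider: str
    consumer: str
    capacity: int   # units of service the provider commits to


@dataclass
class Element:
    name: str
    spare_capacity: int = 0   # what this element can offer to others
    min_acceptable: int = 0   # policy: least capacity it accepts as a consumer

    def offer(self, requested: int) -> int:
        # Provider side: never promise more than is available.
        return min(requested, self.spare_capacity)

    def negotiate(self, provider: "Element", requested: int) -> Optional[Agreement]:
        # Consumer side: ask, then accept or reject the counter-offer
        # according to the consumer's own policy.
        offered = provider.offer(requested)
        if offered == 0 or offered < self.min_acceptable:
            return None
        provider.spare_capacity -= offered
        return Agreement(provider.name, self.name, offered)
```

For example, a database element that requests 80 units from a storage element with 50 spare, and whose policy accepts anything from 30 up, ends with an agreement for 50; a later request to the now-exhausted provider yields no agreement.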
Multiple Contexts for Autonomic Behavior • Business Solutions (Business Policies, Processes, Contracts): Customer Relationship Management, Enterprise Resource Planning • Groups of Elements (Inter-element self-management): Server Farm, Enterprise Network, Storage Pool • System Elements (Intra-element self-management): Database, Network Devices, Servers, Storage, Middleware, Applications
Autonomic Computing Requires Core Technologies • Enabled capabilities: Solution Management, End-to-end Problem Determination, Dynamic Provisioning, Automated Root Cause Analysis, Auto-Update, Heterogeneous Workload Management, Identity/Security Management, Policy-based Management, Auto-Detection • Core technologies: Data Collection (Logging/Tracing), Install/Dependency Management, Administrative Console, Infrastructure Provisioning, Policy Infrastructure
Integrated Solutions Console for Common System Administration • Customer pain point: Complexity of operations • Value: • One consistent interface across product portfolio • Common runtime infrastructure and development tools based on industry standards, component reuse • Provides a presentation framework for other autonomic core technologies • Standards-based: J2EE, JSR168
Log and Trace Tool for Problem Determination • Customer pain point: Difficulty in analyzing problems in multi-component systems • Value: • Introduces standard interfaces and formats for logging and tracing • Central point of interaction with multiple data sources • Correlated views of data • Reduced time spent in problem analysis • Standards-based: JSR47, Apache
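To make "standard formats" and "correlated views of data" concrete, here is a minimal sketch that converts two component-specific log formats into one common event type and groups the events by a shared request identifier. The two input formats, the field names, and the `Event` type are hypothetical and are not the tool's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common event format shared by all sources (hypothetical schema)."""
    time: datetime
    source: str
    correlation_id: str
    message: str


def from_web(line: str) -> Event:
    # Assumed web-server format: "<ISO timestamp> <request id> <message>"
    stamp, cid, message = line.split(" ", 2)
    return Event(datetime.fromisoformat(stamp), "web", cid, message)


def from_db(record: dict) -> Event:
    # Assumed database format: epoch seconds plus a transaction id.
    when = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return Event(when, "db", record["txn"], record["msg"])


def correlate(events: list) -> dict:
    """Group events by correlation id, each group in time order."""
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e.time):
        grouped[event.correlation_id].append(event)
    return dict(grouped)
```

Once every source is mapped into `Event`, a single call to `correlate` shows, per request, what each component logged and in what order, which is the step that shortens problem analysis in multi-component systems.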
Install/Config Package for new Solutions • Customer pain point: Difficulty of deployment in complex systems • Value: • One consistent software installation technology across all products • Consistent and up-to-date configuration and dependency data, key to building self-configuring autonomic systems • Reduced deployment time with fewer errors • Reduced software maintenance time, improved analysis of failed system components • Component-based install for IBM and non-IBM products • Standards-based: OGSA, Web Services • Partnering with InstallShield
Policy Tools for Policy-based Management • Customer pain point: Complexity of product and systems management • Value: • Uniform cross-product policy definition and management infrastructure, needed for delivering system-wide self-management capabilities • Simplifies management of multiple products; reduced TCO • Easier to dynamically change configuration in on-demand environment • [Figure: policy flow, with labels Definition, Validation, Analysis, Local Repository, Distribution and Facts (push or pull), Enforcement Points, Activate, Adaptation, Implement, Resources, Monitor]
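The stages named on this slide (definition, validation, a repository, distribution, enforcement points acting on facts) can be sketched as follows. This is a simplified illustration, not the actual policy tools: the `Policy` fields, the threshold rule, and the push-only distribution are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    name: str
    metric: str        # fact the policy watches, e.g. "cpu"
    threshold: float   # act when the fact exceeds this value
    action: str        # symbolic action name, e.g. "add_server"


def validate(policy: Policy) -> None:
    # Validation: reject malformed policies before they are stored.
    if not (policy.name and policy.metric and policy.action):
        raise ValueError("policy fields must be non-empty")
    if policy.threshold < 0:
        raise ValueError("threshold must be non-negative")


@dataclass
class EnforcementPoint:
    resource: str
    active: dict = field(default_factory=dict)

    def activate(self, policy: Policy) -> None:
        self.active[policy.name] = policy

    def enforce(self, facts: dict) -> list:
        # Return the actions of every active policy whose threshold
        # is exceeded by the monitored facts.
        return [p.action for p in self.active.values()
                if facts.get(p.metric, 0.0) > p.threshold]


@dataclass
class PolicyRepository:
    policies: dict = field(default_factory=dict)
    points: list = field(default_factory=list)

    def define(self, policy: Policy) -> None:
        validate(policy)
        self.policies[policy.name] = policy

    def distribute(self) -> None:
        # Push model: send every stored policy to every enforcement point.
        for point in self.points:
            for policy in self.policies.values():
                point.activate(policy)
```

One definition in the repository reaches every enforcement point, so changing a threshold in one place changes behavior across all managed resources; a pull model would instead have each enforcement point fetch from the repository.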
Technologies for Implementing Autonomic Managers • Customer pain point: How to implement end-to-end autonomic solutions • Value: • Components to simplify the incorporation of autonomic functions into applications • Building blocks for self-management • Monitoring, analysis, planning and execution components • Including autonomic computing technologies, grid tools, and services • Pluggable • Defines interfaces and provides implementations for each major toolkit component • Standards-based: OGSA, W3C
Summary of Autonomic Computing Architecture • Based on a distributed, service-oriented architectural approach • Every component provides or consumes services • Policy-based management • Autonomic elements • Make every component resilient, robust, self-managing • Behavior is specified and driven by policies • Relationships between autonomic elements • Based on agreements established and maintained by autonomic elements • Governed by policies • Give rise to resiliency, robustness, self-management of system
The Metaphor • The autonomic nervous system regulates the body without requiring our conscious involvement • When we run, it increases our heart and breathing rate
Integrating Biology and Information Technology: The Autonomic Computing Metaphor • Current programming paradigms, methods, management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems • Nature has evolved to cope with scale, complexity, heterogeneity, dynamism and unpredictability, lack of guarantees • self configuring, self adapting, self optimizing, self healing, self protecting, highly decentralized, heterogeneous architectures that work • Goal of autonomic computing is to build a self-managing system that addresses these challenges using high level guidance • Unlike AI, duplication of human thought is not the ultimate goal