Achieving self-healing in service delivery software systems by means of case-based reasoning
Stefania Montani, Cosimo Anglano
Presented by Tony Schneider Pr.
Outline: Introduction, Background, CBR Implementation, Experiment / Cavy, Results

Autonomic Systems Overview
• Goal is for the system to manage itself
• The system needs to exhibit:
  • Self-configuration
  • Self-optimization
  • Self-protection
  • Self-healing

Self-Healing
• “Service Delivery Systems” (SDS)
  • Aimed at delivering 24/7 services
  • These services are prone to breakage
• Service failures
  • Software, hardware, network
  • Can’t be handled manually
  • The system needs to be repaired autonomously

Self-Healing
• Internalization
  • The self-healing engine is integrated with the software
  • Not extendable
  • Depends on specific applications
• Externalization
  • Great for retrofitting current systems
  • Allows a general method for SDS self-healing

Self-Healing
• Problems with the current approach
  • The MAPE (Monitor, Analyze, Plan, Execute) model assumes prior knowledge of the system
  • The knowledge base is problematic
    • Large, time-consuming, and laborious to build
    • Needs to be kept up to date
• Build the knowledge base automatically
  • How?

Case-Based Reasoning
• Case-based reasoning (CBR) uses previous experience for problem solving
  • Retrieves cases similar to the current problem
  • Reuses past successful solutions
  • Revises the retrieved solution if necessary
  • Retains the current case

Case-Based Reasoning
• The case base represents the “knowledge” in the MAPE model
  • Each case represents a previous problem and its solution
• Implicit versus explicit knowledge
  • Explicit: rules and models
  • Implicit: unstructured and based on experience
  • Implicit knowledge tends to be easier to acquire and more conducive to limited interaction

Case-Based Reasoning
• Cases are stored by identifying application features
  • The problem
  • The applied solution
  • The outcome of the solution
• Prevents the bottleneck present in other learning methods
  • E.g., online reinforcement learning

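A minimal sketch of how one such case could be recorded, assuming boolean application features. The class names, field names, and the feature and solution names are hypothetical; the slides only state that a case holds the problem, the applied solution, and the outcome.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Case:
    features: Dict[str, bool]  # the problem, described by application features
    solution: str              # the repair action that was applied
    outcome: bool              # True if the solution restored the service


@dataclass
class CaseBase:
    cases: List[Case] = field(default_factory=list)

    def retain(self, case: Case) -> None:
        """Store a finished episode so later retrievals can reuse it."""
        self.cases.append(case)


# Hypothetical feature and solution names, for illustration only.
case_base = CaseBase()
case_base.retain(
    Case(
        features={"web_tier_unreachable": True, "db_tier_unreachable": False},
        solution="restart_web_server",
        outcome=True,
    )
)
```

Keeping the outcome alongside the solution lets a later retrieval step prefer solutions that actually worked.
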
Case-Based Reasoning
• CBR relies on a large number of past cases
• Pros
  • Methods improve with time and experience
  • Large systems are host to recurrent problems
• Cons
  • Need to store the data
  • Need to populate the knowledge base

Case-Based Reasoning
• To reiterate: CBR is a methodology designed to assist in the repair of failed systems
• Questions so far?

System Overview
• The SDS is treated as a black box
• The self-healing CBR system is entirely external to the SDS
  • Controls the health of the SDS
• Components of CBR are reflected in MAPE
  • Analysis <-> Retrieval
  • Planning <-> Revise
  • Knowledge <-> Case base

System Overview: MAPE Revised
[Figure: the old MAPE model next to the model revised for CBR]

System Overview: MAPE Revised
• Four additions
  • Monitoring
  • Case preparation
  • Service restoration
  • Repair module

System Overview: MAPE Revised
• Application-agnostic portion
  • Doesn’t rely on specific environment variables
• Application-specific portion
  • Relies on data from the application
• Both
  • Interface between the two layers
• The managed element is completely external to the healing system

System Overview
• Assumptions
  • Bad solutions have no effect on the SDS state; likewise, good solutions don’t produce faults
  • Deadlines for producing case solutions aren’t fixed
  • Every stored case has a unique solution
  • No transient faults (faults that occur only once)
  • No intermittent faults (faults that appear, disappear, then reappear)

CBR Cycle: Retrieve - Reuse/Revise - Retain
• Every stored case is representative of some past failure
• Need to find the case that best approximates the current failure
• Find the average distance between features, d_f(x, y):
  • 1 if x or y is missing
  • overlap(x, y) if f is a symbolic feature
  • [expression lost in extraction] if f is a linear feature

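The retrieval distance above can be sketched as follows. The missing-value and symbolic branches follow the slide. The linear branch is an assumption: the slide's expression did not survive, so a range-normalized absolute difference (a common choice in heterogeneous distance metrics) is used as a stand-in. All function, parameter, and feature names are hypothetical.

```python
from typing import Dict, Optional, Union

Value = Union[str, bool, int, float, None]


def feature_distance(x: Value, y: Value, value_range: Optional[float] = None) -> float:
    """Distance d_f(x, y) for a single feature f, always in [0, 1]."""
    if x is None or y is None:
        return 1.0  # a missing value counts as maximally distant
    if value_range is None:
        # Symbolic feature: overlap metric (0 if equal, 1 otherwise).
        return 0.0 if x == y else 1.0
    if value_range == 0:
        return 0.0  # a feature with no spread cannot separate cases
    # Linear feature: ASSUMED normalized absolute difference, not from the slides.
    return min(1.0, abs(x - y) / value_range)


def case_distance(a: Dict[str, Value], b: Dict[str, Value],
                  ranges: Dict[str, float]) -> float:
    """Average the per-feature distances over every feature seen in either case."""
    names = sorted(set(a) | set(b))
    if not names:
        return 0.0
    total = sum(feature_distance(a.get(n), b.get(n), ranges.get(n)) for n in names)
    return total / len(names)


# Features listed in `ranges` are treated as linear, all others as symbolic.
current = {"network_ok": False, "config_ok": True, "load": 0.75}
stored = {"network_ok": False, "config_ok": False, "load": 0.25}
distance = case_distance(current, stored, ranges={"load": 1.0})
```

Here the three per-feature distances are 0 (network_ok matches), 1 (config_ok differs), and 0.5 (load), so the average is 0.5.
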
CBR Cycle: Retrieve - Reuse/Revise - Retain
• Apply retrieved case solutions in order of best average distance
• Repeat for all retrieved cases until the problem is solved
• Also covers cases with multiple solutions (just use the best choice)
• What if no solution works?
  • Ask a human

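A minimal sketch of this reuse/revise loop, assuming retrieval has already produced (distance, solution) pairs. The function names, the callbacks, and the solution names are hypothetical, and skipping a solution that was already tried is an added assumption rather than something the slides state.

```python
from typing import Callable, Iterable, Tuple


def repair(ranked_cases: Iterable[Tuple[float, str]],
           try_solution: Callable[[str], bool],
           ask_human: Callable[[], str]) -> str:
    """Try retrieved solutions from closest case to farthest; escalate if all fail."""
    tried = set()
    for _, solution in sorted(ranked_cases, key=lambda pair: pair[0]):
        if solution in tried:
            continue  # assumption: never re-apply a solution that already failed
        tried.add(solution)
        if try_solution(solution):
            return solution
    # No stored solution worked, so fall back to a human operator.
    return ask_human()


# Hypothetical scenario: only "restart_db" actually fixes the failure.
attempts = []


def try_solution(solution: str) -> bool:
    attempts.append(solution)
    return solution == "restart_db"


ranked = [(0.4, "restart_db"), (0.1, "reset_network"), (0.4, "restart_db")]
fixed_by = repair(ranked, try_solution, ask_human=lambda: "manual_fix")
escalated = repair([(0.2, "reset_network")], lambda s: False, lambda: "manual_fix")
```
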
CBR Cycle: Retrieve - Reuse/Revise - Retain
• Just saves the case to the knowledge base
  • The problem
  • The solution
  • The outcome

Odds and Ends
• System initialization
  • Bootstrap phase
• Prototyping
  • Makes a general case out of several similar cases in the case base
  • Solves the storage space problem
  • Takes the implicit knowledge and creates explicit knowledge
  • Used after the case base has grown

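One way prototyping could work is sketched below. The slides do not say how similar cases are generalized, so the merge rule here is purely an assumption: cases that share a solution are collapsed into one prototype, features on which they all agree are kept, and features on which they disagree become None ("don't care"). All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Features = Dict[str, Optional[bool]]


def build_prototypes(cases: List[Tuple[Features, str]]) -> List[Tuple[Features, str]]:
    """Collapse cases that share a solution into one generalized case each."""
    groups = defaultdict(list)
    for features, solution in cases:
        groups[solution].append(features)

    prototypes = []
    for solution, members in groups.items():
        names = set().union(*members)  # every feature name used in the group
        merged: Features = {}
        for name in sorted(names):
            values = {m.get(name) for m in members}
            # Keep a value only when every member of the group agrees on it.
            merged[name] = values.pop() if len(values) == 1 else None
        prototypes.append((merged, solution))
    return prototypes


cases = [
    ({"web_down": True, "db_down": False}, "restart_web_server"),
    ({"web_down": True, "db_down": True}, "restart_web_server"),
    ({"web_down": False, "db_down": False}, "reset_network"),
]
prototypes = build_prototypes(cases)
```

Three stored cases shrink to two prototypes here, which illustrates the storage saving, and each prototype reads like an explicit rule.
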
CBR Questions?
• That wraps up the CBR portion. Any questions?

Experimental Setup
• Implemented the CBR-based system in Java
  • MySQL for case base storage
  • Used with an SDS testbed, “Cavy”
• Cavy
  • Configures, deploys, and operates SDS testbeds
  • Framework that surrounds the healing engine
  • Injects faults into testbed components

Cavy Components
• Fault managers
• Diagnoser
• Service monitor
• Interrogator
• Repairer
• Injector

Cavy Components
• Basically...
  • The injector breaks the system
  • The service monitor sees the fault
  • The diagnoser finds a similar FS pair
  • The interrogator receives the solution
  • The repairer tries each solution until one works

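The flow between components can be sketched as plain functions, one per component. This is not Cavy's actual API: every function name, fault name, and solution name is hypothetical, "FS pair" is read here as a failure/solution pair, and ranking by the number of differing symptoms is an assumed similarity measure. The interrogator hand-off is omitted for brevity.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

FSPair = Tuple[FrozenSet[str], str]  # (failure symptoms, solution)


def inject(sds_faults: Set[str], fault: str) -> None:
    """Injector: break the system by adding a fault."""
    sds_faults.add(fault)


def monitor(sds_faults: Set[str]) -> FrozenSet[str]:
    """Service monitor: report the symptoms currently visible."""
    return frozenset(sds_faults)


def diagnose(observed: FrozenSet[str], pairs: List[FSPair]) -> List[str]:
    """Diagnoser: rank stored pairs by how few symptoms differ from the observed set."""
    ranked = sorted((len(observed ^ failure), solution) for failure, solution in pairs)
    return [solution for _, solution in ranked]


def repair(sds_faults: Set[str], solutions: List[str],
           fixes: Dict[str, Set[str]]) -> Optional[str]:
    """Repairer: try each solution in order until the system is healthy."""
    for solution in solutions:
        sds_faults.difference_update(fixes.get(solution, set()))
        if not sds_faults:
            return solution
    return None


sds: Set[str] = set()
inject(sds, "db_network_down")
pairs = [
    (frozenset({"web_config_bad"}), "restore_web_config"),
    (frozenset({"db_network_down"}), "reset_db_network"),
]
fixes = {
    "restore_web_config": {"web_config_bad"},
    "reset_db_network": {"db_network_down"},
}
solutions = diagnose(monitor(sds), pairs)
applied = repair(sds, solutions, fixes)
```
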
Cavy Components
• Cavy implements pieces of the self-healing architecture
  • Interrogator: application-agnostic pieces
  • Fault repairer: application-specific pieces
  • Service monitor: monitor
  • Fault managers: repair

The Experiment
• Rubis
  • Mimics eBay
  • Two tiers
    • Customers interact with a web server on the first
    • The database is stored on the second
• Several services are tested
  • Register, Browse, Sell, Home

The Experiment
• Potential Rubis failures (each can apply to either tier)
  • Network problems
  • Configuration problems
  • System restart
• 10 failure descriptors
  • Boolean values
  • Represent failed pieces of the system

Initial Base Case (constructed by a human)
[Figure: automatically generated case]

Initial Base Case (constructed by a human)
[Figure: distances between the current failure and the base case]

Second Case

Results
• Continued like this for 3 days
• Of 1016 cases, fewer than 11 needed human intervention
• Prototypes functioned correctly
  • Reduced the size of the database
  • Handled new faults without human intervention
  • Narrowed down the possible failures to 9 prototype cases
  • Showed that “complex” problems were just simultaneous simple problems

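As a quick worked figure, the reported counts put an upper bound on the human intervention rate. This uses only the two numbers from the slide (1016 cases, fewer than 11 interventions) and says nothing about how the failures were distributed.

```latex
\[
\frac{11}{1016} \approx 0.0108 = 1.08\%
\quad\Longrightarrow\quad
\text{human intervention rate} < 1.08\%,
\qquad
\text{handled automatically} > 98.9\%
\]
```
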
Future Work
• Use in real-world applications
  • Working around the given assumptions
  • Use of prototyping/generalization
• Combine CBR with other knowledge sources
• Combine CBR with some other methodology

Conclusion
• CBR is a good solution to self-healing
• The repair procedure is triggered by service failures
• No structured knowledge is needed
• Worked well even with novel faults