
Autonomous Recovery in Componentized Internet Applications, Candea et al. (presentation by Vikram Negi)




Presentation Transcript


  1. Autonomous Recovery in Componentized Internet Applications • Candea et al. • Vikram Negi

  2. Introduction • Autonomic Problem • Approach • Results • Discussion

  3. The Autonomic Problem • To allow the application to recover automatically from transient and intermittent software failures.

  4. The Approach • Introduce three ideas: • Macroanalysis (fault detection) • Microrebooting (rapid recovery) • External management (of recovery actions) • Integrate and test with JBoss

  5. Design Overview • Autonomous Process • Monitoring • Java probes • Fault detection • Generate Anomaly report • Recovery • Takes action • Total time to recovery.

  6. J2EE Review • J2EE enterprise apps = collection of reusable Java modules • JSPs / servlets invoke EJBs, which invoke other EJBs, ... • EJB = Java component that complies with a certain interface and provides a service • Deployment descriptor (per-bean XML file) conveys run-time characteristics and dependencies; used in deploying the application

  7. JBoss Design • Open-source J2EE app server • Written entirely in Java • Microkernel with components held together by JMX (management support)

  8. JAGR = ROC-ified JBoss with Application-Generic Recovery • 3-tier architecture • Key components • Macroanalysis engine • Microrebooting hook • Recovery manager

  9. Pinpoint : Detection and Localization • Store observations • IP address of machine, timestamp • Globally unique request ID • # of calls/returns to EJBs • Association between sender and receiver • Collect SQL queries (updates, reads)
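The observations listed on this slide can be pictured as one small record per call or return. The following is only a minimal sketch of such a record and of counting calls per sender/receiver pair; the class name, field names, and sample values are hypothetical and are not taken from Pinpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One monitored event (hypothetical layout, not Pinpoint's actual schema)."""
    request_id: str   # globally unique request ID
    host_ip: str      # IP address of the machine that saw the event
    timestamp: float  # when the event was observed
    sender: str       # calling component
    receiver: str     # called component
    event: str        # "call" or "return"


def count_calls(observations):
    """Count 'call' events for each (sender, receiver) pair."""
    counts = {}
    for obs in observations:
        if obs.event == "call":
            key = (obs.sender, obs.receiver)
            counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up trace for a single request.
trace = [
    Observation("req-1", "10.0.0.1", 0.00, "servlet", "ItemBean", "call"),
    Observation("req-1", "10.0.0.1", 0.01, "ItemBean", "UserBean", "call"),
    Observation("req-1", "10.0.0.1", 0.02, "ItemBean", "UserBean", "return"),
    Observation("req-1", "10.0.0.1", 0.03, "servlet", "ItemBean", "return"),
]
call_counts = count_calls(trace)
print(call_counts)
```

Per-pair counts like these are the kind of input a statistical comparison of current and historical behavior would need.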

  10. Pinpoint : Analysis • Analysis engine • Centralized engine • Plugin-based architecture • Modeling components • Assume that present component behavior and historical (normal) behavior have the same probability distribution. • Chi-square test to determine whether the two distributions differ.
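The chi-square comparison mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming a component's behavior is summarized as counts of calls to its neighbors; the function name and all sample numbers are hypothetical, and Pinpoint's actual scoring and normalization are not shown here.

```python
def chi_square_statistic(observed_counts, historical_fractions):
    """Pearson chi-square statistic of observed counts against a historical distribution.

    Links that are absent from the historical distribution are ignored in this sketch.
    """
    total = sum(observed_counts.values())
    statistic = 0.0
    for link, fraction in historical_fractions.items():
        expected = total * fraction
        if expected == 0:
            continue  # avoid dividing by zero for links never seen historically
        diff = observed_counts.get(link, 0) - expected
        statistic += diff * diff / expected
    return statistic


# Hypothetical historical (normal) call mix for one component.
historical = {"ItemBean": 0.5, "UserBean": 0.3, "BidBean": 0.2}

# Two hypothetical observation windows of 100 calls each.
normal_window = {"ItemBean": 50, "UserBean": 30, "BidBean": 20}
skewed_window = {"ItemBean": 80, "UserBean": 15, "BidBean": 5}

normal_stat = chi_square_statistic(normal_window, historical)
skewed_stat = chi_square_statistic(skewed_window, historical)
print(normal_stat, skewed_stat)
```

A larger statistic means the current window looks less like the historical distribution; how that value is turned into an anomaly score is left open in this sketch.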

  11. Recovery : microreboot is not expensive • State segregation • Store important state outside the application, in a database. • Persistent state • CMP (container-managed persistence, J2EE) is a requirement for the prototype. • Session state • Stored in a modified SSM (external session state store) • Containment and reintegration • Microreboot the transitive closure of all inter-EJB references • XML deployment descriptors determine the grouping for the closure • Complete reboot or microreboot
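The "transitive closure of all inter-EJB references" can be illustrated with a small graph traversal. This is a sketch under assumptions: the reference map stands in for what would be read from the XML deployment descriptors, the component names are made up, and following references only in the declared direction is a simplification of this example.

```python
from collections import deque


def microreboot_group(start, references):
    """Return every component reachable from `start` by following references."""
    group = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in references.get(current, ()):
            if neighbor not in group:
                group.add(neighbor)
                queue.append(neighbor)
    return group


# Hypothetical reference map (component -> components it references).
references = {
    "CommitBid": ["Item", "User"],
    "Item": ["Category"],
    "BrowseCategories": ["Category"],
}

group = microreboot_group("CommitBid", references)
print(sorted(group))
```

Components outside the closure (here `BrowseCategories`) would keep running while the group is restarted.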

  12. Recovery • Enabling microreboot • Method in the JBoss EJB container • Preserve the class loader

  13. Manage Recovery • Recovery policy • Read the failure report; consider components with a score > 1.0 • Microreboot the top n, or all components scoring > 1.0 • Allow a delay (~30 sec) • If the error is still present, retry a few times or reboot completely • Finally, report it to the sys admin
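The escalation policy on this slide can be sketched as a short loop. This is only an illustration: the function and parameter names are hypothetical, the callbacks stand in for the real recovery manager hooks, and the retry count is an assumption (the slide only says "a few times").

```python
def recover(read_scores, microreboot, full_reboot, notify_admin,
            threshold=1.0, max_attempts=3, wait=lambda: None):
    """Microreboot suspects, escalate to a full reboot, then to a human.

    `wait` is a no-op here so the example runs instantly; the slide suggests
    a delay of roughly 30 seconds between acting and re-checking.
    """
    for _ in range(max_attempts):
        scores = read_scores()
        suspects = [name for name, score in scores.items() if score > threshold]
        if not suspects:
            return "recovered"
        microreboot(suspects)
        wait()
    if not any(score > threshold for score in read_scores().values()):
        return "recovered"
    full_reboot()
    wait()
    if any(score > threshold for score in read_scores().values()):
        notify_admin()
        return "escalated"
    return "recovered"


# Hypothetical run: the first report blames component "A", the second is clean.
reports = iter([{"A": 2.5, "B": 0.4}, {"A": 0.2, "B": 0.3}])
actions = []
result = recover(
    read_scores=lambda: next(reports),
    microreboot=lambda names: actions.append(("microreboot", names)),
    full_reboot=lambda: actions.append(("full_reboot",)),
    notify_admin=lambda: actions.append(("notify_admin",)),
)
print(result, actions)
```

Keeping the policy outside the application, as a function of failure reports only, is what lets the same manager be reused across applications.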

  14. Evaluation Test Framework • Applications • Petstore 1.1 (12 components, 233 Java files, 11K LOC) • Petstore 1.3.1 (47 components, 310 Java files, 10K LOC) • RUBiS (21 components, 500 Java files, 25K LOC) • Workload • Implemented client simulators driven by a transition table. • 350 clients (max utilization principle) • Faultload • Based on industry experience • No low-level hardware or OS faults.

  15. Evaluation : Detection • Results similar to other detectors • No discussion of absolute numbers? • Forced Java runtime/declared exceptions, call omissions and source code bugs • #1 How well was the fault detected? #2 How well was a major outage detected?

  16. Evaluation : Localization • Localization % for an algorithm per fault type • CIA > 85% • No absolute data again?

  17. Evaluation : Recovery • Introduce faults in SSM-RUBiS. • Restart SSM-RUBiS or microreboot the component. • Observations from 10 trials with 350 concurrent clients.

  18. Full vs. Micro Reboot • Injected a null-reference fault in SB CommitBid, then a corrupt User-Item, SB BrowseCategories and SB CommitUserFeedback. • Microreboot maintains a steady response. • 425 vs 3916 failed requests • 61527 vs 56028 successful requests • What error conditions did the other trials have?
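The request counts on this slide are easier to compare as failure rates. A small worked calculation, assuming the first number in each pair belongs to the microreboot run and the second to the full-reboot run (the slide does not label them explicitly):

```python
def failure_rate(failed, succeeded):
    """Fraction of all requests that failed."""
    return failed / (failed + succeeded)


# Counts copied from the slide; the micro/full assignment is an assumption.
micro_rate = failure_rate(failed=425, succeeded=61527)
full_rate = failure_rate(failed=3916, succeeded=56028)
failed_ratio = 3916 / 425

print(f"microreboot:  {micro_rate:.2%} of requests failed")
print(f"full reboot:  {full_rate:.2%} of requests failed")
print(f"full reboot failed {failed_ratio:.1f}x as many requests")
```

Under that assumption, the full reboot fails roughly nine times as many requests as the microreboot over the same experiment.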

  19. Total Recovery Time • Corrupt SB_ViewItem by setting it to NULL. • 19.4 sec total recovery time (TRT) • 18.5 sec of that spent in analysis • Pinpoint is the bottleneck in microreboot-based recovery.
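The bottleneck claim follows directly from the two timings on this slide. A quick check, using only those two numbers (the variable names are illustrative):

```python
total_recovery_s = 19.4  # total recovery time from the slide
analysis_s = 18.5        # portion spent in analysis, from the slide

analysis_share = analysis_s / total_recovery_s
remaining_s = total_recovery_s - analysis_s

print(f"analysis share: {analysis_share:.1%}")
print(f"everything else: {remaining_s:.1f} s")
```

Analysis accounts for about 95% of the recovery time, leaving under a second for everything else, so speeding up detection and localization would shorten recovery far more than speeding up the microreboot itself.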

  20. Is Pinpoint app-generic? • Upgrade to Petstore v.1.3.2 • Works for the confidence interval • How different was the updated version?

  21. Performance Overhead • Results for a 30-min fault-free run with 350 clients • In-memory vs. external (SSM) session state • Marshalling costs

  22. Assumptions • Well-defined interfaces for components (.NET, J2EE) • Deterministic call paths between components • No critical service requests • Training data for the statistical model • Guidelines (Crash-Only Software)

  23. Discussion • Overall a good paper, though the introduction is a bit verbose. • Integrating framework for earlier work by Candea. • Limitations of the present statistical model. • Shared EJB state • Modify JIT, disable microreboots (references, static variables) • Application-global data not scrubbed. • Cost-benefit: microreboot vs. total reboot

  24. Supplementary • Application server = operating system for Internet applications (instantiates app components in containers, provides runtime system services, integrates with the web server to make the app web-accessible) • http://people.epfl.ch/george.candea
