
Liberty Mutual High Availability Assessment & Optimization Project

This report outlines the findings and recommendations of a comprehensive assessment to improve Liberty Mutual's availability practices. It includes top 10 high-level considerations, implementation plans, and phases of the project.

jacobgeorge

Presentation Transcript


  1. Liberty Mutual High Availability Assessment & Optimization Project Final Report 3 December 2004

  2. Executive Overview Top 10 High Level Considerations • Introduction • Project Status • Top 10 Recommendations • Sample Implementation Considerations • Next Steps Appendices Confidential

  3. High Level Implementation Plans to improve Liberty Mutual Availability were the result of comparing Leading Practices, assessment of patterns and associated risk to availability • Phase I: Conducted Interviews, Gathered Data; 120 Recommendations, 10 Initiatives • Phase II: Reviewed Actual Outages, Compared to Infrastructure, Validated Phase I Data; 59 Recommendations • Phase III: 179 Recommendations evaluated against Leading Practice; Identify the most common outage causes; 7 Recommendation Groups; High Risk Recommendations from the 7 Groups; 10 Recommendations; High Level Implementation Considerations • Final Deliverable: Recommendations, Groups, Implementation Plans

  4. A few of the recommendations from the Phase I Report have either begun implementation or have been scheduled to start Phase I Initiative Implementation Status Began Process Review Began Incident & Problem Implementation Workshops being held Wide participation (PM, CM, & RAM) Change Mgmt Scheduled Start – March 05 Activated sub-tier element (Server Trending) Began effort – Portfolio mapping approved Partially Implemented Auto Event Logging Capability now Exists 120 Recommendations were generated during Phase I of the project

  5. In the Phase II assessment we discovered 71% of Liberty Mutual outage hours were avoidable in the 7 outages reviewed Phase II • There were 31 unique Personal Markets Outages reported in Mantis in August and September • Analysis of the 7 longest duration outage records indicates process related problems were responsible for approximately 76% of the outage time • Lack of a Production mirrored Quality Assurance environment was responsible for approximately 23% of the outage time 59 Recommendations were generated during Phase II of the project Confidential

  6. The engagement team made more than 175 recommendations to reduce the number of outages, improve recovery time, and reduce the impact of outages that do occur Status of Phase I & II Recommendations • Ninety-seven (97) recommendations have been assigned to Liberty Mutual managers, with no implementation target date • Twenty-five (25) recommendations have begun implementation or have a target date assigned • One (1) recommendation is awaiting management approval: Personal Markets has proposed an Application Portfolio Management (APM) project, and Application Decomposition will provide the initial data for APM • Fifty-five (55) recommendations are unassigned • One (1) recommendation is an extension of an existing Liberty Mutual project: SYSPLEX Distributor and Net390 are extensions to an existing Parallel SYSPLEX project After the Phase II assessment, many of the Phase I top 10 recommendations were validated and surfaced as repeatable recommendations
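
The status counts on this slide can be cross-checked against the 179 recommendations cited on slide 3 and the per-phase totals on slides 4 and 5. A minimal sketch; the dictionary labels are paraphrased for illustration and are not names used in the report:

```python
# Status counts as stated on slide 6; labels are paraphrased for illustration.
status_counts = {
    "assigned, no target date": 97,
    "begun or target date assigned": 25,
    "awaiting management approval": 1,
    "unassigned": 55,
    "extension of existing project": 1,
}

# Recommendations generated per phase, as stated on slides 4 and 5.
phase_totals = {"Phase I": 120, "Phase II": 59}

total = sum(status_counts.values())
print(total)                       # 179
print(sum(phase_totals.values()))  # 179
```

Both sums agree with the 179 recommendations evaluated against Leading Practice in Phase III.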

  7. The 175+ recommendations were compared to Leading Practices to determine the level of risk to availability. This resulted in 7 groups of recommendations. Phase III Recommendation Groups [Table: seven recommendation groups, each rated High risk]

  8. The seven high risk groups led to the top ten recommendations to address the most common Liberty Mutual outages Top Ten Recommendations All 10 Recommendations have High Level Implementation Plans in the following section Confidential

  9. The high level implementation considerations contained in the following section (samples shown below) provide insight in the initial development of project implementation plans for the 10 recommendations High Level Implementation Considerations Standard & Operational Practices • Recommendations • It is recommended that LM identify senior management champions for key infrastructure management processes. These champions will sponsor the development and implementation of formal processes and procedures supporting High Availability and enable IT to deliver user valued services more effectively and efficiently. • Have well defined roles and responsibilities with clearly understood ownership parameters • Benefit • Minimize overlapping of responsibilities, establish consistent process of handling Incidents, Problems and Change Management, etc. (mitigating impact to High Availability) Application Decomposition Recommendations • Develop customized Architecture Stack of all Liberty Mutual Infrastructure • Perform Application Decomposition on the entire Application Portfolio • Map the applications horizontally to show interrelationships between applications • Map the applications vertically to show interrelationships between applications and infrastructure • Map decomposed applications to Business Functions Benefits • Fully documented infrastructure including all hardware and software • Fully documented Application Portfolio • Fully documented Business Function to Application and Infrastructure • Provide initial input for Application Portfolio Management Confidential

  10. Review and implementation of the Availability recommendations should be a high priority of the Liberty Mutual senior management team Next Steps • Review and consider implementation of the remaining Top 10 recommendations • Review the remaining 55 unassigned recommendations and assign ownership • Prioritize unassigned recommendations • Establish action plans • Review other Markets deployment for single points of failure • Conduct Management Review of all recommendations to determine “value add” and progress • Follow Formal Project Planning process for all actionable recommendations • Track progress of active projects to established action plans • Communicate recommendations and actions • Consider periodic external Availability assessments using this document as a baseline Confidential

  11. Executive Overview Top 10 High Level Implementation Considerations • Standard and Operational Practices • Incident and Problem Management • Change Management • Document Application Flows • Application Decomposition • QA Mirror of Production • Centralized Event Logging • Intelligent Application Switching • Highly Available MQ • Parallel SYSPLEX Appendices

  12. Standard & Operational Practices 1 • Findings • Core systems management disciplines provide weak controls to maintain the integrity of the production environment • Found little evidence of formally documented, communicated and implemented infrastructure management processes and supporting working level procedures across all of IT and Personal Markets • Recommendations • It is recommended that LM identify senior management champions for key infrastructure management processes. These champions will sponsor the development and implementation of formal processes and procedures supporting High Availability and enable IT to deliver user valued services more effectively and efficiently. • Have well defined roles and responsibilities with clearly understood ownership parameters • Benefit • Minimize overlapping of responsibilities, establish consistent process of handling Incidents, Problems and Change Management, etc. (mitigating impact to High Availability) Confidential

  13. Good business and IT integration requires the coordinated balance of organization, technology (infrastructure and tools) and processes. Well written Standard Operations & Practices (SO&P) documents, and their purpose, will support effort to maintain this balance System Management Integration Description Business Context Like the three-legged stool, one missing piece adds additional stress and resources to the other two in order to obtain favorable service delivery Practices / Processes should be Defined, Documented and Repeatable Organization Process / Practice Performance Availability Information Technology Many organizations will look only at one leg, some at two and very few at all three Service Level Agreements Liberty Mutual should take a holistic approach to organizational solutions Confidential

  14. The overall concept for Standard Operations & Practices is to eliminate confusion, develop definitions or specifications for resources required and to provide instructions for implementing and executing the practice or process Description (continued) High level practices e.g. Manage Change Resource definitions Resource requirements Job descriptions Organization Sub-processes e.g. Plan Change Deployment Skill requirements Roles & responsibilities Training Inputs, Activities & Outputs SLAs Measurements Targets & incentives Functional requirements Tool requirements Procedures e.g. Change Request Form Tools & technology

  15. Process focuses on how work flows through an organization regardless of what functions it crosses or what technology supports it; it takes the user’s end to end viewpoint. Practice/ process elements must be clearly defined, to assure value added output. Practice or Process Considerations • A) Executive-level commitment is present, active and communicated to all employees • Because asset management in particular touches so many business processes and introduces change into the environment, executive-level sponsorship and focus is essential to ensure successful implementation and on-going improvements • Without executive-level commitment, efforts will not meet expectations and individual projects will fail • B) Development of common practices / processes follows a structured methodology and involves key stakeholders • A structured methodology ensures that pieces do not fall through the cracks; key stakeholders are needed to provide input on required improvements and to provide leadership in accepting the process • Without involvement of key stakeholders, the process is subject to skepticism and will not be accepted as the way to operate • C) Common practices / processes are used across the enterprise • Common practices / processes reduce redundant work efforts and improve productivity • D) Practices / processes are defined and documented; once documented, employees are trained in their usage. • Practices / process must be documented so that everyone can execute them with consistency • Failure to train employees on process usage will result in inconsistencies and increased cycle time Confidential

  16. Process focuses on how work flows through an organization regardless of what functions it crosses or what technology supports it; it takes the user’s end to end viewpoint. Practice/ process elements must be clearly defined, to assure value added output. Practice / Process Considerations • E) Practice / process objectives are defined and understood • Each process needs to have clearly defined results that can be measured • Without well-defined and understood objectives, people have a tendency to execute the activities and tasks without accepting accountability and ownership in getting to end-of-job • F) Practices / processes are regularly analyzed and optimized through continuous feed-back • A closed-loop process allows for improvements to be made • Without constant analysis, there is a strong possibility that opportunities to streamline or eliminate steps will be missed • G) Process defines tool functional requirements, helping to eliminate “tool wars” • Process defined in conjunction with tool capabilities provides for reasonableness in execution and a system based on business requirements; process helps compare tools in an ‘apples to apples’ fashion • Without process driving requirements, the wrong tool can be selected based on particular biases Confidential

  17. A review of the parent organization should be completed prior to the development of Standard Operational Practices to assure practices are not contradictory to the “Mission” of the organization High Level Implementation Consideration Organizational • Select which organizations will be assessed • Determine if the organization has the following items defined / documented and communicated: • Mission Statement • Objectives • Identification of Customer • Guidelines for developing Standard Operating Practices • If organization does not have any one of the items • Select individual to develop missing item • Establish time frame for completion Standard Operating Practice • As a minimum, determine which processes are frequently used and define • Identify or assign “Owner” of process (may not necessarily be writer or author) • Develop a “Standard Practice” document (example of “Table of Content” of practice on following page) Implementation and communication of the right practices can improve resource utilization and provide positive contribution to business objectives Confidential

  18. A standard format will assure practices are developed in a consistent manner and contain the minimum required information Example Standard Operating Practice - Table of Contents • Introduction (Overview of Practice) • Key Assumptions (Items that will influence the execution of the practice, e.g. hours of operation) • Mission (Purpose of practice) • Requirements (Business or Organizational requirements that the practice will address) • Objective (Goals of the practice – satisfaction of requirements) • Scope (Items that govern the boundaries of the practice) • Authorization (Individuals who developed the practice and who must approve the release of the practice) • Target Audience (Optional – indicating who the intended audience is for the written practice) • Practice Content (Description of the practice)

  19. The Problem Management process was not well defined, documented, or consistently applied. The Incident Management process was essentially non-existent. 2 • Findings • Root Cause Analysis documentation seldom documents the root causes or contributing causes of outages / incidents • Multiple unlinked problem management repositories impede Liberty Mutual’s ability to manage problems • Ownership for the end-to-end Problem Management process is not defined • Not all incidents are documented, nor are they all routed to a single Helpdesk • Recommendation • Implement Problem and Incident Management processes that include: • Dedicated resource to the Incident management process for coordinating service restoration, receiving, logging and tracking all incidents for the Liberty Mutual environment • Dedicated resource to the Problem management process for coordinating problem resolution, receiving, logging and tracking all problems for the Liberty Mutual environment • Assure multiple organizations are participants in the development of process standards • Assure processes are defined, documented and repeatable • Benefit • Minimize overlapping of responsibilities, establish consistent process of handling Incidents, Problems and Change Management, etc. (mitigating impact to High Availability) • Focus on prevention, not just fixing • Improved service delivery resulting in improved Service Level Agreement parameters (Improved Customer Satisfaction)

  20. Process Implementation Workshop “waterfall” for Incident and Problem Management Mission, Vision & Strategy IT Guiding Principles Data Elements Process Requirements Process Design Principles: Incident – Problem - Change Rationale Implications Tool Selection / Implementation Process Policies Procedures Standards: Severity Codes Incident, Problem & Change Codes Closure Codes, etc. Workflow Design Roles Responsibilities Process Procedures / User Guides = Steps Completed = Current Activity Operational Service Levels Confidential

  21. Liberty Mutual has established an Incident & Problem Management process implementation project that will define and document the processes. Additionally, representatives from various organizations will participate in setting standards.

  22. Develop an enterprise management system to record, control and manage all changes to the infrastructure and application environment, including procedures and policies, and undertake regular reviews of all changes 3 • Findings • Change Management procedures are not fully encompassing (Enterprise) • Fully integrated Change Request system not available • Ownership for end-to-end Change Management process is not defined • Recommendation • Develop Enterprise-wide Change Management process (based on Leading Practices) • To include: • Creation of a change review board with representation from all IT service sections • Develop a mission statement to define the role and scope of the Change Authority • Define Structured Roles and Responsibilities • Define a process for the initiation, approval, scheduling, execution and review of all changes to the environment • Selection and use of one tool to manage changes • Provide link to Configuration Management • Benefit • Increased prevention of changes impacting major systems • Consistent execution of process • Improved database capture • Knowledge transfer

  23. Change Management is the ongoing process concerned with the introduction of managed changes into the environment with minimal or no disruption to the network environment and its users [Figure: change cycle drawn as a clock face, with the labels Change Information (Impact, Risk, Back-out Plan); Change Requests; Document; Input (Technology, Business); Scheduling; Change Plans & Reports; CCB - Approval; Network Environment; Verify “No Conflict”]
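
The Verify “No Conflict” step in this cycle can be sketched as an overlap check across scheduled changes. This is a minimal illustration only: the `ChangeRequest` fields, the change IDs, the system names, and the rule that two changes conflict when they touch the same system in overlapping windows are all assumptions, not details from the report.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical change record; field names are illustrative."""
    change_id: str
    system: str
    start: datetime
    end: datetime


def conflicts(a: ChangeRequest, b: ChangeRequest) -> bool:
    """Assumed rule: same system and overlapping change windows."""
    return a.system == b.system and a.start < b.end and b.start < a.end


def find_conflicts(requests):
    """Return every conflicting pair of change IDs in the schedule."""
    pairs = []
    for i, a in enumerate(requests):
        for b in requests[i + 1:]:
            if conflicts(a, b):
                pairs.append((a.change_id, b.change_id))
    return pairs


schedule = [
    ChangeRequest("CR-1", "web-tier", datetime(2005, 3, 5, 22), datetime(2005, 3, 6, 2)),
    ChangeRequest("CR-2", "web-tier", datetime(2005, 3, 6, 1), datetime(2005, 3, 6, 3)),
    ChangeRequest("CR-3", "database", datetime(2005, 3, 6, 1), datetime(2005, 3, 6, 3)),
]
print(find_conflicts(schedule))  # [('CR-1', 'CR-2')]
```

A change review board could run a check of this kind before approval, so that conflicting requests are rescheduled rather than discovered in production.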

  24. Liberty Mutual has established a Change Management process implementation project that will define and document the process. Additionally, representatives from various organizations will participate in setting standards.

  25. Document Application Flows 4 • Finding • Application to server mappings were not available to rapidly determine failing component • Recommendation • Document the logical inter-tier traffic flow of existing applications and map to the underlying infrastructure components • Benefit • Facilitate problem determination through increased understanding of the overall architecture and how the major components of the system interconnect from a logical and physical perspective Confidential

  26. The complexity of highly available n-tier applications complicates problem management and can obscure component failure and hinder problem isolation Description • Infrastructure components are often duplicated within a tier • Exact path of traffic flow can be unpredictable • Failure of redundant components may go undetected • Being partially addressed by existing Liberty Mutual project [Figure: typical n-tier model, labels in slide order: Presentation GUI; End Users System (HTML, Windows, Forms, etc.); Presentation Layer Logic; The Server Side Web (Java, HTML, XML); Client Interface; Distributed Logic; Proxy Tier; Business Tier; Business Object and Rules; Data Manipulation; Data Access Tier; Stateless I/O Interface to Backend; Data Tier; Storage]

  27. Step by step inter-tier communication flows should be clearly documented to facilitate problem isolation Example [Figure: numbered communication flow, steps 1 to 10, between Client, Switch, Web, Directory, Application and Data]
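
Once a flow is documented step by step, problem isolation can walk the hops in order and stop at the first one that fails. The sketch below assumes a hop order across the tiers named on the slide; the slide's own step numbering is not reproduced, and `probe` is a stand-in for a real connectivity check.

```python
# Hypothetical request path through the tiers named on the slide. The hop
# order is an assumption made for this sketch.
FLOW = [
    ("Client", "Switch"),
    ("Switch", "Web"),
    ("Web", "Directory"),
    ("Web", "Application"),
    ("Application", "Data"),
]


def first_failing_hop(flow, hop_ok):
    """Walk the documented flow in order and return the first hop whose
    check fails as (step, source, destination), or None if all succeed."""
    for step, (src, dst) in enumerate(flow, start=1):
        if not hop_ok(src, dst):
            return step, src, dst
    return None


def probe(src, dst):
    """Stand-in for a real probe: pretend the Application tier is down."""
    return dst != "Application"


print(first_failing_hop(FLOW, probe))  # (4, 'Web', 'Application')
```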

  28. Application Decomposition 5 Finding • There is little evidence of formally documented application to application dependencies and application to infrastructure requirements Recommendations • Develop customized Architecture Stack of all Liberty Mutual Infrastructure • Perform Application Decomposition on the entire Application Portfolio • Map the applications horizontally to show interrelationships between applications • Map the applications vertically to show interrelationships between applications and infrastructure • Map decomposed applications to Business Functions Benefits • Fully documented infrastructure including all hardware and software • Fully documented Application Portfolio • Fully documented Business Function to Application and Infrastructure • Provide initial input for Application Portfolio Management Confidential

  29. Document and cross reference the infrastructure and application environment to facilitate quick problem isolation High Level Implementation Consideration • Develop customized Architecture Stack of all Liberty Mutual Infrastructure • Perform Application Decomposition on the entire application portfolio • Map the applications horizontally to show interrelationships between applications • Map the applications vertically to show interrelationships between applications and infrastructure • Map decomposed applications to Business Functions Confidential

  30. The Architecture Stack Concept Description The stack is nothing more than a layered collection of “application and environmental groupings” that comprise stratum. The architecture of the application, and the Disaster Recovery RTO requirements, determine the Qualities that must be present in each Attribute identified within the Application’s Stack Cross-section. The Architecture Stack provides documentation to accurately determine the obtainable metrics for SLAs. In this simplified example, if the application has a DB2 database with Data Sharing, we can provide “X” response, with “Y” incident recovery time, and an RTO of “Z”. For a Business Application to function, it is dependent on a vertical cross-section of the Architecture Stack. [Figure: Architecture Stack layers with examples: Business Applications (CSW, PCA); Application Peripherals (Database); Host Environment (Hardware); Storage Environment (Disk); Infrastructure Services (Monitoring, Scheduling); Third Party Data (Incoming Feeds); Interfacing Services (Messaging); Security (Authentication); Network (Traffic Routing). Cross-section example: Database options DB2, Sybase, DB2 UDB, with the attribute Data Sharing]
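
The vertical cross-section idea, combined with the horizontal application-to-application mapping recommended earlier, can be sketched as two small lookup tables. Application and product names echo the deck; which application uses which component, and the dependency of CSW on PCA, are invented purely for illustration.

```python
# Hypothetical vertical mappings: application -> {layer: component}.
VERTICAL = {
    "CSW": {"Database": "DB2", "Messaging": "MQ", "Network": "Traffic Routing"},
    "PCA": {"Database": "Sybase", "Messaging": "MQ", "Network": "Traffic Routing"},
}

# Hypothetical horizontal mappings: application -> applications it depends on.
HORIZONTAL = {"CSW": ["PCA"], "PCA": []}


def directly_affected(component):
    """Applications whose vertical cross-section includes the component."""
    return {app for app, stack in VERTICAL.items() if component in stack.values()}


def impacted(component):
    """Directly affected applications plus any application that depends,
    directly or transitively, on one of them."""
    hit = directly_affected(component)
    changed = True
    while changed:
        changed = False
        for app, deps in HORIZONTAL.items():
            if app not in hit and any(d in hit for d in deps):
                hit.add(app)
                changed = True
    return sorted(hit)


print(impacted("Sybase"))  # ['CSW', 'PCA']
print(impacted("DB2"))     # ['CSW']
```

With mappings like these, a failing component can be translated into the list of affected applications, which is the quick problem isolation the decomposition is meant to support.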

  31. Liberty Mutual has attempted to kick off an Application Portfolio Management (APM) project from the Application and Infrastructure perspectives High Level Implementation Consideration • Enterprise IT Services has started an APM project • Application dependency on Infrastructure is mapped by NetFlow • Maps transaction flow from one server to another • Initial setup is labor intensive • This is the beginning of an Architecture Stack • All other Application dependency is waiting on Management project approval • This will be the Application Decomposition Confidential

  32. Application Decomposition is a process to determine dependencies and interdependencies of the application portfolio and produce an Application and Infrastructure Reference Architecture. We start with one-time processes to document and validate the Infrastructure. • Develop Architecture Stack • This is the infrastructure foundation upon which applications are mapped • IBM defines this as a layered inventory of products, by architecture layer, with the respective attributes and qualities of each product eligible for application intersection • Develop Stack Restrictions • Standards upon which recovery solutions are built • Provides leverage and input that the DR team contributes to the Application & Infrastructure Reference Architecture [Figure: Application Decomposition process. One-time steps: Develop Architecture Stack, Develop Stack Restrictions, Collect and Validate Template Data. Cycle: Start, Application Decomposition, Decomposition Validation, Current / Future State Gap Analysis, Design, Plan, Implement, Review, Future State. Methods: auto-discovery tools, data collection templates, interviews, workshops, working sessions, reviews]

  33. In preparation for the Application Decomposition, preliminary data must be gathered, assessed and verified. This data, in conjunction with the Architecture Stack, will be used to create multiple choice templates for the interview processes. Collect and Validate Template Data • Validate Guiding Principles • Collect and validate current and anticipated Guiding Principles that impact the Recoverability and Availability design modules • Collect Application Tier Definitions • Review and assess application Tier definitions, assignments, and recovery sequencing • Create Application Decomposition Template • Business descriptions • Application descriptions • Recovery options • Application dependency options • Infrastructure options

  34. The Application Decomposition interview process is repeated for each application in the selected portfolio • Conduct Application Decomposition Interviews • Interview Application technical owners • Record all information in Application Decomposition Templates • Validate Application Decomposition • Validate interview data against Guiding Principles and Architecture Stack • Review, assess and add to the Application to Architecture Stack Mapping • Review, assess, and add to the Business Process Flows and Interdependencies • Review, assess, and add to the Business Activity Flows and Interdependencies (Business Substrate Mapping) • Record all validated information in Application Decomposition Database

  35. Application Decomposition output is vital for availability and recoverability enhancement projects • Gap Analysis • Identify gaps in current state Availability and Recoverability compared to capabilities • Identify gaps in future state Availability and Recoverability compared to capabilities • Identify current and future design gaps for goal achievement or enhancement • Design • Develop solutions for Availability gap closure • Develop solutions for Recoverability gap closure • Plan, Implement, and Review design • Repeat process until required Future State is achieved

  36. A complete Application Decomposition project provides valuable documentation to support current activities and simplify future design efforts Examples Business Function to Application Map • For each Line of Business, a mapping of applications to Business Functions • Provides graphical data to translate end user terminology into IT lingo Business Function to Infrastructure Map • For each Line of Business, a mapping of Business Functions to applications and vertically through the Infrastructure • Provides the most complete picture of Infrastructure requirements [Figure: Personal Markets example relating the business functions View Policy Data, Save Customer Data and Search Vision Data to the applications CSW and PCA]

  37. The Quality Assurance environment should mirror the Production environment in Hardware and Software 6 • Finding • An Outage occurred on August 8, 2004 in the WAS environment • Most of the outage would have been avoided if the Quality Assurance environment had the same software configuration as the Production environment • Recommendation • Make the Quality Assurance Environment a Mirror of the Production Environment • Benefits • Will allow discovery of software mismatches during the testing phase of application development • Better assurance production changes have been tested in a production like environment Confidential

  38. The Quality Assurance environment should be scalable to the Production environment in Hardware and Software. Change Management should be utilized to guarantee matching environments. Description [Figure: Quality Assurance Environment and Production Environment side by side, each layered as F5 load balancer, Servers, OS, Layered Software, Monitoring Software, Security / Directory, Data Source] • Load balancing mechanisms should mimic Production • QA and Production servers should be of similar architecture • Operating Systems should match production configuration, version and patch level • Layered application and monitoring software should match production exactly • Directory and security services should be representative of production systems • Database versions and schema should mirror production

  39. To implement a Quality Assurance environment that mirrors Production, the following high level steps should be performed • High Level Implementation Consideration • Ensure servers in the QA environment are of similar architecture to the expected target Production servers • Ensure all Operating Systems used in the QA environment match the Production equivalents in configuration, version, and patch levels • Ensure all layered application and monitoring software are exact matches to the Production environment • Ensure the load balancing mechanisms in the QA environment mimic the Production environment • Ensure directory and security services in the QA environment are representative of Production systems • Ensure database versions and schema in the QA environment mirror production • Establish Change Management procedures to guarantee matching environments
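
Several of these checks amount to comparing a QA configuration snapshot against its Production equivalent and reporting any drift. A minimal sketch, assuming each environment can be summarized as a flat dictionary; the keys and version strings below are invented for illustration.

```python
# Hypothetical configuration snapshots; keys and values are invented.
production = {
    "os_version": "5.2",
    "os_patch_level": "ML04",
    "app_server": "5.1.1",
    "monitoring_agent": "3.0",
    "db_schema": "2004-07",
}
qa = {
    "os_version": "5.2",
    "os_patch_level": "ML03",
    "app_server": "5.0.2",
    "monitoring_agent": "3.0",
}


def config_drift(prod, other):
    """Return {key: (prod_value, other_value)} for every key whose value
    differs or is missing on either side."""
    drift = {}
    for key in sorted(set(prod) | set(other)):
        if prod.get(key) != other.get(key):
            drift[key] = (prod.get(key), other.get(key))
    return drift


for key, (p, q) in config_drift(production, qa).items():
    print(f"{key}: production={p} qa={q}")
```

Running a comparison like this as part of Change Management would surface software mismatches during testing rather than after a production release.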

  40. Centralized Logging 7 • Finding • There is no standardized/centralized notification/alerting/logging system in place to collect, correlate and analyze system events • Recommendation • Institute a centralized logging/event screening and notification project to collect and transmit server and network event messages to a dedicated monitoring server with a SQL database for persistent storage, event notification and centralized log record management • Benefits • Critical messages can be routed in near real time to the appropriate resource for immediate action • Messages will be cataloged for future reference • Statistical analysis of errors and events
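
The persistent storage and statistical analysis benefits can be illustrated with a small SQL example. In this sketch an in-memory SQLite database stands in for the central SQL database; the table layout, host names, and sample events are invented.

```python
import sqlite3

# In-memory SQLite stands in for the central SQL database; the schema and
# the sample events are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (host TEXT, severity TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("web01", "err", "connection pool exhausted"),
        ("web01", "warning", "slow response"),
        ("db01", "err", "tablespace full"),
        ("web01", "err", "connection pool exhausted"),
    ],
)

# Count error-level events per host, most affected host first.
rows = conn.execute(
    "SELECT host, COUNT(*) FROM events WHERE severity = 'err' "
    "GROUP BY host ORDER BY COUNT(*) DESC, host"
).fetchall()
print(rows)  # [('web01', 2), ('db01', 1)]
```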

  41. Central Logging Architecture and Data Flow from an event log entry through the creation of the variable format output Description [Figure: data flow across three columns (Client, Logging Server, Output). Components shown: Event Log, syslog, Local Log file, syslog-ng, Central Log file, SWATCH, Filter script, MySQL, PHP, Apache, sendmail, Email, Page, Problem Ticket, HTML, Pro Active Net]

  42. To implement a Centralized Logging, Event Screening and Notification System, the following high level steps should be performed High Level Implementation Consideration • Analyze Selected Event: A process should be established to identify and analyze logging requirements as they are identified or introduced into the environment. Analysis should define events to be collected, routing of messages, data persistency, display and notification requirements. • Validate or create client routing: Every event message is assigned a level of importance consisting of syslog facility and priority codes. Client configuration files should be adjusted accordingly to either log, or log and forward, selected events. [Figure: implementation cycle: Start, Analyze Selected Event, Validate syslogd config on client, Validate syslog-ng config on server, Create SWATCH filter, Select notification mechanism, Test, Deploy, Future State]
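
In the syslog protocol the facility and severity are packed into a single PRI number, computed as facility × 8 + severity. The sketch below decodes a PRI value and applies a forwarding rule; the rule itself (forward anything at `err` or more severe) is a hypothetical policy, not one stated in the report.

```python
# Severity names in numeric order 0..7, as defined by the syslog protocol.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# A few standard facility codes (the full table has 24 entries).
FACILITIES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 16: "local0"}


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity) names."""
    facility, severity = divmod(pri, 8)
    return FACILITIES.get(facility, f"facility{facility}"), SEVERITIES[severity]


def should_forward(pri, threshold="err"):
    """Hypothetical policy: forward to the central server only when the
    event is at least as severe as the threshold (lower is more severe)."""
    return pri % 8 <= SEVERITIES.index(threshold)


print(decode_pri(34))      # ('auth', 'crit')
print(should_forward(34))  # True
print(should_forward(30))  # False (daemon.info)
```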

  43. High Level Implementation (continued)
      Validate or create central logging definition
      Log definitions perform log routing inside syslog-ng. Each is composed of a source definition, a filter and a destination definition.
      • Source: any or all local or remote logging clients
      • Filter: a regular expression constructed to detect patterns such as message source, severity or message content
      • Destination: a message-specific or host-specific log file, a specific user or group, a database, or any combination of these
      [Process flow diagram repeated from slide 42.]
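
The source, filter and destination structure described on slide 43 can be modeled in a few lines. This is a conceptual Python sketch, not syslog-ng configuration syntax, and it is not taken from the report; the `LogPath` class, its fields and the `"*"` wildcard are hypothetical names used only to show how the three parts fit together.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LogPath:
    """One log definition: which sources, which filter, which destinations."""
    sources: List[str]        # client host names; "*" stands for any client
    pattern: re.Pattern       # regular expression applied to the message text
    destinations: List[Callable[[str, str], None]]

    def route(self, host, message):
        """Deliver the message to every destination if source and filter match."""
        if "*" not in self.sources and host not in self.sources:
            return False
        if not self.pattern.search(message):
            return False
        for deliver in self.destinations:
            deliver(host, message)
        return True


# Example: any client, messages mentioning an error or failure,
# delivered to an in-memory list standing in for the central log file.
central_log = []
error_path = LogPath(
    sources=["*"],
    pattern=re.compile(r"error|fail", re.IGNORECASE),
    destinations=[lambda host, msg: central_log.append(f"{host}: {msg}")],
)
error_path.route("web01", "disk failure on /dev/sda")
error_path.route("web01", "user login ok")
```

After the two calls above, only the disk failure message has been written to `central_log`; the login message did not match the filter.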

  44. High Level Implementation (continued)
      Select Notification or Action Mechanism
      Any or all of:
      • Email
      • Console display
      • Text to pager or cell phone
      • Script or other program capable of automatic recovery or of opening a problem ticket
      Create SWATCH Filter
      SWATCH is a Perl utility that provides automated filtering of the messages fed to it by the syslog-ng process. It can filter at a very granular level, using user-defined regular expressions for pattern matching. Once a pattern is matched, SWATCH can execute the appropriate notification or action mechanism.
      [Process flow diagram repeated from slide 42.]
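
The pattern-then-action behavior described on slide 44 can be sketched as follows. This Python is only an illustration of the idea and is not SWATCH itself or its configuration format; the rule patterns are invented examples, and the two action functions are stand-ins that return a string rather than actually sending mail or opening a ticket.

```python
import re


def notify_email(line):
    """Stand-in for an email notification."""
    return f"EMAIL: {line}"


def open_ticket(line):
    """Stand-in for a script that opens a problem ticket."""
    return f"TICKET: {line}"


# Each rule pairs a regular expression with the actions to run on a match.
RULES = [
    (re.compile(r"kernel panic|out of memory", re.IGNORECASE), [notify_email, open_ticket]),
    (re.compile(r"authentication failure", re.IGNORECASE), [notify_email]),
]


def screen(lines, rules=RULES):
    """Run every line through the rules and collect the triggered actions."""
    results = []
    for line in lines:
        for pattern, actions in rules:
            if pattern.search(line):
                results.extend(action(line) for action in actions)
                break  # in this sketch the first matching rule wins
    return results
```

Calling `screen` on a batch of log lines returns one entry per triggered action, so a line that matches the first rule produces both an email and a ticket entry, while unmatched lines produce nothing.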

  45. High Level Implementation (continued)
      Test
      New and modified sensors should be evaluated in a suitable test environment prior to deployment.
      Deploy
      Modified configuration files should be deployed to production methodically. Careful consideration should be given to the resource consumption introduced by the changes. An authoritative copy of the configuration files should be transferred to a write-protected medium to protect it from being compromised.
      Analyze
      Once the alert sensor is in place, it should be monitored regularly to ensure it behaves as intended. Application or system changes may affect the format or volume of event messages. Log files and databases should be monitored for resource consumption after the initial configuration is established.
      [Process flow diagram repeated from slide 42.]
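
Two of the checks on slide 45 lend themselves to a small script: comparing a deployed configuration file against the authoritative copy, and watching log files for growth. The Python below is a minimal sketch under assumptions that are not in the report: the use of SHA-256 checksums, the function names and the size limit are all illustrative choices.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def config_matches(deployed, authoritative):
    """True if the deployed file is byte-for-byte identical to the authoritative copy."""
    return file_digest(deployed) == file_digest(authoritative)


def oversized_logs(log_dir, limit_bytes):
    """List the names of files in log_dir that are larger than limit_bytes."""
    return sorted(
        p.name
        for p in Path(log_dir).iterdir()
        if p.is_file() and p.stat().st_size > limit_bytes
    )
```

A scheduled job could call `config_matches` against the copy held on the write-protected medium and raise an alert on any mismatch, and could call `oversized_logs` to flag files that need rotation or investigation.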

  46. Intelligent Application Switching (Recommendation 8)
      Finding
      • The load balancing mechanism is not application aware, so it cannot detect downstream application failures and reroute traffic away from them.
      Recommendations
      • Consider Intelligent Application Switching to detect content failure and traffic congestion and reroute traffic accordingly.
      • Conduct a review of the application and integration architecture to identify areas in need of simplification and areas where Leading Practice applies.
      Benefits
      • Reduced downtime and improved service levels
      • Faster recognition and isolation of failed services
      • Flexible provisioning of services

  47. Intelligent Load Balancing and the Protocol Stack
      Description
      [Diagram: Protocol Stack. Layers: Application, Transport, Network, Interface, Hardware.]
      Current State
      • All network devices route traffic based on the protocol stack.
      • Existing Liberty Mutual load balancers route traffic according to the Network layer address and Application port of the next hop.
      • As long as the next service tier is available, the traffic will be forwarded.
      • Failures higher up in the protocol stack go undetected.
      Recommended State
      • With Application Layer switching, network traffic can be classified by:
        • Application
        • Purpose
        • Source and destination
      An intelligent switch can employ various enterprise policies specifying how to handle traffic.

  48. Intelligent Load Balancing – Current State
      Description (continued)
      [Diagram. Labels: Service Available, Failed Application.]
      • The Load Balancer forwards user requests to any available Web Server.
      • The downstream application failure is undetected by the Front End Load Balancer, and the user request fails.

  49. Intelligent Load Balancing – Future State
      Description (continued)
      [Diagram. Labels: Front End, Web Tier, Application Tier, Failed Application.]
      • The Application Load Balancer detects the downstream failure and forwards the user request via a specific path to an available resource.
      • The downstream application failure is bypassed, and the user experience is unaffected.
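
The difference between the current and future states on slides 48 and 49 comes down to which health check the load balancer applies. The Python below is a simulated sketch, not a description of any particular load balancer product; the `WebServer` fields, the two check functions and the round-robin selection are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WebServer:
    name: str
    port_open: bool     # what a network-layer check can see
    app_healthy: bool   # whether the downstream application actually responds


def port_check(server):
    """Current state: the server counts as available if its port answers."""
    return server.port_open


def application_check(server):
    """Future state: the downstream application must also be healthy."""
    return server.port_open and server.app_healthy


def pick_server(servers, check, request_number):
    """Round-robin over the servers that pass the given health check."""
    candidates = [s for s in servers if check(s)]
    if not candidates:
        return None
    return candidates[request_number % len(candidates)]


# web1 still accepts connections, but the application behind it has failed.
servers = [
    WebServer("web1", port_open=True, app_healthy=False),
    WebServer("web2", port_open=True, app_healthy=True),
]
```

With `port_check`, the first request is sent to `web1` and fails at the application tier. With `application_check`, `web1` is excluded and every request goes to `web2`.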

  50. Implementation
      Application load patterns are affected by both internally (application) and externally (user and infrastructure) induced causes. An effective load balancing scheme should recognize these variables and be specifically tuned to work in the Liberty Mutual environment.
      • Document Logical Architecture
        • Web service tier
        • Application service tier
        • Naming service tier
        • Data service tier
      • Document Physical Architecture
        • Cable plan
        • Routers and switches
        • Servers
      • Map Logical Architecture to Physical Architecture
      • Document IP Services
        • How these new IP services work
        • What benefit they provide
        • Availability strategies
