Advanced Management Technologies For Exchange 5.5 • Greg Todd, Program Manager, NT Solutions Group, BMC Software, Inc.
Agenda • Current issues with problem diagnosis • Application availability timeline • Theory of root cause analysis (RCA) • Primer on RCA • How RCA can help you today • Demos of RCA on Exchange 5.5 • Systems management vision • Management maturity curve • The future of Exchange management
The Business Problem • Event automation #1 priority of IT executives • Problem diagnosis is a critical aspect that requires attention • Wasted Time: 80% of downtime spent diagnosing, 20% spent fixing • Wasted Resources: diagnosis often a finger-pointing exercise • Frustrated Users: users have no idea what to expect (Source: Gartner, 1998)
Application Availability Timeline • Points on the timeline: Point of Failure (PoF), Point of Notification (PoN), Point of Diagnosis (PoD), Point of Recovery (PoR), Point of Postmortem (PoP) • Phases: Monitoring, Analysis, Recovery, Evolution
Application Availability Timeline • Application violating service level over time: PoF, PoN, PoD, PoR, PoP • Phases: Monitoring, Root Cause Analysis, Recovery, Evolution
Application Availability Timeline • Application violating service level: significant decrease • Timeline: PoF, PoN, PoD, PoR, PoP • Phases: Monitoring, Root Cause Analysis, Recovery, Evolution • Diagnosis time reduced • Faster service restoration
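The effect of shortening the diagnosis phase can be put into numbers. The sketch below is only an illustration: it assumes a hypothetical 100-minute outage split 80/20 between diagnosis and fixing (the proportions cited from Gartner earlier in the deck), and the function name `total_downtime` and the speedup factor are made up for the example.

```python
def total_downtime(diagnosis_min: float, fix_min: float, diagnosis_speedup: float) -> float:
    """Return total downtime when diagnosis runs `diagnosis_speedup` times faster.

    Only the diagnosis portion shrinks; the time spent fixing is unchanged.
    """
    if diagnosis_speedup <= 0:
        raise ValueError("diagnosis_speedup must be positive")
    return diagnosis_min / diagnosis_speedup + fix_min


# Hypothetical outage: 80 minutes diagnosing, 20 minutes fixing.
baseline = total_downtime(80, 20, diagnosis_speedup=1)
with_faster_diagnosis = total_downtime(80, 20, diagnosis_speedup=4)
print(baseline, with_faster_diagnosis)
```

Because diagnosis dominates the outage in this assumed split, a fourfold faster diagnosis cuts the total from 100 to 40 minutes, while making the fix four times faster would only cut it to 85.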
Benefits Of RCA • Based on well-established theories • Quicker problem resolution • Problem isolation saves resources to address the real problem • Symptom filtering allows administrator to ignore sympathetic events • Performs tests to find the root cause • Far superior to rules-based approach • Key enabler to make systems self-sufficient • Provides impact analysis capability
RCA Key concepts • Symptoms are problems to be investigated • Faults are the root causes of these symptoms • Tests are active tasks which gather information • RCA is a problem analysis methodology geared towards finding the real cause of a problem and preventing it from happening again.
Rules-Based Approach Vs. RCA • Rules-Based: symptom received; possible causes looked up in a fixed table of rules; set of possible causes presented to user; only suggested actions can be provided to user • Root Cause Analysis: symptom received; possible causes determined from a generic fault model; each suspected cause is tested; actual root cause is presented to user after suspects are eliminated; specific actions can be provided to user
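The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not the PATROL implementation: the fault names, the `FAULT_MODEL` table, and both function names are hypothetical, and for brevity the same table stands in for both the fixed rule table and the fault model.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from a symptom to the faults that could explain it.
FAULT_MODEL: Dict[str, List[str]] = {
    "queue_growth": [
        "cpu_usage_high",
        "memory_bottleneck",
        "mta_down",
        "network_problem",
    ],
}


def rules_based(symptom: str) -> List[str]:
    """Look the symptom up and return every possible cause, untested."""
    return list(FAULT_MODEL.get(symptom, []))


def root_cause_analysis(symptom: str, tests: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run the test for each suspect and keep only the confirmed faults."""
    suspects = FAULT_MODEL.get(symptom, [])
    return [fault for fault in suspects if tests[fault]()]


# Hypothetical test outcomes; real tests would actively probe the systems.
tests = {
    "cpu_usage_high": lambda: True,
    "memory_bottleneck": lambda: False,
    "mta_down": lambda: False,
    "network_problem": lambda: False,
}

print(rules_based("queue_growth"))
print(root_cause_analysis("queue_growth", tests))
```

The lookup hands the administrator four candidates to investigate by hand; the test-driven version eliminates three of them and leaves a single confirmed fault to act on.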
Root Cause Analysis For Exchange Server • Three components that work synergistically: Exchange Server, Windows NT, IP Network
High Level RCA Architecture • Enterprise Console • Mid-Level Manager • Managed Nodes (three shown)
RCA Architecture (diagram) • Components shown: Enterprise Console, Mid-Level Manager, Managed Nodes, mid-level agent, RCA Engine, Bridge, Javalink Bridge, Protocol Layer, ARB (Agent Request Broker), Custom ARB, RTEP (Realtime Event Proxy), Diagnostic KM, Monitor KM, other KMs • BMC PATROL Exchange Server and OS KMs
Root Cause Analysis Sample problem (diagram) • Elements shown: Exchange Servers A, B, and C; Outbound Server; Inbound Server; two Bridgehead Servers; Firewall; Remote Office Exchange Server D on a T1 link to the remote office; inbound and outbound messages to the Internet • Legend: Internal Mail; Internal & Internet Mail; Internet Mail
PATROL RCA Sample problem • Symptom received by model • Queue Growth Alarms from multiple Exchange Servers (Servers A, B, C, and D) • Suspected root causes found in model: CPU Usage High, Memory Bottlenecks, MTA down on target machine, Network Problem
PATROL RCA Sample problem • Suspected root causes tested: CPU Usage High, Memory Bottlenecks, MTA down on target machine, Network Problem • Root cause isolated • CPU usage high on bridgehead
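The sample problem can be walked through as a short sketch: several queue-growth alarms are folded into one symptom, and the four suspects are then tested until one remains. The alarm tuples, the function names, and the hard-coded test outcomes are assumptions chosen to mirror the slide; a real engine would run active tests against the servers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Alarms as (server, symptom) pairs, mirroring the four queue-growth alarms.
alarms: List[Tuple[str, str]] = [
    ("Server A", "Queue Growth"),
    ("Server B", "Queue Growth"),
    ("Server C", "Queue Growth"),
    ("Server D", "Queue Growth"),
]


def group_by_symptom(events: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Fold alarms that share a symptom into one entry listing the servers."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for server, symptom in events:
        grouped[symptom].append(server)
    return dict(grouped)


def isolate(suspects: List[str], results: Dict[str, bool]) -> List[str]:
    """Keep only the suspects whose test came back positive."""
    return [suspect for suspect in suspects if results[suspect]]


suspects = [
    "CPU Usage High",
    "Memory Bottlenecks",
    "MTA down on target machine",
    "Network Problem",
]

# Hypothetical test outcomes standing in for tests run on the bridgehead.
test_results = {
    "CPU Usage High": True,
    "Memory Bottlenecks": False,
    "MTA down on target machine": False,
    "Network Problem": False,
}

print(group_by_symptom(alarms))
print(isolate(suspects, test_results))
```

Grouping first means the administrator sees one incident instead of four separate alarms, which is the symptom-filtering benefit the deck describes.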
Demo Simple RCA Scenario
Demo RCA Engine Causal Directed Graphs
Demo Root Cause Analysis Exchange, NT, IP Network
Demo Impact Analysis Exchange, NT, IP Network
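The demos above are not captured in this transcript. As a rough illustration of how a causal directed graph can serve both root cause analysis and impact analysis, the sketch below walks a small graph in each direction. The graph contents and function names are hypothetical and are not taken from the RCA engine.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical causal graph: an edge X -> Y means "a fault in X can cause Y".
CAUSES: Dict[str, List[str]] = {
    "ip_network_down": ["nt_server_unreachable"],
    "nt_server_unreachable": ["exchange_mta_down"],
    "exchange_mta_down": ["queue_growth"],
    "cpu_usage_high": ["queue_growth"],
}


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start` along edges."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip every edge so effects point back at their possible causes."""
    flipped: Dict[str, List[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            flipped.setdefault(dst, []).append(src)
    return flipped


# Impact analysis: what could be affected if the IP network fails?
impact = reachable(CAUSES, "ip_network_down")
# Suspect generation: which faults could explain queue growth?
suspects = reachable(reverse(CAUSES), "queue_growth")
print(sorted(impact))
print(sorted(suspects))
```

Walking the edges forward answers the impact question; walking them backward produces the suspect list that the tests then narrow down.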
Benefits Of RCA • Based on well-researched theories • Quicker problem resolution • Problem isolation saves resources to address the real problem • Symptom filtering allows administrator to ignore sympathetic events • Performs tests to find the root cause • Far superior to rules-based approach • Key enabler to make systems self-sufficient • Provides impact analysis capability
Systems Management Vision Where’s all this stuff going?
Phases Of Management Maturity • Phases, from lowest to highest: MONITOR, MANAGE, CONTROL, STABILIZE, VIRTUALIZE • Based on commonly known process control theory • Applies directly to management of complex software systems
Maturity Phases MONITOR • Monitoring is plumbing • Included with Windows 2000 and Exchange 2000 • Server-centric data and event collection • Monitors component and system data • No awareness of other systems or apps • Basic alerting, scripting, and actions • WMI, PerfMon, HealthMon, Exchange 2000 monitoring
Maturity Phases MANAGE • Application-specific and server-centric • View and take action on components • Availability and performance monitoring • Rich reporting • Application SLA definition • ASAP resolution when out of compliance • Most correlation done in your head • Some tools have reached this level • Key enabler to Control phase
Maturity Phases CONTROL • Places system automation in control • Provides holistic view of systems • Enables high level of SLA compliance • Quick problem diagnosis • Action <--> Reaction • Proactive correction before users feel impact • Management automation maturing
Maturity Phases STABILIZE • Provides utility-level service • Reliable as electric, telephone, water • Assures continuous application service • Clusters • Built-in fault tolerance, re-routing, workload management • Failure does not impact service • Prediction / impact analysis • Awareness of impact on SLAs caused by planned changes
Maturity Phases VIRTUALIZE • The system learns how to intelligently deal with various issues • Automatic everything • Actions and responses for the IT group • Alerts and communications • Acquires and stores knowledge for future reference • Uses policy engines to control actions • Systems become truly self-sufficient • User becomes self-serviced
Virtualization Example: Problem Research Assistant • Correlates problem root cause diagnoses with: • Previous resolutions - presents the user with previous remedies based on exact matches or best guess • On-line technical documentation - integrates with vendor-supplied support documentation (e.g. Microsoft Knowledge Base articles) • Technical Support Request Generator - formats required user information and diagnosed fault into a support request, according to vendor-specific templates
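The "exact matches or best guess" lookup of previous resolutions could work along the following lines. This is a sketch under stated assumptions: the history entries and remedy labels are placeholders, the function name is hypothetical, and fuzzy string matching from Python's standard `difflib` module is swapped in for whatever correlation the product actually performs.

```python
import difflib
from typing import Dict, List, Tuple

# Placeholder history: diagnosed fault -> remedy that resolved it last time.
HISTORY: Dict[str, str] = {
    "cpu usage high on bridgehead": "remedy-001 (placeholder)",
    "mta down on target machine": "remedy-002 (placeholder)",
}


def previous_resolutions(
    fault: str, history: Dict[str, str], cutoff: float = 0.6
) -> List[Tuple[str, str]]:
    """Return (match kind, remedy) pairs: an exact match first, else best guesses."""
    if fault in history:
        return [("exact", history[fault])]
    # Fall back to the most similar past faults, best match first.
    close = difflib.get_close_matches(fault, list(history), n=3, cutoff=cutoff)
    return [("best guess", history[key]) for key in close]


print(previous_resolutions("cpu usage high on bridgehead", HISTORY))
print(previous_resolutions("cpu usage high on bridgehead server", HISTORY))
```

Labeling each result as "exact" or "best guess" lets the user judge how much to trust a suggested remedy.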
Virtualization Example: Problem Research Assistant (diagram) • Components shown: Problem Research Assistant, Correlation Backend, Bridge, RCA Server, Domain Models, IP Reachability Analyzer, Help • Data shown: Diagnosed Faults, Support Requests, Previous Resolutions, Online Technical Articles, Problem Response History Repository
RCA Takes Management To The Next Level • Maturity phases: MONITOR, MANAGE, CONTROL, STABILIZE, VIRTUALIZE • Root Cause Analysis • Many Players, Many Choices
Summary • GOAL: No interruptions in service • RCA is key to Exchange availability • Accelerates the diagnosis process • Can assess impact of failures beforehand • Not unreasonable to achieve “five 9’s” • RCA paves the way to virtualization • Managed systems that learn and adapt • You never have to intervene • Free to invest more time in proactivity • RCA is in beta now!!
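For scale, "five 9's" means 99.999% availability. The short calculation below converts an availability target into a yearly downtime budget; the function name is made up for the example, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three 9's", 0.999), ("four 9's", 0.9999), ("five 9's", 0.99999)]:
    print(f"{label}: {allowed_downtime_minutes(target):.2f} minutes per year")
```

At five 9's the budget is roughly 5.26 minutes per year, which leaves very little room for manual diagnosis.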
Call To Action • Demand sophistication and simplicity in Exchange management solutions • Solutions that learn • Solutions that are easy to use • Start thinking of Exchange availability in terms of utility-level service • Consider where to implement RCA in your current environment • Bring along those whom you service • Take care of your users • Communicate with them as you progress