Performance Management Framework. ROC Retreat 6/17/2004. Topics. Overview – What is it, Why have it & Where has it been, Where does it fit? Fundamentals – What are the underlying principles and metrics? Architecture – Data collection (Observation, Analysis, Action)
Performance Management Framework ROC Retreat 6/17/2004 USBancorp
Topics
• Overview – What is it, Why have it, Where has it been & Where does it fit?
• Fundamentals – What are the underlying principles and metrics?
• Architecture – Data collection (Observation, Analysis, Action)
• Instrumentation – How are performance measurements collected?
• Analysis and Reporting – How are events analyzed & What are the reporting patterns?
• Alerts, Leveraging ROC Concepts and Next Steps
Overview - What is it?
Performance Engineering and Management is the ability to ensure that applications are designed to meet their response time and throughput requirements, and that they continue to meet them once in production.
Fundamentals: Concepts and Metrics
[Diagram labels: Infrastructure, Net., Facilities, Resource, Application, Load/Usage, Users, Performance, OS, DB, Methods, Quality, Response Time, Stability, Net. Components]
Instrumentation
• Measurements
• No Load (SoftProbes):
• Intrusive vs. Non-Intrusive
• Passive vs. Active
• Data Mining
• Application Logs
• System Logs
Architecture
• Throughout the development lifecycle, performance statistics are asynchronously collected, analyzed, and used to influence design and implementation decisions.
[Diagram labels: Application GUI Methods; Appl. Server Methods; ORB Interface & Methods; Application Methods; Database Methods; Perf Framework; Performance Analysis Repository; Analysis; Real-time Reporter; Charts; Alarms; Security/Audit; Training (Alert Rules)]
Instrumentation: Intrusive Application Instrumentation (Performance Framework)
• GUI
  • Business context, user, workstation, method, class
• Application rules
  • Specific method invocations
• 3rd party/calls to external modules and/or systems
• Database access
  • Selects, inserts, updates, deletes, stored procedures, DDL commands
Instrumentation: Performance Framework Class Structure (Highlighting Alert Pattern Classes)
Performance Repository
• Data Model
[Diagram entities: class, server, user, method, JVM, layer, event, environment, application, thread, status]
Event attributes: Begin timestamp, End timestamp, Latency (ms), Session_id
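As a rough illustration of the event attributes listed on this slide, the sketch below models one measured method invocation as a record whose latency is derived from its begin and end timestamps. The class name `PerfEvent`, the field names, and the sample values are hypothetical; the deck does not show the repository's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PerfEvent:
    """One measured method invocation (hypothetical field names)."""
    application: str
    layer: int
    class_name: str
    method: str
    session_id: str
    begin: datetime
    end: datetime
    status: str = "ok"

    @property
    def latency_ms(self) -> float:
        # Latency is derived from the begin and end timestamps.
        return (self.end - self.begin).total_seconds() * 1000.0


# Made-up sample event: a call that took 250 ms.
start = datetime(2004, 6, 17, 9, 0, 0)
event = PerfEvent(
    application="demo-app",
    layer=1,
    class_name="AccountService",
    method="getBalance",
    session_id="s-001",
    begin=start,
    end=start + timedelta(milliseconds=250),
)
```

Deriving latency instead of storing it separately keeps the three timing attributes consistent; a real repository might store all three for query speed.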
Performance Framework Benchmark Stats
• Response time:
• Rt_1 felt by application
• Rt_2 internal posting
Analysis and Reports
Analysis and Reports: Transaction Summary (10 bucket report)
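The report itself is not recoverable from this slide, but a 10 bucket summary can be sketched as a histogram of transaction latencies. The bucket boundaries below are assumptions chosen for illustration, and `ten_bucket_report` is a hypothetical helper, not the framework's API.

```python
from bisect import bisect_left

# Hypothetical upper bounds (ms) for the first nine buckets;
# the tenth bucket catches everything slower than the last bound.
BUCKET_UPPER_MS = [100, 250, 500, 1000, 2000, 3000, 5000, 10000, 30000]


def ten_bucket_report(latencies_ms):
    """Count how many latencies fall into each of the ten buckets."""
    counts = [0] * (len(BUCKET_UPPER_MS) + 1)
    for latency in latencies_ms:
        # bisect_left returns the first bucket whose upper bound is >= latency.
        counts[bisect_left(BUCKET_UPPER_MS, latency)] += 1
    return counts


# Made-up latencies in milliseconds.
report = ten_bucket_report([50, 100, 101, 700, 45000])
```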
Analysis and Reports
If you cannot see the problem…you cannot fix it!
[Charts: Arrival Rates and Response Times; Arrival Rate and Concurrency]
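The charts on this slide are not recoverable, and the deck does not say how concurrency was computed. One standard way to relate the plotted quantities is Little's Law: average concurrency equals arrival rate multiplied by mean response time. The sketch below applies it to made-up numbers; the function name is hypothetical.

```python
def average_concurrency(arrivals_per_second, mean_response_seconds):
    """Little's Law: mean requests in flight = arrival rate * mean response time."""
    return arrivals_per_second * mean_response_seconds


# Made-up example: 20 transactions/second at 0.5 s mean response time.
concurrency = average_concurrency(20.0, 0.5)
```

Read this way, a rise in concurrency at a steady arrival rate points to growing response times.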
Analysis and Reports: Denial of Service Attack (Day 1)
Analysis and Reports: Denial of Service Attack (Day 2)
What are Alerts and why do we need them?
• The ability for an application to assess when it can’t perform its functions correctly or meet service levels, and then report the failures to “someone who cares”.
• If an application is sick and can’t perform some or all of its functions, what better way (fast and precise) to be notified than having the application “tell you” exactly what’s wrong?
You need to walk before you can run.
Today’s Application Alert Architecture (outside looking in)
• Device Monitors for Servers via OpenView
• Application http/https Monitors via ISM
[Diagram labels: Device Monitoring; OV; Event driven; Applications and Infrastructure Devices; CIC; Application Monitoring (ISM); polling]
Tomorrow’s Alert Architecture (inside looking out)
• Application Problem Determination as Presented in 2002/2003
[Diagram labels: PANIC; Device Monitoring; OV; Event driven; Application Alerts via OV using SNMP traps; Applications and Infrastructure Devices; CIC; Application Monitoring (ISM); polling]
Tomorrow’s Alert Architecture (inside looking out)
• Application Problem Determination as suggested using what’s in place today!
[Diagram labels: Device Monitoring; OV; Event driven; Root Cause Analysis; Application Alerts via OV; Applications and Infrastructure Devices; CIC; Corrective action; Application Monitoring (ISM); polling]
Performance Framework’s Alert Functionality
• Performance Framework has been integrated into the bank’s Application-WebSphere Framework so that all applications that use it are monitored for response time, throughput, quality and stability. The Performance Framework is currently being rewritten for the bank’s Application-.NET Framework.
• The main purpose of the Performance Framework is to instrument applications so that performance-related statistics can be measured and subsequently analyzed. As a by-product of data collection, real-time analysis software was added in 2002 to identify when the target application is not functioning as designed or within performance tolerances.
• We chose not to implement it in 2002/2003 because the bank’s implementation of the problem management software was not sophisticated enough to assist in “root cause analysis” prior to involving an operator. Raw alerts would have overwhelmed the problem management process at the CIC.
Performance Framework’s Functionality
• Alert Types:
• Response Time – Are transactions (method invocations) meeting their expected latency?
• Stability (throughput) – Is the application processing transactions at the expected volume and throughput?
• Quality – Are the transactions (method invocations) error free? If not, what are the errors?
Performance Framework’s Functionality
• Alert Rule (Attributes)
###FORMAT####
#Alarm_Name (String: any unique label)
#Alarm_Type (String: Latency, error, stability)
#Alarm_Layer (integer: layer identifier)
#Alarm_Method_Name (String: Optional)
#Alarm_Days (N1-N2 - where Sunday = 1 and Saturday = 7)
#Alarm_Times (t1-t2 - range: 1-24 hrs)
#Sample_Size (>1, <=15)
#Alarm_Threshold (integer - this will be used as min Arrivals for Stability Alarms)
#Alarm_Forgiveness (integer)
#Alarm_Message (Text, white space allowed)
####
#The Format will be in order (top to bottom) delimited with a comma
stab_1,stability,1,null,1-7,1-24,3,101,0.0,This is a stability alarm
err_1,error,1,null,2-6,7-20,2,0,5.0,Errors Greater than 5 pct
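The comma-delimited rule format on this slide can be read with a small parser. The sketch below is one possible reading, not the framework's own code: `AlertRule` and `parse_alert_rule` are hypothetical names, `null` is assumed to mean "no method name", and forgiveness is parsed as a float because both sample rules use decimal values (0.0 and 5.0) even though the attribute is described as an integer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AlertRule:
    name: str
    alarm_type: str                 # latency, error, or stability
    layer: int
    method_name: Optional[str]      # None when the rule says "null"
    days: Tuple[int, int]           # Sunday = 1 ... Saturday = 7
    hours: Tuple[int, int]          # 1-24
    sample_size: int                # >1 and <=15
    threshold: int
    forgiveness: float              # assumption: decimal values allowed
    message: str


def _parse_range(text):
    low, high = text.split("-")
    return int(low), int(high)


def parse_alert_rule(line):
    # Split on the first nine commas only, so the message may contain commas.
    fields = [field.strip() for field in line.split(",", 9)]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    (name, alarm_type, layer, method, days,
     hours, size, threshold, forgiveness, message) = fields
    sample_size = int(size)
    if not 1 < sample_size <= 15:
        raise ValueError(f"sample size out of range: {sample_size}")
    return AlertRule(
        name=name,
        alarm_type=alarm_type.lower(),
        layer=int(layer),
        method_name=None if method.lower() == "null" else method,
        days=_parse_range(days),
        hours=_parse_range(hours),
        sample_size=sample_size,
        threshold=int(threshold),
        forgiveness=float(forgiveness),
        message=message,
    )


# The two sample rules from the slide.
stab = parse_alert_rule(
    "stab_1,stability,1,null,1-7,1-24,3,101,0.0,This is a stability alarm")
err = parse_alert_rule(
    "err_1,error,1,null,2-6,7-20,2,0,5.0,Errors Greater than 5 pct")
```

Under this reading, `stab_1` applies every day and hour with a minimum of 101 arrivals, while `err_1` applies Monday through Friday, hours 7 to 20.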
Performance Framework’s Alert Functionality • Alert Types Architecture Layers
Components • Likely Black Box Mapping
Analysis Needs to Collaborate Between Components
[Diagram: two component stacks, each labeled JVM, Application, Layer, EJB, Class, Method]
What is needed to use it?
• Real-time Alert Analysis
• Collaboration between Rule-Types
• Is a stability alert real, or is it the by-product of a latency problem?
• Is a response time alert real, or is it the by-product of an error alert?
• Are any latency or stability alerts real, or are they pointing in the direction of the root cause?
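One way to picture collaboration between rule types is a simple precedence check over the alerts active on a component. The ordering below (error before latency before stability) is an assumption read from the questions on this slide, and `classify_alerts` is a hypothetical helper; the deck leaves the actual analysis approach open.

```python
# Assumed precedence: an error alert may explain a latency alert,
# and a latency alert may explain a stability alert.
PRECEDENCE = ["error", "latency", "stability"]


def classify_alerts(active_types):
    """Label each active alert type as primary or as a possible by-product."""
    active = [t for t in PRECEDENCE if t in active_types]
    if not active:
        return {}
    primary = active[0]
    return {
        t: "primary" if t == primary else f"possible by-product of {primary}"
        for t in active
    }


# Made-up case: latency and stability alerts fire together.
verdict = classify_alerts({"stability", "latency"})
```

A result like this would only be a first guess for an operator; it does not establish root cause.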
What is needed to use it?
• Real-time Alert Analysis (continued)
• Collaboration between components
• Multi-column applications need to isolate underlying infrastructure failures. When a subset of app columns delivers slow response times, is the mutual failure in the network or the mainframe?
• When a group of applications delivers slow response times while others are OK, is the mutual failure in the network or the mainframe, or …?
What is needed to use it?
• USBank Application Framework services that support Alert Analysis and Communication
[Diagram labels: Application 1 and Application 2, each with USBank Framework, Other Services, Alert Communication; CIC; Root Cause Analysis]
Next Steps
• Determine whether or not the Performance Framework’s approach is directional for problem determination.
• Determine the requirements for a robust Alert Analysis and Recovery process.
• Determine what role the bank’s application framework should play in providing support services.
• Determine whether to buy or build a strategic Analysis solution.