Incremental Detection and Visualization of Problem Patterns – a “Simplified” Symptomatic Event Visualizer – Marcelo Perazolo, Autonomic Computing Architecture, mperazolo@us.ibm.com • Abdi Salahshour, Autonomic Computing Technology & Development, abdis@us.ibm.com
Incremental Detection and Visualization of Problem Patterns – a “Simplified” Symptomatic Event Visualizer – • Marcelo Perazolo, Autonomic Computing Architecture, mperazolo@us.ibm.com • Abdi Salahshour, Autonomic Computing Technology & Development, abdis@us.ibm.com • April 25-26, 2006
Agenda • Statement of Problem • What is the Common Event Format • What is the Symptoms Reference Format • A Solution • Conclusion • Helpful Links
Problems Facing Today's Data Collection • Complexity of e-Business • Collection of distributed and heterogeneous software and hardware components • Variety of Data and Collectors/Adapters • Consume and publish proprietary data formats • Require ad hoc and product-specific code • Data format and APIs • Design and Standards considerations • Different skill sets to configure, maintain, and tune • Difficult to correlate for end-to-end (e2e) problem diagnostics • Instrumentation • Many-to-Many • Standards compliance • Customer pain and cost of ownership
[ibm][db2][jcc][t4] 0150 0400162110E2C1D4 D7D3C5F140404040 ...!........@@@@ .....SAMPLE1 [ibm][db2][jcc][t4] 0160 4040404040404000 59D0030003005324 @@@@@@@.Y.....S$ ..}...... [ibm][db2][jcc][t4] 0170 0800640000003032 30303053514C5249 ..d...02000SQLRI .............<.. [ibm][db2][jcc][t4] 0180 4558540001000480 0100000000000000 EXT............. ................ [ibm][db2][jcc][t4] 0190 0000000000000000 0000000020202020 ............ ................ [ibm][db2][jcc][t4] 01A0 2020202020202000 1253414D504C4531 ..SAMPLE1 ...........(&<.. [ibm][db2][jcc][t4] 01B0 2020202020202020 20202000000000FF ..... ................ [ibm][db2][jcc][t4] [ibm][db2][jcc][ResultSetMetaData@108ac50a] BEGIN TRACE_RESULT_SET_META_DATA [ibm][db2][jcc][ResultSetMetaData@108ac50a] Result set meta data for statement Statement@2b2cc50a [ibm][db2][jcc][ResultSetMetaData@108ac50a] Number of result set columns: 1 isDescribed=true[ibm][db2][jcc][ResultSetMetaData@108ac50a] Column 1: { label=BALANCE, name=BALANCE, type name=DECIMAL, type=3, nullable=1, precision=9, scale=2, schema name=TEST , table name=ACCOUNTS, writable=false, sqlPrecision=9, sqlScale=2, sqlLength=0, sqlType=485, sqlCcsid=0, sqlName=BALANCE, sqlLabel=null, sqlUnnamed=0, sqlComment=null, sqludtxType=<null>, sqludtRdb=<null>, sqludtSchema=<null>, sqludtName=<null>, sqlxKeymem=0, sqlxGenerated=0, sqlxParmmode=0, sqlxCorname=ACCOUNTS, sqlxName=BALANCE, sqlxBasename=ACCOUNTS, sqlxUpdatable=0, sqlxSchema=TEST , sqlxRdbnam=SAMPLE1, internal type=3, is locator parameter=false } [ibm][db2][jcc][ResultSetMetaData@108ac50a] { sqldHold=0, sqldReturn=0, sqldScroll=0, sqldSensitive=0, sqldFcode=85, sqldKeytype=0, Event Logging source=com.ibm.ws.rsadapter.spi.WSRdbDataSource org=IBM prod=WebSphere component=Application Server <init> [11/25/03 14:14:33:695 EST] 42754514 > UOW= source=com.ibm.ws.rsadapter.DSConfigurationHelper org=IBM prod=WebSphere component=Application Server createDataStoreHelper 
parm1=com.ibm.websphere.rsadapter.CloudscapeDataStoreHelper parm2={} [11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.GenericDataStoreHelper org=IBM prod=WebSphere component=Application Server init parm1=com.ibm.websphere.rsadapter.CloudscapeDataStoreHelper@2128451b [11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.DataStoreHelperMetaData org=IBM prod=WebSphere component=Application Server setGetTypeMapSupport: false [11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.DataStoreHelperMetaData org=IBM prod=WebSphere component=Application Server setHelperType: 0 [11/25/03 14:14:33:695 EST] 42754514 d UOW= source=com.ibm.websphere.rsadapter.CloudscapeDataStoreHelper org=IBM prod=WebSphere component=Application Server the cloudscape metadata is : parm1= The defaultTransactionIsolation is: 2 The supportsExtendedForUpdate is: false The supportsKerberos is: false The supportsSelectForUpdate is: true The supportsGetCatalog is: true The supportsGetTypeMap is: false The supportsIsReadOnly is: true The supporstMultiplePartitionDB is: false Applications Database Application Servers Servers Storage devices Networks Proprietary format
Blame Storming Syndrome • Problem determination may take days or weeks • Proprietary log formats • Domain-specific sets of tools • No interfaces between tools • Siloed problem determination • Finger-pointing resolution • Specialized skills and tools • [Diagram: Applications, Database Servers, Application Servers, Storage devices, Networks, each with a proprietary format]
Common Base Event (CBE) / WSDM Event Format (WEF) • Richer, normalized data enables cross-product analysis and correlation, and is a prerequisite to effective root cause analysis and automation • Without standards, event data are of little value to autonomic management for problem determination and responsive action • To alleviate this, event data are structured in 4 categories • The identification of the component that is affected by or experienced the situation • This is also known as the source of a situation • The identification of the component that is reporting the situation • This is also known as the reporter of a situation • It may be the same as the source component of the situation • The situation data • Properties or attributes that describe the situation • The context/correlation data • Properties or attributes used to correlate the situation with others • CBE / WEF • A consistent specification for the definition of normalized event and log information for various domains (business, security, network, system, etc.) • An exchange format for events and logs • Describes situations about the external operational capabilities of the component • Not positioned for data that captures execution information within a component (i.e. trace)
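The four categories above can be pictured as a small record type. The following is a minimal sketch only: the class and field names (`ComponentId`, `Event`, `situation`, `context`) and the sample values are hypothetical simplifications chosen for illustration, not the element names defined by the CBE or WEF specifications.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ComponentId:
    """Identifies a component (hypothetical, simplified fields)."""
    component: str
    location: str


@dataclass
class Event:
    """Simplified event mirroring the four categories on the slide."""
    source: ComponentId                  # component affected by the situation
    situation: Dict[str, str]            # properties describing the situation
    context: Dict[str, str] = field(default_factory=dict)  # correlation data
    reporter: Optional[ComponentId] = None  # component reporting the situation

    def effective_reporter(self) -> ComponentId:
        # The reporter may be the same component as the source.
        return self.reporter if self.reporter is not None else self.source


# Illustrative values only.
db = ComponentId("DB2", "dbhost01")
was = ComponentId("WebSphere Application Server", "apphost01")
event = Event(
    source=db,
    reporter=was,
    situation={"category": "connect", "msg": "connection refused"},
    context={"transactionId": "tx-42"},
)
```

Keeping source and reporter as separate fields is what lets a viewer group events by the component that actually experienced the problem, even when another component logged it.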
What is a Symptom? • Dictionary definition: “A characteristic sign or indication of the existence of something else.” • AC definition: “A characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources.” • A form of knowledge, used to solve problems and situations automatically in an autonomic system. • Symptoms are composite records of information, formed by the combination of raw or composite information into patterns • Symptoms may be composed of other symptoms as well
From Events to Symptoms • Event: an indication of something being monitored • For example, memory usage has exceeded a set limit • Symptom: a characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources • Symptom: If event x (and y (and…) ) occur (under certain conditions), then report the occurrence and possible resolution actions • For example, memory usage has exceeded a set limit three times in a 10-minute stretch: suggest increasing your buffer sizes
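The memory-usage example above amounts to counting events inside a sliding time window. A minimal sketch, assuming events arrive in time order with timestamps in seconds; the class name `ThresholdSymptomRule` and its parameters are hypothetical and not part of any symptom format described in this deck.

```python
from collections import deque
from typing import Deque, Optional


class ThresholdSymptomRule:
    """Reports a symptom when an event occurs max_count times within a window."""

    def __init__(self, max_count: int = 3, window_seconds: float = 600.0,
                 recommendation: str = "increase buffer sizes") -> None:
        self.max_count = max_count
        self.window_seconds = window_seconds
        self.recommendation = recommendation
        self._times: Deque[float] = deque()

    def observe(self, timestamp: float) -> Optional[str]:
        """Record one 'limit exceeded' event; return a recommendation or None."""
        self._times.append(timestamp)
        # Discard events that fell out of the window ending at this timestamp.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_count:
            self._times.clear()  # report each occurrence of the symptom once
            return self.recommendation
        return None
```

A single event returns nothing; only the third event inside the same 10-minute stretch yields the suggested action, which is the difference between an event and a symptom.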
Symptoms Reference Architecture • SymptomDefinition (held in the SymptomCatalog) • schema: <schema used to create a new instance of the symptom> • metadata: <schema used to index and categorize all forms of knowledge> • rule: <schema used to recognize a symptom instance> • effect: <schema that describes how to react to instances of the symptom> • engine: <a runtime artifact used to produce symptom instances> • instance: <an instance of this symptom that conforms to the symptom schema> • [Diagram labels: Monitor, Analyze, Plan, Execute, Knowledge; Event, Symptom, Change Req, Change Plan, Policy; SymptomDefinition, SymptomCatalog, deploy, engine, instance]
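One way to read the parts of this architecture is as a catalog of definitions that an engine applies to incoming events. The sketch below is an illustration under assumptions: the Python types, the dictionary-based event, and the `process` method are hypothetical, and the real artifacts are schemas rather than in-memory objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass(frozen=True)
class SymptomDefinition:
    name: str
    metadata: Dict[str, str]        # indexes and categorizes the knowledge
    rule: Callable[[Event], bool]   # recognizes a symptom instance
    effect: str                     # how to react to an instance


@dataclass(frozen=True)
class SymptomInstance:
    definition: str   # name of the definition that matched
    event: Event      # the event that triggered the match
    effect: str


class SymptomEngine:
    """Runtime artifact that produces symptom instances from events."""

    def __init__(self, catalog: List[SymptomDefinition]) -> None:
        self.catalog = list(catalog)

    def process(self, event: Event) -> List[SymptomInstance]:
        return [SymptomInstance(d.name, event, d.effect)
                for d in self.catalog if d.rule(event)]


catalog = [
    SymptomDefinition(
        name="db-connection-refused",
        metadata={"domain": "database"},
        rule=lambda e: "connection refused" in e.get("msg", ""),
        effect="check that the database server is running",
    ),
]
engine = SymptomEngine(catalog)
```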
The Value Proposition • Management Data more consumable to end-user • Visualization of product symptoms within problem determination tooling • Symptoms are more deterministic than individual events • Increased customer satisfaction • Reduced problem determination costs • Administrators use automated event correlation to recognize symptoms (and potentially, corrective actions) • Support personnel access symptoms directly from the problem determination tools • Cross-product symptom catalogs allow quick diagnosis for known errors • Reduced maintenance costs • Incremental improvements to symptom databases will reduce requests to L2 and L3 support • Reduced support requests from other IBM organizations • Standard symptom format allows products to leverage problem resolution cost from other IBM organizations (e.g. Collaboration Center)
One Tool Does Not Fit All! • [Diagram: tools positioned by user skills, from Basic (e.g. operators) to Advanced (e.g. developers), and by task, from Simple to Advanced. Tools: LTA-JD, LTA-portal, LTA-eclipse. Tasks: Triage, Analysis, Correlation. Users: Operators, System Analysts, Support Engineers, Change Team, Developers]
“Simple” Log and Trace Analyzer for Java Desktop • Standalone, simple Java event viewer to merge, filter, sort, and display the contents of event sources in a common event format (i.e., CBE), for problem isolation and triage through to problem analysis • Enables end-to-end viewing of event sources across the heterogeneous environment • Customizable summary view • Ability to select and expand any row from the summary view to display the full CBE attributes • Correlation on timestamp and/or sorting on any Common Base Event property • Filtering and multi-level sorting on any event property • Custom highlighting of triage events (simple symptom definitions) • Save and share configuration settings (import/export) • Starting point for Support personnel and Operations staff • Springboard to more advanced analyzer tools
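The merge, filter, and multi-level sort operations listed above can be sketched on plain dictionaries. This is not LTA-JD code; the function names and the event keys (`creationTime`, `severity`, `source`) are assumptions for illustration, and each source is assumed to be already sorted by creation time with timestamps in one uniform UTC ISO 8601 format, so that string order equals time order.

```python
import heapq
from operator import itemgetter
from typing import Callable, Dict, Iterable, List, Sequence

Event = Dict[str, object]


def merge_sources(*sources: Iterable[Event]) -> List[Event]:
    """Merge already time-sorted event sources into one timeline."""
    return list(heapq.merge(*sources, key=itemgetter("creationTime")))


def filter_events(events: Iterable[Event],
                  predicate: Callable[[Event], bool]) -> List[Event]:
    """Keep only the events the predicate accepts."""
    return [e for e in events if predicate(e)]


def multi_level_sort(events: Iterable[Event],
                     keys: Sequence[str]) -> List[Event]:
    """Sort by the first key, breaking ties with the following keys."""
    return sorted(events, key=lambda e: tuple(e[k] for k in keys))


app_log = [
    {"creationTime": "2006-04-25T10:00:00Z", "severity": 10, "source": "app"},
    {"creationTime": "2006-04-25T10:00:05Z", "severity": 50, "source": "app"},
]
db_log = [
    {"creationTime": "2006-04-25T10:00:02Z", "severity": 30, "source": "db"},
]

timeline = merge_sources(app_log, db_log)
severe = filter_events(timeline, lambda e: e["severity"] >= 30)
by_source = multi_level_sort(timeline, ["source", "severity"])
```

Merging on the timestamp is what gives the end-to-end view: events from different products appear interleaved in the order they happened.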
Overall Architecture • [Diagram labels: Event Sources, CBE, FastXPath, Process, Visual Filters] • FastXPath • Integrates solution with existing code generation tools • Extracts XML schema-specific metadata from the object it queries • Uses metadata available in auto-generated classes to build optimized XSL engines
[Screenshot callouts] • Event sources collection • Customizable Results/Summary area • Events detail area
Filters = Equivalent to Symptom Rules • This filter is by Creation Time, using XPath that can be generated by the Filter Builder
Filter Builder (Novice Users) Powerful composition dialogs… … while still showing full XPath syntax for power users
= We associate visualization attributes to Symptom Rules
Flexibility to show only what the user wants to see: filters out the non-participating events
Symptom details (description of the problem) show up when hovering over the highlighted events
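The behavior described in the last few slides (visualization attributes attached to symptom rules, optional hiding of non-participating events, and a description shown on hover) can be sketched as follows. The names `HighlightRule` and `annotate`, the color value, and the event keys are hypothetical; the actual tool expresses rules as XPath filters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]
Row = Tuple[Event, Optional[str], Optional[str]]  # (event, color, tooltip)


@dataclass(frozen=True)
class HighlightRule:
    matches: Callable[[Event], bool]  # the simple symptom rule
    color: str                        # visualization attribute
    description: str                  # text shown when hovering over the row


def annotate(events: List[Event], rules: List[HighlightRule],
             hide_unmatched: bool = False) -> List[Row]:
    """Attach the first matching rule's color and description to each event."""
    rows: List[Row] = []
    for event in events:
        rule = next((r for r in rules if r.matches(event)), None)
        if rule is not None:
            rows.append((event, rule.color, rule.description))
        elif not hide_unmatched:
            rows.append((event, None, None))
    return rows


rules = [
    HighlightRule(lambda e: e.get("severity", 0) >= 50, "red",
                  "Severe event: check the reporting component"),
]
events = [{"msg": "started", "severity": 10},
          {"msg": "out of memory", "severity": 60}]
```

Calling `annotate(events, rules, hide_unmatched=True)` corresponds to showing only the events that participate in a symptom.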
Helpful Links • Autonomic Computing Enablement Site • http://acenablement.raleigh.ibm.com/ • http://acenablement.raleigh.ibm.com/html/technology/pd/pddwnlds.html • Autonomic Computing • http://www.ibm.com/autonomic • Autonomic Computing Toolkit • http://www.ibm.com/developerworks/autonomic • Autonomic Computing Toolkit Download • http://www-106.ibm.com/developerworks/autonomic/probdet1.html • Common Base Event Version V1.0.1 (CBE) • http://dev.eclipse.org/viewcvs/indextools.cgi/~checkout~/hyades-home/docs/components/common_base_event/cbe101spec/CommonBaseEvent_SituationData_V1.0.1.pdf • WSDM Event Format V1.0 (WEF) • PART 1: http://docs.oasis-open.org/wsdm/2004/12/muws/cd-wsdm-muws-part1-1.0.pdf • PART 2: http://docs.oasis-open.org/wsdm/2004/12/muws/cd-wsdm-muws-part2-1.0.pdf • Common Event Infrastructure (CEI) • http://www.ibm.com/software/tivoli/features/cei/ • http://www-106.ibm.com/developerworks/library-combined/ac-cei
Log and Trace Analyzer Tools: Use Cases • Retrieve and analyze CBE log data • LTA-JD (Triage / Analyze) • Event viewing • Merge/sort/filter • Single event analysis (highlighting/simple symptom rules) • Local data collection • Remote data collection from CEI server • LTA-Eclipse (Correlate/Analyze) • Event viewing • Merge/sort/filter • Event correlation • Cross-event analysis (symptoms) • Remote/local data collection • Event conversion • LTA-Portal (Correlate/Analyze) • Event viewing • Merge/sort/filter • Event correlation • Cross-event analysis (symptoms) • Remote/local data collection • Event conversion • [Diagram labels: Applications, Generic Log Adapters (GLA), CBE XML formatted logs, CBE events, triaged CBE events, RAC (API), CEI/ESB, ACT/XPath, CBE object, import CBE logs, SymptomDB; use cases: Solution Problem Isolation, Solution Problem Analysis, Solution Problem Isolation & Analysis, Product Problem Isolation & Analysis]
LTA-JD Performance • Evaluation of LTA-JD end-to-end (XML input – convert & process object – filter – display) • Evaluation of a simple FastXPath expression • /CommonBaseEvent[@severity >= '10'] on 100000 CBEs • FastXPath (157 millisecs), JXPath (468 millisecs), Xalan (1328 secs) • Better results with • smarter filters • bigger JVM heap • IBM JDK 1.5 (~60% improvement)
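To make the benchmarked expression concrete: in XPath 1.0 the `>=` operator converts both operands to numbers, so `@severity >= '10'` is a numeric comparison. The sketch below reproduces that predicate in plain Python rather than with FastXPath, JXPath, or Xalan, because the XPath subset in Python's standard `xml.etree.ElementTree` does not support relational operators. The wrapper element `Events`, the helper names, and the omission of XML namespaces are assumptions made for illustration, and any timing it produces is not comparable to the figures above.

```python
import time
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_sample(count: int) -> ET.Element:
    """Build a hypothetical document holding `count` simplified events."""
    root = ET.Element("Events")
    for i in range(count):
        # Severities cycle through 0, 10, ..., 60.
        ET.SubElement(root, "CommonBaseEvent",
                      severity=str((i % 7) * 10), msg=f"event {i}")
    return root


def _to_number(text: str) -> float:
    # Mirror XPath 1.0 number(): non-numeric text becomes NaN.
    try:
        return float(text)
    except ValueError:
        return float("nan")


def filter_by_severity(root: ET.Element, minimum: float) -> List[ET.Element]:
    """Python equivalent of the predicate [@severity >= minimum]."""
    return [e for e in root.iter("CommonBaseEvent")
            if _to_number(e.get("severity", "")) >= minimum]


def timed_filter(root: ET.Element,
                 minimum: float) -> Tuple[List[ET.Element], float]:
    """Return the matching events and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    matches = filter_by_severity(root, minimum)
    return matches, time.perf_counter() - start
```

For a run comparable in size to the slide, `timed_filter(build_sample(100000), 10)` filters 100000 events; events with a missing or non-numeric severity never match, because comparisons with NaN are false.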