Advanced Event Management

Advanced Event Management. Tivoli User Group, November 2011. Nick Lansdowne. Agenda: Initial Growth of an Event Management Solution; Advancing the solution: Event Enrichment, Message Catalogue, Automated Escalation.


Presentation Transcript


  1. Advanced Event Management • Tivoli User Group, November 2011 • Nick Lansdowne

  2. Agenda • Initial Growth of an Event Management Solution • Advancing the solution: • Event Enrichment • Message Catalogue • Automated Escalation

  3. First Steps • Growth of Event Management Solutions • Generate useful events • Base details: Hostname, Severity, Message • Event Type: Problem/resolution • Correlation information: Resource details • Noise Reduction: • De-duplication • Automated Clearing • Correlation of resolution to problem events • Out-of-the-box with Netcool/OMNIbus
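
The noise-reduction steps on this slide can be illustrated with a minimal Python sketch. It is not ObjectServer code: the `Event` and `EventList` names are hypothetical, the in-memory dictionary merely stands in for the event table, and the matching rule (same node and alert key) is a simplified assumption about how a resolution is correlated to its problem event.

```python
from dataclasses import dataclass

PROBLEM, RESOLUTION = 1, 2  # event types: problem vs. resolution


@dataclass
class Event:
    identifier: str   # de-duplication key
    node: str
    alert_key: str
    severity: int     # 0 = clear ... 5 = critical
    summary: str
    type: int = PROBLEM
    tally: int = 1    # how many times this event has been seen


class EventList:
    """Toy in-memory event list (hypothetical stand-in for the event table)."""

    def __init__(self):
        self.events = {}

    def insert(self, event):
        existing = self.events.get(event.identifier)
        if existing is not None:
            # De-duplication: same identifier, so bump the count and
            # refresh the details instead of adding a new row.
            existing.tally += 1
            existing.severity = event.severity
            existing.summary = event.summary
            return existing
        self.events[event.identifier] = event
        return event

    def clear_resolved(self):
        """Automated clearing: resolutions clear their matching problems."""
        resolutions = [e for e in self.events.values() if e.type == RESOLUTION]
        for res in resolutions:
            for ev in self.events.values():
                if (ev.type == PROBLEM and ev.node == res.node
                        and ev.alert_key == res.alert_key):
                    ev.severity = 0
            res.severity = 0  # the resolution itself is also cleared
```

Inserting the same problem twice leaves one row with a tally of 2; inserting a resolution for the same node and alert key and then calling `clear_resolved()` sets the problem to severity 0.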

  4. Basic Architecture • Operations Teams • WebGUI Server: Present Events • Netcool/OMNIbus ObjectServer: Process Events • ITM, Probes, ITCAM, ITNM: Collect Management Information

  5. Operations Team Event Response • Context • Who and what is affected? • Interpret • What does the event mean? • What response is required? • Inform • Who should respond to the incident? • How are these questions answered? • Personal Knowledge • SharePoint – Spreadsheets – Helpdesk system • CCMDB – On-call Rota – Knowledge Base

  6. Automate and Integrate • Context • Interpret • Inform

  7. Operations Team Response • Context (Enrichment) • Who and what is affected? • Interpret (Message Catalogue) • What does the event mean? • What response is required? • Inform (Automated Escalation) • Who should respond to the incident?

  8. Enrichment Who and what is affected?

  9. Who and what is affected? • Affected Service/Solution • Support Team, rota & contact details • Details for System Owner • Details for affected Customer(s)

  10. How to present information? • Event Enrichment: • Probe rules: • Pros: Efficient, enriched as close to source • Cons: Static data, content • OMNIbus Automation: • Pros: Self-contained solution • Cons: Static data, overhead on ObjectServer • Impact Policy: • Pros: Dynamic data, ObjectServer overhead minimised • Cons: Impact infrastructure • Drill-Through Web Page: • Impact Operator View: • Pros: Simple integration, dynamic data access • Custom Web Page: • Cons: Proprietary development

  11. Event Enrichment: Solution Basics • (1) Event received by ObjectServer and triggers an Impact Policy • (2) The Impact Policy identifies the node-specific data in the CCMDB • (3) Additional data written to event in ObjectServer

  12. Netcool/Impact Implementation • Data Source definition: • Connection to the CCMDB database • Data Type definition: • Identifies the table or view within the RDBMS • Policy definition: • Describes how events are processed, and what data is appended to those events • Event Reader definition: • Identifies which events are to be processed and which policies will be applied to those events
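
The four definitions above map onto a small Python analogy, offered only as a sketch. An in-memory SQLite database stands in for the CCMDB; the table name `node_details`, the sample row, and the event fields `Service`, `SupportTeam`, `Owner` and `Enriched` are all assumptions made for illustration, and none of this is Impact policy code.

```python
import sqlite3


def open_ccmdb():
    """Data source: an in-memory SQLite database standing in for the CCMDB."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    # Data type: the table (or view) that holds the node-specific data.
    conn.execute(
        "CREATE TABLE node_details ("
        "node TEXT PRIMARY KEY, service TEXT, support_team TEXT, owner TEXT)"
    )
    conn.execute(
        "INSERT INTO node_details VALUES (?, ?, ?, ?)",
        ("web01", "Online Banking", "Web Support", "J. Smith"),
    )
    return conn


def event_reader_filter(event):
    """Event reader: pick up only active events that are not yet enriched."""
    return event.get("Severity", 0) > 0 and not event.get("Enriched")


def enrich_policy(conn, event):
    """Policy: look up the event's node and append the data to the event."""
    row = conn.execute(
        "SELECT service, support_team, owner FROM node_details WHERE node = ?",
        (event["Node"],),
    ).fetchone()
    if row is not None:
        event["Service"] = row["service"]
        event["SupportTeam"] = row["support_team"]
        event["Owner"] = row["owner"]
    # Flag the event whether or not a match was found, so that it is
    # not picked up again on the next pass.
    event["Enriched"] = True
    return event


def process(conn, events):
    """Apply the policy to every event the reader's filter selects."""
    return [enrich_policy(conn, e) if event_reader_filter(e) else e
            for e in events]
```

Marking unmatched events as enriched is a design choice in this sketch: it avoids repeated lookups for nodes that are missing from the repository.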

  13. Data Source & Type

  14. Policy Definition

  15. Event Reader

  16. Event Enrichment

  17. Operator View: Solution Basics • (1) Event received by ObjectServer and displayed in AEL • (2) Operator initiates custom tool that launches an Impact Operator View • (3) Operator View queries CCMDB for data to display

  18. Implementation • Data Source definition: • Connection to the CCMDB database • Data Type definition: • Identifies the table or view within the RDBMS • Operator View: • Describes what additional data is displayed along with the event details • WebGUI Tool: • Alerts tool to launch into Impact Operator View

  19. Data Source & Type

  20. Operator View

  21. WebGUI Tool • http://impact511:9080/opview/displays/NCICLUSTER-NodeDetails.html?Node={@Node}&Severity={@Severity}&Summary={@Summary}&node={@Node}&Location={@Location}
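
To show what the tool URL looks like once the `{@Field}` placeholders have been filled in, here is a short Python sketch that builds the same query string from an event. The base URL is taken from the slide; the function name `operator_view_url` and the sample event are hypothetical, and WebGUI performs this substitution itself when the tool is launched.

```python
from urllib.parse import urlencode

# Base URL as shown on the slide.
BASE = "http://impact511:9080/opview/displays/NCICLUSTER-NodeDetails.html"


def operator_view_url(event, fields=("Node", "Severity", "Summary", "Location")):
    """Build the drill-through URL, URL-encoding each event field."""
    params = {name: event.get(name, "") for name in fields}
    return BASE + "?" + urlencode(params)


url = operator_view_url({
    "Node": "web01",
    "Severity": 5,
    "Summary": "Disk /var full",
    "Location": "London",
})
```

Encoding matters in practice because a summary usually contains spaces and punctuation that are not valid in a raw query string.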

  22. Launching Tool

  23. Message Catalogue What does the event mean? What response is required?

  24. What does the event mean? • Details: • Description • Source • Impact • Required Action • Owner • Data Access: • Enrichment • WebGUI Tool • Data Source: • File system • Database • SharePoint • Wiki-based solution

  25. Orb Data Message Catalogue • Orb Data solution: • Open source wiki engine (www.dokuwiki.org): • Benefits • All data is stored in plain text files (no database) • Centralised data – removes risk of obsolete documents and no distribution of document revisions • Pages can be quickly and easily created and updated • Page templates can be used to speed up page creation and promote standards • Supports images, PDFs, Word documents etc to supplement page content • Search capability to quickly find specific and related pages • Page revisions tracked and previous versions stored

  26. Example Entry

  27. Typical Implementation • URL launched from WebGUI Tool: • URL derived from alert fields (for example AlertKey, Node, Identifier) and populated via a probe rule • Alternatively, use an Impact policy to reference an external data source
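
Deriving a catalogue URL from an alert field might look like the following Python sketch. The wiki host `wiki.example.com`, the `catalogue` namespace and the function names are assumptions made for illustration. The `doku.php?id=namespace:page` form is how DokuWiki addresses pages, but the name-cleaning rule used here is a simplification and not DokuWiki's exact algorithm.

```python
import re
from urllib.parse import urlencode

WIKI_BASE = "http://wiki.example.com/doku.php"  # hypothetical wiki host


def page_id(alert_key, namespace="catalogue"):
    """Turn an AlertKey into a lowercase wiki page ID inside a namespace."""
    # Replace each run of characters outside a-z, 0-9, '.', '_' and '-'
    # with a single underscore, then trim underscores from both ends.
    slug = re.sub(r"[^a-z0-9._-]+", "_", alert_key.lower()).strip("_")
    return f"{namespace}:{slug}"


def catalogue_url(event):
    """Build the message catalogue URL for an event from its AlertKey."""
    return WIKI_BASE + "?" + urlencode({"id": page_id(event["AlertKey"])})
```

In a probe-rule implementation the equivalent string would be assembled once at the probe and stored in an event field, so the WebGUI tool only has to open it.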

  28. Extending the Message Catalogue • Use of HTML based pages: • capability to embed additional information • For example, email links • Embedded Operator Views • Ability to act on contained data

  29. Embedded Operator Views

  30. Automated Escalation Who should respond to the incident?

  31. Who should respond to the incident? • Automated Escalation: • Which events? • How? • Visual: Event Flash, Increment severity • External: email, SMS, page • When? • Rota hours • Who to? • Contact details for support team/on-call engineer • Requires: • Repository for Rota & Contact details • Automations

  32. How to escalate? • OMNIbus Automation: • Repository: • Custom ObjectServer database • Automation: • Triggers • Procedures • Pros & Cons: • Pros: Self-contained solution • Cons: Duplicated data, overhead on ObjectServer • Impact Policy: • Repository: • Integration to existing repositories • Automation: • Impact Policies • Pros & Cons: • Pros: No duplication of data, DSA integration, ObjectServer overhead minimised • Cons: Impact infrastructure

  33. Escalation: Solution Basics • (1) Event received by ObjectServer and triggers an Impact Policy • (2) The Impact Policy retrieves escalation details from CCMDB & Helpdesk system • (3) Email sent to on-call engineer • (4) Update event to indicate escalation
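
The escalation flow can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the in-memory `SUPPORT_TEAMS` and `ROTA` tables stand in for the CCMDB and Helpdesk data, only critical events (severity 5) are escalated, and `Escalated` and `EscalatedTo` are hypothetical event fields. The `send` argument is any callable that delivers the message.

```python
from datetime import datetime, time

# Hypothetical repository data: node -> support team, and the team's rota.
SUPPORT_TEAMS = {"web01": "Web Support"}
ROTA = [
    # team, shift start, shift end, engineer, email address
    ("Web Support", time(8, 0), time(18, 0), "A. Day", "a.day@example.com"),
    ("Web Support", time(18, 0), time(8, 0), "B. Night", "b.night@example.com"),
]


def on_shift(start, end, now):
    """True if 'now' is inside the shift, including shifts past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def find_on_call(team, when):
    """Return (name, email) of the engineer on call for the team, or None."""
    for rota_team, start, end, name, email in ROTA:
        if rota_team == team and on_shift(start, end, when.time()):
            return name, email
    return None


def escalate(event, when, send):
    """Escalate a critical, not yet escalated event; True if a mail was sent."""
    if event.get("Severity", 0) < 5 or event.get("Escalated"):
        return False
    team = SUPPORT_TEAMS.get(event["Node"])
    contact = find_on_call(team, when) if team else None
    if contact is None:
        return False
    name, email = contact
    send(email, f"Escalation: {event['Node']}", event["Summary"])
    # Update the event so that it is not escalated a second time.
    event["Escalated"] = True
    event["EscalatedTo"] = name
    return True
```

Passing the delivery function in as a parameter keeps the rota lookup testable without a mail server.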

  34. Impact Implementation • Data Source definition: • Connection to the CCMDB & Helpdesk databases • Data Type & Item definition: • Identifies and links the tables/views for the required external data • Policy definition: • Describes how events are processed, the data correlation between events and repositories and initiates the escalation • Event Reader definition: • Identifies which events are to be processed and which policies will be applied to those events

  35. Example Raw Data • Support Team • Support Rota

  36. Data Item: Dynamic Link

  37. Policy: Dynamic Links

  38. Policy: Sending an email • Configure the EmailSender service • SMTP Server • Sending email address • Call the sendEmail function: • Including: Target email address, Subject, Message
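
For readers unfamiliar with Impact, the two steps above (configure a mail service, then call a send function with target address, subject and message) have a close analogue in Python's standard library. The sketch below is that analogue only; it does not reproduce the signature of the Impact function, and the helper names, SMTP host and port defaults are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender, to, subject, body):
    """Assemble the message: sending address, target address, subject, text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(msg, smtp_host="localhost", smtp_port=25):
    """Hand the message to the SMTP server (placeholder host and port)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)


msg = build_email(
    "netcool@example.com",
    "oncall@example.com",
    "Escalation: web01",
    "Web server down",
)
# send_email(msg)  # left commented out: requires a reachable SMTP server
```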

  39. Policy: Sending an email

  40. Extending the Solution • Closed loop escalation: • Feedback from escalation • Mechanism for feedback? • Standard: Event Acknowledgement from WebGUI/Native Desktop • External: Email, SMS • External Solution • Derdack message master Enterprise Alert 2011

  41. Derdack Enterprise Alert 2011 • Automates escalation process: • Evaluation of event • Automated search for a responsible person or group • Communication and submission of escalation messages • Processing of delivery notification • Processing of responses • Communication back to the initiating system • Closed Loop Escalation via: • Email, SMS, Automated VoIP Calls, Smartphone Applications • Out-of-the-box Integrations: • ITM 6.2+, HP Operations Manager, SCOM • Custom Integration: • SOAP, HTTP (may be used for Netcool integration)

  42. Extended Architecture • Operations Teams • Closed Loop Escalation • Event Details/Tool Integration • WebGUI Server: Present Events • Message Catalogue • EA2011 • Enrichment/Escalation • CCMDB • Netcool/OMNIbus ObjectServer: Process Events • Netcool/Impact • ITM, Probes, ITCAM, ITNM: Collect Management Information

  43. Summary • Base infrastructure: • Event generation, deduplication, correlation • Event Enrichment: • Basic context for event • Message Catalogue: • Event details and escalation information • Automated Escalation: • Continuous 24x7 service

  44. Orb Data Services • Netcool • TEC to Netcool/OMNIbus Migration Review • Remote Rulebase Migration • Mobile Device Integration/Event Workflow Design • Meet SLAs with Automated out-of-hours escalations • Netcool/Impact – A Practical guide (workshop) • IBM Tivoli Monitoring • ITM 6.2.3 Migration • Predictive Performance and Capacity Monitoring • Custom Agent Development • SLA reporting using TDW • Monitoring MQ infrastructures • WebSphere application and infrastructure monitoring • Tivoli Automation • Implementing Security to minimise risk and overheads

  45. Questions?
