Tony Parsons Large Scale Event Management A Ph.D Thesis 30th May 2007
Agenda
• Introduction
• Scope of thesis
• Part 1 – Compaq/HP
• Development of solution
• Outcomes
• Part 2 – EDS … the work continues
• Early Results
• What next?
• Questions
Scope of thesis
• Event Management in system and network environments
• What to do with events
• How to do it
• Where to do it
• Produce an architecture & best practices guide
• Fill in “blanks” where appropriate in commercial solutions
• Started at Compaq, then HP
• Initial opportunity presented by the Optus outsourcing win in 1998
• Commercial solutions largely determined by employer: Unicenter TNG, then HP OpenView
• Scope started as technology-only but evolved into technology + process (ITIL Operations Management)
Part 1 – Compaq/HP
• Initial work based on experience deploying a large Unicenter TNG solution at Optus
• Key component was the Event Message Format
• First generation solution
Event Management Architecture – Functional View
[Diagram; labels recovered from the figure]
• LOCAL: native device / monitored target (event source): Network, Server, Storage, Applications, appliances, power, environment, structure
• LOCAL: element manager (event reduction, data collection): network monitor, platform monitor, services monitor
• REGIONAL or GLOBAL: mid-level manager (event correlation, alarm creation & forwarding, data aggregation)
• REGIONAL or GLOBAL: event routing, event notification, notification services, workflow handling, data warehouse, central config repository, event configuration management
• Components aligned to services, i.e. web hosting, SAP hosting, exchange, …
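The tiers named in the functional view can be read as a staged pipeline: reduction close to the source, correlation at a mid-level manager, then routing. The following is only a minimal Python sketch of that reading, not the thesis implementation. The `Event` fields, the severity scale, the "keep the highest severity per source and service" rule, and the queue names are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str    # hypothetical: name of the monitored target
    service: str   # hypothetical: service the component is aligned to
    severity: int  # hypothetical scale: 0 = informational ... 3 = critical
    text: str


def element_manager(events: List[Event], min_severity: int = 1) -> List[Event]:
    """LOCAL tier (event reduction): drop events below a severity threshold."""
    return [e for e in events if e.severity >= min_severity]


def mid_level_manager(events: List[Event]) -> List[Event]:
    """REGIONAL/GLOBAL tier (correlation): one alarm per (source, service),
    keeping the most severe event seen for that pair."""
    alarms: Dict[Tuple[str, str], Event] = {}
    for e in events:
        key = (e.source, e.service)
        current = alarms.get(key)
        if current is None or e.severity > current.severity:
            alarms[key] = e
    return list(alarms.values())


def route(alarms: List[Event], routes: Dict[str, str],
          default: str = "operations-bridge") -> Dict[str, List[Event]]:
    """Routing tier: group alarms by destination queue, chosen by service."""
    queues: Dict[str, List[Event]] = {}
    for alarm in alarms:
        queues.setdefault(routes.get(alarm.service, default), []).append(alarm)
    return queues


raw = [
    Event("web01", "web hosting", 0, "heartbeat"),
    Event("web01", "web hosting", 2, "disk 90%"),
    Event("web01", "web hosting", 3, "disk full"),
    Event("sap01", "SAP hosting", 1, "job slow"),
]
routed = route(mid_level_manager(element_manager(raw)), {"SAP hosting": "sap-team"})
```

In this sketch four raw events become two alarms, one per service, which illustrates why the reduction and correlation steps sit below the routing step in the diagram.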
Second & Third generation solutions
• Impact of HP merger with Compaq
• Introduction of CMDB
• Introduction of ITIL processes
• Creation of lightweight agent (SMSPI)
• Switch from Unicenter to OpenView
• Move to two global Operations Bridges in KL & Bratislava
• Expansion to use by HP Services break/fix organisation
Event Management Architecture – Technical View
[Diagram; labels recovered from the figure]
• LOCAL: native device / monitored object (event source): Network, Server, Storage, Applications; Syslog, DSIs, HP SIM agent, radia client, CODA+OVPA, ISEE agent, smspi agent
• LOCAL: element manager (event source, event reduction): threshold alert, syslog, SNMP, OVOA, OVPA (MWA), SPIs, OVIS Probe, OVPI Poller
• REGIONAL or GLOBAL: mid-level manager (event correlation, alarm creation & forwarding): OVO (many per region), NNM, NNM-specific SMSPI, CMC, OVIS, OVPI, ISEE backend
• REGIONAL or GLOBAL: event routing, event notification: Automation bus (A.B.), DECADE, trap/event, poll events
• Workflow handling and data warehouse: WFM, OVSD (OV Service Desk), ETLs, MSR DW, Reporting Data Warehouse, Web Services
Outcomes from thesis
• Event Management Architecture – Functional
• Event Management Architecture – Technical
• First based on Unicenter TNG and extensions (EMU)
• Later based on HP OpenView and extensions (DECADE and SMSPI)
• Event Message Standard
• CA Awards for original Unicenter TNG deployment at Optus
• US patents for event message format and event transport protocol
• IBM Common Event Infrastructure (defined in 2004, not yet widely used even by Tivoli)
Results
• By mid-2006 the solution was deployed to 55,000 servers and 64,000 network devices, all fully outsourced
• Also deployed to 150,000 servers monitored on a break/fix basis
• Target to reach 7 million servers by 2010 on break/fix
• Operations Bridges reported an 85% automation rate
• Operations Bridges identified a > 98% event consolidation/suppression rate
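To give a feel for what a suppression rate above 98% means in volume terms, here is a small worked calculation in Python. It assumes the rate is defined as the share of raw events that are not forwarded, which the slides do not state explicitly, and the raw event count used is a made-up round number rather than a figure from the thesis.

```python
def suppression_rate(raw_events: int, forwarded_events: int) -> float:
    """Share of raw events that were consolidated or suppressed.

    Assumed definition: 1 - forwarded / raw.
    """
    if raw_events <= 0:
        raise ValueError("raw_events must be positive")
    return 1.0 - forwarded_events / raw_events


def events_remaining(raw_events: int, suppression_pct: int) -> int:
    """Events still forwarded after suppressing suppression_pct percent."""
    return raw_events * (100 - suppression_pct) // 100


# Hypothetical volume: 1,000,000 raw events at a 98% suppression rate.
remaining = events_remaining(1_000_000, 98)  # 20,000 events still forwarded
```

Under that assumed definition, every additional percentage point of suppression at this volume removes another 10,000 events from the operators' view.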
Event mgmt areas addressed
• Filtering & Forwarding
• Duplicate Detection & Suppression
• Correlation
• Problem & Clearing Event
• Escalation
• Root Cause
• Cross-platform
• Cross-host
• Topology based
• Event Synchronisation
• Event Notification
• Trouble Ticketing
• Event Escalation
• Maintenance
• Automation
• Event Architecture
• Scaling
• Resilience/High Availability
• Event Classification
• Event Visualisation
• Event Simulation
• Event Integration
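Two of the areas listed above, duplicate suppression and problem/clearing event correlation, can be illustrated with a short Python sketch. This is a generic illustration and not the DECADE or SMSPI logic. The `MonitorEvent` fields, the `"PROBLEM"`/`"CLEAR"` state values, the five-minute window, and the returned action names are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MonitorEvent:
    host: str
    check: str
    state: str        # hypothetical values: "PROBLEM" or "CLEAR"
    timestamp: float  # seconds


class Correlator:
    """Suppresses repeated problems and pairs clearing events with open alarms."""

    def __init__(self, dedup_window: float = 300.0) -> None:
        self.dedup_window = dedup_window
        # (host, check) -> timestamp of the last forwarded problem event
        self.open_alarms: Dict[Tuple[str, str], float] = {}

    def process(self, event: MonitorEvent) -> str:
        key = (event.host, event.check)
        if event.state == "CLEAR":
            # A clearing event closes the matching open alarm, if any.
            if key in self.open_alarms:
                del self.open_alarms[key]
                return "closed"
            return "ignored"
        last = self.open_alarms.get(key)
        if last is not None and event.timestamp - last < self.dedup_window:
            # Same problem seen again inside the window: treat as a duplicate.
            return "suppressed"
        self.open_alarms[key] = event.timestamp
        return "forwarded"


correlator = Correlator(dedup_window=300.0)
actions = [
    correlator.process(MonitorEvent("web01", "disk", "PROBLEM", 0.0)),
    correlator.process(MonitorEvent("web01", "disk", "PROBLEM", 60.0)),
    correlator.process(MonitorEvent("web01", "disk", "CLEAR", 120.0)),
]
```

Keying on host and check is the simplest choice. Cross-host or topology-based correlation would need a richer key, for example one derived from a configuration database.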