BT – Managing Complex Systems

Ian Johnston & John Palmer
BCS Kingston & Croydon Branch presentation, 26/02/08

Presentation Objectives
• Approach to managing e2e systems
• A standard for application events
• Business process and component transaction monitoring
• Order tracking and jeopardy
• Leveraging the value of monitoring, e.g. ASGs, Service and Capacity
• Managing COTS products, e.g. BEA, Siebel

The BT experience
• BT architecture – SOA – linked reusable capabilities
• Our position has been driven by experience in monitoring complex distributed architectures.
• Configuring toolsets to monitor e2e is unachievable for large enterprises: maintenance is expensive or impossible.
• This has led us along the Design route, which now parallels ITIL's Service Design concepts.

BT’s Matrix Architecture

BT Matrix Architecture Challenges - Service Design
• Service Level Management
  • SLAs aligned to business requirements
  • BT’s outsourcing strategy
• Availability
  • Understanding CE requirements
  • Response times
• Capacity Management
  • Accurate measurement of transaction volumes
  • Response times broken down by capability
• IT Service Continuity management
  • Dynamic deployment in virtualised environments
  • Physical and geographic resilience
• SLM
  • Defining measurements & targets, e.g. volumes, response times
  • Aligning SLAs with UCs
• Capacity Management
  • Procedures to ensure customer targets are met
• Business Continuity management
  • Deployment designs to ensure resilience
• Availability management
  • Measure e2e availability broken down to capabilities

BT Matrix Architecture Challenges - Service Operation
• Operational management
  • How to assess the impact and prioritise application events by business process and IT Service?
• Application management
  • Routing of PRs to the appropriate support groups?
  • Analysing high volumes of events in log files?
• Technical management
  • Pinpointing root cause across multiple shared capabilities
• Metrics
  • Stepped changes in volumes, errors and response times?
  • Impact of changes, e.g. trend in error rates
  • Measuring operational efficiency, e.g. txns vs. failures

BT Matrix Architecture Challenges – E2E Design
[Figure: end-to-end order flow “SF – Provide – Progress – pt 1” (Place Order), entered from Create ServiceID and involving the End Customer. Recoverable labels: order states Pending, Assigned, Acknowledged (SMPF ID), Committed, Installed, Completed; steps Build Port, Get Tie Cable Mapping, Build VC (RADIUS, B-RAS, VCI, etc.), Update (SMPF ID, Installation DN, etc.), Activation email; Network Capacity Shortfall goes into an error queue for manual processing; SMPF ID Status = “Completed”; exit to the “Close Order” sub-process. NB: incorporates Flow Stream / Manage / Monitor / Director.]

BT Approach – Application event standard
Fields of the application standard (labels from the slide diagram):
• Business process
• Business transaction
• Event type
• Time
• Host
• Server
• Business keys
• e2e correlation key
• Component capability
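The field labels on this slide suggest a single structured record per application event. The sketch below is illustrative only: the class name, field names, types and JSON serialisation are assumptions made for this example, not the BT standard itself.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class ApplicationEvent:
    """Hypothetical event record carrying the fields named on the slide."""
    business_process: str
    business_transaction: str
    event_type: str            # assumed values, e.g. "start", "end", "error"
    time: datetime
    host: str
    server: str
    component_capability: str
    e2e_correlation_key: str   # ties events from different capabilities together
    business_keys: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise the event as one JSON line suitable for a log file."""
        record = asdict(self)
        record["time"] = self.time.isoformat()
        return json.dumps(record, sort_keys=True)


event = ApplicationEvent(
    business_process="Provide",
    business_transaction="Place Order",
    event_type="start",
    time=datetime(2008, 2, 26, 9, 0, tzinfo=timezone.utc),
    host="host-01",
    server="server-a",
    component_capability="order-management",
    e2e_correlation_key="ORD-1",
    business_keys={"order_id": "ORD-1"},
)
print(event.to_json())
```

Emitting every event in one fixed shape is what would let downstream tools correlate and summarise without per-application parsing rules.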

BT Matrix Architecture Solution - Service Design
SLM
• Agile design workshops to build in measures to support SLAs
Availability
• Agile capability workshops to build in measures for monitoring of capacity, implemented by APIs
• Standardised events for common error conditions such as interface failures
IT Service Continuity
• Dynamic reports of services and deployment profile (host/server distribution)

BT Matrix Architecture Solution - Service Operation
Operational management
• Event correlation (by service and transaction identifiers)
• Impact (problem scenario and guided action)
• Performance bottlenecks
• Support group checklists (quick wins)
Application management
• Improved routing of PRs to the appropriate support groups, provided by the e2e view
• High volumes of events can be analysed by restricting the types of events and providing summarisation
Technical management
• Diagnosis: root cause (e2e location and standard error)
Metrics
• Summarisation and granularity inherent in the standard
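Event correlation and summarisation, as described on this slide, can be sketched roughly as follows. The dictionary keys (`e2e_key`, `capability`, `event_type`, `time`) and the sample events are hypothetical, chosen only to illustrate the idea; the slides do not specify a data format.

```python
from collections import Counter, defaultdict


def correlate(events):
    """Group event dicts by e2e correlation key, each group ordered by time."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev["e2e_key"]].append(ev)
    for chain in grouped.values():
        chain.sort(key=lambda ev: ev["time"])
    return dict(grouped)


def summarise(events):
    """Count events per (capability, event_type) pair."""
    return Counter((ev["capability"], ev["event_type"]) for ev in events)


def first_error(chain):
    """Return the earliest error event in a correlated chain, or None."""
    return next((ev for ev in chain if ev["event_type"] == "error"), None)


# Hypothetical sample data: two orders passing through shared capabilities.
events = [
    {"e2e_key": "ORD-1", "capability": "order-entry", "event_type": "end", "time": 2},
    {"e2e_key": "ORD-1", "capability": "order-entry", "event_type": "start", "time": 1},
    {"e2e_key": "ORD-2", "capability": "order-entry", "event_type": "start", "time": 3},
    {"e2e_key": "ORD-2", "capability": "provisioning", "event_type": "error", "time": 5},
    {"e2e_key": "ORD-2", "capability": "billing", "event_type": "error", "time": 6},
]

chains = correlate(events)
culprit = first_error(chains["ORD-2"])
summary = summarise(events)
```

Taking the earliest error in a correlated chain is one simple heuristic for locating a root cause across shared capabilities; later errors in the same chain are often knock-on effects, though that is an assumption rather than a rule.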

BT Application Monitoring Standard

Outsourcing Supplier Contracts
1. Monthly views to identify any stepped changes in
   • Volumes, response times, error rates
2. Weekly views of top 5-10 transactions showing
   • Distribution of volumes, variance in response times, peaks and spikes
   • Any worsening trends in errors and thresholds
3. Monthly analysis of error messages showing
   • Volumes of errors, e.g. aborts, application, business, etc.
   • Breakdown by business process, IT service and component transaction
   • Corresponding traps and CR/DRs using AlarmMis
4. Ad-hoc investigations to review
   • Loadings and relative performance across servers
   • Real-time transaction analysis
   • Drill-down diagnostics
   • COTS, platform and network root cause analysis
5. Service management process to review
   • Capacity
   • Supplier’s (e.g. Siebel, WLS) and applications development group’s CRs and DRs
   • PRs against remedial activities
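One way to picture the "stepped change" check in item 1 is to compare the average of the most recent periods with the average of the periods just before them. This is a minimal sketch under assumed parameters (a three-period window and a 25% threshold); the slides do not say how BT actually detects stepped changes.

```python
from statistics import mean


def stepped_change(series, window=3, threshold=0.25):
    """Return the relative change between the last `window` points and the
    `window` points before them, or None if it stays within `threshold`."""
    if len(series) < 2 * window:
        raise ValueError("need at least 2 * window data points")
    before = mean(series[-2 * window:-window])
    after = mean(series[-window:])
    if before == 0:
        return None  # relative change is undefined against a zero baseline
    change = (after - before) / before
    return change if abs(change) > threshold else None


# Hypothetical monthly transaction volumes.
stepped = stepped_change([100, 102, 98, 150, 155, 145])
steady = stepped_change([100, 101, 99, 100, 102, 98])
```

The same function could be applied to response times or error rates; the window and threshold would need tuning per metric.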

What is the BT experience? Key messages
• Define a standard for application events
• Instrumentation by design, built into matrix capabilities
• Implementation using agile design workshops
• Exploitation of the toolset, supported by supplier contracts
• The application monitoring standard promotes effective problem management through integration with the enterprise's diagnostic toolsets

[Diagram labels, in source order: Events Performance Hunter Integration Console; System & Application Trap Definitions; Management Frameworks; COTS Monitoring definitions, e.g. Siebel, BEA, Oracle; Remote Operation; Business Process & Application txn Monitoring]
• Flexible & agile
• Uses COTS out-of-the-box
• Rapid development & deployment
• Any management frameworks
• Low maintenance
