Explore BT's approach to managing end-to-end systems, application events, business processes, component monitoring, and more. Learn about BT's Matrix Architecture challenges, solutions for service design and operation, and how they align SLAs with business requirements.
BT – Managing Complex Systems
Ian Johnston & John Palmer
BCS Kingston & Croydon Branch presentation, 26/02/08
Presentation Objectives
• Approach to managing e2e systems
• A standard for application events
• Business process and component transaction monitoring
• Order tracking and jeopardy
• Leveraging the value of monitoring, e.g. ASGs, Service and Capacity, etc.
• Managing COTS products, e.g. BEA, Siebel
The BT experience
• BT architecture: SOA, built from linked reusable capabilities
• Our position has been driven by experience in monitoring complex distributed architectures.
• Configuring toolsets to monitor e2e is unachievable for large enterprises: maintenance is expensive or impossible.
• This has led us along the Design route, which now parallels ITIL's Service Design concepts.
BT Matrix Architecture Challenges - Service Design
Service Level Management
• SLAs aligned to business requirements
• BT's outsourcing strategy
Availability
• Understanding CE requirements
• Response times
Capacity Management
• Accurate measurement of transaction volumes
• Response times broken down by capability
IT Service Continuity management
• Dynamic deployment in virtualised environments
• Physical and geographic resilience
SLM
• Defining measurements & targets, e.g. volumes, response times
• Aligning SLAs with UCs
Capacity Management
• Procedures to ensure customer targets are met
Business Continuity management
• Deployment designs to ensure resilience
Availability management
• Measure e2e availability broken down to capabilities
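The last bullet, measuring e2e availability broken down to capabilities, can be illustrated with a minimal sketch. It assumes the capabilities form a simple serial chain (every capability must be up for the end-to-end transaction to succeed), which the slides do not state. The capability names and figures are hypothetical.

```python
from math import prod


def e2e_availability(capabilities):
    """End-to-end availability of a serial chain of capabilities.

    Assumption: every capability is required, so the e2e figure is the
    product of the individual availabilities (each between 0 and 1).
    """
    return prod(capabilities.values())


def weakest_capability(capabilities):
    """Name of the capability with the lowest measured availability."""
    return min(capabilities, key=capabilities.get)


# Hypothetical monthly availability measurements per capability.
measured = {
    "order_capture": 0.999,
    "provisioning": 0.995,
    "billing": 0.998,
}

print(f"e2e availability: {e2e_availability(measured):.4f}")
print(f"weakest capability: {weakest_capability(measured)}")
```

Under this assumption the end-to-end figure can never exceed the weakest capability's figure, which is one reason a breakdown by capability is more useful than a single e2e number.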
BT Matrix Architecture Challenges - Service Operation
Operational management
• How to assess the impact of application events and prioritise them by business process and IT service?
Application management
• Routing of PRs to the appropriate support groups?
• Analysing high volumes of events in log files?
Technical management
• Pinpointing root cause across multiple shared capabilities
Metrics
• Stepped changes in volumes, errors and response times?
• Impact of changes, e.g. trend in error rates
• Measuring operational efficiency, e.g. txns vs. failures
BT Matrix Architecture Challenges – E2E Design
[Process flow diagram, "SF – Provide – Progress - pt 1" (Place Order), running between two End Customer lanes. Entry: from Create ServiceID. Steps: Place Order; Build Port; Get Tie Cable Mapping; Build VC (RADIUS, B-RAS, VCI, etc.); Update (SMPF ID, Installation DN, etc.); Activation email. Order states shown: Pending, Assigned, Acknowledged (SMPF ID), Committed, Installed, Completed. Exception path: Network Capacity Shortfall goes into the error queue for manual processing. Exit: SMPF ID Status = "Completed", then to the "Close Order" sub-process. NB: incorporates Flow Stream / Manage / Monitor / Director.]
BT Approach – Application event standard
[Diagram: the attributes carried by a standard application event: business process, business transaction, component capability, event type, time, application, host, server, business keys and e2e correlation key.]
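As an illustration only, the attributes named on this slide could be carried in a record like the one below. This is a sketch, not BT's actual standard: the field names, types and JSON serialisation are assumptions, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApplicationEvent:
    """One standardised application event (hypothetical field names)."""

    business_process: str       # e.g. the order-handling process
    business_transaction: str   # step within the business process
    component_capability: str   # reusable capability that raised the event
    event_type: str             # e.g. "start", "end", "error"
    application: str
    host: str
    server: str
    e2e_correlation_key: str    # same value on every event of one e2e flow
    business_keys: dict = field(default_factory=dict)
    time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_json(self) -> str:
        """Serialise to one JSON line suitable for a log file."""
        record = asdict(self)
        record["time"] = self.time.isoformat()
        return json.dumps(record, sort_keys=True)


event = ApplicationEvent(
    business_process="Provide",
    business_transaction="Place Order",
    component_capability="order-management",
    event_type="error",
    application="example-app",
    host="host01",
    server="server-a",
    e2e_correlation_key="ORD-0001",
    business_keys={"order_id": "0001"},
)
print(event.to_json())
```

Emitting every event in one fixed shape is what makes the later correlation and summarisation steps possible without per-application log parsing.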
BT Matrix Architecture Solution - Service Design
SLM
• Agile design workshops to build in measures to support SLAs
Availability
• Agile capability workshops to build in measures for monitoring of capacity, implemented by APIs
• Standardised events for common error conditions such as interface failures
IT Service Continuity
• Dynamic reports of services and deployment profile (host/server distribution)
BT Matrix Architecture Solution - Service Operation
Operational management
• Event correlation (by service and transaction identifiers)
• Impact (problem scenario and guided action)
• Performance bottlenecks
• Support group checklists (quick wins)
Application management
• Improved routing of PRs to the appropriate support groups, provided by the e2e view
• We can analyse high volumes of events by restricting the types of events and providing summarisation
Technical management
• Diagnosis: root cause (e2e location and standard error)
Metrics
• Summarisation and granularity inherent in the standard
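A minimal sketch of the correlation and summarisation ideas on this slide, assuming each event is a dictionary with the hypothetical keys `e2e_key`, `capability`, `event_type` and `time`. It is not the toolset described in the talk, only an illustration of why a shared correlation key and a restricted set of event types make high event volumes tractable.

```python
from collections import Counter, defaultdict


def correlate(events):
    """Group events by e2e correlation key, each group ordered by time."""
    by_key = defaultdict(list)
    for event in events:
        by_key[event["e2e_key"]].append(event)
    for chain in by_key.values():
        chain.sort(key=lambda e: e["time"])
    return dict(by_key)


def first_error(chain):
    """Earliest error event in a correlated chain, or None if there is none."""
    return next((e for e in chain if e["event_type"] == "error"), None)


def summarise(events):
    """Count events per (capability, event type) pair."""
    return Counter((e["capability"], e["event_type"]) for e in events)


# Hypothetical events from two end-to-end flows.
events = [
    {"e2e_key": "ORD-1", "capability": "billing", "event_type": "error", "time": 3},
    {"e2e_key": "ORD-1", "capability": "orders", "event_type": "start", "time": 1},
    {"e2e_key": "ORD-1", "capability": "provisioning", "event_type": "error", "time": 2},
    {"e2e_key": "ORD-2", "capability": "orders", "event_type": "start", "time": 1},
]

for key, chain in correlate(events).items():
    error = first_error(chain)
    where = error["capability"] if error else "no error"
    print(key, "->", where)
```

Taking the earliest error in a chain as the likely root-cause location is a simplification: a downstream error such as the billing one above may only be a symptom of the earlier provisioning failure.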
Outsourcing Supplier Contracts
1. Monthly views to identify any stepped changes in
• Volumes, response times, error rates
2. Weekly views of the top 5-10 transactions showing
• Distribution of volumes, variance in response times, peaks and spikes
• Any worsening trends in errors and thresholds
3. Monthly analysis of error messages showing
• Error volumes by type, e.g. aborts, application, business, etc.
• Breakdown by business process, IT service and component transaction
• Corresponding traps and CR/DRs using AlarmMis
4. Ad-hoc investigations to review
• Loadings and relative performance across servers
• Real-time transaction analysis
• Drill-down diagnostics
• COTS, platform and network root cause analysis
5. Service management process to review
• Capacity
• CRs and DRs from suppliers (e.g. Siebel, WLS) and the applications development group
• PRs against remedial activities
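The "stepped changes" in item 1 can be made concrete with a small sketch. It assumes one measurement per month and flags a month whose value differs from the mean of the preceding months by more than a chosen fraction. The window size, threshold and sample figures are illustrative assumptions, not values from the talk.

```python
from statistics import mean


def stepped_changes(series, window=3, threshold=0.25):
    """Return (index, relative_change) for points that step away from baseline.

    The baseline for each point is the mean of the `window` points before it.
    A point is flagged when it differs from that baseline by more than
    `threshold` as a fraction of the baseline.
    """
    flagged = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        if baseline == 0:
            continue  # relative change is undefined against a zero baseline
        change = (series[i] - baseline) / baseline
        if abs(change) > threshold:
            flagged.append((i, round(change, 3)))
    return flagged


# Hypothetical monthly transaction volumes (thousands).
monthly_volumes = [100, 102, 98, 101, 150, 152]
for index, change in stepped_changes(monthly_volumes):
    print(f"month {index}: {change:+.1%} against the prior 3-month mean")
```

The same function applies unchanged to monthly response times or error rates. A rolling baseline keeps flagging for a few months after a genuine step until the window catches up, so the first flagged month is the one to investigate.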
What is the BT experience? Key messages
• Define a standard for application events
• Instrumentation by design, built into matrix capabilities
• Implementation using agile design workshops
• Exploitation of the toolset, supported by supplier contracts
• An application monitoring standard promotes effective problem management through integration with the enterprise's diagnostic toolsets
[Diagram labels: Events Performance Hunter Integration Console; System & Application Trap Definitions; Management Frameworks; COTS Monitoring definitions, e.g. Siebel, BEA, Oracle; Remote Operation; Business Process & Application txn Monitoring]
• Flexible & agile
• Uses COTS out-of-the-box
• Rapid development & deployment
• Any management frameworks
• Low maintenance