CMDBs: Above and Beyond… Sarah Nadi CS 446 – Nov. 26th, 2009
Overview • This work is in collaboration with CA Labs Canada. • This is part of the work done in the Software Architecture Group (SWAG).
Outline • Importance of proper IT management. • What is a Configuration Management Database (CMDB)? • Use cases of a CMDB. • What are root cause analysis and change impact analysis? • Example work done on root cause analysis and change impact analysis.
Enterprise IT Management • Information Technology (IT) systems are the basis of most business services today. • When something goes wrong with an IT system, companies face financial losses. • Therefore, Enterprise IT Management (EITM) has lately been gaining a lot of interest. • Reactively, IT analysts should be able to quickly locate the underlying cause of a problem (Root Cause Analysis). • Proactively, IT analysts should be able to identify the impacts of changes to the system to prevent unforeseen problems (Change Impact Analysis).
Configuration Management Database • A Configuration Management Database (CMDB) stores information about the different components of an IT system. It contains the attributes and history of each Configuration Item (CI), the relationships between CIs, and each CI's problem and change history. • A configuration item (CI) is any component of an IT infrastructure. It can be software, hardware, a service, etc. • Usually, business-critical components are included as CIs in the CMDB. • The information in a CMDB provides a basis for root cause analysis and change impact analysis.
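To make the CI and relationship vocabulary concrete, here is a minimal in-memory sketch. The class names, fields, and example CIs (`billing-service`, `orders-db`) are hypothetical and chosen only for illustration; a real CMDB product has a much richer schema.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    """One CI: a software, hardware, or service component (illustrative fields)."""
    name: str
    kind: str                                      # e.g. "software", "hardware", "service"
    attributes: dict = field(default_factory=dict)
    history: list = field(default_factory=list)    # problem and change notes


class CMDB:
    """Toy CMDB: registered CIs plus typed, directed relationships."""

    def __init__(self):
        self.items = {}
        self.relationships = []   # (source, relation, target) triples

    def add_ci(self, ci):
        self.items[ci.name] = ci

    def relate(self, source, relation, target):
        # Only allow relationships between CIs that are already registered.
        if source not in self.items or target not in self.items:
            raise KeyError("both CIs must be registered first")
        self.relationships.append((source, relation, target))

    def related_to(self, name):
        """Return (relation, target) pairs going out of the named CI."""
        return [(rel, dst) for src, rel, dst in self.relationships if src == name]


cmdb = CMDB()
cmdb.add_ci(ConfigurationItem("billing-service", "service"))
cmdb.add_ci(ConfigurationItem("orders-db", "software", {"version": "10g"}))
cmdb.relate("billing-service", "uses", "orders-db")
print(cmdb.related_to("billing-service"))   # [('uses', 'orders-db')]
```

Storing relationships as explicit triples keeps the relation type ("uses", "calls", "provides") queryable, which is what the later analyses rely on.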
Configuration Management Databases (CMDB): The big picture [Figure: an IT system with Service X, Service Y, and Service Z connected by "Calls", "Uses", "Forwards", and "Provides" relationships; a CMDB holding Configuration Items (CIs) and relationships; an IT analyst. Remaining labels: "Has", "Provides", "More Details", "Information".]
Knowledge contained in a CMDB • Hardware, networks, locations, etc. • Software, SLA, versioning information. • Reporting structures, contacts, organization. • Dependencies between components. • Incident and problem reports. • History of changes made, by whom, when, why, etc. • Gateway to logs and diagnostics.
Seven use cases of a CMDB [1] • What can the data in a CMDB be used for? • Change Impact Analysis. • Change Governance. • Root Cause Analysis. • Auditing and Compliance. • Resource Optimization. • Services Mapping. • Services Performance Planning.
Root Cause Analysis [Figure: an analyst looking at a system in which some components are marked as failed (X) and the possible causes are unknown (?).]
Root Cause Analysis • A fault is a design flaw or malfunction that causes a failure of one or more CIs or IT services. • A failure is the loss of ability to operate to specification, or to deliver the required output. • An incident is an observed event that is not part of the standard operation of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of that service. • Root cause analysis tries to map an incident to its underlying fault.
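One simple way to start mapping an incident to candidate faults is to walk the dependencies recorded in the CMDB outward from the CI where the incident was observed. The sketch below is an assumption for illustration, not the method used in this work: the `depends_on` map and the CI names are hypothetical, and closeness in the dependency graph stands in for a real ranking.

```python
from collections import deque


def candidate_root_causes(depends_on, incident_ci):
    """Breadth-first walk over dependencies; nearer CIs are listed first."""
    seen = {incident_ci}
    order = []
    queue = deque([incident_ci])
    while queue:
        ci = queue.popleft()
        for dep in depends_on.get(ci, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


# Hypothetical dependency data: each CI maps to the CIs it depends on.
depends_on = {
    "web-portal": ["billing-service", "auth-service"],
    "billing-service": ["orders-db"],
    "auth-service": ["directory-server"],
}
print(candidate_root_causes(depends_on, "web-portal"))
# ['billing-service', 'auth-service', 'orders-db', 'directory-server']
```

The CI where the incident was observed is left out of the result here; since the fault may also lie in that CI itself, an analyst would normally check it first.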
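Change Impact Analysis • “What if” analysis. • Find the impact of a proposed change. • Example: Upgrade Oracle 10g to Oracle 11g. [Figure: the proposed upgrade with its possibly affected components marked as unknown (?).]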
Change Impact Analysis • A change set is the set of CIs that need to be changed for a change to be correctly implemented with no side effects. • Unplanned changes can cause unexpected problems in related CIs. • Identifying the correct change set is very important. • Accurate change impact analysis proactively prevents future incidents.
Importance of root cause analysis & change impact analysis • Reactively, IT analysts should be able to quickly locate the underlying cause of a problem (root cause analysis). • Proactively, IT analysts should be able to identify the impacts of changes to the system to prevent unforeseen problems (change impact analysis). • Root cause analysis and change impact analysis are, therefore, important IT management activities that prevent costly IT outages.
Challenges • Identifying the information needed to perform root cause analysis and change impact analysis. • Finding the best way to model this information. • Providing practical and useful solutions.
Our Proposed Solution: DRACA • A Decision Support framework for Root Cause Analysis and Change Impact Analysis. • Given the CI involved in an incident, DRACA provides a ranked list of CIs that are suspected root causes (root cause analysis). • Given an initial CI to change, DRACA provides a ranked list of CIs that should be changed as well (change impact analysis).
Info. needed for root cause analysis • Existing dependencies in the CMDB. • Previous incident reports, problem reports, and change reports. • Calendar information. • CI Change Times.
Root cause matrix • $R_{ij}$ is the probability that CI $i$ is the root cause of an incident in CI $j$.
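The slides do not say how the entries of the matrix are computed. As one possible illustration, the sketch below assumes each entry is estimated as a relative frequency over resolved incidents: the share of past incidents in CI j whose recorded root cause was CI i. The function names and the sample history are hypothetical.

```python
from collections import Counter, defaultdict


def root_cause_matrix(history):
    """history: (root_cause_ci, incident_ci) pairs from resolved incidents.

    Returns a nested dict R where R[i][j] is the fraction of past
    incidents in j whose recorded root cause was i.
    """
    history = list(history)
    pair_counts = Counter(history)
    incident_counts = Counter(j for _, j in history)
    R = defaultdict(dict)
    for (i, j), n in pair_counts.items():
        R[i][j] = n / incident_counts[j]
    return dict(R)


def rank_suspects(R, incident_ci):
    """Suspected root causes for an incident in incident_ci, most likely first."""
    scores = [(i, row[incident_ci]) for i, row in R.items() if incident_ci in row]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical resolved incidents: (root cause CI, CI where the incident appeared).
history = [
    ("orders-db", "billing-service"),
    ("orders-db", "billing-service"),
    ("orders-db", "billing-service"),
    ("network-switch", "billing-service"),
    ("orders-db", "orders-db"),
]
R = root_cause_matrix(history)
print(rank_suspects(R, "billing-service"))
# [('orders-db', 0.75), ('network-switch', 0.25)]
```

Under this assumption each column of the matrix sums to 1, so reading down column j gives a ranked list of suspects for an incident in j.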
Info. needed for change impact analysis • Historical change sets • CIs that have changed simultaneously in the past are likely to change again together in the future. • Understanding previous change sets can help identify future ones.
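The co-change idea above can be sketched as a simple counting exercise: for a CI that is about to change, rank the other CIs by how often they appeared in the same historical change set. This is an illustrative assumption about how such a ranking might be computed, not the prediction model used in this work; the function name and the sample change sets are hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_change_ranking(change_sets, initial_ci):
    """Rank CIs by the share of initial_ci's past changes they took part in."""
    ci_counts = Counter()
    pair_counts = Counter()
    for change_set in change_sets:
        unique = set(change_set)
        ci_counts.update(unique)
        # Count every ordered pair of CIs that changed together.
        pair_counts.update(permutations(unique, 2))
    total = ci_counts[initial_ci]
    if total == 0:
        return []   # no history for this CI
    ranked = [
        (other, n / total)
        for (ci, other), n in pair_counts.items()
        if ci == initial_ci
    ]
    # Highest share first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda r: (-r[1], r[0]))


# Hypothetical historical change sets.
change_sets = [
    {"orders-db", "billing-service", "report-server"},
    {"orders-db", "billing-service"},
    {"orders-db", "backup-agent"},
    {"auth-service", "directory-server"},
]
for ci, share in co_change_ranking(change_sets, "orders-db"):
    print(f"{ci}: {share:.2f}")
```

In this sample, `billing-service` changed together with `orders-db` in two of its three past changes, so it is ranked first as a likely member of the next change set.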
Empirical Work • We tested our technique on industrial data from CA’s Global Information Systems (GIS) team, which manages CA’s internal network and services. • We used three years’ worth of data to test our prediction model. • Our results were promising, and we were able to correctly predict a large percentage of the change sets.
Summary • Proper IT management is very important to minimize disruptions to IT systems. • A CMDB supports IT management by tracking business-critical CIs. • Root cause analysis and change impact analysis are two important processes in IT management. • Root cause analysis involves finding the original cause of a problem. • Change impact analysis involves finding the set of CIs that might be affected by a proposed change.
References [1] Messineo, David A. & Ryder, Malcolm. Why Implement a Configuration Management Database (CMDB)? Seven Fundamental Use Cases. CA White Paper, 2008.