Direct Cause vs Root Cause
“A Problem Solving Concept”
INCOSE Enchantment Chapter Meeting, March 14, 2007
Dr David E. Peercy
Sandia National Laboratories
Department 12341, Weapon System and Software Quality
Presentation Objective
Events have many potential “causes”. We tend to think of causes as related mostly to “unwanted” events, but in fact every event that occurs has causes: the reasons why the event occurs. The objective of this short presentation and discussion is to gain a better understanding of why it is important to distinguish between the “direct” causes and the “root” causes of events. In doing so, we improve our ability to influence a much larger class of events, both by preventing unwanted events and by ensuring that wanted events actually do occur.
Direct Cause vs Root Cause INCOSE Chapter Meeting
An Example of a Problem
USAF F-22A jets grounded by software glitch
Jeremy Epstein <jepstein@webmethods.com>, Fri, 23 Feb 2007 15:55:52 -0500
Navigational systems failed, planes forced to return to Hawaii [visually having to follow their tankers to safety]. The problem turns out to be software (no surprise there). Fix created, "verified", installed, and they're off again.
[Direct or Root Cause addressed?]
"A spokesman for Lockheed Martin this week insisted that the navigation software problem was minor. 'The issue was quickly identified in a matter of days and a fix installed in the airplanes, which were flown successfully to Japan,' he said. 'There are 87 of these exceptional fighters and they are out there performing exceptionally well, and their pilots continue to fly them in new and greater ways.'"
Examples to Test Our Understanding
RESOURCE: http://catless.ncl.ac.uk/Risks
Peter Neumann, Stanford University Professor
The RISKS site provides a voluminous list of risks, many of them computer- or software-related, focusing primarily on security and safety risks. Summaries are provided with links to more detail.
• Army Training Accident, June 2002
• Friendly Fire Deaths, March 2002
• Medical “Direct/Root” Cause Determinations
A Simple Example
Assume each of these factors is as described below:
• e: car will not start
• d: battery is dead
• c: alternator does not function
• b: alternator is well beyond its designed service life
• a: car is not being maintained according to the recommended service schedule
Direct Cause? Intermediate Causes? Root Cause?
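One way to check an answer to the slide's questions is to encode the factors as a chain and walk it backward from the effect. This minimal sketch assumes that each lettered factor is the single cause of the one listed above it (a → b → c → d → e). The names `CAUSE_OF` and `classify_causes` are hypothetical. Here "root" simply means the furthest cause on record, so the analysis can only be as deep as the data behind it.

```python
# Hypothetical encoding of the car example: each effect maps to the one
# factor assumed to cause it (a -> b -> c -> d -> e).
CAUSE_OF = {
    "e": "d",  # car will not start      <- battery is dead
    "d": "c",  # battery is dead         <- alternator does not function
    "c": "b",  # alternator not working  <- alternator beyond service life
    "b": "a",  # beyond service life     <- maintenance schedule not followed
}


def classify_causes(effect, cause_of):
    """Walk the chain back from `effect`; return (direct, intermediates, root)."""
    chain = []
    seen = {effect}
    current = effect
    while current in cause_of:          # stop when no further cause is recorded
        current = cause_of[current]
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
    if not chain:
        raise ValueError(f"no recorded cause for {effect!r}")
    # First link is the direct cause, last is the root, the rest are intermediate.
    return chain[0], chain[1:-1], chain[-1]


print(classify_causes("e", CAUSE_OF))  # ('d', ['c', 'b'], 'a')
```

Under these assumptions, d is the direct cause, c and b are intermediate causes, and a is the root cause.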
Error, Fault/Defect, Failure
(Diagram: ERROR, FAULT/DEFECT, and FAILURE or NO FAILURE, linked by arrows labeled “may lead to”; the diagram also shows FAULT TOLERANCE and REDUCED EFFECT.)
• Error
  • a human action or lack of action that results in the inclusion of a fault in a product or in the way it is used
  • the variance between expected and actual results
• Fault/Defect
  • an accidental condition that causes a product to fail to perform its required function if encountered during operational use
• Failure
  • an event in which a product does not perform a required function within its specified limits during operational use
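The definitions above can be read as a small decision procedure. An error may or may not introduce a fault, and a fault leads to failure only if it is encountered during operational use. This sketch adds one assumption of its own: fault tolerance turns an encountered fault into a reduced effect rather than a failure. That is our reading of the FAULT TOLERANCE and REDUCED EFFECT labels, not something the slide states. The function and parameter names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    NO_FAILURE = "no failure"
    REDUCED_EFFECT = "reduced effect"
    FAILURE = "failure"


def outcome_of_error(introduced_fault: bool, encountered: bool, tolerated: bool) -> Outcome:
    # An error "may lead to" a fault; if it did not, nothing downstream happens.
    if not introduced_fault:
        return Outcome.NO_FAILURE
    # A fault causes failure only if encountered during operational use.
    if not encountered:
        return Outcome.NO_FAILURE
    # Assumption: fault tolerance limits an encountered fault to a reduced effect.
    if tolerated:
        return Outcome.REDUCED_EFFECT
    return Outcome.FAILURE


print(outcome_of_error(introduced_fault=True, encountered=True, tolerated=False))
```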
Direct Cause
• Causes of events may be natural or man-made, active or passive, initiating or permitting, obvious or hidden.
• Causes that lead immediately to the effect are often called direct or proximate causes.
• Examples of direct/proximate causes:
  • Equipment: arced, leaked, over-loaded, over-heated
  • Human: pushed incorrect button, fell, dropped tool, connected wires
Root Cause
• Direct causes often result from another set of causes, which could be called intermediate causes, and these may in turn result from still other causes.
• When a chain of cause and effect is followed back from a known end state to an origin or starting point, root causes are found.
• The process used to find root causes is called root cause analysis: systematic problem solving.
• A root cause is an initiating cause of a causal chain that leads to an outcome or effect of interest.
The Benefits of Problem Solving!
• The usual purpose of finding root causes is to solve a problem that has actually occurred, or to prevent a less serious problem (e.g., an aircraft near miss) from escalating to an unacceptable level.
• The basic concept is that solving a problem by addressing its root causes is ultimately more effective than merely addressing symptoms or direct causes.
• That is, a whole “class” of problems may be solved or prevented by addressing root causes rather than just direct causes.
Basic Process - Continue to Ask Why!
Continue to ask “why” until you have reached:
• Direct, intermediate, and root cause(s), including all organizational factors that exert control over the design, fabrication, development, maintenance, operation, and disposal of the system.
• A problem/cause that is not correctable by your organization (it may be escalated to a higher responsible organization).
• Insufficient data to continue.
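The three stopping conditions above can be sketched as a loop. This is a minimal illustration, assuming that each recorded answer to "why?" carries two judgments made by the analyst: whether it is an initiating (root) cause, and whether our organization can correct it. The `Cause` fields, `ask_why`, and the `max_depth` guard are hypothetical additions, not part of the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cause:
    description: str
    why: Optional["Cause"] = None     # the known answer to "why?", if any
    is_root: bool = False             # analyst judges this an initiating cause
    correctable_by_us: bool = True    # can our organization act on it?


def ask_why(problem: Cause, max_depth: int = 20) -> Tuple[List[str], str]:
    """Keep asking "why?" until one of the slide's stopping conditions applies."""
    chain: List[str] = []
    current = problem.why
    while current is not None:
        chain.append(current.description)
        if not current.correctable_by_us:
            return chain, "escalate to higher responsible organization"
        if current.is_root:
            return chain, "root cause reached"
        if len(chain) >= max_depth:   # guard against runaway or circular chains
            return chain, "depth limit reached"
        current = current.why
    return chain, "insufficient data to continue"


# The car example, with the maintenance lapse marked as the root cause.
maintenance = Cause("car not maintained per schedule", is_root=True)
service_life = Cause("alternator beyond service life", why=maintenance)
alternator = Cause("alternator does not function", why=service_life)
battery = Cause("battery is dead", why=alternator)
problem = Cause("car will not start", why=battery)

print(ask_why(problem))
```

Escalation is checked before the root-cause test. A root cause that lies outside our control is therefore reported as something to promote to a higher organization.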
Example
Why-Causal Tree
Example
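A why-causal tree generalizes the single chain: each effect can have several answers to "why?", and the leaves of the tree are the candidate root causes. The tree figures are not reproduced here, so this sketch borrows the causes listed for the Army Training Accident slide. The way they are grouped under the two intermediate causes is our own illustrative arrangement, not the author's. `WhyNode` and `candidate_root_causes` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyNode:
    statement: str
    causes: List["WhyNode"] = field(default_factory=list)  # answers to "why?"


def candidate_root_causes(node: WhyNode) -> List[str]:
    """Return the leaves of the tree: causes with no further "why" recorded."""
    if not node.causes:
        return [node.statement]
    leaves: List[str] = []
    for cause in node.causes:
        leaves.extend(candidate_root_causes(cause))
    return leaves


def print_tree(node: WhyNode, depth: int = 0) -> None:
    print("  " * depth + "- " + node.statement)
    for cause in node.causes:
        print_tree(cause, depth + 1)


# Illustrative arrangement of causes from the Army training accident slide.
accident = WhyNode("Two soldiers killed in artillery training", [
    WhyNode("Soldiers forgot to enter the target altitude", [
        WhyNode("Soldier training inadequate"),
        WhyNode("System CONOPS not adequate"),
    ]),
    WhyNode("System assumed an altitude of zero", [
        WhyNode("Software requirements not adequately specified"),
        WhyNode("Software/System analysis and modeling/testing inadequate"),
    ]),
])

print_tree(accident)
print(candidate_root_causes(accident))
```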
Potential Problem Analysis Tools
• Failure Modes and Effects Analysis (FMEA)
  • an inductive engineering technique used at the component level to define, identify, and eliminate known and/or potential failures, problems, and errors from the system, design, process, and/or service before they reach the customer
• Fault Tree Analysis (FTA)
  • a deductive analytical technique used in reliability and safety analysis, generally for complex dynamic systems
• Probabilistic Risk Assessment (PRA)
  • a systematic, logical, and comprehensive discipline that uses tools such as FMEA, FTA, Event Tree Analysis (ETA), Event Sequence Diagrams (ESD), Master Logic Diagrams (MLD), and Reliability Block Diagrams (RBD) to quantify risk
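To make FTA's deductive, top-down structure concrete, here is a minimal sketch that evaluates a fault tree built from AND and OR gates. It assumes that all basic events are statistically independent and that no event appears twice in the tree. Real FTA tools relax both assumptions, typically by working from minimal cut sets. The event names and probabilities are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BasicEvent:
    name: str
    probability: float   # assumed independent of every other basic event


@dataclass
class Gate:
    kind: str            # "AND" or "OR"
    inputs: List["Node"]


Node = Union[BasicEvent, Gate]


def top_event_probability(node: Node) -> float:
    """Evaluate a fault tree bottom-up, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_ps = [top_event_probability(child) for child in node.inputs]
    if node.kind == "AND":   # all inputs must occur
        return math.prod(child_ps)
    if node.kind == "OR":    # at least one input occurs: 1 - P(none occur)
        return 1.0 - math.prod(1.0 - p for p in child_ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical system: it fails if the controller fails OR both pumps fail.
top = Gate("OR", [
    BasicEvent("controller fails", 0.01),
    Gate("AND", [
        BasicEvent("primary pump fails", 0.1),
        BasicEvent("backup pump fails", 0.2),
    ]),
])

print(top_event_probability(top))
```

Here the AND gate contributes 0.1 × 0.2 = 0.02, and the OR gate gives 1 − (0.99 × 0.98) ≈ 0.0298.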
Summary
• Direct Cause vs Root Cause
  • Issue: level of problem solving
• Problem Solving
  • Direct cause: the objective is to solve one instance of a potential class of problems
  • Root cause: the objective is to solve a class of problems
  • Both are useful
• Analysis Methods
  • Methods exist to analyze events; the goal is to eliminate the occurrence of unwanted events and ensure that wanted events do occur
  • FMEA, FTA, PRA
• Q&A?
Army Training Accident
• Incident
  • Thu, 13 Jun 2002: two soldiers were killed in training at Ft Drum. They were firing artillery shells and relying on the output of the Advanced Field Artillery Tactical Data System. When they forgot to enter the target altitude, the system assumed an altitude of zero. (Ft Drum's elevation is 676 ft.)
• Direct Cause
  • Soldiers forgot to enter the target altitude
• Potential Root Cause(s)
  • Software defaulted a missing altitude to a valid value (zero)
  • Software/system analysis and modeling/testing inadequate
  • Software requirements not adequately specified
  • System CONOPS not adequate
  • Soldier training inadequate
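The first potential root cause is a general software design pattern: a missing input silently becomes a valid-looking default. This sketch contrasts that pattern with one that keeps a missing value invalid and refuses to continue. It does not describe how the Advanced Field Artillery Tactical Data System actually handles this input. Both functions and the `MissingInputError` class are hypothetical.

```python
from typing import Optional


class MissingInputError(ValueError):
    """Raised when a required firing input was never entered."""


def unsafe_target_altitude_ft(entered: Optional[float]) -> float:
    # The failure pattern described above: a forgotten entry silently
    # becomes a valid-looking altitude of zero.
    return entered if entered is not None else 0.0


def target_altitude_ft(entered: Optional[float]) -> float:
    # Safer pattern: a missing value stays invalid and stops the computation.
    if entered is None:
        raise MissingInputError("target altitude not entered; refusing to assume 0 ft")
    return entered


print(unsafe_target_altitude_ft(None))   # 0.0
try:
    target_altitude_ft(None)
except MissingInputError as err:
    print("rejected:", err)
```

A genuine entry of 0 ft is still accepted. Only the absence of an entry is rejected, which moves the omission error back to the operator before it can become a failure.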
Friendly Fire Deaths
• Incident
  • A U.S. Special Forces air controller was calling in GPS positions from some sort of battery-powered device. He had used the GPS receiver to calculate the latitude and longitude of the Taliban position in degrees, minutes, and seconds for an airstrike by a Navy F/A-18. The bomber crew "required" the position in decimal degrees, and the crew did not have equipment to perform the minutes-seconds conversion themselves.
  • The air controller had recorded the correct value in the GPS receiver when the battery died. After replacing the battery, he called in the decimal-degree position the unit was showing, without realizing that the unit is set up to reset to its *own* position when the battery is replaced.
  • The 2,000-pound bomb landed on the air controller's position, killing three Special Forces soldiers and injuring 20 others.
• Direct Cause
  • Taliban position was incorrectly transmitted to the Navy F/A-18 bomber crew
• Potential Root Cause(s)
  • GPS system defaulted to a valid (rather than invalid) position
  • Lack of battery backup to hold values in memory during battery replacement
  • Not equipping users to convert from one coordinate system to another (reminiscent of the loss of the Mars Climate Orbiter, when ground teams confused English and metric units)
  • Using a device with such flaws in a combat situation without adequate testing
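The conversion the crew lacked is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for south latitudes and west longitudes. This sketch uses the hypothetical helper name `dms_to_decimal` and made-up coordinates. It also rejects out-of-range inputs instead of guessing, in keeping with the "no valid-looking default" theme of this slide.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if degrees < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError("expected degrees >= 0, minutes and seconds in [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    limit = 90 if hemisphere in ("N", "S") else 180   # latitude vs longitude
    if value > limit:
        raise ValueError(f"{value} exceeds {limit} degrees for {hemisphere}")
    return -value if hemisphere in ("S", "W") else value


# Hypothetical coordinates, for illustration only.
print(dms_to_decimal(31, 30, 0, "N"))   # 31.5
print(dms_to_decimal(10, 15, 0, "W"))   # -10.25
```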
Medical Direct/Root Cause Example 1 - Questions?
Medical Direct/Root Cause Example 2 - Questions?
Medical Direct/Root Cause Example 3 - Questions?
Medical Direct/Root Cause Example 4 - Questions?