
The Application of Causal Analysis Techniques for Computer-Related Mishaps



  1. The Application of Causal Analysis Techniques for Computer-Related Mishaps Chris Johnson University of Glasgow, Scotland. http://www.dcs.gla.ac.uk/~johnson SAFECOMP: 26th September 2003

  2. Acknowledgements • HSE: Mark Bowell, Ray Ward. • Adelard: George Clelland, Peter Bishop, Luke Emmett, Sofia Guerra, Robin Bloomfield. • Blacksafe Consulting: Bill Black. • Glasgow University: Chris Johnson. Look, I’m not blaming you, I’m just suing you…

  3. Bias • Author bias: • individuals are reluctant to accept findings they did not produce. • Confidence bias: • people trust those with the most confidence in their techniques. • Hindsight bias: • investigators use information that was unavailable to the people in the incident. • Judgement bias: • investigators must reach a decision within a constrained time period. • Political bias: • a high-status member exerts influence through status rather than the judgement itself… “At this point in the meeting, I’d like to shift the blame from me onto someone else…”

  4. [Figure: contrasting ‘bad’ and ‘good’ examples]

  5. Does this really look like me? Fish accidents? The Sunday Telegraph, September 7th, 2003, page 33.

  6. “The NASA Accident Investigation Team investigated the accident using “fault trees,” a common organizational tool in systems engineering. Fault trees are graphical representations of every conceivable sequence of events that could cause a system to fail.” (CAIB, p.85)

  7. “The NASA Accident Investigation Team investigated the accident using “fault trees,” a common organizational tool in systems engineering. Fault trees are graphical representations of every conceivable sequence of events that could cause a system to fail.” (CAIB, p.85) • But… Fault Trees: • are not well suited to event sequences (they have a poor notion of time); • few engineers would agree that they capture “every conceivable” sequence. • * work with Clif Ericson at Boeing on Accident Fault Trees *
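To make the fault-tree criticism concrete, here is a minimal Python sketch (not from the original slides) of a fault tree as AND/OR gates over basic events. The toy events and probabilities are invented; note that nothing in the structure records when events occur, which is exactly the “poor notion of time” objection above.

```python
# Hypothetical sketch: a fault tree as AND/OR gates over basic events.
# Evaluation is purely Boolean/probabilistic -- the structure never records
# *when* an event occurs, so ordering information is lost.
from dataclasses import dataclass
from typing import Union

@dataclass
class BasicEvent:
    name: str
    probability: float  # assumed independent

@dataclass
class Gate:
    kind: str            # "AND" or "OR"
    children: list       # of BasicEvent or Gate

Node = Union[BasicEvent, Gate]

def probability(node: Node) -> float:
    """Top-event probability, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(c) for c in node.children]
    if node.kind == "AND":
        p = 1.0
        for cp in child_probs:
            p *= cp
        return p
    # OR gate: 1 minus the product of the complements.
    q = 1.0
    for cp in child_probs:
        q *= (1.0 - cp)
    return 1.0 - q

# Invented toy top event: valve fails closed AND (sensor fault OR display fault).
top = Gate("AND", [
    BasicEvent("valve fails closed", 0.01),
    Gate("OR", [BasicEvent("sensor fault", 0.05),
                BasicEvent("display fault", 0.02)]),
])
print(f"P(top) = {probability(top):.6f}")  # 0.01 * (1 - 0.95 * 0.98)
```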

  8. Motivation: Milford Haven • Control system closes valve A, starving the debutanizer. • It also closes valve B; continued heating raises debutanizer pressure. • Valve A is reopened and debutanizer flow restored; valve B should open to the splitter. • Operators see misleading signals: valve B is shown open. • The debutanizer fills while the naphtha splitter empties.

  9. Motivation: Milford Haven • Separate displays. • Operators didn’t check the status of valve B and opened valve C. • The debutanizer vents to flare; the wet gas compressor restarts. • This should increase flow but instead increases debutanizer pressure. • Material vents to the flare drum and the corroded discharge pipe breaks. • 20 tonnes of hydrocarbon ignites; damage > £50 million.
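By contrast with a static fault tree, an event-based reconstruction keeps ordering and causal links explicit. The sketch below is hypothetical: it encodes the Milford Haven sequence above as a chain of events with cause links, roughly in the spirit of an events-and-causal-factors chart. The event wording paraphrases the slides and the link structure is illustrative only.

```python
# Hypothetical sketch: the Milford Haven sequence as an ordered event chain
# with explicit cause links. Event text paraphrases the slides; the links
# are illustrative, not an official analysis.
from dataclasses import dataclass, field

@dataclass
class Event:
    id: str
    description: str
    causes: list = field(default_factory=list)  # ids of contributing events

chain = [
    Event("E1", "Control system closes valve A; debutanizer starved"),
    Event("E2", "Valve B also closed; heating raises debutanizer pressure", ["E1"]),
    Event("E3", "Valve A reopened; debutanizer flow restored", ["E1"]),
    Event("E4", "Display shows valve B open (misleading signal)", ["E2"]),
    Event("E5", "Operators open valve C without checking valve B", ["E4"]),
    Event("E6", "Debutanizer vents to flare; corroded discharge breaks", ["E2", "E5"]),
]

by_id = {e.id: e for e in chain}

def contributors(event_id: str, seen=None) -> set:
    """Walk backwards from an outcome to collect all contributing events."""
    seen = seen if seen is not None else set()
    for c in by_id[event_id].causes:
        if c not in seen:
            seen.add(c)
            contributors(c, seen)
    return seen

print(sorted(contributors("E6")))  # ['E1', 'E2', 'E4', 'E5']
```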

  10. Motivation: Milford Haven • Human ‘Error’ and Plant Design/Operation: “Operators were not provided with information systems configured to help them identify the root cause of such problems. Secondly, the preparation of shift operators and supervisors for dealing with a sustained upset, and therefore stressful, situation was inadequate.” • Safety Management Systems: “… the company’s crucial safety management systems were not adequately performing their function. Examples are the systems for modification and inspection. The company was unaware of defects in its safety management systems because its monitoring of their performance did not effectively highlight problems.” • Risk Assessment: “… 3 years before, a modification was carried out so that automated high-capacity discharge pumps no longer started automatically to move excess material to slops from the flare discharge tank. Instead, low-capacity pumps recycled material back to the production process. Valves had to be operated manually if high-capacity pumping to slops was needed, but this was seldom (never?) practised.”

  11. Elicitation and Reconstruction

  12. Tier Analysis: JPL

  13. Flow Charts

  14. MORT Sub-tree: Management LTA; MORT (Stage 2) Analysis Form

  15. PRISMA • Anaesthesia study: 15 incidents; 78 root causes (5.2 average); 27% organisational causes; 40% (direct) human causes; 26% technical causes. • A&E study: 19 incidents; 93 root causes (4.9 average); 45% organisational causes; 41% (direct) human causes.
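The quoted averages follow directly from the counts on the slide; a quick arithmetic check:

```python
# Check of the averages quoted on the slide: root causes per incident.
studies = {
    "Anaesthesia": {"incidents": 15, "root_causes": 78},
    "A&E":         {"incidents": 19, "root_causes": 93},
}
for name, s in studies.items():
    avg = s["root_causes"] / s["incidents"]
    print(f"{name}: {avg:.1f} root causes per incident")
# Anaesthesia: 5.2, A&E: 4.9 -- matching the figures above.
```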

  16. Example PRISMA Classification/Action Matrix

  17. Accident Models

  18. TRIPOD • General Failure Types: • Hardware • Maintenance management • Design • Operating procedures • Error-enforcing conditions • Housekeeping • Incompatible goals • Communication • Organisation • Training • Defence planning
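As an illustration of how the General Failure Types get used, the sketch below tallies a set of findings against the eleven GFTs to build a failure profile for an incident. The GFT list follows the slide; the example findings and their classifications are invented.

```python
# Hypothetical sketch: tallying incident findings against Tripod's General
# Failure Types. The GFT list follows the slide; the findings are invented.
from collections import Counter

GENERAL_FAILURE_TYPES = [
    "Hardware", "Maintenance management", "Design", "Operating procedures",
    "Error-enforcing conditions", "Housekeeping", "Incompatible goals",
    "Communication", "Organisation", "Training", "Defence planning",
]

findings = [
    ("separate, unintegrated displays",            "Design"),
    ("valve status not checked before action",     "Operating procedures"),
    ("shift handover left plant status unclear",   "Communication"),
    ("no preparation for sustained upsets",        "Training"),
]

profile = Counter(gft for _, gft in findings)
for gft in GENERAL_FAILURE_TYPES:
    print(f"{gft:28s} {profile.get(gft, 0)}")
```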

  19. Control Flaws • 1. Inadequate Enforcement of Constraints (Control Actions) • 1.1 Unidentified hazards • 1.2 Inappropriate, ineffective or missing control actions for identified hazards • 1.2.1 Design of control algorithm (process) does not enforce constraints • Flaws in creation process • Process changes without appropriate change in control algorithm (asynchronous evolution) • Incorrect modification or adaptation • 1.2.2 Process models inconsistent, incomplete or incorrect (lack of linkup) • Flaws in creation process • Flaws in updating process (asynchronous evolution) • Time lags and measurement inaccuracies not accounted for • 1.2.3 Inadequate coordination among controllers and decision makers • 2. Inadequate Execution of Control Action • 2.1 Communication flaw • 2.2 Inadequate actuator operation • 2.3 Time lag • 3. Inadequate or Missing Feedback • 3.1 Not provided in system design • 3.2 Communication flaw • 3.3 Time lag • 3.4 Inadequate sensor operation (incorrect or no information provided)
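A hypothetical sketch of how this control-flaw taxonomy might be applied: encode it as a flat code-to-label table so that individual findings can be tagged consistently. The codes and labels follow the slide; the tagged findings are illustrative only.

```python
# Sketch: the control-flaw taxonomy above as a code->label table, so that
# findings can be tagged consistently. Codes follow the slide; the tagged
# example findings are illustrative.
CONTROL_FLAWS = {
    "1":     "Inadequate enforcement of constraints (control actions)",
    "1.1":   "Unidentified hazards",
    "1.2":   "Inappropriate, ineffective or missing control actions",
    "1.2.1": "Control algorithm does not enforce constraints",
    "1.2.2": "Process models inconsistent, incomplete or incorrect",
    "1.2.3": "Inadequate coordination among controllers/decision makers",
    "2":     "Inadequate execution of control action",
    "2.1":   "Communication flaw",
    "2.2":   "Inadequate actuator operation",
    "2.3":   "Time lag",
    "3":     "Inadequate or missing feedback",
    "3.1":   "Not provided in system design",
    "3.2":   "Communication flaw",
    "3.3":   "Time lag",
    "3.4":   "Inadequate sensor operation",
}

# Tag findings from the Milford Haven example with taxonomy codes.
findings = [
    ("Display showed valve B open when it was closed",                "3.4"),
    ("Operators and control system held different process models",    "1.2.2"),
]
for text, code in findings:
    print(f"[{code}] {CONTROL_FLAWS[code]}: {text}")
```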

  20. Argumentation Techniques

  21. Conclusions • Several classes of causal analysis techniques for E/E/PES: • Elicitation Techniques (e.g., Barrier Analysis); • Event-based techniques (e.g., Accident Fault Trees); • Flow Charts (e.g., PRISMA); • Accident Models (e.g., control theory models in STAMP); • Argumentation Techniques (e.g., counterfactual WBA). • How do we assess them? • investment (i.e., training and time required to apply them); • consistency between individuals applying the same approach to the same incident; • degree of support for recommendations/redesign?

  22. Conclusions • Can the technique analyse failures at every stage of E/E/PES development? • Need to identify all candidate stages of development… • Assess techniques against the IEC 61508 development model. • Other standards/models might have been used. • Begin with subjective assessments + peer review (NTSB and NASA). • Currently validating against industrial experience. • Methodological problems (who has used more than two techniques?).

  23. Questions
