1.18k likes | 1.34k Views
High Reliability Operations. 4 Hour Professional Development Seminar HPRCT Workshop, Baltimore Maryland June 21, 2010 Richard S. Hartley, Ph.D., P.E. Janice N. Tolk, Ph.D., P.E. This presentation was produced under contract number DE-AC04-00AL66620 with.
E N D
High Reliability Operations 4 Hour Professional Development Seminar HPRCT Workshop, Baltimore Maryland June 21, 2010 Richard S. Hartley, Ph.D., P.E. Janice N. Tolk, Ph.D., P.E. This presentation was produced under contract number DE-AC04-00AL66620 with
What is a High Reliability Organization? • An organization that repeatedly accomplishes its mission while avoiding catastrophic events, despite significant hazards, dynamic tasks, time constraints, and complex technologies • A key attribute of being an HRO is to learn from the organization’s mistakes • A.K.A. a learning organization
Examples of High Reliability Organizations or Not? • Nuclear Navy • Commercial nuclear power • Aircraft carrier operations • Hospital patient care • Military nuclear deterrent • Forest service • Aviation • Nuclear weapons assembly and disassembly
Business Case for High Reliability Is it right for you?
Does a Systems Approach Make Sense? Department of Energy Safety Improvement from 1993-2008 Contractor ISM deployed Data as of 7/7/2009 DOE injury rates have come down significantly since Integrated Safety Management (ISM) was adopted
Does a Systems Approach Make Sense?U.S. Nuclear Industry Performance 1985-2008 Capacity Factor (% up) Cost (¢/kwh) Rx Trips/ Scrams Significant Events/Unit Nuclear Energy Institute (NEI) Data
The Alternate to the HRO The Normal Accident Organization
Feeling Comfortable with a Good Safety Stats? • As Columbia and Davis-Besse have demonstrated, great safety stats don’t equal real, tangible organizational safety. • The tendency for normal people when confronted with a continuous series of positive “stats” is to become comfortable with good news and not be sensitive to the possibility of failure. • “Normal people” routinely experience failure by believing their own press (or statistics).
NASA & ColumbiaJan 16, 2003 • CAIB: “The unexpected became the expected, which became the accepted.” • When NASA lost 7 astronauts, the organization's TRC rate was 600% better than the DOE complex. • And yet, on launch day • 3,233 Criticality 1/1R* hazards had been waived. * Criticality 1/1R component failures result in loss of the orbiter and crew.
Davis-Besse2002 • Had some performance “hard spots” in the 80's • Had become a world-class performer in the next 15 years • Preceding initiating events of mid 90's • Frequently benchmarked by other organizations • While a serious corrosion event was taking place • Complete core melt near miss in 2002
SYSTEM ACCIDENT TIMELINE What is Next? Who is Next? 1979 - Three Mile Island 1984 – Bhopal India 1986 – NASA Challenger 1986 – Chernobyl 1989 – Exxon Valdez 1996 – Millstone 2001 – World Trade Center 2005 – BP Texas City 2007 – Air Force B-52 2008 – Stock Market Crash
How do Organizations Get Themselves into System Accident Space? Attempts to Understand & Prevent System Accidents
Attempts to Understand & Prevent System Accidents (High Reliability vs. Normal Accident Theory)
The Cure for Organization Blindness "Most ailing organizations have developed a functional blindness to their own defects. They are not suffering because they cannot resolve their problems, but because they cannot see their problems.“ John Gardner Weak Signals
What is the Focus of an HRO? Individual Accidents OR Systems Accidents?
Individual Accident • An accident occurs wherein the worker is not protected from the plant and is injured (e.g. radiation exposure, trips, slips, falls, industrial accident, etc.) • Focus: • Protect the worker from the plant Plant (hazard) Human Errors (receptor)
Systems Accident • An accident wherein the system fails allowing a threat (human errors) to release hazard and as a result many people are adversely affected • Workers, Enterprise, Surrounding Community, Country Plant (hazard) Human Errors (threat) • Focus: • Protect the plant from the worker The emphasis on the system accident in no way degrades the importance of individual safety , it is a pre-requisite of an HRO
System Accident vs. System Event • System accident - an occurrence that is unplanned and unforeseen that results in serious consequences and causes total system disruption (i.e. death, dose, dollars, delays etc.). • System event - any unplanned, unforeseen occurrence that results in the failure of the system that does not result in catastrophic consequences -- indicates a breakdown in the system vital to the well-being of many people and the survivability of the organization! • System accident - an occurrence that is unplanned and unforeseen that results in serious consequences and causes total system disruption (i.e. death, dose, dollars, delays etc.). • System event - any unplanned, unforeseen occurrence that results in the failure of the system that does not result in catastrophic consequences -- indicates a breakdown in the system vital to the well-being of many people and the survivability of the organization!
Why Is Being an HRO So Important? Some types of system failures are so punishing that they must be avoided at almost any cost. These classes of events are seen as so harmful that they disable the organization, radically limiting its capacity to pursue its goal, and could lead to its own destruction. Laporte and Consolini, 1991 Some types of system failures are so punishing that they must be avoided at almost any cost. These classes of events are seen as so harmful that they disable the organization, radically limiting its capacity to pursue its goal, and could lead to its own destruction. Laporte and Consolini, 1991 Some types of system failures are so punishing that they must be avoided at almost any cost. These classes of events are seen as so harmful that they disable the organization, radically limiting its capacity to pursue its goal, and could lead to its own destruction. Laporte and Consolini, 1991
Comparing & Contrasting High Reliability Theory with Normal Accident Theory HROs vs. NAT Organizations
High Risk or High Consequence? R = C x P Risk=Consequence x Probability If we are truly working with high-risk operations, ethically and morally we should notbe in business!
High Reliability Organization (HRO) vs. Normal Accident Theory (NAT) Belief of HRO Accidents can be avoided by organizational design and management i.e. Risk = C x P is manageable Dr. Karlene Roberts Belief of NAT Accidents are inevitable in complex and tightly coupled operations i.e. Risk = C x P is too high Dr. Charles Perrow
High Reliability Organization (HRO) vs. Normal Accident Theory (NAT) Belief of HRO Accidents can be avoided by organizational design and management i.e. Risk = C x P is manageable Belief of NAT Accidents are inevitable in complex and tightly coupled operations i.e. Risk = C x P is too high Control of Risk DOE reduces “C” by: • minimizing the hazard and/or • mitigating the consequence DOE reduces “P” - human performance improvement • human performance error precursors • barriers
Complex vs. Linear Interactions Linear interactions • Expected & familiar production or maintenance sequences • Visible, even if unplanned • Simple -- readily comprehensible Complex interactions • One component can react with others outside normal production sequence • Nonlinear • Unfamiliar sequences, or unplanned and unexpected sequences not visible nor immediately comprehensible
Complex vs. Linear Interactions Linear interactions • Expected & familiar production or maintenance sequences • Visible, even if unplanned • Simple -- readily comprehensible Complex interactions • One component can react with others outside normal production sequence • Nonlinear • Unfamiliar sequences, or unplanned and unexpected sequences not visible nor immediately comprehensible
Complex Interactions 3 9 1 8 7 9 1 2 4 1 8 1 7 5 5 5 1 3 3 3 3 2 6 10 11 12 4 Complex vs. Linear Interactions Linear Interactions For linear interactions 4 events lead to 4 interactions. Complex Interactions For complex interactions, 4 events lead to 12 possible interactions. Greatly amplifies difficulty in determining and responding to the problem. 2 2 2 6 6 4 4
Complex Interactions Alone Insufficient • Complex interactions not necessarily high-risk systems with catastrophic potential, examples: • Universities • R&D • Federal Government • Also takes another key ingredient • Tight coupling
Loose vs. Tight Coupling Super Glue Ever done this? Loosely coupled systems: • Delays possible; processes can remain in standby mode • Spur-of-the-moment redundancies and substitutions can be found • Fortuitous recovery aids possible • Failures can be patched more easily, temporary rig can be set up Tightly coupled systems: • Time-dependent processes: they can’t wait or standby • Reactions in chemical plants – instantaneous, can’t be delayed or extended • Sequences invariant • Only one way to reach production goal • Little slack; quantities must be precise, resources can’t be substituted • Buffers and redundancies must be designed in, thought of in advance
Example of Loose vs. Tight Coupling Loose Coupling Tight Coupling
Interaction/Coupling Chart (Adapted from Fig 9.1 Perrow) HROs must neutralize bad effects here Linear < ----------------------Interactions ------------------------- > Complex Linear< ----------------------Interactions ------------------------- > Complex Linear< ----------------------Interactions ------------------------- > Complex More likely to have system accident More likely to have system accident Dams Dams Nuclear Power Nuclear Power Power Grids Power Grids DNA DNA Aircraft Aircraft (Increases escalation to full-blown event) (Increases escalation to full-blown event) (Increases escalation to full-blown event) Nuclear Weapon Accidents Nuclear Weapon Accidents Marine transport Marine transport Chem Plant Chem Plant Airways Airways Rail transport Rail transport Your Organization? Space Space Military Early warning Military Early warning 1 1 1 1 2 2 2 2 Loose<-------------Coupling ---------Tight Loose <-------------Coupling ---------Tight Loose <-------------Coupling ---------Tight 3 3 3 3 4 4 4 4 More time to act<------Coupling ----Less time to act More time to act<------Coupling ----Less time to act More time to act<------Coupling ----Less time to act Junior College Junior College Military Military Assembly line production Assembly line production Your Organization? Mining Mining R&D Firms R&D Firms Trade Schools Trade Schools Most Mfg. Plants Most Mfg. Plants Multi goal agencies DOE, OMB Multi goal agencies DOE, OMB (Can put in standby mode) (Can put in standby mode) (Can put in standby mode) Single goal agencies Motor vehicle, post office Single goal agencies Motor vehicle, post office Universities Universities Less likely to have system accident Less likely to have system accident Visible Interactions< --------------------Interactions --------------------- > Hidden Interactions Visible Interactions< --------------------Interactions --------------------- > Hidden Interactions Visible Interactions< --------------------Interactions --------------------- > Hidden Interactions (Decreases the probability of dangerous incident) (Decreases the probability of dangerous incident) (Decreases the probability of dangerous incident) (Increases the probability of dangerous incident) (Increases the probability of dangerous incident) (Increases the probability of dangerous incident) Normal Accidents –Living with High-Risk Technologies, Perrow Normal Accidents –Living with High-Risk Technologies, Perrow
Effect of Complex vs. Linear Interactions According to Perrow: • Interactiveness increases the magnitude of accident because of unrecognized connections between systems • Tight coupling increases the probability of initiating the accident Together they are the makings of a normal accident
Interactive Complexity & Coupling Lets Discuss • Provide examples • In your own organization • In other organizations • In other places throughout the world Where is today’s world going with regards complexity and coupling? Is this good or bad?
1 Better technologies 2 Better organizational processes High Reliability or Accident Waiting to Happen? HRO Accidents can be avoided by organizational designandmanagement NAT Accidents are inevitable in tightly coupledandcomplexoperations. Dr. Charles Perrow Perrow states that there are two primary ways that organizations try to counter interactive complexity (NAT)
High Reliability or Accident Waiting to Happen? HRO Accidents can be avoided by organizational design and management HROs use the rational-closed system construct to accomplish their goal by: • Maintaining safety as a leadership objective • Using redundant systems • Focusing on three operational and management factors • decentralization, • culture, and • continuity • Being a learning organization Dr. Scott Sagan The Limits of Safety, Scott Sagan
High Reliability or Accident Waiting to Happen? HRO Accidents can be avoided by organizational designandmanagement Without leadership, safety is but a facade. Multiple & Independent Barriers HROs use the rational-closed system construct to accomplish their goal by: HROs use the rational-closed system construct to accomplish their goal by: • Maintaining safety as a leadership objective • Using redundant systems • Focusing on three operational and management factors • decentralization, • culture, and • continuity • Being a learning organization Workers have to call the shots. Leaders want workers to call the shots as they would. Want workers to call the shots based on experience – keep the plant open. Learn from small mistakes – information-rich events! The Limits of Safety, Scott Sagan
High Reliability or Accident Waiting to Happen? NAT Accidents are inevitable in tightly coupledandcomplexoperations. Signs of Natural Actors No clear consistent goals Mgt at different levels have conflicting goals Unclear technology – organizations don’t understand their own processes NATs believe the natural-open organizational system prevails because: NATs believe the natural-open organizational system prevails because: • Conflicting leadership objective prevail • There are perils in redundant systems • There is no effective management of • decentralization, • culture, or • continuity • Organizational learning is restricted Signs of Open Organizations Decision-makers come and go Some pay attention, others don’t Key meetings dominated by biased, uninformed, uninterested personnel The Limits of Safety, Scott Sagan
High Reliability or Accident Waiting to Happen? Pressure to maintain production only slightly modified by increased interests in safety. NAT Accidents are inevitable in tightly coupledandcomplexoperations. Redundant barriers, not independent. Redundancy makes system opaque. Redundancy falsely makes system appear more safe. NATs believe the natural-open organizational system prevails because: NATs believe the natural-open organizational system prevails because: • Conflicting leadership objective prevail • There are perils in redundant systems • There is no effective management of • decentralization, • culture, or • continuity • Organizational learning is restricted Not enough time to improvise – tightly coupled. Leaders don’t know enough about their operations to evaluate whether workers are responding correctly or not. Causes of accidents and near-misses unclear –hard to learn. Incentives to fabricate positive records abound. The Limits of Safety, Scott Sagan
High Reliability or Accident Waiting to Happen? Attributes of HROs and NATs HRO Accidents can be avoided by organizational design and management NAT Accidents are inevitable in tightly coupledandcomplexoperations. HROs use the rational-closed system construct to accomplish their goal by: HROs use therational-closedsystem construct to accomplish their goal by: NATs believe the natural-open organizational system prevails because: NATs believe the natural-open organizational system prevails because: • Maintaining safety as a leadership objective • Using redundant systems • Focusing on three operational and management factors • decentralization, • culture, and • continuity • Being a learning organization • Conflicting leadership objective prevail • There are perils in redundant systems • There is no effective management of • decentralization, • culture, or • continuity • Organizational learning is restricted Good Bad The Limits of Safety, Scott Sagan
High Reliability or Accident Waiting to Happen? Attributes of HROs and NATs HRO Accidents can be avoided by organizational design and management NAT Accidents are inevitable in tightly coupledandcomplexoperations. These are attributes of HROs and NATs. The literature is silent on how they are achieved or avoided. HROs use the rational-closed system construct to accomplish their goal by: HROs use therational-closedsystem construct to accomplish their goal by: NATs believe the natural-open organizational system prevails because: NATs believe the natural-open organizational system prevails because: • Maintaining safety as a leadership objective • Using redundant systems • Focusing on three operational and management factors • decentralization, • culture, and • continuity • Being a learning organization • Conflicting leadership objective prevail • There are perils in redundant systems • There is no effective management of • decentralization, • culture, or • continuity • Organizational learning is restricted Good Bad The Limits of Safety, Scott Sagan
Don’t Dismiss the Normal Accident • NAT theorists take their case to the extreme but don’t dismiss their warnings • “Normal” implies it is likely to happen with “normal” organizations • It is “normal” to be human • We all relax if things go right • We all know we cut corners when we get busy • We all do what we have done before • We do what we see others do • The lack of negative response reinforces our belief that perhaps the rules were too strenuous so we start cutting more corners more often • Those dealing with high consequence operations never have the luxury of being “normal”
Fundamentals of Systems Approach Reality Engineering Understanding Socio-Technical Systems to Improve Bottom-Line
Central Theme of an HRO Not a New Initiative Logical, Defensible Way to Think Based on Logic & Science Logic & Science are Time and New Initiative Invariant The most important thing, is to keep the most important thing, the most important thing. Steven Covey, 8thHabit • Focus on what is important • Measure what is important
HROs Think and Act Differently • Take a physics-based system approach • Measure gaps relative to physics-based system • Explicitly account for people • People are not the problem, they are the solution • People are not robots, pounding won’t improve performance • People provide safety, quality, security, science etc. • Sustain behavior – account for culture • Improve long-term safety, security, quality
Spectrum of Safety Spectrum of Safety • Hard Core Safety Physics • Physics invariant • Prevent flow of unwanted energy • Delta function • Squishy People Part of Safety • Average IQ of the organization • It is a systems approach • Gaussian curve As Engineers Write As People Do
Spectrum of Safety Spectrum of Safety • Hard Core Safety Physics • Physics invariant • Prevent flow of unwanted energy • Delta function • Squishy People Part of Safety • Average IQ of the organization • It is a systems approach • Gaussian curve Old Mind-Set Compliance-based safety • High Reliability Organization • Explicitly consider human error • Take into account org. culture • Maximize delivery of procedures • Improve system safety
Construct of an HRO A Systems Approach to Safety
Remember if it is the System Accident Human Errors (threat) Plant (hazard) If a systems approach is required to ensure we don’t have to rely on every individual having a perfect day every day to avoid the catastrophic accident, then we had better take the best approach to implementing that system!
Construct of the HROSystems Approach to Avoid Catastrophic Accidents We used Deming’s Theory of Profound Knowledge to develop a process to attain those HRO attributes identified by the High Reliability Theorists Deming’s Theory of Profound Knowledge (TPK) provides a foundation for the systems approach W. Edwards Deming
Construct of the HROSystems Approach to Avoid Catastrophic Accidents • Organizations are systems that interact within their internal and external environments • Statistical process control is the foundation of process optimization • Knowledge of Systems • Knowledge of Variation Deming’s Theory of Profound Knowledge (TPK) used to provide foundation for the systems approach • Knowledge of Psychology • Knowledge of Knowledge • Theory, prediction, and feedback as the basis of learning • Organizations have cultures that influence the system and desired outcome
Fundamental HRO Practices HRO Practices Cross-Walked to Deming TPK • Knowledge of Systems • Knowledge of Variation • Knowledge of Psychology • Knowledge of Knowledge