System Safety: A Systematic Process
1. Identify the Hazards 2. Assess the Risks 3. Analyze Risk Control Measures 4. Make Control Decisions 5. Implement Risk Controls 6. Supervise and Review • Risk Assessment: An evaluation of threats in terms of severity and probability
MISSION FOCUS (HAZARD VERSUS RISK) • HAZARD ID & Analysis: Identifying and analyzing an existing or potential condition that can impair mission accomplishment (no discussion of mission significance) • RISK Assessment & Mgmt: A hazard for which we have estimated the severity, probability, and scope with which it can impact our mission, and accepted it
Hazard Identification and Analysis during the Life Cycle of a system
Threat Assessment Process • ID Hazardous Condition • Assess Severity (Q/Q) • Assess Probability (Q/Q) • Complete Risk Assessment
THE RISK ASSESSMENT MATRIX
Probability (columns): A Frequent • B Likely • C Occasional • D Seldom • E Unlikely
Severity (rows): I Catastrophic • II Critical • III Moderate • IV Negligible
Risk Levels (cells): Extremely High • High • Medium • Low
A thorough risk assessment process might help you better understand a hazard you have been exposed to many times before without incident.* (* No beavers were assaulted in the production of this slide.)
Hazard Severity • What impact will this threat have on people? • What impact on environment, equipment or facilities? • What impact on mission?
Severity Categories • A key factor in establishing a common understanding of a safety program's goal • MIL-STD-882 uses four categories • Cat 1: Catastrophic • Cat 2: Critical • Cat 3: Marginal • Cat 4: Negligible
Severity Qualified • CATASTROPHIC - Complete mission failure, death, or loss of system • CRITICAL - Major mission degradation, severe injury, occupational illness, or major system damage • MODERATE - Minor mission degradation, injury, minor occupational illness, or minor system damage • NEGLIGIBLE - Less than minor mission degradation, injury, occupational illness, or minor system damage
Severity Quantified • CATASTROPHIC - Complete mission failure, death, or loss of system and/or costs exceeding $1B • CRITICAL - Major mission degradation, severe injury, occupational illness, or major system damage and/or costs exceeding $1M • MODERATE - Minor mission degradation, injury, minor occupational illness, or minor system damage and/or costs exceeding $100,000 • NEGLIGIBLE - Less than minor mission degradation, injury, occupational illness, or minor system damage and/or costs exceeding $10,000
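The cost figures on the "Severity Quantified" slide can be sketched as a simple lookup. This is a minimal illustration, not part of the original deck: the function name `severity_from_cost` is hypothetical, and treating each dollar figure as an exclusive lower bound ("costs exceeding") is an assumption about how the slide is meant to be read. The non-cost criteria (mission degradation, injury, system damage) are not modeled.

```python
from typing import Optional

# Cost thresholds in US dollars, copied from the "Severity Quantified" slide.
# Assumption: each figure is an exclusive lower bound for its category.
SEVERITY_COST_THRESHOLDS = [
    ("CATASTROPHIC", 1_000_000_000),
    ("CRITICAL", 1_000_000),
    ("MODERATE", 100_000),
    ("NEGLIGIBLE", 10_000),
]


def severity_from_cost(cost_usd: float) -> Optional[str]:
    """Return the most severe category whose cost threshold is exceeded."""
    for category, threshold in SEVERITY_COST_THRESHOLDS:
        if cost_usd > threshold:
            return category
    # The slide gives no category for costs at or below $10,000.
    return None
```

For example, `severity_from_cost(2_500_000)` falls in the CRITICAL band, while a cost of exactly $1,000,000 does not exceed the CRITICAL threshold and is classed as MODERATE under this reading.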
Probability: Expressed in terms of time, occurrence, proximity, etc. • Use data to substantiate your assessment • Use descriptive or quantitative terms • Use the cumulative probability of all factors • Examine experientially derived or anecdotal information from operators • Acknowledge uncertainty – There are no guarantees
Qualified Probability Categories • FREQUENT • Individual piece of equipment - Occurs often in the life of the system • Individual - Occurs often in career • Fleet or inventory - Continuously experienced • All Personnel exposed - continuously experienced • LIKELY • Individual piece of equipment - Occurs several times in the life of the system • Individual - Occurs several times in a career • Fleet or Inventory - Occurs often • All Personnel exposed - Occurs often • OCCASIONAL • Individual piece of equipment - Will occur in the life of the system • Individual - Will occur in a career • Fleet or Inventory - Occurs several times in the life of the system • All Personnel exposed - Occurs sporadically
Qualified Probability (cont) • SELDOM • Individual piece of equipment - Could occur in the life of the system • Individual person - Could occur in a career • Fleet or Inventory - Can be expected to occur in the life of the system • All Personnel exposed - Seldom occurs • UNLIKELY • Individual piece of equipment - You assume it will not occur in the system lifecycle • Individual person - So unlikely you assume it will not occur in a career • Fleet or Inventory - Unlikely but could occur in the life of the system • All Personnel exposed - Occurs very rarely
Probabilities Quantified (in terms of failure or exposure rates) • Unlikely: 1 failure in 1,000,000,000 events instead of assuming it will not occur • Seldom: 1 failure in 500 million exposures instead of it could occur • Occasional: 1 failure in 1 million exposures instead of it will occur • Likely: 1 failure in 500,000 exposures instead of it occurs several times • Frequent: 1 failure in 100,000 events instead of it occurs often
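The quantified rates above can be turned into a classifier for an observed failure rate. This is only a sketch: the name `probability_category` is hypothetical, and reading each slide figure as the minimum rate for its category is an assumption, since the slide gives single points rather than ranges. Exact fractions are used to avoid floating-point rounding at the boundaries.

```python
from fractions import Fraction
from typing import Optional

# Rates copied from the "Probabilities Quantified" slide, most frequent first.
# Assumption: each rate is the floor (inclusive) of its category.
PROBABILITY_FLOORS = [
    ("FREQUENT", Fraction(1, 100_000)),
    ("LIKELY", Fraction(1, 500_000)),
    ("OCCASIONAL", Fraction(1, 1_000_000)),
    ("SELDOM", Fraction(1, 500_000_000)),
    ("UNLIKELY", Fraction(1, 1_000_000_000)),
]


def probability_category(failures: int, exposures: int) -> Optional[str]:
    """Classify an observed failure rate against the slide's rates."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    rate = Fraction(failures, exposures)
    for category, floor in PROBABILITY_FLOORS:
        if rate >= floor:
            return category
    # Rarer than 1 in 1,000,000,000: below every rate on the slide.
    return None
```

Three failures in one million exposures, for instance, sits between the LIKELY and FREQUENT floors and is therefore classed as LIKELY.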
Qualitative Assessment (AC 25.1309-1A) • Design Appraisal • Installation Appraisal • Failure Modes and Effects Analysis • Fault Tree Analysis • Probability Assessment
Quantitative Assessment (AC 25.1309-1A) • Probability Analysis (PRA) • Quantitative Probability Terms (QRA)
FAA Fail-Safe Design Concept (AC 25.1309-1A) • The fail-safe design concept considers the effects of failures and combinations of failures in defining a safe design • The following basic objectives apply: • In any system or subsystem, the failure of any single element, component, or connection during any one flight should be assumed. Such single failure should not prevent continued safe flight and landing. • Subsequent failures during the same flight, whether detected or latent, should also be assumed unless their joint probability with the first failure is demonstrated to be extremely improbable.
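The second objective turns on a joint probability, which can be sketched numerically. Two assumptions are made here that the slide does not state: the failures are treated as statistically independent (so their probabilities multiply), and "extremely improbable" is taken as on the order of 1e-9 or less, the figure commonly associated with AC 25.1309-1A's quantitative guidance. Common-cause and latent failures can break the independence assumption, so this is an illustration rather than an acceptable means of compliance. The function names are hypothetical.

```python
import math

# Assumption: order-of-magnitude threshold for "extremely improbable".
EXTREMELY_IMPROBABLE = 1e-9


def joint_probability(*failure_probabilities: float) -> float:
    """Joint probability of all failures, assuming they are independent."""
    return math.prod(failure_probabilities)


def is_extremely_improbable(probability: float) -> bool:
    """Compare a probability against the assumed 1e-9 threshold."""
    return probability <= EXTREMELY_IMPROBABLE
```

Under these assumptions, two independent failures at 1e-5 each combine to roughly 1e-10 and pass the check, while two at 1e-3 each combine to roughly 1e-6 and do not.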
Fail-Safe Design Concept • Fail-safe designs use the following design principles – A combination of two or more is usually needed to provide a fail-safe design • Redundant or backup systems • Isolation of systems, components and elements • Demonstrated reliability / Periodic inspection • Failure warning and indication • Flight crew procedures • Designed failure effect limits • Designed failure path • Increased margins or factors of safety • Error-tolerant design
Operational and Maintenance Considerations (AC 25.1309-1A) • Flight crew action • Ground crew action • Certification check requirements • Flight with inoperative equipment
Quantifying or Qualifying Risk? Remember Murphy’s Law for Management: “Technology is dominated by those who manage what they don’t understand”
Risk Acceptance Codes • RAC 1 – Unacceptable • RAC 2 – Undesirable • RAC 3 – Acceptable with controls • RAC 4 – Acceptable
Risk Assessment Shortcomings • Deficiencies in RACs represent one of the major problems facing the system safety effort • Quantitative severity and probability scales in most RAC matrices are too subjective • The RAC is a main driver of system safety efforts • This code prioritizes the management emphasis given to a particular problem
THE “ENHANCED” RISK ASSESSMENT MATRIX
A numeric code is used to prioritize hazards and determine their acceptability using a quantitative methodology
Probability (columns): A Frequent • B Likely • C Occasional • D Seldom • E Unlikely
Severity (rows): I Catastrophic • II Critical • III Moderate • IV Negligible
Risk Levels (cells): numeric codes
THE RISK PRIORITY LIST • By ranking the hazards, we address them on a “worst-first” basis • Safety-dedicated resources are always limited and should be directed at the highest risk • The list runs from the highest risk down to the lowest risk warranting action
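A "worst-first" ordering can be sketched as a sort over severity and probability codes. The deck does not give its ranking formula, so the key used here (sum of the two ranks, with ties broken by severity) is an assumption chosen only for illustration, and `Hazard` and `risk_priority_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Matrix codes from the slides: severity I-IV, probability A-E.
SEVERITY_RANK = {"I": 1, "II": 2, "III": 3, "IV": 4}
PROBABILITY_RANK = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


@dataclass(frozen=True)
class Hazard:
    name: str
    severity: str      # "I" (Catastrophic) .. "IV" (Negligible)
    probability: str   # "A" (Frequent) .. "E" (Unlikely)


def risk_priority_list(hazards: Iterable[Hazard]) -> List[Hazard]:
    """Order hazards worst-first.

    Assumption: a lower combined rank means higher risk, and ties go to
    the more severe hazard. The deck's own ranking scheme is not stated.
    """
    def key(h: Hazard):
        s = SEVERITY_RANK[h.severity]
        p = PROBABILITY_RANK[h.probability]
        return (s + p, s)

    return sorted(hazards, key=key)
```

With hypothetical hazards coded IV/E, I/A, II/C and III/B, the I/A hazard comes first, II/C precedes III/B on the severity tie-break, and IV/E comes last.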
ASSESSMENT CHALLENGES • Over-optimism • Over-pessimism • Misrepresentation / Misunderstanding • Alarmism / “Accident du Jour” • Indiscrimination • Bias • Inaccuracy
Total Risk Exposure Codes • Expanded scale • Probability expressed in Exposure • Severity expressed in Cost • Combined determination expressed in quantifiable terms $$$$* (Now you are talking a language the bean counters understand)
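Combining probability expressed as exposure with severity expressed as cost suggests an expected-loss figure in dollars. The slide does not define how a Total Risk Exposure Code is computed, so the calculation below is a generic expected-loss sketch under that assumption, not the TREC method itself. All names are hypothetical.

```python
def expected_loss(failures_per_exposure: float,
                  exposures: float,
                  cost_per_mishap: float) -> float:
    """Expected dollar loss over a number of exposures.

    Assumption: loss scales linearly with exposure count and every
    mishap costs the same amount.
    """
    expected_mishaps = failures_per_exposure * exposures
    return expected_mishaps * cost_per_mishap
```

As an illustration, a rate of 1 failure in 500,000 exposures over 2,000,000 exposures gives about 4 expected mishaps; at $1M each, that is an expected loss of about $4M.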
Verification & Validation • Quality of data establishes process credibility • Avoid GIGO syndrome • Verify and Validate initial estimates with updated data • Failure rates • Exposure rates • Project lifecycle changes • Number of units in the system
THE PRIORITY LIST: What does it accomplish? • Traditional Risk Management - Personnel can’t name or prioritize hazards; they can only identify general threats • ORM - Personnel can name and prioritize RISKS that impact them and their mission • In a mature “NORMal” world, every individual personally benefits by adapting the knowledge of prioritized hazards that exist in their life (due diligence is demonstrated when managers see that their subordinates possess this knowledge)
System Safety Precedence: A systematic approach to Hazard ID – Risk Assess and Control • Design to minimize hazards: robust & redundant systems, assemblies, components, etc. • Install physical barriers: isolate known threatening conditions or environments • Use warning devices: alerts to prevent or reduce unwanted event • Develop procedures and training: most commonly used & abused hazard control
1. Identify the Hazards 2. Assess the Risks 3. Analyze Risk Control Measures 4. Make Control Decisions 5. Implement Risk Controls 6. Supervise and Review • Risk Analysis
Assessing Risk Controls • Identify control options • Determine control effects • Prioritize risk control measures
2 Major Risk Control Approaches • Employ Macro Risk Control Option(s) • Reject – Avoid – Delay – Transfer – Spread – Compensate – Reduce • Implement System Safety Precedence Control Option(s) • Engineer – Guard – Improve Design – Limit Exposure – Personnel Selection – Train – Warn – Motivate – Reduce Effect – Rehabilitate
“Swiss Cheese” Model of Defenses • [Figure: hazards, layers of defenses (the ideal vs. the reality), potential losses (people and assets)] • James Reason: “Managing the Risks of Organizational Accidents”
“Swiss Cheese” Model of Defenses • Defenses in depth • Some ‘holes’ due to active failures • Other ‘holes’ due to latent conditions • James Reason: “Managing the Risks of Organizational Accidents”
Macro Options • REJECT • Risk outweighs benefit • AVOID • Go around the risk, do it in a different way • DELAY • Maybe the problem will be resolved by time • If delay is an acceptable option, consider whether the operation is needed at all • TRANSFER • Better qualified system, i.e., “Pros From Dover”
Macro Options (cont) • SPREAD • Modular or separate Hazardous Operations • COMPENSATE • Design parallel and redundant systems • REDUCE • Design for minimum risk • Incorporate Safety Devices • Provide Warning Devices • Develop SOPs & Train
The Risk Control Macro Option List • Reject • Avoid • Delay • Transfer • Spread • Compensate • Reduce • QUESTION: Why isn’t “Eliminate” on this list?
Determine Risk Control Effects • How will this affect probability? • How will this affect severity? • How will this impact other sub-systems? • Some controls support other sub-systems • Some controls may hinder other sub-systems • What are the costs vs. benefits? • Direct Costs • Indirect Costs
Direct vs. Indirect Costs “As a rule of thumb, it is generally acceptable to calculate indirect costs of a mishap to be 7 times greater than those costs which can directly be accounted for in the incident or accident”
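The rule of thumb above reduces to simple arithmetic: indirect costs are estimated at 7 times the direct costs, so the total comes to 8 times the direct figure. A minimal sketch, with hypothetical names:

```python
# Multiplier quoted in the rule of thumb on the slide.
INDIRECT_COST_MULTIPLIER = 7


def estimated_total_cost(direct_cost: float) -> float:
    """Direct cost plus indirect cost estimated at 7x the direct cost."""
    indirect_cost = INDIRECT_COST_MULTIPLIER * direct_cost
    return direct_cost + indirect_cost
```

A mishap with $50,000 in directly accountable costs would be estimated at $350,000 in indirect costs, or $400,000 in total. The multiplier is a rough planning figure, not a measured value.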
Risk Control ROTs (Rules of Thumb) • Use the System Safety Precedence order • Choose the most mission supportive combinations • Use Integrated Product Teams • Look for synergistic enhancements • Man – Machine – Medium – Mission – Management
Use the 5 M model as you look for systemic issues • Man: • Doesn’t know • Doesn’t care • Can’t physically accomplish • Machine: • Poor design • Faulty maintenance • SOPs
5 M systemic issues (cont) • Medium • Weak design considerations • Lack of provisions for natural “phenomena” • Management: • Inadequate procedures • Inadequate policy • Inadequate standards & controls • Mission: • Poorly thought out • Poorly executed • Weak understanding • Incompatibilities
Providing Management Risk Control Options • Program Manager looking for optimum combinations • Mission supportive • Some Risk Controls are incompatible • Evaluate full cost versus full benefit • Be prepared for numbers game • Some Controls reinforce one another • Win-Win option • Redundancy = Robustness • Is it needed? • Can you afford it? i.e., $$$, #’s, real estate
Aid to Decision Making • Be prepared to assist decisions at the right time • Don’t rush – Make them as late as possible without negative impact on timeline • Ensure decisions are made at the right level • It should be established who makes the tough calls • Use RAC or TREC to quantify who, what, when • Provide mission supportive options • Use the Macro Option list as a starting point • Be prepared to offer sound advice
Don’t be one who says, “…data or information was not available and our department could not prove it was unsafe to allow the operation.”