Risk management tools Patrick Hudson Tim Hudson Hudson Global Consulting
How can we manage risk? • We can manage risk by hoping it won’t happen • We can manage risk by offering sacrifices to the Gods • We can manage risk by understanding what we are doing • The first two don’t work • The third is what a Safety Management System does
Risk • Risk is a complex concept • Combination of two different components • RISK = Outcome x Probability of that outcome • Outcomes – what could happen • Usually seen as a scenario • Worst case - conservative • Most credible worst case • Probability of those outcomes • Often measured as frequency of occurrence • Needs to be applied before anything has gone wrong • Probabilities are difficult to estimate • Knowing the probability may change its value
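The product on this slide can be written out as a minimal Python sketch. The `Scenario` class, the loss units and the example numbers are hypothetical, chosen only to show how a worst case and a most credible worst case might compare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One outcome scenario for a hazard (all values are hypothetical)."""
    name: str
    severity: float      # size of the outcome, in arbitrary loss units
    probability: float   # estimated chance of the outcome, between 0 and 1


def risk(s: Scenario) -> float:
    """RISK = Outcome x Probability of that outcome."""
    if not 0.0 <= s.probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return s.severity * s.probability


# Assumed figures: a rare severe outcome and a more frequent, milder one.
worst_case = Scenario("worst case", severity=1000.0, probability=0.001)
credible = Scenario("most credible worst case", severity=100.0, probability=0.05)

for s in (worst_case, credible):
    print(f"{s.name}: risk = {risk(s):.2f}")
```

With these assumed numbers the most credible worst case carries more risk than the worst case, which is one reason the choice of scenario matters.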
There is more to an SMS than lots of good intentions [Figure: SMS elements shown with No Structure and with Structure. Labels: TRIPOD, HSE Policy, Organization Structure, Alcohol & Drugs Policy, Road Safety Plan, Unsafe Act Audit Plan, Feedback, Audit Plans, Hazards & Effects Mgmt., Continuous Improvement, Engage, Objectives, Targets, Incident Potential Matrix, HSE Plan, Check, Health Risk Assess., Do, EA]
Safety Management System (SMS) DISASTER Production Better defenses converted to increased production BANKRUPTCY Protection
Safety Management System (SMS) DISASTER Best practice operations under SMS Production BANKRUPTCY Protection
Generic HSE Management System (Shell) • 1 - Leadership and Commitment • 2 - Policy and Strategic Objectives • 3 - Organisation, Responsibilities, Resources and Standards • 4 - Hazards & Effects Mgt (Risk Mgt) • 5 - Planning & Procedures • 6 - Implementation, Monitoring • 7 - Audit • 8 - Management Review [Figure labels: PLAN, DO, CHECK, FEEDBACK, Corrective Action]
Hazard-based approach • HEMP - Hazard and Effects Management Process • Identify - What are the hazards? • Assess - how big are those hazards? • Control - how do we control the hazards? • Recover - what if it still goes wrong?
Step 1. Identification • First identify your hazards • What is going to hurt you? • Needs to be specific enough to manage practically • E.g. not just potential and kinetic energy • General enough to manage specifics in the same way • Accumulate in a list – Hazard Register • A range of tools and methods help here • Brainstorming - proactive • HAZID • Incident analyses - reactive • Reporting
Step 2. Assess • How big is the risk you are taking and running? • A wide range of tools available • Not an exact science – whatever anyone tells you • Small risks can be ignored • Large risks may not be taken • Usually framed in terms of ALARP • As Low as Reasonably Practicable • Not intended to be as low as possible • Risk assessment should point to what to do about the hazard in question
Step 3. Manage and control • Primarily preventative • Success is measured by nothing going wrong • Prevention involves a variety of approaches • Use of the hierarchy of controls • Barriers to keep hazards in place • Controls to prevent them escaping • Management is directly responsible for the provision of controls and barriers • Requires resourcing, procurement and continuous evaluation • Front line personnel are responsible for their use once provided and supported • Requires ability to operate the controls and barriers
Step 4. Recovery • Recovery is necessary after control over a hazardous process has been lost • But before the worst case consequences have been achieved • Recovery controls and barriers are reactive • The term Mitigation applies best here • These controls are usually much more expensive than preventative controls • Sometimes challenged because “We’ve never used that so we can get rid of it and save money”
Tools • Risk management tools are intended to help one or more of the 4 steps • Usually applied continuously to improve • Especially on the feedback loops • Audits • Incident investigations • Reporting • Performance assessment for predictive improvement • Identify – discover unexpected hazards • Assess – evaluate what needs to be done • Control – systematically list the controls to see if they are adequate to reduce the risk to acceptable levels • Recover – identify what will reduce the consequences • Successful risk management allows us to take the risks that enable us to get the benefits without disaster • These can easily be mapped onto the ICAO components • Not just the risk management elements • Also all the other elements
Minimising Regret, Maximising Opportunity [Figure: Go and No-Go decisions set against Regret and No Regret. Cells: Normal Operations, Incident, Missed Opportunity, Safe]
Risk Assessment Matrices • A simple way of supporting the product of outcome and probability • Not a discrete set of values, but an easy way of representing the distributions of severity of outcomes and their probabilities • So – there is no single CORRECT Matrix
Risk Assessment Matrix The colour determines the level of active risk management required
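As a rough illustration of how a matrix turns outcome and probability into a colour, here is a minimal Python sketch. The 5 x 5 size, the product scoring, the thresholds and the colour names are all assumptions. As noted earlier, there is no single correct matrix.

```python
# Hypothetical 5 x 5 matrix: both axes are ordinal ratings from 1 (low) to 5 (high).
RATINGS = range(1, 6)

# Assumed thresholds on the product score; a real matrix sets its own bands.
RED_FROM = 15
YELLOW_FROM = 5


def cell_score(severity: int, likelihood: int) -> int:
    """Score a cell as the product of the two ordinal ratings."""
    if severity not in RATINGS or likelihood not in RATINGS:
        raise ValueError("ratings must be integers from 1 to 5")
    return severity * likelihood


def colour(severity: int, likelihood: int) -> str:
    """Map a cell to the colour that sets the level of risk management."""
    score = cell_score(severity, likelihood)
    if score >= RED_FROM:
        return "red"
    if score >= YELLOW_FROM:
        return "yellow"
    return "green"


# Print the whole matrix, most severe row first.
for sev in reversed(RATINGS):
    print(sev, [colour(sev, lik) for lik in RATINGS])
```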
Risk Calculations [Figure: risk matrix with cells scored from 0 to 14, showing the position Now and After. Labels: Mitigation, Right side; Reduced exposure, Left side]
Risk matrix alternative • The numbers are a reflection of how unacceptable the matrix cell is [Figure: risk matrix with cell values 0, 2, 2, 4, 4, 5, 8, 12, 15, 28, 8, 20, 40, 100, 200. Labels: Mitigation, Right side; Reduced exposure, Left side]
What is ALARP? • ALARP = As Low As Reasonably Practicable [Figure: Risk to stakeholders and Cost plotted for Options 1 to 6 on a scale of 0 to 120, with Legal minimum requirements marked]
How can we understand our controls? • The Bowtie is an industry standard in many high-hazard activities • Bowties cover both control and recovery • Bowties are not primarily intended to be quantitative, but can be computed with • Bowties visually express the extent and types of control and are easy for managers to understand • Is everything procedural? • Does one person have to do everything?
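The two questions at the end of this slide can be asked of a bowtie held as data. The following Python sketch is only an illustration: the `Barrier` fields, the barrier names and the owners are hypothetical and do not come from any bowtie tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Barrier:
    """One control or recovery measure in a bowtie (fields are assumptions)."""
    name: str
    side: str    # "control" (before the event) or "recovery" (after it)
    kind: str    # for example "procedural" or "engineered"
    owner: str   # person or role who has to make the barrier work


def all_procedural(barriers: Sequence[Barrier]) -> bool:
    """Is everything procedural?"""
    return bool(barriers) and all(b.kind == "procedural" for b in barriers)


def single_owner(barriers: Sequence[Barrier]) -> bool:
    """Does one person have to do everything?"""
    return len({b.owner for b in barriers}) == 1


# Hypothetical bowtie for illustration only.
bowtie = [
    Barrier("Pre-flight checklist", "control", "procedural", "pilot"),
    Barrier("Stall warning system", "control", "engineered", "maintenance"),
    Barrier("Go-around procedure", "recovery", "procedural", "pilot"),
]

print("Everything procedural:", all_procedural(bowtie))
print("One person does everything:", single_owner(bowtie))
```

A bowtie that answers yes to either question leans on one kind of control or one individual, which is the weakness the slide asks managers to look for.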
Bow-tie Concept Events and Circumstances Harm to people and damage to assets or environment CONTROLS HAZARD CONSEQUENCES Undesirable event with potential for harm or damage Engineering activities Maintenance activities Operations activities
Bow-tie Concept for a specific event Events and Circumstances Harm to people and damage to assets or environment RISK CONTROLS HAZARD CONSEQUENCES Undesirable event with potential for harm or damage Engineering activities Maintenance activities Operations activities
A problem for aviation • Simple models have difficulty in capturing recent major commercial aviation incidents • Asiana 214, QF 32, AF 447, BA 38
A Diversion - Causality • Simple accidents are simply caused • Linear and deterministic • Complex accidents are more complex • 80-20 rule suggests simple accidents are 80% • Remaining 20% require us to recognize complexity
Theory 1 - how accidents are caused • Linear causes – A causes B causes C • Deterministic - either it is a cause or it isn’t • We can compute both backwards and forwards • People are seen as the problem – human error etc • Probably good enough to catch 80% of the accidents we are likely to have • Covers most of private and GA operators
Theory 2 - how accidents are caused • Non-Linear causes • Cause and consequence may be disproportionate • These causes are organizational, not individual • Deterministic dynamics - either it is a cause or it isn’t • We can compute both backwards and forwards • Increasingly difficult with non-linear causes • This is the Organizational Accident Model • Probably good enough to catch 80% of the residual accidents = 96% • Probably best for GA and professional operations
Non-linearity • The size of an effect (consequence) is linearly proportional to the input – linearity • Non-linearity is different • The size of an effect (bad consequences) gets bigger (or smaller after a while) as a function of the input • The improvement in performance gets smaller (almost always) even though the input gets bigger • Linearity works fine to start with, but only for 80% of the cases
Linear and non-linear functions Linear Non-linear Effect Effect Cause Cause Suddenly gets a lot worse
More non-linear functions Non-linear Non-linear Effect Effect Cause Cause It can’t get much worse Both – starts bad, tails off
Determinism • A Causes B • If A happens, then B will happen next
Non-determinism • Move from A causes B to A makes B more likely • Causation is probabilistic • Probabilities are distributions, not points
Conditionalize on latest aircraft generation 4th generation aircraft have dominantly weird accidents
Types of accidents • Theory 1 • Simple models may cover 80% of all accidents • These are the simple personal accidents • Theory 2 • The next step gets 80% of the remainder = 96% • These are the complex personal accidents and some organizational accidents • Theory 3 • The probabilistic approach may net the next 80% = 99.2% • These are the complex process accidents
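The percentages on this slide follow from applying the 80-20 rule repeatedly: each theory is taken to capture 80% of whatever the previous ones left over. A short Python sketch of that arithmetic follows; the function name is hypothetical and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def cumulative_coverage(steps: int, share: Fraction = Fraction(4, 5)) -> Fraction:
    """Fraction of accidents covered after `steps` theories, each catching
    `share` of the accidents that the earlier theories missed."""
    remaining = Fraction(1)
    for _ in range(steps):
        remaining *= 1 - share   # what is still uncovered after this theory
    return 1 - remaining


for n in (1, 2, 3):
    print(f"Theory {n}: {float(cumulative_coverage(n)):.1%}")
```

Equivalently, coverage after n theories is 1 - 0.2^n, which gives 80%, 96% and 99.2% for n = 1, 2, 3.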
Theory 3 - how accidents are caused • Non-Linear causes • Non-Deterministic dynamics • Probabilistic rather than specific • Influences on outcomes by people and the organisation • Probabilities may be distributions rather than single values • We cannot compute both backwards and forwards • The dominant accidents that remain are WEIRD • WILDLY • ERRATIC • INCIDENTS • RESULTING IN • DISASTER • Prior to an event there may be a multitude of possible future outcomes
Unusual or WEIRD Accidents • In commercial aviation major accidents are now extremely rare • Simple risk assessment and analysis models often fail to capture how these accidents are caused • We need to understand our risk space better • The Rule of Three is an example of how to do this
The Rule of Three • Accidents have many causes (50+) • A number of dimensions were marginal • Marginal conditions score as Orange • No-Go conditions score as Red • The Rule of 3 is Three Oranges = Red
Aircraft Operation Dimensions • Crew Factors: Experience, Duty time, CRM • Aircraft: Perf. Category, Aids, Fuel, ADDs • Weather: Cloud base, wind, density alt, icing • Airfield: Nav Aids, ATC, Dimensions, Topography • Environment: Night/day, Traffic, en route situation • Plan: Change, Adequacy, Pressures, Timing • Platform: Design, Stability, Management
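The Rule of Three can be sketched as a small scoring function over dimensions like those above. This is a minimal Python illustration: the "green" score for a dimension with no problem and the overall "orange" result for one or two oranges are assumptions, while Orange, Red and Three Oranges = Red come from the rule itself.

```python
from typing import Mapping


def rule_of_three(scores: Mapping[str, str]) -> str:
    """Combine per-dimension scores ("green", "orange", "red") into one result.

    Any Red is a No-Go, and three or more Oranges also count as Red.
    """
    values = list(scores.values())
    if "red" in values:
        return "red"
    oranges = values.count("orange")
    if oranges >= 3:
        return "red"
    return "orange" if oranges else "green"


# Hypothetical assessment of one flight.
flight = {
    "Crew Factors": "orange",   # e.g. long duty time
    "Aircraft": "green",
    "Weather": "orange",        # e.g. low cloud base
    "Airfield": "green",
    "Environment": "orange",    # e.g. night operation
    "Plan": "green",
    "Platform": "green",
}

print(rule_of_three(flight))
```

No single dimension in this example is a No-Go, yet the three marginal ones together are treated as Red.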
The Rule of Three [Figure: Outcome plotted against No of Oranges, from 1/2 to 3 1/2. Outcome labels: No problem, Problem, We fixed it, Big Sky, Crash]
Why does the rule work? • People use cognitive capacity to allow for increasing risk • As the oranges increase the remaining available capacity is reduced • At 3 oranges there is little available capacity remaining • Any trigger can de-stabilize the system • An accident suddenly becomes very likely
How random numbers combine Load > strength Normal upper limit Normal lower limit
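A small simulation can show how loads that each stay inside their normal limits may still add up to more than the available strength. This Python sketch rests on assumptions: every load is drawn uniformly between 0 and 1, the strength is fixed at 2.5, and the function name is hypothetical.

```python
import random


def exceedance_probability(n_factors: int, strength: float = 2.5,
                           trials: int = 20_000, seed: int = 0) -> float:
    """Estimate how often the combined load exceeds the strength.

    Each factor contributes a random load between 0 and 1, so every
    individual load stays within its own normal limits.
    """
    rng = random.Random(seed)   # fixed seed so the estimate is repeatable
    exceeded = 0
    for _ in range(trials):
        total = sum(rng.uniform(0.0, 1.0) for _ in range(n_factors))
        if total > strength:
            exceeded += 1
    return exceeded / trials


for n in (1, 2, 3, 4):
    print(f"{n} factors: P(load > strength) ~ {exceedance_probability(n):.3f}")
```

Under these assumptions one or two factors can never exceed the strength, while a third and a fourth make it increasingly likely, which is in the spirit of the Rule of Three.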
Risk • Risk is a complex concept • Classically probability x outcome • Safety management is about: • Taking risk – acceptable (ALOS) vs unacceptable • Running risk – getting away with it • Can be based on luck or on professionalism • The granularity of the outcomes and how they can be reached is essential • Most approaches are crude • Salami slicing is a way to evade regulation
Risk Space High Risk areas Low risk/resilient areas
Single distribution A Known danger zone
Single distribution B Known danger zone
Single distribution C Known danger zones Known danger zone