How safe is safe enough? (and how do we demonstrate that?)
Dr David Pumfrey
High Integrity Systems Engineering Group
Department of Computer Science
University of York
Why System Safety?
Why do we strive to make systems safe?
• Self interest
  • we wouldn’t want to be harmed by systems we develop and use
  • unsafe systems are bad business
• We have to do so
  • required by law
  • required by standards
• But what do the law and standards represent?
  • laws try to prevent what society finds morally unacceptable
    • ultimately assessed by the courts, as representatives of society
  • standards try to define what is acceptable practice
    • to discharge legal and moral responsibilities
Perception of Safety
• Perception (and hence individual acceptance) of risk is affected by many factors
  • (Apparent) degree of control
  • Number of deaths in one accident (aircraft versus cars)
  • Familiarity vs. novelty
  • “Dreadness” of risk (“falling out of the sky”, nuclear radiation)
  • Voluntary vs. involuntary risk (hang gliding vs. nuclear accident)
  • Politics and journalism
    • Frequency / profile of reporting of accidents / issues
  • Experience
  • Individual factors – age, sex, religion, culture
• How do companies (engineers?) make decisions given this diversity of views?
Getting it wrong 1: Boeing 777
• An incident of massive altitude fluctuations on a flight out of Perth
• Problem caused by the Air Data Inertial Reference Unit (ADIRU)
• Software contained a latent fault which was revealed by a change
• Problem was in fault management/dispatch logic
• June 2001: accelerometer #5 fails with erroneous high output values; the ADIRU discards its output
• A power cycle on the ADIRU occurs each time the aircraft electrical system is restarted
• Aug 2005: accelerometer #6 fails; the latent software error allows use of the previously failed accelerometer #5
http://www.atsb.gov.au/publications/investigation_reports/2005/AAIR/aair200503722.aspx
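The failure sequence above can be illustrated with a deliberately simplified sketch. This is not the ADIRU's actual logic: the class `SensorManager`, its methods, the `buggy` flag and the assumption of six numbered sensors are all hypothetical. It only shows how a defect in reconfiguration logic can stay hidden through a first failure and many power cycles, and surface when a second sensor fails.

```python
class SensorManager:
    """Toy fault manager for redundant sensors (hypothetical, for illustration)."""

    def __init__(self, sensor_ids, buggy=False):
        self.sensor_ids = list(sensor_ids)
        self.fault_history = set()  # assumed to survive power cycles
        self.excluded = set()       # working exclusion set, rebuilt at run time
        self.buggy = buggy

    def report_failure(self, sensor_id):
        """Record a failure and reconfigure the set of excluded sensors."""
        self.fault_history.add(sensor_id)
        if self.buggy:
            # Latent defect: reconfiguration keeps only the newest failure,
            # silently re-admitting any sensor that failed earlier.
            self.excluded = {sensor_id}
        else:
            self.excluded = set(self.fault_history)

    def power_cycle(self):
        """Restore exclusions from the stored fault history after a restart."""
        self.excluded = set(self.fault_history)

    def usable_sensors(self):
        return [s for s in self.sensor_ids if s not in self.excluded]


def run_scenario(buggy):
    """Replay the slide's sequence: #5 fails, power cycle, then #6 fails."""
    mgr = SensorManager(range(1, 7), buggy=buggy)
    mgr.report_failure(5)
    mgr.power_cycle()
    after_first_failure = mgr.usable_sensors()
    mgr.report_failure(6)
    return after_first_failure, mgr.usable_sensors()


if __name__ == "__main__":
    print("correct logic:", run_scenario(buggy=False))
    print("latent defect:", run_scenario(buggy=True))
```

In the defective variant, sensor 5 stays excluded after the first failure and the power cycle, so testing with a single failure would not reveal the problem; it only reappears in the usable set once sensor 6 fails.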
Getting it wrong 2: Therac-25
• Therac-25 was a development of (safe, successful) earlier medical machines
• Intended for operation on tumours
• Uses a linear accelerator to produce an electron stream and generate X-rays (both can be used in treatments)
• X-ray therapy requires about 100 times more electron beam current than electron therapy
  • this level of beam current is hazardous if the patient is exposed directly
• Selection of treatment type is controlled by a turntable
Software in Therac-25
• On older models, there were mechanical interlocks on turntable position and beam intensity
  • In Therac-25, mechanical interlocks were removed; turntable position and beam activation were both computer controlled
• Older models required the operator to enter data twice (at the patient’s side and in the shielded area), and the two entries were then cross-checked
  • In Therac-25, data was entered only once (to speed up therapy sessions)
• Very poor user interface
  • Display updated so slowly that experienced therapists could “type ahead”
  • Undocumented error codes which occurred so often that the operators ignored them
• Six over-dosage accidents (resulting in deaths)
• May have been many cases where ineffective treatment was given
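The two protections that the older machines provided, an interlock between turntable position and beam mode and a cross-check of doubly entered data, can be sketched as explicit preconditions. This is an illustrative sketch only, not Therac code: `Mode`, `TurntablePosition`, `InterlockError` and `authorise_beam` are hypothetical names, and a software check of this kind is not a substitute for an independent hardware interlock.

```python
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"
    XRAY = "xray"


class TurntablePosition(Enum):
    ELECTRON = "electron"
    XRAY = "xray"


class InterlockError(Exception):
    """Raised when a precondition for beam activation is not met."""


# Which sensed turntable position each treatment mode requires (assumed mapping).
REQUIRED_POSITION = {
    Mode.ELECTRON: TurntablePosition.ELECTRON,
    Mode.XRAY: TurntablePosition.XRAY,
}


def authorise_beam(mode, sensed_position, console_entry, bedside_entry):
    """Return True only if both protective checks pass; otherwise raise."""
    # Cross-check: the two independently entered prescriptions must agree.
    if console_entry != bedside_entry:
        raise InterlockError("prescription entries differ")
    # Interlock: the sensed turntable position must match the selected mode.
    if sensed_position is not REQUIRED_POSITION[mode]:
        raise InterlockError(
            f"turntable at {sensed_position.value}, but mode is {mode.value}"
        )
    return True
```

The point of the sketch is that each check refuses activation by default and must be satisfied explicitly; removing either check removes a barrier rather than just a convenience.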
Safety Life Cycle 1 Simple “V” model of development lifecycle
Safety Life Cycle 2
Major safety activities during development:
• Hazard Identification and Requirements Setting
  • identifying potential accidents and associated hazards
  • assessing risk
  OUTPUT: derived safety requirements to avoid / minimise hazards
• Driving Design
  • examining design proposals
  • identifying causes of hazards, potential weaknesses, and risks
  OUTPUT: new derived safety requirements to improve design
  • preliminary assessment that design proposals can meet targets
  OUTPUT: evidence to justify design decisions
• Producing Safety Evidence
  • confirming that design meets requirements
  OUTPUT: evidence of achieved safety
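The chain of outputs above (hazards, derived safety requirements, evidence) can be pictured as a simple traceability record. The following is a minimal sketch under stated assumptions: the `Hazard` and `SafetyRequirement` classes, their fields and the `open_items` function are hypothetical, and real hazard logs carry far more information (risk classification, ownership, status).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SafetyRequirement:
    ident: str
    text: str
    evidence: List[str] = field(default_factory=list)  # e.g. test or analysis reports


@dataclass
class Hazard:
    ident: str
    description: str
    requirements: List[SafetyRequirement] = field(default_factory=list)


def open_items(hazards: List[Hazard]) -> List[Tuple[str, str]]:
    """List (hazard id, issue) pairs where the traceability chain is incomplete."""
    issues = []
    for hazard in hazards:
        if not hazard.requirements:
            issues.append((hazard.ident, "no derived safety requirement"))
        for req in hazard.requirements:
            if not req.evidence:
                issues.append((hazard.ident, f"{req.ident} has no supporting evidence"))
    return issues


# Hypothetical example data, for illustration only.
example_log = [
    Hazard("H1", "Uncommanded actuator movement", [
        SafetyRequirement("SR1", "Actuator demand shall be range-checked",
                          evidence=["unit test report"]),
        SafetyRequirement("SR2", "Loss of demand signal shall be detected"),
    ]),
    Hazard("H2", "Loss of display to operator"),
]
```

Calling `open_items(example_log)` reports that SR2 lacks evidence and that H2 has no derived requirement, which is the kind of gap each lifecycle stage is meant to close.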
Safety Life Cycle 3 Integrated Development and Safety Processes
Safety Cases: Who are they for?
• Many people and organisations will have an interest in a safety case
  • supplier / manufacturer
  • operator
  • regulatory authorities
  • bodies that conduct acceptance trials
  • people who will work with the system
    • and their representatives (unions)
  • “neighbours” (e.g. general public who live round an air base)
  • emergency services
• May need more than one “presentation” of safety case to suit different audiences
• Who has the greatest interest?
Goal Structuring Notation
Purpose of a Goal Structure
To show how goals are broken down into sub-goals, and eventually supported by evidence (solutions), whilst making clear the strategies adopted, the rationale for the approach (assumptions, justifications) and the context in which goals are stated
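The structure described above can be sketched as a small tree of goals, strategies and solutions, with context, assumptions and justifications attached as notes. This is an illustrative data model only, not a GSN tool or an official schema: the `Node` class, its fields and the example argument text are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                     # "goal", "strategy" or "solution"
    text: str
    children: List["Node"] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)  # context, assumptions, justifications


def undeveloped_goals(node: Node) -> List[str]:
    """Return the text of every goal that has no supporting strategy, sub-goal or solution."""
    found = []
    if node.kind == "goal" and not node.children:
        found.append(node.text)
    for child in node.children:
        found.extend(undeveloped_goals(child))
    return found


def render(node: Node, depth: int = 0) -> List[str]:
    """Render the structure as indented text lines, notes in parentheses."""
    lines = ["  " * depth + f"[{node.kind}] {node.text}"]
    for note in node.notes:
        lines.append("  " * (depth + 1) + f"({note})")
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical example argument, for illustration only.
top = Node("goal", "System is acceptably safe",
           notes=["Context: system operated as defined in its operating manual"],
           children=[
               Node("strategy", "Argue over each identified hazard",
                    notes=["Assumption: hazard identification is complete"],
                    children=[
                        Node("goal", "Hazard H1 has been suitably addressed",
                             children=[Node("solution", "Fault tree analysis report")]),
                        Node("goal", "Hazard H2 has been suitably addressed"),
                    ]),
           ])

if __name__ == "__main__":
    print("\n".join(render(top)))
    print("Undeveloped:", undeveloped_goals(top))
```

Walking the tree for goals without children is one simple way to see where an argument still lacks evidence; here it flags the goal for hazard H2.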
Traditional flight controls • Rods and links • Power assistance from high pressure hydraulics
HEAT project replaces this… …with this…
…and this… …with this.
HEAT: Developing the Argument
Top goal: Trials aircraft is acceptably safe to fly with HEAT/ACT fitted
Sub-goals:
• Product: All identified hazards have been suitably addressed
• Process: All relevant requirements and standards have been complied with
Lower-level goals:
• System: HEAT/ACT system is acceptably safe
• Integration: Trials a/c remains acceptably safe with HEAT fitted
• SMS: SMS implemented to DS00-56
• Clearance: Procedures for flight clearance and certification followed
An analogy
• A safety case is like a legal case presented in court
• Like a legal case, a safety case must:
  • be clear
  • be credible
  • be compelling
  • make best use of available evidence
• Like a legal case, a safety case will always be subjective
  • There is no such thing as absolute safety
  • Safety can never be proved
  • We are always making an argument of acceptability
What is a convincing argument? • Example: The Completeness Problem
How is evidence used?
Think about evidence as used in a legal (court) case
• Direct: supports a conclusion with no “intermediate steps”
  • e.g. a witness testifies that he saw the suspect at point X at time Y
• Circumstantial: requires an inference to be made to reach a conclusion
  • e.g. a ballistics test proves the suspect’s gun fired the fatal shot
Safety case evidence is similar
• e.g. testing is direct: it shows how the system behaves in a specific instance
• Conformance to design rules is indirect: it allows the inference that the system is fit for purpose (if the rules have been proven)
Evidence may “stack up” in different ways
Conclusions
• Demonstrating safety is a challenge
  • We are building ever more complex systems
  • Much of the “bespoke” complexity is in software
• Essential that safety is a design driver...
  • ... and also, design for ability to demonstrate safety