Safety Critical Systems

Presentation Transcript


  1. Safety Critical Systems

  2. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test

  3. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test • Slide annotations: "Safety analysis"; "Handled at the architectural level and mechanistic level"

  4. Safety Analysis • You must identify the hazards of the system • You must identify the faults that can lead to hazards • You must define safety control measures to handle hazards • These culminate in the Hazard Analysis • The Hazard Analysis feeds into the Requirements Specification

  5. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test

  6. Hazard Causes • Release of energy • electromagnetism (microwave oven) • radiation (nuclear power plant) • electricity (electrocution hazard from ECG leads) • heat (infant warmer) • kinetic (runaway train) • Release of toxins

  7. Hazard Causes • Interference with life support or other safety-related function • Misleading safety personnel • Failure to alarm • Alarming too much (Therac-25): the alarms were ignored and people were killed

  8. Types of Hazards • Actions • inappropriate system actions taken • F-18 pilot pulling up landing gear • appropriate system actions not taken • Timing • too soon • too late • fault latency time

  9. Types of Hazards • Sequence • skipping actions • actions out of order • Amount • too much • too little

  10. Example Hazards • Actions • incorrectly energizing a medical treatment laser • failure to engage landing gear • Timing • cardiac pacemaker paces too fast • flight control surface adjusted too slowly

  11. Example Hazards • Sequence • empty the vat, THEN add the reagent • out of sequence network packets controlling industrial robot • Amount • electrocution from muscle stimulator • too little oxygen delivered to ventilator patient

  12. Means of Hazard Control • Obviation; the possibility of the hazard can be removed by being made physically impossible • use incompatible fasteners to prevent cross connections • Education; the hazard can be handled by educating the users so that they won’t create hazardous conditions through equipment misuse • don’t look down the barrel when cleaning your rifle

  13. Means of Hazard Control • Alarming; announcing the hazard to the user when it appears so that they can take appropriate action • alarming when the heart stops beating • Interlocks; the hazard can be removed by using secondary devices and/or logic to intercede when a hazard presents itself • car won’t start unless it is in “Park”

  14. Means of Hazard Control • Internal checking; the hazard can be handled by ensuring that a system can detect that it is malfunctioning prior to an incident • CRC checks data for corruption whenever it is accessed • Safety equipment • goggles, gloves
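The CRC example on this slide can be sketched roughly as follows. This is a minimal illustration, assuming Python's standard `zlib.crc32`; the `ProtectedValue` and `CorruptionError` names are hypothetical and not from the presentation.

```python
import zlib


class CorruptionError(Exception):
    """Raised when stored data no longer matches its CRC."""


class ProtectedValue:
    """Hypothetical wrapper: stores bytes with a CRC-32, verified on every read."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)
        self._crc = zlib.crc32(self._data)

    def read(self) -> bytes:
        # Recompute the CRC whenever the data is accessed.
        if zlib.crc32(self._data) != self._crc:
            raise CorruptionError("stored data failed CRC check")
        return bytes(self._data)


setting = ProtectedValue(b"pacing_rate=70")
intact = setting.read()

# Simulate a memory fault by flipping one bit behind the object's back.
setting._data[0] ^= 0x01
try:
    setting.read()
    corruption_detected = False
except CorruptionError:
    corruption_detected = True
```

The point of checking on every access is that the malfunction is caught before the corrupted value is used, not after an incident.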

  15. Means of Hazard Control • Restricting access to potential hazards so that only knowledgeable users have such access • using passwords to prevent inadvertently starting service mode • Labelling • “High Voltage -- DO NOT TOUCH”

  16. Hazard Analysis • Hazard: the hazardous condition • Level of risk: how bad if it occurs? • Tolerance time: how long can it be tolerated? • Fault: how can this happen? • Likelihood: how frequently? • Detection time: how long to discover? • Control measure: what do you do about it? • Exposure time: how long is the exposure to the hazard?

      Hazard | Level of risk | Tolerance time | Fault | Likelihood | Detection time | Control measure | Exposure time
      Hypoventilation | Severe | 5 min | Ventilator fails | rare | 30 sec | Independent pressure alarm, action by doctor | 1 min
      | | | Esophageal intubation | often | 30 sec | CO2 sensor alarm | 1 min
      | | | User misattaches breathing circuit | often | 0 | Noncompatible mechanical fasteners used | 0
      Overpressure | Severe | 250 ms | Release valve failure | rare | 50 ms | Secondary valve opens | 55 ms
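The timing columns of the hazard analysis lend themselves to a mechanical check. The sketch below assumes one plausible criterion that the slide does not state explicitly: a fault is adequately controlled when detection time <= exposure time <= tolerance time. The `HazardRow` class is hypothetical, and the esophageal intubation row is assumed to share the 5 min tolerance of the hypoventilation hazard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardRow:
    hazard: str
    fault: str
    tolerance_s: float  # how long the hazard can be tolerated
    detection_s: float  # how long the fault takes to discover
    exposure_s: float   # how long the exposure to the hazard lasts

    def is_adequately_controlled(self) -> bool:
        # Assumed criterion: the fault is detected within the exposure
        # window, and the exposure ends before the tolerance time runs out.
        return self.detection_s <= self.exposure_s <= self.tolerance_s


rows = [
    HazardRow("Hypoventilation", "Esophageal intubation", 300.0, 30.0, 60.0),
    HazardRow("Overpressure", "Release valve failure", 0.250, 0.050, 0.055),
]
uncontrolled = [r.hazard for r in rows if not r.is_adequately_controlled()]
```

Both sample rows pass, so `uncontrolled` is empty; a row whose exposure time exceeded its tolerance time would be flagged for a stronger control measure.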

  17. When is a system safe enough? • (Minimal) No hazards in the absence of faults • (Minimal) No hazards in the presence of any single point failure • a common mode failure is a single point failure that affects multiple channels • a latent fault is an undetected fault which allows another fault to cause a hazard • Your mileage may vary depending on the risk introduced by your system

  18. Safety Measures • You cannot depend on a safety measure that you cannot test! • Example: a CAN bus with 2 nodes provides a CRC on messages that is checked at the chip level, but the chips provide no way of testing whether the check is working. Therefore, it cannot be relied on as a safety measure

  19. Fail-Safe States • Off • Emergency stop -- immediately cut power • Production stop -- stop after the current task • Protection stop -- shut down without removing power • Partial Shutdown • Degraded level of functionality

  20. Fail-Safe States • Hold • No functionality, but with safety actions taken • Manual or External control • Restart (reboot)

  21. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test

  22. Risk Assessment • For each hazard • determine the potential severity • determine the likelihood of the hazard • determine how long the user is exposed to the hazard • determine whether the risk can be removed

  23. TUV Risk Level Determination Chart

      S  | E  | G  | W3 | W2 | W1
      S1 | -  | -  | 1  | -  | -
      S2 | E1 | G1 | 2  | 1  | -
      S2 | E1 | G2 | 3  | 2  | 1
      S2 | E2 | G1 | 4  | 3  | 2
      S2 | E2 | G2 | 5  | 4  | 3
      S3 | E1 | -  | 6  | 5  | 4
      S3 | E2 | -  | 7  | 6  | 5
      S4 | -  | -  | 8  | 7  | 6

      Risk parameters • S: Extent of damage • S1: slight injury • S2: severe irreversible injury to one or more persons, or the death of a single person • S3: death of several persons • S4: catastrophic consequences, several deaths • E: Exposure time • E1: seldom to relatively infrequent • E2: frequent to continuous • G: Hazard prevention • G1: possible under certain conditions • G2: hardly possible • W: Occurrence probability of hazardous event • W1: very low • W2: low • W3: relatively high
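The chart can be read as a lookup table. The following is a minimal sketch under that reading: the function name `risk_level` and the dictionary layout are assumptions for illustration, and the numbers are transcribed from the chart on this slide (S, E and G select a row; W selects a column).

```python
from typing import Optional

# Rows keyed by (S, E, G); None means the parameter is not used on that
# branch of the chart. Each row lists the levels for (W3, W2, W1).
_CHART = {
    ("S1", None, None): (1, None, None),
    ("S2", "E1", "G1"): (2, 1, None),
    ("S2", "E1", "G2"): (3, 2, 1),
    ("S2", "E2", "G1"): (4, 3, 2),
    ("S2", "E2", "G2"): (5, 4, 3),
    ("S3", "E1", None): (6, 5, 4),
    ("S3", "E2", None): (7, 6, 5),
    ("S4", None, None): (8, 7, 6),
}
_W_COLUMN = {"W3": 0, "W2": 1, "W1": 2}


def risk_level(s: str, w: str, e: Optional[str] = None,
               g: Optional[str] = None) -> Optional[int]:
    """Return the risk level, or None where the chart shows '-'."""
    if s in ("S1", "S4"):
        e = g = None  # E and G do not branch for S1 or S4
    elif s == "S3":
        g = None      # G does not branch for S3
    return _CHART[(s, e, g)][_W_COLUMN[w]]


oven = risk_level("S2", "W3", e="E2", g="G2")
```

With S2, E2, G2 and W3 the lookup gives level 5, the mid-range of the chart.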

  24. Sample Risk Assessments

      Device | Hazard | Extent of damage | Exposure time | Hazard prevention | Probability | TUV risk level
      Microwave oven | Irradiation | S2 | E2 | G2 | W3 | 5
      Pacemaker | Pace too slowly | S2 | E2 | G2 | W3 | 5
      Pacemaker | Pace too fast | S2 | E2 | G2 | W3 | 5
      Power station | Explosion | S3 | E1 | -- | W3 | 6
      Airliner | Crash | S4 | E2 | G2 | W2 | 8

  25. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test

  26. Safety Measures • Safety measures do one of the following • remove the hazard • reduce the risk • identify the hazard to supervisory control • The purpose of the safety measure is to ensure the system remains in a safe state

  27. Safety Measures • Adequacy of measures • safety measures must be able to reliably detect the fault • safety measures must be able to take appropriate actions

      Component | Fault/Error | Software class (1, 2) | Examples of acceptable measures
      Interrupt handling and execution | no interrupt or too frequent interrupt | rq | functional test; or time-slot monitoring
      Interrupt handling and execution | no interrupt or too frequent interrupt related to different sources | rq | comparison of redundant functional channels by either reciprocal comparison or independent hardware comparator; or independent time-slot and logical monitoring

  28. Risk Reduction • Identify the fault • Take corrective action, either • use redundancy to correct and move on • feedforward error correction (Hamming codes) • redo the computational step • feedback error detection (take corrective action first) • go to a fail-safe state
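As an illustration of feedforward error correction, here is a minimal sketch of a Hamming(7,4) code, which adds three parity bits to four data bits so that any single flipped bit can be located and corrected without redoing the computation. The function names and the bit layout (parity bits at positions 1, 2 and 4) are one conventional choice, not something the slide specifies.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, corrected position); position 0 means no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # syndrome gives the 1-based bit position
    if error_pos:
        c[error_pos - 1] ^= 1         # correct the single-bit error and move on
    return [c[2], c[4], c[5], c[6]], error_pos


word = encode([1, 0, 1, 1])
word[4] ^= 1                          # inject a fault at position 5
recovered, position = decode(word)
```

This code corrects only single-bit errors; a double-bit error would be miscorrected, which is why the other options on the slide (redo the step, or go to a fail-safe state) remain necessary.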

  29. Fault Identification at Run-Time • Faults must be identified in less than the fault tolerance time • Fault identification requires redundancy • Redundancy can be in terms of • channel (architectural) • device (architectural) • data (detailed design) • control (detailed design)

  30. Fault Identification at Run-Time • Redundancy may be either • homogeneous (random faults only) • does not detect errors • perform functions the same way on the same thing multiple times • heterogeneous (systematic and random faults) • includes errors -> present in all channels • perform processing differently and hopefully you didn’t make the same mistake!
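Heterogeneous redundancy can be sketched as two independently written implementations of the same function whose results are compared at run time. The example below uses an integer square root purely as a stand-in computation; the function names and the `ChannelMismatch` exception are hypothetical.

```python
from typing import Callable


class ChannelMismatch(Exception):
    """Raised when the two channels disagree, indicating a fault."""


def isqrt_newton(n: int) -> int:
    """Channel A: integer square root by Newton's iteration."""
    if n == 0:
        return 0
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def isqrt_bisect(n: int) -> int:
    """Channel B: integer square root by bisection."""
    lo, hi = 0, n + 1  # invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def cross_checked(n: int,
                  a: Callable[[int], int] = isqrt_newton,
                  b: Callable[[int], int] = isqrt_bisect) -> int:
    """Run both channels and accept the result only if they agree."""
    result_a, result_b = a(n), b(n)
    if result_a != result_b:
        raise ChannelMismatch(f"channels disagree: {result_a} != {result_b}")
    return result_a


agreed = cross_checked(144)
```

Because the two channels share no algorithm, a design error in one is unlikely to be mirrored in the other. Running the same function twice (homogeneous redundancy) would reproduce a design error in both results and the comparison would pass.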

  31. Fault Tree Analysis Symbology • A condition that must be present to produce the output of a gate • An event that results from a combination of events through a logic gate • Transfer • A basic fault event that requires no further development • A fault event that is not developed further because the event is inconsequential or the necessary information is not available • AND gate (also OR gate) • An event that is expected to occur normally • NOT gate

  32. Subset of Pacemaker Fault Analysis • Condition or event to avoid: Pacing too slowly • Secondary conditions or events, joined by an OR gate: Shutdown fault, Time-base fault, Invalid pacing rate • Lower-level gates: OR, AND, OR, AND • Lower-level events, including the primary or fundamental faults: Crystal failure, Watchdog failure, Bad command rate, Data corrupted in vivo, Software failure, CPU H/W failure, Rate command corrupted, CRC hardware failure
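A fault tree like this one can be evaluated mechanically to see which combinations of primary faults produce the top event. The sketch below is an assumption-laden illustration: the transcript does not preserve which faults feed which gate, so the wiring in `PACING_TOO_SLOWLY` is a guess chosen only to show the technique, and the `occurs` helper is hypothetical.

```python
from typing import Set, Tuple, Union

Node = Union[str, Tuple]  # a primary fault name, or (gate, child, child, ...)


def occurs(node: Node, active_faults: Set[str]) -> bool:
    """Evaluate a fault tree node against a set of active primary faults."""
    if isinstance(node, str):
        return node in active_faults
    gate, *children = node
    results = [occurs(child, active_faults) for child in children]
    if gate == "OR":
        return any(results)
    if gate == "AND":
        return all(results)
    raise ValueError(f"unknown gate: {gate}")


# ASSUMED wiring, for illustration only.
PACING_TOO_SLOWLY = (
    "OR",
    ("OR", "Software failure", "CPU H/W failure"),    # shutdown fault
    ("AND", "Crystal failure", "Watchdog failure"),   # time-base fault
    ("OR",                                            # invalid pacing rate
     "Data corrupted in vivo",
     ("AND", "Rate command corrupted", "CRC hardware failure")),
)

single = occurs(PACING_TOO_SLOWLY, {"Crystal failure"})
double = occurs(PACING_TOO_SLOWLY, {"Crystal failure", "Watchdog failure"})
```

Under this assumed wiring, a crystal failure alone does not cause the hazard because the watchdog covers it; only the two faults together do. Faults that reach the top event through OR gates alone are single point failures and need their own safety measures.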

  33. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test

  34. Safe Requirements • Requirements specification follows initial hazard analysis • Specific requirements should track back to hazard analysis • must be shown to FDA, etc • Architectural framework should be selected with safety needs in mind • has the hooks in place

  35. Eight steps to safety • Identify the hazards • Determine the risks • Define the safety measures • Create safe requirements • Create safe designs • Implement safety • Assure the safety process • Test, test, test

  36. Use Good Design Practices • Good design practices allow you to • manage complexity • view the system at various levels of abstraction • zoom in on a particular area of interest • identify hot spots of special concern • have consistent quality • easily test • build and use high quality components • Regulatory agencies look at this!!

  37. Use Good Design Practices • Manage your requirements • trace requirements to design elements • trace design elements back to requirements • [Figure: the requirements specification and use cases (remote communications, adjust trajectory) traced to classes a through e in the design model]

  38. Use Good Design Practices • Use iterative development • integrating many times finds more defects • iterative prototypes can result in more reliable and safe systems

  39. Use Good Design Practices • Use component-based design architectures • third party components may be very well tested if they are in wide use • require bug lists from component vendors • this bit Microsoft once

  40. Use Good Design Practices • Use Visual Modeling • UML • Ward-Mellor • Use executable models • animate models • execute and debug at modeling level of abstraction

  41. Use Good Design Practices • Use frameworks • a framework is a partially completed application which is specialized by the user • Microsoft Foundation Classes • Object Execution Framework • frameworks reduce the work of developing new applications • frameworks rely on well-tested patterns

  42. Use Good Design Practices • User Model + Framework = System • the framework is built from patterns • 80-90% of application code is housekeeping code

  43. Use Good Design Practices • Use Configuration Management • only use unit-tested components in builds • [Figure: CM database for the system, holding parameters, data acquisition, drivers, and OS]

  44. Use Good Design Practices • Design for test • product testing • built-in-testing to ensure • invariants are truly invariant • functional invariants • quality of service invariants (e.g. performance) • faults are detected

  45. Good Design Practices • Isolate Safety Functions • Safety-relevant systems are 200-300% more effort to produce • Isolation of safety systems allows more expedient development • Care must be taken that the safety system is truly isolated so that a defect in the non-safety system cannot affect the safety system • different processor • different heavy-weight tasks (depends on the OS)

  46. Safety Critical Patterns

  47. Safety Architecture Patterns • Protected Single-Channel Pattern • Dual-Channel Pattern • Homogeneous Dual Channel Pattern • Heterogeneous Peer-Channel Pattern • Sanity Check Pattern • Actuator-Monitor Pattern • Voting Multichannel Pattern

  48. Protected Single Channel Pattern • Within the single channel, mechanisms exist to identify and handle faults • All faults must be detected within the fault tolerance time • May be impossible • to test for all faults within the fault tolerance time • to remove common mode failures from the single channel • Generally, less recurring system cost • no additional hardware required

  49. Protected Single Channel Pattern • Example: Single Channel Train Braking System • “If I’m not getting life ticks, I’ll shut down!”
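The life-tick idea on this slide can be sketched as a small monitor that shuts the channel down when ticks stop arriving. This is a minimal illustration, not the presentation's design: the `LifeTickMonitor` class is hypothetical, the timeout value is arbitrary, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class LifeTickMonitor:
    """Hypothetical monitor: latches a shutdown if life ticks stop arriving."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_tick = now
        self.shut_down = False

    def tick(self, now: float) -> None:
        # Ticks are ignored once shut down: the fail-safe state is latched.
        if not self.shut_down:
            self.last_tick = now

    def poll(self, now: float) -> bool:
        """Return True if the channel is (or has just been) shut down."""
        if now - self.last_tick > self.timeout_s:
            self.shut_down = True
        return self.shut_down


monitor = LifeTickMonitor(timeout_s=0.5)
monitor.tick(now=0.2)
alive_at_0_6 = not monitor.poll(now=0.6)  # 0.4 s since last tick: still alive
down_at_1_0 = monitor.poll(now=1.0)       # 0.8 s since last tick: shut down
monitor.tick(now=1.1)                     # a late tick does not revive it
still_down = monitor.poll(now=1.2)
```

Latching the shutdown is a design choice made here for the sketch: once the channel has missed its deadline, it stays in the fail-safe state until something outside the monitor restarts it. The timeout would need to be shorter than the fault tolerance time of the hazard being guarded.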

  50. Dual Channel Architecture Patterns • Separation of safety-relevant from non-safety relevant where possible • Separation of monitoring from control • Generally easier to meet safety requirements • timing • common mode failures • Generally higher recurring system cost • additional hardware required
