Safety Critical Computer Systems - Open Questions and Approaches Andreas Gerstinger Institute for Computer Technology February 16, 2007
Agenda • Safety-Critical Systems • Project Partners • Three research topics • Safety Engineering • Diversity • Software Metrics • Conclusion and Outlook
Safety Critical Systems • A safety-critical computer system is a computer system whose failure may cause injury or death to human beings or damage to the environment • Examples: • Aircraft control system (fly-by-wire,...) • Nuclear power station control system • Control systems in cars (anti-lock brakes,...) • Health systems (heart pacemakers,...) • Railway control systems • Communication systems • Wireless Sensor Networks Applications?
SYSARI Project • SYSARI = SYstem SAfety Research in Industry • Goal of the project • to conduct and promote the research in system safety engineering and safety-critical system design and development • Close cooperation between ICT and Industry • One "shared" Employee (me) • Students conducting practical Diploma Theses • PhD Theses
What is Safety? • “The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also avoidance of damage to property and the environment” • Safety is also defined as "freedom from unacceptable risk of harm" • A basic concept in System Safety Engineering is the avoidance of "hazards" • Safety is NOT an absolute quantity!
Safety vs. Security • These two concepts are often mixed up • In German, there is just one term for both!
Project Partner: Frequentis • Austrian high-tech company • World leader in air traffic control communication systems • 700 employees, company based in Vienna, customers all over the world • http://www.frequentis.com
Frequentis Voice Communication System • Enables communication between aircraft and controller • Communication link must never fail! • Requirements: • Safety • High Availability and Reliability • Fault Tolerance • Other domains: • railway • ambulance, police, fire brigade,... • maritime • Safety Integrity Level 2
Project Partner: Thales • French company • 68000 employees worldwide • Mission critical information systems • 25000 researchers • Nobel Prize in Physics 2007 awarded to Albert Fert, scientific director of Thales research lab • http://www.thalesgroup.com
Railway Signalling Systems • Signalling and Switching • Axle Counters • Applications for ETCS • An incorrect output may lead to an incorrect signal causing a major accident! • Safety Integrity Level 4 (highest)
(Old) Interlocking Systems Mechanical / Electromechanical Systems
Signal Box / Interlocking Tower • Electric system with some electronics
Modern Signal Box / Interlocking Tower • Lots of electronics and computer systems
What is a Hazard? • Hazard • physical condition of platform that threatens the safety of personnel or the platform, i.e. can lead to an accident • a condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions • "an accident waiting to happen" • Examples • oil spilled on staircase • failed train detection system at an automatic railway level crossing • loss of thrust control on a jet engine • loss of communication • distorted communication • undetectably incorrect output
Risk Acceptability • Having identified the level of risk for the product we must determine how acceptable & tolerable that risk is • Regulator / Customer • Society • Operators • Decision criteria for risk acceptance / rejection • Absolute vs. relative risk (compare with previous, background) • Risk-cost trade-offs • Risk-benefit of technological options
Risk Tolerability • [Flowchart; recoverable labels: Hazard, Severity, Probability, Risk, Risk Criteria, Tolerable? (Yes / No), Risk Reduction Measures]
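The loop suggested by the flowchart labels (estimate risk from severity and probability, compare it with the risk criteria, apply risk reduction measures while it is not tolerable) can be sketched roughly as follows. This is only an illustration under stated assumptions: the four-step severity and probability scales, the product-based risk score, the threshold `TOLERABLE_LIMIT` and the idea that each mitigation lowers the probability by one class are all hypothetical and are not taken from the slides or from any standard.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity classes (assumption, not from the slides).
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Probability(IntEnum):
    # Hypothetical probability classes (assumption, not from the slides).
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


# Hypothetical risk criterion: scores above this value are not tolerable.
TOLERABLE_LIMIT = 4


def risk_score(severity, probability):
    """Combine severity and probability into a single illustrative score."""
    return int(severity) * int(probability)


def is_tolerable(severity, probability, limit=TOLERABLE_LIMIT):
    """Compare the risk score against the risk criterion."""
    return risk_score(severity, probability) <= limit


def reduce_until_tolerable(severity, probability, limit=TOLERABLE_LIMIT):
    """Apply mitigation steps until the risk is tolerable.

    Each step is assumed to lower the probability by one class.
    Returns the final probability class and the number of steps taken.
    """
    steps = 0
    while not is_tolerable(severity, probability, limit):
        if probability == Probability.IMPROBABLE:
            raise ValueError("risk cannot be reduced to a tolerable level")
        probability = Probability(probability - 1)
        steps += 1
    return probability, steps
```

For example, `reduce_until_tolerable(Severity.CATASTROPHIC, Probability.OCCASIONAL)` needs two mitigation steps under these assumed scales before the score falls to the limit.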
Diversity • Goal: Fault Tolerance/Detection • Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner." • Can tolerate/detect a wide range of faults • "The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods." (Dionysius Lardner, 1834)
Examples for Diversity • Specification Diversity • Design Diversity • Data Diversity • Time Diversity • Hardware Diversity • Compiler Diversity • Automated Systematic Diversity • Testing Diversity • Diverse Safety Arguments • … Some faults to be targeted: programming bugs, specification faults, compiler faults, CPU faults, random hardware faults (e.g. bit flips), security attacks,...
Compiler Diversity • Use of two diverse compilers to compile one common source code
Compiler Diversity: Issues • Targeted Faults: • Systematic compiler faults • Some Heisenbugs • Some systematic and permanent hardware faults (if executed on one board) • Issues: • To some degree possible with one compiler and different compile options (optimization on/off,…) • If compilers from different manufacturers are taken, independence must be ensured
Systematic Automatic Diversity • Artificial introduction of diversity to tolerate HW faults • (Automatic) transformation of program P to a semantically equivalent program P' which uses the HW differently • e.g. different memory areas, different registers, different comparisons,... • Example transformations: • "if A=B then" becomes "if A-B = 0 then" • "A or B" becomes "not (not A and not B)"
Systematic Automatic Diversity • What can be "diversified": • memory usage • execution sequence • statement structures • array references • data coding • register usage • addressing modes • pointers • mathematical and logic rules
Systematic Automatic Diversity: Issues • Targeted Faults: • Systematic hardware faults • Permanent random hardware faults • Issues: • Can be performed on source code or assembler level • If performed on source code level, it must be ensured that the compiler does not "cancel out" the diversity • (Software) Fault injection experiments showed an improvement by a factor of ~100 regarding HW faults
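A minimal sketch of the idea of running a program P next to a semantically equivalent variant P' and comparing the results. The two rewrites used here are the ones named on the earlier slide (A=B as A-B = 0, and "A or B" as "not (not A and not B)"). The function names and the speed/limit/override scenario are hypothetical, integer inputs are assumed, and a Python interpreter gives no control over registers or memory areas, so this only illustrates the source-level transformation, not the hardware-level effect.

```python
def limit_reached(speed, limit, override):
    """Variant P: direct comparison and a plain 'or'."""
    return speed == limit or override


def limit_reached_diverse(speed, limit, override):
    """Variant P': same semantics, expressed with different operations.

    A == B is rewritten as A - B == 0, and (X or Y) is rewritten
    as not (not X and not Y).
    """
    return not (not (speed - limit == 0) and not override)


def checked_limit_reached(speed, limit, override):
    """Run both variants and accept the result only if they agree."""
    result_p = limit_reached(speed, limit, override)
    result_p_prime = limit_reached_diverse(speed, limit, override)
    if result_p != result_p_prime:
        # A disagreement indicates a fault in one of the two executions.
        raise RuntimeError("diverse variants disagree")
    return result_p
```

In a fault-free run both variants always agree, so `checked_limit_reached` behaves like the original function; the comparison only matters when a fault affects one variant and not the other.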
Example: Diverse Calculation of Position • Position P can be calculated based on speedometer and accelerometer readings • Voter can also be implemented diversely • PositionA and PositionB could be transmitted in different formats
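One possible reading of this example, as a rough sketch: channel A integrates speedometer samples once, channel B integrates accelerometer samples twice, and a voter accepts the position only if both channels agree within a tolerance. The function names, the fixed sampling interval `dt`, the simple rectangle-rule integration and the averaging voter are assumptions made for illustration and are not taken from the slide.

```python
def position_from_speed(speeds, dt, p0=0.0):
    """Channel A: integrate speed samples once (rectangle rule)."""
    position = p0
    for v in speeds:
        position += v * dt
    return position


def position_from_acceleration(accels, dt, p0=0.0, v0=0.0):
    """Channel B: integrate acceleration samples twice."""
    position, velocity = p0, v0
    for a in accels:
        velocity += a * dt
        position += velocity * dt
    return position


def vote(position_a, position_b, tolerance):
    """Accept the position only if both channels agree within tolerance."""
    if abs(position_a - position_b) > tolerance:
        raise ValueError("position channels disagree")
    return (position_a + position_b) / 2.0


# Hypothetical sensor data: constant acceleration of 1 m/s^2, sampled at 1 s.
accels = [1.0, 1.0, 1.0, 1.0]
speeds = [1.0, 2.0, 3.0, 4.0]

position = vote(
    position_from_speed(speeds, dt=1.0),
    position_from_acceleration(accels, dt=1.0),
    tolerance=0.5,
)
```

With these made-up samples both channels yield the same position, so the voter passes it on. A corrupted speed sample would make the channels diverge and the voter would reject the result.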
Open Issues • How can diversity be used most efficiently? • Can diversity be introduced automatically? • Which faults are detected/tolerated to which extent? • How can the quality of the diversity be measured? • Can diversity also be used to detect security intrusions?
Software Metrics for Safety-Critical Systems • Problems: • Which metrics should safety-critical software fulfill? • Which coding rules are good and useful? • What are the desired ranges for metrics? • Which metrics influence maintainability?
Outline of Method • Create a questionnaire with relevant questions regarding software quality and get answers from expert developers for various software packages they work with • Automatically measure potentially interesting metrics of the software packages • Correlate questionnaire responses with the measured metrics to find out which metric correlates with which property
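The correlation step could look roughly like the sketch below. The slides do not say which correlation measure was used; a Spearman rank correlation is assumed here because questionnaire answers are typically ordinal. All function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


def spearman(metric_values, questionnaire_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(metric_values), ranks(questionnaire_scores))
```

A call such as `spearman(metric_per_package, score_per_package)` takes one measured metric value and one questionnaire score per software package, in the same order, and returns a value between -1 and 1.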
Summary of Results • Strongest correlation with perceived internal quality: • Comment density • Control Flow Anomalies • No correlation with perceived internal quality: • Cyclomatic Complexity • Average Method Size • Average File Size • ...
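For illustration, a metric such as comment density can be measured automatically with very little code. The slides do not define the metric; the sketch below assumes it is the share of non-blank lines that start with a line-comment marker, and the function name and the default marker `//` are hypothetical. Block comments and trailing comments are deliberately ignored.

```python
def comment_density(source, marker="//"):
    """Fraction of non-blank lines that begin with the comment marker."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    comment_lines = sum(1 for line in lines if line.startswith(marker))
    return comment_lines / len(lines)


# Made-up snippet: two comment lines out of four non-blank lines.
example = "// read sensor\nint x;\n\n// update state\nint y;\n"
density = comment_density(example)
```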
Further Related Topics • Agile Methods in Safety Critical Development • Hazard Analysis Methods • Safety Standards • Safety of Operating Systems • COTS Components for Safety-Critical Systems • Safety Aspects of Modern Programming Languages (Java, C#.NET) • Fault Detection, Correction and Tolerance • Safety and Security Harmonisation • Linux in Safety-Critical Environments • Online Tests to detect hardware faults
Conclusion • Many open issues in this field... • All research activities in SYSARI project practically motivated • Number of safety-critical systems increases • International Standards play a vital role (e.g. IEC 61508) Contact: Andreas Gerstinger: gerstinger@ict.tuwien.ac.at