Verification of FT System Using Simulation Petr Grillinger
Definitions • Fault tolerant systems are used in safety critical applications. • Fault tolerant (FT) system – a system that provides required functionality even in the presence of faults. • Safety critical application – the cost of a failure is much higher than the price of the system, e.g. human lives are in danger, a production plant is stopped. • Real-time (RT) system – the system responds to events immediately as they occur. Hard RT systems provide guaranteed deadlines.
Fault Tolerance • A fault is a random or malicious defect introduced into the system. A fault may cause an error state of the system. • A system enters an error state if its normal operation cannot be performed anymore (due to a fault). A recognized error does not mean a failure of the system. • The system fails if it no longer meets the requirements for proper function.
Possible Effects of a Fault • Fault effectiveness is the probability that a fault actually has an effect on the system (a measure of system efficiency). • An error detection mechanism (EDM) is essential for FT systems (it is not possible to recover from undetected errors). • Error recovery may prevent system failure, or limit its consequences, if the recovery time is small.
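As a rough illustration of the quantities on this slide, the sketch below estimates fault effectiveness and the share of errors caught by the EDM from a list of experiment outcomes. The outcome labels, the `summarize` function, and the sample numbers are all hypothetical and are not taken from the TTP/C model.

```python
from collections import Counter

# Hypothetical outcome labels for a single injected fault.
NO_EFFECT = "no_effect"      # the fault never caused an error
DETECTED = "detected"        # an error occurred and the EDM caught it
UNDETECTED = "undetected"    # an error occurred but escaped the EDM


def summarize(outcomes):
    """Return (fault effectiveness, detected share of errors) for the outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    effective = counts[DETECTED] + counts[UNDETECTED]
    effectiveness = effective / total if total else 0.0
    detected_share = counts[DETECTED] / effective if effective else 0.0
    return effectiveness, detected_share


# Made-up sample: 10 injected faults, 4 of which caused an error.
outcomes = [NO_EFFECT] * 6 + [DETECTED] * 3 + [UNDETECTED] * 1
effectiveness, detected_share = summarize(outcomes)
print(f"fault effectiveness:      {effectiveness:.2f}")
print(f"errors caught by the EDM: {detected_share:.2f}")
```

Only the detected errors are candidates for recovery, which is why the second number matters as much as the first.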
FT Verification Methods • Formal methods use exact mathematical proofs to verify specified FT claims (exact results but difficult for complex systems). • Fault injection (FI) experiments evaluate FT properties using artificially induced faults (usually probabilistic results). Variants are: • Hardware – EMI, heavy-ion, pin-level. • Software – malicious software, memory faults. • Simulation – depends on model abstraction level. • Hybrid – combines any of the techniques above.
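A memory fault of the kind listed under software FI is often modeled as a single bit flip. The following is a minimal sketch under that assumption; the `bytearray` stands in for a modeled memory region, and the function names are hypothetical.

```python
import random


def flip_bit(memory, byte_index, bit_index):
    """Invert a single bit of `memory` in place and return the old byte value."""
    old = memory[byte_index]
    memory[byte_index] = old ^ (1 << bit_index)
    return old


def inject_random_bit_flip(memory, rng):
    """Pick a random bit in `memory`, flip it, and return its (byte, bit) location."""
    byte_index = rng.randrange(len(memory))
    bit_index = rng.randrange(8)
    flip_bit(memory, byte_index, bit_index)
    return byte_index, bit_index


memory = bytearray(16)        # stand-in for a modeled memory region, all zeros
rng = random.Random(1234)     # fixed seed keeps the experiment repeatable
location = inject_random_bit_flip(memory, rng)
print("flipped (byte, bit):", location)
```

Returning the location lets the experiment log record exactly which fault was injected.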
Comparison of FI Methods • Hardware FI: • Pros: reliable results. • Cons: additional HW, intrusive, non-deterministic. • Software FI: same as HW FI (different types of faults). • Pros: no additional HW. • Simulation FI: • Pros: deterministic, non-intrusive, no extra HW. • Cons: validity of results depends on model quality; additional model development time.
Building a Simulation Model • Select a level of abstraction that separates the elementary function from unnecessary detail. • Design a modular structure; separate functionality from simulation overhead. • Select a simulation tool and programming language. • Implement and debug. • Verify that the model is valid, so that future experiments reflect the real system. • Design one or more testing applications that will use the model.
Abstraction Level • Low or no abstraction (e.g. VHDL model): • The same behavior as the modeled system (when generated from a formal specification). • Low performance, high complexity (difficult to pinpoint the exact location of a bug). • High level of abstraction: • Better performance, lower complexity (irrelevant details are hidden). • Limited validity (bugs introduced through the abstraction), limited usability.
Modular Structure of TTP/C
                       | Functionality | Simulation | Visualization
Standard Libraries     | ANSI C        | C-Sim      | C++ Builder
Experiment Environment | fe…           | se…        | ve…
TTP/C Protocol         | fp…           | sp…        | vp…
Application Specific   | fa…           | sa…        | va…
Implementation Language • ANSI C – maximum portability and performance. Code for microcontrollers is often written in C. • C++ – good performance, OOP principles, templates, STL support. • Java – low performance, OOP principles, parallelism, portable GUI. • High level simulation tool (e.g. Witness) – rapid development suitable for typical simulation tasks (e.g. queuing networks).
Testing Model Validity • Possible methods: • Testing against specification: monitoring of system behavior under different parameter sets. Non-automated tests are time-consuming and provide little assurance about the validity (automated tests require a formal specification). • Parallel execution of model and reference system: is often automated and provides strong evidence of validity (assuming the reference system is valid). • Validation of the model is performed first without FI experiments, then with them. The second phase already gives useful results about FT.
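Parallel execution of the model and a reference system can be sketched as a lockstep comparison: both receive the same inputs, and the first differing output is reported. The two step functions below are toy stand-ins (a saturating reference and a model with a deliberate wrap-around bug), not the TTP/C model or hardware.

```python
def run_in_lockstep(model_step, reference_step, inputs):
    """Feed the same inputs to both systems; return the first mismatch or None."""
    for step, value in enumerate(inputs):
        model_out = model_step(value)
        reference_out = reference_step(value)
        if model_out != reference_out:
            return step, value, model_out, reference_out
    return None


def reference_step(value):
    """Toy reference system: doubles the input and saturates at 255."""
    return min(value * 2, 255)


def model_step(value):
    """Toy model with a deliberate bug: wraps around instead of saturating."""
    return (value * 2) & 0xFF


mismatch = run_in_lockstep(model_step, reference_step, range(0, 200, 25))
print("first mismatch (step, input, model, reference):", mismatch)
```

The comparison can only be as trustworthy as the reference: a mismatch points at either side, so it still has to be traced by hand.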
Testing Application • We need an application for debugging and massive experimentation (e.g. the sine-wave application in the TTP/C model): • As simple as possible, to track down problems without complications. • Using as many features of the model as possible, to gain maximum coverage. • We often also need a real-world application that covers a particular problem and verifies the behavior of the system in certain circumstances (e.g. Brake-by-wire in the TTP/C model).
FT Verification Process • Hypothesis – fault assumption, input, expected results. • Fault model – type of fault, target, etc. • Testing application – modeled system doing something sensible. Choosing an appropriate application is not simple; often several testing applications are needed. • Execution of the testing application with FI confirms or denies the hypothesis. A measure of experiment quality is FI coverage – the ratio of injected faults to all possible faults.
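FI coverage as defined on this slide can be computed once the fault model spells out the space of possible faults. In the sketch below that space is the product of targets, fault types, and injection times; all three lists, the sample size, and the seed are made-up values for illustration.

```python
import random
from itertools import product

# Hypothetical fault model: every combination is one possible fault.
targets = ["node_a", "node_b", "node_c"]
fault_types = ["bit_flip", "stuck_at_0", "stuck_at_1"]
time_slots = range(100)

fault_space = list(product(targets, fault_types, time_slots))

# Inject a random sample of distinct faults; the seed makes the choice repeatable.
rng = random.Random(42)
injected = set(rng.sample(fault_space, 180))

# FI coverage: ratio of injected faults to all possible faults.
coverage = len(injected) / len(fault_space)
print(f"{len(injected)} of {len(fault_space)} faults injected, coverage = {coverage:.2f}")
```

For realistic models the fault space is usually far too large to enumerate, so in practice the denominator is often estimated instead of listed.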
Advantages of Simulation • Discrete time allows the flow of time to be sliced arbitrarily – no intrusion effect. • Possibility to make changes to the model to see what happens if… (e.g. to test a proposed modification or bug correction). • Determinism – the ability to repeat the same experiment with the same results (e.g. we can run rapid black-box experiments with minimum logging first, then enable full logging and repeat only the most interesting experiments). • Access to every part of the model – used for monitoring and FI.
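The two-pass workflow mentioned under determinism might look like the sketch below: a fast pass without logging selects the interesting seeds, and only those are repeated with full logging. The toy experiment, its injection probability, and the failure threshold are assumptions chosen for illustration.

```python
import random


def run_experiment(seed, log=None):
    """Toy FI experiment: returns True if the toy system 'failed'."""
    rng = random.Random(seed)          # all randomness derives from the seed
    error_count = 0
    for step in range(50):
        fault = rng.random() < 0.1     # hypothetical injection probability
        if fault:
            error_count += 1
        if log is not None:            # logging never touches the random stream
            log.append((step, fault, error_count))
    return error_count >= 8            # hypothetical failure threshold


# Pass 1: rapid black-box runs with no logging.
interesting = [seed for seed in range(20) if run_experiment(seed)]

# Pass 2: repeat only the interesting experiments with full logging.
detailed_logs = {}
for seed in interesting:
    log = []
    run_experiment(seed, log)
    detailed_logs[seed] = log

print("seeds worth a closer look:", interesting)
```

The approach relies on logging having no influence on the simulated behavior, which is the non-intrusiveness the slide claims for simulation.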
Results from FIT • A model was made that executes approximately in real-time. • Several minor flaws in the specification were found using the model. • One bug in the HW implementation was found during validation of the model. • A severe case of fault propagation was found. A solution to the problem was proposed and then verified using a modified model.
Personal Insights • Logging is extremely useful. It is often desirable to log only certain events or to start logging after a trigger. • A visual interface (GUI) is also extremely useful for monitoring simulation progress. It is also excellent for demonstrations. • Temporal faults in discrete-time simulation require special handling. All hold operations must check the current time after they return. • The floating-point representation of time in C-Sim is not ideal (a fixed-point version with greater range may be better).
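One general reason floating-point time can be awkward in a discrete-time simulation is accumulated rounding error; the slide does not say which specific problem was encountered in C-Sim, so the sketch below only illustrates the general effect. The step size and tick resolution are assumed values.

```python
STEP_SECONDS = 0.1       # assumed simulation step as a floating-point value
TICKS_PER_SECOND = 10    # assumed integer resolution: one tick = 0.1 s

# Floating-point clock: advance by 0.1 s ten times.
float_time = 0.0
for _ in range(10):
    float_time += STEP_SECONDS

# Integer (fixed-point style) clock: advance by one tick ten times.
tick_time = 0
for _ in range(10):
    tick_time += 1

print("float clock reached exactly 1 s:", float_time == 1.0)
print("tick clock reached exactly 1 s: ", tick_time == TICKS_PER_SECOND)
```

With integer ticks, comparisons such as "has the deadline been reached" stay exact, at the cost of fixing the time resolution up front.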