Error Detection in Software
David Rigler
Overview
• Types of error detection
• Fault/error classification
• Description of selected SW error detection techniques
• Evaluation (coverage / overhead)
• Conclusion
Runtime Failure Detection (in Software)
• Software diversity / N-version programming
• Defensive programming
• Assertions
• Bound/range checking
• Control flow checking
  • Block entry/exit checking
  • Error-capturing instructions
• Advanced techniques …
• Redundant data/code
(Targeted failures: SW failures / HW failures)
Transient Hardware Error Classification
• Data errors
• Code errors
  • Type S1 statements: affect data only
  • Type S2 statements: affect the execution flow
  • Type E1 errors: change the operation (not the control flow)
  • Type E2 errors: change the statement type (S1 ↔ S2)
Data Errors (Executable Assertions)
• Generic
  • Bound
  • Integrity
  • For SW and HW errors
• Non-generic
  • Value range
  • Approximate (may raise false alarms)
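A minimal C sketch of the two assertion families above, assuming a hypothetical `report_error` handler that only counts and logs failures. `ASSERT_BOUND` is a generic bound check on an array index; `ASSERT_RANGE` is a non-generic value-range check whose limits (here an assumed sensor range of -40 to 125) come from application knowledge. Integrity assertions are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error handler: counts and logs failed assertions. */
static unsigned detected_errors = 0;

static void report_error(const char *check, const char *where)
{
    detected_errors++;
    fprintf(stderr, "assertion '%s' failed in %s\n", check, where);
}

/* Generic bound assertion: an index must lie inside its array.
   A negative index becomes a huge size_t and is rejected too. */
#define ASSERT_BOUND(idx, len) \
    ((size_t)(idx) < (size_t)(len) ? true : (report_error("bound", __func__), false))

/* Non-generic value-range assertion: limits come from the application. */
#define ASSERT_RANGE(v, lo, hi) \
    ((v) >= (lo) && (v) <= (hi) ? true : (report_error("range", __func__), false))

/* Example: read one sample, assuming a valid range of -40..125. */
static bool read_sample(const int *buf, size_t len, size_t idx, int *out)
{
    if (!ASSERT_BOUND(idx, len))
        return false;
    int t = buf[idx];
    if (!ASSERT_RANGE(t, -40, 125))
        return false;
    *out = t;
    return true;
}
```

Because `ASSERT_RANGE` evaluates its first argument twice, it should be given plain variables rather than expressions with side effects.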
Data Errors (Systematic Data Redundancy)
• Rules
  • Duplicate every variable: x -> (x1, x2)
  • Perform every write operation on both x1 and x2
  • On every read of x, check x1 and x2 for consistency
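The three rules can be sketched with C macros. This is an illustration under assumptions, not the transformation used in the cited work: the macro names `DUP_DECL`, `DUP_WRITE`, `DUP_CHECK` and the handler `data_error` are hypothetical, and `volatile` is added on the assumption that it discourages the compiler from merging the two copies.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical error handler: counts and logs inconsistent copies. */
static unsigned dup_errors = 0;

static void data_error(const char *name)
{
    dup_errors++;
    fprintf(stderr, "copies of '%s' disagree\n", name);
}

/* Rule 1: every variable x becomes two copies x1 and x2. */
#define DUP_DECL(type, x)  volatile type x##1, x##2
/* Rule 2: every write operation updates both copies. */
#define DUP_WRITE(x, val)  do { x##1 = (val); x##2 = (val); } while (0)
/* Rule 3: every read of x first checks the copies for consistency. */
#define DUP_CHECK(x)       ((x##1 == x##2) ? true : (data_error(#x), false))

/* Original code:  sum = a + b;  after the transformation: */
static DUP_DECL(int, a);
static DUP_DECL(int, b);
static DUP_DECL(int, sum);

static void set_inputs(int av, int bv)
{
    DUP_WRITE(a, av);
    DUP_WRITE(b, bv);
}

static bool checked_add(int *result)
{
    if (!DUP_CHECK(a) || !DUP_CHECK(b))   /* check on read */
        return false;
    sum1 = a1 + b1;                       /* each copy is computed */
    sum2 = a2 + b2;                       /* from its own operands */
    if (!DUP_CHECK(sum))
        return false;
    *result = sum1;
    return true;
}
```

Each copy of `sum` is computed from its own copies of `a` and `b`, so a corrupted operand propagates into only one copy of the result and is caught at the next consistency check.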
Data Errors (Systematic Data Redundancy)
• Generic approach
  • Apply a pre-processor to the high-level-language source
  • Compiler optimisations may be a problem (e.g. the compiler may merge the duplicated copies)
• All (visible) single bit-flip errors in data memory can be detected
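The claim that every visible single bit flip in data memory is caught can be checked by exhaustive fault injection on one duplicated variable. The sketch below assumes a hypothetical `dup_u32` type and flips each of the 64 stored bits (two 32-bit copies) once; it models only flips that land in the stored copies, not flips in registers between the read and the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* A duplicated variable as a pre-processor might emit it (assumed layout).
   volatile is one way to stop the compiler from merging the copies. */
typedef struct { volatile uint32_t c1, c2; } dup_u32;

static void dup_write(dup_u32 *x, uint32_t v) { x->c1 = v; x->c2 = v; }
static bool dup_consistent(const dup_u32 *x) { return x->c1 == x->c2; }

/* Flip every bit of every copy once and count detected flips. */
static unsigned count_detected_flips(uint32_t value)
{
    unsigned detected = 0;
    for (int copy = 0; copy < 2; copy++) {
        for (int bit = 0; bit < 32; bit++) {
            dup_u32 x;
            dup_write(&x, value);
            uint32_t mask = UINT32_C(1) << bit;
            if (copy == 0)
                x.c1 ^= mask;          /* fault in the first copy */
            else
                x.c2 ^= mask;          /* fault in the second copy */
            if (!dup_consistent(&x))
                detected++;
        }
    }
    return detected;                   /* 64 means every flip was caught */
}
```

Any single flip makes the two copies differ in exactly one bit, so the read-time comparison always fires, whatever the stored value.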
Control Flow Errors: Block Entry/Exit Checking (BEEC)
• Unique signatures for basic blocks
  • Assign at entry
  • Compare at exit
• Problems
  • Jumps within a block go undetected
  • Granularity
  • Jumps to unused areas
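A minimal C sketch of block entry/exit checking, assuming a single global signature variable `run_sig` and a hypothetical handler `control_flow_error`; the signature values are arbitrary. `simulate_illegal_jump` imitates an erroneous jump into the middle of another block, which skips that block's entry assignment and is caught at its exit.

```c
#include <stdint.h>
#include <stdio.h>

/* Run-time signature shared by all blocks (assumed name). */
static volatile uint16_t run_sig;
static unsigned cf_errors = 0;

/* Hypothetical error handler: log and count a control-flow error. */
static void control_flow_error(uint16_t expected, uint16_t found)
{
    cf_errors++;
    fprintf(stderr, "CFE: expected sig 0x%04X, found 0x%04X\n",
            (unsigned)expected, (unsigned)found);
}

/* Assign the block's signature at entry, compare it at exit. */
#define BB_ENTRY(sig) (run_sig = (sig))
#define BB_EXIT(sig)                               \
    do {                                           \
        if (run_sig != (uint16_t)(sig))            \
            control_flow_error((sig), run_sig);    \
    } while (0)

/* Unique per-block signatures; the values are arbitrary assumptions. */
enum { SIG_INIT = 0x1A3C, SIG_LOOP = 0x2B4D };

static int sum_to(int n)
{
    BB_ENTRY(SIG_INIT);              /* basic block 1 */
    int s = 0, i = 1;
    BB_EXIT(SIG_INIT);

    while (i <= n) {
        BB_ENTRY(SIG_LOOP);          /* basic block 2: loop body */
        s += i;
        i++;
        BB_EXIT(SIG_LOOP);
    }
    return s;
}

/* Simulated fault: after block 1 starts, an erroneous jump lands in the
   middle of block 2, skipping block 2's entry assignment. */
static void simulate_illegal_jump(void)
{
    BB_ENTRY(SIG_INIT);
    /* ...erroneous jump here... */
    BB_EXIT(SIG_LOOP);               /* signature mismatch is detected */
}
```

A jump that stays inside one block leaves `run_sig` unchanged and is not detected, which matches the first problem listed on the slide.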
Control Flow Errors
• Duplicate condition checks
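Duplicate condition checks re-evaluate, at the start of each branch target, the condition that should have led there. In the C sketch below, the `fault` parameter is a hypothetical test hook that inverts the branch decision to mimic a control-flow error, and `branch_error` is an assumed handler.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned branch_errors = 0;

/* Hypothetical error handler for a wrongly taken branch. */
static void branch_error(const char *where)
{
    branch_errors++;
    fprintf(stderr, "control-flow error detected in %s\n", where);
}

/* Returns max(x, 0). 'fault' only simulates a corrupted branch decision
   (e.g. a flipped condition flag); it is not part of the technique. */
static int clamp_nonneg(int x, bool fault)
{
    volatile bool take_then = (x < 0);   /* volatile: keep the flag in memory */
    if (fault)
        take_then = !take_then;

    if (take_then) {
        if (!(x < 0))                    /* duplicate check in 'then' path */
            branch_error(__func__);
        return 0;
    } else {
        if (x < 0)                       /* duplicate check in 'else' path */
            branch_error(__func__);
        return x;
    }
}
```

The re-check costs one extra comparison per branch; it catches a wrongly taken branch but not a jump that skips the re-check itself.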
Control Flow Errors
• Error-capturing instructions
  • Special or unused instructions (trap, SWI, …)
  • Spread over unused memory
    • Program memory
    • Data memory
  • Call an error-handling function
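Error-capturing instructions depend on the target's instruction set, so portable C cannot place real trap or SWI opcodes in memory. The sketch below instead models the idea on a toy interpreter with a made-up instruction set: unused program memory is filled with `OP_TRAP`, and a bit flip in a jump target that sends execution into that area executes the trap, which calls the hypothetical `error_handler`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy instruction set for illustration only (not a real ISA). */
enum op { OP_TRAP = 0, OP_NOP, OP_INC, OP_JMP, OP_HALT };

#define MEM_SIZE 16

typedef struct {
    unsigned char mem[MEM_SIZE];   /* program memory */
    size_t pc;                     /* program counter */
    int acc;                       /* accumulator */
    bool trapped;                  /* set by the error-handling function */
} vm;

/* Load the program into low memory; fill all unused cells with TRAP. */
static void vm_load(vm *m, const unsigned char *prog, size_t len)
{
    if (len > MEM_SIZE)
        len = MEM_SIZE;
    memset(m->mem, OP_TRAP, sizeof m->mem);
    memcpy(m->mem, prog, len);
    m->pc = 0;
    m->acc = 0;
    m->trapped = false;
}

/* Hypothetical error-handling function reached via a TRAP instruction. */
static void error_handler(vm *m) { m->trapped = true; }

/* Execute until HALT or TRAP, at most max_steps instructions. */
static void vm_run(vm *m, int max_steps)
{
    while (max_steps-- > 0) {
        unsigned char op = m->mem[m->pc % MEM_SIZE];
        switch (op) {
        case OP_NOP:  m->pc++; break;
        case OP_INC:  m->acc++; m->pc++; break;
        case OP_JMP:  m->pc = m->mem[(m->pc + 1) % MEM_SIZE]; break; /* next cell = target */
        case OP_HALT: return;
        default:      error_handler(m); return;  /* OP_TRAP or unknown opcode */
        }
    }
}
```

For example, the program `{OP_INC, OP_INC, OP_JMP, 5, OP_NOP, OP_HALT}` halts normally, but flipping bit 3 of the jump operand changes the target from 5 to 13, an unused cell, so the trap fires.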
Control Flow Errors
• Watchdog timer
  • The software resets the timer periodically
  • Action is taken when the timer reaches a specific value
  • Needs hardware support
  • Common in embedded controllers
  • Detects infinite-loop errors
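Since the watchdog itself is hardware, the C sketch below only models its behaviour in software; the `watchdog` struct and its functions are hypothetical stand-ins for device-specific registers. The simulated main loop kicks the watchdog on every tick until an injected infinite loop stops the kicks and the counter runs out.

```c
#include <stdbool.h>

/* Software model of a hardware watchdog (no real device registers). */
typedef struct {
    unsigned counter;   /* counts down once per timer tick */
    unsigned reload;    /* value restored by each kick */
    bool expired;       /* set when the counter reaches zero */
} watchdog;

static void wdt_init(watchdog *w, unsigned reload)
{
    w->reload = reload;
    w->counter = reload;
    w->expired = false;
}

/* Called by healthy application code: periodically reset the timer. */
static void wdt_kick(watchdog *w) { w->counter = w->reload; }

/* Called on each timer tick; takes action when the counter hits zero. */
static void wdt_tick(watchdog *w)
{
    if (w->expired)
        return;
    if (w->counter > 0)
        w->counter--;
    if (w->counter == 0)
        w->expired = true;   /* real hardware would typically reset the CPU */
}

/* Simulate a main loop that kicks once per tick. If hang_at >= 0, the
   loop "hangs" from that tick on and stops kicking. Returns true if
   the watchdog fired. */
static bool simulate(unsigned reload, int ticks, int hang_at)
{
    watchdog w;
    wdt_init(&w, reload);
    bool hung = false;
    for (int t = 0; t < ticks && !w.expired; t++) {
        if (t == hang_at)
            hung = true;     /* injected infinite loop: no more kicks */
        if (!hung)
            wdt_kick(&w);
        wdt_tick(&w);
    }
    return w.expired;
}
```

The reload value sets the trade-off: a short timeout detects a hang quickly but requires the software to kick more often.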
Coverage Example 1
• BEEC, duplicate condition checks, systematic data redundancy
• Simulated bit-flip errors in memory
• ~5x performance slowdown
• ~2x size
• No silent violations (data)
• High coverage even for errors in the code area
Coverage Example 2
• Physical fault injection
  • Heavy-ion radiation
  • Power-supply disturbances
• Hardware WDT
• Effect of additional SW: 60% → 85%
Improving Coverage
• Separate basic blocks for redundant variables
  • Separated in memory
  • No single bit-flip jumps between them
• Use cumulative signatures
  • Detect jumps within a block
• Avoid signature aliasing
  • Hamming distance between signatures
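Two of the improvements above can be sketched in C under assumed signature values. `min_distance` checks that a signature table has a minimum pairwise Hamming distance of at least 2, so a single bit flip cannot turn one valid signature into another; `run_block` folds a per-statement constant into the running signature, so a jump into the middle of a block produces a wrong exit value. All names and constants are illustrative, and `first_step` is only a test hook that models such a jump.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of differing bits between two signatures. */
static unsigned hamming(uint16_t a, uint16_t b)
{
    uint16_t d = a ^ b;
    unsigned n = 0;
    while (d) {
        n += d & 1u;
        d >>= 1;
    }
    return n;
}

/* Smallest pairwise Hamming distance in a signature table. A result
   >= 2 means no single bit flip maps one valid signature onto another. */
static unsigned min_distance(const uint16_t *sig, size_t n)
{
    unsigned best = 16;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            unsigned h = hamming(sig[i], sig[j]);
            if (h < best)
                best = h;
        }
    return best;
}

/* Cumulative signature: each statement of the block XORs its own
   constant into the running signature (constants are assumptions). */
static const uint16_t step_const[3] = {0x0F0F, 0x3333, 0x5555};

/* first_step > 0 models a jump into the middle of the block. */
static uint16_t run_block(uint16_t entry_sig, size_t first_step)
{
    uint16_t sig = entry_sig;
    for (size_t s = first_step; s < 3; s++)
        sig ^= step_const[s];   /* statement s updates the signature */
    return sig;                 /* compared with the expected value at exit */
}
```

With XOR updates, skipping statements goes unnoticed only if the XOR of the skipped constants is zero, so the constants should be chosen to avoid that for the jumps of interest.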
100% Coverage
• For a simple failure model
  • Single bit flip
  • Data and code memory/registers
  • Hidden registers not included (branch buffer, cache tags, etc.)
• High overhead
  • ~4x memory usage
  • >3x execution time
Conclusion: Error Detection in SW
• Pure SW: high coverage only for simple failure models
• Supplement to HW error detection
• Trade-off: overhead vs. coverage
  • Fine tuning possible
  • Use available resources (time, memory)
• G. Miremadi, J. Karlsson, U. Gunneflo, and J. Torin, "Two Software Techniques for On-Line Error Detection," Proc. 22nd International Symposium on Fault-Tolerant Computing (FTCS-22), July 1992, pp. 328-335.
• G. Miremadi and J. Torin, "Evaluating Processor-Behavior and Three Error-Detection Mechanisms Using Physical Fault-Injection," IEEE Trans. on Reliability, Vol. 44, No. 3, Sept. 1995, pp. 441-453.
• C. Rabejac, J.-P. Blanquart, and J.-P. Queille, "Executable Assertions and Timed Traces for On-Line Software Error Detection," Proc. 26th International Symposium on Fault-Tolerant Computing (FTCS-26), 1996.
• Z. Alkhalifa, V. S. S. Nair, N. Krishnamurthy, and J. A. Abraham, "Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection," IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 6, June 1999, pp. 627-641.
• M. Fazeli, R. Farivar, and S. G. Miremadi, "A Software-Based Concurrent Error Detection Technique for PowerPC Processor-Based Embedded Systems," Proc. 20th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), Monterey, California, 2005.
• B. Nicolescu, Y. Savaria, and R. Velazco, "Software Detection Mechanisms Providing Full Coverage Against Single Bit-Flip Faults."
• M. Rebaudengo, M. Sonza Reorda, M. Torchiano, and M. Violante, "Soft-Error Detection through Software Fault-Tolerance Techniques."