Explore error detection techniques like watchdog processor, control flow error detection, fault injection, and types of signatures in fault-tolerant systems deployed in space environments. Learn the importance of fault avoidance and fault tolerance in space missions.
Fault Tolerant Systems in a Space Environment EE585: Fault Tolerance Computing
Overview • Introduction • Error Detection Techniques: * Watchdog Processor * Control Flow Error Detection * Types of Signatures • Fault Injection • Conclusion
Introduction • Experiments were conducted by the Stanford Center for Reliable Computing (CRC) on the Advanced Research and Global Observation Satellite (ARGOS). • The work focuses on space missions whose equipment combines the two basic approaches of fault avoidance and fault tolerance. • It mainly uses software techniques for detecting errors.
Error Detection Techniques • Watchdog Processor: a small processor that sits on the buses, passively observes the bus transactions generated by the main processor, and detects errors by monitoring them.
Watchdog Processor
Control Flow Error Detection • The main goal is to check that instructions execute in the correct sequence. • This is done by signature analysis: a signature is associated with each block of instructions and saved at compile time. At run time, the generated signature is compared with the saved one, and any mismatch is reported as an error.
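The compile-time/run-time comparison described above can be sketched as follows. This is a minimal illustration, not the scheme used on ARGOS: the XOR fold, the block names, and the instruction words are all assumptions made for the example (practical schemes typically use a CRC or LFSR-based signature).

```python
from functools import reduce


def block_signature(instruction_words):
    """Fold the instruction words of one basic block into a signature.

    XOR is an assumed stand-in for the real signature function.
    """
    return reduce(lambda sig, word: sig ^ word, instruction_words, 0)


# Hypothetical program: block name -> list of 32-bit instruction words.
PROGRAM = {
    "B1": [0x8C010004, 0x00221820],
    "B2": [0x10600002, 0xAC030008],
}

# "Compile time": compute and store one reference signature per block.
REFERENCE = {name: block_signature(words) for name, words in PROGRAM.items()}


def check_block(name, executed_words):
    """Return True if the run-time signature matches the stored one."""
    return block_signature(executed_words) == REFERENCE[name]
```

Calling `check_block("B1", PROGRAM["B1"])` succeeds, while a block with a corrupted or skipped instruction word produces a different signature and fails the check.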
Types of Signatures • 1. Path Signature Analysis: * Signatures are computed for sequences of nodes, i.e., paths, rather than for single nodes. * Two bits are used to differentiate signatures. * A special tag signals when to compare the computed signature with the embedded one. • 2. Signatured Instruction Streams (SIS)
Contd… • Paths are grouped into sets, and each set has a signature called the justifying signature. • Control flow diagram of three basic blocks
2. Signatured Instruction Streams (SIS)
Contd… • To reduce the number of signatures embedded in the code, branch address hashing is used.
Branch Address Hashing
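A rough sketch of the idea behind branch address hashing, under the assumption that the hash is a plain XOR of the branch target with the expected block signature (the slide does not specify the function). The stored branch field then doubles as the embedded signature, so no separate signature word is needed for that block. The function names and values are hypothetical.

```python
def hash_branch_address(target, expected_signature):
    """Compile time: store target XOR signature instead of the raw target."""
    return target ^ expected_signature


def recover_branch_address(hashed_target, runtime_signature):
    """Run time: XOR with the computed signature to get the branch target.

    If the run-time signature is wrong, the recovered address is wrong too,
    so the control flow error shows up as a branch to an unintended address.
    """
    return hashed_target ^ runtime_signature
```

With a target of `0x4000` and an expected signature of `0x1234`, recovery returns `0x4000` only when the run-time signature is also `0x1234`.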
Stutter Step Mode (SSM) • Each group of instructions is executed two or more times and the results are compared. This detects errors missed by other techniques. • Disadvantages: * Lower performance. * Memory overhead.
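The execute-and-compare idea can be sketched in software as below. This is only an illustration of the principle; `stutter_step` and `MismatchError` are hypothetical names, and a real SSM implementation duplicates machine instructions rather than Python calls.

```python
class MismatchError(Exception):
    """Raised when repeated executions of the same operation disagree."""


def stutter_step(operation, *operands, repeats=2):
    """Run `operation` `repeats` times and compare the results."""
    results = [operation(*operands) for _ in range(repeats)]
    if any(result != results[0] for result in results[1:]):
        raise MismatchError(f"results differ: {results}")
    return results[0]
```

For example, `stutter_step(lambda b, c: b + c, 10, 7)` returns the sum when both executions agree and raises `MismatchError` when a transient fault changes one of them.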
Application of SSM to one instruction • Overhead is 300%
Contd… • Overhead is reduced by extending duplication to a basic block.
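The effect of block size on overhead can be estimated with a small calculation. It assumes that the 300% figure for a single instruction comes from one duplicate instruction plus a compare and a branch (three extra instructions), and that block-level duplication needs only one compare-and-branch pair per block; both are assumptions, since the slides do not give the breakdown.

```python
def ssm_overhead_percent(block_size, check_instructions=2):
    """Extra instructions as a percentage of the original block size.

    Assumes one duplicate copy of the block plus `check_instructions`
    (compare + branch) executed once per block.
    """
    extra = block_size + check_instructions
    return 100.0 * extra / block_size
```

Under these assumptions a one-instruction block costs 300%, while a ten-instruction block costs 120%, because the compare and branch are amortized over the whole block.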
Error Masking in SSM
Contd… • Assume register values B = 10 and C = 7, so A = 17 and D = 3 (dividing any number from 15 to 19 by 5 gives an integer result of 3). • If A = 18 (instead of 17), D is still 3, so the error is not detected. • Therefore, the error detection technique must be selected carefully.
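The masking effect can be reproduced with the values above. The operations A = B + C and D = A / 5 (integer division) are inferred from the numbers on the slide, and the `fault_in_a` parameter is a hypothetical way to model a corrupted A.

```python
def compute_block(b, c, fault_in_a=0):
    """Compute A = B + C and D = A // 5, optionally corrupting A."""
    a = b + c + fault_in_a  # fault_in_a models an error in register A
    d = a // 5              # integer division hides small errors in A
    return a, d


good_a, good_d = compute_block(10, 7)                # fault-free run
bad_a, bad_d = compute_block(10, 7, fault_in_a=1)    # A is off by one
```

Here `good_a` and `bad_a` differ, but `good_d` and `bad_d` are equal, so a check that compares only D at the end of the block misses the error in A.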
Fault Injection • One way to validate fault tolerance mechanisms. • Advantages: 1. Flexibility 2. Controllability 3. Predictability • Disadvantages: 1. It is questionable whether the injected faults are a good representation of faults in a real environment.
Contd… • In ARGOS, the system is tested in the space environment. • Different approaches to fault injection in electronic systems: 1. Disturbing the signals on the pins. 2. Radiation. 3. Power supply disturbance. 4. Logic simulation.
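As a simple illustration of injecting a fault, the sketch below flips one random bit in a simulated memory array, roughly emulating the kind of single-bit upset that radiation can cause. It is a software stand-in chosen for the example, not one of the four approaches listed above, and the function names and 32-bit word width are assumptions.

```python
import random


def flip_bit(word, bit):
    """Return `word` with the given bit position inverted."""
    return word ^ (1 << bit)


def inject_random_fault(memory, rng, width=32):
    """Flip one randomly chosen bit in a randomly chosen memory word.

    `memory` is a list of integers modified in place; the (address, bit)
    pair is returned so the experiment can be logged and repeated.
    """
    address = rng.randrange(len(memory))
    bit = rng.randrange(width)
    memory[address] = flip_bit(memory[address], bit)
    return address, bit
```

Passing a seeded generator such as `random.Random(0)` makes each injection reproducible, which supports the controllability and predictability advantages mentioned for fault injection.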
Conclusion • The tradeoffs between fault tolerance and fault avoidance techniques were determined, leading to an efficient blend of techniques suitable for space missions. • Both hardware and software fault tolerance techniques were studied.
References • Philip P. Shirvani and Edward J. McCluskey, "Fault Tolerant Systems in a Space Environment," Stanford University. http://www-crc.stanford.edu/crc_papers/CRC-TR-98-2.pdf
Queries?