ELEC 7770 Advanced VLSI Design, Spring 2007
Soft Errors and Fault-Tolerant Design
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
ELEC 7770: Advanced VLSI Design (Agrawal)
Soft Errors
• Soft errors are errors caused by the operating environment.
• They are not due to a permanent hardware fault.
• Soft errors are intermittent or random, which makes testing for them unreliable.
• One way to deal with soft errors is to make hardware robust:
  • Capable of detecting soft errors
  • Capable of correcting soft errors
• Both measures are probabilistic.
Some Early References
• J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.
• M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.
• T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.
Causes of Soft Errors
• Interconnect coupling (crosstalk).
• Power supply noise: IR-drop, delta-I.
• Effects generally attributed to alpha-particles:
  • Charged particles: electrons, protons, ions.
  • Radiation (photons): X-rays, gamma-rays, ultra-violet light.
Sources of Alpha-Particles
• Radioactive contamination in VLSI packaging material.
• Ionosphere, magnetosphere and solar radiation.
• Other electromagnetic radiation.
Alpha-Particle
• Helium nucleus: two protons and two neutrons, mass = $6.65 \times 10^{-27}$ kg, charge = $+2e$ ($e = 1.6 \times 10^{-19}$ C).
• Rest-mass energy ($mc^2$) = 3.73 GeV
Soft Error Rate (SER)
• Failures in time (FIT): one FIT is 1 error per billion hours of operation.
• An alternative unit is mean time between failures (MTBF).
• 1 year MTBF = $10^9/(365 \times 24)$ = 114,155 FIT
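The FIT and MTBF units above are reciprocals of each other up to a scale factor. A minimal sketch of the conversion follows; the function names are illustrative and not part of the course material, and a year is taken as 365 × 24 hours as on the slide.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored, as on the slide


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """Convert an MTBF in years to a rate in FIT (errors per 1e9 hours)."""
    return 1e9 / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """Convert a rate in FIT back to an MTBF in years."""
    return 1e9 / (fit * HOURS_PER_YEAR)


print(round(mtbf_years_to_fit(1.0)))  # 114155
```

A longer MTBF gives a proportionally smaller FIT value: a 10-year MTBF corresponds to one tenth of the 1-year figure.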
Particle Strike
[Figure: an ion or charged particle strikes an n region in a p-substrate, leaving positive (+) and negative (-) charges along its track.]
Induced Current
[Figure: plot of induced current versus time.]
$I(t) = I_0 \left( e^{-t/a} - e^{-t/b} \right), \quad a \gg b$
Voltage Induced at a Node
$V = Q/C$, where $Q = \int I(t)\,dt$ and $C$ is the node capacitance.
A smaller node capacitance results in a larger voltage swing.
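For the double-exponential pulse $I(t) = I_0 (e^{-t/a} - e^{-t/b})$, integrating from 0 to infinity gives $Q = I_0 (a - b)$. The sketch below checks that closed form numerically and evaluates $V = Q/C$ for two capacitances. All numeric values (100 µA, 200 ps, 50 ps, 10 fF, 50 fF) are illustrative assumptions, not data from the slides, and the model ignores any restoring current from the gate driving the node.

```python
import math


def induced_current(t, i0, a, b):
    """Double-exponential current pulse I(t) = I0 * (exp(-t/a) - exp(-t/b))."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def collected_charge(i0, a, b):
    """Closed-form integral of I(t) from 0 to infinity: Q = I0 * (a - b)."""
    return i0 * (a - b)


def numeric_charge(i0, a, b, steps=100_000):
    """Trapezoidal integration of I(t) over [0, 40a], where the tail is negligible."""
    dt = 40 * a / steps
    total = 0.0
    for k in range(steps):
        left = induced_current(k * dt, i0, a, b)
        right = induced_current((k + 1) * dt, i0, a, b)
        total += 0.5 * (left + right) * dt
    return total


def voltage_swing(q, c):
    """Voltage change on a node of capacitance c receiving charge q."""
    return q / c


# Illustrative (assumed) values: 100 uA peak scale, a = 200 ps, b = 50 ps.
q = collected_charge(100e-6, 200e-12, 50e-12)
for c in (10e-15, 50e-15):
    print(f"C = {c * 1e15:.0f} fF -> V = {voltage_swing(q, c):.2f} V")
```

With these assumed numbers the same charge produces five times the swing on the 10 fF node as on the 50 fF node, which is the point of the slide.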
Effect on Digital Circuit
[Figure: combinational logic between IN and OUT with a clock input CK; charged particles are shown striking the circuit at two places.]
An SRAM Cell
[Figure: SRAM cell schematic with word line WL, supply VDD, the two bit lines BL, and the two internal bit nodes storing 1 and 0.]
SRAM Cell Struck by Alpha-Particle: Single-Event Upset (SEU)
[Figure: the same SRAM cell struck by charged particles; the stored values flip, 1→0 on one bit node and 0→1 on the other.]
D-Latch
[Figure: D-latch with input D, output Q and CK = 0; the internal nodes hold 1 and 0.]
SEU in D-Latch
[Figure: the same D-latch with CK = 0 struck by charged particles; the internal nodes flip, 1→0 and 0→1.]
Single Event Transients in Combinational Logic
[Figure: combinational logic with signal values 0 and 1 annotated, clocked by CK; charged particles strike an internal node.]
Effects of Transients
• Error correcting effects
  • Transient pulse is filtered by gate inertia
  • Transient is blocked by an unsensitized path
  • Transient is blocked by an inactive clock
• Error enhancing effects
  • Large number of gates can produce multiple pulses
  • Fanouts can multiply error pulses
SEUs in FPGA
• Parts that can be affected
  • Look-up table (LUT)
  • Configuration memory cell
  • Flip-flop
  • Block RAM
LUT
[Figure: look-up table with inputs F1, F2, F3, F4 and output out; memory cell contents as extracted: 1 1 1 0 0 1 0 0 0 0 1 1 1 0 0 1.]
SEU in LUT
[Figure: the same look-up table with inputs F1, F2, F3, F4 and output out; a charged particle strikes one memory cell and its stored 1 is changed to 0.]
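A LUT upset changes the logic function itself rather than a data value. The sketch below models a 4-input LUT as a 16-entry list and flips one cell. The cell contents follow the figure as extracted, but the address bit ordering and the index of the upset cell (12) are assumptions made for illustration.

```python
def lut_read(cells, f1, f2, f3, f4):
    """Read a 4-input LUT: the inputs form an address into 16 memory cells.

    Treating F1 as the most significant address bit is an assumption.
    """
    address = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return cells[address]


def inject_seu(cells, index):
    """Return a copy of the LUT contents with one memory cell flipped."""
    upset = list(cells)
    upset[index] ^= 1
    return upset


golden = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
faulty = inject_seu(golden, 12)  # assumed position of the upset cell

# Only the input combination that addresses the upset cell sees a wrong output.
mismatches = [addr for addr in range(16) if golden[addr] != faulty[addr]]
print(mismatches)  # [12]
print(lut_read(golden, 1, 1, 0, 0), lut_read(faulty, 1, 1, 0, 0))  # 1 0
```

In this model the wrong value persists until the cell is rewritten, which is consistent with reconfiguration appearing among the FPGA-specific mitigation methods.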
Four Types of SEU in FPGA
[Figure: FPGA logic block with a LUT (inputs F1, F2, F3, F4), a flip-flop (FF), block RAM and configuration memory cells (M); the four upset locations are labeled Type 1 through Type 4.]
SEU Detection Methods
• Hardware redundancy
• Time redundancy
• Error detection codes (EDC)
• Self-checker techniques
SEU Mitigation Techniques
• Triple modular redundancy (TMR)
• Multiple redundancy with voting
• Error detection and correction codes (EDAC)
• Hardened memory cells
• FPGA-specific methods
  • Reconfiguration
  • Partial configuration
  • Rerouting design
Hardware Redundancy for Detection
[Figure: the inputs drive the combinational logic and a duplicated copy of it; the two outputs are compared, and logic 1 at the comparison output indicates an error.]
• Hardware overhead is high, about 100%.
• Performance penalty is negligible.
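Duplication with comparison can be sketched in a few lines. The function `logic` is a hypothetical stand-in for the protected combinational block, and the `fault_in_copy` argument is an illustrative way to inject a transient into one copy.

```python
def logic(a, b, c):
    """Hypothetical combinational function being protected."""
    return (a & b) | c


def duplicated(inputs, fault_in_copy=None):
    """Evaluate two copies of the logic and compare them.

    fault_in_copy: None for fault-free operation, or 1 or 2 to invert
    the output of that copy (models a transient in one copy).
    Returns (output, error), where error == 1 indicates a mismatch.
    """
    out1 = logic(*inputs)
    out2 = logic(*inputs)
    if fault_in_copy == 1:
        out1 ^= 1
    elif fault_in_copy == 2:
        out2 ^= 1
    error = out1 ^ out2  # XOR comparator: logic 1 indicates error
    return out1, error


print(duplicated((1, 1, 0)))                   # (1, 0)
print(duplicated((1, 1, 0), fault_in_copy=1))  # (0, 1)
print(duplicated((1, 1, 0), fault_in_copy=2))  # (1, 1)
```

The comparator flags an error in both faulty cases but cannot tell which copy is wrong, so duplication alone detects errors without correcting them.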
Time Redundancy for Detection
[Figure: the combinational logic output is captured by two D flip-flops, one clocked by CK and one by CK + d; logic 1 at the comparison output indicates an error.]
• Hardware overhead is low.
• Performance penalty (about d) equals the maximum detectable pulse width.
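The limit on detectable pulse width can be illustrated with a simple timing model, sketched below under stated assumptions: the transient inverts the logic output for a fixed interval, the two flip-flops sample instantaneously at CK and CK + d, and time units are arbitrary. The function names are illustrative.

```python
def sampled_value(correct, pulse_start, pulse_width, t):
    """Logic output at time t; a transient inverts it while the pulse lasts."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return (correct ^ 1) if in_pulse else correct


def time_redundant_capture(correct, pulse_start, pulse_width, t_ck, d):
    """Sample the output at CK and at CK + d and compare the two samples.

    Returns (first, second, error), where error == 1 means a mismatch.
    """
    first = sampled_value(correct, pulse_start, pulse_width, t_ck)
    second = sampled_value(correct, pulse_start, pulse_width, t_ck + d)
    return first, second, first ^ second


# Pulse narrower than d that overlaps the first sample: detected.
print(time_redundant_capture(1, 9.5, 1.0, t_ck=10, d=2))  # (0, 1, 1)
# Pulse wider than d that covers both samples: both wrong, not detected.
print(time_redundant_capture(1, 9.5, 3.0, t_ck=10, d=2))  # (0, 0, 0)
# Pulse that ends before the clock edge: no effect.
print(time_redundant_capture(1, 5.0, 1.0, t_ck=10, d=2))  # (1, 1, 0)
```

The second case shows why d sets the maximum detectable pulse width: a pulse wider than d can corrupt both samples identically.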
Repeat on Error Detection
[Figure: the combinational logic output is captured by two D flip-flops, one clocked by CK and one by CK + d; a C-element (C) drives the output, and logic 1 at the comparison output indicates an error.]
• Operation: if an error is detected, the output retains its previous value.
• Repeating the computation can produce the correct result.
Muller C-Element
[Figure: C-element symbol with inputs A and B and one output, together with an implementation built around an SR latch (S, R, Q).]
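The behavior of a Muller C-element is that the output follows the inputs when they agree and holds its previous value when they disagree. A minimal behavioral sketch follows; the class name and the initial state of 0 are illustrative assumptions.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.output = initial  # assumed initial state

    def update(self, a, b):
        """Output copies the inputs when A == B, otherwise it holds."""
        if a == b:
            self.output = a
        return self.output


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

When A and B are the two time-shifted samples of a logic output, a transient that corrupts only one sample makes them disagree, so the element keeps the previous output value, matching the "output retains its previous value" operation.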
Triple Modular Redundancy (TMR)
[Figure: the inputs drive three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter combines the three results into the output.]
Majority Voter Circuit
[Figure: majority voter block with inputs A, B, C and one output, and its gate-level circuit.]
Alternative Implementations of Voter
[Figure: two voter implementations with inputs A, B, C: a circuit connected to VDD, and a LUT whose memory cells hold 0 0 0 1 0 1 1 1.]
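The majority function and its LUT form can be checked directly. The sketch below uses the standard expression AB + BC + AC and enumerates the truth table with A as the most significant input (an assumed ordering); the result matches the LUT contents in the figure. The `tmr` helper and its `faulty_copy` argument are illustrative.

```python
def majority(a, b, c):
    """Two-out-of-three majority: AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


# Truth table for ABC = 000 ... 111, i.e. the contents of a 3-input LUT voter.
voter_lut = [majority(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(voter_lut)  # [0, 0, 0, 1, 0, 1, 1, 1]


def tmr(correct_value, faulty_copy=None):
    """Vote over three copies; optionally invert one copy (index 0, 1 or 2)."""
    copies = [correct_value] * 3
    if faulty_copy is not None:
        copies[faulty_copy] ^= 1
    return majority(*copies)


# A single faulty copy is outvoted for either logic value.
print([tmr(v, k) for v in (0, 1) for k in (0, 1, 2)])  # [0, 0, 0, 1, 1, 1]
```

Two simultaneous faulty copies would outvote the correct one, so this masking holds only for a single error at a time.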
Triple Modular Redundancy (TMR)
[Figure: a single copy of the combinational logic with four D flip-flops clocked by CK, CK + d, CK + 2d and CK + 3d, and a majority voter producing the output.]
TMR for Memory Cells
[Figure: the combinational logic output is stored in three D flip-flops clocked by CK; a majority voter combines their outputs.]
• Problems:
  • Accumulation of errors in flip-flops.
  • Voter is not protected.
FF Refresh and TMR for Memory Cells
[Figure: three D flip-flops clocked by CK, signals r1, r2 and r3, and four majority voters; voted values refresh the flip-flops and a voter drives the output.]
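The difference between plain TMR storage, where errors accumulate, and TMR with flip-flop refresh can be sketched with a small cycle-based model. This is a simplified illustration under stated assumptions: the three flip-flops hold one stored bit, at most one upset occurs per cycle, and the voters themselves are fault-free. The `run` function and its `upsets` argument are hypothetical names.

```python
def majority(a, b, c):
    """Two-out-of-three majority vote."""
    return (a & b) | (b & c) | (a & c)


def run(upsets, cycles, refresh, stored=1):
    """Simulate three flip-flops holding one bit.

    upsets: dict mapping cycle number -> index of the flip-flop upset then.
    refresh: if True, every flip-flop reloads the voted value each cycle.
    Returns the voted output observed in each cycle.
    """
    ffs = [stored] * 3
    outputs = []
    for cycle in range(cycles):
        if cycle in upsets:
            ffs[upsets[cycle]] ^= 1  # single-event upset flips one flip-flop
        voted = majority(*ffs)
        outputs.append(voted)
        if refresh:
            ffs = [voted] * 3  # write the voted value back into all flip-flops
    return outputs


upsets = {1: 0, 3: 1}  # two upsets in different flip-flops, two cycles apart
print(run(upsets, cycles=5, refresh=False))  # [1, 1, 1, 0, 0]
print(run(upsets, cycles=5, refresh=True))   # [1, 1, 1, 1, 1]
```

Without refresh the second upset joins the first and the voter output becomes wrong; with refresh the first upset is erased before the second arrives.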
A Resistor Hardened SRAM Cell
[Figure: resistor-hardened SRAM cell schematic with word line WL, supply VDD, the two bit lines BL, and the two internal bit nodes storing 1 and 0.]
References
• F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.
• S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.
Summary of Topics Covered (1)
• Nanotechnology devices
• Moore’s law
• System level design for testability and test scheduling problem
• Verification
• Logic equivalence
• Binary decision diagrams
• Power consumption and low-power concepts
• Multi-core parallelism
• Microprocessors
• Memories
Summary of Topics Covered (2)
• Timing
• Timing verification
• Timing simulation
• Static timing analysis
• Timing optimization
• Linear programming and clock constraints
• Clock skew problem
• Zero skew design
• Retiming, constraint graph and performance optimization
• Soft errors and fault-tolerant design