
ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Understand and address soft errors in VLSI design to create robust hardware; learn their causes, impacts, and mitigation techniques.



Presentation Transcript


  1. ELEC 7770 Advanced VLSI Design, Spring 2007: Soft Errors and Fault-Tolerant Design. Vishwani D. Agrawal, James J. Danaher Professor, ECE Department, Auburn University, Auburn, AL 36849. vagrawal@eng.auburn.edu http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07 ELEC 7770: Advanced VLSI Design (Agrawal)

  2. Soft Errors • Soft errors are errors caused by the operating environment. • They are not due to a permanent hardware fault. • Soft errors are intermittent or random, which makes testing for them unreliable. • One way to deal with soft errors is to make hardware robust: • Capable of detecting soft errors • Capable of correcting soft errors • Both measures are probabilistic

  3. Some Early References • J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963. • M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973. • T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.

  4. Causes of Soft Errors • Interconnect coupling (crosstalk). • Power supply noise: IR-drop, delta-I. • Effects generally attributed to alpha-particles: • Charged particles: electrons, protons, ions. • Radiation (photons): X-rays, gamma-rays, ultra-violet light.

  5. Sources of Alpha-Particles • Radioactive contamination in VLSI packaging material. • Ionosphere, magnetosphere and solar radiation. • Other electromagnetic radiation.

  6. Alpha-Particle • Helium nucleus: two protons and two neutrons, mass = $6.65 \times 10^{-27}$ kg, charge = $+2e$ ($e = 1.6 \times 10^{-19}$ C). • Rest-mass energy ($mc^2$) = 3.73 GeV

  7. Soft Error Rate (SER) • Failures in time (FIT): one FIT is 1 error per billion hours of operation. • An alternative unit is mean time between failures (MTBF). An MTBF of 1 year corresponds to $10^9/(365 \times 24)$ ≈ 114,155 FIT.
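The FIT/MTBF conversion on this slide can be sketched in a few lines of Python. The function names are illustrative and not part of the lecture; the year length of 365 × 24 hours follows the slide's arithmetic.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, the same year length the slide uses


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Convert mean time between failures (hours) to FIT (failures per 1e9 hours)."""
    return 1e9 / mtbf_hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a FIT rate back to mean time between failures in hours."""
    return 1e9 / fit


one_year_fit = mtbf_hours_to_fit(HOURS_PER_YEAR)
print(round(one_year_fit))  # 114155, matching the slide
```

Because both directions are the same reciprocal relation, a device rated at 1000 FIT has an MTBF of one million hours under this definition.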

  8. Particle Strike [Figure labels: ion or charged particle; + and - charges along its track; n region; p-substrate.]

  9. Induced Current [Figure: plot of current versus time.] $I(t) = I_0\,(e^{-t/a} - e^{-t/b})$, $a \gg b$

  10. Voltage Induced at a Node $V = Q/C$, where $Q = \int I(t)\,dt$ and $C$ = node capacitance. Smaller node capacitance will result in larger voltage swing.
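A minimal sketch of the two formulas above, I(t) = I0(e^(-t/a) - e^(-t/b)) and V = Q/C. Integrating I(t) from 0 to infinity gives the closed form Q = I0(a - b), which the code cross-checks numerically. The parameter values (I0 = 100 µA, a = 200 ps, b = 50 ps, and the capacitances) are assumptions chosen only for illustration; they do not come from the lecture.

```python
import math


def induced_current(t, i0, a, b):
    """Double-exponential current pulse I(t) = I0 * (exp(-t/a) - exp(-t/b))."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def collected_charge(i0, a, b):
    """Closed-form integral of I(t) over [0, infinity): Q = I0 * (a - b)."""
    return i0 * (a - b)


def collected_charge_numeric(i0, a, b, t_end, steps=100_000):
    """Trapezoidal integration of I(t) over [0, t_end], as a cross-check."""
    dt = t_end / steps
    total = 0.0
    prev = induced_current(0.0, i0, a, b)
    for k in range(1, steps + 1):
        cur = induced_current(k * dt, i0, a, b)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


# Illustrative (assumed) values, not taken from the slides.
I0, A, B = 100e-6, 200e-12, 50e-12
q = collected_charge(I0, A, B)
for c in (20e-15, 10e-15, 5e-15):
    # V = Q / C: halving the node capacitance doubles the voltage swing.
    print(f"C = {c * 1e15:.0f} fF -> V = {q / c:.2f} V")
```

The loop makes the slide's last point concrete: for a fixed collected charge, the induced voltage scales inversely with node capacitance.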

  11. Effect on Digital Circuit [Figure labels: charged particles (two strike sites), combinational logic, IN, OUT, CK.]

  12. An SRAM Cell [Figure labels: WL, VDD, storage nodes holding 1 and 0, bit, BL.]

  13. SRAM Cell Struck by Alpha-Particle: Single-Event Upset (SEU) [Figure labels: charged particles, WL, VDD, storage nodes flipping 1→0 and 0→1, bit, BL.]

  14. D-Latch [Figure labels: D, Q, CK = 0, internal node values 1 and 0.]

  15. SEU in D-Latch [Figure labels: charged particles, D, Q, CK = 0, internal node values flipping 1→0 and 0→1.]

  16. Single Event Transients in Combinational Logic [Figure labels: charged particles, signal values 1, 1, 0, 1, 1, 0, CK.]

  17. Effects of Transients • Error correcting effects • Transient pulse is filtered by gate inertia • Transient is blocked by an unsensitized path • Transient is blocked by an inactive clock • Error enhancing effects • Large number of gates can produce multiple pulses • Fanouts can multiply error pulses

  18. SEUs in FPGA • Parts that can be affected • Look-up table (LUT) • Configuration memory cell • Flip-flop • Block RAM

  19. LUT [Figure labels: inputs F1 to F4, LUT, 16 memory cells (values in extraction order: 1 1 1 0 0 1 0 0 0 0 1 1 1 0 0 1), out.]

  20. SEU in LUT [Figure labels: inputs F1 to F4, LUT, 16 memory cells, charged particle, one cell changed from 1 to 0, out.]
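The effect of an upset in a LUT memory cell can be sketched as follows. A 4-input LUT is modeled as 16 stored bits addressed by F1 to F4. The addressing order (F1 as the most significant bit), the cell ordering, and the upset index are assumptions made for illustration; the figure's exact layout is not recoverable from the transcript.

```python
def lut_read(cells, f1, f2, f3, f4):
    """Return the stored bit selected by the four LUT inputs (F1 assumed MSB)."""
    index = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return cells[index]


def inject_seu(cells, index):
    """Return a copy of the LUT contents with one memory cell flipped."""
    upset = list(cells)
    upset[index] ^= 1
    return upset


# Hypothetical LUT contents (16 cells) and upset location.
cells = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
upset = inject_seu(cells, 12)

# Only the input combination that addresses the flipped cell is affected.
affected = [i for i in range(16) if cells[i] != upset[i]]
print(affected)
```

Unlike a transient in logic, this change persists until the configuration is rewritten, because the LUT contents define the function itself.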

  21. Four Types of SEU in FPGA [Figure labels: FF, LUT with inputs F1 to F4, block RAM, configuration memory cells (M), SEU Types 1 to 4.]

  22. SEU Detection Methods • Hardware redundancy • Time redundancy • Error detection codes (EDC) • Self-checker techniques

  23. SEU Mitigation Techniques • Triple modular redundancy (TMR) • Multiple redundancy with voting • Error detection and correction codes (EDAC) • Hardened memory cells • FPGA-specific methods • Reconfiguration • Partial configuration • Rerouting design

  24. Hardware Redundancy for Detection [Figure labels: inputs, combinational logic, combinational logic (duplicated), output, error signal (logic 1 indicates error).] Hardware overhead is high (~100%). Performance penalty is negligible.
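Duplication with comparison, as described on this slide, can be sketched as below. The function `logic` is a hypothetical stand-in for the combinational block, and the `flip_copy` argument is an illustrative way to inject a soft error into one copy; neither comes from the lecture.

```python
def logic(a, b, c):
    """Hypothetical combinational block: a AND (b OR c)."""
    return a & (b | c)


def duplicated_with_compare(a, b, c, flip_copy=None):
    """Evaluate two copies of the logic and XOR their outputs.

    flip_copy = 1 or 2 injects a single bit flip into that copy's output.
    Returns (output_of_copy_1, error_flag); error_flag == 1 indicates error.
    """
    out1 = logic(a, b, c)
    out2 = logic(a, b, c)
    if flip_copy == 1:
        out1 ^= 1
    elif flip_copy == 2:
        out2 ^= 1
    error = out1 ^ out2  # comparator: 1 when the copies disagree
    return out1, error


print(duplicated_with_compare(1, 0, 1))               # fault-free
print(duplicated_with_compare(1, 0, 1, flip_copy=2))  # upset in the duplicate
```

The comparator flags a mismatch but cannot tell which copy is wrong, so this scheme detects errors without correcting them.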

  25. Time Redundancy for Detection [Figure labels: inputs, combinational logic, two D flip-flops clocked by CK and CK+d, output, error signal (logic 1 indicates error).] Hardware overhead is low. Performance penalty (~d) = maximum detectable pulse width.
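The role of the delay d can be illustrated with a small timing model. The logic output is sampled at CK and again at CK+d; a transient that corrupts only one of the two samples produces a mismatch. The time units (picoseconds) and all numeric values are assumptions for illustration.

```python
def sample(correct, pulse_start, pulse_width, t):
    """Value seen at time t: inverted while the transient pulse is present."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ 1 if in_pulse else correct


def time_redundant_check(correct, pulse_start, pulse_width, t_ck, d):
    """Sample at CK and CK+d; return (sample1, sample2, error_flag)."""
    s1 = sample(correct, pulse_start, pulse_width, t_ck)
    s2 = sample(correct, pulse_start, pulse_width, t_ck + d)
    return s1, s2, s1 ^ s2


# Assumed timing in picoseconds: clock edge at 1000, delay d = 500.
print(time_redundant_check(1, pulse_start=900, pulse_width=200, t_ck=1000, d=500))
print(time_redundant_check(1, pulse_start=900, pulse_width=800, t_ck=1000, d=500))
```

In the first call the pulse is narrower than d, so only the first sample is corrupted and the error flag is raised. In the second call the pulse spans both sampling instants, both samples agree on the wrong value, and the error goes undetected, which is why d bounds the detectable pulse width.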

  26. Repeat on Error Detection [Figure labels: inputs, combinational logic, two D flip-flops clocked by CK and CK+d, C-element (C), output, error signal (logic 1 indicates error).] Operation: if an error is detected, the output retains its previous value. Repeating the computation can then produce the correct result.

  27. Muller C-Element [Figure labels: C-element symbol with inputs A, B and output; an implementation using an SR latch (S, R, Q) with inputs A, B and output.]
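A behavioral sketch of the Muller C-element, using its standard definition: the output follows the inputs when they agree and holds its previous value when they differ. The class name and initial state are illustrative choices.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.q = initial  # stored output state

    def update(self, a, b):
        """Apply inputs A and B and return the resulting output."""
        if a == b:
            self.q = a  # inputs agree: output follows them
        return self.q   # inputs differ: output keeps its previous value


c = CElement(initial=0)
for a, b in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(a, b, "->", c.update(a, b))
```

When A and B are the two time-shifted samples of the same logic output, a disagreement between them leaves the previous output in place, which is the hold behavior that repeat-on-error detection relies on.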

  28. Triple Modular Redundancy (TMR) [Figure labels: inputs, combinational logic copies 1, 2, and 3, majority voter, output.]

  29. Majority Voter Circuit [Figure labels: majority voter block with inputs A, B, C and output; gate-level circuit with inputs A, B, C and output.]

  30. Alternative Implementations of Voter [Figure labels: LUT with contents 0 0 0 1 0 1 1 1, inputs A, B, C, and output; transistor-level circuit with VDD, inputs A, B, C, and output.]
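The voter implementations above can be compared with a short sketch. The gate-level form uses the standard majority expression AB + BC + AC, and the LUT form uses the contents 0 0 0 1 0 1 1 1 listed on the slide. Addressing the LUT with A as the most significant bit is an assumption; because the majority function is symmetric, any input ordering yields the same table.

```python
from itertools import product


def majority_gates(a, b, c):
    """Gate-level majority: (A AND B) OR (B AND C) OR (A AND C)."""
    return (a & b) | (b & c) | (a & c)


VOTER_LUT = [0, 0, 0, 1, 0, 1, 1, 1]  # contents from the slide


def majority_lut(a, b, c):
    """LUT-based majority, with A assumed to be the most significant address bit."""
    return VOTER_LUT[(a << 2) | (b << 1) | c]


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, majority_gates(a, b, c), majority_lut(a, b, c))
```

With three copies of the logic feeding A, B, and C, a single wrong copy is outvoted by the other two, which is the masking property TMR depends on.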

  31. Triple Modular Redundancy (TMR) [Figure labels: inputs, combinational logic, four D flip-flops clocked by CK, CK+d, CK+2d, and CK+3d, majority voter, output.]

  32. TMR for Memory Cells [Figure labels: inputs, combinational logic, three D flip-flops clocked by CK, majority voter, output.] • Problems: • Accumulation of errors in flip-flops. • Voter is not protected.

  33. FF Refresh and TMR for Memory Cells [Figure labels: r1, r2, r3, four majority voters, three D flip-flops clocked by CK, output.]
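The error-accumulation problem and the benefit of refreshing the flip-flops can be sketched with a simplified simulation. This model is an assumption made for illustration: three flip-flops hold the same bit, upsets are injected at chosen cycles, voters are treated as fault-free, and with refresh enabled every flip-flop reloads the voted value each cycle. It is not a gate-accurate model of the circuit in the figure.

```python
def majority(a, b, c):
    """Two-out-of-three majority vote."""
    return (a & b) | (b & c) | (a & c)


def run(upsets, cycles, refresh):
    """Simulate three redundant flip-flops storing the value 1.

    upsets maps a cycle number to the index of the flip-flop that flips
    in that cycle. Returns the voted output observed in each cycle.
    """
    ffs = [1, 1, 1]
    outputs = []
    for cycle in range(cycles):
        if cycle in upsets:
            ffs[upsets[cycle]] ^= 1  # single-event upset in one flip-flop
        voted = majority(*ffs)
        outputs.append(voted)
        if refresh:
            ffs = [voted, voted, voted]  # reload every flip-flop with the vote
    return outputs


upsets = {1: 0, 3: 1}  # two upsets, in different flip-flops, two cycles apart
print(run(upsets, cycles=5, refresh=False))
print(run(upsets, cycles=5, refresh=True))
```

Without refresh, the first upset stays latent, so the second upset leaves two of the three flip-flops wrong and the vote fails. With refresh, the first upset is overwritten before the second arrives and the output stays correct.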

  34. A Resistor Hardened SRAM Cell [Figure labels: WL, VDD, storage nodes holding 1 and 0, bit, BL.]

  35. References • F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006. • S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.

  36. Summary of Topics Covered (1) • Nanotechnology devices • Moore’s law • System level design for testability and test scheduling problem • Verification • Logic equivalence • Binary decision diagrams • Power consumption and low-power concepts • Multi-core parallelism • Microprocessors • Memories

  37. Summary of Topics Covered (2) • Timing • Timing verification • Timing simulation • Static timing analysis • Timing optimization • Linear programming and clock constraints • Clock skew problem • Zero skew design • Retiming, constraint graph and performance optimization • Soft errors and fault-tolerant design
