Single Event Error Analysis: Critical For Today’s Advanced Technologies or A Problem Solved?


Presentation Transcript


  1. Single Event Error Analysis: Critical For Today’s Advanced Technologies or A Problem Solved? Lloyd W. Massengill Dept. of Electrical Engineering and Computer Science Vanderbilt University Nashville, TN, USA

  2. Acknowledgements • Vanderbilt EECS Students and PostDocs, especially: • Jie Meng • Vivian Zhu • Kevin Warren • Jayanth Shreedhara • Anne Baranski • Diek Van Nort • Claude Cirba • Radiation Effects Group at Vanderbilt, including: • Ron Schrimpf • Dan Fleetwood • Bharat Bhuva • Ken Galloway MAPLD Conference 2000

  3. Headlines • Intel: “Soft errors are the second biggest [reliability] concern after leakage current in submicron design” * • Tim Dell, IBM: “for every 256 Mbytes of memory, you will get one soft error a month due to cosmic-ray-generated neutrons” ** * EE Times, 6/16/99 ** EDTN, 6/10/99

  4. Headlines An Error Has Occurred in Windows Continue or Close? “For every 256 Mbytes of memory, you will get one soft error a month due to cosmic-ray-generated neutrons”

  5. Headlines An Error Has Occurred in Windows Continue or Close? P.S. Doesn’t matter what you choose… All of your work is now lost... “For every 256 Mbytes of memory, you will get one soft error a month due to cosmic-ray-generated neutrons” Will we always be able to blame this on Bill Gates?

  6. Bottom Line • Single event effects (SEEs) are taking a prominent position in the mainstream integrated circuit industry • Many commercial manufacturers are coming to grips with the problem as a key reliability issue • GHz logic, terabyte RAM, low-power circuits are leading to new upset scenarios • Clearly, there is a recognized need for SEE analysis integrated into accepted design flows

  7. Single Event Modeling in Commercial Technologies: Outline • Examples of Commercial Single Event (SE) Concerns • SE Reliability Analysis of the Commercial Technology Roadmap • SE Issues in These New Technologies and Modeling Concerns • A Few Commercial Modeling Frontiers: Terrestrial Neutrons; The Incredible Shrinking DRAM; Combinational Logic • Conclusions

  8. Example Commercial SEE Concerns: Intel: • Testing for alpha-induced soft errors in next-generation (0.18 µm) logic designs • Anticipating the development of a strategy for hardening to terrestrial neutron SEEs • Impurity alpha-particle SEEs in SRAMs Sources: 1999, 1998 Symposium on VLSI Technology; 1998 SRC/Sematech/NASA Conf. on Reliability

  9. Example Commercial SEE Concerns: Intel: • Testing for alpha-induced soft errors in next-generation (0.18 µm) logic designs • Anticipating the development of a strategy for hardening to terrestrial neutron SEEs • Impurity alpha-particle SEEs in SRAMs IBM: • Large banks of DRAMs • Neutron-induced SEE • Multiple-bit RAM errors (thwarting EDAC) • Marketing memory modules with multiple-bit EDAC ASICs Sources: 1999, 1998 Symposium on VLSI Technology; 1998 SRC/Sematech/NASA Conf. on Reliability

  10. Example Commercial SEE Concerns: Intel: • Testing for alpha-induced soft errors in next-generation (0.18 µm) logic designs • Anticipating the development of a strategy for hardening to terrestrial neutron SEEs • Impurity alpha-particle SEEs in SRAMs IBM: • Large banks of DRAMs • Neutron-induced SEE • Multiple-bit RAM errors (thwarting EDAC) • Marketing memory modules with multiple-bit EDAC ASICs Fujitsu: • Neutron-induced SEE • Multiple-bit RAM errors Sources: 1999, 1998 Symposium on VLSI Technology; 1998 SRC/Sematech/NASA Conf. on Reliability

  11. Example Commercial SEE Concerns: Neutrons, DRAMs, Logic Intel: • Testing for alpha-induced soft errors in next-generation (0.18 µm) logic designs • Anticipating the development of a strategy for hardening to terrestrial neutron SEEs • Impurity alpha-particle SEEs in SRAMs IBM: • Large banks of DRAMs • Neutron-induced SEE • Multiple-bit RAM errors (thwarting EDAC) • Marketing memory modules with multiple-bit EDAC ASICs Fujitsu: • Neutron-induced SEE • Multiple-bit RAM errors Texas Instruments: • Alpha particles from high-k capacitor materials in advanced DRAMs Sources: 1999, 1998 Symposium on VLSI Technology; 1998 SRC/Sematech/NASA Conf. on Reliability

  12. Commercial Reliability Constraints: Prioritized List of SIA Reliability Technology Constraints (Year 2000), ranked in order from highest to lowest priority: • Gate Dielectric Reliability • Electromigration • ESD • Multi-level Metal/Dielectric Integrity • Hot Carriers • Defectivity, Cleanliness • Wafer Charging • Noise Margin / Coupling • Latch-up • Tools for Reliability Checking • Package Induced Failure • Soft Error (Single Event Upset) • Cost Effective Qualification • Transistor matching for mixed signal • Performance/Power/Reliability Tradeoffs Source: “Quality and Reliability Issues in the National Technology Roadmap for Semiconductors”; SEMATECH; 1998

  13. Commercial Reliability Constraints: Prioritized List of SIA Reliability Technology Constraints (Year 2000), ranked in order from highest to lowest priority: • Gate Dielectric Reliability • Electromigration • ESD • Multi-level Metal/Dielectric Integrity • Hot Carriers • Defectivity, Cleanliness • Wafer Charging • Noise Margin / Coupling • Latch-up • Tools for Reliability Checking • Package Induced Failure • Soft Error (Single Event Upset) • Cost Effective Qualification • Transistor matching for mixed signal • Performance/Power/Reliability Tradeoffs Source: “Quality and Reliability Issues in the National Technology Roadmap for Semiconductors”; SEMATECH; 1998

  14. Technology Trends: Process Source: 1997 & 1999 SIA National Technology Roadmaps for Semiconductors

  15. Technology Trends: Process • V produced by a 14 MeV neutron product (156 fC) on a 50 fF capacitance • V produced by a thermal neutron product (100 fC) on a 50 fF capacitance
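
The two voltage markers on this slide follow from V = Q / C. The sketch below reproduces that arithmetic under a deliberately simple assumption that is not stated on the slide: all of the deposited charge lands on a single, purely capacitive node with no restoring current. The function name `voltage_step` is hypothetical.

```python
def voltage_step(charge_fc: float, capacitance_ff: float) -> float:
    """Voltage step in volts from dumping charge_fc (fC) onto capacitance_ff (fF).

    Assumes an ideal capacitor and complete charge collection (illustrative only).
    """
    return charge_fc / capacitance_ff  # fC / fF = V


# Values taken from the slide: 156 fC and 100 fC on a 50 fF node.
dv_14mev = voltage_step(156.0, 50.0)
dv_thermal = voltage_step(100.0, 50.0)

print(f"14 MeV neutron product:  {dv_14mev:.2f} V")
print(f"thermal neutron product: {dv_thermal:.2f} V")
```

Under these assumptions both steps are volt-scale, which is why they are plotted against the supply-voltage trend.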

  16. Technology Trends: DRAMs Source: 1997 & 1999 SIA National Technology Roadmaps for Semiconductors

  17. Technology Trends: DRAMs Divergence: cell size scaling faster than the square of the feature size

  18. Technology Trends: Logic Source: 1997 & 1999 SIA National Technology Roadmaps for Semiconductors

  19. Technology Trends: Logic Source: 1997 & 1999 SIA National Technology Roadmaps for Semiconductors

  20. SEE Issues Technological: • New processing materials leading to additional alpha sources: high-k capacitor dielectrics (e.g. barium strontium titanate (BST), Pt electrodes); low-k interconnect dielectrics; packaging (e.g. relaxed design rules for PbSn bumps for flip chip) • Decreased VDD (reduced noise margins) • Smaller devices (comparable track dimensions, microdose) • Higher effective gate dielectric fields (SEDR, hard errors) Circuit: • Low power (lower noise margin, reduced switching energy) • Reduced capacitances (lower critical charge) • Speed (response to reduced pulse width glitches) System: • Packing density, 3-d integration of DRAMs (multiple bit upsets) • Combinational logic • Analog/mixed signal • System on a chip • Effectiveness of EDAC at terabyte levels

  21. SEE Design/Analysis Concerns • “Serious gaps exist in technology computer-aided design (TCAD) and CAD tools for product designs that do not adequately simulate the effects of failure mechanisms” * • “Ideally, reliability scaling models, similar to technology scaling models, should be developed and verified” * * “Quality and Reliability Issues in the National Technology Roadmap for Semiconductors”; SEMATECH; 1998

  22. Three Important Commercial SEE Concerns I. Terrestrial Neutron SE Effects II. DRAMs and Scaling to Terabyte Multi-chip DRAM Modules III. Microprocessor Logic

  23. Terrestrial Neutrons: The Phantom Menace • Created by cosmic ion interactions with oxygen and nitrogen in the upper atmosphere • Example thermal neutron reaction products (energy; range; time of flight; total charge): 0.8 MeV; 2.4 µm; 1.5 ps; 36 fC, and 1.5 MeV; 5.2 µm; 2.4 ps; 67 fC
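
The total-charge figures quoted for the two reaction products are consistent with the usual conversion of deposited energy to electron-hole pairs in silicon (about 3.6 eV per pair). The sketch below checks that arithmetic. It assumes, beyond what the slide states, that each product stops in silicon and that all of its energy goes into ionization; the function name is hypothetical.

```python
EV_PER_PAIR_SI = 3.6         # mean energy per electron-hole pair in silicon, eV
ELECTRON_CHARGE_C = 1.602e-19


def deposited_charge_fc(energy_mev: float) -> float:
    """Charge (fC) generated by energy_mev deposited entirely as ionization in Si."""
    pairs = energy_mev * 1e6 / EV_PER_PAIR_SI
    return pairs * ELECTRON_CHARGE_C * 1e15  # coulombs -> femtocoulombs


q_low = deposited_charge_fc(0.8)   # slide quotes 36 fC
q_high = deposited_charge_fc(1.5)  # slide quotes 67 fC

print(f"0.8 MeV -> {q_low:.1f} fC")
print(f"1.5 MeV -> {q_high:.1f} fC")
```

The sum of the two is roughly 100 fC, the same order as the thermal-neutron charge used in the process-trend slide.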

  24. Terrestrial Neutrons: The Problem Normand (Boeing) field data*: • 1-2E-12 upsets/bit-hr is representative of many SRAM/DRAM technologies of the early 90’s (4M DRAMs, 256k SRAMs) --> 1-2E-4 errors/day for 4 Mb DRAMs --> about 1 error per 1-2 years for 64 Mb (8 Mbytes) • If the number holds with scaling to 4 Gb DRAMs (512 Mbytes) --> 0.1-0.2 upsets/day, or one error every 5-10 days * Normand; TNS 1996

  25. Terrestrial Neutrons: The Problem Normand (Boeing) field data*: • 1-2E-12 upsets/bit-hr is representative of many SRAM/DRAM technologies of the early 90’s (4M DRAMs, 256k SRAMs) --> 1-2E-4 errors/day for 4 Mb DRAMs --> about 1 error per 1-2 years for 64 Mb (8 Mbytes) • If the number holds with scaling to 4 Gb DRAMs (512 Mbytes) --> 0.1-0.2 upsets/day, or one error every 5-10 days Neutrons cannot be controlled in the way decay-alphas were: • Moving the source (e.g. PbSn bumps) away from sensitive nodes: effective for alphas -- impossible for neutrons • Shielding the source with surface coatings -- ineffective for neutrons • Eliminating the source with cleaner materials and packaging: effective for alphas -- impossible for neutrons * Normand; TNS 1996
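
The scaling argument on this slide is a straight multiplication of the per-bit rate by the number of bits and by 24 hours. The sketch below repeats it for the three memory sizes mentioned, assuming (as the slide does) that the early-90's per-bit rate of 1-2E-12 upsets/bit-hr stays constant with scaling. Binary sizes (1 Mb = 2^20 bits) and the helper name are assumptions made for illustration.

```python
HOURS_PER_DAY = 24
MBIT = 2**20
GBIT = 2**30

RATE_LOW, RATE_HIGH = 1e-12, 2e-12  # upsets per bit-hour, from the field data


def upsets_per_day(rate_per_bit_hr: float, bits: int) -> float:
    """Expected upsets per day for a memory of `bits` bits at a constant per-bit rate."""
    return rate_per_bit_hr * bits * HOURS_PER_DAY


for label, bits in [("4 Mb", 4 * MBIT), ("64 Mb", 64 * MBIT), ("4 Gb", 4 * GBIT)]:
    low = upsets_per_day(RATE_LOW, bits)
    high = upsets_per_day(RATE_HIGH, bits)
    # Mean time between upsets is the reciprocal of the daily rate.
    print(f"{label}: {low:.2e} to {high:.2e} upsets/day "
          f"(one every {1 / high:.1f} to {1 / low:.1f} days)")
```

For 4 Gb the result is on the order of one upset every 5 to 10 days, matching the slide; for 64 Mb it is roughly one upset every one to two years.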

  26. Terrestrial Neutrons: SEE Concerns Issues: • Terrestrial neutron SEEs at ground level have been well documented • Difficult (impossible) to control as alphas have been • Very low Qcrit circuits and scaled device dimensions exacerbate circuit response to short-range, low-energy neutron products • The link from statistical reaction probabilities to system error rates has not been well developed

  27. DRAMs: The Incredible Shrinking DRAM • Cell sizes below 0.2 µm² • Integration density above 300×10⁶ bits/cm²

  28. DRAMs: Containment of the Cell SEE Problem Commercial control of SEE: After: Kitsukawa, JSSC, 1993

  29. DRAMs: Containment of the Cell SEE Problem Commercial control of SEE: • Reduced collection depth through thinner epi and wells, and modified doping profiles After: Kitsukawa, JSSC, 1993

  30. DRAMs: Containment of the Cell SEE Problem Commercial control of SEE: • Reduced collection depth through thinner epi and wells, and modified doping profiles • Movement of the stored charge away from shrinking collecting junctions After: Kitsukawa, JSSC, 1993

  31. DRAMs: Containment of the Cell SEE Problem Commercial control of SEE: • Reduced collection depth through thinner epi and wells, and modified doping profiles • Movement of the stored charge away from shrinking collecting junctions • Cleaner materials (reduced natural alpha emissions) After: Kitsukawa, JSSC, 1993

  32. DRAMs: Containment of the Cell SEE Problem Commercial control of SEE: • Reduced collection depth through thinner epi and wells, and modified doping profiles • Movement of the stored charge away from shrinking collecting junctions • Cleaner materials (reduced natural alpha emissions) • Increased charge per unit surface area (next slides) After: Kitsukawa, JSSC, 1993

  33. DRAMs: Containment of the Cell SEE Problem • Cell area scaling has outpaced the drop in stored charge -- leading to an actual increase in charge stored per unit surface area • Increasing signal charge to area ratio and decreasing collection depths have effectively contained the SEU problem in memory cores • Statistics of per-bit error rates have improved Sources: JSSC, ISSCC 1985-1999; SIA Nat. Tech. Roadmap

  34. But… 3-d DRAM scaling has introduced other SEE concerns. A key scaling trend which has been crucial to the containment of DRAM SEE may be saturating. [Chart annotations: Roadmap; 1999] Sources: JSSC, ISSCC 1985-1999; SIA Nat. Tech. Roadmap

  35. But… 3-d DRAM scaling has introduced other SEE concerns. A key scaling trend which has been crucial to the containment of DRAM SEE may be saturating. [Chart annotations: Roadmap; Actual; NEC 4G DRAM; 1999] Sources: JSSC, ISSCC 1985-1999; SIA Nat. Tech. Roadmap

  36. DRAMs: Surpassing the 8F² Theoretical Limit • DRAM cell sizes are scaling faster than the square of feature size • A measure of this is the parameter N, the cell area divided by F² (F = feature size) • N=8 is accepted as the DRAM theoretical limit • 1 Gb DRAMs have reached the theoretical limit • 4 Gb DRAMs must surpass this limit to maintain scaling trends • Method to overcome the N=8 limit: multiple bits stored on each memory cell capacitance • Watershed SEE event?? Chart adapted from: Okuda, JSSC, 1997
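
As a rough illustration of the 8F² measure, the sketch below computes the cell-area factor as cell area divided by the square of the feature size. The 0.2 µm² cell area echoes the "below 0.2 µm²" figure quoted earlier in the deck; the 0.18 µm and 0.25 µm feature sizes are assumptions chosen only to show the arithmetic and are not data from the talk. The function name is hypothetical.

```python
def cell_area_factor(cell_area_um2: float, feature_um: float) -> float:
    """Cell area expressed in units of F^2, where F is the minimum feature size."""
    return cell_area_um2 / feature_um**2


# A cell drawn exactly at the 8F^2 limit for an assumed 0.25 um feature size.
n_limit = cell_area_factor(8 * 0.25**2, 0.25)

# A 0.2 um^2 cell at an assumed 0.18 um feature size (illustrative pairing).
n_small = cell_area_factor(0.2, 0.18)

print(f"8F^2 cell:        N = {n_limit:.1f}")
print(f"0.2 um^2 @ 0.18:  N = {n_small:.2f}")
```

A value of N below 8 is what "surpassing the limit" means here: the cell occupies less area than eight feature-size squares.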

  37. DRAMs: SEE Concerns Issues: • Cell upsets in the core array -- saturating charge / unit area past 1 Gb • Bit-line upsets -- analog sense amp vulnerability • Multiple bit upsets -- 2-d and 3-d integration • Increasing refresh cycles -- reduced margins for charge loss • 3-d Terabyte modules -- 3-d multiple-bit/single-word upset • Hard errors: microdose, SEDR -- increasing gate E-fields

  38. Combinational Logic: Brave New World • Scaling toward GHz logic has led to: reduced nodal interconnect capacitances and lower Vdd; reduced switching energy; gate delays (<10 ps) on the order of SE charge collection times; very efficient analog pulse (SE glitch) propagation • Logic SE sensitivity is a substantial and growing concern

  39. Combinational Logic: Brave New World • Scaling toward GHz logic has led to: reduced nodal interconnect capacitances and lower Vdd; reduced switching energy; gate delays (<10 ps) on the order of SE charge collection times; very efficient analog pulse (SE glitch) propagation • Logic SE sensitivity is a substantial and growing concern • Analysis of combinational logic SEE requires a new way of quantifying the relationship between the single event effect on a node and the overall circuit response

  40. Combinational Logic: SE Effects in Synchronous Circuitry -- Soft Faults [Figure: synchronous circuit (D/Q storage elements clocked by CLK) showing a direct SE hit and a combinational SE hit, the possible paths (states 000 and 100), and the resulting soft faults] Single event charge deposition creates a transient noise pulse. The SE pulse competes with legitimate synchronous digital pulses. Effect of the pulse depends on: • vulnerability of a node • active combinational logic paths • pulse shaping and propagation delay along a path • dynamics of the latch S/H window • If an erroneous signal is latched --> Soft Fault

  41. Combinational Logic: SE Effects in Synchronous Circuitry -- Errors [Figure: synchronous module (system or instance); labels: Inputs, Outputs, Storage Element, Soft Fault, Error] Single event charge deposition creates a transient noise pulse. The SE pulse competes with legitimate synchronous digital pulses. Effect of the pulse depends on: • vulnerability of a node • active combinational logic paths • pulse shaping and propagation delay along a path • dynamics of the latch S/H window • If an erroneous signal is latched --> Soft Fault • If a soft fault corrupts an output --> Error

  42. AM2901 Bit-Slice: Functional Block Diagram [Block diagram: RAM, ALU, ACCU, MUXE, and MUX OUT blocks; signals CK, I(8-0), D(3-0), S(3-0), R(3-0), Ra(3-0), Rb(3-0), ALU_OUT(3-0), Y(3-0)] I(8-0): Instruction Code; D(3-0): Input Data; CK: Clock; Y(3-0): Output

  43. AM2901 Bit-Slice: Adder-Accumulator Module [Block diagram: RAM, ALU, ACCU, MUXE, and MUX OUT blocks; signals CK, I(8-3), D(3-0), S(3-0), R(3-0), Ra(3-0), Rb(3-0), ALU_OUT(3-0), Y(3-0)]

  44. Static Analysis of Vulnerable Nodes Adder/Accumulator (ACCU): Vulnerable Nodes, Qcoll = 0.17 pC Static analysis of nodes: • charge collection • loading capacitance • pull-up/pull-down drive
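
One way to picture a static screen built from these three quantities is a first-order critical-charge comparison: a node is flagged when the collected charge exceeds the charge its capacitance can absorb plus the charge its drive can remove during collection. The sketch below uses that textbook-style approximation; it is an assumption made for illustration, not the analysis method used in the talk. The node names, capacitances, drive currents, switching voltage, and collection time are all hypothetical.

```python
V_SWITCH = 2.5    # assumed voltage swing needed to flip the node, volts
T_COLLECT = 200   # assumed charge-collection time, picoseconds

# Hypothetical nodes: (name, loading capacitance in fF, restoring drive in uA)
NODES = [("n_a", 20, 200), ("n_b", 50, 400), ("n_c", 100, 800)]


def critical_charge_fc(cap_ff: float, drive_ua: float) -> float:
    """First-order Qcrit (fC): capacitive term plus charge removed by the drive."""
    capacitive = cap_ff * V_SWITCH            # fF * V = fC
    restoring = drive_ua * T_COLLECT * 1e-3   # uA * ps = 1e-3 fC
    return capacitive + restoring


def vulnerable_nodes(q_coll_fc: float) -> list[str]:
    """Names of nodes whose Qcrit is at or below the collected charge."""
    return [name for name, cap, drive in NODES
            if critical_charge_fc(cap, drive) <= q_coll_fc]


for q_coll in (170, 230):  # 0.17 pC and 0.23 pC
    print(f"Qcoll = {q_coll} fC -> vulnerable: {vulnerable_nodes(q_coll)}")
```

With these made-up values the lightly loaded, weakly driven node is flagged at 0.17 pC and a second node joins it at 0.23 pC, which mirrors how the set of vulnerable nodes grows with collected charge.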

  45. Static Analysis of Vulnerable Nodes Adder/Accumulator (ACCU): Vulnerable Nodes, Qcoll = 0.18 pC

  46. Static Analysis of Vulnerable Nodes Adder/Accumulator (ACCU): Vulnerable Nodes, Qcoll = 0.23 pC

  47. Results: Vulnerable Nodes Contributed by Each Logic Block

  48. AM2901 Bit-Slice: Functional Block Diagram [Block diagram: RAM, ALU, ACCU, MUXE, and MUX OUT blocks; signals CK, I(8-0), D(3-0), S(3-0), R(3-0), Ra(3-0), Rb(3-0), ALU_OUT(3-0), Y(3-0)]

  49. AM2901 Bit-Slice: Arithmetic Logic Unit Module [Block diagram: RAM, ALU, ACCU, MUXE, and MUX OUT blocks; signals CK, I(8-0), D(3-0), S(3-0), R(3-0), Ra(3-0), Rb(3-0), Y(3-0)]

  50. Example of Active Path: ALU Logic Block [Figure: single event strike on an ALU logic block, with the logic values (0/1) along the active path]
