Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection
Joel Seely, Technical Marketing Manager, Military & Aerospace Business Unit
Single Event Upset (SEU) Overview for SRAM-Based FPGAs
Definitions • SEU: Single Event Upset • Unwanted Change in State of a Latch or a Memory Cell • SER: Soft Error Rate • SEU Rate • SEFI: Single Event Functional Interrupt • Functional Failure by SEU • Not All SEUs are SEFIs • Generally Takes 5-10 SEUs to Cause SEFI
Circuit Components of SRAM-Based FPGAs • I/O Registers & I/O Configuration • No Issues, Very Robust Registers, < 1 FIT • Logic Registers (LEs) • No Issues, Very Robust Registers, SER Below Hard Error Rate • User Memory • Typically On-Chip Memories Are “By 9” for Parity Checking • IP Available for ECC • Configuration RAM (CRAM) for LUTs & Routing • Area of Focus
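The “By 9” parity scheme for user memory can be illustrated with a minimal Python sketch. It assumes even parity with the parity bit stored as the ninth bit of each word; the slides do not state the convention, and the function names are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value (assumed convention)."""
    return bin(byte & 0xFF).count("1") & 1


def encode(byte: int) -> int:
    """Pack 8 data bits plus parity into a 9-bit word (parity in bit 8)."""
    return (parity_bit(byte) << 8) | (byte & 0xFF)


def check(word9: int) -> bool:
    """True if the 9-bit word still has an even number of set bits."""
    return bin(word9 & 0x1FF).count("1") % 2 == 0


word = encode(0b1011_0010)
upset = word ^ (1 << 3)          # simulate a single-bit upset
double_upset = upset ^ (1 << 5)  # a second upset in the same word

print(check(word), check(upset), check(double_upset))
```

Parity flags the single-bit upset but passes the double-bit one, which is one reason ECC is offered as an alternative for user memory.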
Upset of a CRAM Cell • Figure: 6-Transistor Cell Schematic (Vcc, Vss, Data In, Data Out, Add, Clear) • Plot: Noise Current (µA) vs. Time (ps) for 10 fC Collected Charge • Plot: Voltage vs. Time
SEU-Induced Failure Rate* • * Data at Sea Level • ** MTBF: Mean Time Between Functional Interrupt
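Failure rates quoted in FIT convert to a mean time between events with simple arithmetic, since one FIT is one failure per 10^9 device-hours. The Python sketch below shows the conversion; the 1,000 FIT input is a hypothetical figure chosen for illustration, not a value from the slide.

```python
HOURS_PER_YEAR = 24 * 365


def mtbf_hours(fit: float) -> float:
    """Mean time between events, in hours, for a rate given in FIT."""
    return 1e9 / fit


def mtbf_years(fit: float) -> float:
    """The same interval expressed in years."""
    return mtbf_hours(fit) / HOURS_PER_YEAR


example_fit = 1000.0  # hypothetical rate, for illustration only
print(f"{mtbf_hours(example_fit):.0f} hours, {mtbf_years(example_fit):.1f} years")
```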
Number of CRAM Bit Upsets for Each Occurrence of Functional Upset • Figure: Distributions with Medians of ~6 and 5
SER Improvements/Mitigation • Chip Design Enhancements • New Materials & Process Enhancements • Larger CRAM Structure • Increase in Capacitance on Critical Node • Smaller Process => Smaller Die => Lower SEU Probability • Built-In Error Detection/Correction Circuitry
SER Per SRAM Bit Trend • Figure: SER per SRAM Mbit (FITs, Scale Marks at 100 and 1,000) vs. Process Technology and Year, from 0.5 µm (1995) to 0.13 µm (2002), with 90 nm Projection
System-Level Improvements/Mitigation • ECC for User Memory • Use Detection/Correction Feature • Triple Modular Redundancy (TMR) • To Achieve Lower Error Rate & Less Downtime • Migrate to Structured ASIC
Soft Error Detection Methods • Configuration RAM Readout • Read Out Full Bitstream • Compare with Stored Bitstream • Can Determine Where in Configuration Error Occurred • Caveat: Security Issues with Reading Out Bitstream • Diagram: Microprocessor or CPLD Compares CRAM Data Read from FPGA with Stored Data (Same or Different?)
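The readout-and-compare method can be sketched in a few lines of Python. This is a minimal illustration, assuming the read-out and stored bitstreams are available as equal-length byte strings; the function name and sample data are hypothetical, and real devices use their own readback interfaces and frame formats.

```python
from typing import List


def find_upsets(readback: bytes, golden: bytes) -> List[int]:
    """Return the bit offsets where the read-out bitstream differs from the stored one."""
    if len(readback) != len(golden):
        raise ValueError("bitstreams must be the same length")
    upsets = []
    for byte_index, (r, g) in enumerate(zip(readback, golden)):
        diff = r ^ g  # set bits mark the positions that differ
        bit = 0
        while diff:
            if diff & 1:
                upsets.append(byte_index * 8 + bit)
            diff >>= 1
            bit += 1
    return upsets


golden = bytes([0x00, 0xFF, 0xA5, 0x3C])  # stand-in for the stored bitstream
readback = bytearray(golden)
readback[2] ^= 0x10                       # simulate one upset: bit 4 of byte 2

print(find_upsets(bytes(readback), golden))
```

Because the comparison is bit-for-bit, the result identifies where in the configuration the error occurred, which is the advantage this method has over a checksum.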
Soft Error Detection Methods • On-Chip SEU Detection • Dedicated Comparison Circuitry • e.g. CRC Engine Comparing Stored CRC with That Calculated from Configuration RAM • Detection Circuitry Running Continuously • Error Detection Rate Variable Based on Implementation of Hardware, Number of CRAM Bits & Input Clock Frequency • Error Signal Available Internally or Externally • Caveat: Cannot Determine Where in Configuration Error Occurred • Diagram: Inside FPGA, Stored Value Compared with Computed Value; Result to Core
On-Chip Detection Example • Dedicated CRC Circuit • Configuration RAM Verification Capability • 32-Bit Cyclic Redundancy Code Check • Verified Against Internally Stored Value • Runs in the Background Without Impacting Device Performance • Close to Real-Time Detection • Variable Clock Frequency • Depends on Number of CRAM Bits • Multi-Event Detection • Up to 3-Bit for 32-Bit CRC • Result Output to Either Core or Pin • Use with Either Internal or External Hardware for Error Correction
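The CRC comparison can be modeled in software as a rough sketch. The Python below uses `zlib.crc32` (the standard CRC-32) as a stand-in; the polynomial and data framing of the actual on-chip circuit are not given in the slides, so treat both as assumptions, along with the hypothetical names and the dummy CRAM contents.

```python
import zlib


def crc_of(cram: bytes) -> int:
    """32-bit CRC over the configuration data (standard CRC-32, assumed)."""
    return zlib.crc32(cram) & 0xFFFFFFFF


def crc_error(cram: bytes, stored_crc: int) -> bool:
    """True when the recomputed CRC no longer matches the stored value."""
    return crc_of(cram) != stored_crc


cram = bytearray(b"\x5a" * 1024)   # stand-in for configuration RAM contents
stored_crc = crc_of(bytes(cram))   # value stored at configuration time

clean = crc_error(bytes(cram), stored_crc)
cram[100] ^= 0x04                  # simulate a single-bit upset
flagged = crc_error(bytes(cram), stored_crc)

print(clean, flagged)
```

The check reports only that the contents changed, not which bit flipped, which matches the caveat that on-chip detection cannot locate the error.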
Correction Methods • FPGA Detection, System-Level Correction • Lower Total Cost • Downtime Is Limited & Manageable • Used in Non-Critical Applications • Triple Modular Redundancy • Two Flavors • All On-Chip in FPGA • Separate Chips & Voter • Correction Can Be Real-Time • Used in Critical Applications
Single System Detection & Correction • Step One: Detect the Soft Error • 75% of Reported Errors Are “Don’t Care” Errors • Step Two: Alert the System • Step Three: Fix the Error • In Some Cases, Re-Program the FPGA • In Some Cases, Reboot the Sub-System • In Some Cases, Reboot the System • Need to Focus on System “Downtime” • Each System Has Unique Requirements • Re-Programming FPGA Takes < 250 ms • Rebooting Time Varies & Can Be Fast “by Design”
TMR Method 1 • Identical Hardware in FPGAs • Use Voter Implemented in FPGA or CPLD • Utilize Either Hardware Output or CRC Error Pin • Voter Also Used to Signal Reconfiguration on Difference or Error • Diagram: Three FPGAs (Hardware 1, Hardware 2, Hardware 3) Feeding an FPGA or CPLD (Voting)
TMR Method 2 • Multiple Instantiations of Hardware in Single FPGA • For Low-Rate SEUs • SEU Events May Occur Much More Frequently than Functional Error (De-Rating) • Voter Signals Reconfiguration of FPGA • FPGA Must Be Reconfigured • Diagram: Hardware 1, Hardware 2 and Hardware 3 Feeding a Voting Circuit Inside One FPGA
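The voter common to both TMR methods can be sketched as a bitwise two-of-three majority function. The Python below is an illustrative model only, with hypothetical names; a real voter would be implemented in FPGA or CPLD logic.

```python
from typing import List


def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing(a: int, b: int, c: int) -> List[int]:
    """Indices (0-2) of the copies whose output differs from the voted result."""
    voted = vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


# Copy 2 has one flipped bit; the voted output is still correct.
print(bin(vote(0b1010, 0b1010, 0b1110)), disagreeing(0b1010, 0b1010, 0b1110))
```

A non-empty result from `disagreeing` is the condition on which the voter would signal reconfiguration of the affected FPGA.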
De-Rating Methodology • Only a Fraction of Configuration Bits Are Actually Programmed • e.g. Using Only Two Inputs of 4-Input LUT Leaves 75% of LUT as “Don’t Care” • Only About 20% of Routing Is Used • Depends on Utilization & Application • Some Un-Programmed Bits Still Matter • Flipping Could Change Function of the Device • Extensive Experimentation Shows a Range From 1/8 to 1/3 of the Bits Matter
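The de-rating arithmetic can be worked through in a short Python sketch. It assumes the unused LUT inputs are held at a fixed level, so only the entries addressed by the used inputs can be reached; the 1,000 FIT raw rate is a hypothetical figure, and the function names are illustrative.

```python
def lut_dont_care_fraction(lut_inputs: int, used_inputs: int) -> float:
    """Fraction of LUT bits never addressed when unused inputs are held constant."""
    total_bits = 2 ** lut_inputs      # a 4-input LUT stores 16 bits
    reachable_bits = 2 ** used_inputs  # 2 used inputs address only 4 of them
    return 1 - reachable_bits / total_bits


def functional_upset_rate(raw_seu_rate: float, derating: float) -> float:
    """Scale the raw CRAM upset rate by the fraction of bits that matter."""
    return raw_seu_rate * derating


print(lut_dont_care_fraction(4, 2))          # share of "don't care" LUT bits
raw_rate = 1000.0                            # hypothetical raw rate in FIT
print(functional_upset_rate(raw_rate, 1 / 8),
      functional_upset_rate(raw_rate, 1 / 3))
```

The two printed rates bracket the functional upset rate implied by the 1/8 to 1/3 range quoted on the slide.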
Structured ASIC: Ultimate SEU Protection • PLD Architecture with ASIC Routing • No Configuration Memory = Estimated SER Is Below Hard Failure Rate for the Device • Diagram: FPGA Compared with Structured ASIC
Summary • SEU Is a Well-Understood Phenomenon • Many Chip-Level Enhancements Mitigate SEUs • Process • Design • Manufacturing Techniques • Easy Detection of SEU Events Is Key • After Detection, Other Methods Must Be Employed to Deal with the Event • Critical Nature of Application Determines Level of SEU Response • Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM