Radiation Effects on FPGA and Mitigation Strategies Bin Gui Experimental High Energy Physics Group Journal Club
Outline
• General introduction of FPGA
• Radiation Effects
• Single Event Upset
• Single Event Functional Interrupt
• Mitigation
• Measurements of SEU and SEFI
• Experiment Set-up
• Heavy Ion Test Results
• Proton Test Results
• Summary
Field Programmable Gate Array
• The field programmable gate array (FPGA) is a semiconductor device that can be programmed after manufacturing. Instead of being restricted to any predetermined hardware function, an FPGA allows you to program product features and functions, adapt to new standards, and reconfigure hardware for specific applications even after the product has been installed in the field, hence the name "field-programmable".
• These devices have always offered significant advantages in flexibility, and recent advances in fabrication have greatly increased logic capacity, substantially increasing the number of applications for this technology.
• FPGAs have been an attractive choice for small-volume instrumentation and control system electronics.
The Uses of FPGA
Sorry, no electricity
FPGA Architecture
• The core of the Xilinx Virtex series FPGA consists of:
  • An array of configurable logic blocks (CLBs), each of which consists of two slices. Each slice contains two 4-input look-up tables for logic generation, two flip-flops, and arithmetic carry and clocking functions.
  • Two columns of dual-port RAM flanking the CLB matrix, divided into 4 Kbit blocks.
  • Input/output blocks along the edges of the device, which support several I/O standards.
• This FPGA is based on SRAM technology and can be reconfigured at will, allowing unmatched flexibility in the face of changing requirements.
• Unfortunately, the increased density (and the corresponding shrinkage of process geometry) has made these devices more susceptible to failure due to external radiation.
Single Event Effects
• Single event effects (SEE): These are not cumulative effects; each one results from a single particle interaction in the silicon. A highly ionizing particle can deposit enough charge locally in the silicon to disturb the function of an electronic circuit.
• Single event upset (SEU): The deposited charge is sufficient to flip the value of a digital signal. SEUs normally refer to bit flips in memory circuits (RAM, latches, and flip-flops) but may in rare cases also directly affect digital signals in logic circuits. This is usually reversible.
• Single event latchup (SEL): A latched change of state of a circuit due to radiation. A power cycle may be needed to reset it.
• Single event burnout (SEB): A destructive failure of a power MOSFET transistor in a high-power application.
• Single event functional interrupt (SEFI): The criterion for a SEFI is that the device requires either a complete reconfiguration or a power cycle before returning to normal operation. SEFIs are rare and are almost never seen in orbit. However, in test environments where event rates are hugely accelerated to obtain statistically significant measurements of events with very small cross sections, SEFIs may be observed.
Mitigation
• Mitigation involves both repairing altered configuration data and designing logic that is resistant to failure.
• Scrubbing refers to the periodic readback of the FPGA's configuration memory, comparison with a known good copy, and write-back of any corrections required. Periodic scrubbing places an upper limit on how long a configuration error can be present in a device.
• Triple Modular Redundancy (TMR), the most widely used technique, is an effective way to create fault-tolerant logic.
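The scrubbing loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the configuration memory is modelled as a Python list of frames, and `scrub`, `config`, and `golden` are hypothetical names, not part of any Xilinx tool or of the tests reported here.

```python
from typing import List


def scrub(config: List[bytes], golden: List[bytes]) -> List[int]:
    """Compare every configuration frame with the known good copy and
    write back the good frame wherever they differ.

    Returns the indices of the frames that were repaired.
    """
    repaired = []
    for index, (frame, good) in enumerate(zip(config, golden)):
        if frame != good:          # readback does not match the golden copy
            config[index] = good   # write back the correction
            repaired.append(index)
    return repaired


# Hypothetical three-frame device with one injected bit flip in frame 1.
golden = [bytes([0xA5, 0x3C]), bytes([0x0F, 0xF0]), bytes([0x00, 0xFF])]
config = list(golden)
config[1] = bytes([0x0F ^ 0x04, 0xF0])  # simulated SEU: a single flipped bit

repaired = scrub(config, golden)
```

If such a pass runs once per scrub period, an upset can persist for at most about one period before it is overwritten, which is the upper limit mentioned in the slide.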
Triple Modular Redundancy
• In TMR, the logic of the design can simply be triplicated, with redundant voters on the output. In order to recover smoothly from logic upsets, the internal state of the design must be restored to the repaired logic.
• In the feedback counter, the state of the counters is obtained from the output of the voters. This always presents the correct state to the counter logic, so the logic is self-restoring in the event of an upset and subsequent repair.
• TMR does not come without a price. Designs are at least 3 times as large as a non-TMR design and suffer from speed degradation as well. In particular, feedback TMR degrades the speed of operation by introducing a longer feedback path that includes the voter. Power consumption is also tripled along with the logic.
• The underlying assumption of TMR is that only one upset will occur within a given logic block. This is not always a good assumption to make: in recent testing, approximately 0.3-0.5% of upsets caused multiple-bit upsets within the device.
Figures: TMR counter; feedback counter with TMR in the feedback path.
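A minimal software sketch of the feedback-TMR counter described above, assuming a bitwise majority voter and an 8-bit counter. The names `vote` and `step` and the counter width are illustrative choices, not taken from the tested design.

```python
from typing import List


def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def step(states: List[int], width: int = 8) -> List[int]:
    """Advance three redundant counters by one clock.

    Every copy computes its next value from the voted state rather than
    from its own register, so a corrupted copy is overwritten on the next clock.
    """
    voted = vote(*states)
    next_value = (voted + 1) % (1 << width)
    return [next_value, next_value, next_value]


# Three copies in agreement, then a simulated upset in one copy.
states = [5, 5, 5]
states[1] ^= 0b100            # single bit flip in the second copy
voted_output = vote(*states)  # the voter still reports the correct count
states = step(states)         # all three copies agree again after one clock
```

The sketch also shows the limit of the scheme: if two copies were upset in the same bit position before the next clock, the voter would follow the wrong majority, which is why multiple-bit upsets matter.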
Measurements of SEU and SEFI
• Experiment Set-up
• Heavy Ion Test Results
• Proton Test Results
FPGAs Used in Measurement
Test Setup in Vacuum
The setup for in-air testing was essentially the same as in vacuum, the main exception being that the adapted connections for getting through the bulkheads were discarded. Also, USB programming cables were used via high-speed hubs for the in-air irradiations.
Latchup Testing – DUT FPGA
• DUT: device under test.
• For the purpose of this experiment, a latchup was defined as any sudden high-current mode resulting from the test run that required a power cycle of the DUT in order to recover.
• Because the bottom of the silicon is solder “bumped” to a fully populated ball-grid package, it is difficult to heat the device enough for latchup testing with an external heating element. To reach the target temperature (near 125°C junction temperature) in vacuum, the devices were configured with a “heater” design (a long shift-register chain of CLB flip-flops) meant to increase dynamic current consumption enough to heat the transistor junctions to the desired temperature.
Latchup Testing – Results
Heavy-Ion Test
• The devices were tested at different angles of incidence over an LET (linear energy transfer) range of 1.2–108.7 MeV·cm²/mg. A combination of degraders and angles was used to achieve higher LET using the same ion. (How?)
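One standard answer to the question above: tilting the device lengthens the ion's path through the thin sensitive layer by 1/cos θ, so the effective LET is usually taken as LET/cos θ, while a degrader lowers the ion energy, which generally raises its LET until the Bragg peak is reached. The sketch below covers only the angle part and assumes the simple cosine law holds; `effective_let` and the example numbers are illustrative, not values from this test.

```python
import math


def effective_let(let_normal: float, tilt_deg: float) -> float:
    """Effective LET (MeV·cm²/mg) for a beam tilted tilt_deg from normal incidence.

    Assumes the cosine law: the path through a thin sensitive volume
    grows as 1 / cos(theta), and the deposited charge grows with it.
    """
    if not 0.0 <= tilt_deg < 90.0:
        raise ValueError("tilt angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(tilt_deg))


# Hypothetical ion with a normal-incidence LET of 10 MeV·cm²/mg.
at_normal = effective_let(10.0, 0.0)   # unchanged at normal incidence
at_60_deg = effective_let(10.0, 60.0)  # doubled, since cos(60°) = 0.5
```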
SEU Results
• The data graphs shown in this report all have two-sigma statistical error bars plotted.
• The static heavy-ion SEU response data set has been fit with a Weibull curve function to facilitate orbital rate calculations. The equation below shows this function:
• The absolute LET threshold extrapolates to about 1 MeV·cm²/mg (or lower) for both the configuration memory and the block memory.
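For orientation, a sketch of the four-parameter Weibull form commonly used for SEE cross-section fits, σ(L) = σ_sat · (1 − exp(−((L − L₀)/W)^S)) for L > L₀ and zero otherwise. This is assumed to be the form meant on the slide; the parameter names and the example values are illustrative and are not the fitted values from these tests.

```python
import math


def weibull_cross_section(let: float, sigma_sat: float, let_threshold: float,
                          width: float, shape: float) -> float:
    """Weibull cross section per bit (cm²/bit) at a given LET (MeV·cm²/mg).

    sigma_sat     : saturation (limiting) cross section
    let_threshold : onset LET L0, below which no upsets occur
    width, shape  : Weibull width W and dimensionless shape S
    """
    if let <= let_threshold:
        return 0.0
    x = (let - let_threshold) / width
    return sigma_sat * (1.0 - math.exp(-(x ** shape)))


# Illustrative parameters only (not measured values).
params = dict(sigma_sat=1e-8, let_threshold=1.0, width=20.0, shape=1.5)
below_threshold = weibull_cross_section(0.5, **params)   # no upsets below onset
one_width_above = weibull_cross_section(21.0, **params)  # sigma_sat * (1 - 1/e)
```

Rate codes such as CREME96 take these four fit parameters as input, which is why the fit is a prerequisite for the orbital rate calculation.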
Single Event Functional Interrupt
• Power-On-Reset (POR) SEFI results in a global reset of all internal storage cells and the loss of all program and state data.
• SelectMAP (SMAP) SEFI is the loss of either read or write capabilities through the SelectMAP port.
• Frame Address Register (FAR) SEFI results in the frame address register incrementing continuously and uncontrollably.
• Global Signal SEFI is separated from other design-disrupting SEFIs for the first time in these tests. These signals include GSR (Global Set/Reset), GWE_B (Global Write Enable), GHIGH_B (Global Drive High), and others. They can all be observed through the status (STAT) register or the control (CTL) register.
• Readback SEFI occurs when a portion of the readback data has been upset and cannot be corrected.
• Scrub SEFI seems to be the result of an upset causing corruption of the data stream being scrubbed into the DUT.
SEFI Results
Proton Test Results - SEU
Proton Test Results - SEFI
CREME96 Calculated Orbital Upset Rates
Summary
• The SEFI cross sections are low enough to be almost academic.
• The space upset rates given in Table 9 are sufficiently low.
• Further study on orbital rate calculation.
• Considering our experiment (Actel).
Reference
• Radiation effects and mitigation strategies for modern FPGAs
• http://parts.jpl.nasa.gov/docs/NEPP07