Single Event Upsets (SEUs), Particularly in Field Programmable Gate Arrays (FPGAs)
Shadab Ambat
Overview
• Introduction
• SEU Effects
• Motivation
• SEUs in FPGAs
• SEUs in the Xilinx Virtex-II Pro
• SEU Mitigation Techniques
• Detection and Mitigation Tools
Introduction
• Single event upsets (SEUs) are radiation-induced errors in microelectronic circuits. They occur when charged particles (usually from the radiation belts or from cosmic rays) lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs.
• In other words, they are changes in state or voltage level caused by a high-energy particle striking a sensitive node in a microelectronic device.
• Typically observed in microprocessors, memory elements, FPGAs, etc.
• Normally occur in space or at high altitudes
• Mainly caused by two types of radiation, the latter being the primary one:
  • Alpha particles
  • High-energy neutrons
SEU Effects
• SEUs are basically soft errors and are non-destructive.
• However, in several cases they can lead to potentially destructive errors such as:
  • Single Event Latchup (SEL): a high operating current that can destroy the device
  • Single Event Functional Interrupt (SEFI): the device goes into a halt or undefined state and must be reset to recover
  • Single Event Burnout (SEB): a high-current state in a power transistor that destroys the device
Motivation
• To counter SEU effects, radiation-hardened chips are used
• Downsides of radiation-hardened chips:
  • They are expensive
  • They consume more power
  • They can be as much as 10 times slower than their equivalent commercial counterparts
• NASA is researching the use of commercial, non-radiation-hardened processors for several space applications instead
• One technique to resolve the problem of SEUs: use three times as many processors and vote on the result (triple modular redundancy, TMR)
• The project is called Dependable Multiprocessor (DM), formerly known as Environmentally Adaptive Fault-Tolerant Computing (EAFTC)
• The proposed solution will be flight-tested on the Space Technology 8 (ST-8) satellite, part of NASA's New Millennium Program
• The ST-8 mission is targeted to launch on 28 February 2009
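The "three processors and a vote" idea can be sketched in a few lines of Python. This is only an illustration of majority voting in general, not the Dependable Multiprocessor implementation; the function name `majority_vote` and the sample values are hypothetical.

```python
from collections import Counter


def majority_vote(results):
    """Return the value that at least two of three replicas agree on.

    `results` holds the outputs of three redundant processors
    (hypothetical values, for illustration only).
    """
    if len(results) != 3:
        raise ValueError("expected exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        # All three replicas disagree, so the voter cannot mask the fault.
        raise RuntimeError("no majority: all three replicas disagree")
    return value


# One replica is upset by an SEU and reports a wrong value;
# the other two outvote it.
voted = majority_vote([42, 42, 7])
print(voted)
```

The voter masks a single faulty replica but cannot decide when all three disagree, which is why that case is raised as an error in this sketch.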
SEUs in FPGAs
• FPGAs consist of static memory elements and a configuration array
• The configuration array is used to program the FPGA and specify its functionality
• SEUs might either:
  • Alter a logic state (bit flip) in a memory element such as a latch or RAM cell
  • Or cause a static upset in the configuration memory
• The latter case can change what the FPGA implements and so lead to a functional error
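To see why a configuration upset is a functional error rather than a transient data error, consider a toy model. The sketch below assumes the logic function is stored as a 4-bit truth table for a 2-input look-up table (LUT); the slides do not describe the configuration layout, so `lut_eval`, `flip_bit`, and `AND_CONFIG` are hypothetical names chosen for illustration.

```python
def lut_eval(config_bits, a, b):
    """Evaluate a 2-input look-up table.

    `config_bits` is a 4-bit truth table: bit number (a << 1 | b)
    holds the output for inputs a and b.
    """
    index = (a << 1) | b
    return (config_bits >> index) & 1


def flip_bit(word, position):
    """Model an SEU by inverting one bit of a stored word."""
    return word ^ (1 << position)


AND_CONFIG = 0b1000                      # output is 1 only for a=1, b=1
upset_config = flip_bit(AND_CONFIG, 1)   # SEU hits the entry for a=0, b=1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut_eval(AND_CONFIG, a, b), lut_eval(upset_config, a, b))
```

After the single bit flip the table no longer implements AND: the output simply follows `b`. The error persists until the configuration is rewritten, which is what makes the upset "static".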
SEUs in the Xilinx Virtex-II Pro
• The Virtex-II Pro is currently proposed for use in the DM module of the ST-8
• Complex System on a Chip (SoC) design
• Main components include an IBM PowerPC, configuration memory and RAM blocks
• A test conducted by scientists from NASA Goddard Space Flight Center showed all of the above to be affected by SEUs
• Configuration memory and the PowerPC had high susceptibility to radiation
• Test conducted on 3 identical boards, each populated with a delidded Virtex-II Pro FPGA
• Device reprograms
• No destructive SEL event observed up to a linear energy transfer (LET) of 53.9 MeV·cm²/mg and a fluence of 10⁷ ions/cm²
• SELs caused cyclical current ramping in the device
• 400,000+ configuration errors recorded during two short runs
• Jumps in the PowerPC instruction set
• PowerPC reset itself twice
• Lost JTAG capability twice during SEL testing
• Overall actions needed to re-establish functionality: Reprogram 70%, Software Reset 28%, Power Cycle 2%
SEU Mitigation Techniques
• Partial Reconfiguration
  • Read back the current configuration
  • Do a bit-by-bit comparison (slower) or a CRC check (faster)
  • Reconfigure the affected bits
• Scrubbing
  • Omit readback and detection
  • Reload the entire bitstream at regular (scrubbing) intervals
• TMR
  • Implement TMR on a single chip by triplicating all logic blocks on that chip
  • Or use 3 chips to implement TMR
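The difference between partial reconfiguration and scrubbing can be sketched as follows. This is a simplified software model, not vendor tooling: it assumes the configuration memory is a list of fixed-size frames held as `bytes`, that a known-good ("golden") copy is available, and it uses `zlib.crc32` as the CRC. The names `frame_crcs`, `partial_reconfigure`, and `scrub` are hypothetical.

```python
import zlib


def frame_crcs(frames):
    """Compute one CRC-32 per configuration frame."""
    return [zlib.crc32(frame) for frame in frames]


def partial_reconfigure(device, golden):
    """Read back, detect upset frames by CRC, rewrite only those frames.

    Returns the indices of the frames that were repaired.
    """
    golden_crcs = frame_crcs(golden)
    readback_crcs = frame_crcs(device)          # readback step
    bad = [i for i, (r, g) in enumerate(zip(readback_crcs, golden_crcs))
           if r != g]                           # detection step
    for i in bad:
        device[i] = golden[i]                   # targeted repair
    return bad


def scrub(device, golden):
    """Skip readback and detection; reload the whole bitstream."""
    device[:] = golden


# Hypothetical 4-frame configuration; an SEU flips one bit in frame 2.
golden = [bytes([i] * 8) for i in range(4)]
device = list(golden)
upset = bytearray(device[2])
upset[0] ^= 0x04
device[2] = bytes(upset)

print(partial_reconfigure(device, golden))
```

Scrubbing needs no detection logic but rewrites everything every interval; partial reconfiguration pays for readback and comparison and then touches only the damaged part.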
Detection and Mitigation Tools
• Xilinx TMRTool (XTMR)
  • Automatically builds TMR methodology into Xilinx FPGA designs
  • Triplicates all inputs, including clocks, and all throughput (combinational) logic
  • Inserts voters on all feedback paths so that state machines remain synchronized, which removes the need to reset to recover from an SEU
  • Major difference from traditional TMR: the voters themselves are triplicated
  • If an SEU occurs in a voter domain, that domain's output is placed in high impedance
  • If an SEU occurs in a voter itself, the worst it does is disable the output of a domain that is behaving correctly
• ZeroSoft's free online soft error detection tool
  • Requires the VHDL and VCD files
  • Gives a free online evaluation
  • Currently works only with Xilinx cell libraries
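Why voters on the feedback path keep triplicated state machines synchronized can be illustrated with a toy model. The sketch below is not XTMR output: it assumes a 4-bit counter triplicated into three domains, with one bitwise majority voter per domain feeding that domain's next-state logic. The high-impedance output behaviour is not modelled, and the names `vote` and `step` are hypothetical.

```python
def vote(a, b, c):
    """Bitwise majority of three integers."""
    return (a & b) | (a & c) | (b & c)


def step(state):
    """Advance a triplicated 4-bit counter by one clock cycle.

    `state` holds the register value of each of the three domains.
    Each domain has its own voter on the feedback path, so every
    domain computes its next state from the voted value rather than
    from its own (possibly upset) register.
    """
    voted = [vote(*state) for _ in range(3)]   # three voters, one per domain
    return [(v + 1) & 0xF for v in voted]


state = [5, 5, 5]
state[1] ^= 0b0100        # SEU flips one bit in domain 1's register
print(state)
state = step(state)       # voters mask the upset on the next clock edge
print(state)
```

Because the corrupted register is outvoted before it feeds back, the three domains agree again after one clock cycle and no reset is needed.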