Single Event Upset Detection in Field Programmable Gate Arrays. Shadab Ambat Masters Thesis Defense Department of Electrical and Computer Engineering University of Kentucky Lexington, KY Feb 1, 2008. Overview. Introduction System Overview Radiation Detectors Single Event Upsets (SEUs)
Single Event Upset Detection in Field Programmable Gate Arrays • Shadab Ambat • Masters Thesis Defense • Department of Electrical and Computer Engineering • University of Kentucky • Lexington, KY • Feb 1, 2008
Overview • Introduction • System Overview • Radiation Detectors • Single Event Upsets (SEUs) • JBits • Test Setup • Conclusion
Thesis Motivation • High-radiation space environment can lead to unexpected abnormalities in spacecraft component behavior • The Single Event Upset (SEU) – a major effect when dealing with electronics particularly FPGAs • Radiation-hardened solutions not without their shortcomings • In order to use commercial-off-the-shelf (COTS) chips their susceptibility to SEUs needs to be studied in detail
Field Programmable Gate Arrays (FPGAs) • Semiconductor devices that can be user-programmed to perform specific logic functions • 2 key elements – Configurable Logic Block (CLB) and a Switch Matrix • CLBs typically made of storage elements, logic gates, look-up-tables etc. • Many FPGAs have advanced structures such as embedded processors • Configuration memory – SRAM, flash or antifuse • Re-programmability feature useful for space applications
The Virtex II • Dedicated 18-bit x 18-bit multiplier blocks • Distributed and Block memory resources • DCMs for clock management • Each CLB made up of 4 identical slices • The XC2V1000 has • Total Slices: 5,120 • CLBs arranged as an array 40 (r) x 32 (c)
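The slice total quoted for the XC2V1000 follows directly from the array dimensions and the slices-per-CLB figure on the slide. A minimal arithmetic check, using only those three numbers (the function and constant names are illustrative):

```python
# Figures taken from the slide: XC2V1000 CLB array and slices per CLB.
CLB_ROWS = 40
CLB_COLS = 32
SLICES_PER_CLB = 4


def total_slices(rows: int, cols: int, slices_per_clb: int) -> int:
    """Return the number of slices in a rows x cols array of CLBs."""
    return rows * cols * slices_per_clb


clbs = CLB_ROWS * CLB_COLS
slices = total_slices(CLB_ROWS, CLB_COLS, SLICES_PER_CLB)
print(f"CLBs: {clbs}, slices: {slices}")
```

The product matches the 5,120 slices listed above.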
SEU Problem • SEU – essentially a change in state occurring when a high-energy particle strikes a sensitive node of a micro-electronic device • Inherently soft errors and non-destructive • Can however trigger events like latch-ups • Can alter the configured design in the case of FPGAs • Rad-hardened versions designed for aerospace applications • Increased protection but device not completely SEU immune • Technologically inferior to contemporary commercial versions
Comparison between Rad-Hardened and Commercial FPGAs (comparison table not reproduced) • Notes: • BG – Standard Ball Grid Array (BGA) • FF – Flip-Chip Fine-Pitch BGA • DCM – Digital Clock Manager • DLL – Delay-Locked Loop • Prices from NuHorizons™ and Avnet™
Objective • Idea is to devise a satellite payload consisting of a COTS FPGA and determine its behavior from an SEU perspective • Section 1: Radiation Detector Module • Select a suitable sensor • Section 2: SEU Module • Detection scheme for monitoring SEU events • Removal through mitigation techniques • Implementation realizable for an embedded platform • Test SEUs on a commercial (Virtex II) FPGA in a software-simulated fault environment
Overview • Introduction • System Overview • Space Radiation Environment • Radiation Detectors • Single Event Upsets (SEUs) • Inducing Factors • Effects on Field Programmable Gate Arrays (FPGAs) • Mitigation Techniques • JBits • XHWIF Interface Implementation • Test Setup • Conclusion • Future Directions
CubeSat Standard • CubeSat – 1 kg, 10 cm cube-shaped satellite • Standard developed at Stanford University and Cal Poly • Double (2u) or triple (3u) configurations also possible • Deployed on the launch vehicle by means of a Poly Picosatellite Orbital Deployer (P-POD) • P-POD acts as an interface, ensures safety when integrating with primary payload • KySat a possible future option
System Overview • Detector Subsection • Radiation detector and supporting components/circuitry • SEU Subsection • Device-under-test will be a Virtex II • Additional PROM to hold the bitstreams • Detection Unit CPU • Radiation measure based on sensor output • Provide an SEU count • Have sufficient memory to run JBits • Flight Computer • System manager (main payload)
Overview • Introduction • System Overview • Space Radiation Environment • Radiation Detectors • Single Event Upsets (SEUs) • JBits • Test Setup • Conclusion
Space Radiation Environment • Low Earth Orbit (LEO): 80 – 2,000 km above the earth’s surface • Medium Earth Orbit (MEO): Between 2,000 and 35,786 km • Geosynchronous Orbit (GEO): Altitude of 35,786 km • Highly Elliptical Orbit (HEO): Low perigee (approx. 1,000 km) and high apogee (35,786 km)
Van Allen Radiation Belts • Two toroidal belts of trapped particles • Inner Belt • 1,000 km – 10,000 km • Predominantly protons • Discovered by Explorer I • Outer Belt • 15,000 km – 30,000 km • Predominantly electrons • Discovered by Pioneer III • Responsible for phenomena such as the aurora borealis • One region particularly intense – the South Atlantic Anomaly (SAA) • Altitude of approx. 500 km • Particle energies lie in range 0.04 MeV – 7 MeV (electrons) and 0.1 MeV – 400 MeV (protons) • Variable through natural (flares) or man-made events (Starfish)
Cosmic Rays • Energetic particles originating from outside the earth’s atmosphere • Discovered in 1912 by Victor Hess • Consist of Galactic Cosmic Rays (GCRs), Anomalous Cosmic Rays and Solar Energetic Particles • GCRs • 90% protons (hydrogen), 9% alpha particles (helium) and 1% ionized nuclei of heavier elements • Most ions from lighter elements, heavier ones (Z>25) comparatively rare • Energies can go well into the GeV range • Supernova remnants believed to participate in energizing process • Highest energy single particle detected had an energy of 3 × 10^20 eV (1991, Fly’s Eye cosmic ray detector) • Some degree of protection through Van Allen belts
Solar Wind • Plasma of electrons, protons and heavy ions originating from corona • Particles having enough thermal energy to escape Solar ‘pull’ • Approx:- • 95% – protons • 5% – helium + lower amounts of oxygen and other elements • Electrons to balance charge • Average speeds of about 400 km/s • Disturbances in particle density (solar events) can cause geomagnetic storms
Solar Events • Solar Flares • Immense explosions due to abrupt energy releases in magnetically active regions around sunspots • Discharge of protons, electrons and heavy ion particles • Increased occurrence during solar maximum • Coronal Mass Ejection (CME) • Ejections of large gas bubbles • Particles accelerated to millions of km/h by shock waves • Factor causing major storms on Earth • Usually (but not necessarily) associated with flares
Geiger-Müller (GM) Tube • Gas-filled detector • Used in Explorer missions • Window usually mica, glass • Anode potential 500 – 900 V • Robust, widely used, cost-effective • Cannot determine energy levels • Thin Wall (Hot Dog-Type) • β, γ detection • End-Window • Window allows α particle detection • Pancake Style • Better sensitivity • Bulky, heavy, fragile
Scintillator Detectors • Work on principle of ‘scintillation’ • Light emitted function of incident particle energy • Material transparent to its own radiation • Organic: Fast response, not very efficient • Liquids • ‘Cocktail’ – primary scintillator (solvent), a fluor, additional emulsifier • Plastics • Polymer base + fluor • Inexpensive, general-purpose, prone to degradation • Crystals • Anthracene has best efficiency • Inorganic: High efficiencies (high-Z), slow response • NaI or CsI with Tl ‘activators’ • NaI(Tl) has high efficiency (3x anthracene), but hygroscopic
Solid-State Detectors • Intrinsic Semiconductors • Semiconductor crystal with applied potential • Energy lost by particle generates charge carriers • ‘Drifting’ (Li) usually required • GaAs and CdZnTe can operate at room temperatures • Diode Arrangement • PIN diodes have excellent resolution • Can detect higher energies due to intrinsic region • Used in missions such as GIOVE-A, GeneSat-1 • Radiation Sensitive FET (RADFET) • Enhancement p-channel MOSFETs • Uses threshold voltage shifts to measure absorbed dose • Active (+ bias) or passive (zero bias)
Overview • Introduction • System Overview • Radiation Detectors • Single Event Upsets (SEUs) • Inducing Factors • Effects on Field Programmable Gate Arrays (FPGAs) • Mitigation Techniques • JBits • Test Setup • Conclusion
Single Event Upsets (SEUs) • Single Event Effect • ‘Any observable or measurable change in state or performance occurring in a microelectronic device, component or system that can be digital, analog or optical, resulting from a single energetic particle strike’ • Single Event Upset • ‘A soft error resulting from a transient signal induced by a single energetic particle strike’ • Occurs when charge deposited exceeds Qcrit • Heavy ions – direct ionization, protons – elastic collisions, spallation • Single Event Effects also include transients, functional interrupts, burnouts, latch-ups, etc. • Other abnormalities • Total Ionizing Dose (TID) Effects • Due to absorbed dose • Long-term • Spacecraft Charging • Buildup of charge • Structural damage
Inducing Factors • Trapped Particles • Principally high-energy protons from inner belt for LEOs • SAA of particular interest • GCR Particles • Major source, especially heavy ions • Highly penetrating, shielding ineffective • Solar Flares • Geomagnetic storms disrupt the earth’s magnetosphere • Facilitate GCR penetration • Atmospheric Neutrons • Ground-level, aircraft altitudes • Radioactive Materials • Radioactive impurities (uranium-238, thorium-232) introduced during manufacture
Effects Pertaining to FPGAs • SRAM-based FPGAs particularly vulnerable • Routing Errors • Open • Bridge • Conflict • Resource-Based (CLB) Errors • LUT • MUX • Initialization
Effects Pertaining to FPGAs • IOB Faults • IOB settings altered • Can invert signal direction (low probability)
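To make the LUT error class listed above concrete, the sketch below models a 4-input LUT as a 16-bit truth table and flips one configuration bit. It is a simplified software model, not the Virtex II bitstream layout; the names `lut_eval`, `flip_bit` and `AND4` are illustrative assumptions.

```python
def lut_eval(init: int, inputs: tuple) -> int:
    """Evaluate a 4-input LUT.

    `init` is the 16-bit truth table; `inputs` is (a0, a1, a2, a3),
    where a0 is the least significant address bit.
    """
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> index) & 1


def flip_bit(init: int, position: int) -> int:
    """Model an upset: invert one truth-table bit."""
    return init ^ (1 << position)


# A 4-input AND: output is 1 only for address 15 (all inputs high).
AND4 = 0x8000

# Upset the truth-table entry for inputs 0000.
upset = flip_bit(AND4, 0)

# Addresses whose output changed because of the upset.
changed = [i for i in range(16) if ((AND4 >> i) & 1) != ((upset >> i) & 1)]
print(changed)
```

A single flipped configuration bit changes the logic function for exactly one input combination, which is why such a fault can stay hidden until that combination is exercised.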
Mitigation Techniques • Technology-Based • Silicon-on-Insulator (SOI) • Complete device isolation • Resistance against SELs, SEUs, TID effects improved • Costly • Epitaxial CMOS Process • Limited latch-up protection • Design-Based • Spatial Redundancy (TMR) • Faulty leg ‘voted’ out • State-dependent logic (counters) need to be restored to correct state • Area penalty • Temporal Redundancy • Protections against transients • Speed penalty • Scrubbing • Detection and correction (at scrub rate) • Frame CRCs • Partial readback/reconfiguration
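The spatial-redundancy bullet (faulty leg 'voted' out) can be sketched as a bitwise majority vote over three copies of a value. This is a generic software illustration of TMR voting, not the thesis implementation; all names are assumptions.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant words."""
    return (a & b) | (a & c) | (b & c)


def faulty_legs(a: int, b: int, c: int) -> list:
    """Return the indices of legs that disagree with the voted result."""
    voted = majority_vote(a, b, c)
    return [i for i, leg in enumerate((a, b, c)) if leg != voted]


golden = 0b1010_1010
leg0 = golden
leg1 = golden ^ 0b0000_1000  # single-bit upset injected in one leg
leg2 = golden

voted = majority_vote(leg0, leg1, leg2)
print(bin(voted), faulty_legs(leg0, leg1, leg2))
```

The voter masks the upset, but the faulty leg still holds the wrong value, which is why state-dependent logic needs to be restored afterwards.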
Overview • Introduction • System Overview • Radiation Detectors • Single Event Upsets (SEUs) • JBits • XHWIF Interface Implementation • Test Setup • Conclusion
JBits • Collection of Java classes for Xilinx devices • Allows access to low-level device resources (CLBs, routing) • Synthesis, bitstream generation/modification very fast • Communicates with board through Xilinx HardWare InterFace (XHWIF) • XHWIF needs to be created in order to be able to use JBits classes for the board
Joint Test Action Group (JTAG) • Utilizes 4 (+1) pins for communicating with/testing devices • TCK (Test Clock) • TMS (Test Mode Select) • TDI (Test Data In) • TDO (Test Data Out) • TRST (Test Reset, optional) • TAP controller used to load instructions/data, perform tests, configure etc. • 16-state finite state machine
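The 16-state TAP controller mentioned above advances on each TCK edge according to TMS. The table below is a sketch of the standard IEEE 1149.1 state diagram as a lookup; the helper names `step` and `walk` are illustrative and are not part of the XHWIF code.

```python
# state -> (next state when TMS=0, next state when TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state: str, tms: int) -> str:
    """Advance the TAP controller by one TCK cycle."""
    return TAP_NEXT[state][tms]


def walk(state: str, tms_bits) -> str:
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# From reset, TMS = 0,1,1,0,0 reaches Shift-IR (instruction loading).
print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

Holding TMS high for five clocks returns the controller to Test-Logic-Reset from any state, which is why the separate TRST pin can be optional.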
XHWIF Implementation • Native methods implemented in C++ • Supplied as a DLL • Daisy-chain arrangement requires BYPASS of PROM • Connect() method • Verifies communication with board through a ‘sanity’ check • Acquires device IDs through CheckIdCode() and verifies
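The device-ID verification step can be illustrated with the standard 32-bit JTAG IDCODE layout: a version field in bits 31:28, a part number in bits 27:12, a manufacturer code in bits 11:1 and a fixed 1 in bit 0. The sketch below is not the thesis's `CheckIdCode()`; the function names and the field values used are hypothetical placeholders.

```python
def pack_idcode(version: int, part: int, manufacturer: int) -> int:
    """Build a 32-bit IDCODE from its fields (bit 0 is always 1)."""
    return (((version & 0xF) << 28) | ((part & 0xFFFF) << 12)
            | ((manufacturer & 0x7FF) << 1) | 1)


def unpack_idcode(idcode: int) -> dict:
    """Split a 32-bit IDCODE into its fields."""
    if (idcode & 1) != 1:
        raise ValueError("bit 0 of a valid IDCODE must be 1")
    return {
        "version": (idcode >> 28) & 0xF,
        "part": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
    }


def id_matches(read: int, expected: int, ignore_version: bool = True) -> bool:
    """Compare IDCODEs, optionally ignoring the silicon version field."""
    mask = 0x0FFFFFFF if ignore_version else 0xFFFFFFFF
    return (read & mask) == (expected & mask)


# Placeholder field values, chosen only for illustration.
expected = pack_idcode(version=0, part=0x1234, manufacturer=0x049)
read_back = pack_idcode(version=2, part=0x1234, manufacturer=0x049)
print(hex(read_back), id_matches(read_back, expected))
```

Masking the version field is a common design choice, since silicon revisions of the same part otherwise fail the comparison.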
Overview • Introduction • System Overview • Radiation Detectors • Single Event Upsets (SEUs) • JBits • Test Setup • Conclusion
SEU Detection • Test SEUs in CLB flip-flops • Shift-register (SFR) slices kept large to maximize utilization • Fed with a ‘checkerboard’ pattern • Fault in any logic leg should be detected • Device Utilization • Slices: 99% • CLB flip-flops: 89%
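One reason an alternating pattern suits this kind of test is that every flip-flop must hold both 0 and 1, so either stuck-at fault shows up at the output. The sketch below is a simplified software model of a shift register fed with a checkerboard, with an optional stuck stage. It is not the thesis design, and all names and sizes are illustrative.

```python
def checkerboard(n: int) -> list:
    """Alternating 0, 1, 0, 1, ... pattern of length n."""
    return [i & 1 for i in range(n)]


def run_shift_register(stream, length, stuck_stage=None, stuck_value=0):
    """Clock `stream` through a `length`-stage shift register.

    Returns the bit seen at the last stage on each clock. If
    `stuck_stage` is given, that flip-flop always holds `stuck_value`.
    """
    stages = [0] * length
    if stuck_stage is not None:
        stages[stuck_stage] = stuck_value
    out = []
    for bit in stream:
        out.append(stages[-1])
        stages = [bit] + stages[:-1]
        if stuck_stage is not None:
            stages[stuck_stage] = stuck_value
    return out


def count_mismatches(stream, out, length):
    """Compare outputs with the input delayed by `length` clocks."""
    return sum(1 for t in range(length, len(stream))
               if out[t] != stream[t - length])


pattern = checkerboard(16)
healthy = count_mismatches(pattern, run_shift_register(pattern, 4), 4)
stuck0 = count_mismatches(pattern, run_shift_register(pattern, 4, 2, 0), 4)
stuck1 = count_mismatches(pattern, run_shift_register(pattern, 4, 2, 1), 4)
print(healthy, stuck0, stuck1)
```

In this model both stuck-at-0 and stuck-at-1 faults corrupt half of the compared output bits, while the healthy register reproduces the delayed pattern exactly.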
Test Procedure • Phase I: Fault-Free • Invoke program with -cvr • Verify methods • Identify bugs • Phase II: Fault Injection • Load device with ‘golden’ bitstream • Identify resources/attributes directly affecting FFs • Use JBits methods to modify target CLB settings • Run-time reconfiguration with faulty bitstream • Mask file used in verification • ‘1’ – User memory, don’t compare • Compare readback and ‘golden’ bitstreams • Record number (and position) of upsets
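The masked comparison in Phase II can be sketched as follows: XOR the readback against the 'golden' data, skip every bit whose mask bit is '1' (user memory, don't compare), and record the remaining differences. This is a minimal model on small byte strings, not the JBits readback API. The function name and the MSB-first bit numbering are assumptions.

```python
def find_upsets(golden: bytes, readback: bytes, mask: bytes) -> list:
    """Return bit positions where readback differs from golden.

    Bits whose mask bit is 1 are skipped. Positions are counted
    MSB-first from the start of the data (an assumed convention).
    """
    if not (len(golden) == len(readback) == len(mask)):
        raise ValueError("golden, readback and mask must be the same length")
    upsets = []
    for byte_index, (g, r, m) in enumerate(zip(golden, readback, mask)):
        diff = (g ^ r) & ~m & 0xFF
        for bit in range(8):
            if diff & (0x80 >> bit):
                upsets.append(byte_index * 8 + bit)
    return upsets


# Made-up example data: byte 2 is fully masked as user memory.
golden = bytes([0xAA, 0x55, 0xFF, 0x00])
readback = bytes([0xAA, 0x54, 0xFE, 0x80])
mask = bytes([0x00, 0x00, 0xFF, 0x00])

upsets = find_upsets(golden, readback, mask)
print(len(upsets), upsets)
```

The difference in the masked byte is ignored, so only two upsets are reported along with their positions.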
Test Results (results table not reproduced) • * JBits did not modify any part of the bitstream when this attribute was changed • ** Depending on current state of flip-flop and position of the set (reset) pulse • *** This attribute resides in user memory of the FPGA bitstream • Notes: • (i) Containing SFR output stuck-at-0 • (ii) Containing SFR output stuck-at-1 • (iii) Corresponding signal out of phase • (iv) None (error effect most likely transient)
Overview • Introduction • System Overview • Radiation Detectors • Single Event Upsets (SEUs) • JBits • Test Setup • Future Directions • Conclusion
Future Directions • SEU testing might be needed in a particle accelerator facility • Analysis for other effects (latch-ups, TID) • Create test cases for other resources such as routing, RAM structures etc. • Run SFRs and pattern generators at different (lower) frequencies • Use partial reconfiguration/readback • Parallel port limits TCK to 260 kHz • Maximum frequency of 33 MHz possible • Create standalone C/C++ program if memory requirement of Java too high • XHWIF code can be reused
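The impact of the TCK limit can be estimated with simple arithmetic: shifting N bits serially takes roughly N / f_TCK seconds. The sketch below uses the two clock rates from the slide and a hypothetical round bitstream size of 4,000,000 bits, which is not a device figure. It ignores protocol and software overhead.

```python
def shift_time_seconds(bits: int, tck_hz: float) -> float:
    """Lower-bound time to shift `bits` through JTAG at `tck_hz`."""
    return bits / tck_hz


BITSTREAM_BITS = 4_000_000  # hypothetical size, for illustration only
TCK_PARALLEL_PORT_HZ = 260e3  # limit quoted on the slide
TCK_MAX_HZ = 33e6  # maximum quoted on the slide

slow = shift_time_seconds(BITSTREAM_BITS, TCK_PARALLEL_PORT_HZ)
fast = shift_time_seconds(BITSTREAM_BITS, TCK_MAX_HZ)
speedup = slow / fast

print(f"{slow:.2f} s at 260 kHz, {fast:.3f} s at 33 MHz, {speedup:.0f}x faster")
```

Under these assumptions a full readback costs on the order of 15 seconds over the parallel port, which helps explain the interest in partial reconfiguration/readback and a faster interface.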
Conclusion • Explored different radiation detector solutions • Described SEUs and their effects from an FPGA perspective • Developed required XHWIF interface for JBits-based test approach • JBits successfully used to simulate and correct SEUs in CLB flip-flops