An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs
Lee W. Lerner and Charles E. Stroud
Based on a presentation at the International Conf. on Embedded Systems & Applications, June 2006
Outline of Presentation
• Motivation and Background
  • Overview of Fail-Silent operation
  • Single Event Upsets (SEUs)
• Fail-Silent Architecture
  • Fault isolation with Guard Bands
• Experimental Implementations
  • Atmel AT94K series SoC
  • Xilinx Virtex-4 series FPGAs
• Triple Modular Redundancy (TMR)
• Summary
• Future Work
VLSI Design & Test Seminar Series, Fall 2006
Motivation and Background
• Fail-Silent operation
  • Halt all operation immediately upon occurrence of a fault
  • Reduces need for periodic off-line system testing
• Single Event Upsets (SEUs)
  • Transient or soft radiation-induced errors in microelectronic devices
  • Known to occur in high-radiation environments such as space
  • Affect FPGA configuration memory
Single Event Upsets (SEUs)
• Energetic particles causing SEUs
  • Galactic cosmic rays
  • Cosmic solar particles influenced by solar flares
  • Trapped protons in radiation belts
Single Event Upsets (SEUs)
• SEU effects on CMOS technology
  • Change logic values of transistors
[Figure: CMOS inverter (V_IN, V_OUT, V_DD, V_SS) and its cross-section (p+ and n+ source/drain regions, n-well, p-type substrate) struck by radiation (proton, ion, neutron, …). An upset occurs if the channel current is turned on; latchup occurs if a parasitic current loop is initiated. Modified from Tribble, A. C., The Space Environment – Implications for Spacecraft Design, 2nd Ed. (Princeton, NJ: Princeton University Press, 2003).]
SEU Effects on an FPGA
[Figure: a configuration memory bit (RAM cell built from coupled inverters, with BIT lines and a word line) controls a Programmable Interconnect Point (PIP) between Wire A and Wire B. In the traditional TMR approach, PIPs connect the routing of multiple modules; a deactivated PIP separates the isolated wire segments of Module 1 and Module 2.]
Fail-Silent Architecture
• Guard band region of isolation
  • Isolate multiple working circuits
  • No single fault can allow interaction between two working circuits
[Figure: Working Region #1 (input set #1, fail-silent output set #1) and Working Region #2 (input set #2, fail-silent output set #2), separated by a guard band with fault monitor circuit.]
Fail-Silent Architecture
• Fault monitoring circuit
  • For each output of independent working regions
    • Pair-wise compare outputs of working regions
    • Tri-state output when any mismatch occurs
    • Initiate processor routine to reconfigure FPGA
[Figure: the outputs from region #1 and region #2 enter PLBs in the guard band (PLBs for fault isolation, fault monitor circuit). The monitor controls a tri-state buffer on the fail-silent output and drives a signal to the processor interrupt.]
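The fault monitor's behavior can be sketched in software. This is a minimal model, not the authors' circuit: the names `HIGH_Z`, `MonitorResult`, and `fault_monitor` are hypothetical, and it assumes that a mismatch on any one output pair silences every output and raises the interrupt. The model is stateless, so it does not latch the fault condition the way a hardware monitor might.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical stand-in for a tri-stated (high-impedance) output pin.
HIGH_Z = None


@dataclass
class MonitorResult:
    outputs: List[Optional[int]]  # per-pin value: 0, 1, or HIGH_Z
    interrupt: bool               # True when the processor should reconfigure


def fault_monitor(region1: Sequence[int], region2: Sequence[int]) -> MonitorResult:
    """Pair-wise compare the outputs of two working regions.

    Assumption: one mismatching pair tri-states all outputs.
    """
    if len(region1) != len(region2):
        raise ValueError("both working regions must drive the same number of outputs")
    mismatch = any(a != b for a, b in zip(region1, region2))
    if mismatch:
        # Fail silent: drive nothing and request reconfiguration.
        return MonitorResult([HIGH_Z] * len(region1), True)
    # Outputs agree, so pass region #1's values through unchanged.
    return MonitorResult(list(region1), False)


if __name__ == "__main__":
    print(fault_monitor([1, 0, 1], [1, 0, 1]))  # outputs agree
    print(fault_monitor([1, 0, 1], [1, 1, 1]))  # second pair disagrees
```

Comparing only two copies detects a fault but cannot tell which region is wrong, which is why the outputs are silenced rather than corrected.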
Atmel AT94K Series Configurable SoC Architecture
[Figures: AT94K SoC architecture; our AT94K demo and development board.]
Atmel AT94K Routing Architecture
[Figure: local routing and global routing at one PLB. Labels: Programmable Interconnect Point (PIP); local ×4 lines and express ×8 lines; repeaters, with spans of 4 PLBs and 8 PLBs marked; horizontal repeaters in global routing; local cross-point PIPs; guard band PLBs.]
Guard Band Implementation in AT94K
• 80-bit LFSR system functions
• 4 PLB wide guard band region
• Fault monitor circuit in guard band region
[Figure: layout with the System Function and Fault Monitor regions labeled.]
Guard Band Implementation in AT94K
[Figure: layout with the System Function and Fault Monitor regions labeled.]
Basic Virtex-4 Architecture
• PIPs and Routing resources
  • 4 types of PIPs
  • Double lines (x2 lines) span 2 PLBs
  • Hex lines (x6 lines) span 6 PLBs
  • Long lines span width and length of PLB array
• Horizontal guard bands work best with Virtex-4 architecture
[Figure legend: DSPs (32-512), PowerPCs (0-2), block RAMs (36-552), PLBs (1,368-22,272).]
Guard Band Implementation in Virtex-4
• Xilinx ISE: constraints in PACE and routing in FPGA Editor
• Two 5-bit LFSR system functions
• 6 PLB wide guard band region with fault monitoring circuit
[Figure: two layouts, each showing System Function regions separated by a Guard Band that contains the Fault Monitor, with I/O buffers. Each PLB has 4 slices.]
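For readers unfamiliar with the system function used in these experiments, a 5-bit LFSR can be sketched as below. The slides do not give the feedback polynomial, so the tap positions (5, 3) are an assumption chosen because they yield a maximal-length sequence; `lfsr_step` and `lfsr_period` are hypothetical helper names.

```python
from typing import Tuple


def lfsr_step(state: int, width: int = 5, taps: Tuple[int, ...] = (5, 3)) -> int:
    """Advance a Fibonacci LFSR by one clock.

    Taps are 1-indexed bit positions; (5, 3) is an assumed choice,
    not a value taken from the presentation.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    # Shift left, insert the feedback bit, and keep only `width` bits.
    return ((state << 1) | feedback) & ((1 << width) - 1)


def lfsr_period(seed: int, width: int = 5, taps: Tuple[int, ...] = (5, 3)) -> int:
    """Count clocks until the register returns to its seed value."""
    if seed == 0:
        raise ValueError("an all-zero seed locks up an XOR-feedback LFSR")
    state, steps = lfsr_step(seed, width, taps), 1
    while state != seed:
        state = lfsr_step(state, width, taps)
        steps += 1
    return steps


if __name__ == "__main__":
    print(lfsr_period(0b00001))  # visits every non-zero 5-bit state once
```

Two identical LFSRs started from the same seed produce identical outputs on every clock, which makes them a convenient pair of working regions: any disagreement between them points to a fault rather than to the design.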
74-bit LFSR Implementation
[Figure: two layouts, each showing System Function regions separated by a Guard Band that contains the Fault Monitor, with I/O buffers. Each PLB has 4 slices.]
Triple Modular Redundancy (TMR) Implementations in FPGAs
• Traditional TMR SEU susceptibility problem
  • Wire segments from a PIP can access multiple modules
  • Therefore, 1 fault can destroy fault-tolerance
  • Special place and route algorithms needed to avoid problem
• TMR fault isolation with guard band regions
  • Guard bands isolate module components and routing
[Figure: traditional TMR, where a deactivated PIP separates the isolated wire segments of Modules 1, 2, and 3 feeding a Majority Voter, compared with Modules 1, 2, and 3 separated by Guard Bands feeding a Majority Voter.]
Traditional TMR Implementation in AT94K
[Figure: mixed routing of 3 different system functions (System Function A, B, and C).]
TMR Implementation in AT94K
[Figure: System Functions A, B, and C with the Majority Voter Circuit.]
Fault Injection Results
• TMR - Pass 1:
  • No fault injection
  • Majority Voter Passes (√)
• TMR - Pass 2:
  • Module 1 injected with fault
  • Majority Voter Passes (√)
• TMR - Pass 3:
  • Modules 1 & 3 injected with faults
  • Majority Voter Fails (×)
[Figure: AVR fault injection into Modules 1, 2, and 3, which are separated by Guard Bands and feed the Majority Voter, shown once per pass.]
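The three passes above follow directly from how a 2-of-3 majority voter works, as this small sketch shows. It is an illustration only: `majority` and `run_pass` are hypothetical names, the 4-bit `golden` value is arbitrary, and the sketch assumes every faulty module flips the same output bit (faults that corrupt different bits in different modules could still be outvoted bit by bit).

```python
from typing import Set


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three module outputs."""
    return (a & b) | (a & c) | (b & c)


def run_pass(golden: int, faulty_modules: Set[int], flip_mask: int = 0b1) -> bool:
    """Return True if the voted output equals the fault-free value.

    Modules are numbered 1 to 3; a faulty module's output is the golden
    value with the bits in `flip_mask` inverted (an assumed fault model).
    """
    outputs = [
        golden ^ flip_mask if module in faulty_modules else golden
        for module in (1, 2, 3)
    ]
    return majority(*outputs) == golden


if __name__ == "__main__":
    golden = 0b1011
    print(run_pass(golden, set()))    # Pass 1: no fault injected
    print(run_pass(golden, {1}))      # Pass 2: Module 1 faulty
    print(run_pass(golden, {1, 3}))   # Pass 3: Modules 1 and 3 faulty
```

The voter masks any single faulty module but is outvoted once two modules agree on a wrong value, which is why a single fault that reaches two modules through shared routing defeats traditional TMR.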
Fault Injection Results
• Guard Band:
  • Injected 240 faults at edge of guard band with no failure
  • Multiple specific faults required to cause failure
[Figure: AT94K routing at the guard band PLBs, showing local cross-point PIPs, repeaters, local ×4 lines, and express ×8 lines.]
Summary
• Guard Band regions for FPGAs
  • Isolate multiple working regions that contain functionally equivalent system functions
• Fault monitoring circuits within guard bands
  • Monitor and compare working region outputs
  • Tri-state outputs when a mismatch occurs
• Fail-Silent operation
  • Halt operation immediately upon occurrence of a fault
  • Area overhead only 2x that of non-fault-tolerant circuit
  • Use with TMR to achieve fault-tolerance
• Single Event Upsets (SEUs)
  • Architecture provides immediate indication to initiate scrubbing of the configuration memory