
An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs

Lee W. Lerner and Charles E. Stroud. Based on a presentation at the International Conf. on Embedded Systems & Applications, June 2006.


Presentation Transcript


  1. An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs
     Lee W. Lerner and Charles E. Stroud
     Based on a presentation at the International Conf. on Embedded Systems & Applications, June 2006

  2. Outline of Presentation
     • Motivation and Background
       • Overview of Fail-Silent operation
       • Single Event Upsets (SEUs)
     • Fail-Silent Architecture
       • Fault isolation with Guard Bands
     • Experimental Implementations
       • Atmel AT94K series SoC
       • Xilinx Virtex-4 series FPGAs
     • Triple Modular Redundancy (TMR)
     • Summary
     • Future Work
     VLSI Design & Test Seminar Series, Fall 2006

  3. Motivation and Background
     • Fail-Silent operation
       • Halt all operation immediately upon occurrence of a fault
       • Reduces need for periodic off-line system testing
     • Single Event Upsets (SEUs)
       • Transient or soft radiation-induced errors in microelectronic devices
       • Known to occur in high-radiation environments such as space
       • Affect FPGA configuration memory

  4. Single Event Upsets (SEUs)
     • Energetic particles causing SEUs
       • Galactic cosmic rays
       • Cosmic solar particles influenced by solar flares
       • Trapped protons in radiation belts

  5. Single Event Upsets (SEUs)
     • SEU effects on CMOS technology
       • Change logic values of transistors
     [Figure: CMOS inverter schematic (Vin, Vout, VDD, VSS) struck by radiation (proton, ion, neutron, …), with a cross-section showing gate, source, and drain regions, the n-well, and the p-type substrate. Annotations: "Upset occurs if channel current turned on"; "Latchup occurs if parasitic current loop initiated".]
     Modified from Tribble, A. C., The Space Environment – Implications for Spacecraft Design, 2nd Ed. (Princeton, NJ: Princeton University Press, 2003).

  6. SEU Effects on an FPGA
     [Figure: panels "Configuration Memory Bit" (coupled inverters with BIT, BIT, and word lines), "Programmable Interconnect Point (PIP)" (RAM cell, Wire A, Wire B), and "PIP Connecting the Routing of Multiple Modules", labeled "Traditional TMR Approach" (Module 1, Module 2, deactivated PIP, isolated wire segments).]

  7. Fail-Silent Architecture
     • Guard band region of isolation
       • Isolate multiple working circuits
       • No single fault can allow interaction between two working circuits
     [Figure: Working Region #1 and Working Region #2, each with its own input set and fail-silent output set, separated by a guard band with fault monitor circuit.]

  8. Fail-Silent Architecture
     • Fault monitoring circuit
       • For each output of independent working regions
       • Pair-wise compare outputs of working regions
       • Tri-state output when any mismatch occurs
       • Initiate processor routine to reconfigure FPGA
     [Figure: guard band with fault monitor circuit. Labels: output from region #1, output from region #2, PLBs for fault isolation, tri-state buffer, fail-silent output, to processor interrupt.]
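The monitoring behavior listed on this slide can be sketched as a small behavioral model. This is only an illustration in Python, not the authors' circuit: the names `FaultMonitor`, `MonitorResult`, and `HIGH_Z` are hypothetical, and the latching of the fault state (staying silent until an explicit reset after reconfiguration) is an assumption that the slides do not state.

```python
from dataclasses import dataclass
from typing import Sequence

HIGH_Z = None  # stand-in for a tri-stated (undriven) output pin


@dataclass
class MonitorResult:
    outputs: list      # driven values, or HIGH_Z on every pin after a fault
    interrupt: bool    # True when the processor should be asked to reconfigure


class FaultMonitor:
    """Behavioral model of a pair-wise comparator for two working regions."""

    def __init__(self) -> None:
        # Assumption: a mismatch is latched, so the outputs stay silent
        # until reset() is called after the device has been reconfigured.
        self.tripped = False

    def reset(self) -> None:
        self.tripped = False

    def clock(self, region1: Sequence[int], region2: Sequence[int]) -> MonitorResult:
        if len(region1) != len(region2):
            raise ValueError("both regions must expose the same number of outputs")
        # Pair-wise compare each output of region #1 with its twin in region #2.
        if any(a != b for a, b in zip(region1, region2)):
            self.tripped = True
        if self.tripped:
            # Fail silent: release every output and raise the interrupt.
            return MonitorResult([HIGH_Z] * len(region1), interrupt=True)
        return MonitorResult(list(region1), interrupt=False)
```

While both regions agree, the monitor passes region #1's values through; after the first mismatch every output reads `HIGH_Z` and `interrupt` stays asserted until `reset()` is called.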

  9. Atmel AT94K Series Configurable SoC Architecture
     [Figures: "AT94K SoC Architecture"; "Our AT94K Demo & Development Board".]

  10. Atmel AT94K Routing Architecture
      [Figure: panels "Local Routing", "Global Routing (1 PLB)", and "Horizontal Repeaters in Global Routing". Labels: Programmable Interconnect Point (PIP) legend; local x4 lines; express x8 lines; repeaters; 4 PLBs; 8 PLBs; local cross-point PIPs; guard band PLBs.]

  11. Guard Band Implementation in AT94K
      • 80-bit LFSR system functions
      • 4 PLB wide guard band region
      • Fault monitor circuit in guard band region
      [Figure: layout with labels System Function, Fault Monitor.]

  12. Guard Band Implementation in AT94K
      [Figure: layout with labels System Function, Fault Monitor.]

  13. Basic Virtex-4 Architecture
      • PIPs and Routing resources
        • 4 types of PIPs
        • Double lines (x2 lines) span 2 PLBs
        • Hex lines (x6 lines) span 6 PLBs
        • Long lines span width and length of PLB array
      Horizontal guard bands work best with Virtex-4 architecture
      [Figure legend: DSPs (32-512), PowerPCs (0-2), block RAMs (36-552), PLBs (1,368-22,272).]

  14. Guard Band Implementation in Virtex-4
      • Xilinx ISE: constraints in PACE and routing in FPGA Editor
      • Two 5-bit LFSR system functions
      • 6 PLB wide guard band region with fault monitoring circuit
      [Figure: layout with labels PLB w/ 4 slices, System Function, Guard Band, Fault Monitor, I/O buffer.]
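To make the "two 5-bit LFSR system functions" concrete, the sketch below runs two identical LFSR copies in lockstep and reports the first cycle on which they disagree. It is a software illustration under stated assumptions: the slides do not give the feedback polynomial, so the taps used here (bits 4 and 2, a standard maximal-length choice for 5 bits) are assumed, and the upset is modeled as a bit flip in one copy's register state rather than in configuration memory. The function names are hypothetical.

```python
def lfsr5_step(state: int) -> int:
    """Advance a 5-bit Fibonacci LFSR by one clock (assumed taps: bits 4 and 2)."""
    feedback = ((state >> 4) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0x1F


def lfsr5_period(seed: int = 0b00001) -> int:
    """Count clocks until the register returns to its seed."""
    state, count = lfsr5_step(seed), 1
    while state != seed:
        state, count = lfsr5_step(state), count + 1
    return count


def first_mismatch_cycle(seed: int, fault_cycle: int, fault_mask: int, cycles: int = 64):
    """Run two copies in lockstep and XOR fault_mask into copy B at fault_cycle.

    Returns the first cycle at which the two states differ, or None if they
    never differ within the given number of cycles.
    """
    a = b = seed
    for cycle in range(cycles):
        if cycle == fault_cycle:
            b ^= fault_mask  # model an upset in one working region
        if a != b:
            return cycle
        a, b = lfsr5_step(a), lfsr5_step(b)
    return None
```

With these taps any nonzero seed cycles through all 31 nonzero states, and a comparison of the two registers flags the injected fault on the same cycle it occurs.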

  15. 74-bit LFSR Implementation
      [Figure: layout with labels PLB w/ 4 slices, System Function, Guard Band, Fault Monitor, I/O buffer.]

  16. Triple Modular Redundancy (TMR) Implementations in FPGAs
      • Traditional TMR SEU susceptibility problem
        • Wire segments from a PIP can access multiple modules
        • Therefore, 1 fault can destroy fault-tolerance
        • Special place and route algorithms needed to avoid problem
      • TMR fault isolation with guard band regions
        • Guard bands isolate module components and routing
      [Figure: two TMR layouts. Labels: Module 1, Module 2, Module 3, Majority Voter, Guard Bands, deactivated PIP, isolated wire segments.]

  17. Traditional TMR Implementation in AT94K
      [Figure: layout showing mixed routing of 3 different system functions (System Function A, System Function B, System Function C).]

  18. TMR Implementation in AT94K
      [Figure: layout with labels System Function A, System Function B, System Function C, Majority Voter Circuit.]

  19. Fault Injection Results
      • TMR - Pass 1:
        • No fault injection
        • Majority Voter Passes (√)
      • TMR - Pass 2:
        • Module 1 injected with fault
        • Majority Voter Passes (√)
      • TMR - Pass 3:
        • Modules 1 & 3 injected with faults
        • Majority Voter Fails (×)
      [Figure: AVR fault injection. For each pass: Module 1, Module 2, and Module 3 separated by Guard Bands and feeding a Majority Voter.]
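The three passes can be reproduced with a minimal software model of a 2-of-3 majority voter. This is a sketch, not the AVR-driven experiment: `majority` and `run_pass` are hypothetical names, faults are modeled as XOR masks on a module's output word, and pass 3 assumes both faulty modules corrupt the same bit. With a bitwise voter, two faults that hit different bits would still be outvoted.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three module output words."""
    return (a & b) | (a & c) | (b & c)


def run_pass(golden: int, fault_masks: tuple) -> bool:
    """Return True if the voted output still equals the fault-free value.

    fault_masks holds one XOR mask per module; 0 means no fault injected.
    """
    m1, m2, m3 = (golden ^ mask for mask in fault_masks)
    return majority(m1, m2, m3) == golden


golden = 0b1011
pass1 = run_pass(golden, (0, 0, 0))            # no fault injected
pass2 = run_pass(golden, (0b0100, 0, 0))       # fault in module 1 only
pass3 = run_pass(golden, (0b0100, 0, 0b0100))  # same fault in modules 1 and 3
```

Under these assumptions `pass1` and `pass2` are `True` and `pass3` is `False`, matching the pass, pass, fail pattern on the slide.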

  20. Fault Injection Results
      • Guard Band:
        • Injected 240 faults at edge of guard band with no failure
        • Multiple specific faults required to cause failure
      [Figure: AT94K routing with labels local cross-point PIPs, repeaters, local x4, express x8, guard band PLBs.]

  21. Summary
      • Guard Band regions for FPGAs
        • Isolate multiple working regions that contain functionally equivalent system functions
      • Fault monitoring circuits within guard bands
        • Monitor and compare working region outputs
        • Tri-state outputs when a mismatch occurs
      • Fail-Silent operation
        • Halt operation immediately upon occurrence of a fault
        • Area overhead only 2x that of non-fault-tolerant circuit
        • Use with TMR to achieve fault-tolerance
      • Single Event Upsets (SEUs)
        • Architecture provides immediate indication to initiate scrubbing of the configuration memory
