SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA Mandy M. Wang JPL R&TD Mobility Avionics D/MAPLD 2004
Agenda • Project Background • SEU Sensitive Areas and Mitigation Approaches • Design Details • Conclusion
Project Objective: The Mobility Avionics project aims to develop an embedded platform for space flight instruments and systems that is scalable, configurable, and capable of withstanding low to medium radiation environments.
Multi-Tiered Strategy [Figure: applications (Science Data Processor, Orbiter Command Data Handler, Image Processor, Micro-Mobility Controller, Motor Control, EDL Controller, Ground Support Equipment) placed against the Simple Strategy, Robust Strategy, and Always Available Strategy, with axes labeled Time Critical / Not Time Critical and Mission Critical / Not Mission Critical.] Low to medium radiation tolerance is assumed.
Strategies • Simple Strategy: A quick-and-dirty approach. It uses less-than-desirable techniques such as device reset and reconfiguration as a means of error correction. It may require an external computer for the configuration check. • Robust Strategy: A refinement of the simple strategy. It uses an SEU-immune FPGA as a monitoring device for the system board based on the Xilinx FPGA device. As a result, no external computer is needed.
SEU Sensitive Areas • Xilinx Virtex-II Pro SEU sensitive areas include: • PPC405 core registers • Configuration memory (LUT equations and routing) • Data path registers • User memory (Block or Distributed RAMs) [Chart: normalized data for the XC2VP20, based on predicted upset rates]
Mitigation Approaches
System Design - Overview [Block diagram labels: PPC405 1 and PPC405 2 with comparator (C); OCM BRAM (8K) with EDC; PLB BRAMs (Firmware) (32K) with EDC; Status BRAMs (4K); DDR SDRAM controller with EDC and EXT MEM (128MB); EDC Controller; PLB ARB and OPB ARB; PLB2OPB Bridge; UARTs; Crit. INTC and Non-Crit. INTC (External Devices); Serial Port Decoder (injects fault signals); FI marks the fault insertion points.]
Dual-Processor Comparator [Block diagram labels: PPC 405 Block 1 and PPC 405 Block 2, each containing Cache Units, MMU, CPU, and Timers and Debug; a PLB IPIF per block; comparator (C); Arbiter; PLB Bus; DDR SDRAM Controller; External SDRAM in the Off Chip Area; multiple FI points.] Note: Yellow lines: PLB master read/write signals for D-Cache. Green lines: PLB master read signals for I-Cache. FI: fault insertion point. PC: parity check.
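The comparator (C) on this slide checks the two PPC405 blocks against each other. The following is a minimal Python sketch of that idea, assuming the check is a cycle-by-cycle equality test on each core's PLB master outputs. The `BusSample` fields and the `compare_lockstep` name are hypothetical and are not taken from the design.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BusSample:
    """One cycle of bus outputs from a core (field names are hypothetical)."""
    address: int
    write_data: int
    write_enable: bool


def compare_lockstep(core1: Iterable[BusSample],
                     core2: Iterable[BusSample]) -> Optional[int]:
    """Return the first cycle index where the two cores disagree, else None."""
    for cycle, (a, b) in enumerate(zip(core1, core2)):
        if a != b:  # dataclass equality compares every field
            return cycle
    return None
```

With only two cores, a comparator of this kind can detect a disagreement but cannot tell which core is wrong, which is consistent with the mitigation state machine listing a CPU mismatch as a CPU reset trigger rather than as a corrected error.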
Dual-Processor Voting Simulation
EDAC OCM BRAMs (Read/Write) • Hamming Code [32,39]: 32 data bits plus 7 parity bits • Read-modify-write to support the byte enable feature • Error information is stored in a separate memory space • Single-bit error triggers a CPU interrupt • Double-bit error triggers a CPU reset [Block diagram labels: PPC405 #1 and PPC405 #2; Glue Logic; Parity Encoder (ENCIN, ENOUT, 32-bit data, 7-bit PARITY_OUT); BRAMs (8KB) with ADDR, EN, W_EN[3:0], CLK; Error Detection Correction (DECIN, DECOUT, 32-bit data, 7-bit PARITY_IN); Data Out, 32 bits (parity bits discarded); FORCE ERROR; ERROR.] Reference: Xilinx XAPP645
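A 39-bit codeword for 32 data bits matches a single-error-correcting, double-error-detecting (SEC-DED) extended Hamming code: 6 Hamming parity bits plus 1 overall parity bit. The Python sketch below uses the textbook bit layout (parity bits at power-of-two positions, overall parity in bit 0). That layout is an assumption and may differ from the one in XAPP645 or in the actual design. The function names are illustrative.

```python
from typing import List, Tuple


def _layout(data_bits: int) -> Tuple[int, List[int]]:
    """Return (highest position n, Hamming parity positions) for a data width."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return data_bits + r, [1 << i for i in range(r)]


def _syndrome(code: int, n: int) -> int:
    """XOR of the positions (1..n) of all set bits; zero for a valid codeword."""
    s = 0
    for pos in range(1, n + 1):
        if (code >> pos) & 1:
            s ^= pos
    return s


def encode(data: int, data_bits: int = 32) -> int:
    """Encode data into an extended Hamming codeword (bit 0 = overall parity)."""
    n, parity_pos = _layout(data_bits)
    code, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1) == 0:  # power of two: reserved for a parity bit
            continue
        if (data >> bit) & 1:
            code |= 1 << pos
        bit += 1
    s = _syndrome(code, n)
    for p in parity_pos:  # set parity bits so the syndrome becomes zero
        if s & p:
            code |= 1 << p
    if bin(code).count("1") % 2:  # overall parity makes the total weight even
        code |= 1
    return code


def decode(code: int, data_bits: int = 32) -> Tuple[int, str]:
    """Return (data, status); status is "ok", "single", or "double"."""
    n, _ = _layout(data_bits)
    s = _syndrome(code, n)
    odd = bin(code).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s <= n:
        code ^= 1 << s  # s == 0 means the overall parity bit itself flipped
        status = "single"
    else:
        status = "double"  # detected but uncorrectable
    data, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1) == 0:
            continue
        if (code >> pos) & 1:
            data |= 1 << bit
        bit += 1
    return data, status
```

Calling `encode(word, 64)` yields a codeword of at most 72 bits, the size listed for the PLB BRAM and DDR SDRAM paths. In the design described here, a "single" result would correspond to the CPU interrupt and a "double" result to the CPU reset.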
EDAC PLB BRAMs (Read Only) • Hamming Code [64,72]: 64 data bits plus 8 parity bits • Read-modify-write to support the byte enable feature • Single-bit error information is stored in a separate memory space • Single-bit error triggers a CPU interrupt • Double-bit error triggers a device reconfiguration [Block diagram labels: Processor Local Bus; PLB Interface; PLB BRAM Controller; Glue Logic; Parity Encoder (ENCIN, ENOUT, 64-bit data, 8-bit PARITY_OUT); BRAMs (32KB + 8KB) with ADDR, EN, W_EN, CLK; Error Detection Correction (DECIN, DECOUT, 64-bit data, 8-bit PARITY_IN); Data Out, 64 bits (parity bits discarded); FORCE ERROR (2 bits); ERROR (2 bits).] Reference: Xilinx XAPP645
EDAC DDR SDRAM • Hamming Code [64,72]: 64 data bits plus 8 parity bits • Read-modify-write to support the byte enable and 2-word burst features • Single-bit error information is stored in a separate memory space • Single-bit error triggers a CPU interrupt • Double-bit error triggers a device reconfiguration [Block diagram labels: Processor Local Bus; PLB interface modules; DDR SDRAM Controller; Glue Logic; Parity Encoder (ENCIN, ENOUT, 64-bit data, 8-bit PARITY_OUT); Mux (64 to 32 data, 8 to 4 parity); DDR SDRAM (128MB + 32MB) with ADDR, CLK, CLKn; Demux (32 to 64 data, 4 to 8 parity); Error Detection Correction (DECIN, DECOUT, 64-bit data, 8-bit PARITY_IN); Data Out, 64 bits (parity bits discarded); FORCE ERROR (2 bits); ERROR (2 bits).] Reference: Xilinx XAPP645
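The EDAC slides list read-modify-write as the way byte enables are supported. Because the check bits cover the whole word, a partial write has to read the stored word, merge in only the enabled bytes, and recompute the check bits before writing back. The Python sketch below illustrates that sequence under stated assumptions: the memory is a dictionary, byte lane 0 is the least significant byte, `xor_check` is a stand-in and not the Hamming code, and correction of the word read back is omitted. All names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Memory = Dict[int, Tuple[int, int]]  # address -> (data word, check bits)


def merge_bytes(old: int, new: int, byte_enable: int,
                width_bytes: int = 8) -> int:
    """Replace only the bytes of `old` whose byte-enable bit is set."""
    mask = 0
    for lane in range(width_bytes):
        if (byte_enable >> lane) & 1:
            mask |= 0xFF << (8 * lane)
    return (old & ~mask) | (new & mask)


def xor_check(word: int) -> int:
    """Stand-in check code (XOR of all bytes); NOT the Hamming code."""
    check = 0
    while word:
        check ^= word & 0xFF
        word >>= 8
    return check


def rmw_write(mem: Memory, addr: int, new: int, byte_enable: int,
              check: Callable[[int], int] = xor_check) -> None:
    """Read the stored word, merge enabled bytes, re-encode, write back."""
    old, _old_check = mem.get(addr, (0, check(0)))
    merged = merge_bytes(old, new, byte_enable)
    mem[addr] = (merged, check(merged))
```

For example, writing `0xAAAA` with `byte_enable=0b00000011` over a stored `0x1122334455667788` leaves `0x112233445566AAAA` in memory with freshly computed check bits.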
Self Configuration Checker [Block diagram labels: Digital Design; Implementation; top.bit; top.ll (contains the frame addresses used for the design); C script; frame address data formatted for BRAMs; Frame Address Memory (BRAMs); 4 Bytes (BRAMs); Read Back Commands (44 Bytes); ICAP Controller; ICAP; CRC Checker; Virtex-II Pro.] This portion can be ported to a radiation-hardened FPGA in the case of the robust strategy.
Self Configuration Checker Design Highlights • No external I/O access required • Frame-by-frame readback required • 32-bit CRC algorithm implemented (a CRC signature is generated after device power-up) • No SRL16s or Distributed SelectRAMs used in the design
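The frame-by-frame readback with a 32-bit CRC signature can be sketched in Python as follows. This is an illustration only: the slides do not state which CRC polynomial is used, so the standard CRC-32 from Python's `zlib` module stands in for it, and the frame contents and function names are hypothetical.

```python
import zlib
from typing import Iterable


def crc_signature(frames: Iterable[bytes]) -> int:
    """Fold every readback frame into one running 32-bit CRC."""
    crc = 0
    for frame in frames:
        crc = zlib.crc32(frame, crc)  # continue from the previous frame's CRC
    return crc


def configuration_ok(frames: Iterable[bytes], golden: int) -> bool:
    """Compare a fresh readback signature with the one taken after power-up."""
    return crc_signature(frames) == golden
```

The golden signature would be computed once after power-up, and each later readback pass would be compared against it. A mismatch corresponds to the "configuration check fail" trigger in the mitigation state machine.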
LabVIEW Fault Injection Panel Screenshot of the fault injection emulator that interfaces with the prototype board. [Screenshot callouts: Process Bus Fault Injection Buttons; ASCII Command Input window; Fault Injection Error Counters; Processors Mismatch LED Indicator; Fault location map.] The program counter resets to zero when a CPU reset occurs.
XC2VP20 Device Utilization (without TMR)
Resource         Used   Available   Utilization
External IOBs      57         564           10%
PPC405s             2           2          100%
RAMB16s            30          88           34%
SLICEs           4334        9280           46%
BUFGMUXs            6          16           37%
DCMs                2           8           25%
ICAPs               1           1          100%
JTAGPPCs            1           1          100%
Slice Utilization (without TMR) [Chart not recovered.] Note: The shaded modules can be replaced by other approaches.
Mitigation State Machine (states listed in order of increasing mitigation severity, starting from Normal)
• CPU Interrupt: 1) OCM BRAM single-bit error 2) PLB BRAM single-bit error 3) DDR SDRAM single-bit error
• CPU Reset: 1) CPU mismatch 2) CPU watchdog timer 3) OCM EDC double-bit error
• Transition condition: CPU reset counter == full
• System Reset: 1) OPB Bus error 2) PLB Bus error
• Transition condition: System reset counter == full
• FPGA Reconfiguration: 1) Configuration check fail 2) PLB EDC double-bit error 3) DDR SDRAM double-bit error
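The escalation logic on this slide can be sketched as a small Python state machine. The trigger-to-action mapping follows the slide. Three things are assumptions made for illustration: that a full CPU reset counter escalates to a system reset, that a full system reset counter escalates to an FPGA reconfiguration, and that both counters are cleared on reconfiguration. The counter limits and all identifiers are hypothetical.

```python
from enum import IntEnum


class Action(IntEnum):
    """Mitigation actions in order of increasing severity."""
    CPU_INTERRUPT = 1
    CPU_RESET = 2
    SYSTEM_RESET = 3
    FPGA_RECONFIG = 4


TRIGGERS = {
    "ocm_single_bit": Action.CPU_INTERRUPT,
    "plb_bram_single_bit": Action.CPU_INTERRUPT,
    "ddr_single_bit": Action.CPU_INTERRUPT,
    "cpu_mismatch": Action.CPU_RESET,
    "cpu_watchdog": Action.CPU_RESET,
    "ocm_double_bit": Action.CPU_RESET,
    "opb_bus_error": Action.SYSTEM_RESET,
    "plb_bus_error": Action.SYSTEM_RESET,
    "config_check_fail": Action.FPGA_RECONFIG,
    "plb_double_bit": Action.FPGA_RECONFIG,
    "ddr_double_bit": Action.FPGA_RECONFIG,
}


class Mitigator:
    """Tracks reset counters and escalates when a counter is full."""

    def __init__(self, cpu_reset_limit: int = 3, system_reset_limit: int = 3):
        self.cpu_reset_limit = cpu_reset_limit        # assumed value
        self.system_reset_limit = system_reset_limit  # assumed value
        self.cpu_resets = 0
        self.system_resets = 0

    def handle(self, event: str) -> Action:
        action = TRIGGERS[event]
        if action == Action.CPU_RESET:
            self.cpu_resets += 1
            if self.cpu_resets >= self.cpu_reset_limit:
                self.cpu_resets = 0
                action = Action.SYSTEM_RESET  # assumed escalation target
        if action == Action.SYSTEM_RESET:
            self.system_resets += 1
            if self.system_resets >= self.system_reset_limit:
                self.system_resets = 0
                action = Action.FPGA_RECONFIG  # assumed escalation target
        if action == Action.FPGA_RECONFIG:
            self.cpu_resets = 0      # assumed: reconfiguration clears counters
            self.system_resets = 0
        return action
```

Keeping the mapping in one table makes it straightforward to move a trigger to a different severity level without touching the escalation logic.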
Conclusion • Identified and categorized the error-prone regions of the Virtex-II Pro into four types. • Developed mitigation strategies for each region. • Radiation testing of the overall system is in progress.
Acronyms • SEU: Single Event Upset • FPGA: Field Programmable Gate Array • LUT: Look-Up Table • PLB: Processor Local Bus • OPB: On-Chip Peripheral Bus • OCM: On-Chip Memory • EDAC: Error Detection and Correction • ICAP: Internal Configuration Access Point