1 / 26

Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs Hardened By Design vs. Design-Level Hardening

Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs Hardened By Design vs. Design-Level Hardening. Gary M. Swift and Ramin Roosta Jet Propulsion Laboratory / California Institute of Technology.

clove
Download Presentation

Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs Hardened By Design vs. Design-Level Hardening

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAsHardened By Designvs.Design-Level Hardening Gary M. Swift and Ramin Roosta Jet Propulsion Laboratory / California Institute of Technology The research done in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration (NASA) and was partially sponsored by the NASA Electronic Parts and Packaging Program. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology. 1

  2. In the beginning was Actel … • Leveraging from a commercial product line • ONO anti-fuse based  one-time programmable (OTP) • “beginning” = 1993 • Reference: Katz, R.; Barto, R.; McKerracher, P.; Carkhuff, B.; Koga, R.; “SEU hardening of field programmable gate arrays (FPGAs) for space applications and device characterization,” IEEE Transactions on Nuclear Science, Dec. 1994 2

  3. Later, Xilinx Leveraging from a commercial product line • SRAM based  reconfigurable “later” = 1998 • Reference: Guertin, S.M.; Swift, G.M.; Nguyen, D.; “Single-event upset test results for the Xilinx XQ1701L PROM”, Radiation Effects Data Workshop Record, 1999 • Quote: (Xilinx SRAM-based FPGAs)… “do appear suited to a broad range of other (non-critical) applications, such as sensor and camera controllers.” 3

  4. OUTLINE • FPGAs: A key enabling technology for modern spacecraft • Background in radiation testing of FPGAs • Earlier, Katz/Swift collaboration • Recently, Xilinx Consortium • Feature Comparison • Triple Modular Redundancy (TMR) - hardware approach vs. software approach • Concluding Remarks 4

  5. FPGAs: A key “enabling technology” Like custom ASICs, FPGAs can replace whole boards • Saving mass, volume, power • Achieving extra functionality FPGAs are much cheaper than ASICs • Design efforts can be later in the schedule • Design mistakes don’t require a re-spin through the foundry 5

  6. MER Pyro-Controller Used self-checking of configuration to initiate a re-configuration after spotting an upset 6

  7. MER Pyro-Controller Nearing Mars Xilinx XQR4062XL 7

  8. My Background • Actel experience is older • No direct involvement in radiation tests since the ONO anti-fuse was replaced • Results here are from others’ work • Xilinx experience is recent • Active participant in Xilinx Rad Test Consortium • Currently, finishing two+ year test campaign targeting the Virtex II family 8

  9. Currently Available Devices Actel RT54SX-S family vs. Xilinx Virtex II family (-SU) Note: both are essentially immune to single-event latchup and have good total ionizing dose tolerance, [ Actel > 135 krad(Si); Xilinx > 200 krad(Si) ] 9

  10. Main Feature Comparison Actel Xilinx RT54SX72S XQR2V6000 Gates: 72,000 ~6M ( /~3.2 ) flip/flops: 2012 67,584 / 3.2 = ~20k I/O Pins: 360 824 / 3 = 274 Speed external : 230 MHz 622 Mb/s (I-mode LVDS) Speed internal : 310 MHz 360 MHz 10

  11. Extra Features Comparison Actel Xilinx RT54SX72S XQR2V6000 Block RAM: no 2.5 Mb I/O Standards: many many Others: hardwired TMR Clock Manager Multipliers 11

  12. Actel: What bits can upset? User flip-flops only • Direct hits of same flip/flop in multiple domains • Very unlikely due to layout • Clock domain hits SEFI modes essentially eliminated 12

  13. Xilinx: What bits can upset? • NAND • Ex-OR • Flip-Flop type • etc… • Type of I/O • Mode of Block RAM Access • Clock Manager • etc… • Configuration Bits • Logical Function • Routing • User Options • Block RAM • User Flip-flops • Control Registers 13

  14. Xilinx: Heavy Ion Test Results Resulting in fairly low in-space rates: ~6 per day for 2V6000 in GCRmin. Low Threshold (soft) Low Susceptibility (hard) 14

  15. Actel: Heavy Ion Test Results Where’s Threshold ??? Low Susceptibility (~100x harder) Very low in-space rates (assume LETth > 40 achieved): ~1 per 6800 years for SX72-S in GCRmin. Data for two RTAX2000S prototypes at 1 MHz using checkerboard pattern from Fig. 12, JJ Wang et al., NSREC 2003 [Ref. 1] 15

  16. Actel-style TMR SX-A “R” cell triplicates to: RTSX-S “R” cell 16

  17. Actel-style TMR Actel-style TMR is fairly straightforward: • Each flip-flop is replaced by three plus feedback voter • Triplicated elements spread out physically • Uses one clock/inverse-clock domain • No external parts needed 17

  18. Xilinx-style TMR Xilinx-style TMR is more complicated: • First, it’s not too useful without configuration scrubbing • Whole functional blocks are triplicated, notindividual flip-flops • Three voters are used • Three clock domains • Elimination of: • Weak keepers (aka half latches) • Use of configuration cells as part of the design • For example, SRL16 • Needs some external circuitry (at least, a watchdog timer + PROMs) 18

  19. Xilinx-style TMR 19

  20. Xilinx-style TMR In Xilinx-style TMR, I/O’s use three pins tied externally : P Minority Voter D0 P Minority Voter D D1 P Minority Voter D2 Board Traces Pins 20

  21. Xilinx TMRtool • Xilinx-style TMR done by hand is difficult and tedious • An automated tool which integrates into the design flow has been developed (“now” available) • In-beam testing shows tool is very effective 21

  22. Upset Comparison • ATMR now has eliminated: • Upsets of static storage elements, and • SEFIs • ATMR upsets from: • Transients that are clocked into storage • Clock tree hits • Xilinx FPGAs have a small susceptibility to two types of SEFIs • Reset (sometimes only partial) • Disable scrub port • XTMR in combination with scrubbing can lower system upset rates below the SEFI rate 22

  23. Rate Comparison • Actel • Dominated by transients • Roughly one system error per thousand years (GCRmin) • Xilinx • Dominated by SEFI rate • Expect one SEFI per ~65 years in GCRmin • Expect one system error ~5-20x less often GCR = Galactic Cosmic Ray background (interplanetary space) almost identical to geosynchronous orbit 23

  24. CONCLUSIONS For the present – Both can achieve very acceptable radiation tolerance Actel wins on: • Less burden on the designer • No auxiliary components • Lower SEFI susceptibility Xilinx wins on: • Designer control of the resources vs. hardness tradeoff • On-chip feature set • Re-configurability Competition is good. 24

  25. Acronyms FPGA - Field Programmable Gate Array ASIC - Application Specific Integrated Circuit SEU - Single Event Upset SEFI - Single Event Functionality Interrupt TMR - Triple Modular Redundancy ATMR - Actel-style TMR XTMR - Xilinx-style TMR LET - Linear Energy Transfer (proportional to deposited charge per micron for a heavy ion strike on an active node) GCRmin - Galactic Cosmic Ray background (highest during “solar minimum” period of ~11-yr cycle of sunspots) MER - Mars Exploration Rovers (i.e., Spirit and Opportunity) 25

  26. Additional References [1] J.J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum, R. Katz, I. Kleyner, “Single event upset and hardening in 0.15  antifuse-based field programmable gate array,” IEEE Transactions on Nuclear Science, Dec. 2003 [2] Jih-Jong Wang, R.B. Katz, F. Dhaoui, J.L. McCollum, W. Wong, B.E. Cronquist, R.T. Lambertson, E. Hamdy, I. Kleyner, W. Parker, “Clock buffer circuit soft errors in antifuse-based field programmable gate arrays,” IEEE Transactions on Nuclear Science, Dec. 2000 [3] R. Katz, J.J. Wang, R. Koga, K.A. LaBel, J. McCollum, R. Brown, R.A. Reed, B. Cronquist, S. Crain, T. Scott, W. Paolini, B. Sin, “Current radiation issues for programmable elements and devices,” IEEE Transactions on Nuclear Science, Dec. 1998 26

More Related