    1. A cost/benefit framework for evaluating re-configurable FPGA SEU mitigation techniques. Carl Carmichael, Brendan Bridgford. Xilinx, Inc.

    2. Introduction: Many designers assume that re-programmable FPGAs for space applications require TMR and SEU correction (configuration scrubbing) in all cases. Mitigation can be costly, so you should ask: What is the reliability requirement? What is the expected MTBF with no mitigation? What is the expected MTBF with only scrubbing? What is the expected MTBF with only XTMR? What is the expected MTBF with the combination of both techniques? The answers will vary with mission characteristics.

    3. “Typical” SEE Mitigation: XTMR + Scrubbing. XTMR and scrubbing are often used together. XTMR (Xilinx TMR) protects the design from any one upset. Configuration scrubbing prevents SEUs from accumulating. When XTMR and scrubbing are used together, the overall functional failure rate is dominated by the SEFI rate. V2 SEFI rate, GEO: ~60 years MTBF [1], continuous operation. The SEFI rate is independent of device size.
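
To put the ~60-year SEFI MTBF in mission terms, here is a minimal sketch. It assumes SEFIs arrive at a constant rate (an exponential failure model, which the slides do not state), and the 15-year mission length is a hypothetical value chosen only for illustration.

```python
import math

def prob_at_least_one_event(mtbf_years: float, mission_years: float) -> float:
    """P(at least one event) under a constant-rate (exponential) model."""
    return 1.0 - math.exp(-mission_years / mtbf_years)

SEFI_MTBF_YEARS = 60.0   # from the slide: V2 SEFI, GEO, continuous operation [1]
MISSION_YEARS = 15.0     # hypothetical mission length (assumption)

p_sefi = prob_at_least_one_event(SEFI_MTBF_YEARS, MISSION_YEARS)
print(f"P(>=1 SEFI in {MISSION_YEARS:.0f} years) = {p_sefi:.3f}")
```

Under these assumptions the probability is about 0.22, which is why the SEFI rate sets the floor once XTMR and scrubbing are both in place.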

    4. The Unstated Assumptions: Five basic assumptions lie behind the typical prescription of XTMR + scrubbing. (1) No functional errors can be tolerated. (2) All functional errors are persistent. (3) The FPGA must operate continuously for an extended period of time. (4) A high upset rate can be expected. (5) The design goal is to be “as reliable as possible”.

    5. Assumption #1: No Functional Errors can be Tolerated. Not always true: many systems can tolerate some errors. Error correction may be built into the data. The consequence of an error may be minor (a pixel is inverted, for example).

    6. Assumption #2: All Functional Errors are Persistent. Not always true: some structures do not experience persistent errors after a single-event upset [2]. Persistent errors cannot be cleared by scrubbing alone. Examples: LFSRs, counters, other state logic. An SEU can cause these structures to go “out to lunch”; they must be scrubbed and then reset. XTMR prevents these errors. Non-persistent errors can be cleared by scrubbing alone. Examples: multipliers, any “feed-through” logic. An SEU can cause a few incorrect calculations, but scrubbing will restore operation of the circuit.

    7. Assumption #3: Continuous, Extended Operation Required. Not always true: many systems operate for only minutes or hours at a time. Polar and other orbits may require only brief periods of operation. An unmitigated design that operates for only a few minutes at a time can have a very high MTBF. A design with XTMR that operates for a few hours can have a very high MTBF, even without scrubbing.
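
The effect of a short operating window can be sketched with a simple Poisson model. This is an illustration under stated assumptions, not the authors' analysis: it uses the GEO figures quoted on the upset-rate slide (6 SEUs/hour for a 2V6000, fewer than 1 in 10 affecting the design), treats 0.6 design-affecting upsets per hour as an upper bound, and assumes the device starts each window freshly configured. The window lengths are hypothetical.

```python
import math

def prob_no_upset(rate_per_hour: float, minutes: float) -> float:
    """P(no design-affecting upset in one operating window), Poisson arrivals."""
    return math.exp(-rate_per_hour * minutes / 60.0)

# Assumed upper bound: 6 SEUs/hour x (1 in 10 affecting the design) = 0.6/hour.
RATE_PER_HOUR = 0.6

for minutes in (1, 5, 60):  # hypothetical operating windows
    p = prob_no_upset(RATE_PER_HOUR, minutes)
    print(f"{minutes:>3} min window: P(no upset) >= {p:.3f}")
```

Because the rate is an upper bound for a full device in GEO, the printed probabilities are lower bounds; smaller designs or quieter orbits would do better.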

    8. Assumption #4: A High Upset Rate can be Expected. Not always true: upset rates vary widely by orbit. 2V6000 SEU rate, GEO: 6 SEUs/hour [1]. 36,000 km GEO, worst-case solar flare: ~300 SEUs/hour [3]. Of these, fewer than 1 in 10 will affect the design [4]. Note: configuration scrubbing can keep up with even worst-case SEU rates [3]. At lower upset rates, high MTBF can be achieved with less mitigation.
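
The arithmetic behind these figures can be sketched as follows. The only inputs are the numbers on the slide; taking the "fewer than 1 in 10" fraction as exactly 0.1 is an assumption that makes the results lower bounds on the time between design-affecting upsets.

```python
def mean_hours_between_design_upsets(seu_per_hour: float,
                                     fraction_affecting_design: float) -> float:
    """Mean time (hours) between upsets that actually affect the design."""
    return 1.0 / (seu_per_hour * fraction_affecting_design)

FRACTION = 0.1  # slide says "fewer than 1 in 10"; 0.1 is the pessimistic bound

geo_hours = mean_hours_between_design_upsets(6.0, FRACTION)      # nominal GEO [1]
flare_hours = mean_hours_between_design_upsets(300.0, FRACTION)  # worst-case flare [3]

print(f"GEO nominal:      at least {geo_hours:.2f} hours between design-affecting upsets")
print(f"Worst-case flare: at least {flare_hours * 60:.1f} minutes between design-affecting upsets")
```

These are intervals between upsets that touch the design, not between functional failures; with XTMR or tolerant logic, many such upsets cause no failure at all.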

    9. Assumption #5: Design Goal is “As Reliable as Possible”. “As reliable as possible” is not a reliability target! It implies that no cost is too great for a marginal improvement in reliability. A quantifiable reliability target is needed: a target must be set for SEU-induced functional failures, or else there is no way to evaluate different technologies and mitigation techniques.

    10. XTMR + Scrubbing: Costs and Benefits. XTMR benefit: prevents persistent and non-persistent functional errors due to any SEU/SET. XTMR costs: ~3.5x increase in logic and pin utilization; reduced timing performance. Scrubbing benefits: prevents SEUs from accumulating; clears non-persistent errors. Scrubbing costs: increased system complexity; cannot use SRL16s or DistRAM (which increases logic utilization).
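
The masking idea behind TMR can be shown with a bitwise majority vote. This is a generic sketch of triple modular redundancy, not a description of how the XTMR tool implements its voters; the data values are made up for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote across three redundant copies."""
    return (a & b) | (a & c) | (b & c)

good = 0b10110010            # value computed by an unaffected domain (made up)
upset = good ^ 0b00001000    # same value with one bit flipped by an upset

# One upset domain is outvoted by the other two.
voted = majority(good, upset, good)
print(f"voted output matches good value: {voted == good}")

# Two domains upset in the same bit defeat the vote, which is why
# accumulated upsets matter when XTMR is used without scrubbing.
voted_double = majority(upset, upset, good)
print(f"double upset masked: {voted_double == good}")
```

Each output bit needs three copies of the logic plus voting, which is consistent with the roughly 3.5x utilization cost quoted on the slide.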

    11. Mitigation Alternatives: Costs and Benefits. Alternatives to XTMR: EDAC, periodic reset. Benefit: prevents persistent functional errors from any SEU/SET. Cost: the system must be able to tolerate non-persistent errors. Alternative to scrubbing: periodic full reconfiguration. Benefits: prevents SEUs from accumulating; simpler than scrubbing; does not preclude the use of SRL16s or DistRAM. Cost: the design is interrupted during reconfiguration.

    12. Mission Characteristics. Can your design tolerate some functional errors? If yes, how much time is available to recover operation? You may not need XTMR. Does your design contain feedback structures? If no, SEUs/SETs will not cause persistent errors, and XTMR may not be required. Does your design need to operate continuously? If no, you may not need scrubbing or XTMR. What is the expected SEU rate: on the order of seconds, minutes, or hours? Lower upset rates mean less need for scrubbing and XTMR to achieve the same reliability. What is the MTBF requirement for functional errors? Factors: operating duration, SEU rate, error persistence, EDAC.
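
The yes/no questions above can be written down as a small checklist helper. This is only a sketch that restates the slide; the `Mission` class, its field names, and the wording of the notes are hypothetical, and the output is a prompt for analysis against an MTBF target, not a design rule.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Mission:  # hypothetical container for the slide's yes/no questions
    tolerates_some_errors: bool
    has_feedback_structures: bool
    operates_continuously: bool

def checklist_notes(m: Mission) -> List[str]:
    """Return the relaxations the slide suggests for a given mission."""
    notes = []
    if m.tolerates_some_errors:
        notes.append("XTMR may not be needed (check available recovery time)")
    if not m.has_feedback_structures:
        notes.append("XTMR may not be required (no persistent errors expected)")
    if not m.operates_continuously:
        notes.append("scrubbing or XTMR may not be needed (non-continuous operation)")
    if not notes:
        notes.append("no relaxation suggested; evaluate XTMR + scrubbing against the MTBF target")
    return notes

# Example: an error-tolerant, feed-through design that runs in short bursts.
example = Mission(tolerates_some_errors=True,
                  has_feedback_structures=False,
                  operates_continuously=False)
for note in checklist_notes(example):
    print("-", note)
```

The SEU rate and the MTBF requirement are deliberately left out of the helper, since they need numbers rather than yes/no answers.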
