1. A cost/benefit framework for evaluating re-configurable FPGA SEU mitigation techniques
Carl Carmichael
Brendan Bridgford
Xilinx, Inc.
2. Introduction
Many designers assume that re-programmable FPGAs for space applications require:
TMR
SEU correction (configuration scrubbing)
… in all cases.
Mitigation can be costly; you should ask:
What is the reliability requirement?
What is the expected MTBF with no mitigation?
What is the expected MTBF with only scrubbing?
What is the expected MTBF with only XTMR?
What is the expected MTBF with the combination of both techniques?
Answers will vary with different mission characteristics.
3. Typical SEE Mitigation
“Typical” SEE Mitigation: XTMR + Scrubbing
XTMR and scrubbing are often used together
XTMR (Xilinx TMR) protects the design from any one upset
Configuration scrubbing prevents SEUs from accumulating
When XTMR and scrubbing are used together, the overall functional failure rate is dominated by the SEFI rate:
Virtex-II (V2) SEFI MTBF in GEO: ~60 years [1], assuming continuous operation
SEFI rate is independent of device size
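To put the ~60-year SEFI MTBF in mission terms, the following minimal sketch converts an MTBF into the probability of completing a mission with no SEFI. It assumes a constant failure rate (an exponential model), which the slides do not state; the function name and the mission lengths are illustrative only.

```python
import math

SEFI_MTBF_YEARS = 60.0  # figure quoted on the slide for V2 in GEO [1]

def prob_no_failure(mission_years, mtbf_years=SEFI_MTBF_YEARS):
    """Probability of zero failures over the mission.

    Assumption (not from the slides): failures arrive at a constant
    rate, so survival follows exp(-t / MTBF).
    """
    return math.exp(-mission_years / mtbf_years)

# Illustrative mission lengths, chosen only for the example.
for years in (1, 5, 15):
    print(f"{years:>2}-year mission: P(no SEFI) = {prob_no_failure(years):.3f}")
```

Because the SEFI rate is independent of device size, the same calculation applies to any device in the family for which the ~60-year figure holds.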
4. The Unstated Assumptions…
Five basic assumptions lie behind the typical prescription of XTMR + scrubbing:
(1) No functional errors can be tolerated
(2) All functional errors are persistent
(3) The FPGA must operate continuously for an extended period of time
(4) A high upset rate can be expected
(5) Design goal is to be “as reliable as possible”
5. Assumption #1: No Functional Errors can be Tolerated
Not always true: Many systems can tolerate some errors.
Error correction may be built into the data.
The consequence of an error may be minor (a pixel is inverted, for example).
6. Assumption #2: All Functional Errors are Persistent
Not always true: Some structures do not experience persistent errors after a single-event upset [2].
Persistent errors cannot be cleared by scrubbing alone.
Example: LFSRs, counters, other state logic
An SEU can cause these structures to go “out to lunch” (enter and remain in an incorrect state); they must be scrubbed and then reset. XTMR prevents these errors.
Non-persistent errors can be cleared by scrubbing alone.
Example: multipliers, any “feed-through” logic
An SEU can cause a few incorrect calculations, but scrubbing alone restores correct operation of the circuit.
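The difference between persistent and non-persistent errors can be illustrated with a toy behavioral model. This is a sketch only: the fault effects (an XOR on the multiplier output, a double increment in the counter) and all names are hypothetical stand-ins for whatever a real configuration upset would do. The fault is present from `fault_start` until the scrub at `scrub_at`.

```python
def simulate(cycles=12, fault_start=3, scrub_at=6):
    """Return (cycle, multiplier_wrong, counter_wrong) for each cycle."""
    golden_count = 0  # what an upset-free counter would hold
    count = 0         # the counter under test (state logic)
    rows = []
    for t in range(cycles):
        faulty = fault_start <= t < scrub_at  # upset present until scrubbed

        # Feed-through logic: the output depends only on current inputs.
        golden_prod = t * 3
        prod = golden_prod ^ 0x4 if faulty else golden_prod

        # State logic: the next value depends on the previous value.
        golden_count += 1
        count += 2 if faulty else 1  # corrupted increment while upset

        rows.append((t, prod != golden_prod, count != golden_count))
    return rows

for t, mult_wrong, counter_wrong in simulate():
    print(f"cycle {t:2d}  multiplier wrong: {mult_wrong!s:5}  counter wrong: {counter_wrong}")
```

In this model the multiplier is wrong only while the fault is present, whereas the counter stays wrong after the scrub because its corrupted state feeds back into every later value; only a reset would realign it.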
7. Assumption #3: Continuous, extended operation required
Not always true: Many systems only operate for minutes or hours at a time.
Polar and other orbits may require only brief periods of operation.
An unmitigated design that operates for only a few minutes at a time can have a very high MTBF.
A design with XTMR that operates for a few hours can have a very high MTBF, even without scrubbing.
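The effect of operating duration can be sketched with two textbook reliability formulas. Everything here is an assumption rather than the authors' method: upsets are modeled as a constant-rate process, the unmitigated design fails on its first design-affecting upset, and the triplicated design (no scrubbing) is treated pessimistically as failed once any two of its three domains have been hit, which gives the classic TMR expression 3p² − 2p³. The rate below is a hypothetical illustrative value, not a figure from the slides.

```python
import math

def p_fail_unmitigated(rate_per_hour, hours):
    """P(at least one design-affecting upset) in the operating window."""
    return 1.0 - math.exp(-rate_per_hour * hours)

def p_fail_tmr_no_scrub(rate_per_hour, hours):
    """P(at least two of three domains upset), with no repair.

    Pessimistic: in practice two upsets must land in related logic of
    different domains before the voters are defeated.
    """
    p = p_fail_unmitigated(rate_per_hour, hours)  # per-domain hit probability
    return 3 * p**2 - 2 * p**3

RATE_PER_HOUR = 0.05  # hypothetical design-affecting upset rate

for minutes in (5, 30, 120):
    hours = minutes / 60.0
    print(f"{minutes:>3} min: unmitigated {p_fail_unmitigated(RATE_PER_HOUR, hours):.2e}, "
          f"TMR without scrubbing {p_fail_tmr_no_scrub(RATE_PER_HOUR, hours):.2e}")
```

Under these assumptions both failure probabilities shrink as the operating window shortens, and triplication helps most when the per-domain hit probability is small. Results depend entirely on the rate supplied, so a mission-specific rate should replace the placeholder.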
8. Assumption #4: A High Upset Rate can be Expected
Not always true: upset rates vary widely by orbit
2V6000 SEU rate, GEO: 6 SEUs/hour [1]
36,000 km GEO, worst-case solar flare: ~300 SEUs/hr [3]
Of these, fewer than 1 in 10 will affect the design [4]
Note: configuration scrubbing can keep up with even worst-case SEU rates [3].
At lower upset rates, high MTBF can be achieved with less mitigation.
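The figures above translate directly into a time between design-affecting upsets. The short sketch below does that arithmetic; the function name is illustrative, and because fewer than 1 in 10 upsets affect the design, the results are lower bounds.

```python
def min_minutes_between_design_upsets(seus_per_hour, one_in=10):
    """Lower bound on the mean minutes between design-affecting upsets.

    Fewer than 1 in `one_in` upsets affect the design [4], so the true
    interval is at least this long.
    """
    return 60.0 * one_in / seus_per_hour

for label, rate in (("GEO, nominal", 6), ("GEO, worst-case solar flare", 300)):
    minutes = min_minutes_between_design_upsets(rate)
    print(f"{label}: at least {minutes:.0f} min between design-affecting upsets")
# GEO, nominal: at least 100 min between design-affecting upsets
# GEO, worst-case solar flare: at least 2 min between design-affecting upsets
```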
9. Assumption #5: Design goal is “as reliable as possible”
“As reliable as possible” is not a reliability target!
This implies that no cost is too great for a marginal improvement in reliability.
A quantifiable reliability target is needed
A reliability target must be set for SEU-induced functional failures; otherwise there is no way to evaluate different technologies and mitigation techniques against one another.
10. XTMR + Scrubbing: costs and benefits
XTMR
Benefit: Prevents persistent and non-persistent functional errors due to any SEU/SET.
Costs: ~3.5x increase in logic and pin utilization. Reduced timing performance.
Scrubbing
Benefit: Prevents SEUs from accumulating. Clears non-persistent errors.
Costs: Increased system complexity. SRL16s and DistRAM cannot be used, which increases logic utilization.
11. Mitigation Alternatives: costs and benefits
Alternatives to XTMR: EDAC, periodic reset
Benefit: prevents persistent functional errors from any SEU/SET.
Cost: Must be able to tolerate non-persistent errors.
Alternative to Scrubbing: Periodic full reconfiguration
Benefits: Prevents SEUs from accumulating. Simpler than scrubbing. Does not preclude the use of SRL16s or DistRAM.
Costs: Design is interrupted during reconfiguration.
12. Mission Characteristics
Can your design tolerate some functional errors?
If yes, how much time is available to recover operation? You may not need XTMR.
Does your design contain feedback structures?
If no, SEUs/SETs will not cause persistent errors. XTMR may not be required.
Does your design need to operate continuously?
If no, you may not need scrubbing or XTMR.
What is the expected SEU rate?
Is the expected time between upsets on the order of seconds? Minutes? Hours?
Lower upset rates mean less need for scrubbing and XTMR to achieve the same reliability.
What is the MTBF requirement for functional errors?
Factors: operating duration, SEU rate, error persistence, EDAC.