
Improving FPGA Design Robustness with Partial TMR

Brian Pratt (1,2); Michael Caffrey, Paul Graham (2); Eric Johnson, Keith Morgan, Michael Wirthlin (1)
(1) Brigham Young University, Department of Electrical Engineering
(2) Los Alamos National Laboratory

Presentation Transcript


  1. Improving FPGA Design Robustness with Partial TMR
     Brian Pratt (1,2); Michael Caffrey, Paul Graham (2); Eric Johnson, Keith Morgan, Michael Wirthlin (1)
     (1) Brigham Young University, Department of Electrical Engineering
     (2) Los Alamos National Laboratory

  2. Motivation for Partial TMR
     [Figure: plot with axes labeled MTBF and Area Cost, marking a reliability constraint and an area constraint]
     • Factors of fault-tolerant computing:
       • Availability
       • Reliability
       • Mitigation cost
     • Full TMR
       • Expensive in terms of power, speed, area, etc.
       • Worthwhile if affordable!

  3. Motivation for Partial TMR
     • Partial TMR offers:
       • Mitigation of the most sensitive design structures
       • Increased availability of a system by decreasing the number of system resets
       • Decreased mitigation cost compared with full TMR
     • Suitability of partial TMR is application dependent
       • Reduced reliability compared to full TMR

  4. Scrubbing
     [Figure: grid of configuration memory bits]
     • Must be included with partial mitigation
     • Continuously ‘read’ and ‘clean’ configuration memory
     • A single bit will be upset no longer than t_s, where t_s = time for one scrub
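The read-and-clean loop on the scrubbing slide can be illustrated with a toy model. This is a minimal sketch, assuming configuration memory is a list of integer words and that a known-good ("golden") copy is available for comparison. The slides do not say how the scrubber detects upsets, so the comparison against a golden copy and the names `scrub`, `config`, and `golden` are assumptions made for illustration.

```python
def scrub(config, golden):
    """Make one scrub pass over configuration memory.

    Each word is read back and compared with the golden copy; any word
    that differs is rewritten. Returns the number of words repaired.
    (Hypothetical model: the golden-copy comparison is an assumption.)
    """
    repaired = 0
    for address, good_word in enumerate(golden):
        if config[address] != good_word:
            config[address] = good_word
            repaired += 1
    return repaired


# Hypothetical 4-word configuration memory.
golden = [0b1001, 0b0110, 0b1010, 0b1000]
config = list(golden)

# Simulate a single-event upset flipping one bit of word 2.
config[2] ^= 0b0100

upsets_fixed = scrub(config, golden)
```

Because every word is visited once per pass, an upset bit survives at most one full pass, which is the bound t_s stated on the slide.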

  5. Non-Persistent Errors
     • An SEU in the non-persistent cross-section will cause a temporary interruption of service
     • Requires partial reconfiguration to correct
     [Figure: error magnitude vs. time (cycle), annotated "Scrubbing Repairs Configuration" and "Correct Output"]
     error = delta between outputs of a golden and DUT circuit

  6. Persistent Errors
     • An SEU in the persistent cross-section will cause a permanent interruption of service
     • Requires full system reset to correct
     [Figure: error magnitude vs. time (cycle), annotated "Scrubbing Repairs Configuration" and "Incorrect Output"]
     error = delta between outputs of a golden and DUT circuit
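Slides 5 and 6 define error as the delta between the outputs of a golden circuit and the DUT, and separate the two error types by what happens after scrubbing repairs the configuration. A minimal sketch of that distinction follows, assuming per-cycle output samples are available as lists. The function name `classify_error` and the `flush_cycles` parameter (cycles allowed for bad data to drain from the pipeline) are hypothetical and not taken from the slides.

```python
def classify_error(golden, dut, scrub_cycle, flush_cycles=0):
    """Classify an output trace as 'none', 'non-persistent' or 'persistent'.

    golden, dut   -- per-cycle output samples of the two circuits
    scrub_cycle   -- cycle index at which scrubbing repaired the configuration
    flush_cycles  -- extra cycles allowed for bad data to leave the pipeline
    """
    # error = delta between outputs of a golden and DUT circuit
    errors = [abs(g - d) for g, d in zip(golden, dut)]
    if not any(errors):
        return "none"
    settled = errors[scrub_cycle + flush_cycles:]
    if not settled:
        raise ValueError("no samples recorded after the scrub completed")
    # If the error returns to zero once the configuration is repaired,
    # the upset was non-persistent; otherwise it persists until a reset.
    return "non-persistent" if not any(settled) else "persistent"


golden_out = [1, 2, 3, 4, 5, 6, 7, 8]
recovers = [1, 2, 9, 9, 5, 6, 7, 8]   # wrong until the scrub at cycle 4
stays_bad = [1, 2, 9, 9, 9, 9, 9, 9]  # still wrong after the scrub

kind_a = classify_error(golden_out, recovers, scrub_cycle=4)
kind_b = classify_error(golden_out, stays_bad, scrub_cycle=4)
```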

  7. Non-Persistent Circuit Structures
     • Generally consist of circuit components and routing in a feed-forward path
     [Figure: chain of Logic and FF blocks]

  8. Persistent Circuit Structures
     • Generally consist of circuit components and routing in, or contributing to, a feed-back path
     [Figure: chain of Logic and FF blocks]

  9. Partial Mitigation
     • Apply a mitigation technique to just the persistent cross section
     [Figure: Logic and FF blocks, with TMR applied to part of the circuit]

  10. Limitations of Partial Mitigation
      • Does not prevent all errors
      • System must be corrected with configuration bitstream scrubbing
      • Circuit configuration can be incorrect between scrubs
      • Non-persistent errors remain

  11. Automated Partial TMR
      • Analyze an EDIF source file for feedback structures
      • Protect these sections with TMR to reduce the persistent cross section
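Finding feedback structures in a netlist amounts to finding cycles in a directed graph. The sketch below uses Tarjan's strongly connected components algorithm for this. It is an illustration only: the slides do not state which algorithm the tool uses, and the dictionary netlist, the cell names, and the function `feedback_nodes` are hypothetical.

```python
def feedback_nodes(graph):
    """Return the set of nodes that lie on at least one cycle.

    graph maps each node to a list of the nodes it drives.
    Uses Tarjan's strongly connected components algorithm (recursive form).
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    result = set()
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a strongly connected component: pop it.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            # A component is feedback if it holds several nodes,
            # or a single node that drives itself.
            if len(component) > 1 or v in graph.get(v, ()):
                result.update(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return result


# Hypothetical netlist: an accumulator loop between an input and an output stage.
netlist = {
    "in_reg": ["acc_logic"],
    "acc_logic": ["acc_ff"],
    "acc_ff": ["acc_logic", "out_logic"],
    "out_logic": ["out_ff"],
    "out_ff": [],
}
feedback = feedback_nodes(netlist)
```

The recursive form is kept for brevity. A netlist with long paths could exceed Python's recursion limit, so an iterative variant would be the safer choice for real designs.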

  12. BLTmr Partial TMR Tool
      • BYU-LANL Triple Modular Redundancy: Configurable Reliability
      • Limit mitigation to minimize:
        • design resource requirements
        • power consumption
      • Mitigation focused on persistent circuit structures

  13. BLTmr Partial TMR Tool
      • Design divided into three sections:
        • Feedback, Input to FB, Output
      [Figure: Logic and FF blocks]
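The three-way division can be expressed with plain graph reachability. This is a sketch under stated assumptions, not the tool's implementation: a node is "feedback" if it can reach itself, "input to FB" if it is outside the feedback but drives it directly or indirectly, and "output" otherwise. Treating Output as the remainder is an interpretation, chosen because the three sections together are described as full TMR. The netlist and all names are hypothetical.

```python
def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen = set()
    todo = list(graph.get(start, ()))
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(graph.get(node, ()))
    return seen


def partition(graph):
    """Split a netlist graph into (feedback, input_to_fb, output) node sets."""
    nodes = set(graph) | {w for succs in graph.values() for w in succs}
    reach = {n: reachable(graph, n) for n in nodes}
    # A node on a cycle can reach itself.
    feedback = {n for n in nodes if n in reach[n]}
    # Input to FB: not in the feedback, but drives something that is.
    input_to_fb = {n for n in nodes - feedback if reach[n] & feedback}
    # Output: everything else (assumed interpretation of the slide).
    output = nodes - feedback - input_to_fb
    return feedback, input_to_fb, output


# Hypothetical netlist: input register, accumulator loop, output stage.
netlist = {
    "in_reg": ["acc_logic"],
    "acc_logic": ["acc_ff"],
    "acc_ff": ["acc_logic", "out_logic"],
    "out_logic": ["out_ff"],
    "out_ff": [],
}
fb, in_fb, out = partition(netlist)
```

Running one reachability search per node is quadratic in the worst case, which is acceptable for an illustration but not for a large design.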

  17. BLTmr Tool Options
      • BLTmr Tool applies TMR mitigation to subsections of the design:
        • Feedback Only
        • Feedback + Input to Feedback
        • FB + Input to FB + Output (Full TMR)

  21. BLTmr Tool Flow
      [Figure: tool flow. Original Design and User Constraints; Parse EDIF; Create Design Database; Analysis (Feedback, Input to FB, etc.); Cell Triplication; Voter Insertion; Partially Mitigated Design]
      • BYU EDIF development environment reads in user design
      • Design organized into graph structure for analysis

  22. BLTmr Tool Flow
      [Figure: tool flow, as on slide 21]
      • User may direct mitigation
      • Design analyzed to classify components as described

  23. BLTmr Tool Flow
      [Figure: tool flow, as on slide 21]
      • Circuit elements triplicated
      • Voters inserted
      • Mitigated design written in EDIF format
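Triplication and voter insertion can be modeled in software to show why a single faulty copy is masked. The sketch below uses the standard bitwise 2-of-3 majority function. The helper `triplicated`, the example function `increment`, and the fault-injection parameters are hypothetical and only stand in for a triplicated cell and an upset in one of its copies.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote: each output bit follows at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def triplicated(fn, value, fault_in_copy=None, fault_mask=0):
    """Evaluate three copies of fn, optionally corrupt one output, then vote."""
    outputs = [fn(value) for _ in range(3)]
    if fault_in_copy is not None:
        outputs[fault_in_copy] ^= fault_mask  # flip bits in one copy only
    return majority(*outputs)


def increment(x):
    """Example 8-bit logic function standing in for a triplicated cell."""
    return (x + 1) & 0xFF


clean = triplicated(increment, 0x41)
masked = triplicated(increment, 0x41, fault_in_copy=1, fault_mask=0x10)
```

Both calls produce the same value because the voter outvotes the one corrupted copy. If two copies were corrupted in the same bit position, the voter would pass the error through, which is why scrubbing is still needed to repair upsets before they accumulate.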

  24. Example Circuits
      • Tests on two designs
        • DSP Kernel
        • Synthetic Design
          • LFSR modules feeding into an add-multiply tree

  25. Unmitigated Fault Analysis
      [Figures for each design: FPGA Editor layout, sensitivity map, persistence map]
      • DSP Kernel: 5,746 slices (46%); sensitivity 575,448 bits (9.9%); persistence 13,841 bits (0.23%)
      • Synthetic Design: 2,538 slices (20%); sensitivity 189,835 bits (3.3%); persistence 77,159 bits (1.3%)

  26. Experimental Results, Design #1: DSP Kernel
      [Figures for each case: FPGA Editor layout, sensitivity map, persistence map]
      • Unmitigated: 5,746 slices (46%); sensitivity 575,448 bits (9.90%); persistence 13,841 bits (0.24%)
      • Partial TMR applied to Feedback & Input to FB: 8,036 slices (65%); sensitivity 569,700 bits (9.81%); persistence 152 bits (0.0026%)

  27. Experimental Results, Design #2: Synthetic (LFSR/Mult)
      [Figures for each case: FPGA Editor layout, sensitivity map, persistence map]
      • Unmitigated: 2,538 slices (20%); sensitivity 189,835 bits (3.27%); persistence 77,159 bits (1.33%)
      • Full TMR applied: 11,961 slices (97%); sensitivity 20,256 bits (0.35%); persistence 671 bits (0.012%)
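The figures on slides 26 and 27 can be turned into improvement and cost factors with simple ratios. The sketch below copies the slide values. For the full-TMR case it assumes the larger bit count (20,256) is the sensitivity and the smaller (671) is the persistence, since persistent bits are a subset of sensitive bits; the transcript lists those two values in the opposite order. The dictionary layout and the function `summarize` are hypothetical.

```python
# (unmitigated, mitigated) pairs copied from slides 26 and 27.
DSP_PARTIAL_TMR = {
    "slices": (5746, 8036),
    "sensitive_bits": (575448, 569700),
    "persistent_bits": (13841, 152),
}
SYNTHETIC_FULL_TMR = {
    "slices": (2538, 11961),
    "sensitive_bits": (189835, 20256),   # assumed ordering, see lead-in
    "persistent_bits": (77159, 671),     # assumed ordering, see lead-in
}


def summarize(result):
    """Return area growth and cross-section reduction factors for one design."""
    slices_before, slices_after = result["slices"]
    sens_before, sens_after = result["sensitive_bits"]
    pers_before, pers_after = result["persistent_bits"]
    return {
        "area_growth": slices_after / slices_before,
        "sensitivity_reduction": sens_before / sens_after,
        "persistence_reduction": pers_before / pers_after,
    }


dsp = summarize(DSP_PARTIAL_TMR)
synthetic = summarize(SYNTHETIC_FULL_TMR)

for name, s in (("DSP kernel, partial TMR", dsp), ("Synthetic, full TMR", synthetic)):
    print(f"{name}: area x{s['area_growth']:.2f}, "
          f"sensitivity /{s['sensitivity_reduction']:.2f}, "
          f"persistence /{s['persistence_reduction']:.1f}")
```

With these inputs, partial TMR on the DSP kernel cuts the persistent bit count by roughly 91 times for about 1.4 times the slices, while leaving the overall sensitivity nearly unchanged. Full TMR on the synthetic design cuts persistence by roughly 115 times at about 4.7 times the slices.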

  28. Experimental Results
      * Full TMR could not be applied to the DSP Kernel due to FPGA resource constraints
      “QPro Virtex 2.5V Radiation Hardened FPGAs”, Xilinx Inc., DS028 (v1.2), Nov. 5, 2001.

  29. Experimental Results
      • GPS orbit (22,200 km altitude, 55° inclination)
      • AP-8 Solar Minimum, JPL Solar Proton Quiet, CREME96 Solar Minimum

  30. Summary of Results
      * Unmitigated to Partial TMR of Feedback + Input to FB
      ‡ Unmitigated to Full TMR
      ‡‡ GPS orbit; AP-8 Solar Minimum, JPL Solar Proton Quiet, CREME96 Solar Minimum

  31. Conclusions
      • Pros: Partial TMR (BLTmr) as fault mitigation offers:
        • Increased system availability due to fewer system resets
        • More “affordable” fault mitigation than full TMR
        • Critical design areas are mitigated with an automated tool
      • Cons:
        • Much of the design may be unmitigated, leaving sensitive sections
        • May result in temporary errors
