Cost-Efficient Soft Error Protection for Embedded Microprocessors

Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley²; ¹University of Michigan, ²ARM, Ltd.

Presentation Transcript


  1. Cost-Efficient Soft Error Protection for Embedded Microprocessors. Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley² (¹University of Michigan, ²ARM, Ltd.)

  2. The Soft Error Problem
     [Figure: a flip-flop (D, Q, CLK) holding 0; a transient fault is latched and becomes a soft error.]

  3. Fault Masking
     [Figure: a register file and decoder executing mov r2, 4; mov r5, 8; add r6, r2, r5, with a clock waveform marking the setup (t_setup) and hold (t_hold) times that bound the latching window.]
     • Logical: the faulted value does not affect the logical operation of the circuit.
     • Architectural/software: the incorrect state is overwritten before it is read.
     • Latching-window: the fault pulse does not reach a state element within the latching window.
     • Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit.
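
Two of the masking mechanisms above can be checked mechanically. The sketch below is a hypothetical model, not the authors' analysis: `PROGRAM` encodes the slide's three instructions as (destination, sources) pairs, `architecturally_masked` reports whether a flipped register is overwritten before it is read, and `and_flip_masked` shows logical masking at a 2-input AND gate.

```python
# Hypothetical encoding of the slide's program: (destination, source registers).
PROGRAM = [
    ("r2", []),            # mov r2, 4
    ("r5", []),            # mov r5, 8
    ("r6", ["r2", "r5"]),  # add r6, r2, r5
]


def architecturally_masked(program, fault_reg, fault_pc):
    """True if a bit flip in fault_reg, striking just before instruction
    fault_pc executes, is overwritten before any instruction reads it."""
    for dest, sources in program[fault_pc:]:
        if fault_reg in sources:   # sources are read before dest is written
            return False           # the corrupted value is consumed
        if dest == fault_reg:
            return True            # the corrupted value is overwritten first
    # Never read inside this window; treated as masked (an assumption,
    # since code after the window could still read it).
    return True


def and_flip_masked(a: int, b: int) -> bool:
    """Logical masking: flipping input a of an AND gate is invisible
    at the output whenever b is 0."""
    return (a & b) == ((a ^ 1) & b)
```

For example, a flip in r2 just before `mov r5, 8` is consumed by the `add`, whereas a flip in r6 before the program starts is overwritten by the `add` and never observed.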

  4. Soft Error Rate Trends
     [Figures: soft error rate trends; soft error rate contributions. Sources: Shivakumar 2002; Mitra 2005.]
     Faults in combinational logic contribute an increasing share of the overall soft error rate.

  5. Outline
     • Soft error analysis setup
     • Summary of fault analysis results
     • Fault tolerance techniques
        • Register value cache
        • Strategic deployment of fault detectors
     • Conclusion

  6. Fault Analysis Framework
     [Figure: a testbench runs a benchmark on a reference design and a test design of the ARM926EJ-S (instruction fetch, instruction decode, register bank, mux array, ALU, shift, multiply, instruction and data address logic, MMUs, instruction and data caches, write buffer/bus interface). A fault injection/error analysis framework, made up of a fault injection scheduler, error checking and logging, and report generation, compares the two designs.]
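
The inject-and-compare loop of such a framework can be illustrated with a deliberately tiny stand-in. This is a hypothetical model, not the authors' framework or the ARM926EJ-S netlist: an 8-bit accumulator (`step`, `run` and `campaign` are illustrative names) whose visible output is only the low four bits, so flips in the upper bits are logically masked. Every single-bit fault is injected exhaustively, and each faulty run is compared against a fault-free reference run, mirroring the reference-design/test-design split.

```python
# Hypothetical toy design standing in for the real processor netlist.
REGS = ("tmp", "acc")


def step(state: dict, inp: int) -> dict:
    """One clock cycle: acc absorbs tmp, then tmp latches the next input."""
    return {"acc": (state["acc"] + state["tmp"]) & 0xFF, "tmp": inp & 0xFF}


def run(benchmark, fault=None) -> int:
    """Run the design; fault = (cycle, register, bit) flips one state bit
    just before that cycle executes."""
    state = {"acc": 0, "tmp": 0}
    for cycle, inp in enumerate(benchmark):
        if fault is not None and fault[0] == cycle:
            _, reg, bit = fault
            state[reg] ^= 1 << bit
        state = step(state, inp)
    return state["acc"] & 0x0F  # only the low nibble is observable


def campaign(benchmark) -> float:
    """Inject every single-bit fault and return the observed error rate."""
    reference = run(benchmark)
    errors = total = 0
    for cycle in range(len(benchmark)):
        for reg in REGS:
            for bit in range(8):
                total += 1
                if run(benchmark, (cycle, reg, bit)) != reference:
                    errors += 1
    return errors / total
```

Because carries only propagate upward, flips in bits 4 to 7 never reach the observable nibble, so this toy design reports an error rate of exactly one half for any benchmark.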

  7. Observed Error Rates
     [Figure: observed error rates for faults occurring in registers and for faults occurring in combinational logic; chart values 94%, 7%, 16%, 4%.]
     At the software interface, the error rates are within 3% of each other.

  8. Impact of Fault Injection

  9. Targeting the Faults that Count
     • The ARM926EJ-S register file consumes 8.7% of total core area
     • yet is responsible for 57.4% of architectural errors
     • Register file area is dominated by combinational logic
     • ECC: what does it cost, and how effective is it?
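
Why ECC alone may not settle the question can be sketched with a hypothetical model in which a parity bit stands in for ECC. A transient fault in the read decoder (modelled here, as an assumption, by XOR-ing the read address) selects a different but intact row, so the stored check bits are consistent and nothing is flagged. `ParityRegisterFile` and `decoder_fault` are illustrative names; a full SEC-DED code would behave the same way in this scenario, unless the address is folded into the code, because the wrongly selected row still holds a valid codeword.

```python
def parity(value: int) -> int:
    """Parity bit of a non-negative integer."""
    return bin(value).count("1") & 1


class ParityRegisterFile:
    """Register file whose rows carry a parity bit (stand-in for ECC)."""

    def __init__(self, size: int = 16):
        self.entries = [(0, 0)] * size  # (value, stored parity)

    def write(self, addr: int, value: int) -> None:
        self.entries[addr] = (value, parity(value))

    def read(self, addr: int, decoder_fault: int = 0) -> int:
        # A decoder fault is modelled as XOR-ing the address bits,
        # which silently selects another, uncorrupted row.
        value, stored = self.entries[addr ^ decoder_fault]
        if parity(value) != stored:
            raise ValueError("parity error")  # catches flips in stored bits
        return value
```

With r2 = 4 and r3 = 9, `rf.read(2, decoder_fault=1)` returns 9 without raising, while a bit flip inside row 2 itself would be caught by the parity check.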

  10. The Register Value Cache
      [Figure: a register value cache (RVC) sits beside the register file and its decoder. Each read result is compared (CMP) with the RVC's copy of that register; a mismatch (x) triggers a stall and a CRC check.]

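  11. The Register Value Cache
      [Figure: RVC organization: an index array with valid bits and a value array holding write data with its CRC. Panels show the read operation (read/write address, read data), the write operation (write data and CRC), and the check operation, which compares previous read values (CMP) and checks the CRC to raise an error.]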
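
A minimal behavioural sketch of the read, write and check operations follows, under stated assumptions: a small fully associative cache with FIFO replacement, CRC-32 (via Python's `zlib.crc32`) as the check code, and at most one faulty copy per mismatch. `RegisterValueCache`, `write` and `check` are hypothetical names; the slides do not specify replacement policy or CRC width.

```python
import zlib


def crc(value: int) -> int:
    """CRC-32 of a 32-bit register value (CRC width is an assumption)."""
    return zlib.crc32(value.to_bytes(4, "little"))


class RegisterValueCache:
    """Hypothetical RVC: recently written register values, each with a CRC."""

    def __init__(self, entries: int = 4):
        self.entries = entries
        self.lines = {}   # register index -> (value, crc)
        self.order = []   # FIFO replacement order (assumed)

    def write(self, reg: int, value: int) -> None:
        """Write operation: record the value and its CRC."""
        if reg in self.lines:
            self.order.remove(reg)
        elif len(self.lines) == self.entries:
            victim = self.order.pop(0)
            del self.lines[victim]
        self.order.append(reg)
        self.lines[reg] = (value, crc(value))

    def check(self, reg: int, rf_value: int) -> int:
        """Check operation: compare a register-file read with the cached
        copy and return the value the pipeline should use."""
        if reg not in self.lines:
            return rf_value   # miss: this read is unprotected
        cached, cached_crc = self.lines[reg]
        if cached == rf_value:
            return rf_value   # copies agree
        # Mismatch: stall and use the CRC to decide which copy is intact,
        # assuming at most one of the two copies is faulty.
        if crc(cached) == cached_crc:
            return cached     # the register-file read was corrupted
        return rf_value       # the cached copy itself was corrupted
```

The miss path is the design's weak spot: a read of a register that has been evicted gets no protection, which is why the RVC hit rate matters.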

  12. Check CRC Example
      [Figure: mov r2, 4 and mov r5, 8 write 4 and 8 into the register file, and the register cache stores each value with its CRC. A fault causes add r3, r2, r5 to read r1 and r4 instead (add r3, r1, r4); the values read do not match the cached copies, and the mismatches are flagged (x).]
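
The example can be replayed in a few lines, assuming the cache is indexed by the register number encoded in the instruction while the fault makes the register file drive a different row. `regfile`, `rvc`, `write` and `read` are hypothetical names, and every register is cached here for simplicity, unlike a real, small RVC.

```python
import zlib


def crc(value: int) -> int:
    return zlib.crc32(value.to_bytes(4, "little"))


regfile = [0] * 8   # r0..r7, all initially 0
rvc = {}            # register index -> (value, crc)


def write(reg: int, value: int) -> None:
    regfile[reg] = value
    rvc[reg] = (value, crc(value))


def read(reg: int, faulty_row=None) -> int:
    """The RVC is looked up with the instruction's register number; a
    fault makes the register file drive faulty_row instead."""
    value = regfile[reg if faulty_row is None else faulty_row]
    if reg in rvc:
        cached, cached_crc = rvc[reg]
        if cached != value and crc(cached) == cached_crc:
            return cached  # stall, then forward the verified cached copy
    return value


write(2, 4)                  # mov r2, 4
write(5, 8)                  # mov r5, 8
a = read(2, faulty_row=1)    # r1 is driven instead of r2
b = read(5, faulty_row=4)    # r4 is driven instead of r5
write(3, a + b)              # add r3, r2, r5
print(regfile[3])            # 12, not the faulty 0 + 0
```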

  13. RVC Fault Coverage
      [Figure: RVC fault coverage, with 57.4% marked.]

  14. RVC Overhead

  15. What About the Rest?
      • Leverage fault fanout to place detectors at the state elements that faults are most likely to reach

  16. Fault Fanout
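
One way to rank "likely targets" is sketched below as a structural approximation, not the authors' injection-based measurement: walk forward from every gate to the flip-flops its output can reach, then count for each flip-flop how many gates can fault it. The netlist and node names are hypothetical, and logical, electrical and latching-window masking are ignored, so these counts over-estimate real fanout.

```python
from collections import deque

# Hypothetical toy netlist: each node maps to the nodes its output drives.
# Names starting with "ff_" are state elements (flip-flops).
NETLIST = {
    "g1": ["g3"],
    "g2": ["g3", "g4"],
    "g3": ["ff_a", "g5"],
    "g4": ["ff_b"],
    "g5": ["ff_c"],
    "ff_a": [], "ff_b": [], "ff_c": [],
}


def is_flop(node: str) -> bool:
    return node.startswith("ff_")


def reachable_flops(netlist: dict, gate: str) -> set:
    """Breadth-first search from a faulted gate, stopping at flip-flops."""
    seen, queue, flops = {gate}, deque([gate]), set()
    while queue:
        for succ in netlist[queue.popleft()]:
            if succ in seen:
                continue
            seen.add(succ)
            if is_flop(succ):
                flops.add(succ)   # the fault could be latched here
            else:
                queue.append(succ)
    return flops


def fanout_counts(netlist: dict) -> dict:
    """For each flip-flop, count the gates whose faults can reach it."""
    counts = {n: 0 for n in netlist if is_flop(n)}
    for node in netlist:
        if not is_flop(node):
            for ff in reachable_flops(netlist, node):
                counts[ff] += 1
    return counts


counts = fanout_counts(NETLIST)
ranking = sorted(counts, key=counts.get, reverse=True)
print(ranking)  # ['ff_c', 'ff_a', 'ff_b']
```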

  17. Transient Fault Detector
      [Figure: the D input drives a main flip-flop (output Q, clocked by CLK) and a shadow latch clocked after a delay; an Error signal is raised when the two disagree.]
      Source: "A Self-Tuning DVS Processor Using Delay-Error Detection and Correction" (S. Das, 2006).
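
The shadow-latch principle can be sketched as an idealized timing model. It assumes point sampling (a real shadow latch is level-sensitive) and a D waveform given as a function of time; `glitch` and `sample` are hypothetical helpers. The main flip-flop samples D at the clock edge, the shadow latch samples it `delay` later, and any disagreement asserts Error.

```python
from typing import Callable, Tuple

Signal = Callable[[float], int]


def glitch(steady: int, start: float, width: float) -> Signal:
    """D waveform: `steady`, except inverted during [start, start + width)."""
    return lambda t: steady ^ 1 if start <= t < start + width else steady


def sample(d: Signal, clk_edge: float, delay: float) -> Tuple[int, bool]:
    """Return (Q, error): Q from the main flip-flop at the clock edge,
    error when the shadow sample taken `delay` later disagrees."""
    q_main = d(clk_edge)
    q_shadow = d(clk_edge + delay)
    return q_main, q_main != q_shadow


print(sample(glitch(0, 9.5, 1.0), 10.0, 3.0))   # short pulse at the edge
print(sample(glitch(0, 20.0, 1.0), 10.0, 3.0))  # pulse misses the edge
print(sample(glitch(0, 9.0, 5.0), 10.0, 3.0))   # pulse wider than the delay
```

The three cases show the design trade-off: a short pulse at the edge is latched but flagged, a pulse away from the edge is masked, and a pulse wider than the delay corrupts both samples and escapes detection, so the delay must exceed the widest transient the detector is meant to catch.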

  18. Glitch Detector Coverage
      [Figure: coverage (%) versus power overhead (%), and coverage (%) versus area overhead (%).]
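
Coverage-versus-overhead curves like these come from choosing which flip-flops receive detectors. One simple way to trace such a curve, offered as an assumption rather than the authors' procedure, is a greedy knapsack heuristic: rank candidate sites by faults caught per unit area and add them until an area budget is spent. The candidate numbers are invented for illustration, and faults caught at different sites are assumed disjoint so that coverage simply adds.

```python
# Hypothetical candidates: flip-flop -> (faults it would catch, detector area).
CANDIDATES = {
    "ff_a": (30, 1.0),
    "ff_b": (10, 1.0),
    "ff_c": (40, 1.0),
    "ff_d": (20, 2.5),
}


def select_detectors(candidates: dict, area_budget: float):
    """Greedy selection by faults caught per unit area under an area
    budget. Returns (chosen sites, coverage in percent)."""
    total = sum(faults for faults, _ in candidates.values())
    ranked = sorted(candidates,
                    key=lambda n: candidates[n][0] / candidates[n][1],
                    reverse=True)
    chosen, area, caught = [], 0.0, 0
    for name in ranked:
        faults, cost = candidates[name]
        if area + cost <= area_budget:
            chosen.append(name)
            area += cost
            caught += faults
    return chosen, 100.0 * caught / total


for budget in (1.0, 2.0, 3.0, 6.0):
    print(budget, select_detectors(CANDIDATES, budget))
```

Greedy ratio ordering is not guaranteed to be optimal for knapsack-style selection, but it yields a monotone coverage curve cheaply, which is what a coverage-versus-cost sweep needs.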

  19. Combined Technique Coverage
      [Figure: coverage (%) versus power overhead (%), and coverage (%) versus area overhead (%).]
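
How the two techniques combine can be written as a weighted sum, under the simplifying assumption that the register value cache protects only errors originating in the register file and the glitch detectors protect only the remaining errors. Here $f_{\text{RF}} = 0.574$ is the register file's share of architectural errors from slide 9, while $c_{\text{RVC}}$ and $c_{\text{det}}$ are the (unspecified) fractions of each class that the respective technique catches.

```latex
C_{\text{combined}} = f_{\text{RF}}\, c_{\text{RVC}} + \left(1 - f_{\text{RF}}\right) c_{\text{det}},
\qquad f_{\text{RF}} = 0.574
```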

  20. Conclusion
      • Circuit-level soft error analysis offers significant insight
      • Protecting against faults in combinational logic does not require structural duplication
      • Coverage-versus-cost trade-offs are available
      • A compromise point yields significant benefits:
         • 85% fault coverage for only 5.5% area overhead
         • 2-3x increase in MTTF

  21. Questions?

  22. RVC Hit Rates
