
Evaluate the Impact of Soft Errors on Processor Datapath: A Focus on Int. FUs



Presentation Transcript


  1. Evaluate the Impact of Soft Errors on Processor Datapath: A Focus on Int. FUs
     Jie Hu, Rajar Ramanarayanan, Dept. of CSE

  2. Motivation
     • Soft errors are a major reliability problem in processor design
     • Processor components become more susceptible to soft errors in newer process technologies
     • Cache structures can be well protected by parity, ECC, etc.
     • Combinational logic relies on time/space redundancy
     • Plenty of work on error detection/recovery [6][5][4]
     • How do soft errors in combinational logic affect the system?
     • Are there better cost-effective reliable designs?

  3. Superscalar Processor Core

  4. A Focus on Functional Units
     • Integer functional units have a wide-ranging impact on program execution:
        • Branch conditions
        • Data reference addresses
        • Function reference addresses
     • Floating-point functional units:
        • Mostly used for numerical operations
        • Less impact on program execution
     • Today's focus: Int. ALU (adder, logic), Int. MULT/DIV

  5. Experimental Setup

  6. Error Injection Scheme
     • Hardware-based error injection:
        • Needs circuit details of the functional units
        • Different processors may use different design styles
        • Difficult to obtain the error-affected results at the architectural level
     • A more effective approach:
        • Introduce a soft error into one of the instruction's source operands
        • Restore the original source operand value if the result register number differs from that source register
     • Experimental scheme:
        • Only single event upsets (SEUs) are considered
        • Simulate at most 0.5 billion committed instructions
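A minimal Python sketch of the operand-level injection scheme, assuming a simple register-file model with 32-bit integer operands. `Instr`, `OPS`, `flip_bit`, and `execute_with_seu` are hypothetical names, not the simulator the authors used; the sketch only shows the order of steps: flip one bit of a source operand, execute, then restore the source register unless it is also the destination.

```python
import random
from dataclasses import dataclass

WORD_BITS = 32                  # assumed integer operand width
MASK = (1 << WORD_BITS) - 1

# Hypothetical subset of integer FU operations, all modulo 2**32.
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


@dataclass
class Instr:
    op: str    # key into OPS
    dst: int   # destination (result) register number
    src1: int  # source register that receives the injected error
    src2: int  # second source register, left untouched


def flip_bit(value: int, bit: int) -> int:
    """Single event upset: invert exactly one bit of a WORD_BITS-wide value."""
    return (value ^ (1 << bit)) & MASK


def execute_with_seu(regs: list[int], ins: Instr, rng: random.Random) -> int:
    """Execute one instruction with an SEU injected into its first source operand.

    Returns the flipped bit position. The source register is restored afterwards
    unless it is also the destination, so the error survives only in the result.
    """
    b = regs[ins.src2]                        # read the clean second operand first
    original = regs[ins.src1]
    bit = rng.randrange(WORD_BITS)
    regs[ins.src1] = flip_bit(original, bit)  # corrupt the source operand
    regs[ins.dst] = OPS[ins.op](regs[ins.src1], b)
    if ins.dst != ins.src1:
        regs[ins.src1] = original             # restore the clean source value
    return bit


if __name__ == "__main__":
    regs = [0] * 8
    regs[1], regs[2] = 5, 3
    execute_with_seu(regs, Instr("add", dst=3, src1=1, src2=2), random.Random(1))
    print(regs[1], regs[3])  # regs[1] is restored to 5; regs[3] holds a corrupted sum
```

Restoring the source register is what confines the upset to the functional unit's output: without it, later instructions reading `src1` would also see the flipped bit, which would model a register-file error instead.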

  7. Addition Operations
     • Inject soft errors into addition operations at a fixed interval (every 10,000 cycles) until program execution crashes
     • Error accumulation leads to program crashes
     • Different applications show different resistance to errors
     • Additional experiment: a single error injected at various cycle times did not crash the program
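The fixed-interval experiment can be sketched as a loop that injects one error every 10,000 cycles and stops when the program crashes. `errors_until_crash` and the `crashed_after` callback are hypothetical; in the actual experiment a crash is whatever the simulator observes, not a predicate on the error count, so the callback here is only a stand-in.

```python
from typing import Callable, Iterator, Optional

INJECTION_INTERVAL = 10_000  # cycles between injected errors, as on the slide


def injection_cycles(interval: int = INJECTION_INTERVAL) -> Iterator[int]:
    """Yield the cycles at which an error is injected: interval, 2*interval, ..."""
    cycle = interval
    while True:
        yield cycle
        cycle += interval


def errors_until_crash(crashed_after: Callable[[int], bool],
                       max_cycles: int,
                       interval: int = INJECTION_INTERVAL) -> Optional[int]:
    """Return how many errors accumulate before the program crashes.

    crashed_after(n) is a hypothetical stand-in for the simulator's crash check
    after the n-th injected error. Returns None if the run ends without a crash.
    """
    for n, cycle in enumerate(injection_cycles(interval), start=1):
        if cycle > max_cycles:
            return None
        # A real run would inject one SEU into an addition at this cycle.
        if crashed_after(n):
            return n
    return None
```

For example, `errors_until_crash(lambda n: n >= 7, max_cycles=1_000_000)` returns 7, while the same check with `max_cycles=50_000` returns `None`, because only five errors fit in that window.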

  8. Addition: Uniform Error Rate
     • Introduce soft errors at different uniformly distributed probabilities
     • For all benchmarks, an error rate of 0.0001 is the most sensitive point
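A minimal sketch of how a uniform error rate could be applied, assuming each executed operation of the targeted class is independently hit with probability equal to the rate (the slide does not spell out the sampling mechanism). The class labels `"add"`, `"logic"`, `"muldiv"` and the function `make_injector` are hypothetical; choosing `ALU` or `MULDIV` as the target set would correspond to the ALU and MULT/DIV runs.

```python
import random
from typing import Callable, FrozenSet

# Hypothetical operation classes; "ALU" means addition plus logic.
ADD = frozenset({"add"})
LOGIC = frozenset({"logic"})
ALU = ADD | LOGIC
MULDIV = frozenset({"muldiv"})


def make_injector(rate: float, targets: FrozenSet[str],
                  seed: int = 0) -> Callable[[str], bool]:
    """Return a per-operation trigger: each targeted op is hit with probability rate."""
    rng = random.Random(seed)  # seeded so a run can be reproduced

    def should_inject(op_class: str) -> bool:
        # Independent Bernoulli trial per executed operation of a targeted class.
        return op_class in targets and rng.random() < rate

    return should_inject


if __name__ == "__main__":
    inject = make_injector(rate=1e-4, targets=ADD, seed=42)
    hits = sum(inject("add") for _ in range(1_000_000))
    print(hits)  # expectation is 1e-4 * 1e6 = 100 hits
```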

  9. Logic: Uniform Error Rate
     • 175.vpr (FPGA placement and routing) is more sensitive to errors that occur during logic operations
     • 256.bzip2 (compression) can survive a large number of errors

  10. ALU: Uniform Error Rate
     • The combined effect of errors in both addition and logic operations
     • All benchmarks show exacerbated behavior except 175.vpr

  11. MULT/DIV: Uniform Error Rate
     • In general, programs still survive errors that occur in MULT/DIV operations, because these operations are fewer and less closely tied to program control flow
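The first reason on this slide (fewer operations) can be made concrete under an assumption: if every targeted operation is independently hit with the same probability $p$, the number of injected errors $N_{\text{err}}$ in a unit that executes $N_{\text{op}}$ operations is binomial, so

```latex
\mathbb{E}[N_{\text{err}}] = p\,N_{\text{op}},
\qquad
\Pr[N_{\text{err}} \ge 1] = 1 - (1 - p)^{N_{\text{op}}}.
```

With $p = 10^{-4}$, a hypothetical unit executing $10^{8}$ operations would receive $10^{4}$ errors on average, while one executing $10^{5}$ operations would receive 10. These operation counts are illustrative only, not values measured in the experiments.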

  12. Ongoing Work

  13. References
     • [1] Ghani A. Kanawati, Nasser A. Kanawati, and Jacob A. Abraham. FERRARI: A Flexible Software-Based Fault and Error Injection System. IEEE Transactions on Computers, 44(2):248-260, February 1995.
     • [2] S. Mitra and E. J. McCluskey. Which concurrent error detection scheme to choose? In Proceedings of the International Test Conference, pages 985-994, October 2000.
     • [3] S. Mitra, N. R. Saxena, and E. J. McCluskey. A design diversity metric and reliability analysis for redundant systems. In Proceedings of the International Test Conference, pages 662-671, September 1999.
     • [4] Nahmsuk Oh, Subhasish Mitra, and Edward J. McCluskey. ED4I: Error Detection by Diverse Data and Duplicated Instructions. IEEE Transactions on Computers, 51(2):180-199, February 2002.
     • [5] Joydeep Ray, James C. Hoe, and Babak Falsafi. Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery. In Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001.
     • [6] E. Rotenberg. AR-SMT: A microarchitectural approach to fault tolerance in microprocessors. In Proceedings of the 29th Fault-Tolerant Computing Symposium, June 1999.
     • [7] P. Shivakumar, M. Kistler, S. Keckler, D. Burger, and L. Alvisi. Modeling the effect of technology trends on the soft error rate of combinational logic. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), pages 389-398, June 2002.
     • [8] J. F. Ziegler et al. IBM experiments in soft fails in computer electronics (1978-1994). IBM Journal of Research and Development, 40(1):3-18, 1996.
