Evaluate the Impact of Soft Errors on Processor Datapath ---- A Focus of Int. FUs. Jie Hu, Rajar Ramanarayanan, Dept. of CSE
Motivation
• Soft errors are a major reliability problem in processor design
• Processor components are more susceptible to soft errors in newer technologies
• Cache structures can be well protected by parity, ECC, etc.
• Combinational logic: protected through time/space redundancy
• Plenty of work exists on error detection/recovery [6][5][4]
• How do soft errors in combinational logic affect the system?
• Are there better, more cost-effective reliable designs?
A Focus on Functional Units
• Integer functional units have a wide-ranging impact on program execution:
  • Conditions of branches
  • Addresses of data references
  • Addresses of function references
• Floating-point functional units:
  • Mostly used for numerical operations
  • Less impact on program execution
• Today's focus: Int. ALU (adder, logic), Int. MULT/DIV
Error Injection Scheme
• Hardware-based error injection:
  • Needs circuit details of the functional units
  • Different processors may use different design styles
  • Difficult to obtain the error-infected results at the architectural level
• A more effective way:
  • Introduce the soft error at one of the instruction's source operands
  • Restore the original source operand value if the result register number differs from that source register
• Experimental scheme:
  • Only SEUs (single event upsets) are considered
  • Each simulation runs for at most 0.5 billion committed instructions
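The operand-level injection described above can be illustrated with a minimal Python sketch. It is not the authors' simulator code: the 32-bit word width, the dictionary register file, and the names `flip_bit` and `execute_with_seu` are all assumptions made for illustration.

```python
import operator

WORD_BITS = 32                      # assumed register width
MASK = (1 << WORD_BITS) - 1


def flip_bit(value, bit):
    """Return value with a single bit inverted (one SEU)."""
    return (value ^ (1 << bit)) & MASK


def execute_with_seu(regs, op, dst, src_a, src_b, bit):
    """Execute `op` with one bit of source operand src_a upset.

    regs is a hypothetical register file (dict: register number -> value).
    """
    original = regs[src_a]
    regs[src_a] = flip_bit(original, bit)          # inject the error
    result = op(regs[src_a], regs[src_b]) & MASK   # FU computes on bad input
    if dst != src_a:
        regs[src_a] = original                     # restore the source operand
    regs[dst] = result                             # only the result stays corrupted
    return result


regs = {1: 5, 2: 3, 3: 0}
execute_with_seu(regs, operator.add, dst=3, src_a=1, src_b=2, bit=0)
```

After this call the source register `regs[1]` holds its original value again, while `regs[3]` holds the sum computed from the corrupted operand, so the error looks as if it had originated inside the functional unit.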
Addition Operations
• Insert soft errors into addition operations at a fixed interval (every 10,000 cycles) until program execution crashes
• Error accumulation results in program crashes
• Different applications show different resistance to errors
• Additional experiment: a single error injected at different cycle times did not crash the program
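A fixed-interval schedule of this kind might be sketched as follows. The class name `IntervalInjector` is hypothetical, and the policy of corrupting the first addition issued at or after each interval boundary is an assumption; the slides do not say what happens when no addition executes exactly on the boundary cycle.

```python
class IntervalInjector:
    """Decides which additions to corrupt under a fixed-interval schedule."""

    def __init__(self, interval=10_000):
        self.interval = interval
        self.next_due = interval    # cycle of the next scheduled injection
        self.injected = 0           # number of errors injected so far

    def due(self, cycle, is_add):
        """Return True if the operation issued in `cycle` should be corrupted."""
        if is_add and cycle >= self.next_due:
            # Assumption: schedule the next injection at the following boundary.
            self.next_due = (cycle // self.interval + 1) * self.interval
            self.injected += 1
            return True
        return False
```

A simulator loop would call `due(cycle, is_add)` for every issued integer operation and apply a single-bit upset whenever it returns `True`, continuing until the simulated program crashes or the instruction limit is reached.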
Addition: Uniform Error Rate
• Introduce soft errors with different uniformly distributed probabilities
• For all benchmarks, an error rate of 0.0001 is the most sensitive point
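One way to read "uniform error rate" is that every addition is corrupted independently with the same probability. The sketch below follows that reading, which is an assumption, as are the 32-bit width and the helper names `maybe_corrupt` and `count_corrupted`.

```python
import random

WORD_BITS = 32   # assumed operand width


def maybe_corrupt(value, error_rate, rng):
    """With probability error_rate, flip one uniformly chosen bit of value."""
    if rng.random() < error_rate:
        return value ^ (1 << rng.randrange(WORD_BITS))
    return value


def count_corrupted(num_ops, error_rate, seed=0):
    """Count how many of num_ops operations receive an injected error."""
    rng = random.Random(seed)
    # Flipping one bit of 0 always yields a nonzero value, so != 0 marks a hit.
    return sum(maybe_corrupt(0, error_rate, rng) != 0 for _ in range(num_ops))


hits = count_corrupted(100_000, 0.0001)
```

Under this interpretation, an error rate of 0.0001 corrupts on average one addition in every 10,000.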
Logic: Uniform Error Rate
• 175.vpr (FPGA placement and routing) is more sensitive to errors that occur during logic operations
• 256.bzip2 (compression) can survive a large number of errors
ALU: Uniform Error Rate
• A combined effect of errors in both addition and logic operations
• All benchmarks show exacerbated behavior except 175.vpr
MULT/DIV: Uniform Error Rate
• In general, programs can still survive errors that occur in MULT/DIV operations, because these operations are fewer in number and less closely tied to program execution control.
References
• [1] Ghani A. Kanawati, Nasser A. Kanawati, and Jacob A. Abraham. FERRARI: A Flexible Software-Based Fault and Error Injection System. IEEE Transactions on Computers, 44(2):248-260, February 1995.
• [2] S. Mitra and E. J. McCluskey. Which concurrent error detection scheme to choose? In Proceedings of International Test Conference, pages 985-994, October 2000.
• [3] S. Mitra, N. R. Saxena, and E. J. McCluskey. A design diversity metric and reliability analysis for redundant systems. In Proceedings of International Test Conference, pages 662-671, September 1999.
• [4] Nahmsuk Oh, Subhasish Mitra, and Edward J. McCluskey. ED4I: Error Detection by Diverse Data and Duplicated Instructions. IEEE Transactions on Computers, 51(2):180-199, February 2002.
• [5] Joydeep Ray, James C. Hoe, and Babak Falsafi. Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery. In Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001.
• [6] E. Rotenberg. AR-SMT: A microarchitectural approach to fault tolerance in microprocessors. In Proceedings of the 29th Fault-Tolerant Computing Symposium, June 1999.
• [7] P. Shivakumar, M. Kistler, S. Keckler, D. Burger, and L. Alvisi. Modeling the effect of technology trends on the soft error rate of combinational logic. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), pages 389-398, June 2002.
• [8] J. F. Ziegler et al. IBM experiments in soft fails in computer electronics (1978-1994). IBM Journal of Research and Development, 40(1):3-18, 1996.