
An Architectural framework for evaluating impact of soft errors in arithmetic units




  1. Rajaraman R, Jie Hu: An Architectural framework for evaluating impact of soft errors in arithmetic units

  2. Overview • Introduction • Circuit level estimation (Qcritical) • Single bit adders • Four bit adders • Results and optimizations • Converting Qcritical to SER • Architectural simulations • Results and solutions • Conclusion and future work

  3. Introduction • Data paths and combinational logic are inherently resistant to soft errors because of three masking effects [Shivakumar’02]: • Logical masking (effect remains the same as technology scales) • Electrical masking (effect weakens as technology scales) • Latching-window masking (effect weakens as technology scales) • Their susceptibility is nevertheless increasing as: • Pipeline depth increases • Devices scale
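Of the three masking effects, logical masking is the one that can be checked at the gate level without any electrical model. The sketch below is a minimal illustration under stated assumptions: inputs are uniformly distributed, exactly one input is flipped, and electrical and latching-window masking are not modeled. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable


def logical_masking_rate(gate: Callable[..., int], n_inputs: int, flip_index: int) -> float:
    """Fraction of input vectors for which flipping one input leaves the gate output unchanged."""
    vectors = list(product((0, 1), repeat=n_inputs))
    masked = 0
    for bits in vectors:
        flipped = list(bits)
        flipped[flip_index] ^= 1  # single-bit upset on the chosen input
        if gate(*bits) == gate(*flipped):
            masked += 1
    return masked / len(vectors)


def and2(a: int, b: int) -> int:
    return a & b


def xor2(a: int, b: int) -> int:
    return a ^ b


def carry_out(a: int, b: int, c: int) -> int:
    # Majority function: the carry-out of a full adder
    return (a & b) | (b & c) | (a & c)


for name, gate, n in [("AND2", and2, 2), ("XOR2", xor2, 2), ("full-adder carry", carry_out, 3)]:
    print(name, logical_masking_rate(gate, n, flip_index=0))
```

A flip on an AND input is masked whenever the other input is 0, while an XOR (and therefore a full adder's sum output) never masks a flip on one of its inputs. Because this depends only on the logic function, it does not change with scaling, which is consistent with the first bullet above.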

  4. Introduction • In this work we present: • Circuit level estimation of soft errors for: • Single bit adders • Four bit adders • Solutions based on concurrent error detection and other techniques • Architectural simulations (Jie Hu) • Architectural solutions for data path error detection and correction

  5. Circuit level estimation • Qcritical is estimated with a flip-flop (FF) at the output • Here we estimate Qcritical for: • Single bit adders • Mirror adder • Transmission-gate-based adder • Half-adder-based full adder • XOR-based full adder • Four bit adders • Ripple carry adder • Carry skip adder • Prefix adder (Brent-Kung)
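One common way to find Qcritical for a node is to inject progressively larger charge at that node in circuit simulation and check whether the output FF latches a wrong value. A minimal sketch of that search follows, assuming the upset outcome is monotone in the injected charge. `flips` is a hypothetical stand-in for a transient circuit simulation (for example, a current-pulse source attached to the node), not a real simulator API, and the 17.3 fC threshold in `toy_flips` is invented purely to exercise the search.

```python
from typing import Callable


def find_qcritical(flips: Callable[[float], bool], q_low: float, q_high: float,
                   tol: float = 0.01) -> float:
    """Bisect for the smallest injected charge (fC) that upsets the output flip-flop.

    Assumes monotonicity: if charge q causes an upset, any larger charge does too.
    Returns an upper bound on Qcritical that is within tol of the true value.
    """
    if flips(q_low):
        raise ValueError("q_low already upsets the flip-flop; lower the bracket")
    if not flips(q_high):
        raise ValueError("q_high does not upset the flip-flop; raise the bracket")
    while q_high - q_low > tol:
        mid = 0.5 * (q_low + q_high)
        if flips(mid):
            q_high = mid  # upset observed: Qcritical is at or below mid
        else:
            q_low = mid   # no upset: Qcritical is above mid
    return q_high


def toy_flips(q_fc: float) -> bool:
    # Toy model only: pretend the node upsets for any charge of 17.3 fC or more.
    return q_fc >= 17.3


print(f"Qcritical is approximately {find_qcritical(toy_flips, 0.0, 100.0):.2f} fC")
```

Each probe costs one circuit simulation, so bisection needs only about log2((q_high - q_low) / tol) runs per node: 14 for the 0 to 100 fC bracket and 0.01 fC tolerance used here.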

  6. Single bit adders [Figure: schematics of the mirror adder (carry network labeled Kill/"0"-Propagate and Generate/"1"-Propagate; inputs A, B, Ci; outputs Co, S) and the transmission gate adder, with the nodes evaluated for Qcritical marked]
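The mirror adder implements the standard full-adder equations Co = AB + Ci(A + B) and S = ABCi + Co'(A + B + Ci), where the sum stage reuses the inverted carry. The check below (function name hypothetical) confirms these equations against ordinary addition. Because S depends on the internal carry node, an upset on that node can plausibly corrupt both outputs, which is one reason to evaluate Qcritical at internal nodes and not only at the outputs.

```python
from itertools import product


def mirror_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Full adder in the form computed by the mirror adder; returns (sum, carry_out)."""
    co = (a & b) | (ci & (a | b))  # generate, or propagate the incoming carry
    co_bar = co ^ 1                # the sum stage uses the inverted carry node
    s = (a & b & ci) | (co_bar & (a | b | ci))
    return s, co


for a, b, ci in product((0, 1), repeat=3):
    total = a + b + ci
    assert mirror_adder(a, b, ci) == (total & 1, total >> 1)
print("mirror adder equations match a + b + ci for all 8 input combinations")
```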

  7. Single bit adders [Figure: schematics of the half adder based FA and the XOR based FA (inputs A, B, Cin; outputs SUM, Cout; internal G and P signals), with the nodes evaluated for Qcritical marked]
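The half adder based FA can be read as two half adders plus an OR gate, with G = A·B and P = A ⊕ B as its internal generate and propagate signals. The sketch below is an assumed gate-level reading of that structure (function names hypothetical), checked against ordinary addition.

```python
from itertools import product


def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (sum, carry) of a half adder."""
    return x ^ y, x & y


def ha_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built from two half adders and an OR gate; returns (sum, carry_out)."""
    p, g = half_adder(a, b)    # P = A xor B, G = A and B
    s, t = half_adder(p, cin)  # S = P xor Cin, t = P and Cin
    return s, g | t            # Cout = G + P*Cin


for a, b, cin in product((0, 1), repeat=3):
    total = a + b + cin
    assert ha_full_adder(a, b, cin) == (total & 1, total >> 1)
print("half adder based full adder matches a + b + cin for all 8 input combinations")
```

In this reading, P fans out to both the sum XOR and the carry AND, so an upset on P can reach both outputs.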

  8. Four bit adders • Ripple carry adder • A flip at the lowest FA cell needs a very high Qcritical to affect the MSB • But in the worst case it affects all sum bits • May often result in multi-bit errors
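The worst-case behaviour described above can be reproduced at the logic level. The sketch below (hypothetical helpers; electrical and latching-window masking are ignored, so it says nothing about Qcritical itself) flips the carry leaving one cell of a 4-bit ripple carry adder and counts how many sum bits change.

```python
from typing import Optional


def ripple_carry_add(a: int, b: int, width: int = 4, cin: int = 0,
                     flip_carry: Optional[int] = None) -> tuple[int, int]:
    """Add two width-bit numbers cell by cell; returns (sum, carry_out).

    flip_carry=i inverts the carry leaving cell i, emulating a single-event upset there.
    """
    carry, s = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
        if flip_carry == i:
            carry ^= 1  # injected upset on this cell's carry-out node
    return s, carry


def corrupted_sum_bits(a: int, b: int, flip_carry: int, width: int = 4) -> int:
    good, _ = ripple_carry_add(a, b, width)
    bad, _ = ripple_carry_add(a, b, width, flip_carry=flip_carry)
    return bin(good ^ bad).count("1")


# Every higher cell propagates (a ^ b has all of bits 1..3 set): the error ripples to the MSB.
print(corrupted_sum_bits(0b0111, 0b1000, flip_carry=0))  # 3
# Cell 1 kills the carry (both inputs 0), so only one sum bit is corrupted.
print(corrupted_sum_bits(0b0101, 0b0101, flip_carry=0))  # 1
```

When every higher cell propagates, a flipped carry out of the lowest cell reaches every higher sum bit and the carry-out, giving a multi-bit error; any cell that kills the carry stops it.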

  9. Four bit adders • Carry skip (bypass) adder • Its faster block-propagate signal and skip logic will have a lower Qcritical • But it produces fewer multi-bit errors
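A logic-level sketch of a 4-bit carry skip adder with two 2-bit blocks shows where the skip logic sits and which sum bits an upset on a block-propagate signal can reach. The block size and helper names are assumptions for illustration, and the sketch does not model Qcritical.

```python
from typing import Optional


def carry_skip_add(a: int, b: int, cin: int = 0, width: int = 4, block: int = 2,
                   flip_skip: Optional[int] = None) -> tuple[int, int]:
    """Ripple inside each block; bypass a block whose bits all propagate.

    flip_skip=k inverts block k's block-propagate (skip-select) signal to emulate an upset.
    Returns (sum, carry_out).
    """
    s, block_cin = 0, cin
    for start in range(0, width, block):
        carry, block_p = block_cin, 1
        for i in range(start, min(start + block, width)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi
            s |= (p ^ carry) << i
            carry = (ai & bi) | (p & carry)
            block_p &= p  # the block propagates only if every bit propagates
        if flip_skip == start // block:
            block_p ^= 1  # injected upset on the skip-select signal
        block_cin = block_cin if block_p else carry  # skip multiplexer
    return s, block_cin


def corrupted_sum_bits(a: int, b: int, flip_skip: int) -> int:
    good, _ = carry_skip_add(a, b)
    bad, _ = carry_skip_add(a, b, flip_skip=flip_skip)
    return bin(good ^ bad).count("1")


worst = max(corrupted_sum_bits(a, b, flip_skip=0) for a in range(16) for b in range(16))
print(worst)  # 2
```

An upset on block 0's skip signal can only corrupt the sum bits of the downstream block, and it is logically masked whenever block 0 really propagates, because the rippled carry then equals the block carry-in anyway.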

  10. Four bit adders • Brent-Kung adder • In the worst case, Qcritical for S3 might be lower than for the ripple carry adder (RCA) but higher than for the carry skip adder (CSA) • The trade-off between multi-bit errors and Qcritical could be studied for different prefix adder designs [Figure: 4-bit prefix adder computing S3..S0 from (A3, B3)..(A0, B0)]
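A logic-level sketch of a 4-bit Brent-Kung prefix adder shows which sum bits each prefix node feeds, which is the multi-bit-error side of the trade-off above. Assumptions: carry-in is fixed at 0, node names such as `G10` (group generate of bits 1..0) are labels chosen here, and Qcritical is not estimated.

```python
from itertools import product
from typing import Optional


def brent_kung_add4(a: int, b: int, flip: Optional[str] = None) -> tuple[int, int]:
    """4-bit Brent-Kung prefix adder with carry-in 0; returns (sum, carry_out).

    flip names one prefix node ("G10", "G32", "G30", "G20") whose output is inverted
    to emulate an upset on that node.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # bit generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # bit propagate

    def node(name: str, value: int) -> int:
        return value ^ 1 if flip == name else value

    # Level 1: combine adjacent bit pairs.
    g10 = node("G10", g[1] | (p[1] & g[0]))
    g32 = node("G32", g[3] | (p[3] & g[2]))
    p32 = p[3] & p[2]
    # Level 2: group generate over all four bits (the carry-out).
    g30 = node("G30", g32 | (p32 & g10))
    # Inverse tree: the remaining carry into bit 3.
    g20 = node("G20", g[2] | (p[2] & g10))

    carries = [0, g[0], g10, g20]  # carry into bits 0..3
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, g30


for name in ("G10", "G32", "G30", "G20"):
    worst = max(bin(brent_kung_add4(a, b)[0] ^ brent_kung_add4(a, b, flip=name)[0]).count("1")
                for a, b in product(range(16), repeat=2))
    print(name, worst)  # G10 2, G32 0, G30 0, G20 1
```

In this toy, only the level-1 node G10 can corrupt two sum bits (S2 and S3); S3 alone is fed by G20, and G32 and G30 reach only the carry-out.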

  11. Results [Figure: results chart for the HA based FA, mirror, and TG based adders]

  12. Results [Figure: results chart for the mirror, HA based FA, and TG based adders]

  13. Optimization techniques • Concurrent error detection (CED) techniques should work well for these adder designs • [Mitra’00] proposes that design diversity in duplicated designs results in more robust CED • Given the existing trade-offs among the adder designs, diversity could be used to build a robust CED design • Other techniques include: • Arithmetic coding techniques such as carry checking/parity prediction adders [Nicolaidis’03] • Other redundancy techniques such as time redundancy [Nicolaidis’99]
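Parity prediction relies on the identity s_i = a_i ⊕ b_i ⊕ c_i for every sum bit, so the parity of the sum equals the XOR of the two operand parities and the parity of the internal carries. A minimal sketch with hypothetical helper names follows; it shows only the parity-prediction half of the scheme cited above.

```python
def parity(x: int) -> int:
    return bin(x).count("1") & 1


def add_with_carries(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Return (sum mod 2**width, bit vector of the carry into each position)."""
    carry, s, carries = 0, 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries |= carry << i
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return s, carries


def parity_check_ok(a: int, b: int, s: int, carries: int) -> bool:
    # Each sum bit is a_i ^ b_i ^ c_i, so parity(S) = parity(A) ^ parity(B) ^ parity(C).
    return parity(s) == parity(a) ^ parity(b) ^ parity(carries)


s, c = add_with_carries(0x5A, 0x3C)
print(parity_check_ok(0x5A, 0x3C, s, c))          # True: fault-free
print(parity_check_ok(0x5A, 0x3C, s ^ 0b100, c))  # False: single sum-bit flip detected
```

Because the prediction uses the adder's own carries, a fault that corrupts a carry changes S and C consistently and passes this check; such carry faults are what the carry-checking part of [Nicolaidis’03] addresses, and that part is not sketched here.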

  14. Converting Qcritical to SER • From [Hazucha, 2000]: • $\mathrm{SER} \propto N_{\mathrm{flux}} \times \mathrm{CS} \times \exp\left(-Q_{\mathrm{critical}}/Q_s\right)$ • Nflux – neutron flux (difficult to find) • CS – cross-sectional area • Qcritical – critical charge necessary for a bit flip • Qs – charge collection efficiency (difficult to find) • Thus Qcritical is the only term that is easy to determine • Work on other metrics for estimating SER is ongoing
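Although Nflux and Qs are hard to obtain, the relation from [Hazucha, 2000] already supports relative comparisons between nodes that see the same neutron environment, because Nflux cancels in a ratio. The sketch below assumes equal flux for both nodes; Qs must still be supplied, and the Qcritical, Qs, and cross-section values in the example are placeholders, not results from this work.

```python
import math


def ser_ratio(qcrit_a: float, qcrit_b: float, qs: float,
              cs_a: float = 1.0, cs_b: float = 1.0) -> float:
    """SER of node a relative to node b under SER ∝ Nflux * CS * exp(-Qcritical / Qs).

    Nflux cancels when both nodes share the same environment. qs is an assumed
    technology parameter; it is not derived here.
    """
    return (cs_a / cs_b) * math.exp(-(qcrit_a - qcrit_b) / qs)


# Placeholder values in fC: node a has 5 fC more critical charge than node b.
print(ser_ratio(qcrit_a=20.0, qcrit_b=15.0, qs=5.0))
```

With these placeholder numbers node a's SER is about 0.37 times node b's; the ratio remains sensitive to the assumed Qs, which is why Qs is still listed as difficult to find.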

  15. References (Circuits) • [Nicolaidis’99] M. Nicolaidis, “Time redundancy based soft-error tolerance to rescue nanometer technologies,” Proceedings of the 17th IEEE VLSI Test Symposium, 25-29 April 1999, pp. 86-94. • [Nicolaidis’03] M. Nicolaidis, “Carry checking/parity prediction adders and ALUs,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 1, Feb. 2003, pp. 121-128. • [Mitra’00] S. Mitra and E. J. McCluskey, “Which concurrent error detection scheme to choose?” Proceedings of the International Test Conference, 3-5 Oct. 2000, pp. 985-994. • [Shivakumar’02] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, “Modeling the effect of technology trends on the soft error rate of combinational logic,” Proceedings of the International Conference on Dependable Systems and Networks, 23-26 June 2002, pp. 389-398. • [Hazucha, 2000] P. Hazucha and C. Svensson, “Impact of CMOS Technology Scaling on the Atmospheric Neutron Soft Error Rate,” IEEE Transactions on Nuclear Science, vol. 47, no. 6, Dec. 2000.

  16. Evaluating the Impact of Soft Errors on the Processor Datapath: A Focus on Integer FUs

  17. Motivation • Soft errors: a major reliability problem in processor design • Processor components are more susceptible to soft errors in newer technologies • Cache structures can be well protected by parity, ECC, etc. • Combinational logic: time/space redundancy • Plenty of work on error detection/recovery [6][5][4] • How do soft errors in combinational logic affect the system? • Are there better cost-effective reliable designs?

  18. Superscalar Processor Core

  19. A Focus on Functional Units • Integer functional units have a wide-ranging impact on program execution; they compute: • Branch conditions • Addresses of data references • Addresses of function references • Floating-point functional units • Mostly used for numerical operations • Less impact on program execution • Today’s focus: integer ALU (adder, logic) and integer MULT/DIV

  20. Experimental Setup

  21. Error Injection Scheme • Hardware-based error injection: • Needs circuit details of the functional units • Different processors may use different design styles • Difficult to obtain error-infected results at the architectural level • A more effective approach: • Introduce a soft error into one of the instruction's source operands • Restore the original source operand value if the result register number differs from that source register • Experimental scheme: • Only SEUs (single event upsets) are considered • A maximum of 0.5 billion committed instructions is simulated
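The operand-level scheme above can be sketched as follows, assuming a flat integer register file and a 32-bit add. `AddInst` and `execute_add_with_seu` are hypothetical names, not functions of any particular simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class AddInst:
    dest: int  # destination (result) register number
    src1: int  # source register that receives the injected upset
    src2: int


def execute_add_with_seu(regs: list[int], inst: AddInst, rng: random.Random,
                         width: int = 32) -> int:
    """Execute dest = src1 + src2 with a single-event upset injected into src1.

    Returns the index of the flipped bit.
    """
    mask = (1 << width) - 1
    original = regs[inst.src1]
    bit = rng.randrange(width)
    regs[inst.src1] = original ^ (1 << bit)  # inject a single-bit upset into the source operand
    # The functional unit reads the corrupted operand (both reads see it if src1 == src2).
    regs[inst.dest] = (regs[inst.src1] + regs[inst.src2]) & mask
    if inst.dest != inst.src1:
        regs[inst.src1] = original  # restore, so only the result carries the error
    return bit


regs = [0] * 8
regs[1], regs[2] = 5, 10
flipped = execute_add_with_seu(regs, AddInst(dest=3, src1=1, src2=2), random.Random(1))
print(flipped, regs[1], regs[3])
```

Restoring the source register keeps the injected error inside the functional unit's result, which distinguishes it from a register-file upset; when the destination is the same register, the result overwrites it anyway.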

  22. Addition Operations • Inject soft errors into addition operations at a fixed interval (10,000 cycles) until program execution crashes • Error accumulation results in program crashes • Different applications have different resistance to errors • Additional experiment: a single error injected at different cycle times did not crash the program
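The fixed-interval experiment can be expressed as a simple driver loop. This is a sketch only: `step` and `inject` are hypothetical simulator hooks, and the toy hooks in the example crash after three accumulated errors purely to exercise the loop; they do not represent measured behaviour.

```python
from typing import Callable


def inject_until_crash(step: Callable[[int], bool], inject: Callable[[], None],
                       interval: int = 10_000, max_cycles: int = 10**9) -> tuple[int, int]:
    """Advance a simulator cycle by cycle, injecting one error every `interval` cycles.

    step(cycle) returns False once the program has crashed; inject() corrupts the next
    addition. Returns (crash_cycle, errors_injected); crash_cycle is -1 if no crash occurs.
    """
    injected = 0
    for cycle in range(1, max_cycles + 1):
        if cycle % interval == 0:
            inject()
            injected += 1
        if not step(cycle):
            return cycle, injected
    return -1, injected


# Toy hooks: the "program" crashes once three errors have accumulated.
state = {"errors": 0}


def toy_inject() -> None:
    state["errors"] += 1


def toy_step(cycle: int) -> bool:
    return state["errors"] < 3


print(inject_until_crash(toy_step, toy_inject))  # (30000, 3)
```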

  23. Addition: Uniform Error Rate • Introduce soft errors with different uniformly distributed probabilities • For all benchmarks, an error rate of 0.0001 is the most sensitive point
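The uniform-error-rate experiments replace the fixed interval with a per-operation probability. A minimal sketch, assuming each addition's source operand is corrupted independently, with probability `rate`, by a single-bit flip (`inject_uniform` is a hypothetical name):

```python
import random


def inject_uniform(operands: list[int], rate: float, rng: random.Random,
                   width: int = 32) -> tuple[list[int], int]:
    """Independently flip one random bit of each operand with probability `rate`.

    Returns (possibly corrupted operands, number of injected errors).
    """
    corrupted, count = [], 0
    for value in operands:
        if rng.random() < rate:
            value ^= 1 << rng.randrange(width)  # single event upset
            count += 1
        corrupted.append(value)
    return corrupted, count


rng = random.Random(42)
ops = [rng.randrange(1 << 32) for _ in range(100_000)]
_, n_errors = inject_uniform(ops, rate=0.0001, rng=rng)
print(n_errors)
```

At a rate of 0.0001 over 100,000 additions, about 10 errors are expected on average; using a seeded `random.Random` keeps each run reproducible.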

  24. Logic: Uniform Error Rate • 175.vpr (FPGA placement and routing) is more sensitive to errors that occur during logic operations • 256.bzip2 (compression) can survive a large number of errors

  25. ALU: Uniform Error Rate • The combined effect of errors in both addition and logic operations • All benchmarks except 175.vpr show exacerbated behavior

  26. MULT/DIV: Uniform Error Rate • In general, programs still survive errors that occur in MULT/DIV operations, because these operations are less frequent and less related to program execution control.

  27. Conclusions and Ongoing Work • Conclusions: • Errors in different integer operations have different impacts on program execution • Different programs behave differently under error injection • Control-intensive (lower IPB) applications are more sensitive to logic operation errors • Multiplication/division operations have less impact on program execution • Future work: • More detailed characterization of program behavior under errors • Modeling the soft error rate of arithmetic units from Qcritical • Using this information to develop selective error protection/detection/recovery schemes

  28. References • [1] Ghani A. Kanawati, Nasser A. Kanawati, and Jacob A. Abraham. FERRARI: A Flexible Software-Based Fault and Error Injection System. IEEE Transactions on Computers, 44(2):248-260, February 1995. • [2] S. Mitra and E. J. McCluskey. Which concurrent error detection scheme to choose? In Proceedings of International Test Conference, pages 985-994, October 2000. • [4] Nahmsuk Oh, Subhasish Mitra, and Edward J. McCluskey. ED4I: Error Detection by Diverse Data and Duplicated Instructions. IEEE Transactions on Computers, 51(2):180-199, February 2002. • [5] Joydeep Ray, James C. Hoe, and Babak Falsafi. Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery. In Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001. • [6] E. Rotenberg. AR-SMT: A microarchitectural approach to fault tolerance in microprocessors. In Proceedings of the 29th Fault-Tolerant Computing Symposium, June 1999. • [8] J. F. Ziegler et al. IBM experiments in soft fails in computer electronics (1978-1994). IBM Journal of Research and Development, 40(1):3-18, 1996.
