Analyzing Circuit-aware Microarchitectural Reliability
Taniya Siddiqua, Paul Lee
taniya@cs.virginia.edu, pl4u@cs.virginia.edu
University of Virginia, Charlottesville

Motivation
[Figure labels: Transistor Size, Time, Transient Faults, Hard Errors (EM, TC, SM, TDDB, NBTI)]

Problem Description
• Architects approach this problem at architecture-level granularity
• The focus is on architectural structures, e.g. caches and ALUs
• Reliability predictions are circuit-agnostic
• There is a potential gap between architecture-level and circuit-level reliability estimation

Problem Description
We:
• Show that circuit-level detail affects architecture-level reliability simulations
• Study two hard errors, NBTI (Negative Bias Temperature Instability) and TDDB (Time-Dependent Dielectric Breakdown), at the architecture and circuit levels on an ALU
• Determine how technology scaling down to 22 nm affects NBTI and TDDB in the ALU
• Propose an NBTI-aware ALU design that uses both architecture-level and circuit-level optimizations

NBTI – A quick guide
• Key reliability issue for p-channel MOS (PMOS) devices
• Occurs when MOS devices are stressed with negative gate voltages
• Manifests as an increase in threshold-voltage magnitude and a decrease in drain current
• Consequently the circuit slows down and can violate its timing constraints
• Good news: recovery starts as soon as the stress is removed
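
The stress and recovery behaviour described above can be sketched with a simple power-law model. This is only an illustration, not the model used in this work: the exponent n = 1/6 and the recovery factor xi = 0.5 are values commonly quoted for reaction-diffusion descriptions of NBTI and are assumed here, and the prefactor `a` is a hypothetical placeholder.

```python
import math


def delta_vth_stress(t_stress_s, a=1.0e-3, n=1.0 / 6.0):
    """Threshold-voltage shift after continuous stress: a * t**n.

    `a` (hypothetical prefactor) and `n` (assumed exponent) are
    illustrative values, not parameters taken from the slides.
    """
    return a * t_stress_s ** n


def delta_vth_after_recovery(t_stress_s, t_recovery_s, xi=0.5, a=1.0e-3, n=1.0 / 6.0):
    """Shift remaining after stress followed by a recovery period.

    Assumes the remaining fraction is 1 - sqrt(xi * t_rec / (t_stress + t_rec)).
    """
    total = t_stress_s + t_recovery_s
    if total == 0:
        return 0.0
    shift = delta_vth_stress(t_stress_s, a=a, n=n)
    remaining = 1.0 - math.sqrt(xi * t_recovery_s / total)
    return shift * remaining
```

Under these assumptions, a device that recovers for as long as it was stressed keeps half of its accumulated shift, which is why idle time matters for lifetime.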

Architecture-level Reliability Simulation
We:
• Simulate a 2-wide-issue core with two integer (INT) ALUs
• Use SimpleScalar 3.0 to model processor behavior
• Use Wattch and HotSpot to simulate power and temperature behavior, respectively
• Estimate the lifetime of the first INT ALU
• Project ALU lifetimes from the MTTF for NBTI

Circuit-level Reliability Simulation
We:
• Use a Kogge-Stone adder circuit for the ALU
• Take the average temperature of the first ALU from the architecture-level reliability simulation and feed it to the Cadence framework
• Calculate stress and recovery times from the utilization pattern obtained from the architecture-level reliability simulation
• Define lifetime as the time at which circuit delay has degraded by 25% relative to the original delay
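
The last two steps can be sketched as follows. This is a minimal illustration under stated assumptions: a busy cycle is counted as stress and an idle cycle as recovery, and delay degradation is assumed to follow a power law k * (stress_fraction * t)**n. The function names, the constant `k`, and the exponent `n` are hypothetical and do not come from the Cadence flow.

```python
def stress_recovery_fractions(busy_trace):
    """Split a per-cycle utilization trace into stress and recovery fractions.

    `busy_trace` is an iterable of truthy (ALU busy) / falsy (ALU idle) flags.
    Assumption: busy cycles stress the PMOS devices, idle cycles let them recover.
    """
    trace = list(busy_trace)
    if not trace:
        raise ValueError("utilization trace is empty")
    busy = sum(1 for cycle_busy in trace if cycle_busy)
    return busy / len(trace), (len(trace) - busy) / len(trace)


def lifetime_years(stress_fraction, k, n=1.0 / 6.0, threshold=0.25):
    """Years until the assumed degradation k * (stress_fraction * t)**n hits `threshold`."""
    if stress_fraction <= 0:
        return float("inf")  # never stressed, so the threshold is never reached
    return (threshold / k) ** (1.0 / n) / stress_fraction
```

For example, if `k` is calibrated so that a permanently stressed ALU reaches the 25% threshold after 7 years, an ALU that is busy half of the time would last 14 years under this model.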

Scaling Effect
We:
• Show the scaling effect for 65 nm, 45 nm, 32 nm, and 22 nm
• Show the NBTI-induced output delay degradation after 7 years for each technology node: 65 nm (25%), 45 nm (27%), 32 nm (31%), 22 nm (46%)
• Conclude that an NBTI-aware ALU design is required
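
To make the trend easier to read, the figures above can be normalized to the 65 nm node. The snippet below only restates the reported numbers; the choice of 65 nm as the baseline is an assumption made for illustration.

```python
# Delay degradation after 7 years, in percent, keyed by technology node in nm
# (values as reported on this slide).
DELAY_DEGRADATION_PCT = {65: 25, 45: 27, 32: 31, 22: 46}


def relative_to_baseline(data, baseline_nm=65):
    """Return each node's degradation as a multiple of the baseline node's."""
    base = data[baseline_nm]
    return {node: pct / base for node, pct in data.items()}


for node, ratio in relative_to_baseline(DELAY_DEGRADATION_PCT).items():
    print(f"{node} nm: {ratio:.2f}x the 65 nm degradation")
```

The degradation grows slowly from 65 nm to 32 nm and then jumps at 22 nm, to 1.84 times the 65 nm value.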

NBTI-aware ALU Design
We:
• Determine that 50% of the operands in the SPEC2000 INT benchmarks are 16-bit operands
• Partition the 64-bit ALU into four 8-bit and two 16-bit independent blocks to support 8-, 16-, 32-, and 64-bit operations
• Aim to use idle time and narrow-width operands to increase the recovery time of PMOS devices
• Use power gating
• Use a round-robin mechanism so that all ALU blocks experience equal recovery time
• After 7 years the delay degradation is only 10%, a 60% improvement over the non-NBTI-aware ALU (consistent with the 25% figure at 65 nm: (25 − 10) / 25 = 60%)
• Tradeoff!
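
One way the round-robin idea could work is sketched below. The slides do not specify the allocation policy, so everything here is an assumption: the class `RoundRobinAlu`, the block ordering, and the rule of taking consecutive blocks from a rotating pointer until the operand width is covered are hypothetical. Blocks that are not selected are the ones that would be power-gated and left to recover.

```python
# Four 8-bit and two 16-bit blocks, 64 bits in total (ordering is an assumption).
BLOCK_WIDTHS = (8, 8, 8, 8, 16, 16)


class RoundRobinAlu:
    """Hypothetical round-robin block selector for a partitioned 64-bit ALU."""

    def __init__(self):
        self.next_block = 0  # rotating start position
        self.active_cycles = [0] * len(BLOCK_WIDTHS)  # stress bookkeeping per block

    def issue(self, operand_bits):
        """Return the indices of the blocks activated for one operation."""
        if operand_bits not in (8, 16, 32, 64):
            raise ValueError("unsupported operand width")
        chosen, covered, i = [], 0, self.next_block
        # Take consecutive blocks until their combined width covers the operand.
        while covered < operand_bits:
            chosen.append(i)
            covered += BLOCK_WIDTHS[i]
            i = (i + 1) % len(BLOCK_WIDTHS)
        self.next_block = i  # the next operation starts where this one stopped
        for block in chosen:
            self.active_cycles[block] += 1
        return chosen
```

With this policy, six consecutive 8-bit operations touch each block exactly once, so every block is idle for five of the six operations. A real design would also have to account for the cost of waking a gated block.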

TDDB – A quick guide
• The gate dielectric wears down over time under the electric field; failure occurs when a short forms through the gate oxide
• Breakdown of ultra-thin gate oxide depends strongly on temperature, and also on Vgs

Circuit-level Reliability Simulation
We:
• Use Pin to sample the inputs seen while running gzip, and derive an input pattern from those samples
• Use the Cadence Spectre simulator
• Use a Kogge-Stone adder circuit for the ALU
• Take the average temperature of the first ALU from the architecture-level reliability simulation and feed it to the Cadence framework
• Extract Vgs for every device in the Kogge-Stone adder

Scaling Effect
We:
• Measured Vgs; the effect of temperature still needs to be investigated

Conclusion
• For some problems, like TDDB, the gap between architecture-level and circuit-level simulation is almost nonexistent
• For other problems, like NBTI, the gap is significant, and combining both approaches can yield better designs

Thank you
Questions?