Online Timing Analysis for Wearout Detection. Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke University of Michigan. Wearout Mechanisms. There are a lot of them: Electromigration (EM) Time-dependent dielectric breakdown (TDDB) Negative-bias temperature instability (NBTI)
Online Timing Analysis for Wearout Detection
Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke
University of Michigan
Wearout Mechanisms
• There are a lot of them:
  • Electromigration (EM)
  • Time-dependent dielectric breakdown (TDDB)
  • Negative-bias temperature instability (NBTI)
  • Hot carrier injection (HCI)
  • …
• All highly dependent on temperature and current density
  • Both increasing fast!
Goals of this Research
• Low-cost reliable system design
  • How do physical wearout mechanisms progress?
  • How to determine that a device has failed?
  • How do we maintain operation given failed components?
Traditional and Recent Approaches
• Traditional detection techniques are expensive
  • Redundant checking structures
• Predictive techniques
  • Canary circuits
  • RAMP
Proposed Technique
• Key Insight:
  • Degradation in silicon leads to a decrease in performance
  • Long incubation time followed by rapid deterioration
• Examples:
  • TDDB: increases leakage, shifting voltage curves
  • EM: increases resistance
  • NBTI: shifts threshold voltage
Outline
• Microprocessor model
• Wearout simulation methodology
• Wearout simulation results
• The wearout detection unit (WDU)
• WDU analysis
• Conclusion
Simulation Flow Step 1: Temperature and Activity Analysis
[Figure: flow diagram. Labels: Benchmark, Netlist, Timing, Parasitics, Synopsys VCS, PrimePower, HotSpot, Activity Trace, Power Trace, Temperature Trace]
Simulation Flow Step 2: Wearout Simulation
[Figure: flow diagram. Labels: Benchmark, Netlist, Timing, Synopsys VCS, Activity, Temperature, MTTF Calculation, Relative Wearout Factors, Age Index, Wearout Simulation, Signal Latency Data]
• Device Delay = Original Delay * RWF * AI * RV
  • RWF: Relative amount of wearout for a device
  • AI: Performance degradation parameterized by age
  • RV: Random variable
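The delay model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how RV is distributed, so the Gaussian centred on 1.0 used here is a guess, and the function names and all numeric values (100 ps, 1.10, 1.05, sigma) are hypothetical.

```python
import random


def device_delay(original_delay_ps, rwf, age_index, rv):
    """Device Delay = Original Delay * RWF * AI * RV (formula from the slide)."""
    return original_delay_ps * rwf * age_index * rv


def sample_rv(rng, sigma=0.02):
    # ASSUMPTION: the slide only says "random variable"; a Gaussian
    # centred on 1.0 with a small spread is used purely for illustration.
    return rng.gauss(1.0, sigma)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical numbers: a 100 ps gate with no wearout, then the same gate
# with a 10% relative wearout factor and a 5% age-related slowdown.
nominal = device_delay(100.0, rwf=1.0, age_index=1.0, rv=1.0)
aged = device_delay(100.0, rwf=1.10, age_index=1.05, rv=1.0)
noisy = device_delay(100.0, rwf=1.10, age_index=1.05, rv=sample_rv(rng))

print(f"nominal={nominal:.2f} ps, aged={aged:.2f} ps, noisy={noisy:.2f} ps")
```

Because the factors multiply, a device that is both heavily stressed (high RWF) and old (high AI) slows down more than either factor would suggest on its own.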
Simulation Flow Step 2: Wearout Simulation
[Figure not preserved]
Wearout Simulation Results
[Figure: signal latency (ps) and sample mean latency (ps) plotted against time (years)]
Exploiting Performance Degradation
• Exponential moving average:
  • EMA = α(sample - EMA_previous) + EMA_previous
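The EMA update above is a one-line recurrence. A minimal Python sketch follows; seeding the average with the first sample is an assumption (the slide does not say how the EMA is initialised), and the function names and latency values are hypothetical.

```python
def ema_update(previous, sample, alpha):
    """One step of the slide's recurrence:
    EMA = alpha * (sample - EMA_previous) + EMA_previous."""
    return alpha * (sample - previous) + previous


def ema_series(samples, alpha):
    """Return the running EMA for every sample in the list."""
    ema = samples[0]  # ASSUMPTION: seed the average with the first sample
    out = []
    for sample in samples:
        ema = ema_update(ema, sample, alpha)
        out.append(ema)
    return out


# Hypothetical latency samples in picoseconds: a step from 100 ps to 104 ps.
latencies = [100.0, 100.0, 104.0, 104.0]
smoothed = ema_series(latencies, alpha=0.5)
print(smoothed)  # [100.0, 100.0, 102.0, 103.0]
```

A smaller α makes the average respond more slowly, which filters out short-lived noise at the cost of reacting later to a genuine shift in latency.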
Trend Analysis
TRIX can be used to accurately track both local and long-term latency trends
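The slide does not give a formula for TRIX. The sketch below uses the definition common in technical analysis: the relative rate of change of a triple-smoothed EMA. Using a larger α for the local trend and a smaller α for the long-term trend is an assumption made for illustration, as are the α values, the function names, and the synthetic latency trace.

```python
def ema_series(samples, alpha):
    """Running exponential moving average, seeded with the first sample."""
    ema = samples[0]
    out = []
    for sample in samples:
        ema = alpha * (sample - ema) + ema
        out.append(ema)
    return out


def trix(samples, alpha):
    """Relative change between consecutive values of a triple-smoothed EMA."""
    smoothed = samples
    for _ in range(3):  # apply the EMA three times
        smoothed = ema_series(smoothed, alpha)
    return [(cur - prev) / prev for prev, cur in zip(smoothed, smoothed[1:])]


# Hypothetical trace: 10 stable samples at 100 ps, then a slow upward drift.
latencies = [100.0] * 10 + [100.0 + 0.5 * i for i in range(1, 11)]

# ASSUMPTION: a fast-reacting alpha for the local trend and a slow one
# for the long-term trend; neither value comes from the slides.
trix_local = trix(latencies, alpha=0.5)
trix_global = trix(latencies, alpha=0.1)

print("local :", [round(v, 5) for v in trix_local[-3:]])
print("global:", [round(v, 5) for v in trix_global[-3:]])
```

While latency is flat, both series stay at zero; once latency starts to drift upward, both turn positive, with the larger-α series responding sooner.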
Wearout Analysis Circuit
[Figure: block diagram. Labels: input signal, Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction]
System Integration
[Figure: block diagram. Labels: Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction]
Dynamic Variation
• Temperature
  • 50 °C: ~4% increase in latency at 130 nm
• Clock jitter
  • Impact on latency varies
  • Mean jitter typically modeled as 0
• Worst-case variation would need to be sampled 12 times over 4 days
WDU Prediction Results
• Each unit calibrated for a 30-year MTTF
• The WDU flagged at least one output from each module prior to the MTTF
Conclusion
• Low-cost reliable system design
• Physical wearout mechanisms affect timing
• Failure prediction can be much cheaper than detection
• Wearout detection unit:
  • Online timing analysis is a good detector of wearout and predictor of failure
  • Generic and self-calibrating
Technology Scaling
• Quickly shrinking feature sizes
• Sharp increase in frequency
• Slow decrease in supply voltage
[Figure: OR1200 power densities]