Online Timing Variation Tolerance for Digital Integrated Circuits. Guihai Yan & Xiaowei Li, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
Sources of timing variation • PVT variation • Dynamic: Voltage & Temperature fluctuations • Static: Process variation • Aging degradation • NBTI, PBTI • TDDB • Soft errors (in non-regular logic) • SEU & SET
Process variation • Sub-wavelength lithography • “What you get is not what you want” • Systematic • Random dopant fluctuations • Vth variation • Random • P variation is time-independent: the “DC component” • Maximum frequency can differ by 20%! [Teodorescu, ISCA’08]
Temperature variation • Application-specific • Slow-varying • Milliseconds • Typical thermal constant: 2 ms [Donald, ISCA’06] • T variation is slow-varying: the “low-frequency component”
Voltage variation • Fast-changing • Inductive noise • a.k.a. the L(di/dt) problem • IR drop • Why is it harder to keep a constant voltage level? • Example: with a power budget of 100 W at a working voltage of 1 V, the current is 100 A; keeping the voltage fluctuation within ±5% (50 mV) requires R_PDN < 0.5 mOhm • V variation is fast-changing: the “high-frequency component” • [Figure: PDN hierarchy model]
Aging degradation • Aging mechanisms • NBTI (PMOS) • PBTI (NMOS) • TDDB • [Figure: failure rate over lifetime, showing infant mortality, useful time, and aging phases; annotations: 20% degradation, 10 years]
Soft errors • SEU (Single Event Upset) • Unintentional bit-flip in storage cells • SET (Single Event Transient) • Transient voltage pulse propagating in combinational logic • [Figure: SEU and SET illustrations]
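The arithmetic behind the PDN example can be checked with a short sketch. The function name and parameters below are illustrative assumptions, not part of the original slides; the calculation treats the power delivery network as a single lumped resistance, which is a simplification of the PDN hierarchy.

```python
def max_pdn_resistance(power_w, voltage_v, tolerance):
    """Largest lumped PDN resistance (ohms) that keeps the IR drop within tolerance.

    Hypothetical helper: models the whole PDN as one resistor.
    """
    current_a = power_w / voltage_v          # I = P / V
    allowed_drop_v = tolerance * voltage_v   # e.g. 5% of 1 V = 50 mV
    return allowed_drop_v / current_a        # R = dV / I


# Slide example: 100 W budget, 1 V supply, +/-5% allowed fluctuation.
r_pdn = max_pdn_resistance(100.0, 1.0, 0.05)
print(f"R_PDN must stay below {r_pdn * 1e3:.2f} mOhm")
```

At lower supply voltages the same power budget draws more current while the allowed drop shrinks, so the resistance bound tightens quadratically with voltage.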
Outline • TEA-TM • Timing emergency-aware thread migration • PVT variations co-optimization • SVFD • Stability violation based fault detection • On-line fault detection via timing sensing • Delay fault, aging delay, soft errors • MicroFix • Margin-reducing with timing sensing • Application to DVFS • ReviveNet • Aging-delay tolerance
TEA-TM: Timing Emergency-Aware Thread Migration • Focus on the essential timing issue • Process variation, voltage variation, and temperature variation each contribute a positive or negative (+, -) component to the overall timing variation • These components do not necessarily add up; in some cases they cancel each other out. Hence, “complementary”.
Some terms • Timing emergency (TE) • Emergency level (EL) • “Density” of TE • Define: EL = number of TEs per 100 million cycles • [Figure: delay over time against a threshold, with a timing emergency marked]

              Violent              Mild
Process       Slow corner          Fast corner
Voltage       Large fluctuation    Small fluctuation
Temperature   Hot                  Cool
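As a rough illustration of the EL definition, the sketch below counts timing emergencies in a trace and normalizes the count to 100 million cycles. It assumes one delay sample per clock cycle and treats any sample above the threshold as a timing emergency; both assumptions, and all names, are hypothetical and not taken from the slides.

```python
def emergency_level(delays, threshold, per_cycles=100_000_000):
    """Timing emergencies per `per_cycles` cycles.

    Assumes `delays` holds one delay sample per cycle and that a
    sample above `threshold` counts as one timing emergency.
    """
    if not delays:
        raise ValueError("need at least one delay sample")
    emergencies = sum(1 for d in delays if d > threshold)
    return emergencies * per_cycles / len(delays)


# Made-up trace: 2 of 4 cycles exceed the threshold of 1.0.
print(emergency_level([0.8, 1.1, 0.9, 1.2], threshold=1.0))
```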
How do PVT variations complement each other? • Observation in the time domain • Core1: T mild, V mild: excessive headroom below the threshold, large margin, low EL • Core2: T violent, V violent: timing emergencies, little margin, high EL • What if we exchange the threads on Core1 and Core2? Core1 becomes T mild, V violent and Core2 becomes T violent, V mild: mild + violent on each core • [Figure: delay over time for Core1 and Core2 against the threshold]
Frequency domain analysis • Migrating threads = “grafting” the V component
Frequency domain analysis (cont.) • Relative frequency spectrum deviations on a 2GHz quad-core processor • P: 0-100Hz, T: 100Hz-1MHz, V: 1MHz-250MHz • Potential • Core3 and Core4 are mild • Strategy • Exchange threads on Core1 and Core4, Core2 and Core3
TEA-TM Summary • Analyzing the complementary effect • From both the time and frequency domains • Presenting a delay sensor-based scheme (TEA-TM) to exploit the complementary effect • Simple, cost-efficient • FFT-like heuristic • Throughput: 30% • Fairness: 80%
Reference: Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors, in Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, pp.485-496, Jun. 2010.
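The exchange strategy on this slide pairs a violent core with a mild one. A minimal sketch of one way to produce such pairs is shown below: rank cores by emergency level and pair the extremes. This pairing rule is an assumption made for illustration; it is not the FFT-like heuristic of TEA-TM, and the EL values are made up.

```python
def plan_thread_exchanges(emergency_levels):
    """Pair the highest-EL core with the lowest-EL core, and so on inward.

    `emergency_levels` maps a core name to its measured EL.
    Returns a list of (violent_core, mild_core) pairs.
    """
    ranked = sorted(emergency_levels, key=emergency_levels.get, reverse=True)
    pairs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        pairs.append((ranked[lo], ranked[hi]))
        lo += 1
        hi -= 1
    return pairs


# Hypothetical EL readings in which Core3 and Core4 are the mild cores.
levels = {"Core1": 900, "Core2": 700, "Core3": 200, "Core4": 50}
print(plan_thread_exchanges(levels))
```

With these made-up readings the plan matches the slide's example: Core1 with Core4, and Core2 with Core3.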
Stability Violation • Stable period vs. variable period • Stability violation (SV): a signal transition occurs during the stable period.
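The definition can be expressed as a small behavioral check. The sketch below is a software model only, not the hardware stability checker: it assumes integer timestamps (for example picoseconds), a fixed clock period, and a stable period given as a half-open phase window inside each cycle. All names and the window parametrization are assumptions.

```python
def find_stability_violations(transition_times, clock_period, stable_start, stable_end):
    """Return the transitions that fall inside the stable period.

    Times are integers in one unit (e.g. ps). The stable period is the
    phase window [stable_start, stable_end) within every clock cycle.
    """
    violations = []
    for t in transition_times:
        phase = t % clock_period  # position of the transition within its cycle
        if stable_start <= phase < stable_end:
            violations.append(t)
    return violations


# Hypothetical 1000 ps clock whose last 400 ps of each cycle must be stable.
print(find_stability_violations([100, 650, 1200, 1999], 1000, 600, 1000))
```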
In what situations would SVs occur? • Delay faults resulting from • Delay defects (introduced in manufacturing) • Aging (wearout) induced performance degradation • [Figure: setup time within clock period T; setup time violation due to a delay fault] • Thus, stability violations caused by delay faults do not differ much from setup time violations • But can soft errors be modeled by SVs? Yes!
How do soft errors cause SVs? • [Figure: SEU and SET cases, annotated “Si violates stability requirement!” and “So violates stability requirement!”] • Note: only the SVs occurring in the “vulnerable window”, within which the flip-flops are updated, can cause failures.
The next problem • Delay faults and soft errors can be modeled as stability violations • How to detect stability violations? • Low-cost stability checker
Some Results • Implementation • SVFD-protected FPU • Using 65nm PTM, HSPICE simulation
References: Guihai Yan, Yinhe Han, Xiaowei Li, A Unified Online Fault Detection Scheme via Checking of Stability Violation, IEEE/ACM Design, Automation and Test in Europe (DATE’09), pp.496-501, 2009. Guihai Yan, Yinhe Han, Xiaowei Li, SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation, IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), 19(9), Sep. 2011.
Besides fault detection, what else can we do with SVFD? • Dynamic margin reduction • MicroFix: an application to DVFS • Aging tolerance • ReviveNet: fine-grained aging delay tolerance
Dynamic margin reduction • Timing sensors setup
Fine-grained margin exploited • Localized timing imbalance • Generous Flip-flop (GFF) • Forward Adaptable Flip-flop (FAFF) • Backward Adaptable Flip-flop (BAFF) • Unadaptable Flip-flop (UAFF)
Case study results • Applied to an FPU • 32nm PTM models • TH = 0.2~0.3 is an optimal choice • Efficiency improvement: 35% EDP
References: Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction, ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(2), 1-21, 2011. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency, ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED’09), pp.395-400, 2009.
Localized Aging Tolerance • The chance for aging adaptation • We have a chance to “act before it’s too late”
Nudge for timing margin • Dynamic time borrowing • Path-grained, NOT stage-grained
Aging sensors setup • Coarse-grained detection
Trial-based adaptation • Fine-grained adaptation • Adaptation latency is non-critical • Trial until success
Implementation • False-alarm filter • Sharing filters to reduce overhead
Reference: Guihai Yan, Yinhe Han, Xiaowei Li, ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation, IEEE Transactions on Computers (TC), 60(9), Sep. 2011.
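The try-until-success idea on this slide can be sketched as a simple loop. This is only an illustration under stated assumptions: the candidate settings, the `apply_setting` callback, and the `violation_detected` callback are all hypothetical stand-ins, and the slides do not specify how ReviveNet enumerates or applies its adaptations.

```python
def adapt_until_success(candidates, apply_setting, violation_detected):
    """Try candidate adaptations in order until the sensor stops firing.

    Returns (chosen_setting, attempts); chosen_setting is None if every
    candidate still triggers a violation.
    """
    attempts = 0
    for setting in candidates:
        attempts += 1
        apply_setting(setting)          # hypothetical hook: reconfigure the path
        if not violation_detected():    # hypothetical hook: read the aging sensor
            return setting, attempts
    return None, attempts


# Toy model: the violation disappears once the setting reaches 3.
state = {"setting": 0}
result = adapt_until_success(
    [1, 2, 3, 4],
    apply_setting=lambda s: state.update(setting=s),
    violation_detected=lambda: state["setting"] < 3,
)
print(result)
```

Because adaptation latency is described as non-critical, a plain sequential search like this is plausible; a real design might order candidates by cost.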
Conclusion • Dynamic timing variation is increasingly critical • Online timing variation detection and tolerance is a promising approach to dynamic variation • Application-specific timing variation • MicroFix for DVFS • ReviveNet for aging tolerance • Holistic solution can be more cost-effective • TEA-TM • Architectural optimization for Circuit symptom
Publications (reverse chronological order)
• Guihai Yan, Yinhe Han, Xiaowei Li, ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation, IEEE Transactions on Computers (TC), Vol.60, No.9, pp.1219-1232, Sep. 2011.
• Guihai Yan, Yinhe Han, Xiaowei Li, SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation, IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), Vol.19, No.9, pp.1627-1640, Sep. 2011.
• Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction, ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol.16, No.2, pp.1-21, Mar. 2011.
• Jianbo Dong, Lei Zhang, Yinhe Han, Guihai Yan, Xiaowei Li, Performance-asymmetry-aware Scheduling for Chip Multiprocessors with Static Core Coupling, Journal of Systems Architecture, Vol.56, pp.534-542, 2010.
• Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors, in Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, pp.485-496, Jun. 2010.
• Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency, ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), pp.395-400, 2009.
• Song Jin, Yinhe Han, Lei Zhang, Huawei Li, Xiaowei Li, Guihai Yan, M-IVC: Using Multiple Input Vectors to Minimize Aging-induced Delay, Proc. of IEEE Asian Test Symposium (ATS'09), 2009.
• Guihai Yan, Yinhe Han, Xiaowei Li, A Unified Online Fault Detection Scheme via Checking of Stability Violation, IEEE/ACM Design, Automation and Test in Europe (DATE'09), pp.496-501, 2009.
• Guihai Yan, Yinhe Han, Xiaowei Li, Hui Liu, BAT: Performance-Driven Crosstalk Mitigation Based on Bus-grouping Asynchronous Transmission, IEICE Transactions on Electronics, Vol.E91-C, No.10, pp.1690-1697, Oct. 2008.
Book Chapters • Fault Tolerance Designs for Digital Integrated Circuits: Tolerating defects/faults, parameter variations, and soft errors (in Chinese), Beijing, Science Press, 2011. ISBN 978-7-03-030576-3.