A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism • Marc de Kruijf, Shuou Nomura, Karu Sankaralingam
From Hard to Harder • [Figure: technology scaling timeline, labels as extracted: 10000nm, 720nm, 4000um, 360nm, 1500um, 180nm, 90nm, 45nm & beyond; annotated "Hard" → "Harder"]
What is the Problem? • Non-ideal transistor scaling • Transistor wear-out • Process, voltage, and temperature (PVT) variations • Errors due to particle interference • Noise coupling & crosstalk
What is the Problem? • Need high-level analysis tools • [Figure: a Performance Toolbox and a Reliability Toolbox holding these techniques: Dynamic verification, Multi-core, Coherence & consistency, Timing speculation, DMR, ECC, On-chip network, TMR, RMT, Watchdog, Branch prediction, Out-of-order, HW checkpoints]
Our Contribution • A model for timing speculation • Unifies hardware + system • Small set of high-level inputs (for the processor designer) • Also: • Q. What is the impact of technology scaling? A. Further benefits are small to none. • Q. What is the impact of CMOS design style? A. Very low power designs benefit most. • Q. What is the impact of the fault recovery mechanism? A. Fine-grained recovery is key to high efficiencies.
Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion
Timing Speculation • Clock period = 1/frequency • Circuit delay variations can push a path past the clock period: timing failure! • Option 1: slower clock (OK!) • Option 2 (timing speculation): keep the clock, detect & recover
Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion
Model Overview • [Figure: three plots against error rate: Hardware Efficiency (energy), System Recovery (time), Overall Efficiency (energy)] • Model Inputs: 1. A hardware path delay distribution 2. Effect of variations on path delay as N(μ,σ) 3. The time between recovery checkpoints 4. The time to restore a checkpoint
Hardware Efficiency Model • Input 1: Path delay distribution (# paths vs. path delay) • Input 2: Path delay variation (σ) • [Figure: error probability vs. clock period and energy vs. clock period (e.g. frequency scaling), combined into energy vs. error rate]
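One way to see how the two hardware inputs could turn into an error probability for a given clock period is the sketch below. It is not the authors' implementation: it assumes each path's delay is normally distributed around its nominal value with σ given as a fraction of μ, that every path is exercised on every cycle, and that paths fail independently. The names `path_fail_prob`, `cycle_error_prob`, and the example histogram are hypothetical.

```python
import math


def path_fail_prob(mu, sigma_frac, period):
    """P(delay > period) for delay ~ N(mu, (sigma_frac * mu)^2)."""
    sigma = sigma_frac * mu
    z = (period - mu) / sigma
    # Upper tail of the standard normal, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cycle_error_prob(histogram, sigma_frac, period):
    """Probability that at least one path misses the clock edge in a cycle.

    histogram: list of (nominal_delay, path_count) pairs (hypothetical format).
    Assumption: paths fail independently and are all active every cycle.
    """
    p_all_ok = 1.0
    for mu, count in histogram:
        p_all_ok *= (1.0 - path_fail_prob(mu, sigma_frac, period)) ** count
    return 1.0 - p_all_ok


# Hypothetical path delay distribution (delays in ns) and a 5% variation.
paths = [(0.6, 1000), (0.8, 200), (0.95, 10)]
for period in (1.2, 1.1, 1.0):
    print(period, cycle_error_prob(paths, 0.05, period))
```

Shortening the clock period raises the error probability, which is the trade-off the hardware efficiency curve captures.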
System Recovery Model (applies to all backward error recovery systems) • overhead(rate) = failures(rate) × (waste(rate) + restore) • Inputs: 1. The time between recovery checkpoints (cycles) 2. The time to restore a checkpoint (restore) • [Figure: time vs. error rate]
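The two rate-dependent terms can be worked out under a simple assumption that is not stated on the slide: errors occur independently with probability r per cycle and are detected in the cycle where they occur. Reading the slide's formula as failures × (waste + restore), with n the number of cycles between checkpoints, a sketch of the terms is:

```latex
\begin{align*}
p_{\text{ok}} &= (1-r)^{n}
  && \text{(an interval of $n$ cycles completes without error)} \\
\text{failures}(r) &= \frac{1-p_{\text{ok}}}{p_{\text{ok}}}
  && \text{(expected failed attempts before one success)} \\
\text{waste}(r) &= \frac{\sum_{k=1}^{n} k\,(1-r)^{k-1}\,r}{1-p_{\text{ok}}}
  && \text{(expected cycles executed in a failed attempt)} \\
\text{overhead}(r) &= \text{failures}(r)\,\bigl(\text{waste}(r) + \text{restore}\bigr)
\end{align*}
```

For n = 1 this reduces to overhead(r) = r (1 + restore) / (1 - r), so with a small checkpoint interval the restore cost dominates the overhead.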
Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion
Results • Is the model useful? What can we learn? • Technology Node: 45nm, 11nm • CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS • Recovery System: Razor, Reunion, Paceline
Results • [Figure: three plots against error rate: Hardware Efficiency (energy), System Recovery (time), Overall Efficiency (energy)]
Hardware Model Inputs • Path delay distribution • Application: H.264 decoding • Hardware: OpenRISC processor • Effect of process variations as N(μ,σ) using ITRS data • High Performance CMOS • 45nm σ = 0.046μ • 11nm σ = 0.051μ • Low Power CMOS • 45nm σ = 0.029μ • 11nm σ = 0.042μ • Ultra-low Power CMOS • 45nm σ = 0.196μ
Hardware Efficiency • Energy = Power × Time • EDP = Power × Time² • [Figure: energy and EDP vs. error rate; normalized EDP vs. error rate, results for High Performance CMOS]
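As a small worked example of the two metrics, the sketch below computes energy and energy-delay product (EDP) and normalizes EDP against a baseline operating point. The function names and the numbers are illustrative assumptions, not values from the talk.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time


def edp(power, time):
    """Energy-delay product: EDP = Energy x Time = Power x Time^2."""
    return energy(power, time) * time


def normalized_edp(power, time, base_power, base_time):
    """EDP relative to a baseline operating point (values below 1.0 are better)."""
    return edp(power, time) / edp(base_power, base_time)


# Hypothetical operating point: same power as the baseline, 10% less time.
print(normalized_edp(power=1.0, time=0.9, base_power=1.0, base_time=1.0))
```

Because time enters squared, EDP rewards a speedup more strongly than energy alone does, which is why it is a common metric when trading clock period against error rate.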
Recovery Model Inputs • The time between recovery checkpoints & • The time to restore a checkpoint • Razor • Latch-level detection + pipeline rollback • 1 cycle checkpoint size & 5 cycle recovery cost • Reunion • DMR detection + checkpoint • 100 cycle checkpoint size & 100 cycle recovery cost • Paceline • DMR detection + checkpoint + flush • 100 cycle checkpoint size & 1000 cycle recovery cost
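To get a feel for how these parameters play out, the sketch below estimates normalized execution time for the three systems at a given per-cycle error rate. It assumes independent errors per cycle that are detected as they occur, and it takes overhead = failures × (waste + restore) with failures as the expected number of failed attempts per checkpoint interval and waste as the expected cycles executed in a failed attempt. The function names and the example error rate are hypothetical, and this is not the authors' code.

```python
def recovery_overhead(rate, interval, restore):
    """Expected extra cycles per checkpoint interval (assumed error model)."""
    if rate == 0.0:
        return 0.0
    q = 1.0 - rate
    p_ok = q ** interval                      # interval completes error-free
    failures = (1.0 - p_ok) / p_ok            # expected failed attempts before success
    # Expected position of the first error, given that one occurs in the interval.
    waste = sum(k * q ** (k - 1) * rate for k in range(1, interval + 1)) / (1.0 - p_ok)
    return failures * (waste + restore)


def normalized_time(rate, interval, restore):
    """Execution time relative to error-free execution."""
    return (interval + recovery_overhead(rate, interval, restore)) / interval


# (cycles between checkpoints, cycles to restore), as listed on the slide.
SYSTEMS = {"Razor": (1, 5), "Reunion": (100, 100), "Paceline": (100, 1000)}

for name, (interval, restore) in SYSTEMS.items():
    print(name, normalized_time(1e-3, interval, restore))
```

Under these assumptions the fine-grained system pays far less per error, since both the wasted work and the restore cost stay small.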
System Recovery • [Figure: time vs. error rate; normalized time vs. error rate]
Overall Efficiency • 1. High Performance CMOS 2. Low Power CMOS 3. Ultra-low Power CMOS • [Figure: EDP vs. error rate]
Overall Efficiency: High Performance CMOS • Normalized EDP: 23% peak, 8-15% typical • [Figure: normalized EDP vs. error rate]
Overall Efficiency: Low Power CMOS • Normalized EDP: 18% peak, 5-10% typical • [Figure: normalized EDP vs. error rate]
Overall Efficiency: Ultra-low Power CMOS • Normalized EDP: 47% peak, 20-30% typical • [Figure: normalized EDP vs. error rate]
Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion
Conclusions • A High-level Model • Results • Efficiency gains improve only minimally with scaling • Ultra-low power (sub-threshold) CMOS benefits most • Fine-grained recovery is key • Future Work • Incorporate more sources of variation • A tool for processor designers? • Under development at http://www.cs.wisc.edu/vertical
Questions? • [Figure: toolbox containing Multi-core, Coherence & consistency, On-chip network, Timing speculation, Branch prediction, Out-of-order]
DSN 2010
Timing Speculation • Source of timing variation: Manufacturing Process, Runtime, Application • Techniques shown: Speed Binning, Online Timing Analysis, Timing Speculation • Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.
System Recovery Model • Inputs: 1. The time between recovery checkpoints (cycles) 2. The time to restore a checkpoint (restore) • failures(rate): expected # failures before success • waste(rate): expected # cycles executed upon failure
Overall Inputs • Path delay distribution • Application: H.264 decoding • Hardware: OpenRISC processor • Effect of process variations on path delay as N(μ,σ) using ITRS data • High Performance CMOS @45nm σ = 0.046μ • Low Power CMOS @45nm σ = 0.029μ • Ultra-low Power CMOS @45nm σ = 0.196μ • The time between recovery checkpoints & • The time to restore a checkpoint • Razor – Latch-level detection + pipeline rollback (1 & 5 cycles) • Reunion – DMR detection + checkpoint (100 & 100 cycles) • Paceline – DMR detection + checkpoint + flush (100 & 1000 cycles)