
Marc de Kruijf Shuou Nomura Karu Sankaralingam

A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism


Presentation Transcript


  1. A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism Marc de Kruijf, Shuou Nomura, Karu Sankaralingam

  2. From Hard to Harder [Figure: technology nodes labeled 10000nm, 720nm, 4000um, 360nm, 1500um, 180nm, 90nm, 45nm & beyond, on a scale from Hard to Harder]

  3. What is the Problem? • Non-ideal transistor scaling • Transistor wear-out • Process, voltage, and temperature (PVT) variations • Errors due to particle interference • Noise coupling & crosstalk

  4. What is the Problem? Need high-level analysis tools [Figure: Performance Toolbox and Reliability Toolbox; labels: Multi-core, Coherence & consistency, On-chip network, Branch prediction, Out-of-order, Timing speculation, Dynamic verification, DMR, TMR, RMT, ECC, Watchdog, HW checkpoints]

  5. Our Contribution • A model for timing speculation • Unifies hardware + system • Small set of high-level inputs (for the processor designer) Also: Q. What is the impact of technology scaling? A. Further benefits are small to none. Q. What is the impact of CMOS design style? A. Very low power designs benefit most. Q. What is the impact of the fault recovery mechanism? A. Fine-grained recovery is key to high efficiencies.

  6. Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion

  7. Timing Speculation [Figure: clock waveform with clock period (= 1/frequency) and circuit delay variations; labels: "Timing failure!" with detect & recover, and "OK!" with a slower clock]

  8. Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion

  9. Model Overview [Figure: three panels, Hardware Efficiency, System Recovery, and Overall Efficiency, plotting Energy or Time against Error rate] Model Inputs 1. A hardware path delay distribution 2. Effect of variations on path delay as N(μ,σ) 3. The time between recovery checkpoints 4. The time to restore a checkpoint

  10. Hardware Efficiency Model • Input 1: Path delay distribution (# Paths vs. Path delay) • Input 2: Path delay variation (σ) [Figure: Error prob. vs. Clock period and Energy vs. Clock period (e.g. frequency scaling) are combined into Energy vs. Error rate]
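
A minimal sketch of how the two hardware inputs could be turned into an error probability for a given clock period. The slides do not give the exact computation, so the following are assumptions made only for illustration: one path is exercised per cycle in proportion to the histogram counts, each path's actual delay is normally distributed around its nominal delay μ with standard deviation σ expressed as a fraction of μ, and the function name and the histogram values are hypothetical. The value 0.046 is the High Performance CMOS 45nm figure from slide 15.

```python
import math


def normal_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def error_probability(path_delays, path_counts, sigma_frac, clock_period):
    """Per-cycle probability that the exercised path misses the clock edge.

    Assumptions (not from the slides): one path is exercised per cycle,
    chosen in proportion to path_counts, and its actual delay follows
    N(mu, sigma_frac * mu) around its nominal delay mu.
    """
    total = sum(path_counts)
    prob = 0.0
    for mu, count in zip(path_delays, path_counts):
        # Number of standard deviations between the nominal delay and the clock edge.
        z = (clock_period - mu) / (sigma_frac * mu)
        prob += (count / total) * normal_tail(z)
    return prob


# Hypothetical histogram: nominal path delays (ns) and how often each is exercised.
delays = [0.6, 0.8, 0.9, 1.0]
counts = [500, 300, 150, 50]

# Shortening the clock period raises the error probability.
for period in (1.2, 1.1, 1.0, 0.9):
    p = error_probability(delays, counts, sigma_frac=0.046, clock_period=period)
    print(f"clock period {period:.2f} ns -> error probability {p:.3e}")
```

Sweeping the clock period this way gives the Error prob. vs. Clock period curve; pairing each period with an energy estimate would then give Energy vs. Error rate.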

  11. System Recovery Model (applies to all backward error recovery systems) overhead(rate) = failures(rate) x (waste(rate) + restore) System Recovery Model Inputs 1. The time between recovery checkpoints (cycles) 2. The time to restore a checkpoint (restore) [Figure: Time vs. Error rate]

  12. Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion

  13. Results Is the model useful? What can we learn? • Technology Node: 45nm, 11nm • CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS • Recovery System: Razor, Reunion, Paceline

  14. Results [Figure: three panels, Hardware Efficiency, System Recovery, and Overall Efficiency, plotting Energy or Time against Error rate]

  15. Hardware Model Inputs • Path delay distribution: Application: H.264 decoding; Hardware: OpenRISC processor • Effect of process variations as N(μ,σ) using ITRS data • High Performance CMOS: 45nm σ = 0.046μ; 11nm σ = 0.051μ • Low Power CMOS: 45nm σ = 0.029μ; 11nm σ = 0.042μ • Ultra-low Power CMOS: 45nm σ = 0.196μ

  16. Hardware Efficiency • Energy = Power x Time • EDP (energy-delay product) = Power x Time^2 [Figure: Energy and EDP vs. Error rate; Normalized EDP vs. Error rate, results for High Performance CMOS]
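
The two definitions on this slide can be written out directly. The sketch below uses hypothetical function names and made-up numbers (a design that finishes in 90% of the baseline time at the same power); it only illustrates why execution time counts twice in EDP and how a value is normalized against a baseline.

```python
def energy(power: float, time: float) -> float:
    """Energy = Power x Time."""
    return power * time


def edp(power: float, time: float) -> float:
    """Energy-delay product = Energy x Time = Power x Time^2."""
    return energy(power, time) * time


def normalized_edp(power: float, time: float, base_power: float, base_time: float) -> float:
    """EDP relative to a baseline design (values below 1.0 are improvements)."""
    return edp(power, time) / edp(base_power, base_time)


# Hypothetical numbers: same power as the baseline, 10% shorter execution time.
ratio = normalized_edp(power=1.0, time=0.9, base_power=1.0, base_time=1.0)
print(f"normalized EDP: {ratio:.2f}")
```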

  17. Recovery Model Inputs • The time between recovery checkpoints & the time to restore a checkpoint • Razor: Latch-level detection + pipeline rollback; 1 cycle checkpoint size & 5 cycle recovery cost • Reunion: DMR detection + checkpoint; 100 cycle checkpoint size & 100 cycle recovery cost • Paceline: DMR detection + checkpoint + flush; 100 cycle checkpoint size & 1000 cycle recovery cost
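
A rough sketch of how these three parameter pairs might be plugged into the recovery overhead relation from slide 11, read here as overhead = failures x (waste + restore). The slides do not spell out how failures and waste are computed, so this sketch assumes errors are independent per cycle, are detected in the cycle they occur, and force re-execution from the last checkpoint. Function names are hypothetical.

```python
def recovery_overhead(error_rate: float, checkpoint_cycles: int, restore_cycles: int) -> float:
    """Expected extra cycles spent per checkpoint interval.

    Assumptions (not from the slides): independent per-cycle errors with
    probability error_rate, immediate detection, rollback to the last checkpoint.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    n = checkpoint_cycles
    success = (1.0 - error_rate) ** n  # probability an interval completes error-free
    if success == 1.0:
        return 0.0
    # Expected number of failed attempts before one succeeds (geometric distribution).
    failures = (1.0 - success) / success
    # Expected cycles executed in an attempt, given that it fails within n cycles.
    waste = sum(
        k * error_rate * (1.0 - error_rate) ** (k - 1) for k in range(1, n + 1)
    ) / (1.0 - success)
    # Parenthesization is our reading of the slide 11 relation.
    return failures * (waste + restore_cycles)


def normalized_time(error_rate: float, checkpoint_cycles: int, restore_cycles: int) -> float:
    """Execution time relative to an error-free run of the same interval."""
    return 1.0 + recovery_overhead(error_rate, checkpoint_cycles, restore_cycles) / checkpoint_cycles


# (checkpoint interval, restore cost) in cycles, as listed on the slide.
SYSTEMS = {"Razor": (1, 5), "Reunion": (100, 100), "Paceline": (100, 1000)}

for rate in (1e-6, 1e-4, 1e-2):
    for name, (interval, restore) in SYSTEMS.items():
        t = normalized_time(rate, interval, restore)
        print(f"error rate {rate:.0e}  {name:<8}  normalized time {t:.4f}")
```

Under these assumptions the fine-grained scheme (Razor) tolerates much higher error rates before its time overhead grows, which is consistent with the deck's claim that fine-grained recovery is key.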

  18. System Recovery [Figure: Time vs. Error rate; Normalized Time vs. Error rate]

  19. Overall Efficiency 1. High Performance CMOS 2. Low Power CMOS 3. Ultra-low Power CMOS [Figure: EDP vs. Error rate]

  20. Overall Efficiency: High Performance CMOS [Figure: Normalized EDP vs. Error rate] 23% peak, 8-15% typical

  21. Overall Efficiency: Low Power CMOS [Figure: Normalized EDP vs. Error rate] 18% peak, 5-10% typical

  22. Overall Efficiency: Ultra-low Power CMOS [Figure: Normalized EDP vs. Error rate] 47% peak, 20-30% typical

  23. Outline • Timing Speculation • Model Overview • Hardware Efficiency Model • System Recovery Model • Results • Conclusion

  24. Conclusions • A High-level Model • Results • Efficiency gains improve only minimally with scaling • Ultra-low power (sub-threshold) CMOS benefits most • Fine-grained recovery is key • Future Work • Incorporate more sources of variation • A tool for processor designers? • Under development at http://www.cs.wisc.edu/vertical

  25. Questions? [Figure labels: Multi-core, Coherence & consistency, On-chip network, Timing speculation, Branch prediction, Out-of-order]

  26. ? DSN 2010

  27. Timing Speculation [Figure: Source of Timing Variation; labels: Manufacturing Process, Runtime, Application, Speed Binning, Online Timing Analysis, Timing Speculation] Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.

  28. System Recovery Model • System Recovery Model Inputs 1. The time between recovery checkpoints (cycles) 2. The time to restore a checkpoint (restore) • failures(rate): expected # failures before success • waste(rate): expected # cycles executed upon failure
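
One way to make the two expected values concrete is sketched below. It assumes a per-cycle error probability p that is independent across cycles, a checkpoint interval of n cycles, and detection in the cycle the error occurs. The slides do not state these closed forms, so they are an illustration and not the authors' derivation.

```latex
\begin{align}
  P_{\text{success}} &= (1-p)^{n} \\
  \text{failures}(p) &= \frac{1 - (1-p)^{n}}{(1-p)^{n}} \\
  \text{waste}(p) &= \frac{\sum_{k=1}^{n} k \, p \, (1-p)^{k-1}}{1 - (1-p)^{n}} \\
  \text{overhead}(p) &= \text{failures}(p) \times \bigl(\text{waste}(p) + \text{restore}\bigr)
\end{align}
```

For n = 1 this reduces to waste = 1 and overhead = p (1 + restore) / (1 - p), so with a small interval the overhead is dominated by the restore cost.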

  29. Overall Inputs • Path delay distribution: Application: H.264 decoding; Hardware: OpenRISC processor • Effect of process variations on path delay as N(μ,σ) using ITRS data: High Performance CMOS @45nm σ = 0.046μ; Low Power CMOS @45nm σ = 0.029μ; Ultra-low Power CMOS @45nm σ = 0.196μ • The time between recovery checkpoints & the time to restore a checkpoint: Razor, latch-level detection + pipeline rollback (1 & 5 cycles); Reunion, DMR detection + checkpoint (100 & 100 cycles); Paceline, DMR detection + checkpoint + flush (100 & 1000 cycles)
