Process Variation in Near-threshold Wide SIMD Architectures

Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹ (¹University of Michigan, ²Arizona State University)

Presentation Transcript


  1. Process Variation in Near-threshold Wide SIMD Architectures
     Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹
     ¹University of Michigan, ²Arizona State University

  2. Near Threshold Computing
     • Super Threshold
       • high performance
       • high energy consumption
     • Near Threshold
       • 10x energy reduction
       • 10x performance degradation
     • Sub Threshold
       • exponentially decreasing performance
       • increasing leakage becomes dominant

  3. Near-threshold Computing
     • Advantage: High energy efficiency
     • Disadvantage
       • Low performance throughput
         • Compensated with very wide SIMD architecture
       • Sensitive to variations in threshold voltage
         • More critical issues in wide SIMD architectures
           • Increased probability of timing errors
           • Expensive error recovery mechanisms

  4. Near-threshold Computing
     • How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?
     • How to mitigate the variation-induced timing errors?

  5. Delay Variations in 90nm
     [Plot annotations: ~2.3x, ~1.6x]
     • Uncorrelated variations are averaged out over the chain.

  6. Delay Variations – f(Vdd=0.55V, N)
     • A long chain helps, but the effect diminishes as N increases.
     • Variations are exacerbated with technology scaling.
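The averaging effect on slides 5 and 6 can be illustrated with a minimal sketch. It assumes each stage delay is an independent Gaussian with mean 1 and a hypothetical spread `sigma_stage`; near-threshold delay distributions are generally not Gaussian, so this shows only the trend, not the slide's numbers. Under that assumption the relative spread of an N-stage chain falls as 1/sqrt(N), which is why a longer chain helps but with diminishing returns. The function names are illustrative.

```python
import math
import random
import statistics


def relative_sigma_analytic(sigma_stage: float, n_stages: int) -> float:
    """sigma/mean of a chain delay when each stage has mean 1 and std sigma_stage."""
    # Independent stages: variances add, so sigma grows as sqrt(N) while the mean grows as N.
    return sigma_stage / math.sqrt(n_stages)


def relative_sigma_simulated(sigma_stage: float, n_stages: int,
                             trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio (assumed Gaussian, uncorrelated stages)."""
    rng = random.Random(seed)
    chains = [sum(rng.gauss(1.0, sigma_stage) for _ in range(n_stages))
              for _ in range(trials)]
    return statistics.pstdev(chains) / statistics.fmean(chains)


if __name__ == "__main__":
    # sigma_stage = 0.2 is an arbitrary illustrative value, not taken from the slides.
    for n in (1, 10, 50):
        print(n,
              round(relative_sigma_analytic(0.2, n), 4),
              round(relative_sigma_simulated(0.2, n), 4))
```

Going from N = 1 to N = 10 cuts the relative spread by about 3.2x in this model, while going from N = 50 to N = 500 would buy the same factor at ten times the chain length.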

  7. Delay Variations – f(Vdd, N=50)
     • LER (line-edge roughness) causes high variations in advanced technology nodes
     • Strict design rules
     • Metal gates w/ high-k material or SOI
     • Advanced lithography

  8. Delay Distribution – 90nm GP
     [Plot annotation: Performance Drop]
     • 1 critical path delay = delay of a chain of 50 FO4 inverters.
     • 1-wide system delay = max (delays of 100 critical paths)
     • 128-wide system delay = max (delays of 128 1-wide systems)
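The max-of-max model on slide 8 can be sketched as a small Monte Carlo experiment. The counts (50 stages, 100 paths, 128 lanes) come from the slide; the Gaussian path-delay model and the value of `SIGMA_STAGE` are assumptions made only for illustration, and the results will not reproduce the 90nm GP distributions in the talk.

```python
import random
import statistics

N_STAGES = 50      # FO4 inverters per critical path (from the slide)
N_PATHS = 100      # critical paths per 1-wide system (from the slide)
N_LANES = 128      # 1-wide systems per 128-wide system (from the slide)
SIGMA_STAGE = 0.1  # hypothetical per-stage sigma, in units of one nominal FO4 delay


def sample_lane_delay(rng: random.Random) -> float:
    """Delay of one 1-wide system: the slowest of its critical paths."""
    # Assumed model: uncorrelated stage variations add in quadrature along the chain.
    path_sigma = SIGMA_STAGE * N_STAGES ** 0.5
    return max(rng.gauss(N_STAGES, path_sigma) for _ in range(N_PATHS))


def simulate(n_chips: int = 50, seed: int = 0):
    """Return (1-wide delay samples, 128-wide delay samples), one pair per simulated chip."""
    rng = random.Random(seed)
    lane_samples, system_samples = [], []
    for _ in range(n_chips):
        lanes = [sample_lane_delay(rng) for _ in range(N_LANES)]
        lane_samples.append(lanes[0])      # any single lane is a 1-wide system
        system_samples.append(max(lanes))  # the slowest lane sets the SIMD clock
    return lane_samples, system_samples


if __name__ == "__main__":
    lane, system = simulate()
    print("nominal path delay:", N_STAGES)
    print("mean 1-wide delay:", round(statistics.fmean(lane), 2))
    print("mean 128-wide delay:", round(statistics.fmean(system), 2))
```

Because the system delay is a maximum over more samples at each level, its mean shifts further into the slow tail, which is the performance drop the slide points at.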

  9. Variation Effects on 128-wide SIMD Architecture
     • Structural duplication
     • Voltage margining
     • Frequency margining

  10. Near-threshold Wide SIMD Architecture: Diet SODA [Seo et al. ISLPED 2010]

  11. Structural Duplication
      • Increase number of processing resources
      [Figure: 8-wide + 2-spare system; a crossbar connects Datapath #0 to #7 to SIMD Function Units #0 to #9.]

  12. Structural Duplication
      • Use the spares if required.
      [Figure: the same 8-wide + 2-spare system (crossbar, datapaths, SIMD Function Units #0 to #9), illustrating use of the spares.]

  13. Structural Duplication – 90nm GP
      • 6 spares are required to match the chip delay of the baseline architecture.
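A minimal sketch of why spares reduce chip delay: if the crossbar can route around any lane, the slowest lanes are simply left unused, so the delay of an N-wide system built from N + S lanes is the N-th smallest lane delay. The function names and the example delays are hypothetical; the sketch does not reproduce the 6-spare result above, which depends on the measured delay distributions.

```python
from typing import Optional, Sequence


def delay_with_spares(lane_delays: Sequence[float], n_active: int) -> float:
    """System delay when only the n_active fastest lanes are used."""
    if not 0 < n_active <= len(lane_delays):
        raise ValueError("n_active must be between 1 and the number of lanes")
    # The slowest (len - n_active) lanes are treated as unused spares.
    return sorted(lane_delays)[n_active - 1]


def min_spares_for_target(lane_delays: Sequence[float], n_active: int,
                          target_delay: float) -> Optional[int]:
    """Smallest number of spare lanes that meets target_delay, or None if none does.

    lane_delays lists the n_active regular lanes first, then candidate spares in order.
    """
    max_spares = len(lane_delays) - n_active
    for spares in range(max_spares + 1):
        if delay_with_spares(lane_delays[:n_active + spares], n_active) <= target_delay:
            return spares
    return None


if __name__ == "__main__":
    # Hypothetical normalized delays: four regular lanes followed by two spares.
    lanes = [1.0, 1.3, 0.9, 1.1, 1.05, 0.95]
    print(delay_with_spares(lanes[:4], 4))       # no spares: the 1.3 lane limits the clock
    print(min_spares_for_target(lanes, 4, 1.1))  # spares needed to reach a 1.1 target
```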

  14. Voltage Margining
      • Increase supply voltage
      • Delay distributions: 45nm PTM model is used

  15. Frequency Margining
      • Increase clock period
        • Applicable for applications with relaxed time constraints
        • For advanced technology nodes, this is impractical
      • Caveat
        • Consider its impact on system
          • SIMD subsystem clock period (Tclk@NTV)
          • memory subsystem clock period (Tclk@FV)

  16. Structural Duplication vs. Voltage Margining

  17. Combination of two schemes – 45nm GP
      128-wide system @ 0.6V; labeled design points:
      • 26 spares
      • 17mV boost
      • 5mV + 8 spares
      • 10mV + 2 spares

  18. Variation-Aware Diet SODA

  19. Conclusions
      • Near-threshold operation of a wide SIMD system can have timing problems due to process variations.
      • Variation effects on a 128-wide SIMD architecture are marginal for the 90nm technology node, but could be non-negligible for current/future technology nodes.
      • A combination of structural duplication and voltage margining provides a minimal-power-overhead solution to mitigate variation-induced timing problems in wide SIMD architectures.

  20. Questions? Thank you!

  21. Backup Slides

  22. Local Spares vs. Global Spares
      • Local Sparing, 1 out of 4 (2 spares)
        + small overhead
        - burst errors
      • Global Sparing (2 spares)
        + burst errors
        - large overhead

  23. Local Spares vs. Global Spares
      [Figure: 128 + 8 global spares vs. 128 + 32 local spares (1 out of 4)]
      • Global sparing is better than local sparing.
      • XRAM crossbar supports global sparing.
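The burst-error trade-off between the two sparing schemes can be sketched as a coverage check. This is an illustrative model, not the authors' analysis: it assumes "1 out of 4" means one spare per four regular lanes, that physical lanes are numbered with each group of four followed by its spare, and that a local spare can only replace a lane in its own group. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def global_sparing_ok(slow_lanes: Iterable[int], n_spares: int) -> bool:
    """Global sparing: any slow lane can be replaced, up to n_spares in total."""
    return len(set(slow_lanes)) <= n_spares


def local_sparing_ok(slow_lanes: Iterable[int], group_size: int = 4) -> bool:
    """Local sparing: each group of group_size lanes plus its one spare
    tolerates at most one slow lane (assumed layout: spare follows its group)."""
    lanes_per_physical_group = group_size + 1
    per_group = Counter(lane // lanes_per_physical_group for lane in set(slow_lanes))
    return all(count <= 1 for count in per_group.values())


if __name__ == "__main__":
    burst = [0, 1]                       # two adjacent slow lanes in one group
    scattered = list(range(0, 45, 5))    # nine slow lanes, one per group
    print(global_sparing_ok(burst, 8), local_sparing_ok(burst))
    print(global_sparing_ok(scattered, 8), local_sparing_ok(scattered))
```

In this model a burst of adjacent slow lanes defeats local sparing even with 32 spares available, while 8 global spares cover it; local sparing wins only when slow lanes are spread at most one per group, which matches the +/- entries on slide 22.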

  24. Variation-Aware Diet SODA
      • Delay variations can be mitigated with little area and power overhead.
