Process Variation in Near-threshold Wide SIMD Architectures
Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹
¹University of Michigan, ²Arizona State University
Near Threshold Computing
• Super-threshold: high performance, high energy consumption
• Near-threshold: 10x energy reduction, 10x performance degradation
• Sub-threshold: exponentially decreasing performance; increasing leakage becomes dominant
Near-threshold Computing
• Advantage: high energy efficiency
• Disadvantages
  • Low throughput, compensated for with a very wide SIMD architecture
  • Sensitivity to variations in threshold voltage, a more critical issue in wide SIMD architectures
    • Increased probability of timing errors
    • Expensive error recovery mechanisms
• How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?
• How can the variation-induced timing errors be mitigated?
Delay Variations in 90nm
• Figure annotations: ~2.3x and ~1.6x
• Uncorrelated variations are averaged out over the chain.
Delay Variations – f(Vdd = 0.55 V, N)
• A longer chain helps, but the benefit diminishes as N increases.
• Technology scaling exacerbates the variations.
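The averaging effect and its diminishing return can be illustrated with a simple model. As an assumption that is not stated in the slides, take the N stage delays d_i to be independent, each with mean μ and standard deviation σ. The relative spread of the chain delay D_N then shrinks only as the square root of N:

```latex
D_N = \sum_{i=1}^{N} d_i, \qquad
\mathrm{E}[D_N] = N\mu, \qquad
\mathrm{Var}[D_N] = N\sigma^2
\quad\Longrightarrow\quad
\frac{\sigma_{D_N}}{\mathrm{E}[D_N]} = \frac{1}{\sqrt{N}}\,\frac{\sigma}{\mu}
```

Under this model, quadrupling N only halves the relative spread, which is consistent with a long chain helping less and less as N grows. Correlated (systematic) variation is not averaged out this way.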
Delay Variations – f(Vdd, N = 50)
• Line-edge roughness (LER) causes high variations in advanced technology nodes.
  • Strict design rules
  • Metal gates with high-k material, or SOI
  • Advanced lithography
Delay Distribution – 90nm GP
• Figure annotation: performance drop
• 1 critical path delay = delay of a chain of 50 FO4 inverters
• 1-wide system delay = max(delays of 100 critical paths)
• 128-wide system delay = max(delays of 128 1-wide systems)
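The three definitions above nest naturally into a small Monte Carlo experiment. The sketch below is only an illustration: it assumes independent, normally distributed stage delays, and the values `mu=1.0` and `sigma=0.1` are hypothetical, not taken from the slides or from any device model. Because a sum of independent normal variables is itself normal, the 50-stage chain is sampled directly rather than stage by stage.

```python
import math
import random
import statistics


def path_delay(rng, n_stages=50, mu=1.0, sigma=0.1):
    """Delay of one critical path: a chain of n_stages inverters.

    Assumption: i.i.d. normal stage delays (hypothetical mu and sigma),
    so the chain delay is normal with mean n*mu and std sqrt(n)*sigma.
    """
    return rng.gauss(n_stages * mu, math.sqrt(n_stages) * sigma)


def lane_delay(rng, n_paths=100):
    """1-wide system delay: the slowest of n_paths critical paths."""
    return max(path_delay(rng) for _ in range(n_paths))


def simd_delay(rng, width=128):
    """width-wide system delay: the slowest of `width` 1-wide systems."""
    return max(lane_delay(rng) for _ in range(width))


def mean_delay(width, trials=200, seed=0):
    """Average system delay over `trials` simulated chips."""
    rng = random.Random(seed)
    return statistics.mean(simd_delay(rng, width) for _ in range(trials))


if __name__ == "__main__":
    narrow = mean_delay(1)
    wide = mean_delay(128)
    print(f"1-wide mean delay:   {narrow:.2f}")
    print(f"128-wide mean delay: {wide:.2f}")
    print(f"performance drop:    {100 * (wide / narrow - 1):.1f}%")
```

Taking the maximum twice is what makes wide systems slower on average: the more lanes there are, the more likely it is that at least one of them lands in the slow tail of the distribution.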
Variation Effects on 128-wide SIMD Architecture
- Structural duplication
- Voltage margining
- Frequency margining
Near-threshold Wide SIMD Architecture: Diet SODA [Seo et al. ISLPED 2010]
Structural Duplication
• Increase the number of processing resources.
• Use the spares if required.
• Figure: 8-wide + 2-spare system; a crossbar sits between Datapaths #0 to #7 and SIMD Function Units #0 to #9.
Structural Duplication – 90nm GP
• 6 spares are required to match the chip delay of the baseline architecture.
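One way to read a spare count like this: with s spare lanes, the s slowest lanes can be left unused, so the chip delay becomes that of the slowest lane among the fastest `width` lanes. The sketch below illustrates that reading only. The normal lane-delay model and the `sigma` and `target` values are hypothetical and do not reproduce the 90nm data behind the slide.

```python
import random


def chip_delay(lane_delays, width):
    """Chip delay when only the `width` fastest lanes are enabled."""
    if len(lane_delays) < width:
        raise ValueError("need at least `width` lanes")
    # After sorting, index width-1 is the slowest lane that is still in use.
    return sorted(lane_delays)[width - 1]


def timing_yield(width, spares, target, trials=500, seed=0, sigma=0.05):
    """Fraction of simulated chips whose delay meets `target`.

    Assumption: lane delays are i.i.d. normal with mean 1.0 and
    standard deviation `sigma` (hypothetical values).
    """
    rng = random.Random(seed)
    passing = 0
    for _ in range(trials):
        lanes = [rng.gauss(1.0, sigma) for _ in range(width + spares)]
        if chip_delay(lanes, width) <= target:
            passing += 1
    return passing / trials


if __name__ == "__main__":
    for spares in (0, 2, 4, 6):
        print(spares, timing_yield(width=128, spares=spares, target=1.12))
```

Sweeping `spares` until the yield at the baseline delay target is acceptable gives the kind of "number of spares required" figure quoted on the slide, under whatever delay model is plugged in.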
Voltage Margining
• Increase the supply voltage.
• Delay distributions: the 45nm PTM model is used.
Frequency Margining
• Increase the clock period.
  • Applicable to applications with relaxed timing constraints
  • Impractical for advanced technology nodes
• Caveat: consider its impact on the system.
  • SIMD subsystem clock period (Tclk@NTV)
  • Memory subsystem clock period (Tclk@FV)
Combination of Two Schemes – 45nm GP
• 128-wide system @ 0.6 V
• Options annotated in the figure: 26 spares; 17 mV boost; 5 mV + 8 spares; 10 mV + 2 spares
Conclusions
• Near-threshold operation of a wide SIMD system can have timing problems due to process variations.
• Variation effects on a 128-wide SIMD architecture are marginal for the 90nm technology node, but could be non-negligible for current and future technology nodes.
• A combination of structural duplication and voltage margining provides a minimal-power-overhead solution for mitigating variation-induced timing problems in wide SIMD architectures.
Questions? Thank you!
Local Spares vs. Global Spares
• Local sparing, 1 out of 4 (2 spares)
  + Small overhead
  - Weak against burst errors
• Global sparing (2 spares)
  + Handles burst errors
  - Large overhead
Local Spares vs. Global Spares
• 128 + 8 global spares vs. 128 + 32 local spares (1 out of 4)
• Global sparing is better than local sparing.
• The XRAM crossbar supports global sparing.
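The burst-error trade-off between the two schemes can be made concrete with a small repairability check. This is a minimal sketch with hypothetical function names. It assumes that lanes are indexed from 0, that local groups are formed from consecutive lanes, and that the spares themselves are fault-free.

```python
from collections import Counter


def repairable_global(faulty_lanes, n_spares):
    """Global sparing: any spare can replace any faulty lane."""
    return len(set(faulty_lanes)) <= n_spares


def repairable_local(faulty_lanes, group_size=4, spares_per_group=1):
    """Local sparing: a spare can only replace a lane in its own group."""
    per_group = Counter(lane // group_size for lane in set(faulty_lanes))
    return all(count <= spares_per_group for count in per_group.values())


if __name__ == "__main__":
    burst = [4, 5]                 # two adjacent slow lanes, same group
    scattered = range(0, 36, 4)    # nine slow lanes, one per group

    # A burst defeats 1-out-of-4 local sparing but not 8 global spares.
    print(repairable_local(burst), repairable_global(burst, 8))
    # Many scattered faults are the opposite case.
    print(repairable_local(scattered), repairable_global(scattered, 8))
```

Local sparing needs only a small multiplexer per group but fails as soon as two faults fall in the same group, while global sparing tolerates any fault pattern up to the spare count at the cost of a full crossbar.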
Variation-Aware Diet SODA
• Delay variations can be mitigated with little area and power overhead.