
Exponential Challenges, Exponential Rewards— The Future of Moore’s Law


Presentation Transcript


  1. Exponential Challenges, Exponential Rewards—The Future of Moore’s Law. Shekhar Borkar, Intel Fellow, Circuit Research, Intel Labs. Fall 2004

  2. ISSCC 2003—Gordon Moore said… “No exponential is forever… But We can delay Forever”

  3. Outline • The exponential challenges • Circuit and μArch solutions • Major paradigm shifts in design • Integration & SOC • The exponential reward • Summary

  4. How do you get there? Goal: 1 TIPS by 2010. [Chart: performance trajectory from 8086 through 286, 386, 486, Pentium®, Pentium® Pro, and Pentium® 4 architectures]

  5. Technology Scaling [Diagram: MOS transistor cross-section with GATE, SOURCE, DRAIN, and BODY labeled, and the scaled dimensions Tox, Xj, and Leff] Technology has scaled well; will it in the future?

  6. Technology Outlook

  7. Is the Transistor a Good Switch? [Diagram: an ideal switch (on: I = ∞, off: I = 0) vs. a real MOSFET (on: I ≈ 1 mA/μm, off: I ≠ 0), illustrating sub-threshold leakage]

  8. Exponential Challenge #1 Subthreshold Leakage

  9. Sub-threshold Leakage Assume: 0.25 μm, Ioff = 1 nA/μm; ~5X increase each generation at 30°C. Sub-threshold leakage increases exponentially.
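
To make the "~5X per generation" concrete, here is a minimal sketch of the textbook sub-threshold model, I_sub ∝ exp(-Vt/(n·kT/q)); the slope factor n and the per-generation Vt step below are assumed values, not figures from the talk:

```python
import math

# Minimal sub-threshold leakage sketch: I_sub ~ exp(-Vt / (n*kT/q)).
# n and the per-generation Vt step are assumed values, not the talk's data.
KT_Q = 0.0261   # thermal voltage kT/q near 30 C, in volts
N = 1.5         # assumed sub-threshold slope factor

def leakage_multiplier(delta_vt):
    """Leakage growth when Vt is lowered by delta_vt volts."""
    return math.exp(delta_vt / (N * KT_Q))

# ~63 mV of Vt reduction per generation reproduces the ~5X/generation trend:
print(round(leakage_multiplier(0.063), 1))   # -> 5.0
```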

  10. SD Leakage Power. Source-drain (SD) leakage power becomes prohibitive.

  11. Leakage Power [Chart: A. Grove, IEDM 2002] Leakage power limits Vt scaling.

  12. Exponential Challenge #2 Gate Leakage

  13. Gate Oxide is Near the Limit [Image: 90 nm MOS transistor with a 50 nm gate and 1.2 nm SiO2 gate oxide over the silicon substrate] If Tox scaling slows down, then Vdd scaling will have to slow down; high-K dielectric is crucial.
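
For intuition only, a rough sketch of why a thinner oxide is so costly: direct-tunneling gate leakage grows roughly exponentially as SiO2 thins, on the order of a decade per ~0.2 nm at these thicknesses. The decay constant below is an assumed round number, not data from the talk:

```python
# Illustrative-only trend: direct-tunneling gate leakage rises roughly
# exponentially as SiO2 thins. DECADES_PER_NM is an assumed round number.
DECADES_PER_NM = 5.0

def gate_leakage_multiplier(tox_nm, ref_nm=1.2):
    """Gate leakage vs. the slide's 1.2 nm SiO2 reference (model assumed)."""
    return 10 ** ((ref_nm - tox_nm) * DECADES_PER_NM)

print(gate_leakage_multiplier(1.0))   # ~10X more gate leakage at 1.0 nm
```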

  14. Exponential Challenge #3 Energy & Power

  15. Energy per Logic Operation Energy per logic operation scaling will slow down
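
The slowdown follows from the standard dynamic-energy relation; a sketch of the arithmetic, assuming an idealized scale factor s per generation:

```latex
E_{\mathrm{op}} \approx C_{\mathrm{load}}\, V_{dd}^{2}
\qquad
\text{ideal scaling } \bigl(C \to C/s,\ V_{dd} \to V_{dd}/s\bigr):\ E_{\mathrm{op}} \to E_{\mathrm{op}}/s^{3}
\qquad
\text{constant } V_{dd}\ \bigl(C \to C/s \text{ only}\bigr):\ E_{\mathrm{op}} \to E_{\mathrm{op}}/s
```

If Vdd stops scaling, only the capacitance shrink remains, so energy per operation improves by s instead of s³ each generation.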

  16. The Power Crisis Business as usual is not an option

  17. Exponential Challenge #4 Variations

  18. Sources of Variations [Charts: sub-wavelength lithography, with wavelength (365 nm, 248 nm, 193 nm, 13 nm EUV) vs. generation (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm) and a widening gap between them; random dopant fluctuations; heat flux (W/cm²) hot spots producing temperature variation (°C) and Vcc variation. Source: Mark Bohr, Intel]

  19. Frequency & SD Leakage [Scatter plot: ~1000 samples of 0.18 μm parts showing ~30% frequency spread and ~20X sub-threshold leakage (Isb) spread, ranging from low-frequency/low-Isb through high-frequency/medium-Isb to high-frequency/high-Isb]

  20. Vt Distribution [Histogram: the same ~1000 samples of 0.18 μm parts show a ~30 mV spread in Vt across the low-frequency/low-Isb, high-frequency/medium-Isb, and high-frequency/high-Isb groups]

  21. Exponential Challenge #5 Platform & System

  22. Platform Requirements [Charts: system volume shrinking from PC tower (~3000 in³) through mini tower, μ-tower, and slim line to small PC; for Pentium® III and Pentium® 4, projected heat-sink volume (in³) and air flow rate (CFM) rise with power (W) while the thermal budget (°C/W) falls] Shrinking volume and quieter operation, yet high performance: the thermal budget is decreasing while heat-sink volume and air flow rate must increase.

  23. Exponential Challenge #6 Economics

  24. Exponential Costs [Charts: lithography and fab costs rising exponentially while $ per transistor and $ per MIPS keep falling. Sources: G. Moore, ISSCC 03; www.icknowledge.com]

  25. Product Cost Pressure. Shrinking ASP (average selling price), and a shrinking dollar budget for power.

  26. Must Fit in the Power Envelope [Chart: projected power and power density (W, W/cm²) for a 10 mm die across 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, and 16 nm, split into active, SD-leakage, and SiO2-leakage components, with 1 MB to 8 MB of on-die memory; totals climb past 1000 W] Technology, circuits, and architecture must constrain the power.

  27. Some Implications • Tox scaling will slow down—may stop? • Vdd scaling will slow down—may stop? • Vt scaling will slow down—may stop? • Approaching constant-Vdd scaling • Energy/logic op will not scale

  28. The Gigascale Dilemma • 1B-transistor integration capacity will be available • But it could be unusable due to power • Logic transistor growth will slow down • Transistor performance will be limited. Solutions • Low-power design techniques • Improve design efficiency—multi everywhere • Valued performance by even higher integration (of potentially slower transistors)

  29. Solutions • Power—active and leakage • Variations • Microarchitecture

  30. Active Power Reduction [Diagram: one logic block at high supply (Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Power Density = 1) vs. two replicated logic blocks at Vdd/2 (Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Power Density = 0.125)] Multiple supply voltages: run slow logic at a low supply and fast logic at a high supply; replicate designs to hold throughput at much lower power.
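
The slide's numbers check out against the usual dynamic-power relation P ∝ C·Vdd²·f; a small sketch in normalized units (it assumes frequency scales roughly linearly with Vdd, as the slide does):

```python
# The slide's replication arithmetic in code: P ~ C * Vdd^2 * f, with C
# normalized to 1 and frequency assumed to track Vdd linearly.
def replicated(vdd, freq, copies=1):
    power = copies * vdd**2 * freq
    return {"throughput": copies * freq,
            "power": power,
            "area": copies,
            "power_density": power / copies}

print(replicated(1.0, 1.0))               # baseline block: all metrics 1.0
print(replicated(0.5, 0.5, copies=2))     # throughput 1.0, power 0.25, density 0.125
```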

  31. Leakage Control [Diagrams: three techniques—body bias (Vbp/Vbn applied across the logic block), the stack effect (series devices with equal loading), and sleep transistors; the figure quotes leakage reductions of 5-10X, 2-1000X, and 2-10X across the three techniques]

  32. Circuit Design Tradeoffs [Charts: power and probability of meeting target frequency vs. transistor size (large to small) and low-Vt usage (high to low)] • Higher probability of meeting target frequency with larger transistor sizes and higher low-Vt usage • But with a power penalty

  33. Impact of Critical Paths [Charts: distribution of dies vs. clock frequency (0.9 to 1.5), and mean clock frequency vs. number of critical paths (1, 9, 17, 25)] • With an increasing number of critical paths, both σ and μ of the frequency distribution become smaller: a lower mean clock frequency.
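
A Monte Carlo sketch of this trend, using assumed Gaussian path delays rather than real die data: since the die runs at its slowest critical path, adding paths drags the mean frequency down and narrows its spread:

```python
import random

# Die frequency is set by the worst (slowest) of its critical paths.
# mu/sigma below are assumed normalized values, not measured data.
random.seed(1)

def die_freq(n_paths, mu=1.0, sigma=0.05):
    worst_delay = max(random.gauss(mu, sigma) for _ in range(n_paths))
    return 1.0 / worst_delay

for n in (1, 9, 17, 25):
    samples = [die_freq(n) for _ in range(2000)]
    print(n, round(sum(samples) / len(samples), 3))   # mean falls as n grows
```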

  34. Impact of Logic Depth [Charts: at logic depth 16, the σ/μ ratio is 5.6% for NMOS Ion, 3.0% for PMOS Ion, and 4.2% for delay; the delay σ/μ shrinks as logic depth grows from 16 to 49. Histograms: device Ion and delay variation spanning roughly -16% to +16% of nominal]
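
One way to see the depth effect: if gate delays vary independently, a path's total delay averages them, so σ/μ falls roughly as 1/√depth. A sketch with an assumed per-gate variation (real silicon adds correlated variation, so measured ratios sit higher than this idealized model):

```python
import random

# Independent per-gate delay variation averages out along deeper paths.
random.seed(1)

def sigma_over_mu(depth, trials=4000, gate_sigma=0.056):
    totals = [sum(random.gauss(1.0, gate_sigma) for _ in range(depth))
              for _ in range(trials)]
    mu = sum(totals) / trials
    var = sum((t - mu) ** 2 for t in totals) / trials
    return var ** 0.5 / mu

print(sigma_over_mu(16), sigma_over_mu(49))   # deeper path -> smaller sigma/mu
```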

  35. μArchitecture Tradeoffs [Charts: frequency and probability of meeting target frequency vs. number of μArch critical paths (fewer to more) and logic depth (large to small)] • Higher target frequency with shallow logic depth and a larger number of critical paths • But with a lower probability of meeting it

  36. Variation-tolerant Design [Charts: the circuit-level and μArch-level tradeoff curves from the previous two slides, shown together] Balance power & frequency with variation tolerance.

  37. Probabilistic Design [Charts: due to variations in Vdd, Vt, and temperature, path delay against the delay target and frequency shift from deterministic values to probability distributions; leakage power shows ~10X variation and is ~50% of total power] Deterministic design techniques will be inadequate in the future.

  38. Shift in Design Paradigm Today: local, single-variable optimization. Tomorrow: global, multi-variate optimization. • Multi-variable design optimization for: yield and bin splits, parameter variations, active and leakage power, and performance

  39. Adaptive Body Bias—Experiment [Die photo and block diagram: 5.3 mm × 4.5 mm test chip with multiple subsites, each 1.6 × 0.24 mm and containing a circuit under test (CUT), phase detector & counter, delay bias amplifier, and resistor network] Technology: 150 nm CMOS. Subsites per die: 21. Die frequency: Min(F1..F21). Die power: Sum(P1..P21). Body bias range: 0.5 V FBB to 0.5 V RBB. Bias resolution: 32 mV.
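
A synthetic stand-in for the experiment's bookkeeping (the subsite numbers below are invented, not the measured data): the die runs at its slowest subsite's frequency and burns the sum of all subsite powers, which is exactly why per-subsite bias tuning helps.

```python
import random

# 21 subsites per die: die frequency = Min(F1..F21), die power = Sum(P1..P21).
# ABB would nudge each subsite's Vt in 32 mV steps within +/-0.5 V of bias.
random.seed(0)
freqs  = [random.gauss(1.00, 0.05) for _ in range(21)]   # normalized subsite freq
powers = [random.gauss(1.00, 0.10) for _ in range(21)]   # normalized subsite power

print("die frequency:", round(min(freqs), 3))   # slowest subsite limits the die
print("die power:", round(sum(powers), 2))      # every subsite contributes
```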

  40. Adaptive Body Bias—Results [Chart: accepted die (%) vs. frequency for no body bias (noBB), die-level ABB, and within-die ABB] • For a given frequency and power density: 100% yield with ABB, and 97% of dies reach the highest frequency bin with within-die ABB.

  41. Design & μArch Efficiency. Employ efficient designs & μArchitectures.

  42. Memory Latency [Diagram: CPU with a small cache (~few clocks away) backed by a large memory (50-100 ns away)] Assume 50 ns memory latency: a cache miss hurts performance, and it hurts more at higher frequency.
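
The "worse at higher frequency" point is simple arithmetic: a fixed latency costs more core cycles as the clock speeds up. The frequencies below are illustrative:

```python
# Back-of-envelope from the slide's 50 ns assumption: cycles lost per miss
# grow linearly with clock frequency (ns * GHz = cycles).
LATENCY_NS = 50
for freq_ghz in (1, 3, 10):
    print(f"{freq_ghz} GHz: {int(LATENCY_NS * freq_ghz)} cycles per miss")
```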

  43. Increase On-die Memory. Large on-die memory provides increased data bandwidth & reduced latency, hence higher performance for much lower power.

  44. Multi-threading [Timeline diagram: a single thread (ST) alternates full HW utilization with waits for memory; with multi-threading, threads MT1-MT3 overlap so one computes while the others wait] Thermals & power delivery are designed for full HW utilization, so multi-threading improves performance without impacting thermals & power delivery.
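
A toy occupancy model of the timeline above (the compute and wait lengths are assumed, not from the talk): extra threads fill one another's memory-wait gaps on the same hardware, up to full utilization:

```python
# Each thread computes for `compute` cycles, then stalls `mem_wait` cycles
# on memory; with n threads interleaved, hardware occupancy approaches 1.0.
def utilization(n_threads, compute=20, mem_wait=60):
    return min(1.0, n_threads * compute / (compute + mem_wait))

for n in (1, 2, 3, 4):
    print(n, "thread(s):", utilization(n))   # 0.25, 0.5, 0.75, 1.0
```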

  45. Chip Multi-Processing [Diagram: four cores C1-C4 sharing an on-die cache] • Multi-core, each core multi-threaded • Shared cache and front-side bus • Each core has a different Vdd & frequency • Core hopping to spread hot spots • Lower junction temperature

  46. Special Purpose Hardware [Die photo: TCP offload engine, 2.23 mm × 3.54 mm, 260K transistors] Opportunities: network processing engines, MPEG encode/decode engines, speech engines. Special-purpose HW gives the best MIPS/watt.

  47. Valued Performance: SOC (System on a Chip) Special-purpose hardware → more MIPS/mm²; SIMD integer and FP instructions in several ISAs. [Diagram: heterogeneous, polylithic integration (Si, SiGe, GaAs; opto-electronics, RF, dense memory) vs. monolithic Si integration (CPU, memory, special HW, wireline, CMOS RF)]

  48. The Exponential Reward [Timeline: the era of pipelined architecture (8086, 286, 386, 486) gives way to the era of instruction-level parallelism (super scalar; speculative, OOO), then to the era of thread & processor-level parallelism (multi-threaded; multi-threaded, multi-core; special purpose HW)] Multi-everywhere: MT, CMP.

  49. Summary—Delaying Forever • Gigascale transistor integration capacity will be available—power and energy are the barriers • Variations will be even more prominent—a shift from deterministic to probabilistic design • Improve design efficiency • Multi-everywhere, & SOC valued performance • Exploit the integration capacity to deliver performance within the power/cost envelope
