Design challenges in sub-100nm high performance microprocessors. Nitin Borkar, Siva Narendra, James Tschanz, Vasantha Erraguntla. Circuit Research, Intel Labs. nitin.borkar@intel.com siva.g.narendra@intel.com james.w.tschanz@intel.com vasantha.erraguntla@intel.com
Outline • Section 1: Challenges for low power and high performance (90 mins) • Historical device and system scaling trends • Sub-100nm device scaling challenges • Power delivery and dissipation challenges • Power efficient design choices • Section 2a: Circuit techniques for variation tolerance (90 mins) • Short channel effects • Adaptive circuit techniques for variation tolerance
Outline (contd.) • Section 2b: Circuit techniques for leakage control (90 mins) • Leakage power components • Leakage power prediction • Leakage reduction and control techniques • Section 3: Full-chip power reduction techniques (90 mins) • Micro-architecture innovations • Coding techniques for interconnect power reduction • CMOS compatible dense memory design • Special purpose hardware • Design methodologies & challenges for CAD
Section 1 Challenges for low power and high performance
Scaling of dimensions [Figure: a 1 x 1 square (area 1) scaled by 0.7 per side to 0.7 x 0.7 (area 0.49)]
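The arithmetic behind the 0.7X-per-side shrink can be checked in a few lines. This is a minimal sketch; the function name `scale_dimensions` is illustrative and does not come from the slides.

```python
def scale_dimensions(linear_factor: float = 0.7) -> dict:
    """Return area and density factors for one generation of linear scaling."""
    area = linear_factor ** 2   # both sides shrink by the same factor
    density = 1.0 / area        # devices per unit area grow by the inverse
    return {"linear": linear_factor, "area": area, "density": density}


result = scale_dimensions(0.7)
print(result)
```

A 0.7X linear shrink halves the area per device, which is what allows roughly twice as many transistors in the same die area each generation.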
Transistors on a chip [Figure: transistors (millions, log scale from 0.001 to 1000) vs. year (1970-2010) for lead microprocessors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III, Pentium 4] • 2X growth in 1.96 years • Transistors on lead microprocessors double every 2 years
Die size growth [Figure: die size (mm, log scale from 1 to 100) vs. year (1970-2010) for lead microprocessors from the 4004 to the Pentium 4] • ~7% growth per year • ~2X growth in 10 years • Die size grows by 14% to satisfy Moore’s Law
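The three growth figures on this slide are mutually consistent if the 14% is read as growth over two years, which is an assumption, since the slide does not state the period. A short compounding sketch, with illustrative function names:

```python
import math


def compound(rate_per_year: float, years: float) -> float:
    """Total growth factor after compounding an annual rate."""
    return (1.0 + rate_per_year) ** years


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a given annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


two_year = compound(0.07, 2)    # growth over an assumed two-year generation
ten_year = compound(0.07, 10)   # growth over a decade
print(f"2 years: {two_year:.3f}X, 10 years: {ten_year:.3f}X, "
      f"doubling time: {doubling_time(0.07):.1f} years")
```

At 7% per year the die dimension grows about 14% in two years and roughly doubles in ten.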
Frequency [Figure: frequency (MHz, log scale from 0.1 to 10000) vs. year (1970-2010) for lead microprocessors from the 4004 to the Pentium 4] • Lead microprocessor frequency doubles every 2 years
Performance Applications will demand TIPS performance
Power [Figure: power (W, log scale from 0.1 to 100) vs. year (1971-2000) for lead microprocessors from the 4004 to the Pentium 4, with the region beyond 2000 marked "Future"] • Lead microprocessor power continues to increase
Obeying Moore’s Law... [Figure: transistors (millions, log scale from 0.001 to 10000) vs. year (1970-2010) for lead microprocessors from the 4004 to the Pentium 4, with projected points at 200M, 425M, 900M, and 1.8B] • 200M to 1.8B transistors on the lead microprocessor
Vcc will continue to reduce [Figure: supply voltage (V, log scale from 0.10 to 10.00) vs. year (1970-2010), with labeled points at 1.35, 1.15, 1, and 0.9 V] • Only 15% Vcc reduction to meet frequency demand
Constant Electric Field Scaling [Figure: oxide field (MV/cm, 0 to 5) vs. technology dimension (1.5, 1.2, 1.0, 0.8, 0.6, 0.35, 0.25, 0.18 µm)]
Active capacitance density Active capacitance grows 30-35% each technology generation
Power will be a problem [Figure: power (W, log scale from 0.1 to 100000) vs. year (1971-2008) for lead microprocessors from the 4004 to the P4, with projected points at 500 W, 1.5 kW, 5 kW, and 18 kW] • Power delivery and dissipation will be prohibitive
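The trends on the preceding slides can be combined with the standard CMOS switching-power relation, P proportional to C x V^2 x f. That relation is textbook background and is not stated on the slides. The sketch below uses the slide values for capacitance density (30-35% growth), Vcc (15% reduction), and frequency (2X). The `die_area_factor` parameter is an assumption added here and defaults to constant die area, so the result is a lower bound and is not meant to reproduce the projected points in the chart.

```python
def power_growth_per_generation(cap_density_growth: float,
                                vcc_factor: float = 0.85,
                                freq_factor: float = 2.0,
                                die_area_factor: float = 1.0) -> float:
    """Relative switching power after one generation, using P ~ C * V^2 * f."""
    # Total switched capacitance = capacitance density x die area (assumed).
    switched_cap = cap_density_growth * die_area_factor
    return switched_cap * vcc_factor ** 2 * freq_factor


low = power_growth_per_generation(1.30)   # 30% capacitance density growth
high = power_growth_per_generation(1.35)  # 35% capacitance density growth
print(f"Power grows {low:.2f}X to {high:.2f}X per generation at constant die area")
```

Even at constant die area, a 15% voltage reduction does not offset doubled frequency and higher capacitance density, so power nearly doubles each generation. Any die growth pushes the factor higher.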
Closer look at the power [Figure: power (W, log scale from 100 to 100,000) vs. year (2002-2008). "Will be" trend: 500 W, 1.5 kW, 5 kW, 18 kW. "Should be" trend: 135 W, 225 W, 375 W, 623 W]
Advanced transistor design [Figure: transistor cross-section with labels: thin TOX, shallow highly doped source/drain extension, halo/pocket, retrograde well, deep source/drain (p+), shallow trench isolation, n-well]
Intel’s 15 nm bulk transistor R. Chau et al., IEDM 2000
Transistor scaling trends - SCE [Figure: MOSFET aspect ratio (lateral/vertical, 1.0 to 7.0) vs. technology generation (0.25, 0.18, 0.13, 0.10, 0.07 µm) for uniform doping and retrograde doping; the aspect ratio is defined in terms of Le, Tox, Dj, and D] • Short channel effect (SCE), as measured by aspect ratio, has been worsening with scaling
Transistor scaling challenges - Dj • Junction depth reduction: + Device channel length can decrease for the same SCE - Series resistance to the channel increases
Transistor scaling challenges - Tox • Thinning gate oxide • Increased gate tunneling leakage • Electrical thickness is ~2X physical thickness • Gate stress now limits max VCC • Solutions • New decoupling caps • Modified oxides/gate materials • Model gate leakage in circuit simulation
Vcc scaling & Soft errors • Vcc and cap scaling with technology reduces charge stored • Soft errors prominent in logic circuits • No error correction in logic circuits • Storage nodes per chip increasing • Higher soft errors at the chip level
Motivation $\mathrm{SER}_{bit} \propto \dfrac{A_{diff}}{C_{gate}\,V_{cc}}$ • Soft error rate (SER) per bit staying constant in future processes • T. Karnik et al., 2001 VLSI Circuits Symposium • Need to reduce SER/bit • Goal: Reduce chip-level SER with no performance penalty and minimum power penalty
Measured Latch Data [Figure: measured errors (0 to 7,000) for original and hardened latches, and SER improvement (1X to 2.25X), vs. supply voltage (0.5 to 1.3 V)] • Will need ~2X SER improvement in latches with no performance loss • T. Karnik et al., 2001 VLSI Circuits Symposium
VT vs. leakage • Leakage rises as the VT is lowered • MOS has a sub-threshold slope of ~110 mV/decade • Lowering VT by 50 mV gives ~3X leakage • Solutions • Dual VT • Stacking of off gates • Controlled back gate bias? • Multiple process technologies: Mobile vs. Performance?
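The ~3X figure follows directly from the sub-threshold slope: each 110 mV of VT reduction multiplies leakage by 10. A minimal sketch, with an illustrative function name:

```python
def leakage_multiplier(vt_reduction_mv: float,
                       slope_mv_per_decade: float = 110.0) -> float:
    """Sub-threshold leakage increase for a given VT reduction."""
    # Leakage changes by one decade (10X) per `slope_mv_per_decade` of VT.
    return 10.0 ** (vt_reduction_mv / slope_mv_per_decade)


m = leakage_multiplier(50.0)
print(f"Lowering VT by 50 mV raises leakage by about {m:.2f}X")
```

The same function shows why dual-VT designs pay off: raising VT by 110 mV on non-critical paths cuts their sub-threshold leakage by about 10X.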
Sub-threshold Leakage [Figure, left: MOS transistor characteristics, Ids (log) vs. Vgs, showing an exponential increase in Ioff for a shift ΔVt. Figure, right: Ioff (nA/µm, log scale from 1 to 10000) vs. temperature (30 to 130 ºC)] • Sub-threshold leakage current will increase exponentially • Assumptions: 0.25 µm, Ioff = 1 nA/µm, 5X increase each generation at 30 ºC
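The stated assumptions (1 nA/µm at the 0.25 µm generation, 5X per generation at 30 ºC) give a simple geometric projection. In this sketch the list of generations is taken from the technology axes used elsewhere in the deck and is an assumption here, as is the function name. Temperature dependence is not modeled.

```python
def project_ioff(generations_um, start_na_per_um=1.0, growth_per_generation=5.0):
    """Map each generation (in µm) to projected Ioff in nA/µm at 30 ºC."""
    return {
        gen: start_na_per_um * growth_per_generation ** i
        for i, gen in enumerate(generations_um)
    }


projection = project_ioff([0.25, 0.18, 0.13, 0.10, 0.07])
for gen, ioff in projection.items():
    print(f"{gen:.2f} um: {ioff:g} nA/um")
```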
Leakage Power Excessive sub-threshold leakage power
Leakage Power increases [Figure, top: drain leakage power (0% to 50%) vs. year (2000-2008), with labeled points at 12 W, 88 W, 400 W, 1.7 kW, and 8 kW. Figure, bottom: Ioff (nA/µm, log scale from 10 to 100,000) vs. temperature (30 to 100 ºC) for 0.18, 0.13, 0.1, 0.07, and 0.05 µm] • Drain leakage will have to increase to meet frequency demand • Results in excessive leakage power
Wide Domino Functionality [Figure: schematics of a static gate and domino gates, with signals labeled CLK, A, B, C, D1, D2, Q1, Q2] • Lower AC noise margin ~ Vt • Ioff could limit NOR fan-in • High activity, higher power, ~2X • Irreversible logic evaluation • Scalability is not good • High performance ~30% over static • High fan-in NOR, fewer logic gates • High fan-in complex gates possible • Smaller area
Bitline Delay Scaling Problem [Figure: normalized delay (0 to 1.2) vs. technology generation (0.25, 0.18, 0.13, 0.10 µm) for logic circuit delay, bit line delay with 15% swing scaling, and bit line delay with constant swing] • Bit line swing limited by parameter mismatch & differential noise • Cell stability degrades with Vt lowering • Bit line delay $\propto \dfrac{(\mathrm{Cap}/W)\,V_{swing}}{I_{on}/W - \#\mathrm{rows}\cdot I_{off}/W}$ • Reducing # of rows per bitline approaching limit
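The bit line delay relation on this slide, delay proportional to (Cap/W) x Vswing / (Ion/W - #rows x Ioff/W), can be explored numerically. The sketch below returns delay in arbitrary units; all input values are hypothetical and chosen only to show how leakage from the unselected rows erodes the effective read current.

```python
def bitline_delay(cap_per_w: float, v_swing: float,
                  ion_per_w: float, ioff_per_w: float, rows: int) -> float:
    """Relative bit line delay: (Cap/W) * Vswing / (Ion/W - rows * Ioff/W)."""
    drive = ion_per_w - rows * ioff_per_w   # net current discharging the bit line
    if drive <= 0:
        raise ValueError("leakage of unselected rows cancels the cell read current")
    return cap_per_w * v_swing / drive


# Hypothetical values: Ion = 500, Ioff = 0.1 (same units per width), 256 rows.
ideal = bitline_delay(1.0, 0.1, 500.0, 0.0, 256)
leaky = bitline_delay(1.0, 0.1, 500.0, 0.1, 256)
print(f"Delay penalty from leakage: {leaky / ideal:.3f}X")
```

Because Ioff grows much faster than Ion with scaling, the denominator shrinks each generation unless the number of rows per bit line is reduced.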
Restrict transistor leakage [Figure: frequency (MHz, log scale from 10 to 10000) vs. year (1985-2010) for the 386, 486, Pentium, Pentium II, and Pentium 4, with projected points at 2.5, 4, 5.5, and 7 GHz] • Reduce leakage • Frequency will not double every 2 years
Interconnect performance • R increases faster at lower levels • C increases faster at higher levels • RC increases ~40-60%
Interconnect distribution Interconnect distribution does not change significantly
Wire Scaling • Tall wires to reduce R • thickness to width ratios of 2 to 1 • large cross cap • Uarch for short wires • Repeaters
Optimum Repeater [Figure: repeater driving a wire, labeled with P size, N size, repeater distance, and wire pitch] • Best speed at • space ~2X width • Include metal thickness and optimize for PD3 • thickness ~2X width • Vary • N size, P size • Repeater distance • Metal width, space
P, V, T Variations • Process: die-to-die variation; within-die variation; static for each die • Voltage: chip activity change; current delivery (RLC); dynamic: ns to 10-100 µs; within-die variation • Temperature: activity & ambient change; dynamic: 100-1000 µs; within-die variation [Figure: device Ion vs. time (years): time dependent degradation, very slow]
Frequency & SD Leakage [Figure: scatter plot of normalized frequency (0.9 to 1.4) vs. normalized leakage Isb (0 to 20) for ~1000 samples in 0.18 micron, with regions marked low freq / low Isb, high freq / medium Isb, and high freq / high Isb] • 30% spread in frequency • 20X spread in leakage
Vt Distribution [Figure: histogram of # of chips (0 to 120) vs. ΔVTn (-39.71 to 32.49 mV) for ~1000 samples in 0.18 micron, with regions marked low freq / low Isb, high freq / medium Isb, and high freq / high Isb; a spread of ~30 mV is indicated]
Frequency Distribution [Figure: histogram of # of chips (0 to 150) vs. normalized frequency (1.00 to 1.37), with regions marked low freq / low Isb, high freq / medium Isb, and high freq / high Isb]
Isb Distribution [Figure: histogram of # of chips (log scale, 1 to 100) vs. normalized Isb (1.00 to 20.11), with regions marked low freq / low Isb, high freq / medium Isb, and high freq / high Isb]
Supply Voltage Variation [Figure: supply voltage (V) vs. time (msec), bounded by Vmax (set by reliability & power) and Vmin (set by frequency)] • Activity changes • Current delivery RI and L(di/dt) drops • Dynamic: ns to 10-100 µs • Within-die variation
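The two drop terms named on this slide, RI and L(di/dt), can be estimated separately. The sketch below is illustrative only: the resistance, inductance, and current step are hypothetical numbers chosen for round results and are not taken from the slides.

```python
def supply_droop(resistance_ohm: float, current_a: float,
                 inductance_h: float, delta_i_a: float, delta_t_s: float):
    """Return the (resistive, inductive) voltage drops in volts."""
    resistive = resistance_ohm * current_a               # R * I
    inductive = inductance_h * (delta_i_a / delta_t_s)   # L * di/dt
    return resistive, inductive


# Hypothetical: 1 mOhm path, 50 A load, 10 pH loop, 50 A step in 10 ns.
resistive, inductive = supply_droop(1e-3, 50.0, 10e-12, 50.0, 10e-9)
total = resistive + inductive
print(f"RI = {resistive * 1e3:.0f} mV, L di/dt = {inductive * 1e3:.0f} mV, "
      f"total = {total * 1e3:.0f} mV")
```

With supplies near 1 V, a combined droop of this size is a large fraction of the margin between Vmax and Vmin, which motivates the decoupling and regulation measures on the following slides.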
Handling di/dt • Land-side / package capacitors • High frequency or local VRMs • Low leakage on-die capacitors [Figure: power delivery response stages, labeled: bulk decoupling, high frequency decoupling, VRM response, local decoupling, silver box response, on-die decoupling]
Vcc Variation Reduction • On-die decoupling capacitors reduce ΔVcc • Cost, area, and gate oxide leakage concerns • On-die voltage down converters & regulators
Temperature Variation [Figure: die thermal map, cache at 70 ºC and core at 120 ºC] • Activity & ambient change • Dynamic: 100-1000 µs • Within-die variation
Major Paradigm Shift • From deterministic design to probabilistic and statistical design • A path delay estimate is probabilistic (not deterministic) • Multi-variable design optimization for • Yield and bin splits • Parameter variations • Active and leakage power • Performance
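One way to make "a path delay estimate is probabilistic" concrete is a small Monte Carlo experiment. The sketch below is illustrative and rests on assumptions not in the slides: each stage delay is drawn independently from a normal distribution, and the stage count, mean, and sigma are hypothetical. Real within-die variation is partly correlated, which this ignores.

```python
import random
import statistics


def sample_path_delays(n_stages, mean_ps, sigma_ps, n_samples, seed=0):
    """Draw path delays as sums of independent Gaussian stage delays."""
    rng = random.Random(seed)   # fixed seed keeps the sketch reproducible
    return [
        sum(rng.gauss(mean_ps, sigma_ps) for _ in range(n_stages))
        for _ in range(n_samples)
    ]


def timing_yield(delays, target_ps):
    """Fraction of sampled paths that meet the target delay."""
    return sum(d <= target_ps for d in delays) / len(delays)


delays = sample_path_delays(n_stages=10, mean_ps=20.0, sigma_ps=2.0, n_samples=5000)
print(f"mean = {statistics.mean(delays):.1f} ps, "
      f"stdev = {statistics.stdev(delays):.1f} ps")
for target in (200.0, 210.0, 220.0):
    print(f"target {target:.0f} ps: yield {timing_yield(delays, target):.3f}")
```

The result is a distribution rather than a single number: tightening the target delay trades yield for frequency, which is the yield and bin-split optimization the slide describes.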
Performance Efficiency of µArch: Pollack’s Rule [Figure: growth (X, 0 to 4) in area and performance (lead / compaction) vs. technology generation (1.5, 1, 0.7, 0.5, 0.35, 0.18); performance measured using SpecINT and SpecFP] • Implications (in the same technology) • New microarchitecture ~2-3X die area of the last uArch • Provides 1.5-1.7X performance of the last uArch • We are on the wrong side of a Square Law
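Pollack's Rule is commonly stated as performance growing with roughly the square root of the area (complexity) spent on a new microarchitecture. A minimal sketch of that square-root form, with an illustrative function name, compared against the 2-3X area figures on the slide:

```python
import math


def pollack_performance(area_ratio: float) -> float:
    """Expected performance ratio for a given die-area ratio (square-root rule)."""
    return math.sqrt(area_ratio)


for area in (2.0, 3.0):
    print(f"{area:.0f}X area -> ~{pollack_performance(area):.2f}X performance")
```

The square-root rule gives about 1.4X to 1.7X for 2-3X area, close to the 1.5-1.7X quoted on the slide. Performance per unit area therefore falls with each new microarchitecture.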
Frequency & Performance [Figure: relative performance (log scale, 1 to 100) vs. process generation (1.0, 0.7, 0.5, 0.35, 0.25, 0.18 µm) for the i486, Pentium® processor, Pentium® II & III processors, and Pentium® 4 processor, split into performance due to frequency (14X) and additional performance due to uArch (7X)] • Frequency increased 61X • 18.3X process technology • Additional 3.3X uArch • Performance increased 100X • 14X process technology • Additional 7X uArch, design
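The totals on this slide are consistent with the process and microarchitecture contributions combining multiplicatively. That is an assumption made here and is not stated on the slide. A quick check:

```python
def combined_gain(process_gain: float, uarch_gain: float) -> float:
    """Total gain when process and microarchitecture gains multiply."""
    return process_gain * uarch_gain


freq_total = combined_gain(18.3, 3.3)   # slide quotes 61X
perf_total = combined_gain(14.0, 7.0)   # slide quotes 100X
print(f"Frequency: {freq_total:.1f}X, performance: {perf_total:.0f}X")
```

The products land within about 2% of the quoted 61X and 100X, so the small differences are plausibly rounding in the slide's figures.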