Thousand Core Chips: A Technology Perspective. Shekhar Borkar, Intel Corp. June 7, 2007
Outline • Technology outlook • Evolution of Multi: thousands of cores? • How do you feed thousands of cores? • Future challenges: variations and reliability • Resiliency • Summary
Terascale Integration Capacity • Total transistors, 300mm² die: 100+B transistor integration capacity • ~100MB cache, ~1.5B logic transistors
Scaling Projections (300mm² die) • Freq scaling will slow down • Vdd scaling will slow down • Power will be too high
Why Multi-core? – Performance • Ever larger single cores yield diminishing performance within a power envelope • Multi-cores provide the potential for near-linear performance speedup
Why Dual-core? – Power • Rule of thumb, in the same process technology… • Single core: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1 • Dual core: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8
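A back-of-envelope sketch of this rule of thumb, assuming the usual dynamic-power relation P ∝ C·V²·f; the -15% voltage and frequency figures are the slide's, while the function names and the parallel-efficiency factor are illustrative assumptions:

```python
# Rough dual-core rule-of-thumb model; dynamic power per core ~ C * V^2 * f.

def relative_power(cores, voltage, freq):
    """Chip power relative to one core at voltage = 1, freq = 1."""
    return cores * voltage ** 2 * freq

def relative_perf(cores, freq, parallel_efficiency=0.9):
    """Crude throughput estimate: frequency times a sub-linear core-count factor."""
    return cores * freq * parallel_efficiency

# Single large core: V = 1, f = 1
print(relative_power(1, 1.00, 1.00))                    # 1.0
print(relative_perf(1, 1.00, parallel_efficiency=1.0))  # 1.0

# Dual core, each core at -15% voltage and -15% frequency
print(relative_power(2, 0.85, 0.85))  # ~1.2 with this simple model (the slide rounds to ~1)
print(relative_perf(2, 0.85))         # ~1.5-1.8 depending on parallel efficiency
```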
From Dual to Multi • One large core: Power = 1, Performance = 1 • Replace it with four small cores (C1-C4): Power = 1/4 and Performance = 1/2 per core • Multi-Core: power efficient, better power and thermal management
Future Multi-core Platform • Heterogeneous Multi-Core Platform (SOC): general purpose cores (GP), special purpose HW (SP), and small cores (C), tied together by an interconnect fabric
Fine Grain Power Management • Cores with critical tasks: Freq = f at Vdd, TPT = 1, Power = 1 • Non-critical cores: Freq = f/2 at 0.7xVdd, TPT = 0.5, Power = 0.25 • Cores shut down: TPT = 0, Power = 0
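A minimal sketch of how those per-core operating points add up at the chip level; the state table uses the slide's TPT/power numbers, while the function and the example core mix are hypothetical:

```python
# Per-core throughput and power, relative to a core running at f and nominal Vdd
# (values from the slide; 0.7^2 * 0.5 ~ 0.25 for the throttled state).
CORE_STATES = {
    "critical":     {"tpt": 1.0, "power": 1.0},   # freq = f at Vdd
    "non_critical": {"tpt": 0.5, "power": 0.25},  # freq = f/2 at 0.7 * Vdd
    "off":          {"tpt": 0.0, "power": 0.0},   # shut down
}

def chip_totals(assignment):
    """Sum throughput and power over a list of per-core state names."""
    tpt = sum(CORE_STATES[s]["tpt"] for s in assignment)
    power = sum(CORE_STATES[s]["power"] for s in assignment)
    return tpt, power

# Example: 2 critical cores, 3 throttled cores, 3 cores shut down (8 cores total)
print(chip_totals(["critical"] * 2 + ["non_critical"] * 3 + ["off"] * 3))
# -> (3.5, 2.75): ~44% of peak throughput for ~34% of peak power
```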
Performance Scaling • Amdahl's Law: Parallel Speedup = 1 / (Serial% + (1 - Serial%)/N) • Serial% = 6.7%: 16 cores deliver Perf = 8 (N/2) • Serial% = 20%: 6 cores deliver Perf = 3 (N/2) • Parallel software is key to Multi-core success
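Both data points follow directly from the formula; a one-function sketch (the function name is ours):

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Parallel speedup = 1 / (Serial% + (1 - Serial%) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

print(round(amdahl_speedup(0.067, 16), 1))  # -> 8.0: 16 cores deliver ~8x
print(round(amdahl_speedup(0.20, 6), 1))    # -> 3.0: 6 cores deliver ~3x
```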
From Multi to Many… • 13mm die, 100W, 48MB cache, 4B transistors, in 22nm • 12, 48, or 144 cores, depending on core size
From Many to Too Many… • 13mm die, 100W, 96MB cache, 8B transistors, in 16nm • 24, 96, or 288 cores, depending on core size
On-Die Network Power (300mm² die) • A careful balance of: • Throughput performance • Single thread performance (core size) • Core and network power
Observations • Scaling Multi-core demands more parallelism every generation • Thread level, task level, application level • Many (or too many) cores does not always mean… • The highest performance • The highest MIPS/Watt • The lowest power • If on-die network power is significant, then power is even worse • Now software, too, must follow Moore's Law
Memory BW Gap • Busses have become wider to deliver the necessary memory BW (10 to 30 GB/sec) • Yet, memory BW is not enough • A Many-Core system will demand 100 GB/sec of memory BW • How do you feed the beast?
IO Pins and Power • State of the art: 100 GB/sec ~ 1 Tb/sec = 1,000 Gb/sec • At 25 mW/Gb/sec: 25 Watts • Bus width = 1,000/5 = 200 lanes, about 400 pins (differential) • Too many signal pins, too much power
High Speed Busses • Chip-to-chip busses longer than ~5mm are transmission lines: L-R-C effects, need signal termination, signal processing consumes power • Solutions: reduce distance to << 5mm (< 2mm, simple R-C bus), reduce signaling speed (~1 Gb/sec), increase pins to deliver BW at 1-2 mW/Gbps • Result: 100 GB/sec ~ 1 Tb/sec = 1,000 Gb/sec, at 2 mW/Gb/sec = 2 Watts, bus width = 1,000/1 = 1,000 pins
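A small sketch of the pin-count and power arithmetic behind the two options above; the per-lane rates and mW/Gbps figures are the slide's, while the helper function and the single-ended assumption for the short R-C bus (consistent with the slide's 1,000-pin figure) are ours:

```python
# IO budget for a given memory bandwidth target.
def io_budget(total_gbps, mw_per_gbps, gbps_per_lane, differential):
    """Return (power in W, lane count, signal pin count) for a link."""
    power_w = total_gbps * mw_per_gbps / 1000.0
    lanes = total_gbps / gbps_per_lane
    pins = lanes * (2 if differential else 1)
    return power_w, lanes, pins

# Target: 100 GB/sec ~ 1 Tb/sec = 1,000 Gb/sec of memory bandwidth
# Long, terminated high-speed bus: 25 mW/Gbps at 5 Gb/sec per lane, differential
print(io_budget(1000, 25, 5, differential=True))    # -> (25.0, 200.0, 400.0)

# Short (< 2 mm) R-C bus: ~2 mW/Gbps at ~1 Gb/sec per lane, single-ended
print(io_budget(1000, 2, 1, differential=False))    # -> (2.0, 1000.0, 1000.0)
```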
Anatomy of a Silicon Chip • Heat flows from the Si chip up through the heat-sink • Power and signals enter from below, through the package
System in a Package (two Si chips side by side on one package) • Limited pins: 10mm / 50 micron pitch = 200 pins • Signal distance is large (~10 mm), so power is higher • Complex package
DRAM on Top • DRAM stacked above the CPU, under the heat-sink • CPU junction temp = 100+°C, heat-sink temp = 85°C • High temp and hot spots are not good for DRAM
DRAM at the Bottom • DRAM between the CPU and the package; power and IO signals go through the DRAM to the CPU • Thin DRAM die, through-DRAM vias • The most promising solution to feed the beast
Reliability • Wider, extreme device variations • Soft error FIT/chip (logic & mem) • Burn-in may phase out…? • Time dependent device degradation
Implications to Reliability • Extreme variations (static & dynamic) will result in unreliable components • Impossible to design reliable systems as we know them today • Transient errors (soft errors) • Gradual errors (variations) • Time dependent errors (degradation) • Reliable systems with unreliable components: resilient µarchitectures
Implications to Test • One-time factory testing will be out • Burn-in to catch chip infant-mortality will not be practical • Test HW will be part of the design • Dynamically self-test, detect errors, reconfigure, & adapt
In a Nutshell… • 100 billion transistor (100 BT) integration capacity • Billions of transistors unusable due to variations • Some will fail over time • Intermittent failures • Yet, deliver high performance in the power & cost envelope
Resiliency with Many-Core • Dynamic on-chip testing • Performance profiling • Cores in reserve (spares) • Binning strategy • Dynamic, fine grain, performance and power management • Coarse-grain redundancy checking • Dynamic error detection & reconfiguration • Decommission aging cores, swap with spares • Dynamically… self test & detect, isolate errors, confine, reconfigure, and adapt (sketched below)
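A hypothetical sketch of the self-test / isolate / reconfigure loop described on this slide; Core, self_test, and manage_cores are illustrative names, not an actual Intel mechanism:

```python
class Core:
    def __init__(self, cid):
        self.cid = cid
        self.healthy = True

    def self_test(self):
        """Run built-in self-test; return False if the core has degraded or failed."""
        # ... BIST / performance-profiling hooks would go here ...
        return self.healthy

def manage_cores(active, spares):
    """Periodically self-test active cores; decommission failures and swap in spares."""
    for i, core in enumerate(active):
        if not core.self_test():
            print(f"core {core.cid}: error detected, isolating and reconfiguring")
            if spares:
                active[i] = spares.pop()   # swap the aging/faulty core with a spare
            else:
                active[i] = None           # no spare left: run with fewer cores
    return [c for c in active if c is not None]

# Example: 4 active cores, 2 held in reserve; core 2 fails a self-test
active = [Core(i) for i in range(4)]
spares = [Core(i) for i in range(4, 6)]
active[2].healthy = False
active = manage_cores(active, spares)
print([c.cid for c in active])   # core 2 replaced by a spare, e.g. [0, 1, 5, 3]
```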
Summary • Moore’s Law with Terascale integration capacity will allow integration of thousands of cores • Power continues to be the challenge • On-die network power could be significant • Optimize for power with the size of the core and the number of cores • 3D Memory technology needed to feed the beast • Many-cores will deliver the highest performance in the power envelope with resiliency