Modern and Future Processors Dr. Gheith Abandah
History • 1946: ENIAC • First general-purpose electronic computer • 100 feet long • 18,000 vacuum tubes • An addition takes 0.2 ms (5 kHz)
Semiconductor Age • 1947: First transistor • 1950: Bipolar junction transistor (BJT) • 1953: Transistor hearing aid • Late 1950s: Integrated circuits
Moore’s Law • 1965: Gordon Moore (later a co-founder of Intel) predicts that the number of transistors on a chip will double roughly every year (a decade later, he revised this to every 2 years).
First Microprocessor • 1971: Intel 4004 • 4-bit processor • 1/8” by 1/16” • 2,300 transistors • 10-micron PMOS technology • 108 kHz
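Moore's Law is a doubling rule, so it can be written as a one-line formula. The sketch below assumes the 4004's 2,300 transistors in 1971 as the starting point and the revised two-year doubling period; the function and parameter names are hypothetical, and the result is an idealized trend line, not data about any real chip.

```python
def projected_transistors(year, base_year=1971, base_count=2300, doubling_years=2.0):
    """Idealized Moore's-law trend: the count doubles every `doubling_years` years."""
    return base_count * 2 ** ((year - base_year) / doubling_years)

# Print the trend line for a few sample years (illustrative only).
for year in (1971, 1981, 1991, 2003):
    print(year, round(projected_transistors(year)))
```

For 2003 this gives 2,300 × 2¹⁶ ≈ 151 million, the same order of magnitude as the 276 million transistors of the IBM POWER5 discussed later.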
Outline • Driving Forces • Technology • Architectural Innovation • Reasons for the Slowdown • Multi-Core Trend • Future
Performance Equation • Performance = Frequency × IPC • IPC = Instructions Per Cycle • Frequency = processor clock rate
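The performance equation gives instructions completed per second. A minimal sketch, assuming hypothetical design points (a 1 GHz clock at IPC 1/4 versus the same clock at IPC 1); the function name is illustrative.

```python
def performance(frequency_hz, ipc):
    """Instructions executed per second: Performance = Frequency x IPC."""
    return frequency_hz * ipc

# Hypothetical design points: same clock, different instructions per cycle.
baseline = performance(1e9, 0.25)
improved = performance(1e9, 1.0)
print(f"{baseline:.3g} vs {improved:.3g} instructions/s, speedup {improved / baseline:.1f}x")
```

Because the two terms multiply, raising IPC at a fixed clock and raising the clock at a fixed IPC are interchangeable ways to gain performance.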
Technology • The IC manufacturing technology gives smaller transistors every year. • 1971: 10 micron • 2007: 0.045 micron • → More transistors on one chip
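Why smaller transistors mean more of them per chip: under idealized scaling, transistor area shrinks with the square of the feature size. This sketch assumes every dimension shrinks uniformly, which real processes do not do, so treat the number as an idealized estimate; the function name is hypothetical.

```python
def ideal_density_gain(old_feature_um, new_feature_um):
    """Transistors per unit area scale as 1 / (feature size)^2 under ideal shrinking."""
    return (old_feature_um / new_feature_um) ** 2

# 10 micron (1971) versus 0.045 micron (2007), figures from the slide.
gain = ideal_density_gain(10, 0.045)
print(f"~{gain:,.0f}x more transistors in the same area")
```

Going from 10 to 0.045 micron is a factor of about 222 in feature size, or roughly 49,000 in ideal density.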
Technology • Smaller transistors are • Cheaper • Faster: they switch faster, so gate delays decrease • → Processors run at higher frequencies.
Architectural Innovation • Performance = Frequency × IPC • Architects use the extra transistors to • execute more instructions per cycle, or • increase the frequency.
Architectural Innovation • Super Pipelining • Superscalar • Dynamic Execution • Multi-Threading
Executing Instructions
    loop: load  t1, 0(s1)
          load  t2, 100(s1)
          add   t3, t1, t2
          store t3, 200(s1)
          add   s1, s1, 1
          bne   s1, s0, loop
[Figure: the processor fetches instructions and data from memory and writes results back to memory]
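To make the loop's effect concrete, here is a Python rendering of it. It assumes memory is a word-addressed list (so adding 1 to s1 moves one element) and that s1 starts below s0; like the assembly, the body runs once before the `bne` test. The function name and sample data are hypothetical.

```python
def run_loop(memory, s1, s0):
    """Python rendering of the assembly loop: memory[200+i] = memory[i] + memory[100+i]."""
    while True:
        t1 = memory[s1]          # load  t1, 0(s1)
        t2 = memory[s1 + 100]    # load  t2, 100(s1)
        t3 = t1 + t2             # add   t3, t1, t2
        memory[s1 + 200] = t3    # store t3, 200(s1)
        s1 += 1                  # add   s1, s1, 1
        if s1 == s0:             # bne   s1, s0, loop (falls through when equal)
            break
    return memory

# Hypothetical data: first array holds i, second holds 10*i.
mem = [0] * 300
for i in range(100):
    mem[i] = i
    mem[100 + i] = 10 * i
run_loop(mem, 0, 100)
print(mem[200:205])
```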
Conventional Processor • Instruction 1: Fetch → Decode → Execute → Write • Instruction 2: Fetch → Decode → Execute → Write, starting only after instruction 1 completes • Frequency = f, IPC = 1/4 (four cycles per instruction)
Pipelined Processor • Instructions 1, 2, and 3 overlap in the Fetch (F), Decode (D), Execute (E), and Write (W) stages, each starting one cycle after the previous one • Frequency = f, IPC = 1
Super-Pipelined Processor • Instructions 1, 2, and 3 flow through a deeper pipeline of shorter stages, so the clock can run faster • Frequency = 5f, IPC = 1
Superscalar Processor • Instructions 1 to 6 flow through the F, D, E, and W stages two at a time • Frequency = f, IPC = 2
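The IPC values on these slides follow from counting cycles. This sketch assumes an ideal pipeline with no stalls or hazards, four stages (F, D, E, W), and a hypothetical run of 1,000 instructions; the function names are illustrative.

```python
import math

def cycles_conventional(n, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n * stages

def cycles_pipelined(n, stages, width=1):
    """Ideal pipeline: fill it once, then finish `width` instructions every cycle."""
    return stages + math.ceil(n / width) - 1

N, STAGES = 1000, 4
for name, cycles in [
    ("conventional", cycles_conventional(N, STAGES)),
    ("pipelined", cycles_pipelined(N, STAGES)),
    ("superscalar (2-wide)", cycles_pipelined(N, STAGES, width=2)),
]:
    print(f"{name:22s} cycles={cycles:5d} IPC={N / cycles:.2f}")
```

For long instruction streams the IPC approaches 1/4, 1, and 2, matching the slides. Super-pipelining keeps IPC near 1 but shortens the cycle, so its gain appears in the Frequency term instead.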
IBM POWER5 • Released in 2003: 276 million transistors
Dynamic Execution • Consider a large instruction window • Execute instructions as soon as they are ready • One stalled instruction doesn’t necessarily stall the entire processor • [Figure: ready instructions from the window issued to ALU 1, ALU 2, and ALU 3]
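A toy model of dynamic execution: each cycle, up to three ready instructions issue to the three ALUs. In-order issue stops at the first instruction whose operands are not ready; dynamic issue skips past it. The model assumes the whole program fits in the window, dependencies point only to earlier instructions, and a hypothetical 5-cycle load (for example a cache miss); all names are illustrative.

```python
def run(program, alus, in_order):
    """Return the cycle in which the last result becomes available.

    program: list of (latency, deps); deps are indices of earlier instructions
    whose results this instruction needs.
    """
    n = len(program)
    ready_at = [None] * n           # cycle when each result becomes available
    cycle = 0
    while None in ready_at:
        free = alus
        for i, (latency, deps) in enumerate(program):
            if ready_at[i] is not None:
                continue            # already issued
            operands_ready = all(
                ready_at[d] is not None and ready_at[d] <= cycle for d in deps
            )
            if operands_ready and free > 0:
                ready_at[i] = cycle + latency
                free -= 1
            elif in_order:
                break               # in-order: the oldest waiting instruction blocks the rest
        cycle += 1
    return max(ready_at)

program = [
    (5, []),   # I1: load with a hypothetical 5-cycle latency
    (1, [0]),  # I2: needs I1's result, must wait
    (1, []),   # I3 to I7: independent single-cycle operations
    (1, []),
    (1, []),
    (1, []),
    (1, []),
]
print("in-order:", run(program, alus=3, in_order=True))
print("dynamic: ", run(program, alus=3, in_order=False))
```

In the dynamic version, the independent instructions execute while I2 waits for the load, so the stalled instruction no longer holds up the rest.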
Multi-Threading • The processor executes instructions from multiple threads.
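One way multi-threading helps is by filling cycles that a single stalled thread would waste. This is a toy model of one flavor (fine-grained, round-robin interleaving on a single-issue core), not a description of any particular processor. It assumes every `miss_every`-th instruction of a thread stalls that thread for `miss_penalty` extra cycles, and it stops counting once the last instruction issues; all names and numbers are hypothetical.

```python
def simulate(num_threads, insts_per_thread, miss_every, miss_penalty):
    """Return the IPC of a single-issue core that interleaves threads round-robin."""
    remaining = [insts_per_thread] * num_threads
    issued = [0] * num_threads
    ready_at = [0] * num_threads        # first cycle each thread may issue again
    cycle, next_thread = 0, 0
    while any(remaining):
        for offset in range(num_threads):          # round-robin search for a ready thread
            t = (next_thread + offset) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued[t] += 1
                stall = miss_penalty if issued[t] % miss_every == 0 else 0
                ready_at[t] = cycle + 1 + stall
                next_thread = (t + 1) % num_threads
                break
        cycle += 1                                  # a cycle with no ready thread is wasted
    return sum(issued) / cycle

for threads in (1, 2):
    print(threads, "thread(s): IPC =", round(simulate(threads, 8, 4, 3), 3))
```

With one thread, the core idles during every stall; with two, the other thread's instructions fill most of those cycles.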
Outline • Driving Forces • Technology • Architectural Innovation • Reasons for the Slowdown • Multi-Core Trend • Future
Reasons for the Slowdown 1. Power: • Dynamic power = Frequency × Voltage² × Capacitance • Higher frequency means more power and more heat to remove.
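The power formula in relative units shows why frequency increases became expensive: power grows linearly with frequency and with the square of voltage. The scenario numbers below (doubling the clock, and doubling it while also raising the voltage 10%) are hypothetical; the function name is illustrative.

```python
def dynamic_power(frequency, voltage, capacitance):
    """Power = Frequency x Voltage^2 x Capacitance, in relative units."""
    return frequency * voltage ** 2 * capacitance

base = dynamic_power(1.0, 1.0, 1.0)
print("2x frequency:           ", round(dynamic_power(2.0, 1.0, 1.0) / base, 2))
print("2x frequency, +10% volts:", round(dynamic_power(2.0, 1.1, 1.0) / base, 2))
```

A 10% voltage increase alone adds 21% power, because voltage enters squared.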
Reasons for the Slowdown 2. Processor complexity increases with: • More instructions issued per cycle • Longer pipelines 3. In most applications, there isn’t enough instruction-level parallelism (ILP) to fill many pipelines → diminishing returns. 4. Many applications aren’t threaded.
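A quick way to see the ILP limit: the longest dependence chain bounds how fast a block of instructions can finish, however wide the machine is. This sketch applies that to one iteration of the example loop, assuming single-cycle latencies and register renaming (so reuse of s1 creates no false dependence); it ignores overlap between iterations, which dynamic execution can exploit. Names are illustrative.

```python
def critical_path(latencies, deps):
    """Length in cycles of the longest dependence chain (instructions in program order)."""
    finish = []
    for i, latency in enumerate(latencies):
        start = max((finish[d] for d in deps[i]), default=0)
        finish.append(start + latency)
    return max(finish)

# One loop iteration: load t1, load t2, add t3, store t3, add s1, bne.
latencies = [1, 1, 1, 1, 1, 1]
deps = [[], [], [0, 1], [2], [], [4]]
ilp = len(latencies) / critical_path(latencies, deps)
for width in (1, 2, 4, 8):
    print(f"issue width {width}: IPC at most {min(width, ilp):.1f}")
```

The chain load → add → store is three cycles long, so six instructions give an ILP of 2; issuing more than two per cycle gains nothing here.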
Reasons for the Slowdown • “We could build a slightly faster chip, but it would cost twice the die area while gaining only a 20 percent speed increase.” (Marc Tremblay, Sun Microsystems)
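The quote can be turned into performance per unit of die area. The 1.2× speed and 2× area come from the quote; the two-core comparison is a hypothetical best case that assumes a perfectly threaded workload, which, as noted above, many applications are not. The function name is illustrative.

```python
def perf_per_area(speedup, area):
    """Relative performance delivered per unit of die area."""
    return speedup / area

bigger_core = perf_per_area(1.20, 2.0)   # from the quote: +20% speed for 2x area
two_cores = perf_per_area(2.0, 2.0)      # hypothetical: two original cores, ideal parallel work
print(f"bigger core: {bigger_core:.2f}, two cores: {two_cores:.2f}")
```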
Outline • Driving Forces • Technology • Architectural Innovation • Reasons for the Slowdown • Multi-Core Trend • Future
Multi-Core Trend • Modern processor chips contain processing cores and several levels of cache memory. • Manufacturers are building chips with multiple cooler-running, more energy-efficient processing cores instead of one increasingly powerful core.
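Using the power formula from the slowdown slides, this sketch compares one core at full speed with two cores at half the frequency. It assumes (hypothetically) that the lower frequency lets the supply voltage drop to 0.8 of its original value and that the work splits evenly across both cores; real voltage and frequency relationships vary by process.

```python
def dynamic_power(frequency, voltage, capacitance=1.0):
    """Power = Frequency x Voltage^2 x Capacitance, in relative units."""
    return frequency * voltage ** 2 * capacitance

# One core at full frequency and voltage.
single_power = dynamic_power(1.0, 1.0)
single_throughput = 1.0

# Two cores at half frequency; hypothetical voltage reduction to 0.8.
dual_power = 2 * dynamic_power(0.5, 0.8)
dual_throughput = 2 * 0.5        # only if the work divides evenly between the cores

print(f"single: throughput {single_throughput:.2f}, power {single_power:.2f}")
print(f"dual:   throughput {dual_throughput:.2f}, power {dual_power:.2f}")
```

Under these assumptions the two slower cores deliver the same throughput at about 64% of the power, which is the "cooler-running, more energy-efficient" argument in numbers.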
IBM POWER5 Chip • 276 million transistors
IBM Cell Chip • 234 million transistors
Outline • Driving Forces • Technology • Architectural Innovation • Reasons for the Slowdown • Multi-Core Trend • Future
Future • Research on better semiconductor materials to build smaller, faster, and cooler transistors. • Fine power management. • Keep tweaking the cores for more performance optimizations. • Multiple cores are here to stay. • More and larger caches. • Compilers that generate parallel threads automatically.