
CSCE 432/832 High Performance Processor Architectures Scalar to Superscalar


Presentation Transcript


  1. CSCE 432/832 High Performance Processor Architectures: Scalar to Superscalar. Adapted from lecture notes based in part on slides created by Mikko H. Lipasti, John Shen, Mark Hill, David Wood, Guri Sohi, and Jim Smith

  2. Scalar to Superscalar
  • Forecast
  • Real pipelines
  • IBM RISC experience
  • The case for superscalar
  • Instruction-level parallel machines
  • Superscalar pipeline organization
  • Superscalar pipeline design
  CSCE 432/832, Scalar to Superscalar

  3. Processor Performance
  Processor Performance = Time / Program
  = (Instructions / Program) × (Cycles / Instruction) × (Time / Cycle)
  = (code size) × (CPI) × (cycle time)
  • In the 1980's (decade of pipelining): CPI improved from 5.0 to 1.15
  • In the 1990's (decade of superscalar): CPI improved from 1.15 to 0.5 (best case)
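The "iron law" on slide 3 can be sketched directly in code. The instruction count and cycle time below are illustrative values chosen for the example, not numbers from the slides; only the CPI figures (5.0 and 1.15) come from the slide:

```python
def execution_time(insn_count, cpi, cycle_time_ns):
    """Iron law: time/program = (instructions/program) * (cycles/instruction) * (time/cycle)."""
    return insn_count * cpi * cycle_time_ns

# Same program and clock; only CPI improves, from 5.0 (1980s) to 1.15 (1990s pipelining).
t_1980s = execution_time(1_000_000, 5.0, 10.0)
t_1990s = execution_time(1_000_000, 1.15, 10.0)
print(t_1980s / t_1990s)  # speedup from the CPI term alone, about 4.35x
```

Holding the other two terms fixed isolates the contribution of pipelining: the entire 1980s gain shows up in the CPI factor.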

  4. Amdahl’s Law
  • h = fraction of time in serial code
  • f = fraction that is vectorizable
  • v = speedup for f
  • Overall speedup: Speedup = 1 / ((1 − f) + f / v)
  [Figure: execution-time profile — the serial fraction runs on 1 processor, the vectorizable fraction f on N processors]

  5. Revisit Amdahl’s Law
  • Sequential bottleneck: even if v is infinite, performance is limited by the nonvectorizable portion (1 − f)
  • As v → ∞, Speedup → 1 / (1 − f)
  [Figure: same execution-time profile as slide 4]
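The sequential bottleneck of slides 4 and 5 is easy to demonstrate numerically. A minimal sketch, using the slide's f and v symbols (the concrete values are ours):

```python
def amdahl_speedup(f, v):
    """Overall speedup when fraction f of the work is sped up by factor v."""
    return 1.0 / ((1.0 - f) + f / v)

print(amdahl_speedup(0.8, 8))            # finite v: about 3.33
print(amdahl_speedup(0.8, float('inf'))) # even infinite v is capped at 1/(1-0.8) = 5.0
```

No matter how large v grows, the nonvectorizable portion (1 − f) bounds the overall speedup, which is the slide's point.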

  6. Pipelined Performance Model
  • g = fraction of time the pipeline is filled
  • 1 − g = fraction of time the pipeline is not filled (stalled)

  7. Pipelined Performance Model
  • g = fraction of time the pipeline is filled
  • 1 − g = fraction of time the pipeline is not filled (stalled)
  • For pipeline depth N: Speedup = 1 / ((1 − g) + g / N)
  [Figure: depth-vs-time profile — depth N for fraction g of the time, depth 1 for fraction 1 − g]

  8. Pipelined Performance Model
  • Tyranny of Amdahl’s Law [Bob Colwell]: when g is even slightly below 100%, a big performance hit results
  • Stalled cycles are the key adversary and must be minimized as much as possible
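The "tyranny" Colwell describes is visible with a few values of g. A sketch of the pipelined performance model from slides 6-8 (the depth N = 10 is an illustrative choice, not from the slides):

```python
def pipeline_speedup(g, n):
    """Speedup of an n-deep pipeline that is filled a fraction g of the time."""
    return 1.0 / ((1.0 - g) + g / n)

# Even a few percent of stall time erases a large share of a depth-10 pipeline's gain.
for g in (1.0, 0.99, 0.95, 0.90):
    print(g, round(pipeline_speedup(g, 10), 2))
```

At g = 0.90 the 10-stage pipeline delivers only about 5.3x, roughly half its peak: stalled cycles, not raw depth, dominate delivered performance.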

  9. Motivation for Superscalar [Agerwala and Cocke]
  • Speedup jumps from 3 to 4.3 for N = 6 and f = 0.8 when the serial portion runs at s = 2 instead of s = 1 (scalar)
  [Figure: speedup curves with the typical range of f marked]
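The slide's jump from 3 to 4.3 can be checked with the revised speedup model, where the serial portion runs at speed s rather than 1. The formula form matches the numbers the slide states:

```python
def speedup(f, s, n):
    """Vectorizable fraction f runs at width n; the rest runs at scalar speed s."""
    return 1.0 / ((1.0 - f) / s + f / n)

print(round(speedup(0.8, 1, 6), 1))  # 3.0 (scalar baseline, s = 1)
print(round(speedup(0.8, 2, 6), 1))  # 4.3 (a modest 2x on the serial portion)
```

A 2x improvement on the serial 20% of the work buys more than raising n further would, which is the argument for superscalar issue.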

  10. Superscalar Proposal
  • Moderate the tyranny of Amdahl’s Law
  • Ease the sequential bottleneck
  • More generally applicable
  • Robust (less sensitive to f)
  • Revised Amdahl’s Law: Speedup = 1 / ((1 − f) / s + f / v)
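The "robust (less sensitive to f)" claim can be illustrated by sweeping f under the revised model. The values of f below are ours, chosen to show the trend; v = 6 matches the N = 6 example on slide 9:

```python
def revised_amdahl(f, s, v):
    """Revised Amdahl's Law: serial portion sped up by s, parallel portion by v."""
    return 1.0 / ((1.0 - f) / s + f / v)

# As f falls, the s = 2 machine degrades more gracefully than the s = 1 baseline.
for f in (0.9, 0.8, 0.6, 0.4):
    print(f, round(revised_amdahl(f, 1, 6), 2), round(revised_amdahl(f, 2, 6), 2))
```

The gap between the two columns widens in relative terms as f shrinks: speeding up the serial code pays off most exactly when vectorization helps least.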

  11. Limits on Instruction Level Parallelism (ILP)

  12. Superscalar Proposal
  • Go beyond the single-instruction pipeline, achieve IPC > 1
  • Dispatch multiple instructions per cycle
  • Provide a more generally applicable form of concurrency (not just vectors)
  • Geared for sequential code that is hard to parallelize otherwise
  • Exploit fine-grained or instruction-level parallelism (ILP)

  13. Classifying ILP Machines [Jouppi, DECWRL 1991]
  • Baseline scalar RISC:
  • Issue parallelism IP = 1 instruction / cycle
  • Operation latency OP = 1 cycle
  • Peak IPC = 1

  14. Classifying ILP Machines [Jouppi, DECWRL 1991]
  • Superpipelined: cycle time = 1/m of baseline
  • Issue parallelism IP = 1 instruction / minor cycle
  • Operation latency OP = m minor cycles
  • Peak IPC = m instructions / major cycle (m× speedup?)

  15. Classifying ILP Machines [Jouppi, DECWRL 1991]
  • Superscalar:
  • Issue parallelism IP = n instructions / cycle
  • Operation latency OP = 1 cycle
  • Peak IPC = n instructions / cycle (n× speedup?)

  16. Classifying ILP Machines [Jouppi, DECWRL 1991]
  • VLIW: Very Long Instruction Word
  • Issue parallelism IP = n instructions / cycle
  • Operation latency OP = 1 cycle
  • Peak IPC = n instructions / cycle = 1 VLIW / cycle

  17. Classifying ILP Machines [Jouppi, DECWRL 1991]
  • Superpipelined-superscalar:
  • Issue parallelism IP = n instructions / minor cycle
  • Operation latency OP = m minor cycles
  • Peak IPC = n × m instructions / major cycle
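The peak-IPC rows of slides 13-17 reduce to a single product of issue width n and superpipelining degree m. The helper name below is ours, not Jouppi's:

```python
def peak_ipc(n=1, m=1):
    """Peak IPC per major cycle: n instructions issued per minor cycle, m minor cycles per major cycle."""
    return n * m

print(peak_ipc())        # baseline scalar RISC: 1
print(peak_ipc(m=3))     # superpipelined, m = 3: 3 per major cycle
print(peak_ipc(n=4))     # superscalar (or VLIW), n = 4: 4
print(peak_ipc(4, 3))    # superpipelined-superscalar: n * m = 12
```

Note that peak_ipc(n, 1) == peak_ipc(1, m) whenever n = m, which is the rough equivalence slide 18 points out.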

  18. Superscalar vs. Superpipelined
  • Roughly equivalent performance: if n = m, both have about the same peak IPC
  • Parallelism exposed in space (superscalar) vs. in time (superpipelined)

  19. Superpipelining
  “Superpipelining is a new and special term meaning pipelining. The prefix is attached to increase the probability of funding for research proposals. There is no theoretical basis distinguishing superpipelining from pipelining. Etymology of the term is probably similar to the derivation of the now-common terms, methodology and functionality as pompous substitutes for method and function. The novelty of the term superpipelining lies in its reliance on a prefix rather than a suffix for the pompous extension of the root word.” - Nick Tredennick, 1991

  20. Superpipelining: Hype vs. Reality

  21. Superscalar Challenges
