Superscalar - summary

• Superscalar machines have multiple functional units (FUs), e.g. 2 × integer ALU, 1 × FPU, 1 × branch, 1 × load/store
• Requires a complex instruction fetch unit (IFU)
• Able to issue multiple instructions per cycle (typically 4)
• Able to detect hazards (unavailability of operands)
• Able to re-order instruction issue
• Aim: keep all the FUs busy
• Typically, 6-way superscalars achieve instruction-level parallelism of only 2-3
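To see why a 6-way machine may sustain only 2-3 instructions per cycle, here is a minimal sketch of dependency-limited multi-issue. It assumes an idealized machine: an unlimited instruction window, any FU can execute any instruction, and every result is ready one cycle after issue. `Instr`, `schedule` and `ipc` are hypothetical names for illustration. Each cycle, up to `width` instructions whose operands are ready are issued, which models both hazard detection and re-ordered issue.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    deps: tuple = ()   # indices of earlier instructions whose results this one reads
    latency: int = 1   # assumed: every result is ready one cycle after issue

def schedule(program, width):
    """Each cycle, issue up to `width` instructions whose operands are ready,
    oldest first, so younger independent instructions may overtake stalled ones."""
    ready_at = [None] * len(program)     # cycle at which each result becomes available
    issue_cycle = [None] * len(program)
    cycle = 0
    while None in issue_cycle:
        issued_now = []
        for i, ins in enumerate(program):
            if len(issued_now) == width:
                break                     # all issue slots used this cycle
            if issue_cycle[i] is not None:
                continue                  # already issued earlier
            # hazard check: every producer must have issued and its result be ready
            if all(ready_at[d] is not None and ready_at[d] <= cycle for d in ins.deps):
                issued_now.append(i)
        for i in issued_now:
            issue_cycle[i] = cycle
            ready_at[i] = cycle + program[i].latency
        cycle += 1
    return issue_cycle

def ipc(program, width):
    """Achieved instructions per cycle for the whole program."""
    issue_cycle = schedule(program, width)
    finish = max(c + ins.latency for c, ins in zip(issue_cycle, program))
    return len(program) / finish

# Two independent dependency chains of three instructions each.
prog = [Instr("a0"), Instr("a1", (0,)), Instr("a2", (1,)),
        Instr("b0"), Instr("b1", (3,)), Instr("b2", (4,))]
```

Widening this toy machine from 2 to 6 issue slots leaves the IPC at 2.0: the dependency chains, not the number of FUs, set the limit.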

Computer Architecture
Speculation & Branching
[Image: Iolanthe II approaches Rangitoto]

Speculation
• High-tech gambling?
• Data prefetch
  • Cache instruction
    • dcbt: data cache block touch (PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …)
    • Attempts to bring data into the cache so that it will be "close" when needed
    • Allows the SIU to use idle bus bandwidth
    • If there's no spare bandwidth, this read can be given low priority
  • Speculative because
    • a branch may occur before the data is used
    • we speculate that this data may be needed

Speculation - General
• Some functional units are almost always idle
• Make them do some (possibly useful) work rather than idle
• If the speculation was incorrect, the results are simply abandoned
• No loss in efficiency; chance of a gain
• Researchers are actively looking at software prefetch schemes
  • Fetch data well before it's needed
  • Reduce latency when it's actually needed
• Speculative operations have low priority and use idle resources
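A common rule of thumb for software prefetching (not taken from the slides) is to prefetch ceil(L / C) iterations ahead, where L is the miss latency and C the cycles of work per loop iteration, so the data arrives just as it is needed. The sketch below computes that distance and lists which element a loop would touch (e.g. with `dcbt` on PowerPC) in each iteration. The function names and latency figures are illustrative assumptions.

```python
import math

def prefetch_distance(miss_latency_cycles, cycles_per_iteration):
    """Iterations ahead to prefetch so a cache line arrives by the time it is used."""
    return math.ceil(miss_latency_cycles / cycles_per_iteration)

def prefetch_schedule(n_iterations, distance):
    """Yield (i, j): while processing element i, prefetch element j (None = nothing left)."""
    for i in range(n_iterations):
        j = i + distance
        yield i, (j if j < n_iterations else None)
```

For an assumed 100-cycle miss and 20 cycles of work per iteration, `prefetch_distance(100, 20)` gives 5. The first `distance` elements are not covered by this schedule (a real loop would add a prologue), and if the loop exits early the outstanding prefetches are simply wasted, which is the speculative part.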

Branching
• Expensive
  • 2-3 cycles lost in the pipeline
  • All instructions following the branch are 'flushed'
  • Bandwidth wasted fetching unused instructions
  • Stall while the branch target is fetched
• We can speculate about the target of a branch
• Terminology
  • Branch target: address to which the branch jumps
  • Branch taken: control transfers to a non-sequential address (the target)
  • Branch not taken: the next sequential instruction is executed
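The cost of those lost cycles can be estimated with the usual effective-CPI formula, CPI_eff = CPI_base + f_branch × r_mispredict × penalty. The numbers used below (base CPI of 1, 20% branches, 3-cycle penalty, 10% misprediction) are assumptions for illustration, not figures from the slides.

```python
def effective_cpi(base_cpi, branch_fraction, penalty_cycles, mispredict_rate=1.0):
    """Average cycles per instruction once branch penalties are included.

    mispredict_rate=1.0 assumes no prediction: every branch pays the full penalty.
    Lower values model a predictor that only pays on wrong guesses."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

no_prediction = effective_cpi(1.0, 0.20, 3)           # ≈ 1.6
with_prediction = effective_cpi(1.0, 0.20, 3, 0.10)   # ≈ 1.06
```

On a superscalar the damage is larger than the CPI suggests, since every lost cycle also wastes all of the machine's issue slots for that cycle.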

Branching - Prediction
• Branches can be
  • unconditional: branch is always taken, e.g. call subroutine, return from subroutine
  • conditional: branch depends on the state of the computation, e.g. has the loop terminated yet?
• Unconditional branches are simple
  • New instructions are fetched as soon as the branch is recognized
  • As early in the pipeline as possible
  • Branch units are often placed with the fetch & decode stages

Branching - Branch Unit
[Figure: PowerPC 603 logical layout]

Branching - Speculation
• We have the following code:
    if ( cond ) s1; else s2;
• Superscalar machine
  • Multiple functional units
  • Start executing both branches (s1 and s2)
  • Keep idle functional units busy!
  • One is speculative and will be abandoned
• Processor will eventually calculate the branch condition and select which result should be retained (written back)
• MIPS R10000: up to 4 speculative branches at once

Branching - Speculation
• MIPS R10000
  • Up to 4 speculative branches at once
  • Instructions are "tagged" with a 4-bit mask
    • Indicates to which (unresolved) branch instruction(s) each instruction belongs
  • As soon as a condition is determined, mispredicted instructions are aborted
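The tagging scheme can be modelled in a few lines. This is a toy sketch under simplifying assumptions (one mask bit per unresolved branch, bits allocated lowest-first, nothing tracked except tags), not the actual R10000 logic; `BranchMaskTracker` and its methods are hypothetical names. Each instruction records the mask of branches that were unresolved when it was issued; a mispredicted branch squashes every instruction and younger branch carrying its bit, while a correctly predicted one simply clears the bit.

```python
class BranchMaskTracker:
    def __init__(self, max_branches=4):
        self.free_bits = list(range(max_branches))
        self.branch_masks = {}   # tag bit -> mask of older branches still unresolved
        self.in_flight = []      # [name, mask] for each issued instruction

    def current_mask(self):
        mask = 0
        for bit in self.branch_masks:
            mask |= 1 << bit
        return mask

    def predict_branch(self):
        """Allocate a tag bit for a newly predicted branch; None means fetch must stall."""
        if not self.free_bits:
            return None
        bit = self.free_bits.pop(0)
        self.branch_masks[bit] = self.current_mask()   # evaluated before bit is added
        return bit

    def issue(self, name):
        # the instruction depends on every branch that is still unresolved
        self.in_flight.append([name, self.current_mask()])

    def resolve(self, bit, mispredicted):
        m = 1 << bit
        if mispredicted:
            # abort instructions and younger branches issued under this prediction
            self.in_flight = [e for e in self.in_flight if not e[1] & m]
            for b, mask in list(self.branch_masks.items()):
                if mask & m:
                    del self.branch_masks[b]
                    self.free_bits.append(b)
        else:
            # prediction confirmed: nothing depends on this branch any more
            for e in self.in_flight:
                e[1] &= ~m
            for b in self.branch_masks:
                self.branch_masks[b] &= ~m
        del self.branch_masks[bit]
        self.free_bits.append(bit)
```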

Branching - Prediction
• We have a sequence of instructions:

    L1: add      ; some mixture of arithmetic,
        lw       ; load, store, etc. instructions
        sub
        brne L1  ; branch on some condition
    L2: or       ; some more arithmetic, load,
        st       ; store, etc. instructions

• If you were asked to guess which branch should be preferred, which would you choose:
  • Next sequential instruction (L2)
  • Branch target (L1)

Branching - Prediction
• Studies show that branches are taken most of the time!
• Because of loops:

    L1: add      ; any mix of arith,
        lw       ; load, store, etc.
        sub      ; instructions
        brne L1  ; branch back to loop start
    L2: or       ; some more arith,
        st       ; memory, etc. instructions

Branching - Prediction Rule
• A simple prediction rule:
  • Take backward branches
• Works amazingly well!
  • For a loop with n iterations, this is wrong in only 1 of n cases (the final exit)!
• A system working on this rule alone would
  • detect the backward branch and
  • start fetching from the branch target rather than the next instruction
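The backward-taken rule and its 1/n error rate can be checked directly. In this sketch, `btfn_predict` treats a branch as backward when its target address is lower than its own address, and `loop_trace` assumes the loop-closing branch sits at the bottom of the loop, as in the loop code above, so it is taken n - 1 times and falls through once. The function names and addresses are illustrative.

```python
def btfn_predict(branch_pc, target_pc):
    """Backward taken, forward not taken: predict taken iff the target precedes the branch."""
    return target_pc < branch_pc

def loop_trace(n):
    """Outcomes of a bottom-of-loop branch over one run of an n-iteration loop."""
    return [True] * (n - 1) + [False]

def accuracy(predicted_taken, outcomes):
    """Fraction of outcomes matching a fixed (static) prediction."""
    return sum(1 for taken in outcomes if taken == predicted_taken) / len(outcomes)

# Loop-closing branch at 0x1010 jumping back to 0x1000 (made-up addresses).
pred = btfn_predict(0x1010, 0x1000)   # True: backward branch
```

For a 10-iteration loop the rule is right 9 times out of 10; for 100 iterations, 99 out of 100.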

Branching - Improving the prediction
• Static prediction systems
  • Compiler can mark branches as likely to be taken or not
  • Instruction fetch unit uses the marking as advice on which instruction to fetch
  • Compiler is often able to give the right advice
    • Loops are easily detected
    • Other patterns in conditions can be recognized
      • Checking for EOF when reading a file
      • Error checking
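One way a compiler can choose these hints is profile-guided prediction: run the program on sample input and mark each branch with its majority direction. This is a swapped-in technique, not the pattern heuristics listed above. A minimal sketch, where the trace format `(branch_pc, taken)` and both function names are assumptions:

```python
from collections import Counter, defaultdict

def profile_hints(trace):
    """Return {branch_pc: True if the branch was mostly taken} from a profiling run."""
    counts = defaultdict(Counter)
    for pc, taken in trace:
        counts[pc][taken] += 1
    return {pc: c[True] >= c[False] for pc, c in counts.items()}

def static_accuracy(hints, trace, default=False):
    """Accuracy of fixed per-branch hints; unknown branches use `default`."""
    trace = list(trace)
    hits = sum(1 for pc, taken in trace if hints.get(pc, default) == taken)
    return hits / len(trace)

# Hypothetical profile: a loop branch taken 9 of 10 times, an error check never taken.
trace = [(0x40, True)] * 9 + [(0x40, False)] + [(0x80, False)] * 10
hints = profile_hints(trace)
```

The loop branch gets a "taken" hint and the error check a "not taken" hint, matching the advice a compiler would give from the code structure alone.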

Branching - Improving the prediction
• Dynamic prediction systems
  • Program history determines the most likely branch
  • Branch Target Buffers: another cache!
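A common form of branch history is a 2-bit saturating counter per branch. The sketch below (class names are illustrative) compares it with a 1-bit "same as last time" predictor on an inner loop of 4 iterations entered 3 times. After the first visit, the 1-bit scheme mispredicts twice per loop visit (at the exit and again on re-entry), whereas the 2-bit counter mispredicts only at the exit.

```python
class OneBitPredictor:
    """Predict whatever the branch did last time."""
    def __init__(self, last_taken=True):
        self.last_taken = last_taken

    def predict(self):
        return self.last_taken

    def update(self, taken):
        self.last_taken = taken

class TwoBitCounter:
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)   # learn from the actual outcome
    return misses

# Inner loop of 4 iterations, entered 3 times: taken x3, not taken, repeated.
trace = [True, True, True, False] * 3
```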

Branching - Branch Target Buffer
• Instruction address bits [11:3] select the BTB entry
• Tag determines a "hit"
• Statistics (branch history) bits select taken/not taken
• R10000: 87% prediction accuracy on SPEC'92 integer
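A direct-mapped BTB indexed as on the slide can be sketched as follows. Assumptions: PC bits [11:3] give 512 entries, the remaining upper bits form the tag, each entry holds a target plus a 2-bit counter as its statistics, only taken branches allocate entries, and instructions are 4 bytes, so a miss predicts pc + 4. `BranchTargetBuffer` is a hypothetical name, not a real processor's design.

```python
class BranchTargetBuffer:
    OFFSET_BITS = 3   # PC bits [2:0] are not used
    INDEX_BITS = 9    # PC bits [11:3] -> 512 entries

    def __init__(self):
        # each entry: [tag, target, 2-bit counter] or None
        self.entries = [None] * (1 << self.INDEX_BITS)

    def _split(self, pc):
        index = (pc >> self.OFFSET_BITS) & ((1 << self.INDEX_BITS) - 1)
        tag = pc >> (self.OFFSET_BITS + self.INDEX_BITS)
        return index, tag

    def predict(self, pc):
        """Next fetch address: the stored target on a hit predicted taken, else pc + 4."""
        index, tag = self._split(pc)
        e = self.entries[index]
        if e is not None and e[0] == tag and e[2] >= 2:
            return e[1]
        return pc + 4

    def update(self, pc, taken, target):
        index, tag = self._split(pc)
        e = self.entries[index]
        if e is None or e[0] != tag:
            if taken:   # assumed policy: allocate (or replace) only on a taken branch
                self.entries[index] = [tag, target, 2]
            return
        e[1] = target
        e[2] = min(3, e[2] + 1) if taken else max(0, e[2] - 1)
```

Because bits [2:0] are ignored, two branches in the same 8-byte block would share an entry, and addresses 4 KB apart collide on the index and are told apart only by the tag.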
