150 likes | 301 Views
COMPSYS 304. Computer Architecture Speculation & Branching. Morning visitors - Paradise Bay, Bay of Islands. Speculation. High Tech Gambling? Data Prefetch Cache instruction dcbt : data cache block touch Attempts to bring data into cache so that it will be “close” when needed
E N D
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands
Speculation • High Tech Gambling? • Data Prefetch • Cache instruction • dcbt : data cache block touch • Attempts to bring data into cache • so that it will be “close” when needed • Allows SIU to use idle bus bandwidth • if there’s no spare bandwidth, this read can be given low priority • Speculative because • a branch may occur before it’s used • we speculate that this data may be needed PowerPC mnemonic - Similar opcodes found in other architectures:SPARC v9, MIPS, …
Speculation - General • Some functional units almost always idle • Make them do some (possibly useful) workrather than idle • If the speculation was incorrect, results are simply abandoned • No loss in efficiency; Chance of a gain • Researchers are actively looking at software prefetch schemes • Fetch data well before it’s needed • Reduce latency when it’s actually needed • Speculative operations have low priority and use idle resources
Branching • Expensive • 2-3 cycles lost in pipeline • All instructions following branch ‘flushed’ • Bandwidth wasted fetching unused instructions • Stall while branch target is fetched • We can speculate about the target of a branch • Terminology • Branch Target : address to which branch jumps • Branch Taken : control transfers to non- sequential address (target) • Branch Not Taken : next instruction is executed
Branching - Prediction • Branches can be • unconditional: branch is always takencall subroutine return from subroutine • conditional: branch depends on state of computation, eg has loop terminated yet? • Unconditional branches are simple • New instructions are fetched as soon as the branch is recognized • As early in the pipeline as possible • Branch units often placed with fetch & decode stages
Branching - Branch Unit • PowerPC 603 logical layout
Branching - Speculation • We have the following code: • if ( cond ) s1; else s2; • Superscalar machine • Multiple functional units • Start executing both branches (s1 and s2) • Keep idle functional units busy! • One is speculative and will be abandoned • Processor will eventually calculate the branch condition and select which result should be retained (written back) • MIPS R10000 - up to 4 speculative at once
Branching - Speculation • MIPS R10000 - • Up to 4 speculative at once • Instructions are “tagged” with a 4 bit mask • Indicates to which branch instruction it belongs • As soon as condition is determined,mis-predicted instructions are aborted
Branching - Prediction • We have a sequence of instructions: • addlw • sub • brne L1 • orst • If you were asked to guess which branch should be preferred, • which would you choose: • Next sequential instruction (L2) • Branch target (L1) L1 Some mixture of arithmetic, load, store, etc, instructions branch on some condition Some more arithmetic, load, store, etc, instructions L2
Branching - Prediction • Studies show that backward branches are taken most of the time! • Because of loops: • add ;any mix of arith,lw ;load, store, etc, • sub ;instructionsbrne L1 ;branch back to loop start • or ;some more arith,st ;memory, etc instructions L1 L2
Branching - Prediction Rule • A simple prediction rule: • Take backward branches • works amazingly well! • For a loop with n iterations,this is wrong in 1/n cases only! • A system working on this rule alone would • detect the backward branch and • start fetching from the branch targetrather than the next instruction
Branching - Improving the prediction • Static prediction systems • Compiler can mark branches • Likely to be taken or not • Instruction fetch unit will use the marking as advice on which instruction to fetch • Compiler often able to give the right advice • Loops are easily detected • Other patterns in conditions can be recognized • Checking for EOF when reading a file • Error checking
Branching - Improving the prediction • Dynamic prediction systems • Program history determines most likely branch • Branch Target Buffers - Another cache!
Branching - Branch Target Buffer • Instruction Add[11:3] selects BTB entry • Tag determines “hit” • Stats select taken/not taken Pentium 4 >91% prediction accuracy - 4K entry BHT (Branch History Table) G4e – 2K entries
Superscalar - summary • Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store • Requires complex IFU • Able to issue multiple instructions/cycle (typ 4) • Able to detect hazards (unavailability of operands) • Able to re-order instruction issue • Aim to keep all the FUs busy • Typically, 6-way superscalars can achieveinstruction level parallelism of 2-3