COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands
Speculation • High Tech Gambling? • Data Prefetch • Cache instruction • dcbt : data cache block touch (PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …) • Attempts to bring data into cache so that it will be “close” when needed • Allows SIU to use idle bus bandwidth • if there’s no spare bandwidth, this read can be given low priority • Speculative because • a branch may occur before it’s used • we speculate that this data may be needed
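A minimal sketch of a software-issued prefetch, assuming a GCC or Clang compiler. `__builtin_prefetch` is the compiler builtin that may be lowered to a touch-style instruction such as dcbt on targets that have one; the function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are hypothetical choices for illustration, not tuned values.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements; a real value needs tuning. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that data a few iterations ahead will be needed.
 * The prefetch is only advice: the result is the same whether or not the
 * hardware acts on it. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* arguments: address, 0 = read access, 1 = low temporal locality */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 1);
        }
        total += data[i];
    }
    return total;
}
```

The hint is speculative in the sense used on the slide: if the loop exits early, the prefetched line is simply never used.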
Speculation - General • Some functional units are almost always idle • Make them do some (possibly useful) work rather than idle • If the speculation was incorrect, results are simply abandoned • No loss in efficiency; chance of a gain • Researchers are actively looking at software prefetch schemes • Fetch data well before it’s needed • Reduce latency when it’s actually needed • Speculative operations have low priority and use idle resources
Branching • Expensive • 2-3 cycles lost in pipeline • All instructions following branch ‘flushed’ • Bandwidth wasted fetching unused instructions • Stall while branch target is fetched • We can speculate about the target of a branch • Terminology • Branch Target : address to which branch jumps • Branch Taken : control transfers to non-sequential address (target) • Branch Not Taken : next instruction is executed
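To put the 2-3 lost cycles in context, a rough cost model can be sketched as below. This is a back-of-envelope assumption, not a formula from the slides: `effective_cpi` and all of its parameters are hypothetical names, and the inputs are illustrative rather than measured.

```c
#include <assert.h>

/* Average cycles per instruction when some branches cost extra cycles.
 *   base_cpi           - cycles per instruction with no branch penalty
 *   branch_fraction    - fraction of instructions that are branches
 *   penalised_fraction - fraction of those branches that lose cycles
 *   penalty_cycles     - cycles lost per penalised branch (2-3 on the slide)
 */
double effective_cpi(double base_cpi, double branch_fraction,
                     double penalised_fraction, double penalty_cycles)
{
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(penalised_fraction >= 0.0 && penalised_fraction <= 1.0);
    return base_cpi + branch_fraction * penalised_fraction * penalty_cycles;
}
```

With hypothetical inputs of 20% branches and a 3-cycle penalty, the model gives 1.6 cycles per instruction if every branch is penalised and 1.06 if only one branch in ten is, which is the gap that prediction tries to close.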
Branching - Prediction • Branches can be • unconditional: branch is always taken, eg call subroutine, return from subroutine • conditional: branch depends on state of computation, eg has loop terminated yet? • Unconditional branches are simple • New instructions are fetched as soon as the branch is recognized • As early in the pipeline as possible • Branch units often placed with fetch & decode stages
Branching - Branch Unit • PowerPC 603 logical layout
Branching - Speculation • We have the following code: • if ( cond ) s1; else s2; • Superscalar machine • Multiple functional units • Start executing both branches (s1 and s2) • Keep idle functional units busy! • One is speculative and will be abandoned • Processor will eventually calculate the branch condition and select which result should be retained (written back) • MIPS R10000 - up to 4 speculative at once
Branching - Speculation • MIPS R10000 • Up to 4 speculative branches at once • Instructions are “tagged” with a 4 bit mask • Indicates to which branch instruction it belongs • As soon as condition is determined, mis-predicted instructions are aborted
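The tagging idea can be sketched as follows. This is a simplified illustration and not the R10000's actual logic: it assumes one mask bit per pending branch, and the names `Instr`, `branch_mask` and `resolve_branch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING_BRANCHES 4   /* up to 4 speculative branches, as on the slide */

typedef struct {
    uint8_t branch_mask;  /* bit k set: issued speculatively after pending branch k */
    int     aborted;      /* 1 once the instruction has been squashed */
} Instr;

/* Resolve pending branch k over a window of in-flight instructions.
 * Mispredicted: abort every instruction tagged with bit k.
 * Predicted correctly: clear bit k so those instructions no longer depend on it.
 * Returns the number of instructions aborted by this call. */
size_t resolve_branch(Instr *window, size_t n, unsigned k, int mispredicted)
{
    assert(k < MAX_PENDING_BRANCHES);
    uint8_t bit = (uint8_t)(1u << k);
    size_t aborted = 0;

    for (size_t i = 0; i < n; i++) {
        if (window[i].aborted || !(window[i].branch_mask & bit))
            continue;
        if (mispredicted) {
            window[i].aborted = 1;
            aborted++;
        } else {
            window[i].branch_mask &= (uint8_t)~bit;
        }
    }
    return aborted;
}
```

An instruction whose mask reaches zero is no longer speculative and its result can be retained.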
Branching - Prediction • We have a sequence of instructions:
L1: add          ; some mixture of arithmetic, load, store, etc, instructions
    lw
    sub
    brne L1      ; branch on some condition
L2: or           ; some more arithmetic, load, store, etc, instructions
    st
• If you were asked to guess which branch should be preferred, which would you choose: • Next sequential instruction (L2) • Branch target (L1)
Branching - Prediction • Studies show that backward branches are taken most of the time! • Because of loops:
L1: add          ; any mix of arith,
    lw           ; load, store, etc,
    sub          ; instructions
    brne L1      ; branch back to loop start
L2: or           ; some more arith,
    st           ; memory, etc instructions
Branching - Prediction Rule • A simple prediction rule: • Take backward branches • works amazingly well! • For a loop with n iterations, this is wrong in only 1 of the n cases! • A system working on this rule alone would • detect the backward branch and • start fetching from the branch target rather than the next instruction
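The rule can be checked with a small simulation. This is a sketch under stated assumptions: the addresses are made up, and `predict_taken` and `loop_mispredictions` are hypothetical names. It models only the single loop-closing branch, which is taken on every iteration except the last.

```c
#include <assert.h>

/* Static rule: predict taken exactly when the branch jumps backwards,
 * ie when the target address is lower than the branch's own address. */
int predict_taken(unsigned long branch_addr, unsigned long target_addr)
{
    return target_addr < branch_addr;
}

/* Count mispredictions of the loop-closing branch over n iterations. */
unsigned loop_mispredictions(unsigned n)
{
    assert(n > 0);
    const unsigned long loop_start = 0x1000;  /* hypothetical address of L1 */
    const unsigned long branch_at  = 0x1010;  /* hypothetical address of brne */
    unsigned wrong = 0;

    for (unsigned i = 1; i <= n; i++) {
        int actually_taken = (i < n);  /* falls through only on the last pass */
        if (predict_taken(branch_at, loop_start) != actually_taken)
            wrong++;
    }
    return wrong;
}
```

For any n the count is 1: the only miss is the final fall-through out of the loop.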
Branching - Improving the prediction • Static prediction systems • Compiler can mark branches • Likely to be taken or not • Instruction fetch unit will use the marking as advice on which instruction to fetch • Compiler often able to give the right advice • Loops are easily detected • Other patterns in conditions can be recognized • Checking for EOF when reading a file • Error checking
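One way a programmer can feed this kind of advice to the compiler is the GCC/Clang builtin `__builtin_expect`, sketched below for an error check. The macros `LIKELY` and `UNLIKELY` and the function `sum_checked` are hypothetical names; whether the hint ends up as a marked branch or only as a code-layout decision depends on the compiler and target.

```c
#include <stddef.h>

/* Conventional wrappers around the GCC/Clang builtin. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sum n values into *out. Returns 0 on success, -1 on bad arguments.
 * The error check is marked as rarely true, so the common path
 * can be treated as the preferred one. */
int sum_checked(const int *v, size_t n, long *out)
{
    if (UNLIKELY(v == NULL || out == NULL))
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    *out = total;
    return 0;
}
```

The hint never changes the result of the function; it only expresses which outcome is expected.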
Branching - Improving the prediction • Dynamic prediction systems • Program history determines most likely branch • Branch Target Buffers - Another cache!
Branching - Branch Target Buffer • Instruction address bits [11:3] select the BTB entry • Tag determines “hit” • Stats select taken/not taken • Pentium 4: >91% prediction accuracy with a 4K-entry BHT (Branch History Table) • G4e: 2K entries
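The lookup on this slide (address bits [11:3] index the table, the tag confirms a hit) can be sketched as below. The sketch assumes a direct-mapped table of 512 entries and 32-bit addresses; the type `BtbEntry` and the function names are hypothetical, and the status bits are stored but not interpreted here.

```c
#include <stdint.h>

#define BTB_ENTRIES 512   /* 9 index bits: instruction address bits [11:3] */

typedef struct {
    int      valid;
    uint32_t tag;      /* address bits above bit 11 */
    uint32_t target;   /* cached branch target */
    uint8_t  stats;    /* status bits (taken / not taken history) */
} BtbEntry;

unsigned btb_index(uint32_t addr) { return (addr >> 3) & (BTB_ENTRIES - 1); }
uint32_t btb_tag(uint32_t addr)   { return addr >> 12; }

/* Returns 1 on a hit and writes the cached target to *target; 0 on a miss. */
int btb_lookup(const BtbEntry *btb, uint32_t addr, uint32_t *target)
{
    const BtbEntry *e = &btb[btb_index(addr)];
    if (e->valid && e->tag == btb_tag(addr)) {
        *target = e->target;
        return 1;
    }
    return 0;
}

/* Install a branch, replacing whatever previously mapped to the same entry. */
void btb_insert(BtbEntry *btb, uint32_t addr, uint32_t target)
{
    BtbEntry *e = &btb[btb_index(addr)];
    e->valid  = 1;
    e->tag    = btb_tag(addr);
    e->target = target;
    e->stats  = 0;
}
```

Two branches whose addresses differ only above bit 11 share an entry, so the tag comparison is what stops one branch from using the other's target.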
Branching - Branch Target Buffer • BTB – just another cache • Works on temporal locality principle • If this branch is taken (not taken) now, it’s likely to be taken (not taken) next time • Replace on conflicts (newest is best) • Any cache organization could be used • Direct mapped, associative, set-associative • No write-back needed • Flushed entries are restored • Major difference from other caches • Status bits …………
Branching - Branch Target Buffer • Status bits • Provide hysteresis in behaviour • Without hysteresis, behaviour change would cause the prediction to immediately update • Example: • if ( cond ) s1 else s2 • If the program takes branch s1 a few times, the BTB will predict that s1 is more likely than s2 • If s2 is then taken, usual cache behaviour suggests that the prediction should be updated to s2 but • Program branching behaviour is a little different ….
Branching - Branch Target Buffer • Status bits • Common branch behaviour is like this • List of taken branches: s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1 … • Usually s1 is executed, occasionally s2, eg • s2 handles errors • s2 follows a loop • ‘Standard’ cache update policies (assume the most recent will be used next) would update the prediction from s1 to s2 immediately • This would cause many mis-predictions
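The cost of the 'most recent is best' policy can be seen by simulating it on the stream above. This is a sketch with hypothetical names; outcomes are encoded as 1 for s1 and 2 for s2.

```c
#include <stddef.h>

/* Predictor with no hysteresis: always predict whatever happened last time.
 * Returns the number of mispredictions over the outcome stream. */
unsigned last_outcome_mispredictions(const int *outcomes, size_t n,
                                     int initial_prediction)
{
    int prediction = initial_prediction;
    unsigned wrong = 0;

    for (size_t i = 0; i < n; i++) {
        if (outcomes[i] != prediction)
            wrong++;
        prediction = outcomes[i];   /* update immediately, like a normal cache */
    }
    return wrong;
}
```

On the stream s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1, starting with s1 predicted, this gives 4 mispredictions: each isolated s2 is missed, and so is the s1 that follows it.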
Branching - Branch Target Buffer • Status bits • However, if the BTB waits until it has seen s2 a number of times before changing the prediction, the previous stream is predicted well • So the status bits (say 2 bits) are a count of the number of correct predictions • A correct prediction increments the count (maybe saturating at 2 – ie counts to max 2) • A mis-prediction decrements the count • A mis-prediction and count=0 updates the prediction • This accommodates an occasional break from a pattern (eg s1 is usually taken) without disturbing the best prediction (take s1) • It also handles situations where behaviour changes sometimes
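A minimal sketch of the counting scheme follows. The slide leaves one detail open, so the sketch makes an assumption: a mis-prediction first decrements the count, and the prediction is switched as soon as the count is 0. That choice reproduces the prediction row in the orange-object example later in these slides. The names `BranchStatus`, `predict_and_update` and `count_mispredictions` are hypothetical; outcomes are encoded as 1 for s1 and 2 for s2.

```c
#include <stddef.h>

#define COUNT_MAX 2   /* saturation value suggested on the slide */

typedef struct {
    int      prediction;  /* currently predicted path: 1 = s1, 2 = s2 */
    unsigned count;       /* status bits: recent correct predictions */
} BranchStatus;

/* Compare the prediction with the actual outcome, then update the status.
 * Returns 1 if the prediction was correct, 0 otherwise. */
int predict_and_update(BranchStatus *s, int actual)
{
    if (s->prediction == actual) {
        if (s->count < COUNT_MAX)
            s->count++;               /* correct: count up, saturating */
        return 1;
    }
    if (s->count > 0)
        s->count--;                   /* wrong: lose some confidence */
    if (s->count == 0)
        s->prediction = actual;       /* no confidence left: switch */
    return 0;
}

/* Run the scheme over a stream of outcomes, starting with count 0. */
unsigned count_mispredictions(const int *outcomes, size_t n,
                              int initial_prediction)
{
    BranchStatus s = { initial_prediction, 0 };
    unsigned wrong = 0;

    for (size_t i = 0; i < n; i++)
        if (!predict_and_update(&s, outcomes[i]))
            wrong++;
    return wrong;
}
```

On the stream s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1 this gives 2 mispredictions, one per isolated s2, because a single s2 only lowers the count and the prediction stays at s1.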
Branching - Branch Target Buffer • Status bits - Count correct predictions • Handles situations where behaviour changes sometimes • Programs which move from one ‘region’ to another .. eg • Image processing code - looking for an orange object • Process background (non-orange) pixels, • finds the orange thing, • counts orange pixels for a while, then • reverts back to background
// search for orange object in row of pixels
for(j=0;j<width;j++) {
    if ( pixel[j].colour != orange )      // s1
        bg_cnt++;
    else {                                // s2
        o_cnt++;
        if ( o_cnt > obj_width ) …        // found it!
    }
}
Branching - Branch Target Buffer • Status bits • Count correct predictions • Handles situations where behaviour changes sometimes • Programs which move from one ‘region’ to another .. • Example: • Image processing program looking for an orange object • Process background (non-orange) pixels, • finds the orange thing, • counts orange pixels for a while, then • reverts back to background • List of taken branches:
Taken branches: s1 s1 s1 s2 s2 s2 … s2 s1 s1 s1 s1
Region:         BG BG BG OR OR OR … OR BG BG BG BG
Prediction:     s1 s1 s1 s1 s1 s2 … s2 s2 s2 s1 s1
Correct:        …
Branching - Branch Target Buffer • Status bits • Count correct predictions • Reasonable compromise behaviour for most situations • Tolerates an occasional ‘error’ branch well • Changes to a new behaviour with a small delay • Typically about 90% correct predictions • BTB with 2k – 4k entries
Speculation & Branching - Summary • Data speculation • Try to bring data ‘closer’ to CPU (ie into cache) before needed • Reduce memory access latency • Techniques • Special ‘touch’ instructions • Advice to processor – fetch if resources available • Software • eg Dummy reference • Instruction (Branch) speculation ..
Speculation & Branching - Summary • Branches are expensive!! • Instruction (Branch) speculation • Execute both branches of a conditional branch • ‘Squash’ (abandon) results from wrong branch when branch condition is eventually evaluated • Compiler can also mark most probable branch • Branch prediction • Simplest rule: take backward branches • Branch Target Buffer • Cache containing most recent branch target • ‘Standard’ cache, except for • Status bits • Introduce hysteresis into behaviour • Only update branch target when it’s definitely the right choice
Superscalar - summary • Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store • Requires complex IFU • Able to issue multiple instructions/cycle (typ 4) • Able to detect hazards (unavailability of operands) • Able to re-order instruction issue • Aim to keep all the FUs busy • Typically, 6-way superscalars can achieve instruction level parallelism of 2-3