
COMPSYS 304


nyoko


Presentation Transcript


  1. COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands

  2. Speculation • High Tech Gambling? • Data Prefetch • Cache instruction • dcbt : data cache block touch (PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …) • Attempts to bring data into cache • so that it will be “close” when needed • Allows SIU to use idle bus bandwidth • if there’s no spare bandwidth, this read can be given low priority • Speculative because • a branch may occur before it’s used • we speculate that this data may be needed
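A minimal sketch of a software prefetch in C, assuming a GCC- or Clang-compatible compiler. `__builtin_prefetch` is a compiler hint that may be lowered to a touch instruction such as dcbt on targets that have one; the function name `sum_with_prefetch` and the distance of 16 elements are illustrative assumptions, not values from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements. Illustrative only, not tuned. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data further ahead will be needed soon. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 1 = low temporal locality.
               This is only a hint: the result is the same if it is ignored. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);
        }
#endif
        total += data[i];
    }
    return total;
}
```

The prefetch never changes the result, which is what makes it safe to issue speculatively: if the loop exits early, the touched data is simply never used.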

  3. Speculation - General • Some functional units almost always idle • Make them do some (possibly useful) work rather than idle • If the speculation was incorrect, results are simply abandoned • No loss in efficiency; chance of a gain • Researchers are actively looking at software prefetch schemes • Fetch data well before it’s needed • Reduce latency when it’s actually needed • Speculative operations have low priority and use idle resources

  4. Branching • Expensive • 2-3 cycles lost in pipeline • All instructions following branch ‘flushed’ • Bandwidth wasted fetching unused instructions • Stall while branch target is fetched • We can speculate about the target of a branch • Terminology • Branch Target : address to which branch jumps • Branch Taken : control transfers to non-sequential address (target) • Branch Not Taken : next instruction is executed

  5. Branching - Prediction • Branches can be • unconditional: branch is always taken, eg call subroutine, return from subroutine • conditional: branch depends on state of computation, eg has loop terminated yet? • Unconditional branches are simple • New instructions are fetched as soon as the branch is recognized • As early in the pipeline as possible • Branch units often placed with fetch & decode stages

  6. Branching - Branch Unit • PowerPC 603 logical layout

  7. Branching - Speculation • We have the following code: • if ( cond ) s1; else s2; • Superscalar machine • Multiple functional units • Start executing both branches (s1 and s2) • Keep idle functional units busy! • One is speculative and will be abandoned • Processor will eventually calculate the branch condition and select which result should be retained (written back) • MIPS R10000 - up to 4 speculative at once

  8. Branching - Speculation • MIPS R10000 - • Up to 4 speculative at once • Instructions are “tagged” with a 4-bit mask • Indicates to which branch instruction it belongs • As soon as condition is determined, mis-predicted instructions are aborted
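The tagging idea above can be sketched in C as follows. This is an illustration under stated assumptions, not the R10000's actual logic: the `InFlight` structure, the `resolve_branch` function, and the choice to clear a bit when its branch was predicted correctly are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight instruction record. */
typedef struct {
    uint8_t branch_mask; /* bit b set: depends on unresolved branch b (0..3) */
    bool    aborted;
} InFlight;

/* Resolve speculative branch b. On a misprediction, abort every instruction
   tagged with b; otherwise clear the tag bit so the instruction no longer
   depends on b. Returns the number of instructions aborted. */
size_t resolve_branch(InFlight *window, size_t n, unsigned b, bool mispredicted)
{
    assert(b < 4);
    uint8_t bit = (uint8_t)(1u << b);
    size_t aborted = 0;
    for (size_t i = 0; i < n; i++) {
        if (window[i].aborted || !(window[i].branch_mask & bit))
            continue;                       /* not affected by branch b */
        if (mispredicted) {
            window[i].aborted = true;       /* squash the speculative work */
            aborted++;
        } else {
            window[i].branch_mask &= (uint8_t)~bit;
        }
    }
    return aborted;
}
```

With one bit per pending branch, a single mask comparison identifies every instruction that must be abandoned, which is why the abort can happen as soon as the condition is known.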

  9. Branching - Prediction • We have a sequence of instructions:
       L1: add       ; some mixture of arithmetic,
           lw        ; load, store, etc, instructions
           sub
           brne L1   ; branch on some condition
       L2: or        ; some more arithmetic,
           st        ; load, store, etc, instructions
     • If you were asked to guess which branch should be preferred, which would you choose: • Next sequential instruction (L2) • Branch target (L1)

  10. Branching - Prediction • Studies show that backward branches are taken most of the time! • Because of loops:
       L1: add       ; any mix of arith,
           lw        ; load, store, etc,
           sub       ; instructions
           brne L1   ; branch back to loop start
       L2: or        ; some more arith,
           st        ; memory, etc instructions

  11. Branching - Prediction Rule • A simple prediction rule: • Take backward branches • works amazingly well! • For a loop with n iterations, this is wrong in only 1 of the n cases (the final exit)! • A system working on this rule alone would • detect the backward branch and • start fetching from the branch target rather than the next instruction
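The rule can be sketched in C as below. The function names and the use of 32-bit addresses are assumptions made for illustration; the loop model assumes a loop-closing branch that is taken on every iteration except the last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Static rule: predict "taken" when the target lies at a lower address
   than the branch itself, ie the branch goes backward. */
bool predict_taken(uint32_t branch_addr, uint32_t target_addr)
{
    return target_addr < branch_addr;
}

/* Count mispredictions for a loop-closing branch executed n times:
   taken on the first n-1 executions, not taken on the final one. */
unsigned loop_mispredictions(uint32_t branch_addr, uint32_t target_addr, unsigned n)
{
    assert(n > 0);
    unsigned wrong = 0;
    for (unsigned i = 0; i < n; i++) {
        bool actually_taken = (i + 1 < n);
        if (predict_taken(branch_addr, target_addr) != actually_taken)
            wrong++;
    }
    return wrong;
}
```

For a backward branch, `loop_mispredictions` returns 1 whatever the value of n, so the error rate is 1/n and shrinks as the loop runs longer.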

  12. Branching - Improving the prediction • Static prediction systems • Compiler can mark branches • Likely to be taken or not • Instruction fetch unit will use the marking as advice on which instruction to fetch • Compiler often able to give the right advice • Loops are easily detected • Other patterns in conditions can be recognized • Checking for EOF when reading a file • Error checking
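One way a programmer can pass this kind of advice to the compiler is the `__builtin_expect` hint available in GCC and Clang. The sketch below marks an error check as unlikely; the `UNLIKELY` macro and the `sum_samples` function are hypothetical names chosen for illustration. Whether the hint ends up as a marking in the instruction or only affects code layout depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)  /* hint: condition is rarely true */
#else
#define UNLIKELY(x) (x)                         /* no hint on other compilers */
#endif

/* Sum non-negative samples; a negative sample is treated as a rare error.
   Returns -1 on the first error, otherwise the sum. */
long sum_samples(const int *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(samples[i] < 0))
            return -1;          /* error path, marked as not likely */
        total += samples[i];
    }
    return total;
}
```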

  13. Branching - Improving the prediction • Dynamic prediction systems • Program history determines most likely branch • Branch Target Buffers - Another cache!

  14. Branching - Branch Target Buffer • Instruction address bits [11:3] select the BTB entry • Tag determines “hit” • Stats select taken/not taken • Pentium 4: >91% prediction accuracy with a 4K-entry BHT (Branch History Table) • G4e: 2K entries

  15. Branching - Branch Target Buffer • BTB – just another cache • Works on temporal locality principle • If this branch is taken (not taken) now, it’s likely to be taken (not taken) next time • Replace on conflicts (newest is best) • Any cache organization could be used • Direct mapped, associative, set-associative • No write-back needed • Flushed entries are restored • Major difference from other caches • Status bits …………

  16. Branching - Branch Target Buffer • Status bits • Provide hysteresis in behaviour • Without hysteresis, behaviour change would cause the prediction to immediately update • Example: • if ( cond ) s1 else s2 • If the program takes branch s1 a few times, the BTB will predict that s1 is more likely than s2 • If s2 is then taken, usual cache behaviour suggests that the prediction should be updated to s2 but • Program branching behaviour is a little different ….

  17. Branching - Branch Target Buffer • Status bits • Common branch behaviour is like this • List of taken branches: s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1 … • Usually s1 is executed, occasionally s2, eg • s2 handles errors • s2 follows a loop • ‘Standard’ cache update policies (assume the most recent will be used next) would update the prediction from s1 to s2 immediately • This would cause many mis-predictions

  18. Branching - Branch Target Buffer • Status bits • However, if the BTB waits until it has seen s2 a number of times before changing the prediction, the previous stream is predicted well • So the status bits (say 2 bits) are a count of the number of correct predictions • A correct prediction increments the count (maybe saturating at 2, ie counts to max 2) • A mis-prediction decrements the count • A mis-prediction when count=0 changes the prediction • This accommodates an occasional break from a pattern (eg s1 is usually taken) without disturbing the best prediction (take s1) • It also handles situations where behaviour changes sometimes
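The counting scheme can be sketched in C as below. It is one reading of the rule, chosen as an assumption: a mis-prediction decrements the count, and the prediction switches once the count is at 0, so a saturated entry needs two consecutive mis-predictions to change. The names `Path`, `Predictor`, `predictor_update` and `count_mispredictions` are hypothetical. Setting `count_max` to 0 removes the hysteresis and gives the ‘standard’ most-recent-wins policy for comparison.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { S1, S2 } Path;

typedef struct {
    Path     prediction;  /* path currently predicted */
    unsigned count;       /* confidence: recent correct predictions */
    unsigned count_max;   /* saturation limit (2 in the slide's example) */
} Predictor;

/* Update the predictor with the path actually taken.
   Returns 1 if the prediction was correct, 0 otherwise. */
int predictor_update(Predictor *p, Path actual)
{
    if (p->prediction == actual) {
        if (p->count < p->count_max)
            p->count++;              /* saturating increment */
        return 1;
    }
    if (p->count > 0)
        p->count--;                  /* lose some confidence */
    if (p->count == 0)
        p->prediction = actual;      /* confidence exhausted: switch */
    return 0;
}

/* Run a stream of outcomes, starting from "predict S1" with count 0. */
size_t count_mispredictions(const Path *stream, size_t n, unsigned count_max)
{
    assert(stream != NULL || n == 0);
    Predictor p = { S1, 0, count_max };
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++)
        if (!predictor_update(&p, stream[i]))
            wrong++;
    return wrong;
}
```

On the stream s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1, this sketch mis-predicts twice with `count_max` = 2 (only the two s2 cases) and four times with `count_max` = 0, because without hysteresis each stray s2 also spoils the prediction for the s1 that follows it.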

  19. Branching - Branch Target Buffer • Status bits - Count correct predictions • Handles situations where behaviour changes sometimes • Programs which move from one ‘region’ to another .. eg • Image processing code - looking for an orange object • Process background (non-orange) pixels, • finds the orange thing, • counts orange pixels for a while, then • reverts back to background
       // search for orange object in row of pixels
       for (j = 0; j < width; j++) {
           if ( pixel[j].colour != orange )   // s1
               bg_cnt++;
           else {                             // s2
               o_cnt++;
               if ( o_cnt > obj_width ) …     // found it!
           }
       }

  20. Branching - Branch Target Buffer • Status bits • Count correct predictions • Handles situations where behaviour changes sometimes • Programs which move from one ‘region’ to another .. • Example: • Image processing program looking for an orange object • Process background (non-orange) pixels, • finds the orange thing, • counts orange pixels for a while, then • reverts back to background • List of taken branches (BG = background, OR = orange; Y/N = prediction correct or not):
       Taken branches: s1 s1 s1 s2 s2 s2 … s2 s1 s1 s1 s1
       Region:         BG BG BG OR OR OR … OR BG BG BG BG
       Prediction:     s1 s1 s1 s1 s1 s2 … s2 s2 s2 s1 s1
       Correct:        Y  Y  Y  N  N  Y  … Y  N  N  Y  Y

  21. Branching - Branch Target Buffer • Status bits • Count correct predictions • Reasonable compromise behaviour for most situations • Tolerates an occasional ‘error’ branch well • Changes to a new behaviour with a small delay • Typically about 90% correct predictions • BTB with 2k – 4k entries

  22. Speculation & Branching - Summary • Data speculation • Try to bring data ‘closer’ to CPU (ie into cache) before needed • Reduce memory access latency • Techniques • Special ‘touch’ instructions • Advice to processor – fetch if resources available • Software • eg Dummy reference • Instruction (Branch) speculation ..

  23. Speculation & Branching - Summary • Branches are expensive!! • Instruction (Branch) speculation • Execute both branches of a conditional branch • ‘Squash’ (abandon) results from wrong branch when branch condition eventually evaluated • Compiler can also mark most probable branch • Branch prediction • Simplest rule: take backward branches • Branch Target Buffer • Cache containing most recent branch target • ‘Standard’ cache, except for • Status bits • Introduce hysteresis into behaviour • Only update branch target when it’s definitely the right choice

  24. Superscalar - summary • Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store • Requires complex IFU • Able to issue multiple instructions/cycle (typically 4) • Able to detect hazards (unavailability of operands) • Able to re-order instruction issue • Aim to keep all the FUs busy • Typically, 6-way superscalars can achieve instruction level parallelism of 2-3
