CSL718 : Pipelined Processors – Improving Branch Performance (contd.), 21st Jan 2006. Anshul Kumar, CSE IITD
Improving Branch Performance
• Branch Elimination – replace the branch with other instructions
• Branch Speed Up – reduce the time for computing the condition code (CC) and for the target instruction fetch (TIF)
• Branch Prediction – guess the outcome and proceed, undo if necessary
• Branch Target Capture – make use of history
Branch Elimination
Use conditional/guarded instructions (predicated execution): the branched flow "if C then S" becomes the guarded statement "C : S".

  With a branch:                 With a guarded instruction:
    OP1                            OP1
    BC   CC = Z, +2                ADD  R3, R2, R1, NZ
    ADD  R3, R2, R1                OP2
    OP2

Examples: HP PA (all integer arithmetic/logical instructions), DEC Alpha and SPARC V9 (conditional move).
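To connect this to source code, here is a minimal C-level sketch (function and variable names are illustrative, not from the slides) of the same idea: a compiler for a machine with conditional-move or predicated instructions can turn the branchy pattern above into branch-free code.

    #include <stdint.h>

    /* Branchy form: the generated code needs a conditional branch
       around the addition (the BC ... / ADD pattern above). */
    int32_t add_if_nonzero_branchy(int32_t cc, int32_t r1, int32_t r2, int32_t r3)
    {
        if (cc != 0)              /* branch skips the add when cc == 0 */
            r3 = r2 + r1;
        return r3;
    }

    /* Branch-free form: a conditional select that maps naturally onto a
       predicated ADD or a conditional-move instruction. */
    int32_t add_if_nonzero_predicated(int32_t cc, int32_t r1, int32_t r2, int32_t r3)
    {
        return (cc != 0) ? (r2 + r1) : r3;   /* no change of control flow */
    }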
Branch Elimination – contd.
[Pipeline timing diagram lost in extraction: it compares the branchy sequence (CC-setting instruction, OP1, BC with its target instruction fetch, then ADD/OP2) against the single conditional ADD, showing the IF, D, AG, DF/TIF and EX stages of each instruction; the guarded ADD removes the branch and its target fetch from the pipeline.]
Improving Branch Performance – contd.: Branch Speed Up (reduce time for computing CC and TIF)
Branch Speed Up : early target address generation
• Assume every instruction is a branch
• Generate the target address while decoding
• If the target is in the same page, omit address translation
• After decoding, discard the target address if the instruction is not a branch
[Timing sketch: for BC, the stages IF IF IF D AG are overlapped with TIF TIF TIF starting immediately after decode.]
Branch Speed Up : increase the CC-to-branch gap
Increase the gap between the instruction that sets the CC and the branch that tests it:
• Early CC setting
• Delayed branch
Summary – Branch Speed Up
Branch penalty (in cycles) as a function of n, the CC-to-branch gap / number of delay slots:

Early CC setting:
               n=0  n=1  n=2  n=3  n=4  n=5
  uncond        4    4    4    4    4    4
  cond (T)      6    5    4    4    4    4
  cond (I)      5    4    3    2    1    0

Delayed branch:
               n=0  n=1  n=2  n=3  n=4  n=5
  uncond        4    3    2    1    0    0
  cond (T)      6    5    4    3    2    1
  cond (I)      5    4    3    2    1    0

(T: branch goes to target, I: branch goes inline)
Delayed Branch with Nullification (also called annulment)
• The delay slot is used optionally
• The branch instruction specifies the option
• The option may be exercised based on the correctness of the branch prediction
• Helps in better utilization of delay slots
Improving Branch Performance – contd.: Branch Prediction (guess the outcome and proceed, undo if necessary)
Branch Prediction
• Treat conditional branches as unconditional branches / NOPs; undo if necessary
Strategies:
• Fixed (always guess inline)
• Static (guess on the basis of instruction type / displacement)
• Dynamic (guess based on recent history)
Static Branch Prediction
[Table of branch statistics by instruction type lost in extraction; only the total, 68.2%, survives.]
Threshold for Static Prediction
Penalty in cycles for each combination of guess and actual outcome:

              actual T   actual I
  guess T         4          5
  guess I         6          0

[The accompanying pipeline timing diagram, from which these penalties are derived, is lost in extraction.]
If p is the probability that the branch is actually taken, guess target when 4p + 5(1 - p) < 6p + 0(1 - p), i.e. when p > 5/7 ≈ 0.71.
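A small numerical check of this threshold, a sketch that simply evaluates the two expected penalties from the table above (the 4/5/6/0 costs are the slide's; everything else is illustrative):

    #include <stdio.h>

    /* Expected penalties as a function of p = Prob(branch taken). */
    static double guess_target_penalty(double p) { return 4.0 * p + 5.0 * (1.0 - p); }
    static double guess_inline_penalty(double p) { return 6.0 * p; }

    int main(void)
    {
        double p_values[] = { 0.5, 5.0 / 7.0, 0.9 };   /* below, at, above the threshold */
        for (int i = 0; i < 3; i++) {
            double p = p_values[i];
            printf("p=%.3f  guess target: %.3f  guess inline: %.3f\n",
                   p, guess_target_penalty(p), guess_inline_penalty(p));
        }
        return 0;   /* the two penalties cross exactly at p = 5/7 */
    }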
Dynamic Branch Prediction – basic idea
Predict based on the history of the branch's previous executions.

  loop:  xxx
         xxx
         xxx
         xxx
         BC loop

With a simple last-outcome (single-bit) predictor, the loop-closing branch suffers 2 mispredictions for every occurrence of the loop: once when the loop exits and once again when it is next entered.
Dynamic Branch Prediction – 2-bit prediction scheme
[State diagram: four states 0–3. States 0 and 1 predict not taken; states 2 and 3 predict taken. A taken branch (T) moves the state towards 3 and a not-taken branch (N) moves it towards 0, so a single anomalous outcome (e.g. a loop exit) does not immediately flip the prediction.]
Dynamic Branch Prediction – second scheme
Predict based on the history of the previous n occurrences of the branch, e.g. with n = 3:
• 3 of the last branches taken → predict taken
• 2 taken → predict taken
• 1 taken → predict not taken
• 0 taken → predict not taken
Dynamic Branch Prediction – bimodal predictor
Maintain saturating counters (states 0–3; T increments towards 3, N decrements towards 0).
• One counter per branch, or
• One counter per cache line – merge results if the line contains multiple branches
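A minimal software sketch of such a bimodal predictor, assuming a direct-mapped table of 2-bit saturating counters indexed by low-order bits of the branch address; the table size and indexing are illustrative choices, not from the slides.

    #include <stdbool.h>
    #include <stdint.h>

    #define BIMODAL_ENTRIES 1024              /* illustrative power-of-two table size */

    static uint8_t counters[BIMODAL_ENTRIES]; /* each entry is a 2-bit counter, 0..3 */

    /* Index the table with low-order bits of the (word-aligned) branch address. */
    static unsigned bimodal_index(uint32_t branch_pc)
    {
        return (branch_pc >> 2) & (BIMODAL_ENTRIES - 1);
    }

    /* States 0,1 predict not taken; states 2,3 predict taken. */
    bool bimodal_predict(uint32_t branch_pc)
    {
        return counters[bimodal_index(branch_pc)] >= 2;
    }

    /* When the branch resolves, move the counter towards 3 (taken)
       or towards 0 (not taken), saturating at both ends. */
    void bimodal_update(uint32_t branch_pc, bool taken)
    {
        uint8_t *c = &counters[bimodal_index(branch_pc)];
        if (taken)  { if (*c < 3) (*c)++; }
        else        { if (*c > 0) (*c)--; }
    }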
Dynamic Branch Prediction – history of the last n occurrences
Each entry records the outcomes of the last three occurrences of this branch (0: not taken, 1: taken), and the prediction is made by majority decision. On each execution the entry is updated as a shift register: e.g. with current entry 1 1 0 and actual outcome "taken", the oldest bit is discarded and the new outcome shifted in, giving the updated entry 1 1 1.
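A sketch of this history-register scheme in C, assuming a 3-bit shift register per entry and a strict-majority vote; lookup of the entry by branch address is omitted for brevity.

    #include <stdbool.h>
    #include <stdint.h>

    #define HISTORY_BITS 3            /* last three outcomes, as in the slide's example */

    /* One history register per branch entry; bit 0 holds the most recent outcome. */
    typedef struct {
        uint8_t history;              /* only the low HISTORY_BITS bits are used */
    } history_entry;

    /* Predict taken if a majority of the recorded outcomes were taken. */
    bool history_predict(const history_entry *e)
    {
        int taken_count = 0;
        for (int i = 0; i < HISTORY_BITS; i++)
            taken_count += (e->history >> i) & 1;
        return taken_count * 2 > HISTORY_BITS;        /* strict majority */
    }

    /* Shift the actual outcome in and drop the oldest bit: if the last three
       outcomes (most recent first) were taken, taken, not-taken and the branch
       is now taken, the recorded history becomes taken, taken, taken. */
    void history_update(history_entry *e, bool taken)
    {
        e->history = ((e->history << 1) | (taken ? 1 : 0)) & ((1u << HISTORY_BITS) - 1);
    }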
Dynamic Branch Prediction – storing the prediction counters
The counters can be stored either in a separate buffer or in the cache directory, one counter held alongside the directory entry for each cache line.
[Figure: correct guesses vs. history length – plot lost in extraction.]
Two-Level Prediction
• Uses two levels of information to make a direction prediction
  • Branch History Table (BHT) – last n occurrences
  • Pattern History Table (PHT) – saturating 2-bit counters
• Captures patterned behavior of branches
  • Groups of branches are correlated
  • Particular branches have particular behavior
Correlation between branches

  B1: if (x) ...
  B2: if (y) ...
      z = x && y
  B3: if (z) ...

B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2.
Some Two-level Predictors
[Block diagrams lost in extraction: in the local predictor, the PC indexes a per-branch BHT whose history bits then index a PHT of 2-bit counters; in the global predictor, a global branch history register (GBHR) indexes the PHT directly. Each PHT entry yields the T/NT prediction.]
Bits from the PC and the history (BHT/GBHR) can be combined to index the PHT.
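A minimal sketch of a global two-level predictor in the spirit of these diagrams: a global branch history register selects a 2-bit counter in the PHT, and XOR-ing PC bits into the index (in the spirit of the last remark about combining PC and history bits) gives a gshare-style variant. History length, table size and the hash are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define GHR_BITS     10                      /* illustrative global history length */
    #define PHT_ENTRIES  (1u << GHR_BITS)

    static uint16_t ghr;                          /* global branch history register */
    static uint8_t  pht[PHT_ENTRIES];             /* 2-bit saturating counters, 0..3 */

    /* gshare-style index: global history XOR-ed with low PC bits. */
    static unsigned pht_index(uint32_t branch_pc)
    {
        return (ghr ^ (branch_pc >> 2)) & (PHT_ENTRIES - 1);
    }

    bool twolevel_predict(uint32_t branch_pc)
    {
        return pht[pht_index(branch_pc)] >= 2;    /* counters 2,3 predict taken */
    }

    void twolevel_update(uint32_t branch_pc, bool taken)
    {
        uint8_t *c = &pht[pht_index(branch_pc)];
        if (taken)  { if (*c < 3) (*c)++; }
        else        { if (*c > 0) (*c)--; }

        /* Shift the resolved outcome into the global history register. */
        ghr = ((ghr << 1) | (taken ? 1 : 0)) & (PHT_ENTRIES - 1);
    }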
Two-level Predictor Classification
• Yeh and Patt's 3-letter naming scheme
  • Type of history collected: G (global), P (per branch), S (per set)
  • PHT type: A (adaptive), S (static)
  • PHT organization: g (global), p (per branch), s (per set)
• Examples: GAs, PAp, etc.
Improving Branch Performance – contd.: Branch Target Capture (make use of history)
Branch Target Capture
• Branch Target Buffer (BTB): each entry holds the branch instruction address, prediction statistics and the target address
• Target Instruction Buffer (TIB): holds the target instruction itself rather than just the target address
• The probability of the target changing is small (< 5%), so a remembered target is usually still valid
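A structural sketch of a direct-mapped BTB along these lines; the entry fields follow the slide (instruction address, prediction statistics, target address), while the table size, tagging and the 2-bit prediction field are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define BTB_ENTRIES 256                  /* illustrative, direct-mapped */

    typedef struct {
        bool     valid;
        uint32_t branch_pc;                  /* instruction address (used as the tag) */
        uint8_t  pred_state;                 /* prediction statistics, e.g. a 2-bit counter */
        uint32_t target_pc;                  /* branch target address */
    } btb_entry;

    static btb_entry btb[BTB_ENTRIES];

    static unsigned btb_index(uint32_t pc) { return (pc >> 2) & (BTB_ENTRIES - 1); }

    /* On a hit for a predicted-taken branch, fetch can redirect to *target
       immediately; otherwise fetch continues inline. */
    bool btb_lookup(uint32_t pc, uint32_t *target)
    {
        const btb_entry *e = &btb[btb_index(pc)];
        if (e->valid && e->branch_pc == pc && e->pred_state >= 2) {
            *target = e->target_pc;
            return true;
        }
        return false;
    }

    /* Fill or update the entry when the branch resolves. */
    void btb_update(uint32_t pc, uint32_t actual_target, bool taken)
    {
        btb_entry *e = &btb[btb_index(pc)];
        if (!e->valid || e->branch_pc != pc) {
            *e = (btb_entry){ .valid = true, .branch_pc = pc,
                              .pred_state = taken ? 2 : 1, .target_pc = actual_target };
            return;
        }
        if (taken) { if (e->pred_state < 3) e->pred_state++; e->target_pc = actual_target; }
        else       { if (e->pred_state > 0) e->pred_state--; }
    }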
BTB Performance
• BTB miss (probability .4) → go inline: if the branch actually goes inline (.8) the delay is 0; if it goes to the target (.2) the delay is 5
• BTB hit (probability .6) → go to target: if the branch actually goes inline (.2) the delay is 4; if it goes to the target (.8) the delay is 0
Expected delay = .4*.8*0 + .4*.2*5 + .6*.2*4 + .6*.8*0 = 0.88 cycles
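The same expected-delay calculation, parameterised so that other hit rates and penalties can be plugged in; the constants below are the slide's numbers.

    #include <stdio.h>

    /* Expected branch delay with a BTB: weighted sum over
       (BTB hit/miss) x (branch actually goes inline / to target). */
    static double btb_expected_delay(double p_hit,
                                     double p_target_given_hit,
                                     double p_target_given_miss)
    {
        const double miss_inline_delay = 0.0, miss_target_delay = 5.0;  /* slide's penalties */
        const double hit_inline_delay  = 4.0, hit_target_delay  = 0.0;
        double p_miss = 1.0 - p_hit;
        return p_miss * (1.0 - p_target_given_miss) * miss_inline_delay
             + p_miss * p_target_given_miss         * miss_target_delay
             + p_hit  * (1.0 - p_target_given_hit)  * hit_inline_delay
             + p_hit  * p_target_given_hit          * hit_target_delay;
    }

    int main(void)
    {
        /* Slide's numbers: 60% BTB hits; the branch goes to the target 80% of
           the time on a hit and 20% of the time on a miss -> 0.88 cycles. */
        printf("expected delay = %.2f cycles\n", btb_expected_delay(0.6, 0.8, 0.2));
        return 0;
    }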
Dynamic information about a branch
• Previous branch decisions → explicit prediction, stored in the cache directory: Branch History Table (BHT)
• Previous target address / instruction → implicit prediction, stored in a separate buffer: Branch Target Buffer (BTB) / Branch Target Address Cache (BTAC), Target Instruction Buffer (TIB) / Branch Target Instruction Cache (BTIC)
These two can be combined.
Storing prediction info
• In the cache: the prediction counter is kept in the cache directory entry for each line
• In a separate buffer: each entry holds the instruction address, prediction statistics and the target
Combined prediction mechanism
• Explicit: use history bits
• Implicit: use BTB hit/miss – hit → go to target, miss → go inline
• Combined: BTB hit/miss followed by explicit prediction using history bits. One of the following is commonly used (both sketched below):
  • hit → go to target, miss → explicit prediction
  • miss → go inline, hit → explicit prediction
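The two combined policies, sketched as plain decision functions (names and types are illustrative); note that on a BTB miss a "target" decision can only take effect once the target address has been computed in decode.

    #include <stdbool.h>

    typedef enum { FETCH_INLINE, FETCH_TARGET } fetch_choice;

    /* Policy 1: on a BTB hit go straight to the target;
       on a miss fall back to the explicit (history-bit) prediction. */
    fetch_choice combined_policy1(bool btb_hit, bool history_predicts_taken)
    {
        if (btb_hit)
            return FETCH_TARGET;
        return history_predicts_taken ? FETCH_TARGET : FETCH_INLINE;
    }

    /* Policy 2: on a BTB miss go inline; on a hit let the history bits decide. */
    fetch_choice combined_policy2(bool btb_hit, bool history_predicts_taken)
    {
        if (!btb_hit)
            return FETCH_INLINE;
        return history_predicts_taken ? FETCH_TARGET : FETCH_INLINE;
    }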
Combined prediction
[Decision tree lost in extraction: it enumerates, for BTB hit and BTB miss followed by the explicit prediction where the policy uses it, the predicted direction against the actual outcome in each case (T: target, I: inline).]
Structure of Tables
Instruction fetch path with:
• BHT
• BTAC
• BTIC
Compute/fetch scheme (no dynamic branch prediction)
[Block diagram lost in extraction: the instruction fetch address register (IFAR) indexes the I-cache, which delivers instructions I, I+1, I+2, I+3; an adder computes the branch target address (BTA) from the fetched instruction while the next sequential address is formed for the following fetch.]
BHT (Branch History Table)
[Block diagram lost in extraction: the instruction fetch address accesses a 16 KB, 4-way set-associative I-cache (128 x 4 lines, 8 instructions per line) and, in parallel, a BHT with 128 x 4 entries of history bits (2 bits per entry). Four instructions are fetched per cycle; the prediction logic uses the history bits to produce a taken / not-taken decision, and the BTA for a taken guess, and feeds the decode and issue queues.]
BTAC scheme
[Block diagram lost in extraction: as in the compute/fetch scheme, but a Branch Target Address Cache is accessed with the fetch address in parallel with the I-cache; on a hit it supplies the branch target address (BTA) directly, instead of waiting for it to be computed.]
BTIC scheme – 1
[Block diagram lost in extraction: a Branch Target Instruction Cache is accessed with the fetch address; on a hit it supplies the target instruction itself, which goes straight to the decoder, together with the address (BTA+) from which fetching continues after the target.]
BTIC scheme – 2
[Block diagram lost in extraction: as in scheme 1, but the BTIC supplies the target instruction and its successor (BTI, BTI+1) while the address of the following instructions is computed; fetching then continues from the computed address.]
Successor index in I-cache
[Block diagram lost in extraction: each I-cache line carries a successor index, and the next fetch address is taken from this field, so the line containing a taken branch's target can be fetched directly without a separate target buffer.]