
CSE 420/598 Computer Architecture Lec 9 – Chapter 2 - Branch Prediction

Sandeep K. S. Gupta, School of Computing and Informatics, Arizona State University. Based on slides by David Patterson, Al Davis, and Luddy Harrison.





  1. CSE 420/598 Computer Architecture Lec 9 – Chapter 2 - Branch Prediction • Sandeep K. S. Gupta, School of Computing and Informatics, Arizona State University • Based on Slides by David Patterson, Al Davis, and Luddy Harrison

  2. Agenda • Dynamic Branch Prediction • 1-Bit Predictor • 2-Bit Predictor • Correlating Predictor • Tournament Predictor • Programming Assignment 1: Case Study 2 on pg 149 – Modeling a Branch Predictor in C or JAVA. CSE420/598

  3. Need for Better than Static Branch Prediction Techniques [Figure: charts labeled Integer and Floating Point]

  4. Dynamic Branch Prediction • Why does prediction work? • Underlying algorithm has regularities • Data that is being operated on has regularities • Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems • Is dynamic branch prediction better than static branch prediction? • Seems to be • There are a small number of important branches in programs which have dynamic behavior

  5. Control Hazard (Recap) • In the 5-stage in-order processor: assume always taken or assume always not taken; if the branch goes the other way, squash mis-fetched instructions • Modern out-of-order processors: dynamic branch prediction • Branch predictor: a cache of recent branch outcomes

  6. Pipeline without Branch Predictor [Figure: pipeline diagram with labels PC, IF (br), Reg Read, Compare, Br-target, PC + 4] • In the 5-stage pipeline, a branch completes in two cycles • If the branch went the wrong way, one incorrect instr is fetched • One stall cycle per incorrect branch

  7. Pipeline with Branch Predictor [Figure: pipeline diagram with labels PC, IF (br), Reg Read, Compare, Br-target, Branch Predictor] • In the 5-stage pipeline, a branch completes in two cycles • If the branch went the wrong way, one incorrect instr is fetched • One stall cycle per incorrect branch

  8. Branch Mispredict Penalty • Performance = ƒ(accuracy, cost of misprediction) • Assume: no data or structural hazards; only control hazards; every 5th instruction is a branch; branch predictor accuracy is 90% • Slowdown = 1 / (1 + stalls per instruction) • Stalls per instruction = % branches x % mispredictions x penalty = 20% x 10% x 1 = 0.02 • Slowdown = 1/1.02; if penalty = 20, stalls per instruction = 20% x 10% x 20 = 0.4 and slowdown = 1/1.4
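The arithmetic on this slide can be checked with a short sketch. The function names are hypothetical, and the code assumes an ideal CPI of 1, matching the slide's "no data or structural hazards" assumption.

```python
def stalls_per_instruction(branch_fraction, mispredict_rate, penalty_cycles):
    """Average stall cycles added to each instruction by mispredicted branches."""
    return branch_fraction * mispredict_rate * penalty_cycles


def slowdown(stalls):
    """Relative performance versus an ideal pipeline with CPI = 1 (assumed)."""
    return 1.0 / (1.0 + stalls)


# Slide's numbers: every 5th instruction is a branch, 90% predictor accuracy.
short_penalty = stalls_per_instruction(0.20, 0.10, 1)    # about 0.02
long_penalty = stalls_per_instruction(0.20, 0.10, 20)    # about 0.4

print(round(slowdown(short_penalty), 4))   # 1/1.02
print(round(slowdown(long_penalty), 4))    # 1/1.4
```

The same 90% accuracy costs roughly 2% of performance with a 1-cycle penalty but close to 30% with a 20-cycle penalty, which is why deep pipelines need better predictors.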

  9. Dynamic Branch Prediction – 1 Bit Prediction • Branch History Table (BHT): Lower bits of PC address index table of 1-bit values • Says whether or not branch taken last time • No address check • For each branch, keep track of what happened last time and use that outcome as the prediction

  10. 1-bit BHT a.k.a. Branch Prediction Buffer (BPB) • Predict: if BPB entry is 0, fetch PC+1; if BPB entry is 1, fetch L • Update: if branch is taken, BPB := 1; if branch is not taken, BPB := 0
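A minimal sketch of the predict/update rules, assuming a table indexed by the low bits of the PC with no tag (the "no address check" from the previous slide). The class name and the default of 10 index bits are illustrative assumptions, not values from the slides.

```python
class OneBitBPB:
    """1-bit branch prediction buffer: one bit per entry, indexed by low PC bits."""

    def __init__(self, index_bits=10):        # table size is an assumption
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)   # 0 = not taken, 1 = taken

    def predict(self, pc):
        # Entry 1 means "fetch the branch target", entry 0 means "fall through".
        return self.table[pc & self.mask] == 1

    def update(self, pc, taken):
        # Remember only the most recent outcome for this entry.
        self.table[pc & self.mask] = 1 if taken else 0


bpb = OneBitBPB()
bpb.update(0x004, taken=True)
print(bpb.predict(0x004))   # same branch: predicted taken
print(bpb.predict(0x404))   # different branch, same low 10 bits: also predicted taken
```

Because there is no address check, two branches whose low PC bits match share one entry and can disturb each other's predictions.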

  11. State Diagram of 1-bit Predictor

  12. Twice Mispredicted Loop Branches • M: ADD R1, R2, R3; L: ADD R4, R5, R6; MUL R7, R8, R9; SUB R11, R11, #1; BNE L; SUB R10, R10, #1; BNE M

  13. Sequence of Predictions

  14. Problem with 1-bit BHT • What are the prediction accuracies for branches 1 and 2? while (1) { for (i=0; i<10; i++) { branch-1 … } for (j=0; j<20; j++) { branch-2 … } } • Problem: in a loop, 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit): • End of loop case, when it exits instead of looping as before • First time through loop on the next time through the code, when it predicts exit instead of looping
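The two-mispredictions-per-loop behavior can be reproduced with a small simulation. This sketch makes one modeling assumption that the slide does not spell out: each branch is treated as a loop-closing branch that is taken on every iteration except the last. The function names are hypothetical.

```python
def loop_branch_outcomes(iterations, runs):
    """Outcomes of a loop-closing branch: taken until the final iteration exits."""
    return ([True] * (iterations - 1) + [False]) * runs


def one_bit_accuracy(outcomes, initial=False):
    prediction = initial              # the single history bit
    correct = 0
    for taken in outcomes:
        correct += (prediction == taken)
        prediction = taken            # remember only the last outcome
    return correct / len(outcomes)


# The enclosing while(1) re-enters each loop many times; 100 re-entries here.
acc_10 = one_bit_accuracy(loop_branch_outcomes(10, 100))
acc_20 = one_bit_accuracy(loop_branch_outcomes(20, 100))
print(acc_10, acc_20)
```

Under this model the 10-iteration loop mispredicts 2 of every 10 branches (80% accuracy) and the 20-iteration loop 2 of every 20 (90%).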

  15. 2-Bit Prediction • For each branch, maintain a 2-bit saturating counter: • if the branch is taken: counter = min(3,counter+1) • if the branch is not taken: counter = max(0,counter-1) • If (counter >= 2), predict taken, else predict not taken • Advantage: a few atypical branches will not influence the prediction (a better measure of “the common case”) • Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor) • Can be easily extended to N bits (in most processors, N=2)
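The counter rules above translate directly into code. This is a sketch for a single counter; the starting value of 2 (weakly taken) and the loop-closing branch pattern are assumptions chosen for illustration.

```python
def two_bit_accuracy(outcomes, counter=2):
    """Accuracy of one 2-bit saturating counter on a sequence of branch outcomes."""
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        correct += (predicted_taken == taken)
        # Saturating update: stay within 0..3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)


# Loop-closing branch of a 10-iteration loop, re-entered 100 times (assumed model).
loop = ([True] * 9 + [False]) * 100
print(two_bit_accuracy(loop))
```

The single not-taken outcome at loop exit only moves the counter from 3 to 2, so the first iteration of the next loop execution is still predicted taken and each loop execution costs one misprediction.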

  16. Dynamic Branch Prediction • Solution: 2-bit scheme where change prediction only if get misprediction twice in a row • Red: stop, not taken • Green: go, taken • Adds hysteresis to decision making process [Figure: state diagram with two Predict Taken states and two Predict Not Taken states; transitions labeled T and NT]

  17. Bimodal Predictor • 14 bits of the branch PC index a table of 16K entries of 2-bit saturating counters

  18. BHT Accuracy • Mispredict because either: • Wrong guess for that branch • Got branch history of wrong branch when indexing the table • 4096-entry table [Figure: chart with Integer and Floating Point benchmark groups]

  19. Correlating Predictors • Basic branch prediction: maintain a 2-bit saturating counter for each entry (or use 10 branch PC bits to index into one of 1024 counters) – captures the recent “common case” for each branch • Can we take advantage of additional information? • If a branch recently went 01111, expect 0; if it recently went 11101, expect 1; can we have a separate counter for each case? • If the previous branches went 01, expect 0; if the previous branches went 11, expect 1; can we have a separate counter for each case? • Hence, build correlating predictors

  20. Local/Global Predictors • Instead of maintaining a counter for each branch to capture the common case, • Maintain a counter for each branch and surrounding pattern • If the surrounding pattern belongs to the branch being predicted, the predictor is referred to as a local predictor • If the surrounding pattern includes neighboring branches, the predictor is referred to as a global predictor

  21. Global Predictor • A single register that keeps track of recent history for all branches • Also referred to as a two-level predictor [Figure: history register 00110101; labels 8 bits, 6 bits, Branch PC; table of 16K entries of 2-bit saturating counters]
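A sketch of this two-level global predictor. It assumes the 8 bits come from the global history register and the 6 bits from the branch PC, concatenated into a 14-bit index; the flattened figure does not make the assignment certain. The class name and the initial counter value of 1 are also assumptions.

```python
class GlobalPredictor:
    """Global history concatenated with PC bits indexes 2-bit saturating counters."""

    def __init__(self, history_bits=8, pc_bits=6):
        self.history_bits = history_bits
        self.history_mask = (1 << history_bits) - 1
        self.pc_mask = (1 << pc_bits) - 1
        self.history = 0                                    # shared by all branches
        self.counters = [1] * (1 << (history_bits + pc_bits))

    def _index(self, pc):
        return ((pc & self.pc_mask) << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


gp = GlobalPredictor()
late_misses = 0
for step in range(200):
    taken = (step % 2 == 0)          # strictly alternating branch
    if step >= 100 and gp.predict(0x2C) != taken:
        late_misses += 1
    gp.update(0x2C, taken)
print(len(gp.counters), late_misses)
```

A lone 2-bit counter handles a strictly alternating branch poorly, while the history-indexed table learns the pattern after a short warm-up because each history value gets its own counter.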

  22. Local Predictor • Also a two-level predictor that only uses local histories at the first level • Use 6 bits of branch PC to index into local history table: 64 entries of 14-bit histories, each for a single branch (e.g., 10110111011001) • 14-bit history indexes into next level: table of 16K entries of 2-bit saturating counters
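The two levels of the local predictor can be sketched as follows, using the slide's sizes (6 PC bits, 14-bit histories, 16K counters). The class name, the initial counter value of 1, and the test pattern are illustrative assumptions.

```python
class LocalPredictor:
    """Level 1: per-branch history table. Level 2: counters indexed by that history."""

    def __init__(self, pc_bits=6, history_bits=14):
        self.pc_mask = (1 << pc_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)        # 64 local histories
        self.counters = [1] * (1 << history_bits)    # 16K 2-bit counters

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = pc & self.pc_mask
        history = self.histories[slot]
        if taken:
            self.counters[history] = min(3, self.counters[history] + 1)
        else:
            self.counters[history] = max(0, self.counters[history] - 1)
        # Only this branch's own history is updated.
        self.histories[slot] = ((history << 1) | int(taken)) & self.history_mask


lp = LocalPredictor()
late_misses = 0
for step in range(400):
    taken = (step % 4 != 3)          # taken, taken, taken, not taken, repeating
    if step >= 200 and lp.predict(0x10) != taken:
        late_misses += 1
    lp.update(0x10, taken)
print(len(lp.histories), len(lp.counters), late_misses)
```

Once the 14-bit history has filled, every history value for this repeating pattern is always followed by the same outcome, so the predictor stops mispredicting, including at the loop exit.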

  23. Correlated Branch Prediction • Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table • In general, (m,n) predictor means record last m branches to select between 2^m history tables, each with n-bit counters • Thus, old 2-bit BHT is a (0,2) predictor • Global Branch History: m-bit shift register keeping T/NT status of last m branches • Each entry in table has 2^m n-bit predictors
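A quick way to compare (m,n) predictors of equal "size" is to count storage bits. This sketch assumes one n-bit counter per history pattern in each entry (that is, 2^m counters per entry, consistent with the (2,2) slide's four 2-bit predictors per branch); the function name is hypothetical.

```python
def mn_storage_bits(m, n, entries):
    """Total counter storage of an (m, n) predictor with `entries` PC-indexed entries."""
    counters_per_entry = 2 ** m      # one counter for each m-bit history pattern
    return counters_per_entry * n * entries


plain_bht = mn_storage_bits(0, 2, 4096)     # 4096-entry 2-bit BHT, a (0,2) predictor
correlating = mn_storage_bits(2, 2, 1024)   # 1024-entry (2,2) predictor
print(plain_bht, correlating)
```

Both configurations use 8192 bits, so a comparison between a 4096-entry 2-bit BHT and a 1024-entry (2,2) predictor is a comparison at equal storage.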

  24. Correlating Branches • (2,2) predictor: behavior of recent branches selects between four predictions of next branch, updating just that prediction [Figure: branch address and 2-bit global branch history select one of 4 2-bit predictors per branch to produce the prediction]

  25. Accuracy of Different Schemes [Figure: bar chart of frequency of mispredictions (0% to 20%) on nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, and li for three schemes: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2)]

  26. Tournament Predictors • A local predictor might work well for some branches or programs, while a global predictor might work well for others • Provide one of each and maintain another predictor to identify which predictor is best for each branch [Figure: Branch PC indexes the tournament predictor, a table of 2-bit saturating counters, which controls a MUX that selects between the Local Predictor and the Global Predictor]
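The selection logic can be sketched in isolation from the two component predictors. The update policy shown here (move the counter toward whichever component was right, and only when the two disagree) is a common choice and an assumption; the slides do not specify it. The class name, table size, and initial value are also illustrative.

```python
class TournamentChooser:
    """Table of 2-bit counters, indexed by branch PC, that picks local or global."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.table = [2] * (1 << index_bits)    # >= 2 means "trust global" (assumed)

    def select(self, pc, local_pred, global_pred):
        use_global = self.table[pc & self.mask] >= 2
        return global_pred if use_global else local_pred

    def update(self, pc, local_pred, global_pred, taken):
        if local_pred == global_pred:
            return                               # no information about which is better
        i = pc & self.mask
        if global_pred == taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


chooser = TournamentChooser()
first = chooser.select(0x40, local_pred=False, global_pred=True)
chooser.update(0x40, local_pred=False, global_pred=True, taken=False)
second = chooser.select(0x40, local_pred=False, global_pred=True)
print(first, second)
```

In the usage above the global component is wrong once for this branch, so the chooser switches to the local prediction for the next lookup.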

  27. Global Predictor – Example • What is the total capacity of this branch predictor? • A single register that keeps track of recent history for all branches • Also referred to as a two-level predictor [Figure: history register 00110101; labels 10 bits, 4 bits, Branch PC; table of 2-bit saturating counters]

  28. Local Predictor – Example • What is the total capacity of this branch predictor? • Use 8 bits of branch PC to index into local history table: table of 8-bit histories, each for a single branch (e.g., 10110111) • The history indexes a table of 2-bit saturating counters

  29. Example • Consider the following tournament branch predictor: Fourteen bits of the PC are used to index into a table of 3-bit saturating counters that predict whether we should use a local or global prediction. The global predictor concatenates 8 bits of branch PC and 6 bits of global history to index into 2-bit saturating counters. The local predictor uses 8 bits of branch PC to select an 8-bit local history that then indexes into a table of 2-bit saturating counters. What is the capacity of each structure in this branch predictor?
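One way to work this example is to count, for each table, 2^(index bits) entries times the bits per entry. The sketch below does that; the helper name and dictionary keys are hypothetical, and the single global history register (6 bits) is left out of the totals.

```python
def table_bits(index_bits, bits_per_entry):
    """Storage of a table with 2**index_bits entries."""
    return (1 << index_bits) * bits_per_entry


capacity = {
    "selector (14-bit index, 3-bit counters)": table_bits(14, 3),
    "global counters (8 PC + 6 history bits, 2-bit)": table_bits(8 + 6, 2),
    "local histories (8-bit index, 8-bit entries)": table_bits(8, 8),
    "local counters (8-bit history index, 2-bit)": table_bits(8, 2),
}

for name, bits in capacity.items():
    print(f"{name}: {bits} bits ({bits / 1024:g} Kb)")
print("total:", sum(capacity.values()), "bits")
```

Under these assumptions the selector is 48 Kb, the global counters 32 Kb, the local history table 2 Kb, and the local counter table 512 bits.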

  30. Tournament Predictors • Multilevel branch predictor • Use n-bit saturating counter to choose between predictors • Usual choice between global and local predictors

  31. Tournament Predictors • Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between: • Global predictor • 4K entries indexed by history of last 12 branches (2^12 = 4K) • Each entry is a standard 2-bit predictor • Local predictor • Local history table: 1024 10-bit entries recording last 10 branches, indexed by branch address • The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters

  32. Comparing Predictors (Fig. 2.8) • Advantage of tournament predictor is ability to select the right predictor for a particular branch • Particularly crucial for integer benchmarks. • A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks

  33. Pentium 4 Misprediction Rate (per 1000 instructions, not per branch) • SPECint: 6% misprediction rate per branch (19% of INT instructions are branches) • SPECfp: 2% misprediction rate per branch (5% of FP instructions are branches) [Figure: chart with SPECint2000 and SPECfp2000 benchmark groups]
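The per-branch rates and the per-1000-instruction rates are related by the branch frequency. The sketch below converts the averages quoted on this slide; it does not reproduce the per-benchmark values from the chart, and the function name is hypothetical.

```python
def mispredictions_per_1000(branch_fraction, mispredict_rate_per_branch):
    """Expected mispredictions in 1000 instructions."""
    return 1000 * branch_fraction * mispredict_rate_per_branch


specint = mispredictions_per_1000(0.19, 0.06)   # about 11.4
specfp = mispredictions_per_1000(0.05, 0.02)    # about 1.0
print(round(specint, 1), round(specfp, 1))
```

The integer codes end up with roughly eleven times as many mispredictions per instruction, because they have both more branches and a higher per-branch misprediction rate.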

  34. Branch Target Prediction • In addition to predicting the branch direction, we must also predict the branch target address • Branch PC indexes into a predictor table; indirect branches might be problematic • Most common indirect branch: return from a procedure – can be easily handled with a stack of return addresses
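The stack of return addresses mentioned in the last bullet can be sketched as below. The class name, the depth of 16, and the overflow policy (discard the oldest entry) are assumptions for illustration.

```python
class ReturnAddressStack:
    """Predicts return targets by pairing each return with the most recent call."""

    def __init__(self, depth=16):            # depth is an assumed parameter
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.depth:
            self.stack.pop(0)                # overflow: drop the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        # An empty stack yields no prediction.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x1004)            # main calls f; f should return to 0x1004
ras.on_call(0x2008)            # f calls g; g should return to 0x2008
inner = ras.predict_return()   # return from g
outer = ras.predict_return()   # return from f
print(hex(inner), hex(outer))
```

Returns are hard for a PC-indexed target table because the same return instruction jumps to a different address for each call site, while the stack tracks call nesting directly.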

  35. Summary • When comparing branch predictors, ensure that they are of the same “size” • Correlating predictors predict branch direction based on the behavior of neighboring branches • Tournament predictors select between global and local predictors • Integer benchmarks benefit greatly from global and correlating predictors • Next class: BTB, Dynamic Scheduling of Instructions
