
Lecture 21: Instruction Level Parallelism (Branch Prediction)

Computer Engineering 585, Fall 2001

Presentation Transcript


  1. Lecture 21: Instruction Level Parallelism (Branch Prediction) Computer Engineering 585 Fall 2001

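  2. Branch Prediction Buffer (BPB) [Figure: the PC indexes both the I-Cache and a k-entry Branch Prediction Buffer. The BPB index is log k bits of the PC, selecting one of the entries A0 … A(k-1), each holding a single prediction bit (e.g., A0 = 0, A1 = 1, A2 = 1, A(k-1) = 0). Pipeline stages: IF, ID, EX, M, WB. Slide note: associative lookup is expensive!]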
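The slide's point is that the buffer is indexed directly by log k bits of the PC rather than searched associatively. A minimal sketch of that index computation, assuming 4-byte-aligned instructions (so the two lowest PC bits are skipped) and a power-of-two k; `bpb_index` is a hypothetical helper name:

```python
def bpb_index(pc: int, k: int) -> int:
    """Select one of k BPB entries from log2(k) PC bits (no associative search).

    Assumes 4-byte instructions, so PC bits 1..0 are always zero and are skipped.
    """
    if k <= 0 or k & (k - 1):
        raise ValueError("k must be a power of two")
    return (pc >> 2) & (k - 1)  # keep the low log2(k) bits of the word address


# With k = 1024 entries, PC bits 11..2 form the index.
for pc in (0x400, 0x404, 0x1400):
    print(hex(pc), bpb_index(pc, 1024))
```

With k = 1024, the PCs 0x400 and 0x1400 differ only above bit 11, so both land in entry 256: the buffer cannot tell them apart, which is the price of skipping the associative search.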

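  3. Branch Target Buffer (BTB) • The PC of the instruction to fetch is looked up in the BTB; each entry holds the PC of a branch and its predicted PC, and the number of entries is the size of the branch-target buffer • The fetch PC is compared (=) with the stored branch PCs; an entry may also record whether the branch is predicted taken or untaken • No match: the instruction is not predicted to be a branch; proceed normally • Match: the instruction is a branch, and the predicted PC should be used as the next PC • FIGURE 4.22 A branch-target buffer.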
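Unlike the BPB, the BTB compares the fetch PC against stored branch PCs, so a hit also says the instruction is a branch. A toy sketch of the lookup, in which a Python dict keyed by the full PC stands in for the hardware comparison; the class and method names are hypothetical, and 4-byte instructions (fall-through PC = PC + 4) are an assumption:

```python
class BranchTargetBuffer:
    """Toy BTB mapping the PC of a branch to its predicted next PC.

    A dict keyed by the full PC stands in for the hardware compare (=).
    Assumes 4-byte instructions, so the fall-through PC is pc + 4.
    """

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}

    def next_fetch_pc(self, pc: int) -> tuple[int, bool]:
        """Return (next PC to fetch, hit?) for the instruction at pc."""
        if pc in self.entries:
            # Match: the instruction is a branch; use the predicted PC.
            return self.entries[pc], True
        # No match: not predicted to be a branch; proceed normally.
        return pc + 4, False


btb = BranchTargetBuffer()
btb.entries[0x100] = 0x040           # hypothetical branch at 0x100 predicted to go to 0x040
print(btb.next_fetch_pc(0x100))      # (64, True)
print(btb.next_fetch_pc(0x104))      # (264, False)
```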

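  4. Branch Prediction Steps • IF: send the PC to memory and to the branch-target buffer; is an entry found in the BTB? • Entry found: send out the predicted PC. In ID, check whether the branch is taken. If taken, the branch was predicted correctly; continue execution with no stalls. If not taken, the branch was mispredicted; kill the fetched instruction, restart fetch at the other target, and delete the entry from the BTB • No entry found: in ID, check whether the instruction is a taken branch. If not, proceed with normal instruction execution. If so, enter the branch address and next PC into the BTB [Flowchart stages: IF, ID, EX]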
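The flowchart can be read as a small update on the BTB's contents. A sketch under stated assumptions: the BTB is a dict from branch PC to target, only taken branches are entered (as the flowchart implies), a taken branch found in the BTB is assumed to still have the stored target, and the function name `btb_step` and its return strings are hypothetical:

```python
def btb_step(btb: dict[int, int], pc: int, is_branch: bool, taken: bool, target: int) -> str:
    """Walk one instruction through the BTB flowchart, updating btb in place.

    Assumes a taken branch found in the BTB still has the stored target.
    """
    if pc in btb:
        # IF: entry found, so the predicted PC was sent out; ID checks the branch.
        if taken:
            return "predicted correctly; no stalls"
        # Mispredicted: kill the fetched instruction, restart fetch at the
        # other target, and delete the entry from the BTB.
        del btb[pc]
        return "mispredicted; entry deleted"
    if is_branch and taken:
        # Not in the BTB but a taken branch: enter branch address and next PC.
        btb[pc] = target
        return "taken branch entered into BTB"
    return "normal instruction execution"


btb: dict[int, int] = {}
for taken in (True, True, False):    # hypothetical branch at 0x100 targeting 0x040
    print(btb_step(btb, 0x100, is_branch=True, taken=taken, target=0x040), btb)
```

The three calls walk the three branch paths in order: the first execution enters the branch, the second is predicted correctly, and the third (not taken) mispredicts and removes the entry.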

  5. Dynamic Branch Prediction • Performance = ƒ(accuracy, cost of misprediction) • A Branch History Table (BHT) is the simplest scheme • The lower bits of the PC index a table of 1-bit values • Each value says whether or not the branch was taken last time • No address check, so branches that share index bits share an entry • Problem: in a loop, a 1-bit BHT causes two mispredictions per execution of the loop (average is 9 iterations before exit): • At the end of the loop, when it exits instead of looping as before • On the first iteration the next time through the code, when it predicts exit instead of looping
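A minimal sketch of the 1-bit BHT, assuming 4-byte instructions, a power-of-two table size, and all entries initialized to not taken (none of these details come from the slide); the class name `OneBitBHT` and the loop-branch address are hypothetical. Running a 10-iteration loop three times shows the two mispredictions per loop execution described above:

```python
class OneBitBHT:
    """1-bit branch history table indexed by the low bits of the PC.

    Assumptions: 4-byte instructions, power-of-two size, entries start as not taken.
    """

    def __init__(self, entries: int = 4096):
        self.mask = entries - 1
        self.taken_last_time = [False] * entries

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # no address check: only low PC bits are used

    def predict(self, pc: int) -> bool:
        return self.taken_last_time[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        self.taken_last_time[self._index(pc)] = taken  # remember only the last outcome


bht = OneBitBHT()
LOOP_BRANCH_PC = 0x200          # hypothetical address of the loop-closing branch
misses_per_loop = []
for _ in range(3):              # execute the 10-iteration loop three times
    misses = 0
    for i in range(10):
        taken = i < 9           # taken 9 times, then not taken on exit
        misses += bht.predict(LOOP_BRANCH_PC) != taken
        bht.update(LOOP_BRANCH_PC, taken)
    misses_per_loop.append(misses)
print(misses_per_loop)          # [2, 2, 2]: first and last iteration each time
```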

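  6. 1-Bit Prediction Drawbacks • LOOP: Inst 1, Inst 2, Inst 3, …, Inst k, Branch; the loop runs 10 iterations and is itself inside an outer loop • The loop branch is taken 9 times and not taken 1 time • 1-bit prediction mispredicts twice per execution of the loop: 2/10 = 20% misprediction rate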

  7. Dynamic Branch Prediction • Solution: a 2-bit scheme that changes its prediction only after two mispredictions (Figure 4.13, p. 264) [Figure: state diagram with four states, Predict taken (11), Predict taken (10), Predict not taken (01), and Predict not taken (00), with transitions labeled Taken and Not taken]
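One common way to build the 2-bit scheme is a saturating counter, where 11 and 10 predict taken and 01 and 00 predict not taken. This sketch uses that variant; the exact arcs out of the weak states in Figure 4.13 may be wired differently, and starting at 00 is an assumption. On the same 10-iteration loop it mispredicts once per loop execution after warm-up (10%) instead of twice (20%):

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0..3 (00..11); 10 and 11 predict taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 00 and 11.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


counter = TwoBitCounter()
misses_per_loop = []
for _ in range(3):              # execute the 10-iteration loop three times
    misses = 0
    for i in range(10):
        taken = i < 9           # taken 9 times, then not taken on exit
        misses += counter.predict() != taken
        counter.update(taken)
    misses_per_loop.append(misses)
print(misses_per_loop)          # [3, 1, 1]: after warm-up only the loop exit misses
```

The single not-taken outcome at loop exit only moves the counter from 11 to 10, which still predicts taken, so the first iteration of the next pass is no longer mispredicted.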

  8. BHT Accuracy • Mispredictions occur because either: • The guess for that branch was wrong • The table returned the history of the wrong branch when indexed • With a 4096-entry table, misprediction rates vary from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12% • 4096 entries are about as good as an infinite table (in Alpha 21164)
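The second cause (using another branch's history) arises because the table has no address check, so branches whose index bits match share one counter. A deliberately contrived sketch with two hypothetical branches, A at 0x1000 (always taken) and B at 0x2000 (never taken), that collide in a 1024-entry table but not in a 4096-entry one; 4-byte instructions, 2-bit saturating counters, and a 01 starting state are assumptions:

```python
def count_misses(entries: int, trace: list[tuple[int, bool]]) -> int:
    """Mispredictions of a BHT of 2-bit saturating counters indexed by low PC bits.

    Assumptions: 4-byte instructions, power-of-two `entries`, counters start at 01.
    """
    table = [1] * entries
    misses = 0
    for pc, taken in trace:
        i = (pc >> 2) & (entries - 1)  # no address check: colliding branches share a counter
        if (table[i] >= 2) != taken:
            misses += 1
        table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)
    return misses


# Hypothetical branches: A at 0x1000 is always taken, B at 0x2000 never is.
trace = [(0x1000, True), (0x2000, False)] * 100
print(count_misses(1024, trace))  # 200: both map to entry 0, so every branch mispredicts
print(count_misses(4096, trace))  # 1: separate entries, only A's first execution misses
```

This is a worst case built to make the collision visible; the SPEC89 measurements on the following slides show that, for those programs, growing the table past 4096 entries changes little.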

  9. 4096-Entry 2-bit Prediction Accuracy • Frequency of mispredictions on the SPEC89 benchmarks: nasa7 1%, matrix300 0%, tomcatv 1%, doduc 5%, spice 9%, fpppp 9%, gcc 12%, espresso 5%, eqntott 18%, li 10% • FIGURE 4.14 Prediction accuracy of a 4096-entry two-bit prediction buffer for the SPEC89 benchmarks.

  10. 4096 Entries vs. Infinite Table, 2-bit Prediction • Frequency of mispredictions on the SPEC89 benchmarks, 4096 entries vs. unlimited entries (2 bits per entry in both cases): nasa7 1% vs. 0%, matrix300 0% vs. 0%, tomcatv 1% vs. 0%, doduc 5% vs. 5%, spice 9% vs. 9%, fpppp 9% vs. 9%, gcc 12% vs. 11%, espresso 5% vs. 5%, eqntott 18% vs. 18%, li 10% vs. 10%

  11. Correlating Branches • Hypothesis: recent branches are correlated; that is, the behavior of recently executed branches affects the prediction of the current branch. • Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table. • In general, an (m,n) predictor records the last m branches to select among 2^m history tables, each with n-bit counters. • The old 2-bit BHT is then a (0,2) predictor
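A sketch of a general (m,n) predictor, written so that (0,2) reduces to the plain 2-bit BHT. Assumptions not taken from the slides: 4-byte instructions, 1024 rows indexed by PC bits, counters and history starting at 0, and hypothetical PCs for branches b1 and b2 of the d example on the following slides. On that alternating pattern the (0,2) predictor misses every taken branch, while the (2,2) predictor learns the correlation after a short warm-up:

```python
class CorrelatingPredictor:
    """(m, n) predictor: the last m branch outcomes (global history) select one of
    2**m n-bit saturating counters in the row chosen by the branch address.

    Assumptions: 4-byte instructions, `entries` rows, counters and history start at 0.
    """

    def __init__(self, m: int, n: int, entries: int = 1024):
        self.m, self.n, self.entries = m, n, entries
        self.max_count = (1 << n) - 1
        self.history = 0                                 # last m outcomes, newest in bit 0
        self.counters = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc: int) -> list[int]:
        return self.counters[(pc >> 2) % self.entries]

    def predict(self, pc: int) -> bool:
        # Upper half of the counter range predicts taken (for n = 2: states 10 and 11).
        return self._row(pc)[self.history] > self.max_count // 2

    def update(self, pc: int, taken: bool) -> None:
        row, h = self._row(pc), self.history
        row[h] = min(row[h] + 1, self.max_count) if taken else max(row[h] - 1, 0)
        # Shift the outcome into the m-bit global history (stays 0 when m == 0).
        self.history = ((h << 1) | int(taken)) & ((1 << self.m) - 1)


def count_misses(p: CorrelatingPredictor, trace: list[tuple[int, bool]]) -> int:
    misses = 0
    for pc, taken in trace:
        misses += p.predict(pc) != taken
        p.update(pc, taken)
    return misses


B1_PC, B2_PC = 0x100, 0x10C                 # hypothetical addresses of b1 and b2
trace = []
for d in [2, 0] * 10:                       # d alternates between 2 and 0
    trace.append((B1_PC, d != 0))           # b1: taken when d != 0
    trace.append((B2_PC, d not in (0, 1)))  # b2: d becomes 1 if it was 0; taken when d != 1
print(count_misses(CorrelatingPredictor(0, 2), trace))  # 20: plain 2-bit BHT
print(count_misses(CorrelatingPredictor(2, 2), trace))  # 4: only during warm-up
```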

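  12. Correlating Branches • Example C code:
      if (d==0)
          d=1;
      if (d==1)
          ……
      • The same code in DLX, with d in R1:
          BNEZ  R1, L1      ; branch b1 (d != 0)
          ADDI  R1, R0, #1  ; d == 0, so d = 1
      L1: SUBUI R3, R1, #1
          BNEZ  R3, L2      ; branch b2 (d != 1)
          ……
      L2:
      • Are b1 and b2 correlated? If b1 is not taken, then b2 is not taken.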
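The same code in Python, as a hedged sketch that tracks only the two branch outcomes (`branch_outcomes` is a hypothetical name, and d stands in for R1). Enumerating a few values of d confirms the correlation: whenever b1 is not taken, b2 is not taken either:

```python
def branch_outcomes(d: int) -> tuple[bool, bool]:
    """Return (b1 taken, b2 taken) for one pass through the example code."""
    b1_taken = d != 0            # BNEZ R1, L1
    if not b1_taken:
        d = 1                    # ADDI R1, R0, #1 runs only on the fall-through path
    r3 = d - 1                   # SUBUI R3, R1, #1
    b2_taken = r3 != 0           # BNEZ R3, L2
    return b1_taken, b2_taken


for d in (0, 1, 2):
    print(d, branch_outcomes(d))
```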

  13. Correlating Branch Example
      Initial d | d==0? | b1        | d before b2 | d==1? | b2
      0         | Yes   | Not taken | 1           | Yes   | Not taken
      1         | No    | Taken     | 1           | Yes   | Not taken
      2         | No    | Taken     | 2           | No    | Taken
      • Assume d alternates between 2 and 0:
      d | b1 prediction | b1 action | new b1 prediction | b2 prediction | b2 action | new b2 prediction
      2 | NT | T  | T  | NT | T  | T
      0 | T  | NT | NT | T  | NT | NT
      2 | NT | T  | T  | NT | T  | T
      0 | T  | NT | NT | T  | NT | NT
      • The 1-bit predictor mispredicts every branch!
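A sketch reproducing the second table, assuming one 1-bit predictor per branch, both starting at not taken (as the first row shows); the names are hypothetical:

```python
def branch_outcomes(d: int) -> tuple[bool, bool]:
    b1_taken = d != 0
    if not b1_taken:
        d = 1
    return b1_taken, d != 1      # b2 is taken when d != 1


prediction = {"b1": False, "b2": False}  # one 1-bit predictor per branch, initially NT
misses = 0
for d in (2, 0, 2, 0):                   # d alternates between 2 and 0
    b1, b2 = branch_outcomes(d)
    for name, actual in (("b1", b1), ("b2", b2)):
        misses += prediction[name] != actual
        prediction[name] = actual        # 1-bit: keep only the last outcome
print(misses, "of 8 branches mispredicted")
```

Because each branch flips direction on every pass, remembering only the last outcome is always wrong.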

  14. Correlating Branch Example
      Prediction bits | Prediction if last branch not taken | Prediction if last branch taken
      NT/NT           | Not taken                           | Not taken
      NT/T            | Not taken                           | Taken
      T/NT            | Taken                               | Not taken
      T/T             | Taken                               | Taken
      • Initial prediction: NT/NT
      d | b1 prediction | b1 action | new b1 prediction | b2 prediction | b2 action | new b2 prediction
      2 | NT/NT | T  | T/NT | NT/NT | T  | NT/T
      0 | T/NT  | NT | T/NT | NT/T  | NT | NT/T
      2 | T/NT  | T  | T/NT | NT/T  | T  | NT/T
      0 | T/NT  | NT | T/NT | NT/T  | NT | NT/T
      • Only b1 and b2 on the first pass (d = 2) are mispredicted
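The same trace with one bit of global history selecting between each branch's two prediction bits (a (1,1) predictor in the (m,n) notation of slide 11). The names are hypothetical, and the initial "last branch not taken" is an assumption read off the first row, where b1's NT/NT becomes T/NT:

```python
def branch_outcomes(d: int) -> tuple[bool, bool]:
    b1_taken = d != 0
    if not b1_taken:
        d = 1
    return b1_taken, d != 1


# bits[branch] = [X, Y] where the slide writes X/Y:
# X is the prediction used if the last branch was not taken, Y if it was taken.
bits = {"b1": [False, False], "b2": [False, False]}  # initial prediction NT/NT
last_taken = False            # assumption: the branch before the first b1 was not taken
misses = 0
for d in (2, 0, 2, 0):
    b1, b2 = branch_outcomes(d)
    for name, actual in (("b1", b1), ("b2", b2)):
        slot = 1 if last_taken else 0         # global 1-bit history picks X or Y
        misses += bits[name][slot] != actual
        bits[name][slot] = actual             # update only the bit that was used
        last_taken = actual
print(misses)                 # 2: only b1 and b2 on the first d = 2 miss
print(bits)                   # {'b1': [True, False], 'b2': [False, True]}, i.e. T/NT and NT/T
```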

  15. Correlating Branches: (2,2) Predictor • The behavior of recent branches then selects among, say, four predictions of the next branch, updating just that prediction [Figure: the branch address selects a row of four 2-bit per-branch predictors (columns 00, 01, 10, 11); the 2-bit global branch history selects the column whose 2-bit counter (XX) supplies the prediction]
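The storage cost follows from the (m,n) definition: each entry selected by the branch address holds 2^m counters of n bits. Writing N for the number of such entries (the slide does not give N):

```latex
\text{total bits} = 2^{m} \times n \times N,
\qquad (2,2):\; 2^{2} \times 2 = 8 \text{ bits per entry},
\qquad (0,2):\; 2^{0} \times 2 = 2 \text{ bits per entry}.
```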
