CS433: Computer System Organization

Luddy Harrison
Lecture 15: Branch Prediction

1-bit Branch Prediction Buffer (BPB)
• Predict:
  • If the BPB entry is 0, fetch PC+1 (predict not taken)
  • If the BPB entry is 1, fetch L, the branch target (predict taken)
• Update:
  • If the branch is taken, BPB := 1
  • If the branch is not taken, BPB := 0

State Diagram of 1-bit Predictor
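The predict/update rules above can be sketched as a table of single bits. This is a minimal sketch, not the lecture's implementation: the class name `OneBitBPB`, the 12-bit index, and indexing by the low-order PC bits are assumptions. Because only the index bits are used, two branches whose PCs agree in those bits share (alias to) one entry.

```python
class OneBitBPB:
    """Hypothetical 1-bit branch prediction buffer (sketch)."""

    def __init__(self, index_bits: int = 12) -> None:
        # Assumption: the BPB is indexed by the low-order index_bits of the PC.
        self.mask = (1 << index_bits) - 1
        self.bits = [0] * (1 << index_bits)  # all entries start at 0 (not taken)

    def predict(self, pc: int) -> bool:
        # 1 -> predict taken (fetch target L); 0 -> predict not taken (fetch PC+1).
        return self.bits[pc & self.mask] == 1

    def update(self, pc: int, taken: bool) -> None:
        # The entry just remembers the most recent outcome of the branch.
        self.bits[pc & self.mask] = 1 if taken else 0


bpb = OneBitBPB()
print(bpb.predict(0x40))  # False: entries start at 0
bpb.update(0x40, taken=True)
print(bpb.predict(0x40))  # True: the last outcome was taken
```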

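Twice Mispredicted Loop Branches
M:  ADD R1, R2, R3
L:  ADD R4, R5, R6
    MUL R7, R8, R9
    SUB R11, R11, #1
    BNE L
    SUB R10, R10, #1
    BNE M
With a 1-bit predictor, the inner-loop branch BNE L is mispredicted twice per pass of the outer loop: once on the last inner iteration (predicted taken, but it falls through), and again the first time it executes in the next outer pass (the entry now says not taken, but the branch is taken).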

Sequence of Predictions
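The sequence of predictions for BNE L can be reproduced with a short simulation. This sketch tracks a single 1-bit entry and assumes, hypothetically, that the inner loop runs the same number of iterations (10 here) on every outer pass; the helper names are illustrative.

```python
def inner_loop_outcomes(inner_iters: int, outer_iters: int) -> list[bool]:
    """Outcomes of BNE L: taken inner_iters - 1 times, then not taken at loop exit."""
    one_pass = [True] * (inner_iters - 1) + [False]
    return one_pass * outer_iters


def one_bit_mispredicts(outcomes: list[bool], initial: bool = False) -> int:
    """Count mispredictions of one 1-bit predictor entry over a branch's outcomes."""
    state = initial
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken  # 1-bit update: remember only the last outcome
    return misses


trace = inner_loop_outcomes(inner_iters=10, outer_iters=3)
print(one_bit_mispredicts(trace))  # 6: two per outer pass
```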

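2-bit Predictor
• Add some “stickiness” or “memory” to the predictor
• Cause it to move more slowly from one prediction state (predict taken vs. predict not taken) to the other
• Another bit of state does quite a lot: a single not-taken outcome at a loop exit no longer flips the prediction, so a loop branch is mispredicted once per loop exit instead of twice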

2-Bit Predictor

State Diagram of 2-Bit Predictor
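One common realization of the 2-bit predictor is a saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken, and each outcome moves the counter one step toward it. This sketch assumes that scheme (the state diagram on the slide may use slightly different transitions) and applies it to a hypothetical BNE L trace in which the inner loop runs 10 iterations per outer pass.

```python
def two_bit_mispredicts(outcomes: list[bool], initial: int = 0) -> int:
    """Count mispredictions of one 2-bit saturating counter (states 0..3)."""
    state = initial
    misses = 0
    for taken in outcomes:
        predict_taken = state >= 2  # states 2 and 3 predict taken
        if predict_taken != taken:
            misses += 1
        # Move one step toward the outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Hypothetical BNE L trace: 9 taken outcomes then 1 not taken, repeated 3 times.
trace = ([True] * 9 + [False]) * 3
print(two_bit_mispredicts(trace, initial=3))  # 3: one per outer pass
```

Starting from state 0 instead adds two warm-up misses on the first pass (5 in total for this trace); after that, the loop exit costs one misprediction per pass.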

Prediction Accuracy: 12-bit index + 2-bit state

12-bit index + 2-bit state vs infinite buffer (2-bit state)
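The gap between the 12-bit-index table and the infinite buffer comes from aliasing: with 2^12 = 4096 two-bit entries (8 Kbits), branches whose PCs agree in the low 12 bits share a counter. The sketch below treats the PC as an instruction index (matching the slide's PC+1) and uses a hypothetical pair of branches 4096 instructions apart, one always taken and one never taken; all class names are illustrative.

```python
class TwoBitTable:
    """2-bit saturating counters indexed by the low index_bits of the PC (sketch)."""

    def __init__(self, index_bits: int = 12) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        s = self.table[i]
        self.table[i] = min(s + 1, 3) if taken else max(s - 1, 0)

    def storage_bits(self) -> int:
        return 2 * len(self.table)


class InfiniteTable:
    """One private 2-bit counter per distinct PC, so no aliasing."""

    def __init__(self) -> None:
        self.table: dict[int, int] = {}

    def predict(self, pc: int) -> bool:
        return self.table.get(pc, 0) >= 2

    def update(self, pc: int, taken: bool) -> None:
        s = self.table.get(pc, 0)
        self.table[pc] = min(s + 1, 3) if taken else max(s - 1, 0)


def mispredicts(pred, trace) -> int:
    """Run (pc, taken) pairs through a predictor and count mispredictions."""
    misses = 0
    for pc, taken in trace:
        if pred.predict(pc) != taken:
            misses += 1
        pred.update(pc, taken)
    return misses


# Branch A (always taken) and branch B (never taken) alias in a 12-bit index.
trace = [(0x100, True), (0x100 + 4096, False)] * 50
print(TwoBitTable().storage_bits())         # 8192 bits
print(mispredicts(TwoBitTable(), trace))    # 50: A is mispredicted every time
print(mispredicts(InfiniteTable(), trace))  # 2: only A's warm-up misses
```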

More State Bits?
• Increasing the number of state bits beyond 2 does not seem to help much.
• Too many state bits cause the predictor to stay stuck in an incorrect state for branches that change their tendency to branch during execution, e.g. (Y = taken, N = not taken):
  YYYYYYYNNNNNNNNNYYYYYYYYNNNNNN
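The cost of extra state bits on a branch that changes tendency can be checked by generalizing a saturating counter to n bits (states 0 to 2^n - 1, predicting taken in the upper half). The sketch assumes each counter starts in the weakly-not-taken state and uses a pattern modeled on the slide's: 7 taken, 9 not taken, 8 taken, 6 not taken.

```python
def counter_mispredicts(pattern: str, n_bits: int) -> int:
    """Mispredictions of one n-bit saturating counter on a Y/N outcome string."""
    max_state = (1 << n_bits) - 1
    threshold = 1 << (n_bits - 1)  # states >= threshold predict taken
    state = threshold - 1          # assumed start: weakly not taken
    misses = 0
    for ch in pattern:
        taken = ch == "Y"
        if (state >= threshold) != taken:
            misses += 1
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return misses


PATTERN = "Y" * 7 + "N" * 9 + "Y" * 8 + "N" * 6
for n in range(1, 5):
    print(n, counter_mispredicts(PATTERN, n))  # 1 4, 2 7, 3 13, 4 17
```

On this pattern every extra bit adds mispredictions: a phase change starting from a saturated counter costs 2^(n-1) misses before the prediction flips, and there are no isolated anomalies for the extra hysteresis to absorb. The 2-bit counter's advantage shows up instead on patterns like a loop branch, where one not-taken outcome interrupts a long taken run.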

Applying the Prediction
• The earliest time we can begin using the prediction is when both
  • the prediction bits are available, and
  • the branch target is available
• The earliest time we can know whether we have predicted correctly is when the branch condition is resolved
• The difference between these times is roughly what a correct prediction saves
• If the branch target becomes available late, the window of savings shrinks
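The timing argument above reduces to simple arithmetic. The cycle numbers in this sketch are hypothetical (a classic 5-stage pipeline is assumed, with the prediction read in IF, the target computed in ID, and the condition resolved in EX); the point is only how a late target shrinks the window.

```python
def cycles_saved(pred_ready: int, target_ready: int, resolved: int) -> int:
    """Rough cycles a correct prediction saves (hypothetical pipeline timing).

    Fetch down the predicted path can start once both the prediction bits and
    the target are available; without a prediction, fetch would wait until the
    branch condition is resolved.
    """
    start = max(pred_ready, target_ready)
    return max(resolved - start, 0)


# Hypothetical cycles: prediction in 1 (IF), condition resolved in 3 (EX).
print(cycles_saved(pred_ready=1, target_ready=2, resolved=3))  # 1: target from ID
print(cycles_saved(pred_ready=1, target_ready=1, resolved=3))  # 2: target known in IF
```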


Correlating Predictors
• The prediction is a function of the last k branch outcomes
• The branch history buffer is indexed by
  • m bits taken from the address of the branch
  • k bits of branch history
  • i.e., m + k bits all told
• Each entry in the branch history buffer has q bits (i.e., is a q-bit predictor)
• The branch history buffer has 2^(m+k) × q bits of storage
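The storage formula can be checked against the plain 2-bit buffer. Setting k = 0 recovers the 12-bit-index table, and a (2,2) predictor with a hypothetical m = 10 address bits costs exactly the same storage:

```latex
\[
\text{storage} = 2^{m+k} \cdot q \ \text{bits}
\]
\[
(m, k, q) = (12, 0, 2):\quad 2^{12} \cdot 2 = 8192 \ \text{bits}
\]
\[
(m, k, q) = (10, 2, 2):\quad 2^{10+2} \cdot 2 = 8192 \ \text{bits}
\]
```

For a fixed budget, history bits are bought with address bits: the (2,2) version distinguishes 4 histories per branch but only 2^10 branch addresses.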

Correlating predictor with 2 history bits and 2 state bits: (2,2)
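A minimal sketch of an (m, k) correlating predictor with q-bit counters, assuming a global history register and an index formed by concatenating m address bits with k history bits as described above. The class and the two-branch trace are hypothetical: branch Y (PC 1) always repeats the outcome of branch X (PC 0), which alternates taken/not taken, so Y defeats a per-branch counter but becomes easy once history is part of the index.

```python
class CorrelatingPredictor:
    """(m, k) correlating predictor with q-bit saturating counters (sketch)."""

    def __init__(self, m: int, k: int, q: int = 2) -> None:
        self.m, self.k = m, k
        self.max_state = (1 << q) - 1
        self.threshold = 1 << (q - 1)      # states >= threshold predict taken
        self.table = [0] * (1 << (m + k))  # 2^(m+k) entries of q bits each
        self.history = 0                   # last k global outcomes, newest in bit 0

    def _index(self, pc: int) -> int:
        addr = pc & ((1 << self.m) - 1)
        return (addr << self.k) | self.history  # m address bits, then k history bits

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        s = self.table[i]
        self.table[i] = min(s + 1, self.max_state) if taken else max(s - 1, 0)
        # Shift the outcome into the global history, keeping only k bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)


def y_mispredicts(k: int, rounds: int = 100) -> int:
    """Mispredictions of branch Y when Y always repeats X's outcome."""
    pred = CorrelatingPredictor(m=2, k=k, q=2)
    misses = 0
    for i in range(rounds):
        x_taken = i % 2 == 0  # X alternates: taken, not taken, taken, ...
        for pc, taken in ((0, x_taken), (1, x_taken)):
            if pc == 1 and pred.predict(pc) != taken:
                misses += 1
            pred.update(pc, taken)
    return misses


print(y_mispredicts(k=0))  # 50: plain per-branch 2-bit counters
print(y_mispredicts(k=2))  # 2: the (2,2) predictor, after warm-up
```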

Comparison of 2-bit predictors

Local versus Global

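Hashing Correlation
Instead of concatenating address bits and history bits, hash them together (e.g., by XOR) so both can use the full index width. For the same amount of table storage, this makes better use of the table when there are few branches but highly correlated behavior.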
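One common form of hashing is XOR indexing (often called gshare), the same operation used in the Alpha 21264 predictor below. This sketch compares it with concatenation for a hypothetical 16-entry table: with concatenation (m = 2, k = 2) a single branch can reach only 4 entries, one per 2-bit history, while XOR over 4 bits lets the same branch spread its 16 possible 4-bit histories across all 16 entries.

```python
def concat_index(pc: int, history: int, m: int, k: int) -> int:
    """Concatenate m low-order address bits with k history bits."""
    return ((pc & ((1 << m) - 1)) << k) | (history & ((1 << k) - 1))


def xor_index(pc: int, history: int, n: int) -> int:
    """Hash: XOR n address bits with n history bits (gshare-style)."""
    return (pc ^ history) & ((1 << n) - 1)


pc = 0b1011            # hypothetical branch address
histories = range(16)  # every possible 4-bit global history
print(len({concat_index(pc, h, m=2, k=2) for h in histories}))  # 4 entries used
print(len({xor_index(pc, h, n=4) for h in histories}))          # 16 entries used
```

The flip side is that XOR also lets different branches collide in the same entry, which is why the benefit is greatest when there are few branches.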

Tournament Predictor
• Move “toward” the other predictor when both
  • I am wrong, and
  • he is right
• Stay put when I am right and he is right, or when I am wrong and he is wrong.
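The selection rule can be sketched as a 2-bit saturating counter per selector entry, in the spirit of the 4K × 2 chooser in the Alpha 21264. Here the two predictors are labeled A and B rather than "I" and "he"; the class name is hypothetical and only one selector entry is modeled. When the selected predictor is right and the other is wrong, this counter moves away from the other predictor, a case the slide's rule leaves implicit.

```python
class TournamentChooser:
    """2-bit selector between predictors A and B (sketch)."""

    def __init__(self) -> None:
        self.state = 0  # 0-1: use A, 2-3: use B

    def use_b(self) -> bool:
        return self.state >= 2

    def update(self, a_correct: bool, b_correct: bool) -> None:
        if a_correct == b_correct:
            return  # both right or both wrong: stay put
        if b_correct:
            self.state = min(self.state + 1, 3)  # A wrong, B right: move toward B
        else:
            self.state = max(self.state - 1, 0)  # B wrong, A right: move toward A


chooser = TournamentChooser()
chooser.update(a_correct=False, b_correct=True)
print(chooser.use_b())  # False: one step toward B is not yet enough
chooser.update(a_correct=False, b_correct=True)
print(chooser.use_b())  # True
chooser.update(a_correct=True, b_correct=True)
print(chooser.use_b())  # True: agreement leaves the selector unchanged
```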

Tournament predictor local vs global

Local 2-bit vs. Correlating vs. Tournament

Alpha 21264 Branch Predictor
• Tournament predictor (4K × 2 bits) chooses between global and local
• Global has 4K 2-bit entries indexed by the last 12 branch outcomes XORed with the address
• Local is also a two-level predictor
  • 1K × 10 branch history buffer (last 10 outcomes for the indexed branch), indexed by address
  • The selected 10-bit history is XORed with the address to index a table of 3-bit entries
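Adding up the structures listed above gives the predictor's total state. The only inferred size is the 3-bit table: a 10-bit index implies 2^10 = 1K entries.

```python
# Storage of the Alpha 21264 predictor, using the sizes on the slide.
chooser_bits = 4 * 1024 * 2          # 4K x 2-bit selector
global_bits = 4 * 1024 * 2           # 4K x 2-bit global predictor
local_history_bits = 1024 * 10       # 1K x 10-bit local history buffer
local_counter_bits = (1 << 10) * 3   # 10-bit index -> 1K x 3-bit counters

total_bits = chooser_bits + global_bits + local_history_bits + local_counter_bits
print(total_bits, total_bits // 1024)  # 29696 29
```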

Alpha 21264 Predictor

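Branch Target Buffer
• Contains an entry for each branch that is predicted taken
• Indexed by the PC of the (potential) branch
• If the PC is not in the table, it is taken to mean either
  • the instruction is not a branch, or
  • the branch is not predicted taken
  • in either case, continue fetching from PC + k (the next sequential instruction)
• The BTB gets us the branch target address early: it is looked up with the fetch PC, before the instruction is even decoded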

Branch Target Buffer

BTB Handling State Chart
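A direct-mapped BTB can be sketched as follows. Assumptions not taken from the slides: the full PC is stored as the tag, k = 1 (word-addressed PCs, as in PC+1 earlier), an entry is inserted when a branch is taken and removed when it is not taken. The state chart may handle updates differently, and the class name is hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB holding only predicted-taken branches (sketch)."""

    def __init__(self, index_bits: int = 6, k: int = 1) -> None:
        self.index_bits = index_bits
        self.k = k  # sequential fetch increment: PC + k
        self.entries: list[tuple[int, int] | None] = [None] * (1 << index_bits)

    def _slot(self, pc: int) -> int:
        return pc & ((1 << self.index_bits) - 1)

    def next_fetch_pc(self, pc: int) -> int:
        entry = self.entries[self._slot(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]  # hit: predicted-taken branch, fetch its target
        return pc + self.k   # miss: not a branch, or not predicted taken

    def update(self, pc: int, taken: bool, target: int) -> None:
        slot = self._slot(pc)
        if taken:
            self.entries[slot] = (pc, target)  # insert or refresh the entry
        elif self.entries[slot] is not None and self.entries[slot][0] == pc:
            self.entries[slot] = None          # stop predicting this branch taken


btb = BranchTargetBuffer()
print(btb.next_fetch_pc(10))  # 11: not in the table
btb.update(10, taken=True, target=100)
print(btb.next_fetch_pc(10))  # 100: target supplied at fetch time
```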

Questions Concerning BTBs
• Can a BTB be combined with the branch prediction machinery introduced earlier in this lecture? How?
• What kinds of branches can a BTB accelerate that are out of the reach of ordinary branch predictors?
