Dynamic Branch Prediction (Sec 4.3) • Control dependences become a limiting factor in exploiting ILP • So far, we’ve discussed only static branch prediction schemes • Here, we talk about using hardware to dynamically predict branch outcome. • The effectiveness of a branch prediction scheme depends on • Its accuracy of prediction • Its cost when the prediction is correct and when it is incorrect.
Branch Prediction Buffer • In its simplest form, a small memory holds one bit per entry, called the prediction bit, that records whether the branch was recently taken • The memory is indexed by the low-order bits of the branch instruction’s address • Fetching begins in the predicted direction • If the prediction turns out to be wrong, the prediction bit is inverted • The simple one-bit scheme has a performance shortcoming: a branch that is almost always taken is mispredicted twice, not once, each time it is not taken (Example on page 263)
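The one-bit scheme above can be sketched as a short simulation. This is a minimal illustration, not the textbook's example: the function name `simulate_one_bit`, the 16-entry table, the branch address `0x40`, and the loop pattern (taken nine times, then not taken) are all assumptions chosen for the sketch.

```python
def simulate_one_bit(outcomes, index_bits=4, pc=0x40):
    """Count mispredictions of a one-bit predictor for one branch at address pc."""
    # One prediction bit per entry; False means "predict not taken" (assumed initial state).
    table = [False] * (1 << index_bits)
    # Index with the low-order bits of the word-aligned branch address.
    index = (pc >> 2) & ((1 << index_bits) - 1)
    misses = 0
    for taken in outcomes:
        if table[index] != taken:
            misses += 1
            table[index] = taken  # invert the bit on a misprediction
    return misses


# Hypothetical loop branch: taken 9 times, then not taken; the loop runs twice.
loop = ([True] * 9 + [False]) * 2
print(simulate_one_bit(loop))  # 4: two mispredictions per run of the loop
```

Each run of the loop costs two mispredictions: one at the loop exit, and one on re-entry, because the exit flipped the bit to "not taken".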
Branch Prediction Buffer (Cont’d) • In a two-bit prediction scheme, a prediction must be wrong twice in a row before it is changed (Fig. 4.13) • An n-bit predictor uses an n-bit saturating counter; the branch is predicted taken when the counter is in the upper half of its range • The branch prediction buffer is accessed during the IF stage • If the instruction is decoded as a branch, the next fetch is based on the prediction • See Figure 4.14 for the prediction accuracy • Prediction accuracy matters more in programs with high branch frequency • We may improve prediction accuracy if we also look at the recent behavior of other branches
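A two-bit predictor can be sketched with a saturating counter. This is a hedged illustration of the counter variant only; the state diagram in Fig. 4.13 may differ in its transitions. The function name `simulate_two_bit`, the starting counter value, and the loop pattern are assumptions made for the sketch.

```python
def simulate_two_bit(outcomes, start=0):
    """Count mispredictions of a single 2-bit saturating counter."""
    counter = start  # values 0 and 1 predict not taken; 2 and 3 predict taken
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at both ends.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken; the loop runs 10 times.
loop = ([True] * 9 + [False]) * 10
print(simulate_two_bit(loop))  # 12: 3 misses while warming up, then 1 per run
```

Once the counter is warmed up, a single not-taken outcome at the loop exit only moves it from 3 to 2, so the next entry into the loop is still predicted taken.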
Branch Prediction Buffer (Cont’d) • Consider the following code fragment:
    if (aa == 2) aa = 0;
    if (bb == 2) bb = 0;
    if (aa != bb) {
• DLX code for the above, with aa in R1 and bb in R2, is
        SUBI R3, R1, #2
        BNEZ R3, L1      ;branch b1 (aa != 2)
        ADD  R1, R0, R0  ;aa = 0
    L1: SUBI R3, R2, #2
        BNEZ R3, L2      ;branch b2 (bb != 2)
        ADD  R2, R0, R0  ;bb = 0
    L2: SUB  R3, R1, R2  ;R3 = aa - bb
        BEQZ R3, L3      ;branch b3 (aa == bb)
• The behavior of b3 is correlated with the behavior of b1 and b2: if b1 and b2 are both not taken, aa and bb are both set to 0, so b3 is taken
Correlating Branch Predictors • Consider the code:
    if (d == 0) d = 1;
    if (d == 1)
• The instruction sequence generated is as follows, with d in R1:
        BNEZ R1, L1      ;branch b1 (d != 0)
        ADDI R1, R0, #1  ;d == 0, so d = 1
    L1: SUBI R3, R1, #1
        BNEZ R3, L2      ;branch b2 (d != 1)
    L2:
• See Figures 4.16, 4.17, 4.18 and 4.19
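To see why correlation helps on the d fragment, the sketch below compares a plain one-bit predictor with one that keeps two prediction bits per branch and selects between them using the outcome of the most recent branch. The function name `run`, the input sequence d = 2, 0, 2, 0, and the initial state (all bits "not taken", last branch "not taken") are assumptions made for the illustration.

```python
def run(d_values, correlated):
    """Count mispredictions on branches b1 and b2 of the d fragment."""
    # table[branch][slot] is one prediction bit; False means "not taken".
    table = {"b1": [False, False], "b2": [False, False]}
    last_taken = False  # one bit of global history (assumed initial value)
    misses = 0

    def predict_and_update(branch, taken):
        nonlocal last_taken, misses
        # With correlation, the last branch outcome selects the prediction bit.
        slot = int(last_taken) if correlated else 0
        if table[branch][slot] != taken:
            misses += 1
            table[branch][slot] = taken
        last_taken = taken

    for d in d_values:
        predict_and_update("b1", d != 0)  # b1: BNEZ R1, taken when d != 0
        if d == 0:
            d = 1
        predict_and_update("b2", d != 1)  # b2: BNEZ R3, taken when d != 1
    return misses


print(run([2, 0, 2, 0], correlated=False))  # 8: every branch is mispredicted
print(run([2, 0, 2, 0], correlated=True))   # 2: only the first iteration misses
```

With d alternating between 2 and 0, each branch alternates direction, which defeats the one-bit predictor, while the outcome of b1 fully determines b2 and the correlated version learns that after one iteration.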
Correlating Branch Predictors (cont’d.) • (m, n) predictor (Figure 4.20) • Uses the behavior of the last m branches (global history) • to choose among 2^m predictors for a branch • Each of these is an n-bit predictor • The global history can be recorded in an m-bit shift register • The buffer is indexed by concatenating the low-order bits of the branch address with the m-bit global history (see Figure 4.20)
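The indexing of an (m, n) predictor can be sketched as follows. This is an illustrative model under stated assumptions: the class name `CorrelatingPredictor`, the parameter `addr_bits` (how many low-order address bits are used), word-aligned 4-byte instructions, and counters that start at 0 are all choices made for the sketch rather than details taken from Figure 4.20.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history, n-bit saturating counters."""

    def __init__(self, m, n, addr_bits):
        self.m, self.n, self.addr_bits = m, n, addr_bits
        self.history = 0  # m-bit shift register of recent branch outcomes
        self.counters = [0] * (1 << (addr_bits + m))

    def _index(self, pc):
        # Concatenate low-order address bits with the m-bit global history.
        low = (pc >> 2) & ((1 << self.addr_bits) - 1)
        return (low << self.m) | self.history

    def predict(self, pc):
        # Predict taken when the counter is in the upper half of its range.
        return self.counters[self._index(pc)] >= (1 << (self.n - 1))

    def update(self, pc, taken):
        i = self._index(pc)
        top = (1 << self.n) - 1
        self.counters[i] = min(self.counters[i] + 1, top) if taken else max(self.counters[i] - 1, 0)
        # Shift the new outcome into the history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self):
        return len(self.counters) * self.n


p = CorrelatingPredictor(m=2, n=2, addr_bits=4)
print(p.storage_bits())  # 2^2 * 2 * 16 = 128 bits
```

Setting m = 0 reduces this to the plain n-bit prediction buffer, since the history contributes no index bits.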
Branch Target Buffers • A branch target buffer stores the predicted address of the instruction that follows a branch • The intent is to know the branch target address by the end of the IF stage (see Fig. 4.22) • We access the buffer during the IF stage, using the address of the instruction being fetched • If we get a hit, we fetch the next instruction from the predicted PC value • If there is no match, we proceed normally • Prediction bits can also be added to each entry to refine the prediction • See Fig. 4.23 and Fig. 4.24; do the example on page 274
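The lookup described above can be sketched with a dictionary standing in for the buffer. This is an idealized model, not the hardware design: the class name `BranchTargetBuffer`, unlimited capacity, 4-byte instructions, and the policy of keeping only taken branches and deleting an entry when its branch turns out not taken are assumptions made for the illustration.

```python
class BranchTargetBuffer:
    """Idealized branch target buffer: maps a branch PC to its predicted target."""

    def __init__(self):
        self.entries = {}  # branch PC -> predicted target PC (assumed unbounded)

    def next_fetch_pc(self, pc):
        # Hit: fetch from the predicted target. Miss: proceed to the next instruction.
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        # Called once the branch outcome is known, later in the pipeline.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.next_fetch_pc(0x100)))  # 0x104: no entry yet, fall through
btb.resolve(0x100, taken=True, target=0x80)
print(hex(btb.next_fetch_pc(0x100)))  # 0x80: hit, fetch from the predicted target
```

Because the lookup uses only the fetch address, the predicted PC is available before the instruction has even been decoded as a branch.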
Multiple-Issue Processors • So far, we have tried to achieve the ideal CPI of 1 • How can we improve performance further, to achieve CPI < 1? • Multiple-issue processors issue more than one instruction per clock cycle • Superscalar processors: • Issue a varying number of instructions per clock • Can be statically scheduled (Sun UltraSPARC II/III) • Or dynamically scheduled (Pentium III/4, MIPS R10K) • VLIW (Very Long Instruction Word) processors • Issue a fixed number of instructions per clock • Statically scheduled by the compiler (TriMedia, i860, Itanium)
Superscalar Processors • A superscalar processor has dynamic issue capability • The hardware may issue from one to eight instructions in a clock cycle • Instructions issued together must usually be independent and satisfy certain constraints, such as limits on memory accesses per clock • If an instruction has a dependence or would cause a structural hazard, only the instructions preceding it are issued in that cycle
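The issue rule in the last bullet can be sketched as a check over the instructions at the head of the stream. This is a simplified model under stated assumptions: the function name `issue_packet`, the tuple layout `(dest, sources, is_mem)`, in-order issue, a single memory port as the only structural constraint, and treating any reuse of a destination register within the packet as a dependence are all choices made for the illustration.

```python
def issue_packet(window, width=2, mem_ports=1):
    """Return how many instructions at the head of window issue this cycle."""
    written = set()  # registers written by instructions already in this packet
    mem_used = 0
    issued = 0
    for dest, sources, is_mem in window[:width]:
        # Dependence on an earlier instruction in the same packet.
        depends = dest in written or any(s in written for s in sources)
        # Structural hazard: no memory port left this cycle.
        no_port = is_mem and mem_used == mem_ports
        if depends or no_port:
            break  # only the preceding instructions are issued
        if dest is not None:
            written.add(dest)
        if is_mem:
            mem_used += 1
        issued += 1
    return issued


# Hypothetical pair: the second instruction reads R1, which the first writes.
pair = [("R1", ("R2", "R3"), False), ("R4", ("R1",), False)]
print(issue_packet(pair))  # 1
```

The stalled instruction becomes the head of the stream in the next cycle, which is why the number of instructions issued per clock varies.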