Branch Prediction
• Define branch prediction.
• Draw a state machine for a 2-bit branch prediction scheme.
• Explain the impact on the compiler of branch delay.
Control Hazards
• Consider:
    add  $t1, $zero, $zero      # t1 = 0
    beq  $t1, $zero, Ifequal
  Notequal:
    addi $v0, $zero, 4
  Ifequal:
    addi $v0, $zero, 17
• The branch determines the flow of control
• Fetching the next instruction depends on the branch outcome
Chapter 4: The Processor
Stall on Branch
• Wait until the branch outcome is determined before fetching the next instruction
• The pipeline can’t determine the next instruction until the MEM stage of beq
• It is still working on the ID stage of beq when IF of the next instruction should begin!
    add  $t1, $zero, $zero
    beq  $t1, $zero, Ifequal    # next instruction determined here
    addi $v0, $zero, 4          # Notequal
    addi $v0, $zero, 17         # Ifequal
Deciding earlier helps a little…
• Extra hardware can be designed to test registers and update the PC in the ID stage
• Then the IF of the next instruction can be done one step earlier
• There is still a 1-cycle stall, however
    add  $t1, $zero, $zero
    beq  $t1, $zero, Ifequal    # next instruction determined here, with extra hardware
    addi $v0, $zero, 4          # Notequal
    addi $v0, $zero, 17         # Ifequal
Performance penalty of stalling on branch
• 17% of instructions executed in the SPECint2006 benchmark are branch instructions
• If we always stalled for 1 clock cycle on a branch, what performance penalty would we have?
  • Other instructions: CPI of 1
  • Branches would take 2 cycles
  • 0.83 × 1 + 0.17 × 2 = 1.17 CPI
  • 17% slowdown
CS2710 Computer Organization
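The CPI arithmetic on this slide can be reproduced with a short Python sketch. The function name `effective_cpi` and its parameters are illustrative assumptions, not part of the original deck; the model simply charges every branch a fixed number of extra stall cycles.

```python
def effective_cpi(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch pays a fixed stall penalty.

    Assumed model: non-branch instructions cost base_cpi cycles,
    branches cost base_cpi + stall_cycles cycles.
    """
    non_branch = (1.0 - branch_fraction) * base_cpi
    branch = branch_fraction * (base_cpi + stall_cycles)
    return non_branch + branch


# Numbers from the slide: 17% branches, 1 stall cycle per branch.
cpi = effective_cpi(branch_fraction=0.17, stall_cycles=1)
slowdown = cpi / 1.0 - 1.0   # relative to the ideal CPI of 1
print(f"CPI = {cpi:.2f}, slowdown = {slowdown:.0%}")
```

Raising `stall_cycles` shows how quickly the penalty grows when the branch is resolved later in the pipeline.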
Branch Prediction
• A method of resolving branch hazards that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome
1-bit Dynamic Branch Prediction
• One possibility is to have each branch instruction reserve a bit that retains the “history” of the last decision
  • 0: branch not taken
  • 1: branch taken
• To execute a branch
  • Check the history bit and expect the same outcome
  • Start fetching from the fall-through (next instruction) or the branch target
  • If wrong, flush the pipeline and flip the prediction bit
    add  $t1, $zero, $zero
    beq  $t1, $zero, Ifequal    # next actual instruction determined here
    addi $v0, $zero, 4          # Notequal
Problems with 1-bit Dynamic Branch Prediction
• Consider a loop branch that is taken 9 times in a row, then is not taken once (the end-of-loop condition is met)
• At steady state
  • The first branch decision will be incorrect (the bit still says "not taken" from the previous execution of the loop)
  • The final branch decision will be incorrect
• Thus the prediction accuracy would only be 80% (8 of 10 correct), even though the branch is taken 90% of the time
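The 80% figure can be checked with a small simulation. This is a minimal Python sketch, not anything from the original deck: the function name `one_bit_accuracy` and the encoding of outcomes as booleans (True = taken) are assumptions made for illustration.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Run a 1-bit predictor over branch outcomes (True = taken).

    The single history bit always holds the most recent outcome,
    which is the same as flipping it on every misprediction.
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken
    return correct / len(outcomes)


# One pass through the loop: taken 9 times, then not taken once.
loop_pass = [True] * 9 + [False]

# Steady state: the bit starts at "not taken", left over from the
# previous exit of the loop, and the loop runs many times.
print(one_bit_accuracy(loop_pass * 100, initial_prediction=False))
```

Each pass mispredicts the first iteration and the last one, so 8 of every 10 predictions are correct.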
2-Bit Predictor
• Only change prediction on two successive mispredictions
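One common way to realize a 2-bit scheme is a saturating counter with four states. The Python sketch below assumes that variant; the slide's own state diagram is not preserved in this text, so the state numbering, the class name `TwoBitPredictor`, and the helper `accuracy` are all illustrative assumptions. In this variant the prediction changes only after two successive mispredictions when the counter starts from one of the two "strong" states.

```python
class TwoBitPredictor:
    """2-bit saturating counter (assumed variant, see lead-in).

    States: 0 = strongly not taken, 1 = weakly not taken,
            2 = weakly taken,       3 = strongly taken
    """

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        # Predict taken in the two upper states.
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def accuracy(predictor, outcomes):
    """Fraction of outcomes (True = taken) the predictor gets right."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Same loop as the 1-bit example: taken 9 times, then not taken once.
loop_pass = [True] * 9 + [False]
print(accuracy(TwoBitPredictor(state=3), loop_pass * 100))
```

On this loop only the exit iteration is mispredicted: the single not-taken outcome moves the counter from strongly taken to weakly taken, so the next pass still starts with a "taken" prediction.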
Loops and Static Branch Prediction
• Consider the following loop of code
• Which branch might we reliably predict?
    .text
    main:
        li   $t0, 100
    loop:
        addi $t0, $t0, -1
        add  $t0, $t0, $zero
        bnez $t0, loop
        # other instructions follow here…
Example 2: Assembly while-loop
    .text
    main:
        li   $t0, 10
    loop:
        beqz $t0, exitLoop
        addi $t0, $t0, -1
        add  $t0, $t0, $zero
        j    loop
    exitLoop:
        # Goto main
        j    main
• Which branch is more probable?
Static prediction based on code analysis (done by compiler)
• Assume all branches to a previous address are always taken
• Assume all branches to a subsequent address are not taken
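The two rules above amount to a single address comparison. A minimal Python sketch follows; the function name `predict_taken_static` and the instruction addresses are hypothetical, chosen only to illustrate a backward loop branch and a forward exit branch.

```python
def predict_taken_static(branch_address, target_address):
    """Static rule: backward branches are predicted taken,
    forward branches are predicted not taken."""
    return target_address < branch_address


# Hypothetical addresses for a loop-closing branch such as `bnez $t0, loop`:
# the target label sits at a lower address than the branch itself.
print(predict_taken_static(branch_address=0x0040000C,
                           target_address=0x00400004))

# Hypothetical addresses for a loop-exit test such as `beqz $t0, exitLoop`:
# the target label sits at a higher address than the branch.
print(predict_taken_static(branch_address=0x00400004,
                           target_address=0x00400014))
```

The rule needs no run-time history, which is why it can be applied from code analysis alone.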
Dynamic versus static branch prediction
• Static branch prediction
  • Based on typical branch behavior
  • Example: loop and if-statement branches
    • Predict backward branches taken
    • Predict forward branches not taken
• Dynamic branch prediction
  • Hardware measures actual branch behavior
    • e.g., record the recent history of each branch
  • Assume future behavior will continue the trend
    • When wrong, stall while re-fetching, and update the history
Survey
• The branch prediction methods we just discussed were examples of
  • Static Branch Prediction
  • Dynamic Branch Prediction
  • I haven’t a clue
MIPS approach: delayed branching
• Always assume “branch not taken”.
• This means the instruction immediately following the branch instruction will always begin to execute
• The decision whether or not to branch is not made until after that instruction begins to execute!
• Leaves it to the compiler to insert a “useful” instruction right after the branch, one that would have needed to execute whether or not the branch was taken
    add  $t1, $zero, $zero
    beq  $t1, $zero, Ifequal    # next actual instruction determined here, with extra hardware
    # next instruction after beq!!
Delayed branching example
before:
    # previous instructions
    add  $s1, $s2, $s3
    beq  $s2, $zero, Ifequal
    # no-branch instructions
  Ifequal:
    # branch instructions
after:
    # previous instructions
    beq  $s2, $zero, Ifequal
    add  $s1, $s2, $s3          # moved to just after the beq
    # no-branch instructions
  Ifequal:
    # branch instructions
• MIPS always assumes “branch not taken”, so the pipeline will automatically begin executing the next instruction following the beq.
• The actual branch will be delayed until AFTER the next instruction executes
• The compiler must help out by inserting a “useful” instruction after the beq to execute while the branch decision is being made by the processor
Delayed branching pitfall
before:
    # previous instructions
    add  $s2, $s1, $s3
    beq  $s2, $zero, Ifequal
    # no-branch instructions
  Ifequal:
    # branch instructions
not possible:
    # previous instructions
    beq  $s2, $zero, Ifequal
    add  $s2, $s1, $s3          # too late: beq has already read $s2
    # no-branch instructions
  Ifequal:
    # branch instructions
• In this case, the beq instruction depends on $s2 being up-to-date before the branching decision is made
• If the compiler moves the add instruction to after the beq, then $s2 will be updated too late: beq would use a “stale” value of $s2!!
• The compiler in this case would have to search for a different instruction that it could insert after the beq
• If no such instruction can be found (which is rare), the pipeline will stall
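The difference between the example and the pitfall comes down to one register dependence. The Python sketch below captures only that single check; the function name `can_move_after_branch` and the representation of registers as strings are assumptions for illustration, and a real compiler would have to verify more than this before moving an instruction.

```python
def can_move_after_branch(instr_dest, branch_sources):
    """Simplified check: an instruction that writes `instr_dest` may be
    moved to just after the branch only if the branch does not read
    that register."""
    return instr_dest not in branch_sources


# Example slide: add $s1, $s2, $s3 before beq $s2, $zero, Ifequal
print(can_move_after_branch("$s1", {"$s2", "$zero"}))

# Pitfall slide: add $s2, $s1, $s3 before beq $s2, $zero, Ifequal
print(can_move_after_branch("$s2", {"$s2", "$zero"}))
```

The first call reports that the move is allowed, because the add writes $s1 and the beq reads only $s2 and $zero. The second reports that it is not, because the beq would compare the stale $s2.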