Lecture 12: Pipelining
Computer Engineering 585, Fall 2001
Branch Behavior Statistics
Percentage of instructions executed, by branch type:

Benchmark    Forward cond.   Backward cond.   Unconditional
compress     11%             3%               3%
eqntott      22%             2%               2%
espresso     11%             4%               1%
gcc          12%             3%               4%
li           11%             4%               8%
doduc        6%              2%               2%
ear          6%              4%               4%
hydro2d      10%             2%               0%
mdljdp       9%              0%               0%
su2cor       2%              1%               1%

• Int: 13% forward cond., 3% backward cond., 4% unconditional
• FP: 7% forward cond., 2% backward cond., 1% unconditional
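The integer and FP averages quoted on this slide can be checked against the per-benchmark bars. A minimal sketch, assuming the three percentages next to each benchmark name are, in legend order, forward conditional, backward conditional, and unconditional branches (the dictionary and function names are illustrative, not from the lecture):

```python
# Per-benchmark branch frequencies (% of instructions executed).
# Assumption: each tuple is (forward cond., backward cond., unconditional),
# an ordering inferred from the chart legend.
INT = {"compress": (11, 3, 3), "eqntott": (22, 2, 2), "espresso": (11, 4, 1),
       "gcc": (12, 3, 4), "li": (11, 4, 8)}
FP = {"doduc": (6, 2, 2), "ear": (6, 4, 4), "hydro2d": (10, 2, 0),
      "mdljdp": (9, 0, 0), "su2cor": (2, 1, 1)}

def averages(suite):
    """Return the arithmetic mean of each branch category across a suite."""
    n = len(suite)
    return tuple(sum(row[i] for row in suite.values()) / n for i in range(3))

print([round(x) for x in averages(INT)])  # integer suite, rounded to whole %
print([round(x) for x in averages(FP)])   # FP suite, rounded to whole %
```

Rounded to whole percentages, the means agree with the Int and FP summary lines, which is what supports the assumed column ordering.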
Forward/Backward Frequency
[Figure: bar chart of forward taken and backward taken branches as a fraction of all conditional branches, per benchmark (compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor)]
• 62% taken in int; 70% taken in FP
• Taken forward/backward: 67% of all conditional branches
Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
• Execute successor instructions in sequence
• “Squash” instructions in pipeline if branch actually taken
• Advantage of late pipeline state update
• 47% of DLX branches not taken on average
• PC+4 already calculated, so use it to get next instruction
#3: Predict Branch Taken
• 53% of DLX branches taken on average
• But haven’t calculated branch target address in DLX
• DLX still incurs 1 cycle branch penalty
• Other machines: branch target known before outcome
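To compare the first three alternatives numerically, the sketch below estimates the extra stall cycles per instruction each one adds. It assumes an ideal CPI of 1 and the 1-cycle DLX branch penalty from the slide. The function name, the treatment of predict-taken as paying the penalty on every branch, and the 16% conditional-branch frequency in the example (13% forward plus 3% backward for the integer programs) are assumptions made for illustration.

```python
def branch_stall_cpi(branch_freq, taken_frac, penalty=1):
    """Extra cycles per instruction for each scheme on a DLX-like pipeline.

    branch_freq: fraction of instructions that are conditional branches
    taken_frac:  fraction of those branches that are taken
    penalty:     cycles lost when the fetched successor is wrong or unknown
    """
    return {
        # #1: every branch waits until its direction is known.
        "stall": branch_freq * penalty,
        # #2: fall-through instructions are squashed only on taken branches.
        "predict_not_taken": branch_freq * taken_frac * penalty,
        # #3: assumption: the target is not known early, so taken branches
        # still wait and not-taken branches are mispredicted.
        "predict_taken": branch_freq * penalty,
    }

# Example: 16% conditional branches, 53% of them taken.
for scheme, extra in branch_stall_cpi(0.16, 0.53).items():
    print(f"{scheme}: CPI = {1 + extra:.3f}")
```

Under these assumptions only predict-not-taken improves on stalling, which is consistent with the slide's remark that DLX gains nothing from predicting taken.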
Four Branch Hazard Alternatives
#4: Delayed Branch
• Define branch to take place AFTER a following instruction

    branch instruction
    sequential successor1    \
    sequential successor2     |  Branch delay of length n
    ........                  |
    sequential successorn    /
    branch target if taken

• 1 slot delay allows proper decision and branch target address in 5 stage pipeline
• DLX uses this
Branch Delay Slot/s

Untaken Branch      IF   ID   EX   MEM  WB
BD Slot (i+1)            IF   ID   EX   MEM  WB
Inst (i+2)                    IF   ID   EX   MEM  WB
Inst (i+3)                         IF   ID   EX   MEM  WB
Inst (i+4)                              IF   ID   EX   MEM  WB

Taken Branch        IF   ID   EX   MEM  WB
BD Slot (i+1)            IF   ID   EX   MEM  WB
Branch target                 IF   ID   EX   MEM  WB
Branch target + 1                  IF   ID   EX   MEM  WB
Branch target + 2                       IF   ID   EX   MEM  WB

FIGURE 3.27 The behavior of a delayed branch is the same whether or not the branch is taken.
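The delayed-branch behavior in the figure can be mimicked with a very small interpreter. This is a sketch under stated assumptions: the tuple encoding of instructions, the `beqz` mnemonic, and the function name `run` are all hypothetical, only a single delay slot is modeled, and a branch placed inside a delay slot is not handled.

```python
def run(program, regs, max_steps=100):
    """Execute a tiny program with single-slot delayed-branch semantics.

    program: list of ("add", rd, rs, rt), ("sub", rd, rs, rt)
             or ("beqz", rs, target_index) tuples.
    regs:    dict mapping register names to integers (updated in place).
    Returns the list of executed instruction indices.
    """
    pc, trace = 0, []
    pending_target = None  # target of a taken branch whose slot is executing
    while pc < len(program) and len(trace) < max_steps:
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction is the delay slot: redirect after it.
            next_pc, pending_target = pending_target, None
        if op[0] == "add":
            regs[op[1]] = regs[op[2]] + regs[op[3]]
        elif op[0] == "sub":
            regs[op[1]] = regs[op[2]] - regs[op[3]]
        elif op[0] == "beqz" and regs[op[1]] == 0:
            pending_target = op[2]  # takes effect after the next instruction
        pc = next_pc
    return trace

prog = [("beqz", "R2", 3),           # 0: branch
        ("add", "R1", "R2", "R3"),   # 1: delay slot, always executed
        ("sub", "R4", "R5", "R6"),   # 2: skipped when the branch is taken
        ("add", "R7", "R1", "R1")]   # 3: branch target
print(run(prog, {"R1": 0, "R2": 0, "R3": 5, "R4": 0, "R5": 9, "R6": 4, "R7": 0}))  # [0, 1, 3]
print(run(prog, {"R1": 0, "R2": 1, "R3": 5, "R4": 0, "R5": 9, "R6": 4, "R7": 0}))  # [0, 1, 2, 3]
```

Instruction 1 appears in both traces, which is the point of the figure: the slot instruction executes whether or not the branch is taken.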
Branch Delay Slot Scheduling

(a) From before
    ADD R1, R2, R3
    if R2 = 0 then
        Delay slot
  Becomes
    if R2 = 0 then
        ADD R1, R2, R3

(b) From target
    SUB R4, R5, R6      (branch target)
    ADD R1, R2, R3
    if R1 = 0 then
        Delay slot
  Becomes
    ADD R1, R2, R3
    if R1 = 0 then
        SUB R4, R5, R6

(c) From fall through
    ADD R1, R2, R3
    if R1 = 0 then
        Delay slot
    SUB R4, R5, R6
  Becomes
    ADD R1, R2, R3
    if R1 = 0 then
        SUB R4, R5, R6
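Strategy (a) works in the slide's example because the branch tests R2 while the ADD writes R1; in (b) and (c) the branch tests R1, so the ADD must stay ahead of it and the slot is filled from elsewhere. A minimal sketch of that dependence check, assuming instructions are encoded as `(op, dest, src1, src2)` tuples (the encoding and the function name are hypothetical, and a real scheduler would also check other hazards):

```python
def can_fill_from_before(instr, branch_reg):
    """Return True if `instr` may move from before the branch into its delay slot.

    instr:      (op, dest, src1, src2) tuple for the candidate instruction
    branch_reg: register tested by the branch condition
    The move is unsafe when the branch reads the register that instr writes.
    """
    _, dest, _, _ = instr
    return dest != branch_reg

add = ("ADD", "R1", "R2", "R3")
print(can_fill_from_before(add, "R2"))  # case (a): branch tests R2
print(can_fill_from_before(add, "R1"))  # cases (b), (c): branch tests R1
```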
Cancelling Branches
• Commit of a branch-delay slot instruction is conditional upon the branch outcome.

Untaken branch instruction          IF   ID   EX   MEM  WB
Branch delay instruction (i + 1)         IF   ID   idle idle idle
Instruction i + 2                             IF   ID   EX   MEM  WB
Instruction i + 3                                  IF   ID   EX   MEM  WB
Instruction i + 4                                       IF   ID   EX   MEM  WB

Taken branch instruction            IF   ID   EX   MEM  WB
Branch delay instruction (i + 1)         IF   ID   EX   MEM  WB
Branch target                                 IF   ID   EX   MEM  WB
Branch target + 1                                  IF   ID   EX   MEM  WB
Branch target + 2                                       IF   ID   EX   MEM  WB
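The two cancelling-branch timing diagrams differ only in what happens to the slot instruction and in which instructions follow it. The sketch below regenerates them, following the slide's diagram in which the slot instruction becomes idle cycles after ID when the branch is not taken. The function names and the column widths are illustrative choices.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def slot_stages(branch_taken):
    """Stages seen by the delay-slot instruction of a cancelling branch."""
    if branch_taken:
        return list(STAGES)               # slot instruction commits normally
    return STAGES[:2] + ["idle"] * 3      # cancelled after ID, never commits

def diagram(branch_taken):
    """Return a text timing diagram; each row starts one cycle later."""
    followers = (["target", "target + 1", "target + 2"] if branch_taken
                 else ["i + 2", "i + 3", "i + 4"])
    rows = [("branch", STAGES), ("delay slot (i + 1)", slot_stages(branch_taken))]
    rows += [(name, STAGES) for name in followers]
    lines = []
    for offset, (name, stages) in enumerate(rows):
        cells = "".join(f"{s:<5}" for s in stages)
        lines.append((f"{name:<20}" + " " * (5 * offset) + cells).rstrip())
    return "\n".join(lines)

print(diagram(False))
print()
print(diagram(True))
```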
Effectiveness of Branch Delay Slot Scheduling

Columns:
(1) % conditional branches
(2) % conditional branches with empty slots
(3) % conditional branches that are cancelling
(4) % cancelling branches that are cancelled
(5) % branches with cancelled delay slots
(6) Total % branches with empty or cancelled delay slot

Benchmark          (1)    (2)    (3)    (4)    (5)    (6)
compress           14%    18%    31%    43%    13%    31%
eqntott            24%    24%    50%    24%    12%    36%
espresso           15%    29%    19%    21%     4%    33%
gcc                15%    16%    33%    34%    11%    27%
li                 15%    20%    55%    48%    26%    46%
Integer average    17%    21%    38%    34%    13%    35%
doduc               8%    33%    12%    62%     8%    41%
ear                10%    37%    36%    14%     5%    42%
hydro2d            12%     0%    69%    24%    16%    17%
mdljdp2             9%     0%    86%    10%     8%     8%
su2cor              3%     7%    17%    57%    10%    17%
FP average          8%    16%    44%    34%     9%    25%
Overall average    12%    18%    41%    34%    11%    30%
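The last two columns of this table follow from the others: the share of branches with cancelled delay slots is roughly the cancelling share times the cancelled share, and the total adds the empty-slot share. A small sketch that checks this for the ten benchmark rows, assuming a tolerance of one percentage point to absorb rounding in the published figures (the names `ROWS` and `deviations` are illustrative):

```python
# (benchmark, empty, cancelling, cancelled, cancelled_slots, total), all in %.
ROWS = [
    ("compress", 18, 31, 43, 13, 31),
    ("eqntott", 24, 50, 24, 12, 36),
    ("espresso", 29, 19, 21, 4, 33),
    ("gcc", 16, 33, 34, 11, 27),
    ("li", 20, 55, 48, 26, 46),
    ("doduc", 33, 12, 62, 8, 41),
    ("ear", 37, 36, 14, 5, 42),
    ("hydro2d", 0, 69, 24, 16, 17),
    ("mdljdp2", 0, 86, 10, 8, 8),
    ("su2cor", 7, 17, 57, 10, 17),
]

def deviations(row):
    """Return (product error, sum error) in percentage points for one row."""
    _, empty, cancelling, cancelled, cancelled_slots, total = row
    product_err = abs(cancelling * cancelled / 100 - cancelled_slots)
    sum_err = abs(empty + cancelled_slots - total)
    return product_err, sum_err

for row in ROWS:
    p_err, s_err = deviations(row)
    status = "ok" if p_err <= 1 and s_err <= 1 else "check"
    print(f"{row[0]:<10} product err {p_err:.2f}  sum err {s_err}  {status}")
```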
Effectiveness of Branch Delay Slot Scheduling
[Figure: bar chart of the percentage of conditional branches with an empty slot and with canceled delay slots, per benchmark]
Delayed Branch (Summary)
• Where to get instructions to fill branch delay slot?
  • Before branch instruction
  • From the target address: only valuable when branch taken
  • From fall through: only valuable when branch not taken
  • Cancelling branches allow more slots to be filled
• Compiler effectiveness for single branch delay slot:
  • Fills about 60% of branch delay slots
  • About 80% of instructions executed in branch delay slots useful in computation
  • About 50% (60% x 80%) of slots usefully filled
• Delayed Branch downside: 7-8 stage pipelines and multiple instructions issued per clock (superscalar) lengthen the branch delay, so more than one slot would have to be filled
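The "about 50%" figure is the product of the two compiler-effectiveness rates. A short sketch of that arithmetic, assuming the two rates are independent and a single delay slot per branch (the function name is illustrative):

```python
def useful_slot_fraction(fill_rate, useful_rate):
    """Fraction of delay slots that hold an instruction doing useful work."""
    return fill_rate * useful_rate

useful = useful_slot_fraction(0.60, 0.80)
print(f"usefully filled: {useful:.0%}")        # the slide rounds this to about 50%
print(f"wasted per branch: {1 - useful:.2f} cycles (single slot)")
```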