
EECS 470


amora

Presentation Transcript


  1. EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A

  2. Pipeline function for BEQ • Fetch: read instruction from memory • Decode: read source operands from reg • Execute: calculate target address and test for equality • Memory: Send target to PC if test is equal • Writeback: Nothing left to do
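A minimal sketch of the five BEQ steps above, written as one Python function. The function name, the register-file list, and the target formula `PC + 1 + offset` (word-addressed, taken from the deck's later ptbeq definition) are assumptions for illustration, not something this slide states.

```python
def beq_next_pc(pc, regs, reg_a, reg_b, offset):
    """Trace one BEQ through the stages and return the PC it selects."""
    # Fetch: the instruction at `pc` has been read from memory (passed in decoded).
    # Decode: read the two source operands from the register file.
    val_a, val_b = regs[reg_a], regs[reg_b]
    # Execute: calculate the target address and test for equality.
    target = pc + 1 + offset      # assumed PC-relative, word-addressed
    equal = val_a == val_b
    # Memory: send the target to the PC if the test is equal.
    next_pc = target if equal else pc + 1
    # Writeback: nothing left to do.
    return next_pc
```

Under these assumptions, `beq 1 1 10` placed at address 0 compares register 1 with itself, so the sketch returns 11.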

  3. Control Hazards • Code: beq 1 1 10; sub 3 4 5 • Timeline (one stage per cycle): beq: fetch, decode, execute, memory, writeback • sub (one cycle behind): fetch, decode, execute

  4. Approaches to handling control hazards • Avoidance • Make sure there are no hazards in the code • Detect and Stall • Delay fetch until branch resolved. • Speculate and Squash if wrong • Go ahead and fetch more instruction in case it is correct, but stop them if they shouldn’t have been executed

  5. Handling branch hazards: avoid all hazards • Don’t have branch instructions! • Maybe a little impractical ☺ • Predication can eliminate some branches • If-conversion • Hyperblocks

  6. if-conversion • Source: if (a == b) { x++; y = n / d; } • With a branch: sub t1 ← a, b; jnz t1, PC+2; add x ← x, #1; div y ← n, d • With conditional moves: sub t1 ← a, b; add t2 ← x, #1; div t3 ← n, d; cmov(t1) x ← t2; cmov(t1) y ← t3 • With predicated instructions: sub t1 ← a, b; add(t1) x ← x, #1; div(t1) y ← n, d
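As a rough illustration of if-conversion, the sketch below contrasts the branching form with the conditional-move form in Python. The function names are hypothetical, and Python's conditional expression only models a `cmov`; it is not a claim about how hardware implements one. The sketch also assumes `d != 0`, because the converted version performs the divide unconditionally.

```python
def branchy(a, b, x, y, n, d):
    """Original control flow: the body runs only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def if_converted(a, b, x, y, n, d):
    """Straight-line version: compute both results, then commit conditionally."""
    t1 = a - b                  # sub t1 <- a, b   (zero means a == b)
    t2 = x + 1                  # add t2 <- x, #1  (always executed)
    t3 = n // d                 # div t3 <- n, d   (always executed; assumes d != 0)
    x = t2 if t1 == 0 else x    # cmov(t1) x <- t2
    y = t3 if t1 == 0 else y    # cmov(t1) y <- t3
    return x, y
```

Both functions return the same pair for any inputs with a nonzero `d`; the converted form simply trades the branch for extra work that is thrown away when `a != b`.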

  7. Removing hazards by redefining a branch instruction • Redefine branch instructions: ptbeq regA regB offset (prepare to branch if equal) • If (R[regA] == R[regB]), execute the instructions at PC+1, PC+2, PC+3, then continue at PC+1+offset

  8. ptbnz example • Branch hoisted, delay slots filled: g = c + 2; bnz g, PC + 4; t = 5; n = 7; noop; m = 5; a = 3 • Original order: t = 5; n = 7; g = c + 2; bnz g, PC + 1; m = 5; a = 3

  9. Problems with this solution • Old programs (legacy code) may not run correctly on new implementations • Longer pipelines tend to need more noops • Programs get larger as noops are included • Especially a problem for machines that try to execute more than one instruction every cycle • Harder to find useful instructions • Program execution is slower • CPI is one, but some I’s are noops

  10. Handling control hazards: detect and stall • Detection: • Must wait until decode • Compare opcode to beq or jalr • Alternately, this is just another control signal • Stall: • Keep current instructions in fetch • Pass noop to decode stage (not execute!)

  11. [Figure: pipelined datapath with PC, instruction memory, register file, sign extender, ALU, data memory, control, and the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers; annotated with bnz r1]

  12. [Figure: the same pipelined datapath with an additional MUX and a noop input]

  13. Control Hazards • Code: beq 1 1 10; sub 3 4 5 • Timeline: beq: fetch, decode, execute, memory, writeback • sub: fetch, fetch, fetch (held in fetch while beq resolves), then either fetch or fetch Target:

  14. Problems with detect and stall • CPI increases every time a branch is detected! • Is that necessary? Not always! • Only about ½ of the time is the branch taken • Let’s assume that it is NOT taken… • In this case, we can ignore the beq (treat it like a noop) • Keep fetching PC + 1 • What if we are wrong? • OK, as long as we do not COMPLETE any instructions we mistakenly executed (i.e. don’t perform writeback)

  15. Handling control hazards: speculate and squash • Speculate: assume not equal • Keep fetching from PC+1 until we know that the branch is really taken • Squash: stop bad instructions if taken • Send a noop to: • Decode, Execute and Memory • Send target address to PC
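The squash step can be sketched as a single clock tick taken while the branch sits in the Memory stage. The dictionary layout, the `NOOP` marker, and the function name are hypothetical; the only behavior taken from the slide is that a taken branch sends noops to Decode, Execute and Memory and sends the target to the PC.

```python
NOOP = "noop"


def resolve_branch(stages, pc, taken, target):
    """Advance one cycle while a branch sits in the Memory stage.

    `stages` maps each stage name to the instruction it holds this cycle;
    `pc` is the address being fetched this cycle.
    Returns (next-cycle contents of decode/execute/memory/writeback, next fetch PC).
    """
    nxt = {"writeback": stages["memory"]}   # the branch itself completes
    if taken:
        # Squash: the three younger instructions came from the wrong path.
        nxt["decode"] = nxt["execute"] = nxt["memory"] = NOOP
        next_pc = target
    else:
        # Speculation was right: everything advances normally.
        nxt["decode"] = stages["fetch"]
        nxt["execute"] = stages["decode"]
        nxt["memory"] = stages["execute"]
        next_pc = pc + 1
    return nxt, next_pc
```

None of the squashed instructions has reached writeback, which is why discarding them leaves no architectural side effects in this model.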

  16. [Figure: pipelined datapath illustrating speculate and squash; labels include the equal signal, three noop inputs, and the instructions beq, sub, add, and nand]

  17. Problems with fetching PC+1 • CPI increases every time a branch is taken! • About ½ of the time • Is that necessary? No!, but how can you fetch from the target before you even know the previous instruction is a branch – much less whether it is taken???
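To put rough numbers on the CPI cost, the sketch below compares detect-and-stall with always fetching PC+1. It assumes a 3-cycle penalty (the branch resolves in the Memory stage of this pipeline) and a base CPI of 1; the branch fraction passed in is a made-up parameter, not a figure from the lecture.

```python
RESOLVE_DELAY = 3  # assumed cycles lost per stalled or mis-speculated branch


def cpi_stall(branch_fraction, delay=RESOLVE_DELAY):
    """Detect and stall: every branch pays the full delay."""
    return 1.0 + branch_fraction * delay


def cpi_predict_not_taken(branch_fraction, taken_fraction, delay=RESOLVE_DELAY):
    """Fetch PC+1: only taken branches pay the delay."""
    return 1.0 + branch_fraction * taken_fraction * delay
```

With a hypothetical 20% branch fraction and half of branches taken, stalling gives a CPI of about 1.6 while fetching PC+1 gives about 1.3; the remaining gap above 1.0 is what target prediction tries to recover.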

  18. [Figure: pipelined datapath with branch-target prediction at fetch; labels include bpc, target, eq?, and beq]

  19. Branch Target Buffer • Fetch PC • Send PC to BTB • Found? Yes: use target • No: use PC+1 • Result: predicted target PC
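The lookup flow above can be sketched as a small direct-mapped table. The class name, the table size, indexing by `pc % entries`, and storing the full PC as a tag are all assumptions made for illustration. The `update` policy (insert on taken, evict on not taken) is likewise one possible choice, not something this slide specifies.

```python
class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB keyed by the fetch PC."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries   # each slot: (tag_pc, target) or None

    def predict(self, pc):
        """Found? Yes: use the stored target. No: use PC+1."""
        slot = self.table[pc % self.entries]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return pc + 1

    def update(self, pc, taken, target):
        """Called once the branch resolves."""
        idx = pc % self.entries
        if taken:
            self.table[idx] = (pc, target)
        elif self.table[idx] is not None and self.table[idx][0] == pc:
            self.table[idx] = None      # assumed policy: forget not-taken branches
```

Because the lookup needs only the PC, it can run in the fetch stage, before the instruction has even been identified as a branch.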

  20. Branch prediction • Predict not taken: ~50% accurate • No BTB needed; always use PC+1 • Predict backward taken: ~65% accurate • BTB holds targets for backward branches (loops) • Predict same as last time: ~80% accurate • Update BTB for any taken branch
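A toy comparison of the three direction policies is sketched below. The trace format `(pc, target, taken)`, the helper names, and the loop trace itself are invented for illustration, so the resulting accuracies will not match the percentages quoted on the slide, which come from real workloads.

```python
def accuracy(trace, predictor):
    """Fraction of branches in `trace` whose direction `predictor` guesses right."""
    correct = 0
    last = {}                       # pc -> outcome seen last time
    for pc, target, taken in trace:
        guess = predictor(pc, target, last)
        correct += (guess == taken)
        last[pc] = taken
    return correct / len(trace)


def predict_not_taken(pc, target, last):
    return False


def predict_backward_taken(pc, target, last):
    return target < pc              # backward branches are usually loops


def predict_same_as_last(pc, target, last):
    return last.get(pc, False)      # first encounter defaults to not taken


# Hypothetical trace: a loop branch at PC 10 jumping back to 4,
# taken 9 times and then falling through, executed twice.
loop_trace = ([(10, 4, True)] * 9 + [(10, 4, False)]) * 2
```

On this loop-only trace, same-as-last-time mispredicts twice per loop visit (on entry and on exit), which is the weakness that multi-bit predictors are designed to address.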

  21. What about indirect branches? • Could use same approach • PC+1 unlikely indirect target • Indirect jumps often have multiple targets (for same instruction) • Switch statements • Virtual function calls • Shared library (DLL) calls

  22. Indirect jump: Special Case • Return address stack • Function returns have deterministic behavior (usually) • Return to different locations (BTB doesn’t work well) • Return location known ahead of time • In some register at the time of the call • Build a specialized structure for return addresses • Call instructions write return address to R31 AND RAS • Return instructions pop predicted target off stack • Issues: finite size (save or forget on overflow?); • Issues: long jumps (clear when wrong?)
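A return address stack can be sketched in a few lines. The class name, the default depth, and the forget-oldest overflow policy are assumptions; the slide leaves the overflow question open, and this sketch picks one answer only to stay runnable. Return addresses are taken to be `call_pc + 1`, matching the word-addressed PC+1 convention used elsewhere in the deck.

```python
class ReturnAddressStack:
    """Hypothetical fixed-depth RAS; the oldest entry is dropped on overflow."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        """A call pushes its return address (in hardware, in parallel with writing R31)."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)               # assumed policy: forget the oldest
        self.stack.append(call_pc + 1)

    def predict_return(self):
        """A return pops its predicted target; None means no prediction is available."""
        return self.stack.pop() if self.stack else None
```

For nested calls from addresses 20 and then 50, the first return is predicted to go to 51 and the second to 21, something a single BTB entry per return instruction could not track.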

  23. Branch prediction • Pentium: ~85% accurate • Pentium Pro: ~92% accurate • Best paper designs: ~96% accurate

  24. Costs of branch prediction/speculation • Performance costs? • Minimal: no difference between waiting and squashing; and it is a huge gain when prediction is correct! • Power? • Large: in very long/wide pipelines many instructions can be squashed • Squashed = # mispredictions × pipeline length/width before target resolved • Area? • Can be large: predictors can get very big as we will see next time • Complexity? • Designs are more complex • Testing becomes more difficult
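The squashed-work estimate can be written as a one-line function. This sketch reads "pipeline length/width before target resolved" as the number of stages ahead of branch resolution multiplied by the issue width; that reading, and the function and parameter names, are assumptions.

```python
def squashed_slots(mispredictions, stages_before_resolve, width=1):
    """Upper bound on instruction slots discarded due to mispredicted branches."""
    return mispredictions * stages_before_resolve * width
```

For the simple pipeline in this deck (3 stages before resolution, width 1), 100 mispredictions discard at most 300 slots; a hypothetical 4-wide machine that resolves branches 10 stages deep would discard up to 4000 for the same 100 mispredictions.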

  25. What else can be speculated? • Dependencies • I think this data is coming from that store instruction • Values • I think I will load a 0 value • Accuracy? • Branch prediction (direction) is Boolean (T, NT) • Branch targets are stable or predictable (RAS) • Dependencies are limited • Values cover a huge space (0 – 4B)
