Csci 136 Computer Architecture II – Branch Hazards, Exceptions

Csci 136 Computer Architecture II – Branch Hazards, Exceptions. Xiuzhen Cheng cheng@gwu.edu. Announcement. Homework assignment # 10 , Due time – Before class, April 12 Readings: Sections 6.4 – 6.5

Presentation Transcript


  1. Csci 136 Computer Architecture II – Branch Hazards, Exceptions Xiuzhen Cheng cheng@gwu.edu

  2. Announcement • Homework assignment #10, due before class, April 12 • Readings: Sections 6.4 – 6.5 • Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them will be graded; your TA will give hints in the lab sections) • Project #3 is due on April 10, 2005 • Quiz #4: April 12, 2005 • Final: Thursday, May 12, 12:40PM-2:40PM • Note: you must pass the final to pass this course!

  3. Review on Data Hazards, Forwarding, Stall • When does a data hazard happen? • Data dependencies • Using forwarding to overcome data hazards • Data is available after ALU stage • Forwarding conditions • Stall the pipeline for load-use instructions • Data is available after MEM stage (lw instruction) • Hazard detection conditions • Why in ID stage?

  4. Review on Data Hazards

  5. Review on Data Hazards, Forwarding, Stall [Datapath diagram]

  6. LW and SW • lw $5, 0($15); sw $5, 100($15) • lw $5, 0($15); beq $5, $0, Exit; sw $5, 100($15) • lw $5, 0($15); add $8, $8, $8; sw $5, 100($15)

  7. SW is in MEM Stage • lw $5, 0($15); sw $5, 100($15) • Forwarding condition: MEM/WB.RegWrite and EX/MEM.MemWrite and MEM/WB.RegisterRd = EX/MEM.RegisterRd and MEM/WB.RegisterRd != 0 [Datapath diagram: sw in MEM, lw in WB, value forwarded from MEM/WB to the data memory]

  8. SW is in EX Stage • Forwarding condition: ID/EX.MemWrite and MEM/WB.RegWrite and MEM/WB.RegisterRd = ID/EX.RegisterRt and MEM/WB.RegisterRd != 0 [Datapath diagram: sw in EX, lw in WB]
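The two store-forwarding conditions on slides 7 and 8 can be written as Boolean predicates. This is a minimal sketch, assuming each pipeline register is modelled as a plain Python dict whose keys mirror the field names on the slides; the dict layout and the function names are illustrative and not part of the course datapath.

```python
def forward_when_sw_in_mem(ex_mem, mem_wb):
    """Slide 7: sw is in MEM while the producing instruction is in WB."""
    return bool(
        mem_wb["RegWrite"]
        and ex_mem["MemWrite"]
        and mem_wb["RegisterRd"] == ex_mem["RegisterRd"]
        and mem_wb["RegisterRd"] != 0  # register $0 is never forwarded
    )


def forward_when_sw_in_ex(id_ex, mem_wb):
    """Slide 8: sw is in EX while the producing instruction is in WB."""
    return bool(
        id_ex["MemWrite"]
        and mem_wb["RegWrite"]
        and mem_wb["RegisterRd"] == id_ex["RegisterRt"]
        and mem_wb["RegisterRd"] != 0
    )


# lw $5, 0($15) immediately followed by sw $5, 100($15):
# lw is in WB (writes $5) while sw is in MEM (stores $5).
lw_in_wb = {"RegWrite": True, "RegisterRd": 5}
sw_in_mem = {"MemWrite": True, "RegisterRd": 5}
print(forward_when_sw_in_mem(sw_in_mem, lw_in_wb))
```

Both predicates return False when the destination is register 0, which matches the `!= 0` term in each condition.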

  9. More Cases • lw $15, 0($8); sw $5, 100($15) # load-use: stall the pipeline • R-Type followed by sw? • The result from the R-Type will be saved into memory • The R-Type will overwrite the base register for sw

  10. An Example • 40: lw $2, 20($1) • 44: and $4, $2, $5 • 48: or $8, $2, $4 • Clock Cycle 1: • Clock Cycle 2: • Clock Cycle 3: • Clock Cycle 4:

  11. Clock 1 [Pipeline diagram] IF: lw $2, 20($1)

  12. Clock 2 [Pipeline diagram] IF: and $4, $2, $5; ID: lw $2, 20($1)

  13. Clock 3 [Pipeline diagram] IF: or $8, $2, $4; ID: and $4, $2, $5; EX: lw $2, 20($1)

  14. Clock 4 [Pipeline diagram] IF: or $8, $2, $4; ID: and $4, $2, $5; EX: bubble; MEM: lw $2, 20($1)

  15. Clock 5 [Pipeline diagram] ID: or $8, $2, $4; EX: and $4, $2, $5; MEM: bubble; WB: lw $2, 20($1)
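The bubble in this example comes from the load-use check made in the ID stage. The slides mention hazard detection conditions without spelling them out, so the sketch below assumes the usual textbook form: stall when the instruction in EX is a load and its destination (rt) matches either source register of the instruction in ID. The function and parameter names are illustrative.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must wait one cycle.

    Assumed condition (standard textbook form, not quoted from the slides):
    ID/EX.MemRead and (ID/EX.RegisterRt == IF/ID.RegisterRs
                       or ID/EX.RegisterRt == IF/ID.RegisterRt)
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# Clock 3: lw $2, 20($1) is in EX (rt = 2); and $4, $2, $5 is in ID (rs = 2, rt = 5).
print(load_use_stall(True, 2, 2, 5))   # the `and` must stall

# Clock 4: a bubble is in EX, so MemRead is deasserted and `and` may proceed.
print(load_use_stall(False, 0, 2, 5))
```

Once the stall has separated the two instructions by one cycle, the loaded value can be forwarded from MEM/WB to the ALU input for the `and`.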

  16. Branch Hazards Control hazard: attempt to make a decision before condition is evaluated

  17. Branch Hazards [Pipeline diagram: the branch decision is made in the MEM stage; the three instructions fetched after the branch are flushed]

  18. Observations • The branch decision does not occur until the MEM stage, so 3 CCs are wasted (current design, non-optimized) • Is it possible to reduce the branch delay? YES • Decide in the EX stage? Two CCs of branch delay • Decide in the ID stage? One CC of branch delay • How? For beq $x, $y, label: XOR $x with $y, then OR all bits of the result; this is much faster than an ALU operation. Also, we have a separate ALU to compute the branch address • 3 strategies: delayed branch; static branch prediction; dynamic branch prediction
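The XOR-then-OR equality test for beq can be sketched in a few lines. This is an illustration of the logic only, assuming 32-bit registers; the function name and the `width` parameter are made up for the example, and the loop stands in for what the hardware does with a single wide OR gate.

```python
def branch_equal(x, y, width=32):
    """Return True when two register values are equal, the way an
    early beq comparator in ID would decide it."""
    mask = (1 << width) - 1
    diff = (x ^ y) & mask          # bitwise XOR: a 1 marks every differing bit
    any_bit_set = 0
    for i in range(width):         # OR-reduce all bits of the XOR result
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0        # no differing bit means the branch is taken


print(branch_equal(5, 5))   # equal values
print(branch_equal(5, 4))   # values differ in bit 0
```

No carry chain is involved, which is why this comparison is much faster than a subtraction in the main ALU.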

  19. Delayed Branch • The instruction following the branch is always executed • Only one instruction (the delay slot) is executed this way • The slot is filled by the compiler or assembler • 50% success rate • Losing popularity. Why? • More pipeline stages • Superscalar

  20. Scheduling the Branch Delay Slot • An independent instruction is the best choice • B is good when the probability of taking the branch is high • It must be OK to execute the sub instruction when the branch goes in the unexpected direction

  21. Static Branch Prediction • Assume the branch will not be taken; If prediction is wrong, clear the effect of sequential instruction execution. • How to discard instructions in the pipeline? • Branch decision is made at MEM stage: instructions in IF, ID, EX stages need to be discarded. • Branch decision is made at ID stage: only flush IF/ID pipeline register!

  22. Static Branch Prediction [Pipeline diagram: the branch decision is made in the MEM stage; the three instructions fetched after the branch are flushed]

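  23. Static Branch Prediction [Datapath diagram with the IF.Flush control signal]

  24. Pipelined Branch – An Example [Datapath diagram: instructions at addresses 36, 40 and 44 in the pipeline; address 72; IF.Flush]

  25. Pipelined Branch – An Example [Datapath diagram: instruction at address 72]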

  26. Dynamic Branch Prediction [State diagram: two "predict taken" states and two "predict not taken" states, with transitions labelled taken / not taken] • Static branch prediction is crude! • Take history into consideration • If a branch was taken last time, fetch the next instruction from the same place as last time • Branch prediction buffer: indexed by the lower bits of the branch instruction's address • This memory contains a bit (or bits) telling whether the branch was recently taken or not • Is the prediction correct? Any bad effect? • 1-bit prediction scheme • 2-bit prediction scheme

  27. Observation • Since the branch decision moves to the ID stage, the forwarding control hardware must be copied to the ID stage too! • beq following lw • The hazard detection unit should still work

  28. In-Class Exercise [State diagram: two "predict taken" states and two "predict not taken" states, with transitions labelled taken / not taken] • Consider a loop branch that is taken nine times in a row, then is not taken once. What is the prediction accuracy for this branch, assuming the prediction bit for this branch remains in the prediction buffer? • With 1-bit prediction? • With 2-bit prediction?
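One way to check an answer to this exercise is to simulate both schemes on the repeating pattern. The sketch below makes some assumptions: the 2-bit scheme is modelled as a saturating counter (0 to 3, predicting taken at 2 or above), which is one common reading of the four-state diagram; the starting states are chosen to represent steady state, meaning the loop has already run before; function names are illustrative.

```python
def one_bit_accuracy(outcomes, last_taken):
    """1-bit scheme: predict whatever the branch did last time."""
    correct = 0
    for taken in outcomes:
        correct += (last_taken == taken)
        last_taken = taken                 # remember only the latest outcome
    return correct / len(outcomes)


def two_bit_accuracy(outcomes, counter):
    """2-bit scheme, modelled as a saturating counter in the range 0..3."""
    correct = 0
    for taken in outcomes:
        predict_taken = counter >= 2
        correct += (predict_taken == taken)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


loop = [True] * 9 + [False]   # taken nine times, then not taken once
history = loop * 100          # the loop is executed many times

# Steady state: the previous loop exit left the 1-bit predictor at "not taken"
# and the 2-bit counter one step below saturation (state 2).
print(one_bit_accuracy(history, last_taken=False))
print(two_bit_accuracy(history, counter=2))
```

Under these assumptions the 1-bit scheme mispredicts twice per pass through the loop (the first and the last iteration), giving 80% accuracy, while the 2-bit scheme mispredicts only the loop exit, giving 90%.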

  29. Performance Comparison • Compare the performance of the single-cycle, multi-cycle and pipelined datapaths • 200ps for memory access, 100ps for ALU operation, 50ps for register file access • 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops • For the pipelined datapath: • 50% of loads are immediately followed by an instruction that uses the result • Branch delay on misprediction is 1 clock cycle and 25% of branches are mispredicted • Jump delay is 1 clock cycle
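The comparison can be worked through numerically. The sketch below takes the latencies and instruction mix from the slide and adds assumptions that the slide does not state: a 200 ps clock for the multi-cycle and pipelined designs (set by the memory access), and the usual textbook cycle counts for the multi-cycle design (load 5, store 4, ALU 4, branch 3, jump 3). All variable names are illustrative.

```python
# Instruction mix from the slide, as fractions of all instructions.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Single-cycle: the clock must fit the slowest instruction, lw:
# instruction fetch + register read + ALU + data access + register write.
single_cycle_ps = 200 + 50 + 100 + 200 + 50

# ASSUMPTION: multi-cycle and pipelined clocks are set by the slowest unit.
clock_ps = 200

# ASSUMPTION: textbook cycle counts per instruction class (multi-cycle).
multi_cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}
multi_cpi = sum(mix[k] * multi_cycles[k] for k in mix)

# Pipelined: one cycle per instruction plus the stall penalties on the slide.
pipe_cycles = {
    "load": 1 + 0.50 * 1,    # half of the loads cause a 1-cycle load-use stall
    "store": 1,
    "branch": 1 + 0.25 * 1,  # a quarter of the branches cost 1 extra cycle
    "jump": 1 + 1,           # every jump costs 1 extra cycle
    "alu": 1,
}
pipe_cpi = sum(mix[k] * pipe_cycles[k] for k in mix)

print("single-cycle:", single_cycle_ps, "ps per instruction")
print("multi-cycle: CPI", round(multi_cpi, 4), "->", round(multi_cpi * clock_ps, 1), "ps")
print("pipelined:   CPI", round(pipe_cpi, 4), "->", round(pipe_cpi * clock_ps, 1), "ps")
```

With these assumptions the average instruction time is 600 ps for the single-cycle design, about 824 ps for the multi-cycle design (CPI 4.12), and about 234.5 ps for the pipelined design (CPI 1.1725).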

  30. Exceptions • Exceptions: events other than branches or jumps that change the normal flow of instruction execution • Arithmetic overflow, undefined instruction, etc. (internal to the processor) • Interrupts from external sources, e.g. I/O interrupts • Use arithmetic overflow as an example • When an overflow is detected, control must transfer to the exception handling routine at location 0x80000180 immediately, because the invalid value must not contaminate other registers or memory locations • Similar idea to the branch hazard • Detected in the EX stage • De-assert all control signals in the EX and ID stages, and flush IF/ID

  31. Exceptions [Datapath diagram with the exception handler address 80000180]

  32. Example • sub $11, $2, $4 • and $12, $2, $5 • or $13, $2, $6 • add $1, $2, $1 # overflow occurs • slt $15, $6, $7 • lw $16, 50($7) • Exception handling routine: • 0x80000180: sw $25, 1000($0) • 0x80000184: sw $26, 1004($0)
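The effect of the overflow on this instruction sequence can be sketched with a very small model. It assumes a five-stage pipeline issuing one instruction per cycle with no stalls, so that when the faulting instruction is in EX, the two older instructions are in MEM and WB and the two younger ones are in ID and IF. The function name and return shape are illustrative.

```python
HANDLER_ADDRESS = 0x80000180  # exception handling routine, from the slides


def squash_on_overflow(program, faulting_index):
    """Split in-flight instructions into those that complete and those that
    are squashed when the instruction at `faulting_index` overflows in EX."""
    def at(i):
        return program[i] if 0 <= i < len(program) else None

    # Older instructions are already past EX (in WB and MEM) and complete.
    completes = [at(faulting_index - 2), at(faulting_index - 1)]
    # The faulting instruction (EX) and the two younger ones (ID, IF) are squashed.
    squashed = [at(faulting_index), at(faulting_index + 1), at(faulting_index + 2)]

    return (
        [ins for ins in completes if ins is not None],
        [ins for ins in squashed if ins is not None],
        HANDLER_ADDRESS,  # next instruction fetch comes from the handler
    )


program = [
    "sub $11, $2, $4",
    "and $12, $2, $5",
    "or $13, $2, $6",
    "add $1, $2, $1",   # overflow occurs here
    "slt $15, $6, $7",
    "lw $16, 50($7)",
]
completes, squashed, next_pc = squash_on_overflow(program, 3)
print(completes)
print(squashed)
print(hex(next_pc))
```

In this model `and` and `or` still complete, `add`, `slt` and `lw` are turned into bubbles, and fetching resumes at 0x80000180 with `sw $25, 1000($0)`.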

  33. Example [Datapath diagram at clock 6, with handler address 80000180]

  34. Example [Datapath diagram at clock 7, with handler address 80000180]

  35. Questions?
