Csci 136 Computer Architecture II – Branch Hazards, Exceptions
Xiuzhen Cheng cheng@gwu.edu
Announcement
• Homework assignment #10, Due time – Before class, April 12
  • Readings: Sections 6.4 – 6.5
  • Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them will be graded. Your TA will give hints in the lab sections.)
• Project #3 is due on April 10, 2005
• Quiz #4: April 12, 2005
• Final: Thursday, May 12, 12:40PM-2:40PM
Note: you must pass the final to pass this course!
Review on Data Hazards, Forwarding, Stall
• When does a data hazard happen?
  • Data dependencies
• Using forwarding to overcome data hazards
  • Data is available after ALU stage
  • Forwarding conditions
• Stall the pipeline for load-use instructions
  • Data is available after MEM stage (lw instruction)
  • Hazard detection conditions
  • Why in ID stage?
Review on Data Hazards, Forwarding, Stall
[Figure: pipelined datapath; visible labels: PC+4, Sign-extend]
LW and SW
• lw $5, 0($15)
  sw $5, 100($15)
• lw $5, 0($15)
  beq $5, $0, Exit
  sw $5, 100($15)
• lw $5, 0($15)
  add $8, $8, $8
  sw $5, 100($15)
SW is in MEM Stage
• lw $5, 0($15)
  sw $5, 100($15)
MEM/WB.RegWrite and EX/MEM.MemWrite
and MEM/WB.RegisterRd = EX/MEM.RegisterRd
and MEM/WB.RegisterRd != 0
[Figure: pipelined datapath with the sw and lw instructions labeled; visible labels: EX/MEM, Data memory, Sign-Ext]
SW is in EX Stage
ID/EX.MemWrite and MEM/WB.RegWrite
and MEM/WB.RegisterRd = ID/EX.RegisterRt
and MEM/WB.RegisterRd != 0
[Figure: pipelined datapath with the sw and lw instructions labeled; visible label: Sign-Ext]
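The two store-forwarding conditions above can be written as simple predicates. This is only a minimal Python sketch of the boolean tests, not the actual control logic; the function and parameter names are hypothetical and just mirror the pipeline-register fields named on the slides.

```python
def forward_when_sw_in_mem(mem_wb_reg_write, ex_mem_mem_write,
                           mem_wb_rd, ex_mem_rd):
    """Condition from the "SW is in MEM Stage" slide."""
    return bool(mem_wb_reg_write and ex_mem_mem_write
                and mem_wb_rd == ex_mem_rd and mem_wb_rd != 0)


def forward_when_sw_in_ex(id_ex_mem_write, mem_wb_reg_write,
                          mem_wb_rd, id_ex_rt):
    """Condition from the "SW is in EX Stage" slide."""
    return bool(id_ex_mem_write and mem_wb_reg_write
                and mem_wb_rd == id_ex_rt and mem_wb_rd != 0)
```

For lw $5, 0($15) followed by sw $5, 100($15), both register numbers are 5, so forward_when_sw_in_mem(True, True, 5, 5) is True. Register $0 never triggers forwarding because of the "!= 0" term.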
More Cases
• lw $15, 0($8)      # load-use,
  sw $5, 100($15)    # stall pipeline
• R-Type followed by sw?
  • The result from R-Type will be saved into memory
  • R-Type will overwrite base register for sw
An Example
40: lw $2, 20($1)
44: and $4, $2, $5
48: or $8, $2, $4
• Clock Cycle 1:
• Clock Cycle 2:
• Clock Cycle 3:
• Clock Cycle 4:
Clock 1
[Figure: pipelined datapath in clock cycle 1. Instruction shown: lw $2, 20($1)]
Clock 2
[Figure: pipelined datapath in clock cycle 2. Instructions shown: lw $2, 20($1); and $4, $2, $5]
Clock 3
[Figure: pipelined datapath in clock cycle 3. Instructions shown: and $4, $2, $5; or $8, $2, $4; lw $2, 20($1)]
Clock 4
[Figure: pipelined datapath in clock cycle 4. Instructions shown: and $4, $2, $5; or $8, $2, $4; lw $2, 20($1); bubble]
Clock 5
[Figure: pipelined datapath in clock cycle 5. Instructions shown: and $4, $2, $5; or $8, $2, $4; lw $2, 20($1); bubble]
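The bubble in the walk-through above appears because and $4, $2, $5 needs $2 right after lw $2, 20($1). The slides do not spell out the detection test, so the sketch below assumes the usual textbook load-use check: ID/EX.MemRead and (ID/EX.RegisterRt = IF/ID.RegisterRs or ID/EX.RegisterRt = IF/ID.RegisterRt). The Python function and its parameter names are hypothetical.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID must wait for a load currently in EX.

    Assumed textbook condition, not taken verbatim from the slides.
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))
```

With lw $2, 20($1) in EX and and $4, $2, $5 in ID, load_use_stall(True, 2, 2, 5) is True, so the pipeline inserts one bubble. If the instruction in EX is not a load, the test is False and forwarding alone is enough.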
Branch Hazards
Control hazard: attempt to make a decision before condition is evaluated
Branch Hazards
[Figure: pipeline diagram of a branch; annotation "Decision is made here"; three following instructions marked "flush"]
Observations
• Branch decision does not occur until MEM stage; 3 CCs are wasted.
  – Current design, non-optimized
• Is it possible to reduce branch delay?
  • YES
  • In EXE stage? Two CCs branch delay
  • In ID stage? One CC branch delay
  • How? For beq $x, $y, label: compute $x xor $y, then OR all bits of the result. This is much faster than an ALU operation. Also we have a separate ALU to compute the branch address.
• 3 strategies
  • Delayed branch; Static branch prediction; Dynamic branch prediction
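The "xor, then or all bits" equality test can be illustrated in a few lines. This is a minimal Python sketch of the idea, assuming 32-bit registers; the name beq_taken is hypothetical, and real hardware does the OR in one wide gate rather than a loop.

```python
MASK32 = 0xFFFFFFFF  # registers are assumed to be 32 bits wide


def beq_taken(x, y):
    """Equality test as XOR followed by an OR over all result bits."""
    diff = (x ^ y) & MASK32          # a 1 bit marks a position where x and y differ
    any_bit_set = 0
    for bit in range(32):            # OR-reduce the 32 difference bits
        any_bit_set |= (diff >> bit) & 1
    return any_bit_set == 0          # no differing bit means the registers are equal
```

No carry chain is involved, which is why this comparison is cheap enough to place in the ID stage.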
Delayed Branch
• Will always execute the instruction following the branch.
  • Only one instruction will be executed
• Done by compiler or assembler
  • 50% success rate
• Losing popularity
  • Why?
  • More pipeline stages
  • Superscalar
Scheduling the Branch Delay Slot
• Independent instruction, best choice
• B is good when branch taking probability is high.
• It must be OK to execute the sub instruction when the branch goes to the unexpected direction
Static Branch Prediction
• Assume the branch will not be taken; if the prediction is wrong, clear the effect of sequential instruction execution.
• How to discard instructions in the pipeline?
  • Branch decision is made at MEM stage: instructions in IF, ID, EX stages need to be discarded.
  • Branch decision is made at ID stage: only flush IF/ID pipeline register!
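The two cases above follow one rule: on a misprediction, every stage earlier than the decision stage holds a wrongly fetched instruction. A minimal Python sketch, assuming the five-stage pipeline and the predict-not-taken policy; STAGES and stages_to_flush are hypothetical names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # five-stage pipeline, in order


def stages_to_flush(decision_stage):
    """Stages that hold wrong-path instructions when the branch resolves."""
    return STAGES[:STAGES.index(decision_stage)]
```

stages_to_flush("MEM") gives ["IF", "ID", "EX"] and stages_to_flush("ID") gives ["IF"], matching the three-cycle and one-cycle penalties.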
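Static Branch Prediction
[Figure: pipeline diagram of a branch; annotation "Decision is made here"; three following instructions marked "flush"]
Static Branch Prediction
[Figure: pipelined datapath with the IF.Flush control signal]
Pipelined Branch – An Example
[Figure: pipelined datapath with IF.Flush; instructions at addresses 36, 40, 44; other visible values: 28, 44, 72, $4, $8, 10]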
Dynamic Branch Prediction
• Static branch prediction is crude!
• Take history into consideration
  • If a branch was taken last time, then fetch the next instruction from the same place as last time
• Branch prediction buffer – indexed by the lower bits of the branch instruction address
  • This memory contains a bit (or bits) which tells whether the branch was recently taken or not
  • Is the prediction correct? Any bad effect?
• 1-bit prediction scheme
• 2-bit prediction scheme
[Figure: state diagram with two "Prediction taken" states and two "Prediction not taken" states; transitions labeled "taken" and "not taken"]
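A branch prediction buffer can be sketched as a small table of counters. The Python below is an illustration under stated assumptions, not the design on the slides: it uses 2-bit saturating counters (one common form of the 2-bit scheme, whose transitions may differ from the slide's state diagram), a 16-entry table, and an index that drops the two byte-offset bits of a word-aligned address. The class name and its methods are hypothetical.

```python
class BranchPredictionBuffer:
    """Table of 2-bit saturating counters indexed by low PC bits (a sketch)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        self.counters = [0] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: instructions are word aligned, so skip the low 2 bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

One "bad effect" is visible right away: with 4 index bits, branches at 0x40 and 0x80 share an entry, so the history of one changes the prediction for the other.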
Observation
• Since we move the branch decision to the ID stage, we need to copy forwarding control related hardware to the ID stage too!
• Beq following lw
  • Hazard detection unit should work.
In-Class Exercise
• Consider a loop branch that branches nine times in a row, then is not taken once. What is the prediction accuracy for this branch, assuming the prediction bit for this branch remains in the prediction buffer?
  • With 1-bit prediction?
  • With 2-bit prediction?
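One way to check an answer to this exercise is to simulate it. The Python sketch below assumes an n-bit saturating counter (bits=1 for the 1-bit scheme, bits=2 for a 2-bit saturating counter, which may not match the slide's state diagram transition for transition) and measures accuracy only after one warm-up pass through the loop. All names are hypothetical.

```python
def count_correct(outcomes, bits, state=0):
    """Run an n-bit saturating-counter predictor over a list of outcomes.

    Returns (number of correct predictions, final counter state).
    """
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)      # predict taken when state >= threshold
    correct = 0
    for taken in outcomes:
        if (state >= threshold) == taken:
            correct += 1
        state = min(top, state + 1) if taken else max(0, state - 1)
    return correct, state


LOOP_PASS = [True] * 9 + [False]     # taken nine times, then not taken once


def steady_state_correct(bits, passes=10):
    """Correct predictions and total branches after one uncounted warm-up pass."""
    _, state = count_correct(LOOP_PASS, bits)
    correct, _ = count_correct(LOOP_PASS * passes, bits, state)
    return correct, len(LOOP_PASS) * passes
```

steady_state_correct(1) returns (80, 100) and steady_state_correct(2) returns (90, 100): the 1-bit scheme mispredicts both the loop exit and the first iteration of the next run, while the 2-bit counter mispredicts only the exit.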
Performance Comparison
• Compare the performance of single-cycle, multi-cycle and pipelined datapath
  • 200ps for memory access, 100ps for ALU operation, 50ps for register file access
  • 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops
• For pipelined datapath,
  • 50% of loads are immediately followed by an instruction that uses the result
  • Branch delay on misprediction is 1 clock cycle and 25% of branches are mispredicted
  • Jump delay is 1 clock cycle
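The comparison can be worked through numerically. The Python sketch below uses the figures from the slide plus two assumptions that the slide does not state: the multi-cycle cycle counts per instruction class (5 for loads, 4 for stores and ALU ops, 3 for branches and jumps, the usual textbook values) and a one-cycle stall for each load-use pair. Variable names are hypothetical.

```python
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
MEM_PS, ALU_PS, REG_PS = 200, 100, 50

# Single-cycle: the clock must fit the slowest instruction (lw), CPI = 1.
# lw uses instruction memory, register read, ALU, data memory, register write.
single_cycle_ps = MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS

# Multi-cycle: the clock fits the slowest step. Cycle counts are an assumption.
MULTI_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}
clock_ps = max(MEM_PS, ALU_PS, REG_PS)
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
multi_ps = multi_cpi * clock_ps

# Pipelined: base CPI of 1 plus the stall cycles described on the slide.
PIPE_CPI = {
    "load": 1 + 0.50 * 1,    # half of the loads stall one cycle (assumed length)
    "store": 1,
    "branch": 1 + 0.25 * 1,  # 25% mispredicted, 1 cycle each
    "jump": 1 + 1,           # jump delay of 1 cycle
    "alu": 1,
}
pipe_cpi = sum(MIX[k] * PIPE_CPI[k] for k in MIX)
pipe_ps = pipe_cpi * clock_ps
```

Under these assumptions the average time per instruction is 600 ps for single-cycle, about 824 ps for multi-cycle (CPI 4.12), and about 234.5 ps for pipelined (CPI 1.1725).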
Exceptions
• Exceptions: events other than branch or jump that change the normal flow of instruction execution
  • Arithmetic overflow, undefined instruction, etc.
  • Internal to the processor
• Interrupts come from outside the processor – I/O interrupts
• Use arithmetic overflow as an example
  • When an overflow is detected, we need to transfer control to the exception handling routine at location 0x80000180 immediately, because we do not want this invalid value to contaminate other registers or memory locations
  • Similar idea to a branch hazard
  • Detected in the EX stage
  • De-assert all control signals in EX and ID stages, flush IF/ID
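The overflow case can be sketched in software to show what the EX stage has to detect. This Python fragment is an illustration only: it checks signed overflow for a 32-bit add and picks the next fetch address, using the handler address from the slide. It does not model flushing or de-asserting control signals, and add32 and next_pc_after_ex are hypothetical names.

```python
HANDLER_PC = 0x80000180  # exception handler address given on the slide
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Add two 32-bit two's-complement values; return (result, overflow)."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # Overflow: both operands have the same sign and the result's sign differs.
    overflow = ((a ^ result) & (b ^ result) & 0x80000000) != 0
    return result, overflow


def next_pc_after_ex(pc_plus_4, a, b):
    """Next fetch address for an add in EX: the handler if it overflows."""
    _, overflow = add32(a, b)
    return HANDLER_PC if overflow else pc_plus_4
```

For example, adding 0x7FFFFFFF and 1 overflows, so fetch is redirected to 0x80000180, while adding 0xFFFFFFFF (that is, -1) and 1 gives 0 with no overflow.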
Exceptions
[Figure: pipelined datapath with exception support; visible label: 80000180]
Example
sub $11, $2, $4
and $12, $2, $5
or $13, $2, $6
add $1, $2, $1     # overflow occurs
slt $15, $6, $7
lw $16, 50($7)
Exception handling routine:
0x80000180  sw $25, 1000($0)
0x80000184  sw $26, 1004($0)
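Example
[Figure: pipelined datapath in clock cycle 6; visible label: 80000180]
Example
[Figure: pipelined datapath in clock cycle 7; visible label: 80000180]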