520 likes | 589 Views
Learn about managing data hazards in ALU instructions, including strategies like data forwarding and pipeline stall techniques in computer organization and assembly level programming.
E N D
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Chapter 4 The Processor Zhao Zhang Iowa State University Revised from original slides provided by MKP
Week 10 Overview • Expected project progress: Complete Mini-Project B, part 1 • ALU data hazard and forwarding • MEM data hazard, forwarding, and pipeline stall • Control hazard and branch execution Chapter 1 — Computer Abstractions and Technology — 2
Data Hazards from ALU Instructions • An instruction depends on completion of data access by a previous instruction • add $s0, $t0, $t1sub $t2, $s0, $t3 • Consider this sequence: sub $2, $1,$3and $12,$2,$5or $13,$6,$2add $14,$2,$2sw $15,100($2) Chapter 4 — The Processor — 3
Data Hazards from ALU Instructions • A naïve approach is to insert nops to wait out the dependence • add $s0, $t0, $t1sub $t2, $s0, $t3 Change to • add $s0, $t0, $t1noopnoopsub $t2, $s0, $t3 Chapter 4 — The Processor — 4
Data Hazards in ALU Instructions • Another naïve approach is to stall the 2nd instruction in the dependence • add $s0, $t0, $t1sub $t2, $s0, $t3 Chapter 4 — The Processor — 5
Data Hazards in ALU Instructions • Observations on this scenario • The first, ALU instruction produces a register value • The following instruction(s) consumes the register value sub $2, $1,$3and $12,$2,$5or $13,$6,$2 add $14,$2,$2sw $15,100($2) Chapter 1 — Computer Abstractions and Technology — 6
Data Hazards in ALU Instructions • What is exactly the problem? • A register value is written to the register file in the WBstage, two cycles after the EX stage • The following instructions read the register value in the beginning of the IDstage IF ID EX MEM WB and sub … … … sub reads $1, $3 or and sub … … AND reads old $2 add or and sub … OR reads old $2 sub writes to $2 add reads new $2 sw add or and sub Chapter 1 — Computer Abstractions and Technology — 7
Data Hazards in ALU Instructions Chapter 1 — Computer Abstractions and Technology — 8
Forwarding (aka Bypassing) • Use result when it is computed • The result is already in the pipeline • Don’t wait for it to be stored in a register • Requires extra connections in the datapath Chapter 4 — The Processor — 9
Dependencies & Forwarding Chapter 4 — The Processor — 10
Data Forwarding • To what place: The two ALU inputs in the EX stage datapath • Forwarded register value may replace the values from ID • From where: The destination register value in pipeline registers • Source 1: EX/MEM register • Source 2: MEM/WB register Chapter 1 — Computer Abstractions and Technology — 11
Data Forwarding • When to forward: Data dependence detected between • Instructions at the EX and MEM stage • Instructions at the EX and WB stage • How to detect: Compare source and destination register numbers Chapter 1 — Computer Abstractions and Technology — 12
Data Forwarding • Example sub $2, $1,$3 # MEM=>EX forwarding and $12,$2,$5 # WB =>EX forwarding or $13,$6,$2 add $14,$2,$2 sw$15,100($2) • IF IDEX MEM WB or and sub … … AND gets forwarded new $2 value add or and sub … sw add or and sub SUB gets forwardednew $2 value Chapter 1 — Computer Abstractions and Technology — 13
Data Forwarding Logic Design sub $2, $1,$3 # and $12,$2,$5 # comp $2 with $2, $5 or $13,$6,$2 # comp $2 with $6, $2 • Detection: Comparers and rt at EX, with rd at MEM and rd at WB • Those register numbers are in the IE/EX, EX/MEM, and MEM/WB registers • rs was not in IE/EX register, we can add it Chapter 1 — Computer Abstractions and Technology — 14
Data Forwarding Logic Design • Register numbers in pipeline • Source registers of the instruction at the EX stage ID/EX.RegisterRs, ID/EX.RegisterRt • Destination register of the instruction at the MEM stage EX/MEM.RegisterRd • Destination register of the instruction at WB stage MEM/WB.RegisterRd • Potential data hazards when 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt Fwd fromEX/MEMpipeline reg Fwd fromMEM/WBpipeline reg Chapter 4 — The Processor — 15
Data Forwarding Logic Design • But only if forwarding instruction will write to a register! • EX/MEM.RegWrite=1, MEM/WB.RegWrite=1 • It’s possible an instruction has a matching rd but doesn’t write to register • And only if Rd for that instruction is not $zero • EX/MEM.RegisterRd ≠ 0,MEM/WB.RegisterRd ≠ 0 • It’s allowed for an instruction to write to $0 Chapter 4 — The Processor — 16
Forwarding Paths The forwarding unit accesses three pipeline registers Note rs is added to IE/EX pipeline register Chapter 4 — The Processor — 17
Forwarding Conditions • EX hazard: Data forwarding from EX/MEM register • if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs))ForwardA = 10 • if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt))ForwardB = 10 • MEM hazard: Data forwarding from MEM/WB register • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs))ForwardA = 01 • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))ForwardB = 01 This is not the final version (see slides 23) Chapter 4 — The Processor — 18
Datapath with Forwarding Chapter 4 — The Processor — 19
Caveats • Data forwarding happens in the beginning of the cycle • The forwarding unit is in the EX stage, with its inputs from three pipeline stages • A small overhead added to the critical path latency of the EX stage • For EX hazard, data forwarding is from MEM to EX • Precisely, the register value of the instruction being executed at the MEM stage is forwarded to the instruction being executed at the EX stage Chapter 1 — Computer Abstractions and Technology — 20
Caveats • For MEM hazard, the forwarding is from the WB to EX • From the instruction at WB to the instruction at EX • Data forwarding is to EX not to ID • An instruction may read obsolete register values at ID, with the values latched at ID/EX register • The correct values may be at EX (EX Hazard) or MEM (MEM Hazard) • Any obsolete values get replaced at EX • There is no WB hazard • Register write at WB and register read at ID, for the same register, may complete within one cycle Chapter 1 — Computer Abstractions and Technology — 21
Double Data Hazard • Consider the sequence: add $1,$1,$2add $1,$1,$3add $1,$1,$4 • Both hazards occur • Want to use the most recent • Revise MEM hazard condition • Only fwd if EX hazard condition isn’t true Chapter 4 — The Processor — 22
Revised Forwarding Condition • MEM hazard (revision from slide 18) • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and (MEM/WB.RegisterRd = ID/EX.RegisterRs))ForwardA = 01 • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))ForwardB = 01 Chapter 4 — The Processor — 23
Load-Use Data Hazard • Load-use data hazard: A load instruction is followed immediately by an instruction using the value of load • How is a load instruction different from an ALU instruction? • ALU inst: destination register value available at the end of the EX stage • Load inst: destination register value available at the end of the MEM stage • Note the next instruction may need the value in the beginning of its EX stage Chapter 1 — Computer Abstractions and Technology — 24
Load-Use Data Hazard • Can’t always avoid stalls by forwarding • If value not computed when needed • Can’t forward backward in time! Chapter 4 — The Processor — 25
Load-Use Data Hazard Need to stall for one cycle Chapter 4 — The Processor — 26
Load-Use Hazard How to insert a pipeline bubble (lost cycle)? lw $2, 20($1) sub $4, $2, $5 or $8, $2, $6 When the load instruction is at the EX stage • Hold the instruction at the IF stage • Do not update the PC • Hold the instruction at the ID stage • Do not change the IF/ID register • Insert a nopat the EX stage • Make all control signals in ID/EX register to zero • Particularly, RegWrite = 0 and MemWrite = 0 • Move forward MEM and WB Chapter 1 — Computer Abstractions and Technology — 27
Load-Use Hazard Detection To detect, check if • Aload instruction is at the EX stage • ID/EX.MemRead= 1 • The instruction at the ID stage reads the register value of load • ID/EX.RegisterRt = IF/ID.RegisterRs, orID/EX.RegisterRt = IF/ID.RegisterRt (for R-type) If detected, stall IF and ID, insert bubble at EX, move forward MEM and MB Chapter 4 — The Processor — 28
Pipeline Stall • The nop has all control signals set to zero • It does nothing at EX, MEM and WB • Prevent update of PC and IF/ID register • Using instruction is decoded again (OK) • Following instruction is fetched again (OK) • 1-cycle stall allows MEM to read data for lw • Can subsequently forward from WB to EX • Need to add new control lines • PCWrite for holding or updating PC • IF/IDWrite for holding or update IF/ID register Chapter 4 — The Processor — 29
Stall/Bubble in the Pipeline Stall inserted here Chapter 4 — The Processor — 30
Stall/Bubble in the Pipeline Or, more accurately… Chapter 4 — The Processor — 31
Datapath with Hazard Detection Chapter 4 — The Processor — 32
Stalls and Performance • Stalls reduce performance • But are required to get correct results • Compiler can arrange code to avoid hazards and stalls • Requires knowledge of the pipeline structure The BIG Picture Chapter 4 — The Processor — 33
Code Scheduling to Avoid Stalls • Reorder code to avoid use of load result in the next instruction • C code for A = B + E; C = B + F; lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 sw $t5, 16($t0) lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 sw $t3, 12($t0) add $t5, $t1, $t4 sw $t5, 16($t0) stall stall 13 cycles 11 cycles Chapter 4 — The Processor — 34
Control Hazards • Branch determines flow of control • Two branch outcomes: Taken or Not-Taken • Fetching next instruction depends on branch outcome • Pipeline can’t always fetch correct instruction • Still working on ID stage of branch • In MIPS pipeline • Need to compare registers and compute target early in the pipeline • Add hardware to do it in ID stage Chapter 4 — The Processor — 35
Control Hazards Several caveats • The CPU doesn’t recognize a branch until it reaches the end of the ID stage • Every cycle, the CPU has to fetch one instruction • Cannot afford to wait and see • Must predict the next PC every cycle • The CPU may predict “always not-taken” (MIPS 5-stage pipeline) • Alternatively, the CPU may predict branch outcome dynamically (advanced CPU design) Chapter 1 — Computer Abstractions and Technology — 36
Control Hazards • This MIPS pipeline always predicts Not-Taken • Easy prediction: The next PC is current PC plus 4 • No need to design complex branch prediction unit • More Taken than Not-Taken in most programs • What happens if the branch is wrong? • Will have mis-fetched instructions • Flush those instructions before they take effect i.e. Before they write to memory or register • A Taken branch incurs a performance penalty Chapter 1 — Computer Abstractions and Technology — 37
Performance Impact §4.8 Control Hazards • If branch outcome determined in MEM Flush theseinstructions (Set controlvalues to 0) PC Three cycles wasted on a taken branch Chapter 4 — The Processor — 38
Performance Impact • The performance loss is 3 cycles per taken branch • Ifbranch outcome determined in MEM • Move execution of branch to the ID stage! • Only beq and bne are supported in original MIPS • Testing equality and inequality is very fast, do it at the end of ID • Branch target can be calculate in ID • Branch target = PC + extended offset • PC and offset are known in the beginning of ID • At the of ID, the CPU knows the branch outcome and branch target Chapter 1 — Computer Abstractions and Technology — 39
Reducing Branch Delay • Move hardware to determine outcome to ID stage • Target address adder • Register comparator • Example code with branch taken 36: sub $10, $4, $840: beq $1, $3, 744: and $12, $2, $548: or $13, $2, $652: add $14, $4, $256: slt $15, $6, $7 ...72: lw $4, 50($7) Chapter 4 — The Processor — 40
Example: Branch Taken Chapter 4 — The Processor — 41
Early Branch Outcome • Pipeline changes for early branch outcome • 2ndPC adderand the shifter moved to ID • Comparator added to ID • No zero any more from ALU • CPU flushes one instruction for every taken branch • CPU detects taken branch at ID • The instruction at the IF will be flushed • 1 lost cycles instead of 3 lost cycles per taken branch Chapter 1 — Computer Abstractions and Technology — 42
Pipeline Flushing When CPU detects a taken branch at ID • Update PC with branch target (already have) • Flush the instruction IF stage • Add flush signal to IF/ID pipeline • When flushing, convert the instruction in IF/ID register to 32-bit zeros • 0x00000000 is “and $0, $0, $0”, effectively a nop Chapter 1 — Computer Abstractions and Technology — 43
Example: Branch Taken Note: Branch does nothing in EX, MEM and WB Chapter 4 — The Processor — 44
Pipeline Bubble on Branch • Taken branch incurs a pipeline bubble because of instruction flushing Chapter 4 — The Processor — 45
Data Hazards for Branches Moving branch execution to ID is not so easy • May need another forwarding unit • The forwarding unit has to be in the ID stage • The current forwarding unit, in the EX stage, obviously doesn't work • Need extensions to the hazard detection unit, and more pipeline stalls • Branch uses register values at ID, ALU and load produce register values at EX and MEM Chapter 1 — Computer Abstractions and Technology — 46
IF IF IF IF ID ID ID ID EX EX EX EX MEM MEM MEM MEM WB WB WB WB Data Hazards for Branches • If a comparison register is a destination of 2nd or 3rd preceding ALU instruction add $1, $2, $3 add $4, $5, $6 … beq$1, $4, target • Can resolve using forwarding • From MEM to ID, and from WB to ID Chapter 4 — The Processor — 47
IF IF ID ID EX EX MEM MEM WB WB Data Hazards for Branches • If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction • May need 1 stall cycle • However, beq needs the value at the end of ID lw$1, addr add $4, $5, $6 IF ID beqstalled ID EX MEM WB beq$1, $4, target Chapter 4 — The Processor — 48
IF ID EX MEM WB Data Hazards for Branches • If a comparison register is a destination of immediately preceding load instruction • May need 2 stall cycles • Again, beq needs the value at the end of ID, so it’s possible to reduce stall to one cycle lw$1, addr IF ID beqstalled ID beqstalled ID EX MEM WB beq$1, $0, target Chapter 4 — The Processor — 49
Mini-Project C • In Mini-Project C, implement • The simple MIPS pipeline • Data forwarding and hazard detection • Not-taken branch prediction with pipeline flushing Chapter 1 — Computer Abstractions and Technology — 50