EECS 470 Pipeline Hazards Lecture 4 Coverage: Appendix A
Basic Pipelining • Data hazards • What are they? • How do you detect them? • How do you deal with them? • Micro-architectural changes • Pipeline depth • Pipeline width • Forwarding ISA
[Figure: five-stage pipelined datapath. Stages: Fetch, Decode, Execute, Memory, WB. Pipeline registers: IF/ID, ID/EX, EX/Mem, Mem/WB. A mux selects the dest field from instruction bits 0-2 or bits 16-18; op comes from bits 22-24.]
[Figure: the same datapath, with dest and op carried through ID/EX, EX/Mem, and Mem/WB.]
[Figure: the same datapath with three forwarding (fwd) paths added.]
Pipeline function for ADD • Fetch: read instruction from memory • Decode: read source operands from reg • Execute: calculate sum • Memory: Pass results to next stage • Writeback: write sum into register file
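Data Hazards • Example: add 1 2 3 followed by nand 3 4 5 • add: fetch, decode, execute, memory, writeback • nand (one cycle behind add): fetch, decode, execute, memory, writeback • nand decodes, and so reads R3, before add has written R3 back • If not careful, you will read the wrong value of R3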
Three approaches to handling data hazards • Avoidance • Make sure there are no hazards in the code • Detect and Stall • If hazards exist, stall the processor until they go away. • Detect and Forward • If hazards exist, fix up the pipeline to get the correct value (if possible)
Handling data hazards: avoid all hazards • Assume the programmer (or the compiler) knows about the processor implementation. • Make sure no hazards exist. • Put noops between any dependent instructions. • Example: add 1 2 3 (write R3 in cycle 5), noop, noop, nand 3 4 5 (read R3 in cycle 6)
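A minimal sketch of the avoidance approach, assuming the five-stage pipeline in this lecture, where a dependent instruction must trail its producer by two instruction slots. The tuple layout (op, regA, regB, dest) and the names `insert_noops` and `gap` are illustrative assumptions, not part of the lecture, and every instruction is assumed to read both regA and regB.

```python
# Hypothetical instruction format: (op, regA, regB, dest); dest is None if nothing is written.
NOOP = ("noop", None, None, None)

def insert_noops(program, gap=2):
    """Pad the program so every consumer trails its producer by at least `gap` slots.

    gap=2 is an assumption that matches the add/noop/noop/nand example above;
    a longer pipeline would need a larger gap.
    """
    out = []
    last_write = {}  # register number -> index in `out` of its most recent writer
    for op, reg_a, reg_b, dest in program:
        earliest = len(out)  # slot this instruction would occupy with no padding
        for src in (reg_a, reg_b):
            if src in last_write:
                earliest = max(earliest, last_write[src] + gap + 1)
        out.extend([NOOP] * (earliest - len(out)))
        out.append((op, reg_a, reg_b, dest))
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out

program = [("add", 1, 2, 3), ("nand", 3, 4, 5)]
print([instr[0] for instr in insert_noops(program)])
# ['add', 'noop', 'noop', 'nand']
```

Raising `gap` models a deeper pipeline: the same two-instruction program then needs more padding.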
Problems with this solution • Old programs (legacy code) may not run correctly on new implementations • Longer pipelines need more noops • Programs get larger as noops are included • Especially a problem for machines that try to execute more than one instruction every cycle • Intel EPIC: Often 25% - 40% of instructions are noops • Program execution is slower • CPI is one, but some of the instructions are noops
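The last bullet can be made concrete with a short calculation: if a fraction f of the executed instructions are noops, each useful instruction costs CPI / (1 - f) cycles. The function name below is illustrative, and the 25% and 40% inputs are simply the two ends of the range quoted above.

```python
def cycles_per_useful_instruction(noop_fraction, cpi=1.0):
    """Cycles spent per non-noop instruction when noops still cost `cpi` cycles each."""
    if not 0.0 <= noop_fraction < 1.0:
        raise ValueError("noop_fraction must be in [0, 1)")
    return cpi / (1.0 - noop_fraction)

for f in (0.25, 0.40):
    print(f"{f:.0%} noops -> {cycles_per_useful_instruction(f):.2f} cycles per useful instruction")
# 25% noops -> 1.33 cycles per useful instruction
# 40% noops -> 1.67 cycles per useful instruction
```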
Handling data hazards: detect and stall • Detection: • Compare regA with previous DestRegs • 3-bit operand fields • Compare regB with previous DestRegs • 3-bit operand fields • Stall: • Keep current instructions in fetch and decode • Pass a noop to execute
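The detection step can be sketched as four comparisons. This is a simplified model, not the lecture's hardware: it assumes the decoding instruction reads both regA and regB, that only the instructions in ID/EX and EX/Mem can still cause a hazard, and that a noop has no destination. The names `Instr`, `hazard_signals`, and `stall` are hypothetical.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str
    reg_a: int
    reg_b: int
    dest: Optional[int]  # None for a noop or any instruction that writes no register

NOOP = Instr("noop", 0, 0, None)

def hazard_signals(decode, id_ex, ex_mem):
    """Compare the decode-stage source fields with the dest fields of older instructions."""
    def match(src, producer):
        return (decode.op != "noop"
                and producer.dest is not None
                and src == producer.dest)
    return {
        "regA vs ID/EX": match(decode.reg_a, id_ex),
        "regA vs EX/Mem": match(decode.reg_a, ex_mem),
        "regB vs ID/EX": match(decode.reg_b, id_ex),
        "regB vs EX/Mem": match(decode.reg_b, ex_mem),
    }

def stall(decode, id_ex, ex_mem):
    """For detect-and-stall, any match means: hold fetch and decode, send a noop to execute."""
    return any(hazard_signals(decode, id_ex, ex_mem).values())

add = Instr("add", 1, 2, 3)
nand = Instr("nand", 3, 4, 5)
print(stall(nand, add, NOOP))   # True: add is in ID/EX
print(stall(nand, NOOP, add))   # True: add is in EX/Mem
print(stall(nand, NOOP, NOOP))  # False: add has moved past EX/Mem
```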
[Figure: end of cycle 1. add 1 2 3 is in IF/ID. Register file: R1 = 14, R2 = 7, R3 = 10.]
[Figure: end of cycle 2. nand 3 4 5 is in IF/ID; add is in ID/EX with operand values 14 and 7 and dest 3.]
[Figure: first half of cycle 3. Hazard detection compares nand's source registers with the dest (3) of add in ID/EX.]
[Figure: hazard detection logic. Comparators check the 3-bit regA and regB fields in IF/ID against the dest fields held further down the pipeline; regA = 3 (011) matches dest = 3 (011), so "Hazard detected" = 1.]
Handling data hazards: detect and stall the pipeline until ready • Detection: • Compare regA with previous DestReg • 3-bit operand fields • Compare regB with previous DestReg • 3-bit operand fields • Stall: • Keep current instructions in fetch and decode • Pass a noop to execute
[Figure: first half of cycle 3. Hazard asserted; enable (en) signals hold the instructions in fetch and decode, so nand 3 4 5 stays in IF/ID while add is in ID/EX.]
[Figure: end of cycle 3. nand 3 4 5 is held in IF/ID; a noop is in ID/EX; add is in EX/Mem with ALU result 21.]
[Figure: first half of cycle 4. Hazard still asserted (en): nand 3 4 5 remains in IF/ID while add is in EX/Mem.]
[Figure: end of cycle 4. nand 3 4 5 is held in IF/ID; noops are in ID/EX and EX/Mem; add is in Mem/WB with result 21.]
[Figure: first half of cycle 5. No hazard: add is in Mem/WB.]
[Figure: end of cycle 5. R3 = 21; nand is in ID/EX with operand values 21 and 11; noops are in EX/Mem and Mem/WB.]
No more hazard: stalling • Example: add 1 2 3 followed by nand 3 4 5 • add: fetch, decode, execute, memory, writeback • nand: fetch, decode (hazard), decode (hazard), decode, execute • We are careful to get the right value of R3
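The stalled timeline can be reproduced with a small timing model. This is a sketch under stated assumptions, not the lecture's simulator: instructions issue in order, a consumer can finish decode in the same cycle its producer is in writeback (matching the "No Hazard" state in cycle 5), and every instruction reads both regA and regB. The name `schedule` and the tuple layout are hypothetical.

```python
def schedule(program):
    """Return (op, first_decode_cycle, last_decode_cycle, stall_cycles) per instruction.

    program: list of (op, regA, regB, dest) tuples; dest is None if nothing is written.
    """
    writeback_cycle = {}  # register -> writeback cycle of its most recent writer
    prev_decode_end = 1   # the first instruction is fetched in cycle 1, decodes in cycle 2
    rows = []
    for op, reg_a, reg_b, dest in program:
        decode_start = prev_decode_end + 1
        # Decode cannot finish before every source's producer has reached writeback.
        ready = max(writeback_cycle.get(r, 0) for r in (reg_a, reg_b))
        decode_end = max(decode_start, ready)
        rows.append((op, decode_start, decode_end, decode_end - decode_start))
        if dest is not None:
            # Execute, memory, and writeback follow decode with no further stalls.
            writeback_cycle[dest] = decode_end + 3
        prev_decode_end = decode_end
    return rows

for row in schedule([("add", 1, 2, 3), ("nand", 3, 4, 5)]):
    print(row)
# ('add', 2, 2, 0)
# ('nand', 3, 5, 2)
```

The nand spends cycles 3, 4, and 5 in decode (two stall cycles) and executes in cycle 6, which is the same cost as the two noops in the avoidance approach.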
Problems with detect and stall • CPI increases every time a hazard is detected! • Is that necessary? Not always! • Re-route the result of the add to the nand • nand no longer needs to read R3 from reg file • It can get the data later (when it is ready) • This lets us complete the decode this cycle • But we need more control to remember that the data that we aren’t getting from the reg file at this time will be found elsewhere in the pipeline at a later cycle.
Handling data hazards: detect and forward • Detection: same as detect and stall • Except that all 4 hazards are treated differently • i.e., you can’t logical-OR the 4 hazard signals • Forward: • New datapaths to route computed data to where it is needed • New Mux and control to pick the right data
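A minimal sketch of the forwarding mux for one ALU operand. Each hazard match selects a different mux input, which is why the signals cannot be ORed together. The function names and the list-of-producers representation are assumptions for illustration; the register values (R3 = 10 before the add, R4 = 11, add result 21) come from the example used in this lecture.

```python
def forward(src_reg, decode_value, producers):
    """Pick the operand value for the ALU.

    producers: (dest, value) pairs from later pipeline registers, youngest first
    (for example EX/Mem, then Mem/WB). The youngest match wins; with no match,
    the value read from the register file during decode is used.
    """
    for dest, value in producers:
        if dest is not None and dest == src_reg:
            return value
    return decode_value

def nand(a, b):
    return ~(a & b)

# nand 3 4 5 is in execute; add 1 2 3 is one stage ahead with result 21.
producers = [(3, 21)]
val_a = forward(3, 10, producers)  # stale R3 = 10 is replaced by the forwarded 21
val_b = forward(4, 11, producers)  # R4 has no pending writer, so 11 is used as read
print(val_a, val_b, nand(val_a, val_b))  # 21 11 -2
print(nand(10, 11))                      # -11, the wrong result without forwarding
```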
[Figure: first half of cycle 3. Hazard detected between nand 3 4 5 (IF/ID) and add (ID/EX); the datapath includes three fwd paths.]
[Figure: end of cycle 3. add 6 3 7 in IF/ID; nand in ID/EX with the stale R3 value 10, marked H1; add in EX/Mem with result 21.]
[Figure: first half of cycle 4. New hazard for add 6 3 7; the value 21 is forwarded to the ALU for nand (H1).]
[Figure: end of cycle 4. lw 3 6 10 in IF/ID; add 6 3 7 in ID/EX (H2); nand in EX/Mem with result -2 (H1); add in Mem/WB.]
[Figure: first half of cycle 5. No hazard; ALU inputs 1 and 21 for add 6 3 7.]
[Figure: end of cycle 5. sw 6 2 12 in IF/ID; lw in ID/EX; add 6 3 7 in EX/Mem with result 22; nand in Mem/WB; R3 = 21.]
[Figure: first half of cycle 6. Hazard between sw 6 2 12 and lw; en signals hold fetch and decode.]
[Figure: end of cycle 6. sw held in IF/ID; noop in ID/EX; lw in EX/Mem with address 31; add in Mem/WB; R5 = -2.]
[Figure: first half of cycle 7. Hazard between sw and lw (in EX/Mem).]
[Figure: end of cycle 7. sw in ID/EX (H3); noop in EX/Mem; lw in Mem/WB with loaded value 99; R7 = 22.]
[Figure: first half of cycle 8. 99 is forwarded to sw (H3); ALU inputs 99 and 12.]
[Figure: end of cycle 8. sw in EX/Mem with address 111; noop in Mem/WB; R6 = 99.]
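FP pipeline support • [Figure: after fetch and decode, an instruction goes to one of four execution units, then to Mem and WB] • Integer add: I add • FP multiply: M1 M2 M3 M4 M5 M6 M7 • FP adder: A1 A2 A3 A4 • Non-pipelined divide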
Adding pipeline stages • Pipeline frontend • Fetch, Decode • Pipeline middle • Execute • Pipeline backend • Memory, Writeback
Adding stages to fetch, decode • Delays hazard detection • No change in forwarding paths • No performance penalty with respect to data hazards
Adding stages to execute • Check for structural hazards • ALU not pipelined • Multiple ALU ops completing at same time • Data hazards may cause delays • If multicycle op hasn't computed data before the dependent instruction is ready to execute • Performance penalty for each stall
Adding stages to memory, writeback • Instructions ready to execute may need to wait longer for multi-cycle memory stage • Adds more pipeline registers • Thus more source registers to forward • More complex hazard detection • Wider muxes • More control bits to manage muxes
Wider pipelines • [Figure: two parallel pipelines, each with fetch, decode, execute, mem, WB] • More complex hazard detection • 2X pipeline registers to forward from • 2X more instructions to check • 2X more destinations (muxes)
Making forwarding explicit • add r1 r2, EX/Mem ALU result • Include direct mux controls into the ISA • Hazard detection is now a compiler task • New micro-architecture leads to new ISA • Can reduce some resources • No longer need to build a heavily ported reg file Ref: TTAs: Missing the ILP complexity wall