1 / 63

Pipeline Hazards

Pipeline Hazards. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University. See P&H Chapter 4.7. Goals for Today. Data Hazards Revisit Pipelined Processors Data dependencies Problem, detection, and solutions (delaying, stalling, forwarding, bypass, etc )

kovit
Download Presentation

Pipeline Hazards

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter 4.7

  2. Goals for Today • Data Hazards • Revisit Pipelined Processors • Data dependencies • Problem, detection, and solutions • (delaying, stalling, forwarding, bypass, etc) • Hazard detection unit • Forwarding unit • Next time • Control Hazards What is the next instruction to execute if a branch is taken? Not taken?

  3. MIPS Design Principles • Simplicity favors regularity • 32 bit instructions • Smaller is faster • Small register file • Make the common case fast • Include support for constants • Good design demands good compromises • Support for different type of interpretations/classes

  4. Recall: MIPS instruction formats • All MIPS instructions are 32 bits long, has 3 formats • R-type • I-type • J-type

  5. Recall: MIPS Instruction Types • Arithmetic/Logical • R-type: result and two source registers, shift amount • I-type: 16-bit immediate with sign/zero extension • Memory Access • load/store between registers and memory • word, half-word and byte operations • Control flow • conditional branches: pc-relative addresses • jumps: fixed offsets, register absolute

  6. Recall: MIPS Instruction Types • Arithmetic/Logical • ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU • ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRLV, SRAV, SLTI, SLTIU • MULT, DIV, MFLO, MTLO, MFHI, MTHI • Memory Access • LW, LH, LB, LHU, LBU, LWL, LWR • SW, SH, SB, SWL, SWR • Control flow • BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ • J, JR, JAL, JALR, BEQL, BNEL, BLEZL, BGTZL • Special • LL, SC, SYSCALL, BREAK, SYNC, COPROC

  7. Pipelined Processor memory registerfile alu +4 addr PC din dout control memory computejump/branchtargets newpc extend Decode Execute Fetch Memory WB

  8. Pipelined Processor memory registerfile A alu D D B +4 addr PC din dout B M inst control memory computejump/branchtargets newpc extend imm Write-Back InstructionDecode InstructionFetch Memory ctrl ctrl ctrl Execute IF/ID ID/EX EX/MEM MEM/WB

  9. Example: : Sample Code (Simple) • add r3, r1, r2; nand r6, r4, r5; lw r4, 20(r2); add r5, r2, r5; sw r7, 12(r3);

  10. Example: Sample Code (Simple) • Assume eight-register machine • Run the following code on a pipelined datapath add r3 r1 r2 ; reg3 = reg 1 + reg 2 nandr6 r4 r5 ; reg6 = ~(reg 4 & reg 5) lwr420 (r2) ; reg4 = Mem[reg2+20] add r5 r2 r5 ; reg5 = reg 2 + reg 5 swr7 12(r3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee

  11. Time Graphs Clock cycle IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB Latency: Throughput: Concurrency: CPI =

  12. + A L U M U X 4 target PC+4 PC+4 0 R0 R1 regA ALU result R2 Register file regB valA M U X PC Inst mem Data mem instruction R3 ALU result mdata R4 valB R5 R6 M U X data R7 imm dest extend valB Bits 11-15 Rd dest dest Bits 16-20 M U X Rt Bits 26-31 op op op ID/EX EX/MEM MEM/WB IF/ID

  13. At time 1, Fetch add r3 r1 r2 + A L U M U X 4 0 0 0 0 R0 0 36 R1 0 9 R2 Register file 0 M U X PC Inst mem Data mem nop 12 R3 0 0 18 R4 7 0 R5 41 R6 M U X data 22 R7 0 dest extend 0 Initial State Bits 11-15 0 0 0 Bits 16-20 M U X 0 Bits 26-31 nop nop nop ID/EX EX/MEM MEM/WB IF/ID

  14. + A L U add 3 1 2 M U X 4 0 4 0 / 4 0 R0 0 36 R1 0 9 R2 Register file 0 M U X PC Inst mem Data mem / 36 add 3 1 2 12 R3 0 0 18 R4 7 0 / 9 R5 41 R6 M U X data 22 R7 0 dest extend 0 Fetch: add 3 1 2 Bits 11-15 / 3 0 0 0 Bits 16-20 / 2 M U X 0 Bits 26-31 nop / add nop nop ID/EX EX/MEM MEM/WB IF/ID / 2 Time: 1

  15. + A L U nand 6 4 5 add 3 1 2 M U X 4 0 / 4 8 4 / 8 0 R0 0 36 R1 1 0 36 9 R2 Register file 2 36 M U X PC Inst mem Data mem / 18 nand 6 4 5 12 R3 0 0 / 45 18 R4 9 7 9 / 7 R5 41 R6 M U X data 22 R7 3 dest extend 0 / 9 Fetch: nand 6 4 5 Bits 11-15 / 6 3 3 0 0 / 3 Bits 16-20 / 5 M U X 2 Bits 26-31 add / nand nop nop / add ID/EX EX/MEM MEM/WB IF/ID / 3 Time: 2

  16. + A L U lw 4 20(2) nand 6 4 5 add 3 1 2 nand () 18 = 01 0010 7 = 00 0111 ------------------ -3 = 11 1101 M U X 4 4 / 8 12 8 0 R0 0 36 R1 4 0 / 45 / 18 36 9 R2 Register file 5 18 M U X PC Inst mem Data mem lw 4 20(2) 12 R3 45 0 / -3 18 R4 / 7 9 7 7 R5 41 R6 M U X data 22 R7 6 dest extend 9 / 7 Fetch: lw 4 20(2) Bits 11-15 3 3 6 3 0 / 6 / 3 Bits 16-20 2 M U X 5 Bits 26-31 nand add / nand nop / add ID/EX EX/MEM MEM/WB IF/ID / 4 Time: 3

  17. + A L U add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2 M U X 4 8 16 12 0 R0 0 36 R1 2 45 18 9 R2 Register file 4 9 M U X PC Inst mem Data mem add 5 2 5 12 R3 -3 0 18 R4 45 7 7 18 R5 41 R6 M U X data 22 R7 20 dest extend 7 Fetch: add 5 2 5 Bits 11-15 6 3 6 0 6 3 Bits 16-20 5 M U X 4 Bits 26-31 lw nand add ID/EX EX/MEM MEM/WB IF/ID Time: 4

  18. + A L U sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 6 4 5 add 3 1 2 M U X 4 12 20 16 0 R0 0 45 36 R1 2 -3 9 9 R2 Register file 5 9 M U X PC Inst mem Data mem sw 7 12(3) 45 R3 29 0 18 R4 -3 7 7 R5 41 R6 M U X data 22 R7 20 5 dest extend 18 Fetch: sw 7 12(3) Bits 11-15 0 6 3 4 5 4 6 Bits 16-20 4 M U X 5 Bits 26-31 add lw nand ID/EX EX/MEM MEM/WB IF/ID Time: 5

  19. + A L U sw7 12(3) add 5 2 5 lw 4 20(2) nand 6 4 5 M U X 4 16 20 0 R0 0 -3 36 R1 3 29 9 9 R2 Register file 7 45 M U X PC Inst mem Data mem 45 R3 16 99 18 R4 29 7 7 22 R5 -3 R6 M U X data 22 R7 12 dest extend 7 No more instructions Bits 11-15 5 4 6 5 0 5 4 Bits 16-20 5 M U X 7 Bits 26-31 sw add lw ID/EX EX/MEM MEM/WB IF/ID Time: 6

  20. + A L U nopnopsw 7 12(3) add 5 2 5 lw 4 20(2) M U X 4 20 0 R0 0 36 R1 16 45 9 R2 Register file M U X PC Inst mem Data mem 45 R3 99 57 0 99 R4 16 7 R5 -3 R6 M U X data 22 R7 12 dest extend 22 No more instructions Bits 11-15 0 5 4 7 7 5 Bits 16-20 7 M U X Bits 26-31 sw add ID/EX EX/MEM MEM/WB IF/ID Time: 7

  21. + A L U nopnopnopsw7 12(3) add 5 2 5 M U X 4 0 R0 16 36 R1 57 9 R2 Register file M U X PC Inst mem Data mem 45 R3 0 99 22 R4 57 16 R5 -3 R6 M U X data 22 R7 dest 22 extend No more instructions Bits 11-15 5 7 Bits 16-20 M U X Bits 26-31 sw ID/EX EX/MEM MEM/WB IF/ID Time: 8 Slides thanks to Sally McKee

  22. + A L U nopnopnopnopsw 7 12(3) M U X 4 0 R0 36 R1 9 R2 Register file M U X PC Inst mem Data mem 45 R3 99 R4 16 R5 -3 R6 M U X data 22 R7 dest extend No more instructions Bits 11-15 Bits 16-20 M U X Bits 21-23 ID/EX EX/MEM MEM/WB IF/ID Time: 9

  23. Takeaway • Pipelining is a powerful technique to mask latencies and increase throughput • Logically, instructions execute one at a time • Physically, instructions execute in parallel • Instruction level parallelism • Abstraction promotes decoupling • Interface (ISA) vs. implementation (Pipeline)

  24. Next Goal • What about data dependencies (also known as a data hazard in a pipelined processor)? • i.e. add r3, r1, r2 • sub r5, r3, r4

  25. Data Hazards • Data Hazards • register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB) • next instructions may read values about to be written • destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right

  26. Data Hazards IF IF IF IF IF Clock cycle 1 2 3 4 5 6 7 8 9 time MEM WB ID add r3, r1, r2 subr5, r3, r4 MEM WB ID lw r6, 4(r3) MEM WB ID orr5, r3, r5 MEM WB ID swr6, 12(r3) MEM WB ID

  27. Data Hazards • Data Hazards • register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB) • next instructions may read values about to be written • How to detect? • destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right

  28. Detecting Data Hazards add r3, r1, r2 sub r5, r3, r5 or r6, r3, r4 add r6, r3, r8 instmem A A Rd D D D B B inst addr Rb Ra B M dout din imm +4 mem detecthazard PC+4 PC+4 PC Rd Rd Rd Rt OP OP OP ID/EX EX/MEM MEM/WB IF/ID

  29. Takeaway • Data hazards occur when a operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards.

  30. Next Goal • What to do if data hazard detected?

  31. Stalling • How to stall an instruction in ID stage • prevent IF/ID pipeline register update • stalls the ID stage instruction • convert ID stage instr into nop for later stages • innocuous “bubble” passes through pipeline • prevent PC update • stalls the next (IF stage) instruction

  32. Detecting Data Hazards add r3, r1, r2 sub r5, r3, r5 or r6, r3, r4 add r6, r3, r8 instmem A A Rd D D D B B inst addr Rb Ra B M dout din imm +4 mem detecthazard PC+4 PC+4 PC Rd Rd Rd Rt If detect hazard OP OP OP MemWr=0 RegWr=0 WE=0 ID/EX EX/MEM MEM/WB IF/ID

  33. Stalling Clock cycle time

  34. Stalling Clock cycle time r3 = 10 r3 = 20

  35. Stalling A A D D DrD instmem B B datamem rArB inst M B Rd +4 Rd Rd (MemWr=0 RegWr=0) WE PC WE WE nop Op Op Op sub r5,r3,r5 add r3,r1,r2 (WE=0) or r6,r3,r4 /stall NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

  36. Stalling A A D D DrD instmem B B datamem rArB inst M B Rd +4 Rd Rd (MemWr=0 RegWr=0) WE PC WE WE nop (MemWr=0 RegWr=0) Op Op Op nop sub r5,r3,r5 add r3,r1,r2 (WE=0) or r6,r3,r4 /stall NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

  37. Stalling A A D D DrD instmem B B datamem rArB inst M B Rd +4 Rd Rd (MemWr=0 RegWr=0) WE PC WE WE nop (MemWr=0 RegWr=0) (MemWr=0 RegWr=0) Op Op Op nop nop sub r5,r3,r5 add r3,r1,r2 (WE=0) or r6,r3,r4 /stall NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

  38. Stalling • How to stall an instruction in ID stage • prevent IF/ID pipeline register update • stalls the ID stage instruction • convert ID stage instr into nop for later stages • innocuous “bubble” passes through pipeline • prevent PC update • stalls the next (IF stage) instruction

  39. Takeaway • Data hazards occur when a operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards. • Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs (“bubbles”) into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing writes to IF/ID registers from changing, and (3) preventing writes to memory and register file. Bubbles in pipeline significantly decrease performance.

  40. Next Goal: Resolving Data Hazards via Forwarding • What to do if data hazard detected? • Wait/Stall • Reorder in Software (SW) • Forward/Bypass

  41. Forwarding • Forwarding bypasses some pipelined stages forwarding a result to a dependent instruction operand (register). • Three types of forwarding/bypass • Forwarding from Ex/Mem registers to Ex stage (MEx) • Forwarding from Mem/WB register to Ex stage (WEx) • RegisterFile Bypass

  42. Forwarding Datapath A D A B D D instmem B datamem M B imm Rd Rd detecthazard Rb forwardunit WE WE Ra MC MC IF/ID ID/Ex Ex/Mem Mem/WB • Three types of forwarding/bypass • Forwarding from Ex/Mem registers to Ex stage (MEx) • Forwarding from Mem/WB register to Ex stage (W  Ex) • RegisterFileBypass

  43. Forwarding Datapath A D A B D D instmem B datamem M B imm Rd Rd detecthazard Rb forwardunit WE WE Ra MC MC IF/ID ID/Ex Ex/Mem Mem/WB • Three types of forwarding/bypass • Forwarding from Ex/Mem registers to Ex stage (MEx) • Forwarding from Mem/WB register to Ex stage (W  Ex) • RegisterFileBypass

  44. Forwarding Datapath 1 • Ex/MEM to EX Bypass • EX needs ALU result that is still in MEM stage • Resolve: Add a bypass from EX/MEM.D to start of EX • How to detect? Logic in Ex Stage: • forward = (Ex/M.WE && EX/M.Rd != 0 && • ID/Ex.Ra == Ex/M.Rd) • || (same for Rb) • destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right

  45. Forwarding Datapath 1 A D B instmem datamem IF ID Ex M W IF ID Ex M W

  46. Forwarding Datapath 2 • Mem/WB to EX Bypass • EX needs value being written by WB • Resolve: Add bypass from WB final value to start of EX • How to detect? Logic in Ex Stage: • forward = (M/WB.WE && M/WB.Rd != 0 && • ID/Ex.Ra == M/WB.Rd&& • not (ID/Ex.WE && Ex/M.Rd != 0 && • ID/Ex.Ra == Ex/M.Rd) • || (same for Rb) • destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right Check pg. 369

  47. Forwarding Datapath 2 A D B instmem datamem IF ID Ex M W IF ID Ex M W IF ID Ex M W

  48. Register File Bypass • Register File Bypass • Reading a value that is currently being written • Detect: • ((Ra == MEM/WB.Rd) or (Rb == MEM/WB.Rd)) and (WB is writing a register) • Resolve: • Add a bypass around register file (WB to ID) • Better: (Hack) just negate register file clock • writes happen at end of first half of each clock cycle • reads happen during second half of each clock cycle

  49. Register File Bypass A D B instmem datamem IF ID Ex M W IF ID Ex M W IF ID Ex M W IF ID Ex M W

  50. Forwarding Example Clock cycle time r3 = 10 r3 = 20

More Related