1 / 48

Pipeline Implementation

Pipeline Implementation. Outline. Handle Control Hazard Special cases Suggested Reading 4.5. Control Dependence. Example: loop: subl %edx, %ebx jne targ irmovl $10, %edx jmp loop targ: Halt The jne instruction create a control dependency Which instruction will be executed?.

howe
Download Presentation

Pipeline Implementation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pipeline Implementation

  2. Outline • Handle Control Hazard • Special cases • Suggested Reading 4.5

  3. Control Dependence • Example: loop: subl %edx, %ebx jne targ irmovl $10, %edx jmp loop targ: Halt • The jne instruction create a control dependency • Which instruction will be executed?

  4. Branch Misprediction Example demo-j.ys 0x000: xorl %eax,%eax 0x002: jne t # Not taken 0x007: irmovl $1, %eax # Fall through 0x00d: nop 0x00e: nop 0x00f: nop 0x010: halt 0x011: t: irmovl $3, %edx # Target (Should not execute) 0x017: irmovl $4, %ecx # Should not execute 0x01d: irmovl $5, %edx # Should not execute • Should only execute first 7 instructions

  5. PC PC increment predPC F D valC icode valP ifun rA rB Instruction Memory Instruction Select PC PredictPC Fetch f_PC M_Bch M_valA SelectPC M_icode W_valM W_icode Int f_predPC = [ f_icode in {IJXX, ICALL} : f_valC; 1: f_valP; ];

  6. # demo - j 1 2 3 4 5 6 7 8 9 F D E M W 0x000: xorl % eax ,% eax F D E M W 0x002: jne t # Not taken F D E M W 0x011: t: irmovl $3, % edx # Target F D E M W 0x017: irmovl $4, % ecx # Target+1 F F D D E E M M W W 0x007: irmovl $1, % eax # Fall Through Cycle 5 M M_Bch = 0 M_ valA = 0x007 E E f f valE valE 3 3 dstE dstE = = % % edx edx D D valC valC = = 4 4 dstE dstE = = % % ecx ecx F F f f valC valC 1 1 f f rB rB % % eax eax Branch Misprediction Trace • Incorrectly execute two instructions at branch target

  7. Select PC int f_PC = [ #mispredicted branch. Fetch at incremented PC M_icode == IJXX && !M_Bch : M_valA; #completion of RET instruciton W_icode == IRET : W_valM; #default: Use predicted value of PC 1: F_predPC ];

  8. Return Example demo-ret.ys 0x000: irmovl Stack,%esp # Intialize stack pointer 0x006: nop # Avoid hazard on %esp 0x007: nop 0x008: nop 0x009: call p # Procedure call 0x00e: irmovl $5,%esi # Return point 0x014: halt 0x020: .pos 0x20 0x020: p: nop # procedure 0x021: nop 0x022: nop 0x023: ret 0x024: irmovl $1,%eax # Should not be executed 0x02a: irmovl $2,%ecx # Should not be executed 0x030: irmovl $3,%edx # Should not be executed 0x036: irmovl $4,%ebx # Should not be executed 0x100: .pos 0x100 0x100: Stack: # Stack: Stack pointer • Require lots of nops to avoid data hazards

  9. # demo # demo - - ret ret F F D D E E M M W W 0x023: ret 0x023: ret F F D D E E M M W W 0x024: 0x024: irmovl irmovl $1,% $1,% eax eax # Oops! # Oops! F F D D E E M M W W 0x02a: 0x02a: irmovl irmovl $2,% $2,% ecx ecx # Oops! # Oops! F F D D E E M M W W 0x030: 0x030: irmovl irmovl $3,% $3,% edx edx # Oops! # Oops! F F F F D D D D E E E E M M M M W W W W 0x00e: 0x00e: irmovl irmovl $5,% $5,% esi esi # Return # Return W W W valM valM valM = = = 0x0e 0x0e 0x0e M M M valE valE valE = = = 1 1 1 dstE dstE dstE = = = % % % eax eax eax E E E f f f valE valE valE 2 2 2 dstE dstE dstE = = = % % % ecx ecx ecx D D D valC valC valC = = = 3 3 3 dstE dstE dstE = = = % % % edx edx edx F F F f f f valC valC valC 5 5 5 f f f rB rB rB % % % esi esi esi Incorrect Return Example • Incorrectly execute 3 instructions following ret

  10. Handling Misprediction demo-j2.ys 0x000: xorl %eax,%eax 0x002: jne t # Not taken 0x007: irmovl $1, %eax # Fall through 0x00d: nop 0x00e: halt 0x00f: t: nop # Target 0x010: nop 0x011: irmovl $3, %edx 0x017: irmovl $4, %ecx 0x01d: irmovl $5, %edx

  11. 1 2 3 4 5 6 7 8 9 10 F F D D E E M M W W F F D D E E M M W W F D E M W F D E M W F F D D E E M M W W F F D D E E M M W W Handling Misprediction #demo-j.ys 0x00: xorl %eax, %eax 0x002: jne target #Not Taken 0x11: t:irmovl $2, %edx #Target bubble 0x017: irmovl $3, %ebx #Target+1 bubble 0x007: irmovl $1, %eax #Fall through 0x00d: nop • Predict branch as taken • Fetch 2 instructions at target • Cancel when mispredicted • Detect branch not-taken in execute stage • On following cycle, replace instructions in execute and decode by bubbles • No side effects have occurred yet

  12. Detecting Mispredicted Branch icode M dstE dstM Bch valE valA e_Bch ALU ALU ALU CC CC fun. ALU ALU A B Execute E dstM srcA srcB icode ifun valC valA valB dstE

  13. 1 2 3 4 5 6 7 8 9 10 F F D D E E M M W W F F D D E E M M W W F D E M W F D E M W F F D D E E M M W W F F D D E E M M W W Control for Misprediction #demo-j.ys 0x00: xorl %eax, %eax 0x002: jne target #Not Taken 0x11: t:irmovl $2, %edx #Target bubble 0x017: irmovl $3, %ebx #Target+1 bubble 0x007: irmovl $1, %eax #Fall through 0x00d: nop

  14. E_ icode W icode valE valM dstE dstM M icode Bch valE valA dstE dstM e_Bch CC CC E_bubble Pipe E icode ifun valC valA valB dstE dstM srcA srcB control logic srcB srcA D_bubble D icode ifun rA rB valC valP F predPC

  15. Return Example demo-retb.ys 0x000: irmovl Stack,%esp # Initialize stack pointer 0x006: call p # Procedure call 0x00b: irmovl $5,%esi # Return point 0x011: halt 0x020: .pos 0x20 0x020: p: irmovl $-1,%edi # procedure 0x026: ret 0x027: irmovl $1,%eax # Should not be executed 0x02d: irmovl $2,%ecx # Should not be executed 0x033: irmovl $3,%edx # Should not be executed 0x039: irmovl $4,%ebx # Should not be executed 0x100: .pos 0x100 0x100: Stack: # Stack: Stack pointer • Previously executed three additional instructions

  16. F D E M W F D E M W F D E M W F D E M W F F D D E E M M W W W W valM valM = = 0x0b 0x0b • • • F F f f valC valC 5 5 f f rB rB % % esi esi Correct Return Example #demo_retb 0x026: ret bubble bubble bubble 0x00b: irmovl $5, %esi #return • As ret passes through pipeline, stall at fetch stage • While in decode, execute, and memory stage • fetch the same instruction after ret 3 times. • Inject bubble into decode stage • Release stall when reach write-back stage

  17. Detecting Return M icode Bch valE valA dstE dstM e_Bch ALU ALU ALU CC CC fun. ALU ALU A B Execute E icode ifun valC valA valB dstE dstM srcA srcB d_ srcA d_ srcB dstE dstM srcA srcB Fwd Sel +Fwd A B Decode M A B Register Register file file E D icode ifun rA rB valC valP

  18. Control for Return # demo-retb F D E M W 0x026: ret F D E M W bubble F D E M W bubble F D E M W bubble F F D D E E M M W W 0x00b: irmovl esi # Return $5,%

  19. E_ icode D_ icode W icode valE valM dstE dstM M_ icode M icode Bch valE valA dstE dstM Pipe E icode ifun valC valA valB dstE dstM srcA srcB control logic D_bubble D icode ifun rA rB valC valP F_stall F predPC

  20. Control Cases • Detection • Action

  21. E_ icode d_ srcB d_ srcA D_ icode W icode valE valM dstE dstM M_ icode M icode Bch valE valA dstE dstM e_Bch CC CC E_ dstM E_bubble Pipe E icode ifun valC valA valB dstE dstM srcA srcB control logic srcB srcA D_bubble D icode ifun rA rB valC valP D_stall F_stall F predPC

  22. Implementing Pipeline Control • Combinational logic generates pipeline control signals in the end of current cycle • Action occurs at start of following cycle

  23. Initial Version of Pipeline Control bool F_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool D_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

  24. Initial Version of Pipeline Control bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool E_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};

  25. 1 1 1 2 2 2 3 3 3 Load/use Mispredict Mispredict ret ret ret ret ret ret ret ret ret ret ret ret M M M M M M M M M M M M Load bubble bubble bubble JXX JXX ret ret ret E E E E E E E E E E E E Use bubble bubble bubble bubble bubble bubble ret ret ret D D D D D D D D D D D D Combination A Combination B Control Combinations • Special cases that can arise on same clock cycle • Combination A • Not-taken branch • ret instruction at branch target • Combination B • Instruction that reads from memory to %esp • Followed by ret instruction

  26. 1 ret M E ret D 1 1 Mispredict Mispredict ret ret M M M M JXX JXX E E E E ret ret D D D D Combination A Control Combination A PC Instruction PC PC memory increment memory f_PC Fetch M_ valA Select PC W_ valM F predPC

  27. Control Combination A • Should handle as mispredicted branch • Stalls F pipeline register • But PC selection logic will be using M_valA anyhow

  28. Control Combination B Load/use ret • Would attempt to bubble and stall pipeline register D • Signaled by processor as pipeline error M M M M Load E E E E ret Use ret ret D D D D Combination B

  29. Handling Control Combination B • Load/use hazard should get priority • ret instruction should be held in decode stage for additional cycle

  30. Corrected Pipeline Control Logic bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not condition for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

  31. Pipeline Summary • Data Hazards • Most handled by forwarding • No performance penalty • Load/use hazard requires one cycle stall • Control Hazards • Cancel instructions when detect mispredicted branch • Two clock cycles wasted • Stall fetch stage while ret passes through pipeline • Three clock cycles wasted

  32. Pipeline Summary • Control Combinations • Must analyze carefully • First version had subtle bug • Only arises with unusual instruction combination

  33. Exception Handling

  34. Exception • Condition: Instruction encounter an error condition • Deal flow: • Break the program flow • Invoke the exception handler provided by OS. • (Maybe) continue the program flow • E.g. page fault exception

  35. Exceptions • Conditions under which pipeline cannot continue normal operation • Causes • Halt instruction (Current) • Bad address for instruction or data (Previous) • Invalid instruction (Previous) • Pipeline control error (Previous) • Desired Action • Complete some instructions • Either current or previous (depends on exception type) • Discard others • Call exception handler • Like an unexpected procedure call

  36. Exception Examples • Detect in Fetch Stage jmp $-1 # Invalid jump target .byte 0xFF # Invalid instruction code halt # Halt instruction • Detect in Memory Stage irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address

  37. 5 W 1 2 3 4 M F D E M E F D E D F D F Exception detected Exceptions in Pipeline Processor #1 # demo-exc1.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # Invalid address nop .byte 0xFF # Invalid instruction code • Desired Behavior • rmmovl should cause exception Exception detected 0x000: irmovl $100,%eax 0x006: rmmovl %eax,0x10000(%eax) 0x00c: nop 0x00d: .byte 0xFF

  38. 4 5 6 7 8 9 1 2 3 M W 0x000: xorl %eax,%eax F D E E M 0x002: jne t F D D E M W 0x014: t: .byte 0xFF F F D E M W 0x???: (I’m lost!) F D E M W 0x007: irmovl $1,%eax Exceptions in Pipeline Processor #2 # demo-exc2.ys 0x000: xorl %eax,%eax # Set condition codes 0x002: jne t # Not taken 0x007: irmovl $1,%eax 0x00d: irmovl $2,%edx 0x013: halt 0x014: t: .byte 0xFF # Target • Desired Behavior • No exception should occur Exception detected

  39. W stat icode valE valM dstE dstM M stat icode Bch valE valA dstE dstM E stat icode ifun valC valA valB dstE dstM srcA srcB D stat icode ifun rA rB valC valP F predPC Maintaining Exception Ordering • Add exception status field to pipeline registers • Fetch stage sets to either “AOK”, “ADR” (when bad fetch address), or “INS” (illegal instruction) • Decode & execute pass values through • Memory either passes through or sets to “ADR” • Exception triggered only when instruction hits write back

  40. 1 2 3 4 5 F D E M W 0x000: irmovl $100,%eax F D E M 0x006: rmmovl %eax,0x10000(%eax) F D E 0x00c: addl %eax,%eax Side Effects in Pipeline Processor Exception detected # demo-exc3.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address addl %eax,%eax # Sets condition codes • Desired Behavior • rmmovl should cause exception • No following instruction should have any effect Condition code set

  41. Avoiding Side Effects • Presence of Exception Should Disable State Update • When detect exception in memory stage • Disable condition code setting in execute • Must happen in same clock cycle • When exception passes to write-back stage • Disable memory write in memory stage • Disable condition code setting in execute stage • Implementation • Hardwired into the design of the PIPE simulator • You have no control over this

  42. Rest of Exception Handling • Calling Exception Handler • Push PC onto stack • Either PC of faulting instruction or of next instruction • Usually pass through pipeline along with exception status • Jump to handler address • Usually fixed address • Defined as part of ISA • Implementation • Haven’t tried it yet!

  43. Processor Summary • Design Technique • Create uniform framework for all instructions • Want to share hardware among instructions • Connect standard logic blocks with bits of control logic • Operation • State held in memories and clocked registers • Computation done by combinational logic • Clocking of registers/memories sufficient to control overall behavior • Enhancing Performance • Pipelining increases throughput and improves resource utilization • Must make sure maintains ISA behavior

  44. Performance Metrics • Clock rate • Measured in Megahertz or Gigahertz • Function of stage partitioning and circuit design • Keep amount of work per stage small and even • Rate at which instructions executed • CPI: cycles per instruction • On average, how many clock cycles does each instruction require? • Function of pipeline design and benchmark programs • E.g., how frequently are branches mispredicted?

  45. CPI for PIPE • CPI  1.0 • Fetch instruction each clock cycle • Effectively process new instruction almost every cycle • Although each individual instruction has latency of 5 cycles • CPI > 1.0 • Sometimes must stall or cancel branches

  46. CPI for PIPE • Computing CPI • C clock cycles • I instructions executed to completion • B bubbles injected (C = I + B) CPI = C/I = (I+B)/I = 1.0 + B/I • Factor B/I represents average penalty due to bubbles

  47. CPI for PIPE (Cont.) Typical Values • LP: Penalty due to load/use hazard stalling • Fraction of instructions that are loads 0.25 • Fraction of load instructions requiring stall 0.20 • Number of bubbles injected each time 1  LP = 0.25 * 0.20 * 1 = 0.05 • MP: Penalty due to mispredicted branches • Fraction of instructions that are cond. jumps 0.20 • Fraction of cond. jumps mispredicted 0.40 • Number of bubbles injected each time 2  MP = 0.20 * 0.40 * 2 = 0.16

  48. CPI for PIPE (Cont.) Typical Values • RP: Penalty due to ret instructions • Fraction of instructions that are returns 0.02 • Number of bubbles injected each time 3  RP = 0.02 * 3 = 0.06 • Net effect of penalties 0.05 + 0.16 + 0.06 = 0.27 • CPI = 1.27 (Not bad!) B/I = LP + MP + RP

More Related