Pipeline Implementation
Outline • Handle Control Hazard • Special cases • Suggested Reading 4.5
Control Dependence
• Example:
    loop:
        subl %edx, %ebx
        jne targ
        irmovl $10, %edx
        jmp loop
    targ:
        halt
• The jne instruction creates a control dependency
• Which instruction will be executed next?
Branch Misprediction Example
demo-j.ys
    0x000:    xorl %eax,%eax
    0x002:    jne t              # Not taken
    0x007:    irmovl $1, %eax    # Fall through
    0x00d:    nop
    0x00e:    nop
    0x00f:    nop
    0x010:    halt
    0x011: t: irmovl $3, %edx    # Target (Should not execute)
    0x017:    irmovl $4, %ecx    # Should not execute
    0x01d:    irmovl $5, %edx    # Should not execute
• Should only execute first 7 instructions
[Figure: fetch stage. Blocks: Select PC, Instruction Memory, PC increment, Predict PC. Signals: F_predPC, f_PC, M_icode, M_Bch, M_valA, W_icode, W_valM; outputs to pipeline register D: icode, ifun, rA, rB, valC, valP.]
    int f_predPC = [
        f_icode in { IJXX, ICALL } : f_valC;
        1 : f_valP;
    ];
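The prediction rule above can be sketched in Python. This is only an illustrative model, not the simulator's code: the function name `predict_pc` and the string encodings of the instruction codes are assumptions made for this example.

```python
# Hypothetical string encodings of the icode values used in the HCL.
IJXX, ICALL = "IJXX", "ICALL"

def predict_pc(f_icode, f_valC, f_valP):
    """Model of the HCL rule: jumps and calls predict the target address
    (valC); every other instruction predicts the incremented PC (valP)."""
    if f_icode in (IJXX, ICALL):
        return f_valC
    return f_valP

# jne t at 0x002 in demo-j.ys: target 0x011, fall-through address 0x007.
predicted_for_jne = predict_pc(IJXX, 0x011, 0x007)
# irmovl at 0x007: no target, so the next sequential address is predicted.
predicted_for_irmovl = predict_pc("IIRMOVL", 0x1, 0x00d)
```

Because conditional jumps are always predicted taken, the jne in demo-j.ys sends fetch to 0x011 even though the branch will turn out not to be taken.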
Branch Misprediction Trace
    # demo-j                                    1 2 3 4 5 6 7 8 9
    0x000:    xorl %eax,%eax                    F D E M W
    0x002:    jne t             # Not taken       F D E M W
    0x011: t: irmovl $3,%edx    # Target            F D E M W
    0x017:    irmovl $4,%ecx    # Target+1            F D E M W
    0x007:    irmovl $1,%eax    # Fall through          F D E M W
Pipeline state in cycle 5:
    M: M_Bch = 0, M_valA = 0x007
    E: valE ← 3, dstE = %edx
    D: valC = 4, dstE = %ecx
    F: valC ← 1, rB ← %eax
• Incorrectly execute two instructions at branch target
Select PC
    int f_PC = [
        # Mispredicted branch. Fetch at incremented PC
        M_icode == IJXX && !M_Bch : M_valA;
        # Completion of RET instruction
        W_icode == IRET : W_valM;
        # Default: Use predicted value of PC
        1 : F_predPC;
    ];
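A minimal Python sketch of the same priority selection follows. The function name `select_pc` and the string instruction codes are assumptions for illustration; the cases are tested in the same order as the HCL.

```python
def select_pc(M_icode, M_Bch, M_valA, W_icode, W_valM, F_predPC):
    """Model of the Select PC logic: the first matching case wins."""
    # Mispredicted branch: M_valA carries the fall-through address.
    if M_icode == "IJXX" and not M_Bch:
        return M_valA
    # ret has reached write-back: the return address was read from memory.
    if W_icode == "IRET":
        return W_valM
    # Default: use the predicted PC.
    return F_predPC

# Cycle 5 of the demo-j trace: jne is in memory with M_Bch = 0, M_valA = 0x007.
after_mispredict = select_pc("IJXX", False, 0x007, "IOPL", 0, 0x01d)
# demo-ret: ret is in write-back with W_valM = 0x00e.
after_ret = select_pc("INOP", False, 0, "IRET", 0x00e, 0x036)
```

The predicted-PC arguments (0x01d and 0x036) are simply the addresses fetch would have used next in each trace; they are overridden in both cases.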
Return Example
demo-ret.ys
    0x000:    irmovl Stack,%esp  # Initialize stack pointer
    0x006:    nop                # Avoid hazard on %esp
    0x007:    nop
    0x008:    nop
    0x009:    call p             # Procedure call
    0x00e:    irmovl $5,%esi     # Return point
    0x014:    halt
    0x020: .pos 0x20
    0x020: p: nop                # procedure
    0x021:    nop
    0x022:    nop
    0x023:    ret
    0x024:    irmovl $1,%eax     # Should not be executed
    0x02a:    irmovl $2,%ecx     # Should not be executed
    0x030:    irmovl $3,%edx     # Should not be executed
    0x036:    irmovl $4,%ebx     # Should not be executed
    0x100: .pos 0x100
    0x100: Stack:                # Stack: Stack pointer
• Requires lots of nops to avoid data hazards
Incorrect Return Example
    # demo-ret
    0x023:    ret                               F D E M W
    0x024:    irmovl $1,%eax    # Oops!           F D E M W
    0x02a:    irmovl $2,%ecx    # Oops!             F D E M W
    0x030:    irmovl $3,%edx    # Oops!               F D E M W
    0x00e:    irmovl $5,%esi    # Return                F D E M W
Pipeline state when the return point is fetched:
    W: valM = 0x0e
    M: valE = 1, dstE = %eax
    E: valE ← 2, dstE = %ecx
    D: valC = 3, dstE = %edx
    F: valC ← 5, rB ← %esi
• Incorrectly execute 3 instructions following ret
Handling Misprediction
demo-j2.ys
    0x000:    xorl %eax,%eax
    0x002:    jne t              # Not taken
    0x007:    irmovl $1, %eax    # Fall through
    0x00d:    nop
    0x00e:    halt
    0x00f: t: nop                # Target
    0x010:    nop
    0x011:    irmovl $3, %edx
    0x017:    irmovl $4, %ecx
    0x01d:    irmovl $5, %edx
Handling Misprediction
    # demo-j.ys                                 1 2 3 4 5 6 7 8 9 10
    0x000:    xorl %eax,%eax                    F D E M W
    0x002:    jne target        # Not taken       F D E M W
    0x011: t: irmovl $2,%edx    # Target            F D
              bubble                                    E M W
    0x017:    irmovl $3,%ebx    # Target+1            F
              bubble                                    D E M W
    0x007:    irmovl $1,%eax    # Fall through          F D E M W
    0x00d:    nop                                         F D E M W
• Predict branch as taken
  • Fetch 2 instructions at target
• Cancel when mispredicted
  • Detect branch not-taken in execute stage
  • On following cycle, replace instructions in execute and decode by bubbles
  • No side effects have occurred yet
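Detecting Mispredicted Branch
[Figure: execute stage. Pipeline register E (icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB) feeds the ALU and the condition codes (CC); the signal e_Bch is stored in the Bch field of pipeline register M (icode, Bch, valE, valA, dstE, dstM).]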
Control for Misprediction
    # demo-j.ys                                 1 2 3 4 5 6 7 8 9 10
    0x000:    xorl %eax,%eax                    F D E M W
    0x002:    jne target        # Not taken       F D E M W
    0x011: t: irmovl $2,%edx    # Target            F D
              bubble                                    E M W
    0x017:    irmovl $3,%ebx    # Target+1            F
              bubble                                    D E M W
    0x007:    irmovl $1,%eax    # Fall through          F D E M W
    0x00d:    nop                                         F D E M W
[Figure: pipe control logic for misprediction. Inputs: E_icode, e_Bch. Outputs: D_bubble, E_bubble, applied to pipeline registers D and E.]
Return Example
demo-retb.ys
    0x000:    irmovl Stack,%esp  # Initialize stack pointer
    0x006:    call p             # Procedure call
    0x00b:    irmovl $5,%esi     # Return point
    0x011:    halt
    0x020: .pos 0x20
    0x020: p: irmovl $-1,%edi    # procedure
    0x026:    ret
    0x027:    irmovl $1,%eax     # Should not be executed
    0x02d:    irmovl $2,%ecx     # Should not be executed
    0x033:    irmovl $3,%edx     # Should not be executed
    0x039:    irmovl $4,%ebx     # Should not be executed
    0x100: .pos 0x100
    0x100: Stack:                # Stack: Stack pointer
• Previously executed three additional instructions
Correct Return Example
    # demo-retb
    0x026:    ret                               F D E M W
              bubble                              F D E M W
              bubble                                F D E M W
              bubble                                  F D E M W
    0x00b:    irmovl $5,%esi    # Return                F D E M W
Pipeline state when the return point is fetched:
    W: valM = 0x0b
    F: valC ← 5, rB ← %esi
• As ret passes through pipeline, stall at fetch stage
  • While ret is in the decode, execute, and memory stages
  • Fetch the instruction after ret 3 times
• Inject bubble into decode stage
• Release stall when ret reaches write-back stage
Detecting Return
[Figure: decode and execute stages. The icode fields of pipeline registers D, E, and M are examined to detect a ret in progress.]
Control for Return
    # demo-retb
    0x026:    ret                               F D E M W
              bubble                              F D E M W
              bubble                                F D E M W
              bubble                                  F D E M W
    0x00b:    irmovl $5,%esi    # Return                F D E M W
[Figure: pipe control logic for ret. Inputs: D_icode, E_icode, M_icode. Outputs: F_stall, D_bubble.]
Control Cases • Detection • Action
[Figure: complete pipe control logic. Inputs: D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch. Outputs: F_stall, D_stall, D_bubble, E_bubble.]
Implementing Pipeline Control
• Combinational logic generates pipeline control signals by the end of the current cycle
• Action occurs at start of following cycle
Initial Version of Pipeline Control
    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
Initial Version of Pipeline Control (Cont.)
    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
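The four initial control equations can be modeled in Python to check which action each basic hazard triggers. This is a sketch under assumptions: the function names, the string encodings of icodes and registers, and the `"RNONE"` placeholder for "no register" are all invented for this example.

```python
LOADS = ("IMRMOVL", "IPOPL")  # instructions that read memory into a register

def load_use(E_icode, E_dstM, d_srcA, d_srcB):
    """Load in execute whose destination is read by the instruction in decode."""
    return E_icode in LOADS and E_dstM in (d_srcA, d_srcB)

def initial_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Initial (pre-fix) versions of the four control signals."""
    lu = load_use(E_icode, E_dstM, d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    return {
        "F_stall": lu or ret,
        "D_stall": lu,
        "D_bubble": mispredict or ret,
        "E_bubble": mispredict or lu,
    }

# Mispredicted branch: jne in execute resolves as not taken.
branch_case = initial_control("IIRMOVL", "IJXX", "IOPL",
                              "RNONE", "RNONE", "%edx", False)
# ret in decode, no other hazard.
ret_case = initial_control("IRET", "INOP", "INOP",
                           "RNONE", "%esp", "%esp", False)
# Load into %eax in execute, addl reading %eax in decode.
load_case = initial_control("IOPL", "IMRMOVL", "INOP",
                            "%eax", "%eax", "%ebx", False)
```

Taken one at a time, the three hazards give consistent actions: a misprediction bubbles D and E, a ret stalls F and bubbles D, and a load/use hazard stalls F and D and bubbles E.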
Control Combinations
[Figure: pipeline-state diagrams for the load/use, mispredict, and ret conditions (ret shown in D, E, and M), with Combination A and Combination B marked.]
• Special cases that can arise on same clock cycle
• Combination A
  • Not-taken branch
  • ret instruction at branch target
• Combination B
  • Instruction that reads from memory to %esp
  • Followed by ret instruction
Control Combination A
[Figure: Combination A, with JXX in execute and ret in decode, shown next to the fetch-stage Select PC logic with inputs F_predPC, M_valA, and W_valM.]
Control Combination A • Should handle as mispredicted branch • Stalls F pipeline register • But PC selection logic will be using M_valA anyhow
Control Combination B
[Figure: Combination B, with the load in execute (E) and the ret that uses the loaded value in decode (D).]
• Would attempt to bubble and stall pipeline register D
• Signaled by processor as pipeline error
Handling Control Combination B • Load/use hazard should get priority • ret instruction should be held in decode stage for additional cycle
Corrected Pipeline Control Logic
    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });
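To see why the extra term matters, the following Python sketch evaluates D_stall and D_bubble for Combination B under both versions of the logic. The helper name `d_signals`, its `corrected` flag, and the string encodings are assumptions made for this illustration.

```python
LOADS = ("IMRMOVL", "IPOPL")

def d_signals(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch,
              corrected):
    """Return (D_stall, D_bubble) under the initial or corrected logic."""
    lu = E_icode in LOADS and E_dstM in (d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    if corrected:
        # ret no longer bubbles D while a load/use hazard is present.
        d_bubble = mispredict or (ret and not lu)
    else:
        d_bubble = mispredict or ret
    return lu, d_bubble

# Combination B: a load into %esp is in execute and ret (which reads %esp)
# is in decode.
combo_b = dict(D_icode="IRET", E_icode="IMRMOVL", M_icode="INOP",
               E_dstM="%esp", d_srcA="%esp", d_srcB="%esp", e_Bch=False)
initial = d_signals(**combo_b, corrected=False)
fixed = d_signals(**combo_b, corrected=True)
```

Under the initial logic both signals are asserted for pipeline register D, which is the conflicting request described above; with the correction only the stall remains, so ret is held in decode for one more cycle.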
Pipeline Summary • Data Hazards • Most handled by forwarding • No performance penalty • Load/use hazard requires one cycle stall • Control Hazards • Cancel instructions when detect mispredicted branch • Two clock cycles wasted • Stall fetch stage while ret passes through pipeline • Three clock cycles wasted
Pipeline Summary • Control Combinations • Must analyze carefully • First version had subtle bug • Only arises with unusual instruction combination
Exception
• Condition: an instruction encounters an error condition
• Handling flow:
  • Break the program flow
  • Invoke the exception handler provided by the OS
  • (Maybe) continue the program flow
• E.g., page fault exception
Exceptions • Conditions under which pipeline cannot continue normal operation • Causes • Halt instruction (Current) • Bad address for instruction or data (Previous) • Invalid instruction (Previous) • Pipeline control error (Previous) • Desired Action • Complete some instructions • Either current or previous (depends on exception type) • Discard others • Call exception handler • Like an unexpected procedure call
Exception Examples
• Detect in Fetch Stage
    jmp $-1                     # Invalid jump target
    .byte 0xFF                  # Invalid instruction code
    halt                        # Halt instruction
• Detect in Memory Stage
    irmovl $100,%eax
    rmmovl %eax,0x10000(%eax)   # Invalid address
Exceptions in Pipeline Processor #1
    # demo-exc1.ys
    irmovl $100,%eax
    rmmovl %eax,0x10000(%eax)   # Invalid address
    nop
    .byte 0xFF                  # Invalid instruction code

                                          1 2 3 4 5
    0x000:    irmovl $100,%eax            F D E M W
    0x006:    rmmovl %eax,0x10000(%eax)     F D E M   # Exception detected (M, cycle 5)
    0x00c:    nop                             F D E
    0x00d:    .byte 0xFF                        F D   # Exception detected (F, cycle 4)
• Desired Behavior
  • rmmovl should cause exception
Exceptions in Pipeline Processor #2
    # demo-exc2.ys
    0x000:    xorl %eax,%eax     # Set condition codes
    0x002:    jne t              # Not taken
    0x007:    irmovl $1,%eax
    0x00d:    irmovl $2,%edx
    0x013:    halt
    0x014: t: .byte 0xFF         # Target

                                                1 2 3 4 5 6 7 8 9
    0x000:    xorl %eax,%eax                    F D E M W
    0x002:    jne t                               F D E M W
    0x014: t: .byte 0xFF                            F D E M W  # Exception detected (F)
    0x???:    (I’m lost!)                             F D E M W
    0x007:    irmovl $1,%eax                            F D E M W
• Desired Behavior
  • No exception should occur
Maintaining Exception Ordering
[Figure: pipeline registers W, M, E, and D, each extended with a stat field; F holds predPC.]
• Add exception status field to pipeline registers
• Fetch stage sets to either “AOK”, “ADR” (when bad fetch address), or “INS” (illegal instruction)
• Decode & execute pass values through
• Memory either passes through or sets to “ADR”
• Exception triggered only when instruction hits write back
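The status rules above can be sketched as three small Python functions. This is a simplified model with invented names (`fetch_stat`, `memory_stat`, `exception_at_writeback`), not the simulator's code; it covers only the three status values named on the slide.

```python
AOK, ADR, INS = "AOK", "ADR", "INS"

def fetch_stat(addr_ok, instr_valid):
    """Status assigned in fetch: bad address, illegal instruction, or OK."""
    if not addr_ok:
        return ADR
    if not instr_valid:
        return INS
    return AOK

def memory_stat(M_stat, dmem_error):
    """Memory stage passes the status through unless the data access fails."""
    return ADR if dmem_error else M_stat

def exception_at_writeback(W_stat):
    """An exception is triggered only when a non-AOK status reaches write-back."""
    return W_stat != AOK

# demo-exc1.ys: rmmovl is fetched normally, then faults on its data address.
rmmovl_stat = memory_stat(fetch_stat(True, True), dmem_error=True)
# demo-exc2.ys: .byte 0xFF is marked INS in fetch, but it is cancelled by the
# misprediction logic, so its status never reaches write-back.
byte_stat = fetch_stat(True, False)
```

Carrying the status along with the instruction is what keeps exceptions in program order: an instruction fetched down a mispredicted path is replaced by a bubble before its status can take effect.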
Side Effects in Pipeline Processor
    # demo-exc3.ys
    irmovl $100,%eax
    rmmovl %eax,0x10000(%eax)   # Invalid address
    addl %eax,%eax              # Sets condition codes

                                          1 2 3 4 5
    0x000:    irmovl $100,%eax            F D E M W
    0x006:    rmmovl %eax,0x10000(%eax)     F D E M   # Exception detected (M, cycle 5)
    0x00c:    addl %eax,%eax                  F D E   # Condition code set (E, cycle 5)
• Desired Behavior
  • rmmovl should cause exception
  • No following instruction should have any effect
Avoiding Side Effects • Presence of Exception Should Disable State Update • When detect exception in memory stage • Disable condition code setting in execute • Must happen in same clock cycle • When exception passes to write-back stage • Disable memory write in memory stage • Disable condition code setting in execute stage • Implementation • Hardwired into the design of the PIPE simulator • You have no control over this
Rest of Exception Handling • Calling Exception Handler • Push PC onto stack • Either PC of faulting instruction or of next instruction • Usually pass through pipeline along with exception status • Jump to handler address • Usually fixed address • Defined as part of ISA • Implementation • Haven’t tried it yet!
Processor Summary • Design Technique • Create uniform framework for all instructions • Want to share hardware among instructions • Connect standard logic blocks with bits of control logic • Operation • State held in memories and clocked registers • Computation done by combinational logic • Clocking of registers/memories sufficient to control overall behavior • Enhancing Performance • Pipelining increases throughput and improves resource utilization • Must make sure maintains ISA behavior
Performance Metrics • Clock rate • Measured in Megahertz or Gigahertz • Function of stage partitioning and circuit design • Keep amount of work per stage small and even • Rate at which instructions executed • CPI: cycles per instruction • On average, how many clock cycles does each instruction require? • Function of pipeline design and benchmark programs • E.g., how frequently are branches mispredicted?
CPI for PIPE
• CPI ≈ 1.0
  • Fetch instruction each clock cycle
  • Effectively process new instruction almost every cycle
  • Although each individual instruction has latency of 5 cycles
• CPI > 1.0
  • Sometimes must stall or cancel instructions
CPI for PIPE
• Computing CPI
  • C clock cycles
  • I instructions executed to completion
  • B bubbles injected (C = I + B)
    CPI = C/I = (I+B)/I = 1.0 + B/I
  • Factor B/I represents average penalty due to bubbles
CPI for PIPE (Cont.)
Typical values:
• LP: Penalty due to load/use hazard stalling
  • Fraction of instructions that are loads: 0.25
  • Fraction of load instructions requiring stall: 0.20
  • Number of bubbles injected each time: 1
    LP = 0.25 * 0.20 * 1 = 0.05
• MP: Penalty due to mispredicted branches
  • Fraction of instructions that are cond. jumps: 0.20
  • Fraction of cond. jumps mispredicted: 0.40
  • Number of bubbles injected each time: 2
    MP = 0.20 * 0.40 * 2 = 0.16
CPI for PIPE (Cont.)
Typical values:
• RP: Penalty due to ret instructions
  • Fraction of instructions that are returns: 0.02
  • Number of bubbles injected each time: 3
    RP = 0.02 * 3 = 0.06
• Net effect of penalties
    B/I = LP + MP + RP = 0.05 + 0.16 + 0.06 = 0.27
  • CPI = 1.27 (Not bad!)
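The arithmetic above can be reproduced with a short Python function, which also makes it easy to try other instruction mixes. The function name `pipe_cpi` and its parameter names are assumptions for this sketch; the bubble counts (1, 2, 3) are the ones stated on the slides.

```python
def pipe_cpi(load_frac, load_stall_frac, jump_frac, mispredict_frac, ret_frac):
    """CPI = 1.0 + B/I, where B/I = LP + MP + RP."""
    lp = load_frac * load_stall_frac * 1   # load/use hazard: 1 bubble
    mp = jump_frac * mispredict_frac * 2   # mispredicted branch: 2 bubbles
    rp = ret_frac * 3                      # ret: 3 bubbles
    return 1.0 + lp + mp + rp

# Typical values from the slides.
typical = pipe_cpi(0.25, 0.20, 0.20, 0.40, 0.02)
# Same mix, but with a hypothetical mispredict rate of 0.20 instead of 0.40.
better_prediction = pipe_cpi(0.25, 0.20, 0.20, 0.20, 0.02)
```

With the typical values the result is 1.27, matching the slide. Halving the misprediction rate (an illustrative figure, not one from the source) would bring the CPI down to about 1.19, which shows that mispredicted branches are the largest of the three penalties.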