CDA 3101 Spring 2016 Introduction to Computer Organization Pipeline Control And Pipeline Hazards 15 March 2016
Control Signals [Figure: pipelined datapath (PC, Instr. Mem, Regs, Sign extend, ALU, ALU Control, Data Mem) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB and the control signals PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg; the RegDst mux selects between rt[20-16] and rd[15-11]]
Control Implementation • Pipelining leaves the meaning of the 9 control lines unchanged • Set control lines (to defined values) in each stage for each instruction • Extend pipeline registers to include control information • Nothing to control during IF and ID • Create control information during ID
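The bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the class names, `CONTROL_TABLE`, and `propagate` are hypothetical, and the signal values are the usual textbook settings for a single-issue MIPS datapath (assumed here, since the slide does not list them). `None` stands for a don't-care value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExCtrl:            # lines used in the EX stage (4 of the 9)
    reg_dst: Optional[int]
    alu_op: int          # 2-bit field, counts as two control lines
    alu_src: int


@dataclass(frozen=True)
class MemCtrl:           # lines used in the MEM stage (3 of the 9)
    branch: int
    mem_read: int
    mem_write: int


@dataclass(frozen=True)
class WbCtrl:            # lines used in the WB stage (2 of the 9)
    reg_write: int
    mem_to_reg: Optional[int]


# Assumed textbook values; None marks a don't-care.
CONTROL_TABLE = {
    "R-format": (ExCtrl(1, 0b10, 0), MemCtrl(0, 0, 0), WbCtrl(1, 0)),
    "lw":       (ExCtrl(0, 0b00, 1), MemCtrl(0, 1, 0), WbCtrl(1, 1)),
    "sw":       (ExCtrl(None, 0b00, 1), MemCtrl(0, 0, 1), WbCtrl(0, None)),
    "beq":      (ExCtrl(None, 0b01, 0), MemCtrl(1, 0, 0), WbCtrl(0, None)),
}


def propagate(kind):
    """Create control in ID, then drop each group once its stage has used it."""
    ex, m, wb = CONTROL_TABLE[kind]                        # generated during ID
    id_ex = {"EX": ex, "M": m, "WB": wb}                   # all three groups
    ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]}          # EX group consumed
    mem_wb = {"WB": ex_mem["WB"]}                          # M group consumed
    return id_ex, ex_mem, mem_wb


if __name__ == "__main__":
    for register, contents in zip(("ID/EX", "EX/MEM", "MEM/WB"), propagate("lw")):
        print(register, contents)
```

Each pipeline register carries only the groups that later stages still need, which is why the extension to the registers shrinks from ID/EX to MEM/WB.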
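Generation/Propagation of Control [Figure: the Control unit decodes the instruction held in IF/ID and produces three groups of control bits; ID/EX carries WB, M, and EX; EX/MEM carries WB and M; MEM/WB carries WB]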
[Figure: pipelined datapath with the Control unit added; the EX, M, and WB control fields travel in the ID/EX, EX/MEM, and MEM/WB pipeline registers and drive PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg]
Example
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
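To see how these five instructions overlap, the following sketch prints a stage-by-cycle diagram. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB) with one instruction issued per cycle and no stalls, which holds here because none of the five instructions reads a register written by an earlier one. The helper names are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw  $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or  $13, $6, $7",
    "add $14, $8, $9",
]


def stage_in_cycle(instr_index, cycle):
    """Stage held by instruction instr_index (0-based) in clock cycle (1-based)."""
    s = cycle - 1 - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None


def total_cycles(n_instructions):
    """Cycles needed to drain n instructions through the pipeline with no stalls."""
    return n_instructions + len(STAGES) - 1


def print_diagram(program):
    n_cycles = total_cycles(len(program))
    header = "".join(f"CC{c:<4}" for c in range(1, n_cycles + 1))
    print(f"{'instruction':<18}{header}")
    for i, text in enumerate(program):
        cells = "".join(
            f"{(stage_in_cycle(i, c) or ''):<6}" for c in range(1, n_cycles + 1)
        )
        print(f"{text:<18}{cells}")


if __name__ == "__main__":
    print_diagram(PROGRAM)
```

Reading one column of the output gives the contents of the pipeline in that clock cycle, for example the cycle in which `lw` is in MEM while `or` is being fetched.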
Limits to Pipelining • Hazards prevent the next instruction from executing during its designated clock cycle • Structural hazards • HW cannot support this combination of instructions • Ex: Single person to fold and put clothes away • Control hazards • Branches stall the pipeline until the branch outcome is known, leaving “bubbles” in the pipeline • Data hazards • Instruction depends on the result of a prior instruction still in the pipeline • Ex: Missing sock
Pipeline Hazards (Example) [Figure: laundry timeline from 6 PM to 2 AM in 30-minute slots; task order is bags A through F, with a bubble in the schedule] • Bag A: the control hazard puts a 90-minute bubble in the pipeline between dryer and folder (done 9 PM) • Bag D: cannot complete until 10:30 PM (only one folder available) • Jim’s green socks: one in … other in … • … depends on … stall since folder busy
Structural Hazard 1: Single Memory [Figure: pipeline diagram, time (clock cycles) against instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4); each instruction passes through I$, Reg, ALU, D$, Reg] • IM = DM => the same memory is read twice in one clock cycle
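A rough way to check the single-memory conflict is to count, cycle by cycle, how many instructions need each memory. The sketch below assumes a five-stage pipeline with no stalls, that every instruction fetches in IF, and that only instructions flagged as data accesses use memory in MEM; the function and resource names are made up for illustration.

```python
from collections import Counter

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (name, accesses data memory in MEM?) in program order
PROGRAM = [
    ("Load", True),
    ("Instr 1", False),
    ("Instr 2", False),
    ("Instr 3", False),
    ("Instr 4", False),
]


def structural_conflicts(program, unified_memory):
    """Return (cycle, resource) pairs where a memory is requested twice."""
    conflicts = []
    n_cycles = len(program) + len(STAGES) - 1
    for cycle in range(1, n_cycles + 1):
        use = Counter()
        for i, (_name, accesses_data) in enumerate(program):
            s = cycle - 1 - i                  # stage index of instruction i
            if not 0 <= s < len(STAGES):
                continue                       # not in the pipeline this cycle
            if STAGES[s] == "IF":
                use["memory" if unified_memory else "I$"] += 1
            elif STAGES[s] == "MEM" and accesses_data:
                use["memory" if unified_memory else "D$"] += 1
        conflicts += [(cycle, res) for res, count in use.items() if count > 1]
    return conflicts


if __name__ == "__main__":
    print("single memory:", structural_conflicts(PROGRAM, unified_memory=True))
    print("split I$/D$:  ", structural_conflicts(PROGRAM, unified_memory=False))
```

With one shared memory, the Load's data access collides with the fetch of Instr 3 in cycle 4; with separate instruction and data memories the list of conflicts is empty.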
Structural Hazard 2: Register File [Figure: pipeline diagram, time (clock cycles) against instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4); each instruction passes through I$, Reg, ALU, D$, Reg] • One instruction tries to read the registers while another writes them in the same clock cycle
Structural Hazards: Solutions • Structural hazard 1: single memory • Two memories? Infeasible and inefficient => Two Level 1 caches (instruction and data) • Structural hazard 2: register file • Register access takes less than ½ of the ALU stage time => Use the following convention: • Always Write during the first half of each cycle • Always Read during the second half of each cycle • Both Read and Write can be performed during the same clock cycle (with a small delay between them)
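The write-first, read-second convention can be modeled very simply. The `RegisterFile` class below is a hypothetical software stand-in for the hardware, assuming 32 registers with `$0` hardwired to zero as in MIPS; it only illustrates the ordering within one cycle.

```python
class RegisterFile:
    """Toy register file: write in the first half-cycle, read in the second."""

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs

    def clock_cycle(self, write=None, reads=()):
        # First half of the cycle: perform the pending write, if any.
        if write is not None:
            reg, value = write
            if reg != 0:                # $0 always reads as zero in MIPS
                self.regs[reg] = value
        # Second half of the cycle: reads observe the value just written.
        return tuple(self.regs[r] for r in reads)


if __name__ == "__main__":
    rf = RegisterFile()
    # An instruction in WB writes $10 while another in ID reads $10 and $1.
    print(rf.clock_cycle(write=(10, 42), reads=(10, 1)))
```

Because the write lands before the read in the same call, the instruction in ID sees the new value of `$10` without any stall.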
Control Hazard: Branch Instr. (1/2) • Branch decision-making hardware in ALU stage • Two more instructions after the branch will always be fetched, whether or not the branch is taken • Desired functionality of a branch • if we do not take the branch, don’t waste any time and continue executing normally • if we take the branch, don’t execute any instructions after the branch, just go to the desired label
Control Hazard: Branch Instr. (2/2) • Initial Solution: Stall until decision is made • Insert “no-op” instructions: those that accomplish nothing, just take time • Drawback: branches take 3 clock cycles each (assuming comparator is put in ALU stage) • Better Solution: Move comparator to Stage 2 • Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched • Therefore, only one no-op is needed • This means that branches are idle in Stages 3, 4 and 5.
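The cost difference between the two solutions can be estimated with a one-line formula. In this sketch the 20% branch frequency is a made-up figure for illustration, and the base CPI of 1 assumes an otherwise ideal pipeline; only the stall counts (two no-ops with the comparator in the ALU stage, one with it in Stage 2) come from the slide.

```python
def effective_cpi(branch_fraction, stall_cycles_per_branch, base_cpi=1.0):
    """Average cycles per instruction when every branch adds a fixed stall."""
    return base_cpi + branch_fraction * stall_cycles_per_branch


if __name__ == "__main__":
    BRANCH_FRACTION = 0.20  # hypothetical share of branches in the program

    # Comparator in the ALU stage: a branch takes 3 cycles, i.e. 2 stall cycles.
    cpi_alu = effective_cpi(BRANCH_FRACTION, stall_cycles_per_branch=2)
    # Comparator moved to Stage 2: only one no-op follows each branch.
    cpi_stage2 = effective_cpi(BRANCH_FRACTION, stall_cycles_per_branch=1)

    print(f"comparator in ALU stage: CPI = {cpi_alu:.2f}")
    print(f"comparator in Stage 2:   CPI = {cpi_stage2:.2f}")
```

Under these assumptions, moving the comparator halves the branch penalty, and the gain grows with the share of branches in the instruction mix.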
Control Hazard: Better Sol’n. • Move comparator up to Stage 2 • Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed • This means that branches are idle in Stages 3, 4 and 5. [Figure: pipeline diagram, time (clock cycles) against instruction order (Add, Beq, Load), with one bubble after Beq before Load is fetched]
Best: Delayed Branches (1/2) • Old definition: if we take the branch, none of the instructions after the branch get executed by accident • New definition: whether or not we take the branch, the instruction immediately following the branch gets executed (called the branch-delay slot)
Best: Delayed Branches (2/2) • Notes on Branch-Delay Slot • Worst-Case Scenario: can always use a no-op • Better Case: can find an instruction preceding the branch which can be placed in the branch-delay slot without affecting flow of the program • Re-ordering instructions is a common speedup technique – done in compiler • Compiler must be smart in order to find instructions to do this • Usually can find such an instruction at least 50% of the time - REAL STUFF!!
Nondelayed vs. Delayed
Nondelayed Branch        Delayed Branch
or  $8, $9, $10          add $1, $2, $3
add $1, $2, $3           sub $4, $5, $6
sub $4, $5, $6           beq $1, $4, Exit
beq $1, $4, Exit         or  $8, $9, $10
xor $10, $1, $11         xor $10, $1, $11
...                      ...
Exit:                    Exit:
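A compiler deciding whether an earlier instruction may fill the branch-delay slot has to check that moving it past the intervening instructions and the branch breaks no register dependence. The sketch below is a simplified version of that check (registers only, no memory dependences), using the instructions from this slide; `Instr` and `can_move_past` are illustrative names, not part of any real compiler.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]          # register written, or None (e.g. beq)
    srcs: Tuple[str, ...]        # registers read


def can_move_past(candidate, later):
    """True if candidate can be moved after every instruction in `later`."""
    for ins in later:
        if candidate.dest is not None and candidate.dest in ins.srcs:
            return False         # later instruction needs candidate's result
        if candidate.dest is not None and candidate.dest == ins.dest:
            return False         # both write the same register
        if ins.dest is not None and ins.dest in candidate.srcs:
            return False         # candidate would read an overwritten value
    return True


OR = Instr("or", "$8", ("$9", "$10"))
ADD = Instr("add", "$1", ("$2", "$3"))
SUB = Instr("sub", "$4", ("$5", "$6"))
BEQ = Instr("beq", None, ("$1", "$4"))

if __name__ == "__main__":
    print("or  ->", can_move_past(OR, [ADD, SUB, BEQ]))
    print("add ->", can_move_past(ADD, [SUB, BEQ]))
    print("sub ->", can_move_past(SUB, [BEQ]))
```

Only the `or` passes the check: `add` and `sub` produce `$1` and `$4`, which `beq` compares, so neither can be moved into the slot. Since the `or` originally ran before the branch, executing it in the delay slot on both paths preserves the program's behavior.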
Conclusions (1/2) • Optimal Pipeline • Each stage is executing part of an instruction each cycle. • One instruction finishes during each clock cycle. • On average, instructions execute far more quickly • What makes this work? • Similarities between instructions • Each stage takes about the same amount of time as all others
Conclusions (2/2) • Pipelining is a Big Idea: a widely used concept • What makes it less than perfect? • Structural hazards: Need more HW resources • Control hazards: Delayed branch • Data hazards: an instruction depends on a previous one • Next Topic: Pipeline Performance Issues • Wednesday: EXAM #2