ELEC 5200-001/6200-001 Computer Architecture and Design Spring 2008 Pipelined Control and Performance (Chapter 6). Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 http://www.eng.auburn.edu/~vagrawal
ELEC 5200-001/6200-001 Computer Architecture and Design, Spring 2008: Pipelined Control and Performance (Chapter 6). Vishwani D. Agrawal, James J. Danaher Professor, Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849. http://www.eng.auburn.edu/~vagrawal vagrawal@eng.auburn.edu ELEC 5200-001/6200-001 Lecture 10
Pipelined Datapath (without Jump)
[Figure: five-stage pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Blocks: PC, instruction memory, PC+4 adder, register file, sign extension, shift left 2, branch adder, ALU with zero output, data memory, and multiplexers. Instruction fields: opcode in bits 26-31, register fields in bits 21-25 and 16-20, immediate in bits 0-15; the write register is bits 16-20 for I-type lw and bits 11-15 for R-type.]

Mem. and Reg. File Need Controls
[Figure: the same datapath with the control signals RegWrite, MemWrite and MemRead added.]

Multiplexers Need Controls
[Figure: the same datapath with the control signals PCSrc, Branch, MemtoReg, ALUSrc and RegDst added.]

ALU Needs a Control
[Figure: the same datapath with an ALU control block added, driven by ALUOp and by instruction bits 0-5.]
Compare with Single-Cycle Control
• Control signals are the same as those needed for a single-cycle datapath.
• Control signals are generated from the opcode in the ID (instruction decode) cycle and then distributed to the other cycles.
• Let us reexamine the implementation of the single-cycle control (slides 3-8 of Lecture 8).
Hardwired CU: Single-Cycle
• Implemented by combinational logic.
[Figure: control logic receives the 6-bit opcode and produces the control signals for the datapath together with the 2-bit ALUOp; the ALU control block receives ALUOp and the 6-bit funct. code and produces a 3-bit control to the ALU.]

Single-cycle datapath
[Figure: single-cycle datapath with Jump. CONTROL is driven by opcode bits 26-31 and produces Jump, Branch, MemtoReg, RegWrite, ALUSrc, MemWrite, MemRead, RegDst and ALUOp; ALU Cont. is driven by ALUOp and bits 0-5; the jump address comes from bits 0-25 shifted left 2.]

Single-Cycle Control Logic
[Table: inputs Op5 Op4 Op3 Op2 Op1 Op0; outputs Jump, Branch, ALUOp1, ALUOp0, MemtoReg, RegWrite, MemRead, MemWrite, RegDst, ALUSrc.]

Single-Cycle Control Circuit
[Figure: Op5-Op0 are decoded into the instruction lines lw, R, sw, beq and J, which drive the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0 and Jump.]
ALU Control Logic ELEC 5200-001/6200-001 Lecture 10
ALU Control
[Figure: the ALU control block takes ALUOp1 and ALUOp0 from the control circuit and the funct bits F3 F2 F1 F0, and drives the 3-bit operation select of the ALU. The ALU outputs are result, zero and overflow.]
Operation select    ALU function
000                 AND
001                 OR
010                 Add
110                 Subtract
111                 Set on less than
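The operation-select table above can be exercised with a small sketch. The slide does not give the ALUOp encoding or the funct codes, so the usual textbook convention is assumed here (ALUOp = 00 for lw/sw, 01 for beq, 10 for R-type) along with the standard MIPS funct values; the function name `alu_control` is illustrative only.

```python
# Operation-select codes taken from the slide's table.
ALU_FUNCTION = {
    0b000: "AND",
    0b001: "OR",
    0b010: "Add",
    0b110: "Subtract",
    0b111: "Set on less than",
}

# ASSUMPTION: standard MIPS funct codes (bits 5..0 of an R-type instruction).
FUNCT = {"add": 0b100000, "sub": 0b100010, "and": 0b100100,
         "or": 0b100101, "slt": 0b101010}

# ASSUMPTION: textbook mapping from funct bits F3..F0 to operation select.
R_TYPE_SELECT = {0b0000: 0b010, 0b0010: 0b110, 0b0100: 0b000,
                 0b0101: 0b001, 0b1010: 0b111}


def alu_control(alu_op, funct):
    """Return the 3-bit operation select for a 2-bit ALUOp and a funct field."""
    if alu_op == 0b00:          # lw / sw: add base and offset
        return 0b010
    if alu_op == 0b01:          # beq: subtract, then test the zero output
        return 0b110
    # ALUOp1 = 1: R-type, so only F3..F0 are decoded
    return R_TYPE_SELECT[funct & 0b1111]


for name, code in FUNCT.items():
    print(name, ALU_FUNCTION[alu_control(0b10, code)])
```

Only F3 to F0 are decoded, which matches the four funct inputs shown on the slide.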
Returning to Pipelined Control
• The opcode input to control is supplied by the pipeline register IF/ID in the ID (instruction decode) cycle.
• Nine control signals are generated in the ID cycle, but none is used there. They are saved in the pipeline register ID/EX.
• ALUSrc, RegDst and ALUOp (2 bits) are used in the EX (execute) cycle. The remaining 5 control signals are saved in the pipeline register EX/MEM.
• Branch, MemWrite and MemRead are used in the MEM (memory access) cycle. The remaining 2 control signals are saved in the pipeline register MEM/WB.
• MemtoReg and RegWrite are used in the WB (write back) cycle.
• Pipelined control is shown without Jump.
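The hand-off of control signals from one pipeline register to the next can be sketched as follows. The grouping of signals per stage is taken from the slide; the signal values for lw follow the usual textbook control table and are an assumption, and `split_stage` is a hypothetical helper.

```python
# Signals consumed in each stage, as listed on the slide.
EX_SIGNALS = ("ALUSrc", "RegDst", "ALUOp1", "ALUOp0")
MEM_SIGNALS = ("Branch", "MemWrite", "MemRead")
WB_SIGNALS = ("MemtoReg", "RegWrite")


def split_stage(signals, used):
    """Return (signals consumed in this stage, signals passed to the next register)."""
    consumed = {name: signals[name] for name in used}
    passed_on = {name: value for name, value in signals.items() if name not in used}
    return consumed, passed_on


# ASSUMPTION: control values for lw from the usual textbook control table.
lw_control = {"ALUSrc": 1, "RegDst": 0, "ALUOp1": 0, "ALUOp0": 0,
              "Branch": 0, "MemWrite": 0, "MemRead": 1,
              "MemtoReg": 1, "RegWrite": 1}

id_ex = dict(lw_control)                              # all nine saved in ID/EX
ex_used, ex_mem = split_stage(id_ex, EX_SIGNALS)      # five remain in EX/MEM
mem_used, mem_wb = split_stage(ex_mem, MEM_SIGNALS)   # two remain in MEM/WB
wb_used, leftover = split_stage(mem_wb, WB_SIGNALS)   # nothing left after WB

print(len(id_ex), len(ex_mem), len(mem_wb), len(leftover))
```

Each pipeline register only needs to be wide enough for the signals still to be used downstream, which is why the register contents shrink from nine signals to five to two.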
Placing Control in Pipelined Datapath
[Figure: the pipelined datapath with a CONTROL block in the ID stage driven by opcode bits 26-31. Control signals travel through ID/EX, EX/MEM and MEM/WB: ALUSrc, ALUOp and RegDst are used in EX; Branch, MemWrite and MemRead in MEM (Branch and zero form PCSrc); MemtoReg and RegWrite in WB.]

Highlighted Pipelined Control
[Figure: the same datapath with the control block and control signal paths highlighted.]
Single-Cycle Performance
• Assume:
   • 200 ps for memory access
   • 100 ps for ALU operation
   • 50 ps for register file read or write
• Cycle time is set according to the longest instruction: lw = IF + ID/RegRead + ALU + MEM + RegWrite = 200 + 50 + 100 + 200 + 50 = 600 ps
• Av. instruction execution time = clock cycle time = 600 ps
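A short sketch of why lw sets the single-cycle clock. The unit delays are the ones assumed on the slide; the step lists for sw, R-type and beq are not on the slide and are assumed from the usual datapath usage of those instructions.

```python
MEM, ALU, REG = 200, 100, 50   # ps, as assumed on the slide

# Steps each instruction needs in one long cycle.
# ASSUMPTION: only the lw path is given on the slide; the others are inferred.
PATHS = {
    "lw":     [MEM, REG, ALU, MEM, REG],  # IF, RegRead, ALU, MEM, RegWrite
    "sw":     [MEM, REG, ALU, MEM],       # no register write
    "R-type": [MEM, REG, ALU, REG],       # no data memory access
    "beq":    [MEM, REG, ALU],            # compare only
}

latency = {name: sum(steps) for name, steps in PATHS.items()}
cycle_time = max(latency.values())        # the clock must fit the slowest path

print(latency)
print("cycle time:", cycle_time, "ps")
```

Every instruction pays the 600 ps cycle even when its own path is shorter, which is the inefficiency the multicycle and pipelined designs address.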
Multicycle Performance
• Consider the SPECINT2000* instruction mix:
   • 25% lw, 5 cycles
   • 10% sw, 4 cycles
   • 11% branch, 3 cycles
   • 2% jump, 3 cycles
   • 52% ALU instr., 4 cycles
• Av. CPI = 0.25×5 + 0.10×4 + 0.11×3 + 0.02×3 + 0.52×4 = 4.12
• Clock cycle time determined from the longest operation (memory access) = 200 ps
• Av. instruction execution time = 4.12×200 = 824 ps
*Set of benchmark programs used for performance evaluation, to be discussed in a later lecture.
Pipeline Performance
• Neglect initial latency (reasonable for long programs).
• One instruction is completed every clock cycle unless delayed by a hazard. Average CPI:
   • lw: 2 cycles in 50% of cases due to hazard, so 1.5 cycles on average
   • sw: 1 cycle
   • ALU: 1 cycle
   • branch: 2 cycles in 25% of cases due to hazard, so 1.25 cycles on average
   • jump: 2 cycles
• For SPECINT2000, Av. CPI = 0.25×1.5 + 0.10×1 + 0.11×1.25 + 0.02×2.0 + 0.52×1 = 1.17
• Clock cycle time (longest operation: memory access) = 200 ps
• Av. instruction execution time = 1.17×200 = 234 ps
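Comparing Alternatives
Architecture    Clock cycle    Av. CPI    Av. instruction execution time
Single-cycle    600 ps         1          600 ps
Multicycle      200 ps         4.12       824 ps
Pipelined       200 ps         1.17       234 ps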
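The three averages from the preceding performance slides can be recomputed with a few lines. The instruction mix, cycle counts and clock periods are those given on the slides; the helper name `average_cpi` is illustrative.

```python
# SPECINT2000 instruction mix from the slides.
MIX = {"lw": 0.25, "sw": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Cycles per instruction class for each design, from the slides.
MULTICYCLE = {"lw": 5, "sw": 4, "branch": 3, "jump": 3, "alu": 4}
PIPELINED = {"lw": 1.5, "sw": 1.0, "branch": 1.25, "jump": 2.0, "alu": 1.0}


def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over the mix."""
    return sum(mix[name] * cycles[name] for name in mix)


single_ps = 1 * 600                       # one 600 ps cycle per instruction
multi_cpi = average_cpi(MIX, MULTICYCLE)
pipe_cpi = average_cpi(MIX, PIPELINED)
multi_ps = multi_cpi * 200                # 200 ps clock
pipe_ps = pipe_cpi * 200                  # 200 ps clock

print(f"single-cycle: {single_ps} ps")
print(f"multicycle:   CPI {multi_cpi:.2f}, {multi_ps:.0f} ps")
# The unrounded pipelined CPI is 1.1725; the slide rounds it to 1.17 before
# multiplying, which gives 234 ps rather than 234.5 ps.
print(f"pipelined:    CPI {pipe_cpi:.4f}, {pipe_ps:.1f} ps")
```

With these numbers the multicycle design is slower than single-cycle on average (824 ps against 600 ps), while the pipelined design is roughly 2.6 times faster than single-cycle.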
Next
• Forwarding
• Stall
• Branch hazard and branch prediction
• Instruction flush
• Exceptions
Forwarding
• Consider a data hazard:
    sub $2, $1, $3     # computes result in CC3, writes in $2 in CC5
    and $12, $2, $5    # reads $2 in CC3, adds in CC4
[Figure: pipeline timing over CC1-CC8. Each instruction passes through IF: IM, ID: REG. FILE READ, EX: ALU, MEM: DM and WB: REG. WRITE, separated by IF/ID, ID/EX, EX/MEM and MEM/WB; "and" runs one cycle behind "sub".]
• CC3: ALU saves new data in EX/MEM, to be written to $2 in CC5.
• CC3: "and" reads $2 to ID/EX, but the correct data is in EX/MEM.
• CC4: forwarding allows execution of "and" with correct data.
Understanding Forwarding
• Let's ask the following questions:
• Q: Why is there a hazard?
• A: A source register of the present instruction is the same as the destination register of the previous instruction.
• Q: When is the source register data needed?
• A: In the execute cycle (CC4).
• Q: Is source register data available in CC4?
• A: Yes: use forwarding. No: use stall.
• Q: Where is the required data in CC4?
• A: In the pipeline register EX/MEM as ALU output.
Forwarding Hardware
• A forwarding unit is added to the execute (ALU) cycle hardware.
• Functions of the forwarding unit:
   • Hazard detection
   • Forward correct data to ALU
• Inputs to the forwarding unit:
   • Source registers of present instruction
   • Destination registers of previous instructions
• Outputs of the forwarding unit: multiplexer controls to route correct data to the ALU.
Recall Register Definitions
• R-type instruction (add, sub, and, or, . . . ): opcode | Rs | Rt | Rd | shamt | funct
• I-type instruction (beq, lw, sw, addi, . . . ): opcode | Rs | Rt | constant_or_address
• J-type instruction (j, jal): opcode | address
where
• Rs is the first source register
• Rt is the second source register (for lw, Rt is the destination register)
• Rd is the destination register
Forwarding Implemented
[Figure: pipelined datapath with a forwarding unit in the EX stage. Inputs to the forwarding unit: Rs (bits 21-25) and Rt (bits 16-20) of the instruction in ID/EX, and the destination registers Rd held in EX/MEM and MEM/WB. Outputs: select lines of the two multiplexers (MUX) placed in front of the ALU inputs.]
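The forwarding unit's decision can be sketched as a pure function of the register numbers it receives. The slides give the inputs and outputs but not the exact conditions, so the conditions below follow the usual textbook formulation, and the select encoding (00 = register file, 10 = EX/MEM, 01 = MEM/WB) is an assumption. The RegWrite inputs are also assumed, since a forwarded value only exists if the earlier instruction writes a register.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_rd, ex_mem_regwrite,
                    mem_wb_rd, mem_wb_regwrite):
    """Return (ForwardA, ForwardB) mux selects for the two ALU inputs.

    ASSUMED encoding: 0b00 = value read from the register file,
    0b10 = ALU result held in EX/MEM, 0b01 = value held in MEM/WB.
    """
    def select(src):
        # The most recent result (EX/MEM) takes priority over the older one.
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
            return 0b10
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
            return 0b01
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)


# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX (CC4 on the slide).
forward_a, forward_b = forwarding_unit(
    id_ex_rs=2, id_ex_rt=5,
    ex_mem_rd=2, ex_mem_regwrite=True,
    mem_wb_rd=0, mem_wb_regwrite=False,
)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

Register 0 is excluded because $0 always reads as zero in MIPS, so a pending write to it must never be forwarded.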
Stall
• Delay next instruction by sending noop through pipeline.
• Necessary when hazard not resolved by forwarding.
    lw  $2, 20($1)
    and $4, $2, $5
[Figure: pipeline timing over CC1-CC6. Each instruction passes through IM; ID, REG. FILE READ; ALU; DM; REG. FILE WRITE, separated by IF/ID, ID/EX, EX/MEM and MEM/WB; "and" runs one cycle behind "lw".]
• CC4: new data in MEM/WB, to be written to $2.
• CC4: execution of "and" is impossible; correct data unavailable until end of CC4.
Detecting Hazard Requiring Stall
• Consider the instruction in IF/ID being decoded:
• If
   • the previous instruction (lw) activated MemRead, and
   • the instruction being decoded has a source register (Rs or Rt) that is the same as the destination register (Rt for lw) of the previous instruction
• Then, stall the pipeline:
   • Force all control outputs to 0
   • Prevent PC from changing
   • Prevent IF/ID from changing
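The stall condition above translates directly into a boolean test. This is a minimal sketch: the argument names are illustrative, PCWrite and IF/IDWrite are the write enables named in the stall implementation figure, and `ControlZero` is a hypothetical name for the select that forces the control outputs to 0.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction being decoded needs the result of a lw in EX."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_outputs(stall):
    """Outputs of the hazard detection unit for one cycle."""
    return {
        "PCWrite": not stall,      # 0 freezes the PC
        "IF/IDWrite": not stall,   # 0 freezes the IF/ID register
        "ControlZero": stall,      # hypothetical name: 1 inserts a bubble
    }


# lw $2, 20($1) is in EX while and $4, $2, $5 is being decoded.
stall = load_use_stall(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
print(stall, hazard_outputs(stall))
```

After the one-cycle bubble, the loaded value is in MEM/WB and ordinary forwarding supplies it to the ALU.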
Stall Implementation
[Figure: pipelined datapath with forwarding unit and an added hazard detection unit. Inputs to the hazard detection unit: MemRead and Rt from ID/EX, and Rs and Rt of the instruction in IF/ID. Outputs: PCWrite, IF/IDWrite, and the select of a multiplexer that replaces the Control outputs with 0.]
Stall
• Execution with stall and forwarding:
    lw  $2, 20($1)
    bubble (noop)
    and $4, $2, $5
    next
[Figure: pipeline timing over CC1-CC7. Each row passes through IF: IM, ID: REG. FILE READ, EX: ALU, MEM: DM and WB: REG. WRITE, separated by IF/ID, ID/EX, EX/MEM and MEM/WB.]
• CC4: new data in MEM/WB, to be written to $2.
• State of IF/ID is frozen in CC3.
• "next" is fetched twice since PC was frozen.
Branch Hazard
• Consider the heuristic: branch not taken.
• Continue fetching instructions in sequence following the branch instruction.
• If branch is taken (indicated by zero output of ALU):
   • Control generates the Branch signal in the ID cycle.
   • Branch activates the PCSrc signal in the MEM cycle to load PC with the new branch address.
   • Three instructions in the pipeline must be flushed if branch is taken. Can this penalty be reduced?
Branch Hazard
[Figure: pipelined datapath with control, with a beq shown in the MEM stage: Branch and the ALU zero output form PCSrc, which selects the branch address for the PC.]
Branch Not Taken
Instruction sequence: Branch to Z, A, B, C, D, ..., Z
             cycle b    cycle b+1    cycle b+2    cycle b+3                     cycle b+4
Branch       fetched    decoded      decision     PC keeps D (br. not taken)
A                       fetched      decoded      executed                      continues
B                                    fetched      decoded                       executed
C                                                 fetched                       decoded
D                                                                               fetched

Branch Taken
Instruction sequence: Branch to Z, A, B, C, D, ..., Z
             cycle b    cycle b+1    cycle b+2    cycle b+3                     cycle b+4
Branch       fetched    decoded      decision     PC gets Z (br. taken)
A                       fetched      decoded      executed                      Nop
B                                    fetched      decoded                       Nop
C                                                 fetched                       Nop
Z                                                                               fetched
Three instructions are flushed if branch is taken.
Pipeline Flush
• If branch is taken (as indicated by zero), then control does the following:
   • Change all control signals to 0, similar to the case of stall for data hazard, i.e., insert a bubble in the pipeline.
   • Generate a signal IF.Flush that changes the instruction in the pipeline register IF/ID to 0 (nop).
• Penalty of branch hazard is reduced by
   • Adding branch detection and address generation hardware in the decode cycle, so that only one bubble is needed: next-address generation logic in the decode stage writes PC+4, the branch address, or the jump address into PC.
   • Using branch prediction.
Branch Prediction
• Useful for program loops.
• A one-bit prediction scheme: a one-bit buffer carries a "history bit" that tells what happened on the last branch instruction.
   • History bit = 1, branch was taken
   • History bit = 0, branch was not taken
[State diagram: state 1 (predict branch taken) stays in 1 on taken and moves to 0 on not taken; state 0 (predict branch not taken) stays in 0 on not taken and moves to 1 on taken.]
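Branch Prediction
[Figure: branch prediction buffer indexed by the low-order bits of PC. Columns: address of recent branch instructions, target addresses, history bit(s). The stored address is compared (=) with PC, and the prediction logic drives a multiplexer (0/1) that selects the next PC from PC+4 or the stored target.]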
Branch Prediction for a Loop
[Flowchart]
1: I = 0
2: I = I + 1
3: X = X + R(I)
4: I - 10 = 0? (N: repeat from 2; Y: go on to 5)
5: Store X in memory
[Table: Execution of Instruction 4.]
h.bit = 0 branch not taken, h.bit = 1 branch taken.
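A small simulation of the one-bit scheme on this loop. It assumes that instruction 4 is a conditional branch back to instruction 2 (taken while I is not 10, so taken 9 times and then not taken once) and that the history bit starts at 0; neither detail is recoverable from the slide text.

```python
def one_bit_mispredictions(outcomes, history_bit=0):
    """Count mispredictions of a one-bit predictor; return (misses, final bit)."""
    misses = 0
    for taken in outcomes:
        predicted_taken = (history_bit == 1)
        if predicted_taken != taken:
            misses += 1
        history_bit = 1 if taken else 0   # remember only the last outcome
    return misses, history_bit


# ASSUMPTION: branch 4 is taken for I = 1..9 and not taken when I = 10.
LOOP = [True] * 9 + [False]

first_misses, bit = one_bit_mispredictions(LOOP)        # first run of the loop
second_misses, _ = one_bit_mispredictions(LOOP, bit)    # loop executed again

print(first_misses, second_misses)
```

Under these assumptions the one-bit predictor misses twice per execution of the loop: once on loop exit, and once more on re-entry because the exit left the history bit at 0.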
Two-Bit Prediction Buffer
• Can improve correct prediction statistics.
[State diagram: four states. States 11 and 10 predict branch taken; states 01 and 00 predict branch not taken. Transitions between states are labeled taken and not taken.]
Branch Prediction for a Loop
[Flowchart]
1: I = 0
2: I = I + 1
3: X = X + R(I)
4: I - 10 = 0? (N: repeat from 2; Y: go on to 5)
5: Store X in memory
[Table: Execution of Instruction 4.]
Exceptions
• A typical exception occurs when the ALU produces an overflow signal.
• Control takes the following actions on an exception:
   • Change the PC address to 4000 0040hex. This is the location of the exception routine. This is done by adding an additional input to the PC input multiplexer.
   • Overflow is detected in the EX cycle. Similar to data hazard and pipeline flush:
      • Set IF/ID to 0 (nop).
      • Generate ID.Flush and EX.Flush signals to set all control signals to 0 in the ID/EX and EX/MEM registers. This also prevents the ALU result (presumed contaminated) from being written in the WB cycle.