ECE232: Hardware Organization and Design. Part 10: Control Design. http://www.ecs.umass.edu/ece/ece232/
Load Instruction
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1         1         1        0         0       0       0
Branch-on-Equal Instruction
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
beq          X       0       X         0         0        0         1       0       1
Simple combinational logic
Inputs: Op5, Op4, Op3, Op2, Op1, Op0 (decoded into R-format, lw, sw, beq)
Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
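The control table above can be read as a lookup from opcode bits to control outputs. The following is a minimal Python sketch of that reading, not the logic implementation from the course. It assumes the standard MIPS opcode encodings (R-format 000000, lw 100011, sw 101011, beq 000100), which the slide does not list; the names SIGNALS, CONTROL_TABLE and decode are illustrative, and don't-care entries (X) are represented as None.

```python
# Column order follows the control table on the slide.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't-care entry

# Keys are 6-bit opcodes (Op5..Op0). These encodings are the standard
# MIPS values and are an assumption here; the slide does not give them.
CONTROL_TABLE = {
    0b000000: ("R-format", (1, 0, 0, 1, 0, 0, 0, 1, 0)),
    0b100011: ("lw",       (0, 1, 1, 1, 1, 0, 0, 0, 0)),
    0b101011: ("sw",       (X, 1, X, 0, 0, 1, 0, 0, 0)),
    0b000100: ("beq",      (X, 0, X, 0, 0, 0, 1, 0, 1)),
}


def decode(opcode):
    """Return (instruction name, {signal: value}) for a 6-bit opcode."""
    name, values = CONTROL_TABLE[opcode]
    return name, dict(zip(SIGNALS, values))


if __name__ == "__main__":
    for op in CONTROL_TABLE:
        name, signals = decode(op)
        print(f"{op:06b} {name:9s} {signals}")
```

Because the table is purely combinational, the same mapping could be written as AND/OR terms over Op5..Op0; the dictionary form is used here only to keep the rows easy to compare with the slide.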
Single-Cycle Machine: Appraisal
• All instructions complete in one clock cycle (CPI = 1)
• Some instructions take more steps than others
  • lw is the most expensive (5 steps, vs. 4 for R-type and sw, 3 for beq)
• The clock cycle must cover the longest instruction, which is inefficient
• Suppose mult is added?
  • Its 32 shift/add steps would delay every other instruction
Cycle time and speedup computation
• Assume:
  • 2 ns for instruction/data memory
  • 1 ns for decode/register read
  • 2 ns for ALU
  • 1 ns for register write
• Single-cycle datapath clock period = 8 ns
• Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps
• Assuming a variable-cycle datapath, average clock period = 8*0.24 + 7*0.12 + 6*0.44 + 5*0.18 + 3*0.02 = 6.36 ns
• Possible speedup = 8 / 6.36 ≈ 1.26
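The arithmetic on this slide can be checked with a short script. This is a sketch under stated assumptions: the slide gives only the per-class totals (8, 7, 6, 5, 3 ns), so the breakdown of each total into component delays below is an inferred reading, and the variable names are illustrative.

```python
# Component delays in ns, as assumed on the slide.
MEM, REG_READ, ALU, REG_WRITE = 2, 1, 2, 1

# Time needed by each instruction class (ns). The decomposition into
# components is an assumption; the totals match the slide (8, 7, 6, 5, 3).
latency = {
    "load":     MEM + REG_READ + ALU + MEM + REG_WRITE,
    "store":    MEM + REG_READ + ALU + MEM,
    "R-format": MEM + REG_READ + ALU + REG_WRITE,
    "branch":   MEM + REG_READ + ALU,
    "jump":     MEM + REG_READ,
}

# Instruction mix from the slide (fractions sum to 1).
mix = {"load": 0.24, "store": 0.12, "R-format": 0.44,
       "branch": 0.18, "jump": 0.02}

# A single-cycle clock must cover the slowest instruction.
single_cycle = max(latency.values())
# A variable-cycle clock averages over the instruction mix.
average = sum(mix[k] * latency[k] for k in mix)
speedup = single_cycle / average

if __name__ == "__main__":
    print(f"single-cycle period: {single_cycle} ns")
    print(f"average period:      {average:.2f} ns")
    print(f"possible speedup:    {speedup:.2f}")
```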
Multicycle Implementation (MIPS-lite v.2)
• Want a more efficient implementation
• Each step will take one clock cycle (not each instruction) [CPI > 1]
  • shorter clock cycle: cycle time constrained by longest step, not longest instruction
  • simpler instructions take fewer cycles
  • higher overall performance
• More complex control: finite state machine
• Versatile (can extend for new instructions: swap, mult-add, etc.)
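Clocking: single-cycle vs. multi-cycle
[Figure: clock timing diagrams. Single-cycle implementation: beq $t0,$t1,L and add $t0,$t1,$t2 each occupy one long clock cycle, part of which is marked "waste". Multicycle implementation: add $t0,$t1,$t2 and beq $t0,$t1,L are each spread over several shorter clock cycles.]
Multicycle implementation: less waste = higher performance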
How fast can we run the clock?
• Depends on how much we want to be done per clock cycle
• Can do: several “inexpensive” datapath operations per clock
  • simple gates (AND, OR, …)
  • single datapath registers (PC)
  • sign extender, left shifter, multiplexor
• OR: exactly one “expensive” datapath operation per clock
  • ALU operation
  • Register File access (2 reads, or 1 write)
  • Memory access (read or write)
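As a rough illustration of how the "one expensive operation per clock" rule sets the clock period, the sketch below reuses the delays assumed on the cycle-time slide (2 ns memory, 1 ns register read, 2 ns ALU, 1 ns register write). The step names and the lw path are labels chosen here for illustration, and the sketch ignores the delay of the inexpensive elements and of the temporary registers.

```python
# Delays (ns) of the expensive datapath operations, taken from the
# cycle-time slide. The dictionary keys are illustrative labels.
step_delay = {
    "memory access": 2,
    "register read": 1,
    "ALU operation": 2,
    "register write": 1,
}

# Operations chained in series by lw, the longest instruction.
lw_path = ["memory access", "register read", "ALU operation",
           "memory access", "register write"]

# Single-cycle: the clock must cover the whole chain of the longest instruction.
single_cycle_period = sum(step_delay[s] for s in lw_path)

# Multicycle: one expensive operation per clock, so the clock only has to
# cover the slowest single step.
multicycle_period = max(step_delay.values())

if __name__ == "__main__":
    print(f"single-cycle period: {single_cycle_period} ns")
    print(f"multicycle period:   {multicycle_period} ns")
```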
MIPS-lite Multicycle Version: Multicycle Datapath (overview)
[Figure: multicycle datapath. Components shown: PC; Memory (Address, Data; holds instruction or data); Instruction Register; Memory Data Register; Registers (ReadReg1, ReadReg2, WriteReg, Data; Read data 1, Read data 2); temporary registers A and B; ALU; ALUOut.]
• One ALU (no extra adders)
• One Memory (no separate IMem, DMem)
• New Temporary Registers (“clocked”/require clock input)
Multicycle Implementation
• Datapath changes
  • one memory: both instructions and data (because they can be accessed on separate steps)
  • one ALU (eliminate extra adders)
  • extra “invisible” registers to capture intermediate (per-step) datapath results
• Controller changes
  • controller must fire control lines in the correct sequence and at the correct time
  • controller must remember the current execution step and advance to the next step
Datapath + Control Points
[Figure: multicycle datapath with control points. Control signals: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, PCSrc. Components: PC, Mem, IR (fields 25:21, 20:16, 15:11, 15:0, funct 5:0), MDR, Regs, A, B, sign extend, << 2, constant 4, multiplexors, ALU (with zero output), ALU Control, ALUOut.]
FSM diagram for multi-cycle machine
• Cycle 1, state 0 (start new instruction): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
• Cycle 2, state 1: ALUSrcA = 0, ALUSrcB = 3, ALUOp = 0; the next state depends on the instruction (lw/sw to state 2, R-format to state 6, beq to state 8)
• Cycle 3:
  • State 2 (lw/sw, Memory Access): ALUSrcA = 1, ALUSrcB = 2, ALUOp = 0
  • State 6 (R-format execution): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 2
  • State 8 (beq, Branch Completion): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 1, PCWriteCond, PCSrc = 1
FSM controller: execution cycles 3-5
• Cycle 4:
  • State 3 (lw, from state 2; memory access, step 4): MemRead, IorD = 1
  • State 5 (sw, from state 2; memory access, step 4): MemWrite, IorD = 1
  • State 7 (from state 6; R-format completion, step 4): RegDst = 1, RegWrite, MemtoReg = 0
• Cycle 5:
  • State 4 (lw, from state 3; write-back, step 5): RegDst = 0, RegWrite, MemtoReg = 1
• States 4, 5, and 7 return to state 0
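The state diagram on these two slides can be walked through in software to see which control signals fire in which cycle. The following Python sketch is one possible encoding, not the course's controller. It assumes the usual textbook assignment of state numbers to actions (3 = lw memory read, 5 = sw memory write, 7 = R-format completion, 4 = lw write-back), which the flattened slide text does not make fully explicit; STATE_OUTPUTS, next_state and trace are illustrative names.

```python
# Control outputs asserted in each state. Signals not listed are 0 or don't-care.
STATE_OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 1, "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},   # fetch
    1: {"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},               # decode
    2: {"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0},               # lw/sw address
    3: {"MemRead": 1, "IorD": 1},                              # lw memory read
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},            # lw write-back
    5: {"MemWrite": 1, "IorD": 1},                             # sw memory write
    6: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2},               # R-format execution
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},            # R-format completion
    8: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1,
        "PCWriteCond": 1, "PCSrc": 1},                         # branch completion
}


def next_state(state, instr):
    """Next FSM state for instr in {"lw", "sw", "R-format", "beq"}."""
    if state == 0:
        return 1
    if state == 1:
        return {"lw": 2, "sw": 2, "R-format": 6, "beq": 8}[instr]
    if state == 2:
        return 3 if instr == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7 and 8 finish the instruction


def trace(instr):
    """List of states visited while executing one instruction."""
    states = [0]
    while True:
        nxt = next_state(states[-1], instr)
        if nxt == 0:
            return states
        states.append(nxt)


if __name__ == "__main__":
    for instr in ("lw", "sw", "R-format", "beq"):
        visited = trace(instr)
        print(f"{instr:9s} states {visited} ({len(visited)} cycles)")
```

The trace lengths reproduce the step counts quoted earlier in the slides: 5 for lw, 4 for sw and R-format, 3 for beq.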
Cycle 1
[Figure: the "Datapath + Control Points" diagram, annotated for cycle 1.]
Control signals in cycle 1: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
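To make the cycle 1 settings concrete, here is a small Python sketch of what they would do to the datapath: fetch the instruction into IR and compute PC + 4 on the single ALU. The multiplexor input assignments are an assumption based on the usual textbook datapath (ALUSrcA: 0 = PC, 1 = A; ALUSrcB: 0 = B, 1 = constant 4, 2 = sign-extended immediate, 3 = sign-extended immediate shifted left by 2); the function names and the dictionary-based memory are illustrative only.

```python
def alu_inputs(pc, a, b, imm, alu_src_a, alu_src_b):
    """Select the two ALU operands (mux encodings are assumed, see lead-in)."""
    op1 = pc if alu_src_a == 0 else a
    op2 = (b, 4, imm, imm << 2)[alu_src_b]
    return op1, op2


def fetch_cycle(memory, pc):
    """Model cycle 1: returns (new IR contents, new PC)."""
    # MemRead with IorD = 0: the memory address comes from the PC.
    # IRWrite: the word read is latched into the instruction register.
    ir = memory[pc]
    # ALUSrcA = 0 and ALUSrcB = 1 select PC and the constant 4; ALUOp = 0 adds.
    op1, op2 = alu_inputs(pc, a=0, b=0, imm=0, alu_src_a=0, alu_src_b=1)
    # PCWrite with PCSrc = 0: the ALU result is written back to the PC.
    new_pc = op1 + op2
    return ir, new_pc


if __name__ == "__main__":
    # Hypothetical memory contents: byte address -> 32-bit word.
    memory = {0: 0x012A4020, 4: 0x8D090000}
    ir, pc = fetch_cycle(memory, 0)
    print(f"IR = {ir:#010x}, PC = {pc}")
```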