ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design Part 10: Control Design http://www.ecs.umass.edu/ece/ece232/

Datapath With Control

R-Format Instruction: add $t1, $t2, $t3

Load Instruction

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1         1         1        0         0       0       0

Branch-on-Equal Instruction

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
beq          X       0       X         0         0        0         1       0       1

Simple combinational logic

[Figure: control unit as combinational logic. Inputs: Op5, Op4, Op3, Op2, Op1, Op0. Intermediate terms: R-format, lw, sw, beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
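The control table above can be read as a lookup keyed by the 6-bit opcode (Op5 to Op0). The following is a minimal Python sketch of that idea, not the hardware implementation itself. The opcode values are the standard MIPS encodings (R-format 0, lw 35, sw 43, beq 4); the names SIGNALS, CONTROL and decode are hypothetical and chosen only for illustration, and don't-care entries (X) are represented as None.

```python
# Main control unit modeled as a lookup table: opcode -> control signals.
# Names (SIGNALS, CONTROL, decode) are illustrative, not from the slides.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't-care entry in the control table

CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1),  # beq
}


def decode(instruction):
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F  # bits 31:26 are Op5..Op0
    return dict(zip(SIGNALS, CONTROL[opcode]))


# add $t1, $t2, $t3 encodes as 0x014B4820 (opcode 0, R-format)
add_signals = decode(0x014B4820)
print(add_signals["RegDst"], add_signals["RegWrite"])  # prints: 1 1
```

In hardware the same table is realized with a few AND gates that recognize each opcode and OR gates that combine them per output; the dictionary only mirrors the truth table.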

Single-Cycle Machine: Appraisal
• All instructions complete in one clock cycle (CPI = 1)
• Some instructions take more steps than others
  • lw is most expensive (5 steps, vs. 4 for R-type and sw, 3 for beq)
• Clock cycle must cover longest instruction → inefficient
• Suppose mult is added?
  • 32 shift/add steps → would delay every other instruction

Cycle time and speedup computation
• Assume:
  • 2 ns for instruction/data memory
  • 1 ns for decode/register read
  • 2 ns for ALU
  • 1 ns for register write
• Single-cycle datapath clock period = 8 ns (set by lw: 2 + 1 + 2 + 2 + 1)
• Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps
• Assuming a variable-cycle datapath, average clock period = 8*0.24 + 7*0.12 + 6*0.44 + 5*0.18 + 3*0.02 = 6.36 ns
• Possible speed-up = 8 / 6.36 ≈ 1.26
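The arithmetic above can be checked with a short Python sketch. The unit delays and the instruction mix come from the slide; the per-instruction unit lists in PATHS are an assumption inferred from the 8, 7, 6, 5 and 3 ns coefficients in the formula, and all names are illustrative.

```python
# Unit delays in ns, as assumed on the slide.
DELAY_NS = {"mem": 2, "decode": 1, "alu": 2, "regwrite": 1}

# Units used by each instruction class (assumed; inferred from the
# 8/7/6/5/3 ns coefficients in the slide's formula).
PATHS = {
    "lw":  ["mem", "decode", "alu", "mem", "regwrite"],
    "sw":  ["mem", "decode", "alu", "mem"],
    "R":   ["mem", "decode", "alu", "regwrite"],
    "beq": ["mem", "decode", "alu"],
    "j":   ["mem", "decode"],
}

# Instruction mix from the slide (fractions sum to 1).
MIX = {"lw": 0.24, "sw": 0.12, "R": 0.44, "beq": 0.18, "j": 0.02}

# Time each instruction class needs if the clock could vary per instruction.
latency = {name: sum(DELAY_NS[u] for u in units) for name, units in PATHS.items()}

single_cycle_period = max(latency.values())            # fixed clock: slowest instruction
average_period = sum(MIX[n] * latency[n] for n in MIX)  # variable clock: weighted mean
speedup = single_cycle_period / average_period

print(latency)                  # {'lw': 8, 'sw': 7, 'R': 6, 'beq': 5, 'j': 3}
print(f"{average_period:.2f}")  # 6.36
print(f"{speedup:.2f}")         # 1.26
```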

Multicycle Implementation (MIPS-lite v.2)
• Want more efficient implementation
• Each step will take one clock cycle (not each instruction) [CPI > 1]
  • shorter clock cycle: cycle time constrained by longest step, not longest instruction
  • simpler instructions take fewer cycles
  • higher overall performance
• More complex control: finite state machine
• Versatile (can extend for new instructions: swap, mult-add, etc.)

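Clocking: single-cycle vs. multi-cycle

[Figure: clock timing diagrams for beq $t0,$t1,L and add $t0,$t1,$t2. Single-cycle implementation: each instruction occupies one long clock cycle, with the unused time labeled "waste". Multicycle implementation: the same instructions run over several short clock cycles.]

Multicycle Implementation: less waste = higher performance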

How fast can we run the clock?
• Depends on how much we want to be done per clock cycle
• Can do: several “inexpensive” datapath operations per clock
  • simple gates (AND, OR, …)
  • single datapath registers (PC)
  • sign extender, left shifter, multiplexor
• OR: exactly one “expensive” datapath operation per clock
  • ALU operation
  • Register File access (2 reads, or 1 write)
  • Memory access (read or write)

MIPS-lite Multicycle Version: Multicycle Datapath (overview)

[Figure: multicycle datapath. Blocks: PC; Memory (Address in, Instruction or Data out); Instruction Register; Memory Data Register; Registers (ReadReg1, ReadReg2, WriteReg, Write Data, Read data 1, Read data 2); temporary registers A and B; ALU; ALUOut.]

• One ALU (no extra adders)
• One Memory (no separate IMem, DMem)
• New Temporary Registers (“clocked”/require clock input)

Multicycle Implementation
• Datapath changes
  • one memory: both instructions and data (because can access on separate steps)
  • one ALU (eliminate extra adders)
  • extra “invisible” registers to capture intermediate (per-step) datapath results
• Controller changes
  • controller must fire control lines in correct sequence and at correct time → controller must remember current execution step, advance to next step

Datapath + Control Points

[Figure: multicycle datapath with control points. Datapath elements: PC, Mem, IR (fields 25:21, 20:16, 15:11, 15:0, 5:0), MDR, Regs, A, B, ALU, ALUOut, sign extend, shift left 2, ALU Control, multiplexors. Control signals: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB (2 bits), ALUOp (2 bits), PCSrc.]

FSM diagram for multi-cycle machine

cycle 1
  state 0 (start new instruction): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
cycle 2
  state 1: ALUSrcA = 0, ALUSrcB = 3, ALUOp = 0
cycle 3 (branch out of state 1 by instruction class)
  state 2 (lw/sw), Memory Access: ALUSrcA = 1, ALUSrcB = 2, ALUOp = 0
  state 6 (R-format), R-format execution: ALUSrcA = 1, ALUSrcB = 0, ALUOp = 2
  state 8 (beq), Branch Completion: ALUSrcA = 1, ALUSrcB = 0, ALUOp = 1, PCWriteCond, PCSrc = 1

FSM controller: execution cycles 3-5

cycle 4
  state 3 (lw, from state 2), memory access (step 4): MemRead, IorD = 1
  state 5 (sw, from state 2), memory access (step 4): MemWrite, IorD = 1; to state 0
  state 7 (from state 6), R-format completion (step 4): RegDst = 1, RegWrite, MemtoReg = 0; to state 0
cycle 5
  state 4 (from state 3), write-back (step 5): RegDst = 0, RegWrite, MemtoReg = 1; to state 0
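The state diagram can be sketched in Python as a table of per-state outputs plus a next-state function. This is only an illustrative software model of the controller, not a hardware description: the state numbers and signal values follow the FSM slides, while the names OUTPUTS, next_state and trace, and the string labels for instruction classes, are hypothetical.

```python
# Control signals asserted in each FSM state (state numbers as on the slides).
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 1, "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},
    1: {"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},
    2: {"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1, "PCWriteCond": 1, "PCSrc": 1},
}


def next_state(state, op):
    """Next FSM state for instruction class op: 'lw', 'sw', 'R' or 'beq'."""
    if state == 0:
        return 1
    if state == 1:  # dispatch on the instruction class
        return {"lw": 2, "sw": 2, "R": 6, "beq": 8}[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7 and 8 finish the instruction


def trace(op):
    """List of states visited while executing one instruction."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:  # back at the start: instruction is complete
            return states
        states.append(nxt)


for op in ("lw", "sw", "R", "beq"):
    print(op, trace(op), len(trace(op)), "cycles")
```

The trace lengths reproduce the step counts quoted earlier: 5 for lw, 4 for R-format and sw, 3 for beq.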

Cycle 1

[Figure: multicycle datapath with control points, showing the signals active in cycle 1.]

Cycle 1: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
