
ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design. Part 10: Control Design http://www.ecs.umass.edu/ece/ece232/. Datapath With Control. R-Format Instruction: add $t1, $t2, $t3. Load Instruction.


Presentation Transcript


  1. ECE232: Hardware Organization and Design Part 10: Control Design http://www.ecs.umass.edu/ece/ece232/

  2. Datapath With Control

  3. R-Format Instruction: add $t1, $t2, $t3

  4. Load Instruction

     Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
     lw           0       1       1         1         1        0         0       0       0

  5. Branch-on-Equal Instruction

     Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
     beq          X       0       X         0         0        0         1       0       1

  6. Simple combinational logic • Inputs: opcode bits Op5 to Op0 • Decoded instruction classes: R-format, lw, sw, beq • Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0

     Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
     R-format     1       0       0         1         0        0         0       1       0
     lw           0       1       1         1         1        0         0       0       0
     sw           X       1       X         0         0        1         0       0       0
     beq          X       0       X         0         0        0         1       0       1
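
The truth table on slide 6 can be sketched as two-level logic: an AND plane that recognizes each opcode and an OR plane that forms each control output. The sketch below is illustrative only. The opcode values (0 for R-format, 35 for lw, 43 for sw, 4 for beq) are the standard MIPS encodings and do not appear in the transcript, the function name `main_control` is hypothetical, and the don't-care (X) entries are arbitrarily resolved to 0.

```python
def main_control(opcode):
    """Decode a 6-bit opcode into the nine control signals of slide 6."""
    op = [(opcode >> i) & 1 for i in range(6)]  # op[0] is Op0, op[5] is Op5
    n = [1 - bit for bit in op]                 # complemented opcode bits

    # AND plane: one product term per instruction class
    # (assumed standard MIPS opcodes, not given in the transcript).
    r_format = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]      # 000000
    lw = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]         # 100011
    sw = op[5] & n[4] & op[3] & n[2] & op[1] & op[0]        # 101011
    beq = n[5] & n[4] & n[3] & op[2] & n[1] & n[0]          # 000100

    # OR plane: each output is the OR of the classes that assert it.
    # Don't-care (X) table entries come out as 0 here.
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }
```

For example, `main_control(0b100011)` reproduces the lw row of the table, and an opcode outside the four classes leaves every signal at 0.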

  7. Single-Cycle Machine: Appraisal • All instructions complete in one clock cycle (CPI = 1) • Some instructions take more steps than others • lw is most expensive (5 steps, vs. 4 for R-type and sw, 3 for beq) • Clock cycle must cover longest instruction → inefficient • Suppose mult is added? • 32 shift/add steps → would delay every other instruction

  8. Cycle time and speedup computation • Assume: • 2 ns for instruction/data memory • 1 ns for decode/register read • 2 ns for ALU • 1 ns for register write • Single-cycle datapath clock period = 8 ns (set by the slowest instruction, lw) • Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps • Assuming a variable-cycle datapath in which each class takes only the time it needs (8 ns loads, 7 ns stores, 6 ns R-format, 5 ns branches, 3 ns jumps), average clock period = 8*0.24 + 7*0.12 + 6*0.44 + 5*0.18 + 3*0.02 = 6.36 ns • Possible speed-up = 8/6.36 ≈ 1.26
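
The arithmetic on slide 8 can be checked with a short script. This is a sketch under stated assumptions: the unit latencies and the instruction mix come from the slide, but the per-class list of units (`STEPS`) is an assumed breakdown chosen because it reproduces the slide's 8/7/6/5/3 ns weights, and all names are hypothetical.

```python
# Latency of each functional unit in ns (from slide 8).
UNIT_NS = {"mem": 2, "reg_read": 1, "alu": 2, "reg_write": 1}

# Units used by each instruction class. ASSUMED breakdown: the slide only
# gives the resulting totals (8, 7, 6, 5 and 3 ns), not these lists.
STEPS = {
    "lw": ["mem", "reg_read", "alu", "mem", "reg_write"],
    "sw": ["mem", "reg_read", "alu", "mem"],
    "r_format": ["mem", "reg_read", "alu", "reg_write"],
    "beq": ["mem", "reg_read", "alu"],
    "jump": ["mem", "reg_read"],
}

# Instruction mix (from slide 8).
MIX = {"lw": 0.24, "sw": 0.12, "r_format": 0.44, "beq": 0.18, "jump": 0.02}


def latency_ns(instr):
    """Time one instruction class needs if it uses only its own units."""
    return sum(UNIT_NS[unit] for unit in STEPS[instr])


# The single-cycle clock must cover the slowest instruction.
single_cycle_ns = max(latency_ns(instr) for instr in STEPS)

# A variable-cycle clock averages the latencies, weighted by the mix.
average_ns = sum(MIX[instr] * latency_ns(instr) for instr in MIX)

speedup = single_cycle_ns / average_ns

print(single_cycle_ns, round(average_ns, 2), round(speedup, 2))
```

Changing `MIX` shows how sensitive the possible speed-up is to the fraction of loads, since lw alone sets the single-cycle period.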

  9. Multicycle Implementation (MIPS-lite v.2) • Want more efficient implementation • Each step will take one clock cycle (not each instruction) [CPI > 1] • shorter clock cycle: cycle time constrained by longest step, not longest instruction • simpler instructions take fewer cycles • higher overall performance • More complex control: finite state machine • Versatile (can extend for new instructions: swap, mult-add etc.)

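  10. Clocking: single-cycle vs. multi-cycle [Timing diagram: in the single-cycle implementation, beq $t0,$t1,L and add $t0,$t1,$t2 each occupy one long clock cycle, part of which is marked "waste"; in the multicycle implementation, add $t0,$t1,$t2 and beq $t0,$t1,L each span several short clock cycles] • Multicycle Implementation: less waste = higher performance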

  11. How fast can we run the clock? • Depends on how much we want to be done per clock cycle • Can do: several “inexpensive” datapath operations per clock • simple gates (AND, OR, …) • single datapath registers (PC) • sign extender, left shifter, multiplexor • OR: exactly one “expensive” datapath operation per clock • ALU operation • Register File access (2 reads, or 1 write) • Memory access (read or write)

  12. MIPS-lite Multicycle Version: Multicycle Datapath (overview) [Figure: PC, Memory (Address input; instruction or data), Instruction Register, Memory Data Register, Registers (ReadReg1, ReadReg2, WriteReg, Data; Readdata 1, Readdata 2), A, B, ALU, ALUOut] • One ALU (no extra adders) • One Memory (no separate IMem, DMem) • New Temporary Registers (“clocked”/require clock input)

  13. Multicycle Implementation • Datapath changes • one memory for both instructions and data (possible because they are accessed on separate steps) • one ALU (eliminate extra adders) • extra “invisible” registers to capture intermediate (per-step) datapath results • Controller changes • controller must fire control lines in the correct sequence and at the correct time → controller must remember the current execution step and advance to the next step

  14. Datapath + Control Points [Figure: PC, memory (Mem; Address, Read Data, Write Data), instruction register (IR; fields 25:21, 20:16, 15:11, 15:0, 5:0), memory data register (MDR), register file (Regs; ReadReg1, ReadReg2, WriteReg, WriteData, Readdata1, Readdata2), A, B, sign extend, << 2, constant 4, ALU (zero output), ALU Control, ALUOut, multiplexors] • Control signals: IRWrite, RegWrite, PCWrite, PCWriteCond, MemRead, MemWrite, PCSrc, ALUSrcA, ALUSrcB (2 bits), RegDst, IorD, MemtoReg, ALUOp (2 bits)
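
The ALU input multiplexors on slide 14 can be sketched as a small selection function. The mux input ordering used here (ALUSrcA: 0 selects PC, 1 selects A; ALUSrcB: 0 selects B, 1 selects the constant 4, 2 selects the sign-extended immediate, 3 selects the sign-extended immediate shifted left by 2) is inferred from the figure labels and from the ALUSrcA/ALUSrcB values used on the FSM slides, so treat it as an assumption. The function names are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def alu_operands(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Return the two ALU inputs chosen by ALUSrcA and ALUSrcB.

    ASSUMED mux ordering (see the lead-in):
      ALUSrcA: 0 -> PC, 1 -> A
      ALUSrcB: 0 -> B, 1 -> 4, 2 -> sign-extended imm, 3 -> sign-extended imm << 2
    """
    ext = sign_extend16(imm16)
    operand_a = pc if alu_src_a == 0 else a
    operand_b = (b, 4, ext, ext << 2)[alu_src_b]
    return operand_a, operand_b
```

With ALUSrcA = 0 and ALUSrcB = 1 the ALU sees the PC and the constant 4, which matches the cycle 1 settings on the FSM slide, where ALUOp = 0 and PCWrite is asserted.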

  15. FSM diagram for multi-cycle machine • Cycle 1, state 0 (start new instruction): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0 • Cycle 2, state 1: ALUSrcA = 0, ALUSrcB = 3, ALUOp = 0; next state: lw/sw → state 2, R-format → state 6, beq → state 8 • Cycle 3, state 2 (Memory Access): ALUSrcA = 1, ALUSrcB = 2, ALUOp = 0 • Cycle 3, state 6 (R-format execution): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 2 • Cycle 3, state 8 (Branch Completion): ALUSrcA = 1, ALUSrcB = 0, ALUOp = 1, PCWriteCond, PCSrc = 1

  16. FSM controller: execution cycles 3-5 • Cycle 4, state 3 (from state 2, lw; memory access, step 4): MemRead, IorD = 1 • Cycle 4, state 5 (from state 2, sw; memory access, step 4): MemWrite, IorD = 1; then to state 0 • Cycle 4, state 7 (from state 6; R-format completion, step 4): RegDst = 1, RegWrite, MemtoReg = 0; then to state 0 • Cycle 5, state 4 (from state 3; write-back, step 5): RegDst = 0, RegWrite, MemtoReg = 1; then to state 0
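
The state machine on slides 15 and 16 can be sketched as a table of per-state control outputs plus a next-state function. This is a minimal illustration, not the course's implementation: the state numbers and signal values are read from the two slides, the transitions back to state 0 after states 4, 5, 7 and 8 are assumed from the "to state 0" label, and the names `OUTPUTS`, `next_state` and `state_sequence` are hypothetical.

```python
# Control outputs asserted in each state (from slides 15 and 16).
OUTPUTS = {
    0: {"MemRead": 1, "IorD": 0, "IRWrite": 1, "ALUSrcA": 0, "ALUSrcB": 1,
        "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},                 # fetch
    1: {"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},               # cycle 2
    2: {"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0},               # lw/sw
    3: {"MemRead": 1, "IorD": 1},                              # lw memory read
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},            # lw write-back
    5: {"MemWrite": 1, "IorD": 1},                             # sw memory write
    6: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2},               # R-format execution
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},            # R-format completion
    8: {"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1,
        "PCWriteCond": 1, "PCSrc": 1},                         # branch completion
}


def next_state(state, instr):
    """Next FSM state for instr in {'lw', 'sw', 'r_format', 'beq'}."""
    if state == 0:
        return 1
    if state == 1:
        return {"lw": 2, "sw": 2, "r_format": 6, "beq": 8}[instr]
    if state == 2:
        return 3 if instr == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8: instruction done (assumed return to fetch)


def state_sequence(instr):
    """States visited by one instruction, one state per clock cycle."""
    states = [0]
    while True:
        following = next_state(states[-1], instr)
        if following == 0:
            return states
        states.append(following)
```

The lengths of the sequences agree with the step counts quoted on slide 7: five cycles for lw, four for sw and R-format, three for beq.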

  17. Cycle 1 [Figure: the datapath with control points from slide 14] • Control settings in cycle 1: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 1, ALUOp = 0, PCWrite, PCSrc = 0
