CPE 232 Computer Organization Basic MIPS Architecture – Part II

Dr. Iyad Jafar
Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

Multicycle Datapath Approach
• Let an instruction take more than one clock cycle to complete
• Break up instructions into steps, where each step takes one cycle, while trying to
  • balance the amount of work done in each step
  • restrict each cycle to use only one major functional unit, unless units are used in parallel
• Not every instruction takes the same number of clock cycles
• In addition to allowing faster clock rates, a multicycle design lets a functional unit be used more than once per instruction, as long as it is used on different clock cycles. As a result:
  • Only one memory is needed, but only one memory access can occur per cycle
  • Only one ALU/adder is needed, but only one ALU operation can occur per cycle

[Figure: multicycle datapath with one memory (instructions or data), the register file, and one ALU, plus the internal registers IR, MDR, A, B, and ALUOut]
Multicycle Datapath Approach, cont'd
• At the end of a cycle
  • Store values needed in a later cycle by the current instruction in internal registers (A, B, IR, MDR, and ALUOut). These registers are invisible to the programmer.
  • All of these registers except IR hold data only between a pair of adjacent clock cycles, so they don't need a write control signal.
    IR – Instruction Register
    MDR – Memory Data Register
    A, B – register file read data registers
    ALUOut – ALU output register
• Data used by subsequent instructions are stored in programmer-visible state elements (i.e., the register file, PC, or memory)
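A minimal Python sketch of the end-of-cycle register behavior described above. It assumes a simplified model in which the combinational values present at the end of a cycle (memory read data, the two register file read ports, and the ALU result) are passed in a dictionary; the names `InternalRegs`, `latch_internal_registers`, and the dictionary keys are hypothetical. Only IR is gated by a write enable (IRWrite); A, B, MDR, and ALUOut are simply overwritten on every clock edge.

```python
from dataclasses import dataclass


@dataclass
class InternalRegs:
    """Programmer-invisible registers of the multicycle datapath."""
    IR: int = 0
    MDR: int = 0
    A: int = 0
    B: int = 0
    ALUOut: int = 0


def latch_internal_registers(regs: InternalRegs, comb: dict, ir_write: bool) -> InternalRegs:
    """Model one clock edge (hypothetical helper).

    `comb` holds the combinational values at the end of the cycle:
    "mem_data", "read_data_1", "read_data_2", and "alu_result".
    """
    return InternalRegs(
        # IR must hold the instruction for all of its cycles, so it needs IRWrite.
        IR=comb["mem_data"] if ir_write else regs.IR,
        # These are only needed in the very next cycle, so they are written
        # unconditionally on every edge (no write-enable signal).
        MDR=comb["mem_data"],
        A=comb["read_data_1"],
        B=comb["read_data_2"],
        ALUOut=comb["alu_result"],
    )
```

For example, latching with `ir_write=True` in the fetch cycle and `ir_write=False` afterwards keeps the fetched instruction in IR while MDR, A, B, and ALUOut change every cycle.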

Multicycle Datapath Approach, cont'd
• As in the single-cycle design, shared functional units need multiplexers at their inputs.
• There is only one ALU, which is used to update the PC, perform arithmetic-logical operations, compare registers for beq, compute memory addresses, and compute branch target addresses.
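Because a single ALU serves all of these purposes, its inputs are chosen by multiplexers. The Python sketch below models the ALUSrcA and ALUSrcB multiplexers using the select encodings that appear in the step-by-step control values later in these slides (ALUSrcA: 0 = PC, 1 = A; ALUSrcB: 00 = B, 01 = 4, 10 = sign-extended offset, 11 = sign-extended offset shifted left 2). The function names are hypothetical, and all values are treated as 32-bit unsigned patterns.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(field: int) -> int:
    """Sign-extend the low 16 bits to a 32-bit unsigned pattern."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field


def alu_input_a(alu_src_a: int, pc: int, a: int) -> int:
    """ALUSrcA mux: 0 selects PC, 1 selects register A."""
    return a if alu_src_a == 1 else pc


def alu_input_b(alu_src_b: int, b: int, ir: int) -> int:
    """ALUSrcB mux (2-bit select); the offset comes from IR[15:0]."""
    offset = sign_extend16(ir)
    choices = {
        0b00: b,                       # register B (R-type, beq comparison)
        0b01: 4,                       # constant 4 (PC + 4)
        0b10: offset,                  # sign-extended offset (lw/sw address)
        0b11: (offset << 2) & MASK32,  # offset << 2 (branch target)
    }
    return choices[alu_src_b]
```

For instance, the fetch step's PC + 4 corresponds to `alu_input_a(0, pc, a) + alu_input_b(0b01, b, ir)`.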

Multicycle Datapath Approach – Control Signals

The Multicycle Datapath with Control Signals
[Figure: multicycle datapath with its control unit. Control outputs: PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUSrcA, ALUSrcB, ALUOp, RegWrite, RegDst. The ALU control block is driven by ALUOp and Instr[5-0]; the jump address is formed from PC[31-28] and Instr[25-0] shifted left 2; Instr[15-0] is sign-extended and optionally shifted left 2.]

Multicycle Machine: 1-bit Control Signals

Multicycle Machine: 2-bit Control Signals

Breaking Instruction Execution into Clock Cycles
[Figure: the steps IFetch, Dec, Exec, Mem, and WB executed in Cycles 1 to 5]
1. IFetch: Instruction fetch and PC update (same for all instructions)
• Operations
  1.1 Instruction fetch: IR <= Memory[PC]
  1.2 Update PC: PC <= PC + 4
• Control signal values
  • IorD = 0, MemRead = 1, IRWrite = 1
  • ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSource = 00, PCWrite = 1
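The IFetch step can be sketched in Python as below, assuming memory is modeled as a dictionary of 32-bit words keyed by byte address (the function name `ifetch` and this memory model are hypothetical). It shows why both operations fit in one cycle: the memory reads the instruction while the ALU, a separate unit, computes PC + 4.

```python
MASK32 = 0xFFFFFFFF


def ifetch(pc: int, memory: dict) -> tuple:
    """Step 1 (IFetch). Returns (new IR, new PC).

    Memory side: IorD = 0 (address = PC), MemRead = 1, IRWrite = 1.
    ALU side:    ALUSrcA = 0 (PC), ALUSrcB = 01 (4), ALUOp = 00 (add),
                 PCSource = 00, PCWrite = 1.
    """
    ir = memory[pc]                 # IR <= Memory[PC]
    next_pc = (pc + 4) & MASK32     # PC <= PC + 4, computed by the ALU in the same cycle
    return ir, next_pc
```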

Breaking Instruction Execution into Clock Cycles
2. Dec: Instruction decode and register fetch (same for all instructions)
The instruction is not known yet, so only operations that are harmless for every instruction type are performed.
• Operations
  2.1 Read the two source registers rs and rt into registers A and B, respectively:
      A <= Reg[IR[25:21]]
      B <= Reg[IR[20:16]]
  2.2 Compute the branch target address (PC already holds PC + 4 from step 1):
      ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
• Control signal values
  • ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
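A Python sketch of the decode step, assuming the register file is a list of 32 integers (`decode` and its returned dictionary are hypothetical names). Note that `pc` already holds PC + 4 here, and the branch target is computed even if the instruction turns out not to be a branch; in that case ALUOut is simply overwritten in a later cycle.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(field: int) -> int:
    """Interpret the low 16 bits as a signed value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def decode(ir: int, pc: int, regs: list) -> dict:
    """Step 2 (Dec). `pc` already holds PC + 4 from step 1."""
    rs = (ir >> 21) & 0x1F            # IR[25:21]
    rt = (ir >> 16) & 0x1F            # IR[20:16]
    offset = sign_extend16(ir)        # sign-extend(IR[15:0])
    return {
        "A": regs[rs],
        "B": regs[rt],
        # Branch target, computed whether or not the instruction is a branch
        "ALUOut": (pc + (offset << 2)) & MASK32,
    }
```

For example, `beq $1, $2, -1` (encoded 0x1022FFFF) fetched from 0x00400000 decodes with `pc = 0x00400004` and produces a target of 0x00400000.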

Breaking Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion
The operation in this cycle depends on the instruction type.
• Operations
  • Memory reference: compute the address
      ALUOut <= A + sign-extend(IR[15:0])
      ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  • Arithmetic-logical instruction: perform the operation
      ALUOut <= A op B
      ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10

Breaking Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion (continued)
The operation in this cycle depends on the instruction type.
• Operations
  • Branch instruction: if (A == B) PC <= ALUOut
      The ALU subtracts B from A, and its Zero output together with PCWriteCond decides whether PC is written.
      ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01
  • Jump instruction: PC <= {PC[31:28], IR[25:0], 2'b00}
      PCSource = 10, PCWrite = 1
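The two completion cases can be sketched in Python as follows (function names hypothetical; values are 32-bit unsigned patterns). For beq, the zero test on A − B plays the role of the ALU's Zero output; for j, the target keeps the upper 4 bits of the PC, which already holds PC + 4 from step 1.

```python
MASK32 = 0xFFFFFFFF


def branch_complete(a: int, b: int, alu_out: int, pc: int) -> int:
    """beq in step 3: ALU computes A - B (ALUOp = 01).

    If the result is zero, PCWriteCond = 1 lets the PCSource = 01 input
    (the target saved in ALUOut during step 2) be written into PC.
    """
    zero = ((a - b) & MASK32) == 0
    return alu_out if zero else pc


def jump_target(pc: int, ir: int) -> int:
    """j in step 3: PC <= {PC[31:28], IR[25:0], 2'b00} (PCSource = 10)."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
```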

Breaking Instruction Execution into Clock Cycles
4. Memory access or R-type completion
The operation in this cycle depends on the instruction type.
• Operations
  • Load: read the value from memory into MDR
      MDR <= Memory[ALUOut]
      MemRead = 1, IorD = 1
  • Store: write rt into memory
      Memory[ALUOut] <= B
      MemWrite = 1, IorD = 1
  • Arithmetic-logical instruction: write the ALU result into rd
      Reg[IR[15:11]] <= ALUOut
      MemtoReg = 0, RegDst = 1, RegWrite = 1

Breaking Instruction Execution into Clock Cycles
5. Memory read completion (needed for the load instruction only)
• Operations
  5.1 Write the loaded value in MDR into rt:
      Reg[IR[20:16]] <= MDR
      RegWrite = 1, MemtoReg = 1, RegDst = 0

Breaking Instruction Execution into Clock Cycles
• In this implementation, not all instructions take 5 cycles: loads take 5 cycles, stores and R-type instructions take 4, and branches and jumps take 3.
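Putting the five steps together, the Python sketch below executes one instruction at a time and reports how many cycles it used. It is a simplified behavioral model, not a gate-level one: it handles only lw, sw, add (R-type with funct 0x20), beq, and j, and assumes memory is a dictionary of words keyed by byte address; `run_one` and the layout of `state` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext16(value: int) -> int:
    """Sign-extend the low 16 bits to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_one(state: dict) -> int:
    """Run one instruction through the multicycle steps; return cycles used.

    `state` holds "PC", "reg" (list of 32 ints), and "mem" (dict of words
    keyed by byte address). Models only lw, sw, add, beq, and j.
    """
    reg, mem = state["reg"], state["mem"]

    # Step 1, IFetch: IR <= Memory[PC]; PC <= PC + 4
    ir = mem[state["PC"]]
    pc = (state["PC"] + 4) & MASK32

    # Step 2, Dec: read rs/rt into A/B; compute the branch target speculatively
    op, rs, rt, rd = ir >> 26, (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, (ir >> 11) & 0x1F
    a, b = reg[rs], reg[rt]
    alu_out = (pc + (sext16(ir) << 2)) & MASK32

    # Step 3 onward depends on the instruction type
    if op == 0x04:                               # beq: completes in step 3
        if a == b:
            pc = alu_out
        cycles = 3
    elif op == 0x02:                             # j: completes in step 3
        pc = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        cycles = 3
    elif op in (0x23, 0x2B):                     # lw / sw
        alu_out = (a + sext16(ir)) & MASK32      # step 3: address computation
        if op == 0x2B:
            mem[alu_out] = b                     # step 4: Memory[ALUOut] <= B
            cycles = 4
        else:
            mdr = mem[alu_out]                   # step 4: MDR <= Memory[ALUOut]
            if rt:
                reg[rt] = mdr                    # step 5: Reg[rt] <= MDR ($zero stays 0)
            cycles = 5
    elif op == 0x00 and (ir & 0x3F) == 0x20:     # add (R-type)
        alu_out = (a + b) & MASK32               # step 3: ALUOut <= A + B
        if rd:
            reg[rd] = alu_out                    # step 4: Reg[rd] <= ALUOut
        cycles = 4
    else:
        raise NotImplementedError(f"instruction {ir:#010x} is not modeled")

    state["PC"] = pc
    return cycles
```

Running lw, sw, add, beq, and j through `run_one` returns 5, 4, 4, 3, and 3 cycles respectively, matching the per-class counts above.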

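Multicycle Performance
• Compute the average CPI of the multicycle implementation for a SPECINT2000 program with the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI of each instruction class given in the previous table.
$$\text{CPI} = \frac{\sum_i \text{CPI}_i \times \text{IC}_i}{\text{IC}} = 0.25 \times 5 + 0.10 \times 4 + 0.11 \times 3 + 0.02 \times 3 + 0.52 \times 4 = 4.12$$
• This is much higher than the CPI of 1 of the single-cycle implementation, but each multicycle clock cycle is much shorter.
• Assume $\text{CC}_M = \frac{1}{5}\,\text{CC}_S$, where $\text{CC}_M$ and $\text{CC}_S$ are the multicycle and single-cycle clock cycle times. Then
$$\frac{\text{Performance}_M}{\text{Performance}_S} = \frac{\text{IC} \times 1 \times \text{CC}_S}{\text{IC} \times 4.12 \times \frac{1}{5}\,\text{CC}_S} = \frac{5}{4.12} \approx 1.21$$
• Multicycle is also cost-effective in terms of hardware, since it needs only one memory and one ALU.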
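The same calculation in Python, using the instruction mix and per-class cycle counts above and the assumption that the multicycle clock cycle is 1/5 of the single-cycle one. Because IC appears in both execution times, it cancels out of the ratio.

```python
# Instruction mix (fraction of IC) and cycles per class in the multicycle design
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

cpi_multi = sum(mix[c] * cycles[c] for c in mix)   # weighted average CPI

# Execution time = IC x CPI x clock cycle time; IC cancels in the ratio.
cc_single = 1.0              # single-cycle clock cycle time (normalized)
cc_multi = cc_single / 5     # assumption from the slide: CC_M = CC_S / 5
speedup = (1 * cc_single) / (cpi_multi * cc_multi)

print(f"CPI = {cpi_multi:.2f}, speedup = {speedup:.2f}")
```

Running it prints `CPI = 4.12, speedup = 1.21`.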

Multicycle Control Unit
[Figure: control unit as an FSM: combinational control logic takes the instruction opcode and the current state from the State Reg, and produces the datapath control points and the next state]
• Multicycle datapath control signals are not determined solely by the bits in the instruction
  • e.g., the opcode bits tell what operation the ALU should perform, but not which step of the instruction is to be done next
• Since each instruction is broken into multiple cycles, the control unit must know what it did in the previous cycle(s) to determine the current action
• A finite state machine (FSM) must therefore be used for control:
  • a set of states (the current state is stored in the State Register)
  • a next-state function (determined by the current state and the input)
  • an output function (in general determined by the current state and the input; in this control unit the outputs depend only on the current state, i.e., it is a Moore machine)

The States of the Control Unit
• 10 states are required in the FSM control
• The sequence of states is determined by the five execution steps and by the instruction type
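A Python sketch of the FSM's next-state function. The state numbering (0 to 9) and its assignment to steps are an assumption that follows a common textbook presentation; the opcodes are the standard MIPS values for R-type (0x00), j (0x02), beq (0x04), lw (0x23), and sw (0x2B). The function names are hypothetical.

```python
# Standard MIPS opcodes (instruction bits 31-26)
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B

# Assumed state numbering:
#   0 IFetch             1 Dec                     2 memory address computation
#   3 lw memory access   4 lw write-back (step 5)  5 sw memory access
#   6 R-type execution   7 R-type completion       8 beq completion
#   9 j completion


def next_state(state: int, opcode: int) -> int:
    """Next-state function: depends on the current state and the opcode."""
    if state == 0:
        return 1                                  # always decode after fetch
    if state == 1:                                # dispatch on the opcode
        return {OP_LW: 2, OP_SW: 2, OP_RTYPE: 6, OP_BEQ: 8, OP_J: 9}[opcode]
    if state == 2:
        return 3 if opcode == OP_LW else 5        # lw reads memory, sw writes it
    if state in (3, 6):
        return state + 1                          # lw and R-type need one more step
    return 0                                      # states 4, 5, 7, 8, 9 end the instruction


def state_trace(opcode: int) -> list:
    """States visited by one instruction, starting from IFetch."""
    states, s = [0], next_state(0, opcode)
    while s != 0:
        states.append(s)
        s = next_state(s, opcode)
    return states
```

The length of each trace (5 for lw, 4 for sw and R-type, 3 for beq and j) equals the number of cycles that instruction takes, and together the traces visit all 10 states.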

The Control Unit
• Logic gates
  • inputs: present state + opcode → 4 + 6 = 10 bits
  • outputs: control signals + next state → 16 + 4 = 20 bits
  • truth table size = 2^10 rows × 20 columns
• ROM
  • Can be used to implement the truth table above (2^10 × 20 bits = 20 Kbit)
  • Each location stores the control signal values and the next state
  • Each location is addressed by the opcode and the current state
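The sizing above can be checked with a short Python sketch. It assumes the ROM address is formed by concatenating the current state (high bits) with the opcode (low bits); the order of the two fields is an arbitrary choice, and the helper names are hypothetical. The 16 control bits are the 1-bit and 2-bit control signals of the multicycle datapath.

```python
# Control outputs of the multicycle datapath and their widths in bits
CONTROL_FIELDS = {
    "PCWriteCond": 1, "PCWrite": 1, "IorD": 1, "MemRead": 1, "MemWrite": 1,
    "MemtoReg": 1, "IRWrite": 1, "ALUSrcA": 1, "RegWrite": 1, "RegDst": 1,
    "PCSource": 2, "ALUOp": 2, "ALUSrcB": 2,
}
STATE_BITS = 4    # 10 states fit in 4 bits
OPCODE_BITS = 6   # instruction bits 31-26

CONTROL_BITS = sum(CONTROL_FIELDS.values())   # 16


def rom_address(state: int, opcode: int) -> int:
    """ROM address = {current state, opcode} (field order is an assumption)."""
    return (state << OPCODE_BITS) | opcode


rows = 1 << (STATE_BITS + OPCODE_BITS)    # 2^10 locations
width = CONTROL_BITS + STATE_BITS         # control signals + next state
total_bits = rows * width
print(rows, width, total_bits // 1024, "Kbit")
```

Running it prints `1024 20 20 Kbit`.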

Microprogrammed Control Unit
• A ROM implementation is error-prone and expensive, especially for complex instruction sets: its size grows as the number of instructions (and states) increases.
• Use microprogramming:
  • The next state may not be sequential
  • Generate the next state outside the storage element
  • Each state is a microinstruction, and the signals are specified symbolically
  • Use labels for sequencing

Microprogram
• The microassembler converts the microcode into actual signal values
• The sequencing field is used along with the opcode to determine the next state
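A Python sketch of how the sequencing field and the opcode can pick the next microinstruction. It assumes a sequencer with four choices: go to the next microinstruction, return to Fetch, or dispatch through one of two opcode-indexed tables. The labels (Fetch, Mem1, LW2, SW2, Rformat1, BEQ1, JUMP1), table names, and field values are illustrative assumptions modeled on a common textbook microprogram, and the datapath control fields of each microinstruction are omitted.

```python
# Sequencing-field values (assumed encoding, written symbolically)
SEQ, FETCH, DISPATCH1, DISPATCH2 = "Seq", "Fetch", "Dispatch1", "Dispatch2"

# Symbolic microprogram: (label, sequencing field); the list index is the
# microinstruction address. Datapath control fields are omitted.
MICROCODE = [
    ("Fetch", SEQ),         # 0: IFetch
    (None, DISPATCH1),      # 1: Dec, then dispatch on opcode
    ("Mem1", DISPATCH2),    # 2: memory address computation
    ("LW2", SEQ),           # 3: lw memory access
    (None, FETCH),          # 4: lw write-back
    ("SW2", FETCH),         # 5: sw memory access
    ("Rformat1", SEQ),      # 6: R-type execution
    (None, FETCH),          # 7: R-type completion
    ("BEQ1", FETCH),        # 8: beq completion
    ("JUMP1", FETCH),       # 9: j completion
]
LABELS = {label: addr for addr, (label, _) in enumerate(MICROCODE) if label}

# Dispatch tables: opcode -> label of the next microinstruction
DISPATCH_ROM_1 = {0x00: "Rformat1", 0x02: "JUMP1", 0x04: "BEQ1", 0x23: "Mem1", 0x2B: "Mem1"}
DISPATCH_ROM_2 = {0x23: "LW2", 0x2B: "SW2"}


def next_micro_address(addr: int, opcode: int) -> int:
    seq = MICROCODE[addr][1]
    if seq == SEQ:
        return addr + 1                      # sequential: use the incremented address
    if seq == FETCH:
        return LABELS["Fetch"]               # start the next instruction
    table = DISPATCH_ROM_1 if seq == DISPATCH1 else DISPATCH_ROM_2
    return LABELS[table[opcode]]             # non-sequential: branch on the opcode


def micro_trace(opcode: int) -> list:
    """Microinstruction addresses executed for one instruction."""
    addrs, addr = [0], next_micro_address(0, opcode)
    while addr != LABELS["Fetch"]:
        addrs.append(addr)
        addr = next_micro_address(addr, opcode)
    return addrs
```

Keeping the next-address logic outside the microcode storage lets most microinstructions use the cheap "Seq" option, while only the decode and address-computation steps need an opcode-indexed dispatch.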

Sequencer

[Figure: multicycle timing of lw (IFetch, Dec, Exec, Mem, WB), sw (IFetch, Dec, Exec, Mem), and the start of an R-type instruction across Cycles 1 to 10]
Multicycle Advantages & Disadvantages
• Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step, not the slowest instruction
• Multicycle implementations allow functional units to be used more than once per instruction, as long as they are used on different clock cycles
but
• Requires additional internal state registers, more muxes, and more complicated (FSM) control

Single Cycle vs. Multiple Cycle Timing
[Figure: the single-cycle implementation runs lw and sw in Cycles 1 and 2 of a long clock, with time wasted when sw finishes before the end of its cycle; the multiple-cycle implementation runs lw, sw, and the start of an R-type instruction over Cycles 1 to 10 of a short clock. The multicycle clock period is somewhat longer than 1/5 of the single-cycle period because of state register overhead.]
