580 likes | 592 Views
CS161 – Design and Architecture of Computer Systems. Multi-Cycle CPU Design. Quick recap: Single Cycle Implementation. Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns). Single Cycle – Steps of each instruction.
E N D
CS161 – Design and Architecture of Computer Systems Multi-Cycle CPU Design
Quick recap: Single Cycle Implementation • Calculate cycle time assuming negligible delays except: • memory (2ns), ALU and adders (2ns), register file access (1ns)
Single Cycle – How long is the cycle? The cycle time must accommodate the longest operation: lw. Cycle time = 8 ns but the CPI = 1. If we can accommodate variable number of cycles for each instruction and a cycle time of 1ns. CPI = 6*44% + 8*24% + 7*12% + 5*18% + 2*2% = 6.3
Single cycle execution problem • The cycle time depends on the most time-consuming instruction • What happens if we implement a more complex instruction, e.g., afloating-point multiplication • All resources are simultaneously active – there is no sharing ofresources (waste of area and potentially slower clock) • We’ll adopt a multi-cycle solution which allows us to • Use a faster clock • Reduce physical resources • Determine clock cycle time independently of instruction processing time • Each instruction takes as many clock cycles as it needs to take (different instructions take different number of cycles) • Multiple state transitions per instruction • The states followed by each instruction is different
Multicycle Approach • We will be reusing functional units • ALU used to compute address and to increment PC • Memory used for instruction and data • At the end of a cycle, keep results in registers • Additional registers • Now, control signals are NOT solely determined by the instruction bits! • Controls will be generated by a Finite State Machine (FSM)!
Review: finite state machines • Finite state machines: • a set of states and • next state function (determined by current state and the input) • output function (determined by current state and possibly input) • We’ll use a Moore machine (output based only on current state)
Multicycle Approach • Break up the instructions into steps, each step takes a cycle • balance the amount of work to be done • restrict each cycle to use only one major functional unit • At the end of a cycle • store values for use in later cycles (easiest thing to do) • introduce additional “internal” registers
Five Execution Steps • Instruction Fetch (IF) • Instruction Decode and Register Fetch (ID/RF) • Execution, Memory Address Computation, or Branch Completion (EX) • Memory Access or R-type instruction completion (M) • Write-back step (WB) • What is CPI in Multicycle machines? INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Step 1: Instruction Fetch • Access memory w/ PC to fetch instruction and store it in Instruction Register (IR) • Increment PC by 4 using ALU and put the result back in the PC • We can do this because ALU is not busy in this cycle • Can be described concisely using RTL "Register-Transfer Language" IR = Memory[PC]; PC = PC + 4;
Step 2: Instruction Decode & Register Fetch • Read registers rs and rt • We read both of them regardless of necessity • Compute the branch address in case the instruction is a branch • We can do this because ALU is not busy • ALUOut will keep the target address • RTL:A = Reg[IR[25-21]]; B = Reg[IR[20-16]];ALUOut = PC + (sign-extend(IR[15-0]) << 2); • We have not set any control signal based n the instruction type yet • We are busy "decoding" it in our control logic
Step 3 (instruction dependent) • ALU is performing one of three functions, based on instruction type • Memory Reference:ALUOut = A + sign-extend(IR[15-0]); • R-type:ALUOut = A op B; • Branch: if (A==B) PC = ALUOut; • Jump: PC = {PC[31:28], IR[25:0],2’b00};
Step 4 (R-type or memory-access) • Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; • R-type instructions finish Reg[IR[15-11]] = ALUOut;
Step 5 Write-back step • Reg[IR[20-16]]= MDR; What about all the other instructions?
A Simple Example • How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label #assume not add $t5, $t2, $t3 sw $t5, 8($t3)Label: ... • What is going on during the 8th cycle of execution? • In what cycle does the actual addition of $t2 and $t3 takes place?
Implementing the Control • Value of control signals is dependent upon: • what instruction is being executed • which step is being performed • Use the information we’ve accumulated to specify a finite state machine • specify the finite state machine graphically, or • use microprogramming
Graphical Specification of FSM • How many state bits will we need? Start 8 Complete Branch 2 Mem. Address Compute 3 Mem. Access Read 5 Mem. Access Write 9 Complete Jump 6 ALU Execute 4 Write-Back 1 Instruction Decode 0 Instruction Fetch 7 Complete R-type Op = BEQ Op = R-type Op = J Op = LW or SW Op = SW Op = LW
JAL Instruction: Jal #label Interpretation: $ra = PC, $ra is register 31 PC = #label
(Op = ‘JAL’) PCWrite PCSource = 10 RegDest = 10 MemToReg = 10 RegWrite
LWI Similar to HW2, but for Multi-cycle Instruction: LWI Rt, Rd(Rs) * Note LWI is a R-type instruction Interpretation: Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]