1 / 58

CS161 – Design and Architecture of Computer Systems

CS161 – Design and Architecture of Computer Systems. Multi-Cycle CPU Design. Quick recap: Single Cycle Implementation. Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns). Single Cycle – Steps of each instruction.

rhong
Download Presentation

CS161 – Design and Architecture of Computer Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS161 – Design and Architecture of Computer Systems Multi-Cycle CPU Design

  2. Quick recap: Single Cycle Implementation • Calculate cycle time assuming negligible delays except: • memory (2ns), ALU and adders (2ns), register file access (1ns)

  3. Single Cycle – Steps of each instruction

  4. Single Cycle – How long is the cycle? The cycle time must accommodate the longest operation: lw. Cycle time = 8 ns but the CPI = 1. If we can accommodate variable number of cycles for each instruction and a cycle time of 1ns. CPI = 6*44% + 8*24% + 7*12% + 5*18% + 2*2% = 6.3

  5. Single cycle execution problem • The cycle time depends on the most time-consuming instruction • What happens if we implement a more complex instruction, e.g., afloating-point multiplication • All resources are simultaneously active – there is no sharing ofresources (waste of area and potentially slower clock) • We’ll adopt a multi-cycle solution which allows us to • Use a faster clock • Reduce physical resources • Determine clock cycle time independently of instruction processing time • Each instruction takes as many clock cycles as it needs to take (different instructions take different number of cycles) • Multiple state transitions per instruction • The states followed by each instruction is different

  6. Multicycle Approach • We will be reusing functional units • ALU used to compute address and to increment PC • Memory used for instruction and data • At the end of a cycle, keep results in registers • Additional registers • Now, control signals are NOT solely determined by the instruction bits! • Controls will be generated by a Finite State Machine (FSM)!

  7. Review: finite state machines • Finite state machines: • a set of states and • next state function (determined by current state and the input) • output function (determined by current state and possibly input) • We’ll use a Moore machine (output based only on current state)

  8. Multicycle Approach • Break up the instructions into steps, each step takes a cycle • balance the amount of work to be done • restrict each cycle to use only one major functional unit • At the end of a cycle • store values for use in later cycles (easiest thing to do) • introduce additional “internal” registers

  9. Muticyle Datapath

  10. Five Execution Steps • Instruction Fetch (IF) • Instruction Decode and Register Fetch (ID/RF) • Execution, Memory Address Computation, or Branch Completion (EX) • Memory Access or R-type instruction completion (M) • Write-back step (WB) • What is CPI in Multicycle machines? INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

  11. Step 1: Instruction Fetch • Access memory w/ PC to fetch instruction and store it in Instruction Register (IR) • Increment PC by 4 using ALU and put the result back in the PC • We can do this because ALU is not busy in this cycle • Can be described concisely using RTL "Register-Transfer Language" IR = Memory[PC]; PC = PC + 4;

  12. Datapath of Multicycle Implementation

  13. Step 2: Instruction Decode & Register Fetch • Read registers rs and rt • We read both of them regardless of necessity • Compute the branch address in case the instruction is a branch • We can do this because ALU is not busy • ALUOut will keep the target address • RTL:A = Reg[IR[25-21]]; B = Reg[IR[20-16]];ALUOut = PC + (sign-extend(IR[15-0]) << 2); • We have not set any control signal based n the instruction type yet • We are busy "decoding" it in our control logic

  14. Single Cycle Implementation

  15. Datapath of Multicycle Implementation

  16. Step 3 (instruction dependent) • ALU is performing one of three functions, based on instruction type • Memory Reference:ALUOut = A + sign-extend(IR[15-0]); • R-type:ALUOut = A op B; • Branch: if (A==B) PC = ALUOut; • Jump: PC = {PC[31:28], IR[25:0],2’b00};

  17. Datapath of Multicycle Implementation

  18. Step 4 (R-type or memory-access) • Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; • R-type instructions finish Reg[IR[15-11]] = ALUOut;

  19. Datapath of Multicycle Implementation

  20. Step 5 Write-back step • Reg[IR[20-16]]= MDR; What about all the other instructions?

  21. Summary:

  22. Multi cycle datapath & control: PC

  23. Example: lw, 1st cycle PC

  24. Example: lw, 1st cycle

  25. Example: lw, 2nd cycle PC

  26. Example: lw, 2nd cycle

  27. Example: lw, 3rd cycle PC

  28. Example: lw, 3rd cycle

  29. Example: lw, 4th cycle PC

  30. Example: lw, 4th cycle

  31. Example: lw, 5th cycle PC

  32. Example: lw, 5th cycle

  33. Example: J, 1st cycle PC

  34. Example: J, 1st cycle

  35. Example: J, 2nd cycle PC

  36. Example: J, 2ndcycle

  37. Example: J, 3rd cycle PC

  38. Example: J, 3rd cycle

  39. A Simple Example • How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label #assume not add $t5, $t2, $t3 sw $t5, 8($t3)Label: ... • What is going on during the 8th cycle of execution? • In what cycle does the actual addition of $t2 and $t3 takes place?

  40. Implementing the Control • Value of control signals is dependent upon: • what instruction is being executed • which step is being performed • Use the information we’ve accumulated to specify a finite state machine • specify the finite state machine graphically, or • use microprogramming

  41. Graphical Specification of FSM • How many state bits will we need? Start 8 Complete Branch 2 Mem. Address Compute 3 Mem. Access Read 5 Mem. Access Write 9 Complete Jump 6 ALU Execute 4 Write-Back 1 Instruction Decode 0 Instruction Fetch 7 Complete R-type Op = BEQ Op = R-type Op = J Op = LW or SW Op = SW Op = LW

  42. Detailed FSM

  43. Multi-cycle examples

  44. JAL Instruction: Jal #label Interpretation: $ra = PC, $ra is register 31 PC = #label

  45. (Op = ‘JAL’) PCWrite PCSource = 10 RegDest = 10 MemToReg = 10 RegWrite

  46. LWI Similar to HW2, but for Multi-cycle Instruction: LWI Rt, Rd(Rs) * Note LWI is a R-type instruction Interpretation: Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]

More Related