Basic MIPS Architecture: Single-Cycle Datapath and Control • Chapter 4, Sections 4.1 – 4.4; Appendix D.1 and D.2 • Dr. Iyad F. Jafar
Outline • Introduction • Clocking • Single-cycle Datapath • Single-cycle Control • Performance Analysis
Introduction • So far, we have built a small ALU • ADD, SUB, SLT, AND, OR, … • What about • Memory and registers? • Control operations? • Interpreting (decoding) instructions? • The big picture • The CPU’s datapath moves data around and operates on it • The CPU’s control tells the datapath what to do • Generic implementation: Fetch → PC = PC + 4 → Decode → Execute, repeated for every instruction
Clocking • The clocking methodology defines when signals can be read and when they can be written • We use an edge-triggered methodology: state elements change only on a clock edge • Typical execution • Read the contents of state elements • Send the values through combinational logic • Write the results to one or more state elements • This assumes state elements are written on every clock cycle; if not, an explicit write control signal is needed • A write then occurs only when the write control is asserted and the clock edge occurs • [Figure: State Element → Combinational logic → State Element, all within one clock cycle]
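The edge-triggered rule can be illustrated with a minimal Python sketch. The `StateElement` class and its `clock_edge` method are hypothetical names used only for illustration: the stored value changes only at a clock edge, and only when the write control is asserted, so combinational logic always sees the value written at the previous edge.

```python
class StateElement:
    """Hypothetical model of an edge-triggered register with a write control."""

    def __init__(self, value=0):
        self.value = value  # the output that combinational logic reads

    def clock_edge(self, data_in, write_enable=True):
        # The write happens only when the clock edge occurs AND write is asserted.
        if write_enable:
            self.value = data_in


a = StateElement(3)
b = StateElement(0)

# One clock cycle: read state, pass it through combinational logic, write at the edge.
next_b = a.value + 1                    # combinational logic (an incrementer here)
b.clock_edge(next_b)                    # edge with write asserted: b becomes 4
b.clock_edge(99, write_enable=False)    # edge without write control: b keeps 4
print(b.value)                          # 4
```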
Single-Cycle Datapath • The first implementation considered • All instructions start and finish execution in one cycle! • This includes the time required to fetch, decode, and execute the instruction • In the following, we consider the datapath for each of these steps
Single-Cycle Datapath • Fetch Datapath • Fetching an instruction requires • Sending the PC to the instruction memory to read the instruction • Updating the PC to point to the next instruction (PC + 4) • Do we need an explicit write signal for writing the PC? • Do we need an explicit read signal for reading the memory? • [Figure: PC → Instruction Memory (Read Address → Instruction), with an adder computing PC + 4]
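A short Python sketch of the fetch step, assuming instruction memory is modeled as a dictionary from byte addresses to 32-bit words (a hypothetical representation). Since the PC is written on every cycle and instruction memory is read on every cycle, this sketch assumes neither needs a separate control signal.

```python
def fetch(pc, instruction_memory):
    """Return (instruction, next_pc).

    Assumption: instruction_memory maps byte addresses (multiples of 4)
    to 32-bit instruction words.
    """
    instruction = instruction_memory[pc]   # read every cycle, no read control modeled
    next_pc = (pc + 4) & 0xFFFFFFFF        # dedicated adder: PC + 4
    return instruction, next_pc


# add $t0, $t1, $t2 followed by lw $t1, 4($t0)
imem = {0x00400000: 0x012A4020, 0x00400004: 0x8D090004}
instr, pc = fetch(0x00400000, imem)
print(hex(instr), hex(pc))
```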
Single-Cycle Datapath • Decode Datapath • Regardless of the instruction • Send the opcode (bits 31-26) and the function (bits 5-0) fields of the instruction to the control unit • Read two registers: rs (bits 25-21) and rt (bits 20-16) • Reading both is harmless, even when an instruction does not use them! • [Figure: the Instruction feeds the Control Unit and the Register File; Read Addr 1 = rs gives Read Data 1 = R[rs], Read Addr 2 = rt gives Read Data 2 = R[rt]; the file also has Write Addr and Write Data inputs]
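The field extraction above amounts to shifts and masks. The Python sketch below is illustrative; the `decode` function and its dictionary keys are hypothetical names, while the bit positions are the ones listed on this slide and the next.

```python
def decode(instruction):
    """Split a 32-bit MIPS word into the fields sent to control and the register file."""
    return {
        "opcode": (instruction >> 26) & 0x3F,  # bits 31-26 -> control unit
        "rs": (instruction >> 21) & 0x1F,      # bits 25-21 -> Read Addr 1
        "rt": (instruction >> 16) & 0x1F,      # bits 20-16 -> Read Addr 2
        "rd": (instruction >> 11) & 0x1F,      # bits 15-11 (R-type destination)
        "imm16": instruction & 0xFFFF,         # bits 15-0 (lw/sw/beq offset)
        "funct": instruction & 0x3F,           # bits 5-0 -> control unit
    }


fields = decode(0x012A4020)  # add $t0, $t1, $t2
print(fields)
```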
Single-Cycle Datapath • Inside the Register File • How can we read one register out of 32? • [Figure: each read port is a 32-to-1 MUX whose inputs are Register 0 … Register 31; Read Register 1 selects Read Data 1, and Read Register 2 selects Read Data 2]
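Behaviorally, each read port is a multiplexor whose select lines are the 5-bit register number. A minimal Python sketch, with hypothetical function names and the register file modeled as a list of 32 values:

```python
def mux(inputs, select):
    """N-to-1 multiplexor: forward the input chosen by `select`."""
    return inputs[select]


def read_registers(registers, read_reg1, read_reg2):
    # Two independent 32-to-1 MUXes share the same 32 register outputs;
    # each one uses its own 5-bit register number as the select lines.
    return mux(registers, read_reg1), mux(registers, read_reg2)


regs = [0] * 32
regs[9], regs[10] = 7, 5
print(read_registers(regs, 9, 10))  # (7, 5)
```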
Single-Cycle Datapath • Inside the Register File • How can we write one register out of 32? • [Figure: a 5-to-32 decoder takes the Register Number; each decoder output is combined with Write and the Clock to drive the C input of one of Register 0 … Register 31, while Write Data drives the D input of every register]
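The write port can be sketched in Python as a decoder whose one-hot output, combined with the write control, enables exactly one register. The function names are hypothetical; the sketch models one clock edge.

```python
def decoder_5_to_32(register_number):
    """5-to-32 decoder: output i is 1 only when register_number == i."""
    return [1 if i == register_number else 0 for i in range(32)]


def write_register(registers, write_reg, write_data, reg_write):
    """One clock edge of the register-file write port (behavioral sketch).

    Register i loads write_data only when decoder output i AND the write
    control are both 1; every other register keeps its value.
    """
    enables = decoder_5_to_32(write_reg)
    for i, selected in enumerate(enables):
        if selected and reg_write:
            registers[i] = write_data & 0xFFFFFFFF


regs = [0] * 32
write_register(regs, 8, 12, reg_write=1)   # $8 <- 12
write_register(regs, 9, 99, reg_write=0)   # write not asserted: $9 unchanged
print(regs[8], regs[9])                    # 12 0
```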
Single-Cycle Datapath • Execution Datapath • R-type instructions (ADD, SUB, SLT, AND, OR) • The two registers have already been read! • Perform the operation specified by the OPCODE and FUNC fields • Store the result back into the register file; the destination register is specified by the rd field of the instruction (bits 15-11) • The register file is not written on every cycle, so it needs an explicit write signal, RegWrite • [Figure: R[rs] and R[rt] from the Register File feed the ALU under ALU Control; the ALU result returns to Write Data, with RegWrite enabling the write]
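A Python sketch of R-type execution, using the standard MIPS funct codes for the five instructions on this slide. It assumes register values are stored as unsigned 32-bit patterns, and add and sub simply wrap around: the overflow output (`ovf` in the datapath figure) is not modeled. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


# funct field -> ALU operation (standard MIPS encodings)
R_TYPE_OPS = {
    0x20: lambda a, b: (a + b) & MASK32,                  # add
    0x22: lambda a, b: (a - b) & MASK32,                  # sub
    0x24: lambda a, b: a & b,                             # and
    0x25: lambda a, b: a | b,                             # or
    0x2A: lambda a, b: int(to_signed(a) < to_signed(b)),  # slt (signed compare)
}


def execute_r_type(registers, rs, rt, rd, funct):
    """R[rd] <- R[rs] op R[rt]; RegWrite is asserted, so the register file is written."""
    result = R_TYPE_OPS[funct](registers[rs], registers[rt])
    registers[rd] = result
    return result


regs = [0] * 32
regs[9], regs[10] = 7, 5
execute_r_type(regs, 9, 10, 8, 0x22)   # sub $8, $9, $10
print(regs[8])                         # 2
```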
Single-Cycle Datapath • Execution Datapath • Load Instruction • Compute the load address: R[rs] plus the sign-extended 16-bit offset • Read the data memory at that address • Store the loaded data in the register file; the destination register is the rt field of the instruction (bits 20-16) • [Figure: R[rs] and the sign-extended offset feed the ALU; the ALU output is the Data Memory Address; with MemRead asserted, the memory's Read Data returns to the register file's Write Data, with RegWrite asserted; MemWrite stays deasserted]
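The load path combines sign extension with an ALU add. A minimal Python sketch, assuming data memory is a dictionary from byte addresses to 32-bit words; `sign_extend16` and `execute_lw` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """The 'Sign Ext.' box: copy bit 15 into bits 31-16."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def execute_lw(registers, data_memory, rs, rt, imm16):
    """lw rt, imm16(rs): R[rt] <- Mem[R[rs] + SignExt(imm16)]."""
    address = (registers[rs] + sign_extend16(imm16)) & MASK32  # ALU adds base + offset
    registers[rt] = data_memory[address]                       # MemRead = 1, RegWrite = 1
    return address


regs = [0] * 32
regs[8] = 0x1000
dmem = {0x0FFC: 42}                          # hypothetical data memory contents
addr = execute_lw(regs, dmem, 8, 9, 0xFFFC)  # lw $9, -4($8)
print(hex(addr), regs[9])                    # 0xffc 42
```

Masking to 32 bits after the add is what makes a negative offset such as -4 (0xFFFC) work: the sign-extended value 0xFFFFFFFC wraps around just as it would in the 32-bit ALU.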
Single-Cycle Datapath • Execution Datapath • Store Instruction • Compute the store address: R[rs] plus the sign-extended 16-bit offset • Write register R[rt] to the data memory at that address • [Figure: the same units as the load datapath; R[rt] (Read Data 2) drives the Data Memory's Write Data, with MemWrite asserted and RegWrite deasserted]
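The store path reuses the same address computation, but the data flows from R[rt] into memory and the register file is not written. A Python sketch under the same assumptions as a dictionary-based memory; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Sign-extend a 16-bit offset to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def execute_sw(registers, data_memory, rs, rt, imm16):
    """sw rt, imm16(rs): Mem[R[rs] + SignExt(imm16)] <- R[rt].

    MemWrite = 1 and RegWrite = 0: only the data memory changes.
    """
    address = (registers[rs] + sign_extend16(imm16)) & MASK32
    data_memory[address] = registers[rt]
    return address


regs = [0] * 32
regs[8], regs[9] = 0x1000, 77
dmem = {}
execute_sw(regs, dmem, 8, 9, 8)   # sw $9, 8($8)
print(dmem)                       # {4104: 77}
```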
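Single-Cycle Datapath • Execution Datapath • Branch Instruction • Compare the two registers: the ALU subtracts them, and its Zero output signals equality • Compute the branch address: (PC + 4) + (sign-extended offset × 4) • Change the PC to the branch address if the registers are equal! • [Figure: a dedicated adder computes PC + 4, and a second adder adds the sign-extended offset shifted by ×4; the ALU compares R[rs] and R[rt]; Zero selects between PC + 4 (0) and the Branch Address (1); RegWrite stays deasserted]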
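A Python sketch of the branch step: one adder forms the target while the ALU compares the registers, and the Zero output decides which value reaches the PC. The function names are hypothetical; all arithmetic is masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Sign-extend a 16-bit offset to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def next_pc_beq(pc, registers, rs, rt, imm16):
    """Return the PC after beq rs, rt, offset."""
    pc_plus_4 = (pc + 4) & MASK32
    # Branch adder: (PC + 4) + SignExt(offset) x 4; masking keeps 32-bit wraparound
    branch_address = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    # The ALU subtracts the registers; Zero = 1 when they are equal
    zero = ((registers[rs] - registers[rt]) & MASK32) == 0
    # Take the branch only when the registers are equal
    return branch_address if zero else pc_plus_4


regs = [0] * 32
regs[9] = regs[10] = 5
print(hex(next_pc_beq(0x00400010, regs, 9, 10, 0xFFFD)))  # offset -3: 0x400008
regs[10] = 6
print(hex(next_pc_beq(0x00400010, regs, 9, 10, 0xFFFD)))  # not equal: 0x400014
```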
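Single-Cycle Datapath • Execution Datapath • Jump Instruction • Compute the jump address: the 26-bit address field × 4, combined with the upper 4 bits (31-28) of PC + 4 • Store it in the PC • [Figure: the PC feeds the Instruction Memory and the PC + 4 adder; a MUX controlled by Jump selects PC + 4 (0) or the jump address (1)]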
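The jump address is formed purely by wiring: shift the 26-bit field left by 2 and take the top 4 bits from PC + 4. A Python sketch with a hypothetical function name:

```python
def jump_target(pc, instruction):
    """New PC for j: upper 4 bits of PC + 4, then Instr[25-0] shifted left 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target26 = instruction & 0x03FFFFFF          # Instr[25-0]
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


j_instr = (0b000010 << 26) | (0x00400020 >> 2)   # j 0x00400020 (opcode 2)
print(hex(jump_target(0x00400000, j_instr)))     # 0x400020
```

Because only 28 bits come from the instruction, a jump cannot leave the current 256 MB region selected by PC[31-28].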
Single-Cycle Datapath • Creating the Single Datapath • Assemble the datapath segments and add control lines and multiplexors as needed • Single-cycle design • Fetch, decode, and execute each instruction in one clock cycle • No datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders) • Multiplexors are needed at the inputs of shared elements, with control lines to do the selection • Write signals control writing to the Register File and Data Memory • The cycle time is determined by the length of the longest path
Single-Cycle Datapath • [Figure: the complete single-cycle datapath. The Control Unit decodes Instr[31-26] into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; the ALU control block combines ALUOp with Instr[5-0]; Instr[25-21] and Instr[20-16] address the Register File, and RegDst selects Instr[20-16] or Instr[15-11] as the write address; Instr[15-0] is sign-extended from 16 to 32 bits; one adder forms PC + 4 and another adds the offset shifted left 2, with PCSrc selecting the branch target; Jump selects the jump address formed from Instr[25-0] shifted left 2 and PC[31-28]; the ALU also outputs zero and ovf]
Single-Cycle Control • We need to design the control that generates the appropriate control signals from the Opcode and Function fields to • Specify the operation of the ALU • Control the data flow by selecting the appropriate inputs of the multiplexors • Observations across the different instructions • The Op field is always in bits 31-26 of the instruction • The addresses of the registers to be read are always specified by • the rs field (bits 25-21) • the rt field (bits 20-16) • For LW and SW, the rs field is the base register • The address of the register to be written is in one of two places • For LW, it is the rt field (bits 20-16) • For R-type, it is the rd field (bits 15-11) • The offset for BEQ, LW, and SW is always in bits 15-0 of the instruction
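The "one of two places" observation is what the RegDst multiplexor resolves. A Python sketch, assuming the common convention (also suggested by the datapath figure's 0/1 labels) that RegDst = 1 selects rd; the function name is hypothetical.

```python
def write_register_number(instruction, reg_dst):
    """RegDst MUX: choose the destination register field.

    Assumed convention: reg_dst = 1 -> rd (bits 15-11), used by R-type;
                        reg_dst = 0 -> rt (bits 20-16), used by lw.
    """
    rt = (instruction >> 16) & 0x1F
    rd = (instruction >> 11) & 0x1F
    return rd if reg_dst else rt


print(write_register_number(0x012A4020, 1))  # add $8, $9, $10 -> 8
print(write_register_number(0x8D090004, 0))  # lw  $9, 4($8)   -> 9
```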
R-type Instruction Data/Control Flow • [Figure: the single-cycle datapath annotated with the data and control flow of an R-type instruction]
Load Word Instruction Data/Control Flow • [Figure: the single-cycle datapath annotated with the data and control flow of a load word instruction]
Branch Equal Instruction Data/Control Flow • [Figure: the single-cycle datapath annotated with the data and control flow of a branch equal instruction]
Jump Instruction Data/Control Flow • [Figure: the single-cycle datapath annotated with the data and control flow of a jump instruction]
Single-Cycle Control • The Main Control Unit • The input is the 6-bit Op field of the instruction • The output is nine control signals • The truth table!
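The slide's truth table itself is not reproduced here. As a hedged stand-in, the Python dictionary below encodes the usual textbook version (Patterson and Hennessy style); treat the exact values as assumptions. The nine signals are presumably RegDst through ALUOp0, with ALUOp counted as two bits; Jump is added as a tenth column because the datapath figure uses it, and `None` marks a don't-care (X). The opcodes are the standard MIPS values.

```python
# Assumed textbook-style main control table; None = don't-care (X).
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump")

MAIN_CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0, 0),                  # R-type
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0, 0),                  # lw
    0b101011: (None, 1, None, 0, 0, 1, 0, 0, 0, 0),            # sw
    0b000100: (None, 0, None, 0, 0, 0, 1, 0, 1, 0),            # beq
    0b000010: (None, None, None, 0, 0, 0, 0, None, None, 1),   # j
}


def main_control(opcode):
    """Look up the control signals for a 6-bit opcode."""
    return dict(zip(SIGNALS, MAIN_CONTROL[opcode]))


print(main_control(0b100011))  # signals for lw
```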
Single-Cycle Control • The Main Control Unit • To design the logic circuit, generate the appropriate minterms for each output signal • Simply implement those minterms with a PLA!
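A PLA is an AND plane feeding an OR plane. The Python sketch below mirrors that structure, assuming the same signal values as the usual textbook main control table, with don't-cares resolved to 0: one product term (minterm) per supported opcode, then each output ORs the terms that assert it. The function name is hypothetical.

```python
def main_control_pla(opcode):
    """Sum-of-products main control, organized like a PLA."""
    op = [(opcode >> i) & 1 for i in range(6)]   # op[0] = Op0 ... op[5] = Op5
    n = [1 - b for b in op]                       # complemented inputs

    # AND plane: one minterm per instruction class
    r_format = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]     # 000000
    lw = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]        # 100011
    sw = op[5] & n[4] & op[3] & n[2] & op[1] & op[0]       # 101011
    beq = n[5] & n[4] & n[3] & op[2] & n[1] & n[0]         # 000100
    jump = n[5] & n[4] & n[3] & n[2] & op[1] & n[0]        # 000010

    # OR plane: each output is the OR of the minterms that assert it
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
        "Jump": jump,
    }


print(main_control_pla(0b101011))  # sw
```

An unsupported opcode matches no minterm, so every output is 0 in this sketch.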
Single-Cycle Control • The ALU Control Unit • It has two inputs • ALUop (2 bits) from the main control • Func (6 bits) from the instruction • It has two outputs • Bnegate (1 bit) • Operation (2 bits) • Supported operations • [Figure: the ALU control block takes ALUop and Func and produces Bnegate and Operation for the ALU]
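To see what Bnegate and Operation do, here is a behavioral Python sketch of a 32-bit ALU driven by them. The encoding is an assumption (the supported-operations table is not reproduced): Operation 00 = AND, 01 = OR, 10 = add, 11 = set-on-less-than. With Bnegate = 1, the adder computes a + ~b + 1 = a - b, which both sub and slt use.

```python
MASK32 = 0xFFFFFFFF


def alu(a, b, bnegate, operation):
    """Return (result, zero) for 32-bit inputs (assumed control encoding)."""
    b_in = (~b & MASK32) if bnegate else b      # Bnegate inverts B ...
    total = (a + b_in + bnegate) & MASK32       # ... and supplies the carry-in
    if operation == 0b00:
        result = a & b_in
    elif operation == 0b01:
        result = a | b_in
    elif operation == 0b10:
        result = total
    else:
        # slt: sign bit of a - b (ignores overflow, like the simple ALU)
        result = total >> 31
    return result, int(result == 0)


print(alu(7, 5, 1, 0b10))   # subtract -> (2, 0)
print(alu(5, 7, 1, 0b11))   # slt: 5 < 7 -> (1, 0)
```

Feeding Bnegate into both the B inverter and the carry-in is what lets one adder serve for both addition and subtraction.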
Single-Cycle Control • The ALU Control Unit • Truth table!
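The ALU control truth table is not reproduced here either. The Python sketch below encodes a plausible version, assuming the textbook-style encoding (Operation 00 AND, 01 OR, 10 add, 11 slt) and the usual ALUOp meanings: 00 for lw/sw, 01 for beq, 10 for R-type. The funct codes are the standard MIPS values; `alu_control` is a hypothetical name.

```python
# Assumed ALU control table: funct -> (Bnegate, Operation), used when ALUOp = 10.
FUNCT_TO_ALU = {
    0b100000: (0, 0b10),  # add
    0b100010: (1, 0b10),  # sub
    0b100100: (0, 0b00),  # and
    0b100101: (0, 0b01),  # or
    0b101010: (1, 0b11),  # slt
}


def alu_control(alu_op, funct):
    """Return (Bnegate, Operation) from the 2-bit ALUOp and the 6-bit funct field."""
    if alu_op == 0b00:          # lw / sw: add to form the address
        return (0, 0b10)
    if alu_op == 0b01:          # beq: subtract to compare
        return (1, 0b10)
    return FUNCT_TO_ALU[funct]  # ALUOp = 10: R-type, decode funct


print(alu_control(0b10, 0b101010))  # slt -> (1, 3)
```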
Single-Cycle Control • The ALU Control Unit • Hardware Implementation • Generate the minterms, then minimize them? • Or derive the equations by inspection!
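One set of equations obtainable by inspection, under the same assumed encoding as a textbook-style ALU control table, is Bnegate = ALUOp0 + ALUOp1·F1, Operation1 = ¬ALUOp1 + ¬F2, and Operation0 = ALUOp1·(F3 + F0). Inspection works because F5 and F4 are 1 and 0 for all five R-type instructions and can be ignored. The Python sketch checks these gates against every row of the assumed table. The function name is hypothetical.

```python
def alu_control_gates(alu_op1, alu_op0, funct):
    """Gate-level ALU control using only the funct bits that matter."""
    f = [(funct >> i) & 1 for i in range(6)]   # f[i] = Fi
    bnegate = alu_op0 | (alu_op1 & f[1])
    op1 = (1 - alu_op1) | (1 - f[2])
    op0 = alu_op1 & (f[3] | f[0])
    return bnegate, (op1 << 1) | op0


# Rows of the assumed table; funct is a don't-care when ALUOp is 00 or 01.
EXPECTED = [
    ((0, 0, 0b000000), (0, 0b10)),  # lw/sw -> add
    ((0, 1, 0b000000), (1, 0b10)),  # beq   -> subtract
    ((1, 0, 0b100000), (0, 0b10)),  # add
    ((1, 0, 0b100010), (1, 0b10)),  # sub
    ((1, 0, 0b100100), (0, 0b00)),  # and
    ((1, 0, 0b100101), (0, 0b01)),  # or
    ((1, 0, 0b101010), (1, 0b11)),  # slt
]
for inputs, expected in EXPECTED:
    assert alu_control_gates(*inputs) == expected, inputs
print("all rows match")
```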
Performance Analysis • All instructions have to finish in one cycle! • How long is the cycle time? • Different instructions use different units • Each unit has its own delay • We need to find the longest path! • Assume the unit times given in the table [not reproduced]; the resulting path delays are • R-type: Instr. fetch + Register read + ALU + Register write = 6 ns • LW: Instr. fetch + Register read + ALU + Memory read + Register write = 8 ns • SW: Instr. fetch + Register read + ALU + Memory write = 7 ns • Branch: Instr. fetch + Register read + ALU = 5 ns • Jump: Instr. fetch = 2 ns • Thus, the cycle time must be at least 8 ns, the delay of the LW path
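The longest-path rule can be checked with a short Python sketch. Instruction fetch (2 ns), memory access (2 ns), and register write (1 ns) follow from the path totals above, but the totals fix only the sum of register read and ALU (3 ns); splitting it as 1 ns + 2 ns is an assumption. The sketch also shows how much of the cycle each instruction class wastes.

```python
# Per-unit delays in ns; reg_read = 1 and alu = 2 are assumed (only their sum is known).
UNIT_DELAY_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

PATHS = {
    "R-type": ("fetch", "reg_read", "alu", "reg_write"),
    "LW": ("fetch", "reg_read", "alu", "mem", "reg_write"),
    "SW": ("fetch", "reg_read", "alu", "mem"),
    "Branch": ("fetch", "reg_read", "alu"),
    "Jump": ("fetch",),
}


def path_delay(units):
    """Total delay of the units an instruction passes through."""
    return sum(UNIT_DELAY_NS[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
cycle_time = max(delays.values())                 # the longest path sets the clock
wasted = {name: cycle_time - d for name, d in delays.items()}
print(delays)      # {'R-type': 6, 'LW': 8, 'SW': 7, 'Branch': 5, 'Jump': 2}
print(cycle_time)  # 8
print(wasted)      # {'R-type': 2, 'LW': 0, 'SW': 1, 'Branch': 3, 'Jump': 6}
```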
Performance Analysis • The cycle time is fixed! • However, not all instructions need the same time, so part of the cycle is wasted for the faster ones • Possible solution? • [Figure: clock cycles 1 and 2; LW uses all of cycle 1, while SW finishes early in cycle 2 and the rest of that cycle is waste]
Performance Analysis • Example 1. Consider the following two implementations of a single-cycle machine: • Machine A: all instructions execute in one cycle of fixed length • Machine B: all instructions execute in one cycle; however, the cycle time adapts to the instruction type • Use the information given in the tables to compare the two machines
Performance Analysis • Example 1, continued • CPU execution time = IC × CPI × clock cycle time • CPI_A = CPI_B = 1 • IC_A = IC_B • CC_A = 600 ps • CC_B = 600 × 0.25 + 550 × 0.10 + 400 × 0.45 + 350 × 0.15 + 200 × 0.05 = 447.5 ps • Since IC and CPI are the same, Performance_B / Performance_A = CC_A / CC_B = 600 / 447.5 ≈ 1.34 • So the adaptive clock cycle is about 1.34 times faster; however, it is hard to implement!
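The arithmetic of Example 1 can be reproduced in a few lines of Python. The (fraction, cycle time) pairs are exactly the terms of the clock-cycle formula for Machine B; Machine A's fixed cycle is taken as the largest of them (600 ps). Because IC and CPI are equal for the two machines, the speedup is just the ratio of clock cycles.

```python
# (fraction of instructions, cycle time in ps) for each instruction class in Example 1
MIX = [(0.25, 600), (0.10, 550), (0.45, 400), (0.15, 350), (0.05, 200)]

assert abs(sum(f for f, _ in MIX) - 1.0) < 1e-9  # fractions cover all instructions

cc_a = max(t for _, t in MIX)        # Machine A: fixed cycle = slowest class
cc_b = sum(f * t for f, t in MIX)    # Machine B: average adaptive cycle
speedup = cc_a / cc_b                # IC and CPI cancel out
print(cc_a, round(cc_b, 1), round(speedup, 2))  # 600 447.5 1.34
```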
Single-Cycle Disadvantages & Advantages • A single-cycle implementation assumes that every instruction can execute in one cycle • Advantages • Simple and easy to understand • Disadvantages • Hardware duplication! • Uses the clock cycle inefficiently: the clock cycle must be long enough for the slowest instruction (especially problematic for more complex instructions like floating-point multiply)