COMP541 Datapaths II & Single-Cycle MIPS. Montek Singh, Apr 2, 2012
Topics • Complete the datapath • Add control to it • Create a full single-cycle MIPS! • Reading • Chapter 7 • Review MIPS assembly language • Chapter 6 of course textbook • Or Patterson & Hennessy (inside flap)
Top-Level CPU (MIPS) (figure: the MIPS processor connected to Instr Memory and Data Memory; signals: clk, reset, pc[31:2], instr, memwrite, dataadr, writedata, readdata)
Top-Level CPU: Verilog
    module top(input clk, reset, output … ); // add signals here for debugging
      wire [31:0] pc, instr, readdata, writedata, dataadr;
      wire memwrite;

      mips mips(clk, reset, pc, instr, memwrite, dataadr, writedata, readdata); // processor
      imem imem(pc[31:2], instr);                                               // instr memory
      dmem dmem(clk, memwrite, dataadr, writedata, readdata);                   // data memory
    endmodule
Top Level Schematic (ISE) (figure: imem, MIPS, and dmem blocks)
One level down: Inside MIPS
    module mips(input clk, reset,
                output [31:0] pc,
                input [31:0] instr,
                output memwrite,
                output [31:0] aluout, writedata,
                input [31:0] readdata);
      wire memtoreg, branch, pcsrc, alusrc, regdst, regwrite, jump;
      wire [4:0] alucontrol; // depends on your ALU
      wire [3:0] flags;      // flags = {Z, V, C, N}

      controller c(instr[31:26], instr[5:0], flags,
                   memtoreg, memwrite, pcsrc, alusrc,
                   regdst, regwrite, jump, alucontrol);
      datapath dp(clk, reset, memtoreg, pcsrc, alusrc,
                  regdst, regwrite, jump, alucontrol,
                  flags, pc, instr, aluout, writedata, readdata);
    endmodule
A Note on Flags • Book’s design only uses Z (zero) • simple version of MIPS • allows beq, bne, slt type of tests • Our design uses { Z, V, C, N } flags • Z = zero • V = overflow • C = carry out • N = negative • Allows richer variety of instructions • see next slide • wherever you see “zero” in these slides, it should probably read “flags”
A Note on Flags • 4 flags produced by ALU: • Z (zero): result is = 0 (big NOR gate) • N (negative): result is < 0 (S[N-1], the sign bit of the result) • C (carry): indicates that most significant position produced a carry, e.g., “1 + (-1)” (carry from last FA) • V (overflow): indicates answer doesn’t fit (the slide figure gives two equivalent precise formulas) • To compare A and B, perform A - B and use condition codes:
    Signed comparison:
      LT   N⊕V
      LE   Z+(N⊕V)
      EQ   Z
      NE   ~Z
      GE   ~(N⊕V)
      GT   ~(Z+(N⊕V))
    Unsigned comparison:
      LTU  C
      LEU  C+Z
      GEU  ~C
      GTU  ~(C+Z)
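The signed comparison rows can be sketched in Verilog as follows. This is a minimal illustration, not the course's ALU: the module name `signed_compare_sketch`, its ports, and the behavioral way of deriving Z, N, and V from A - B are all assumptions. It leaves out the unsigned rows, since those depend on how the design defines C for a subtraction.

```verilog
// Hypothetical sketch: derive signed comparison results from A - B.
// Module and port names are assumptions made for illustration.
module signed_compare_sketch #(parameter N = 32)
  (input  [N-1:0] a, b,
   output         lt, le, eq, ne, ge, gt);

  wire [N-1:0] diff = a - b;

  wire z = (diff == {N{1'b0}});  // Z: result is zero
  wire n = diff[N-1];            // N: sign bit of the result
  // V: subtraction overflows when a and b have different signs
  // and the sign of the result differs from the sign of a.
  wire v = (a[N-1] != b[N-1]) && (diff[N-1] != a[N-1]);

  assign lt = n ^ v;
  assign le = z | (n ^ v);
  assign eq = z;
  assign ne = ~z;
  assign ge = ~(n ^ v);
  assign gt = ~(z | (n ^ v));
endmodule
```

XORing N with V corrects the sign bit in exactly the cases where the subtraction overflowed, which is why LT uses N⊕V rather than N alone.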
Datapath (figure; note the flags[3:0] signal)
MIPS State Elements • We’ll fill out the datapath and control logic for basic single cycle MIPS • first the datapath • then the control logic
Single-Cycle Datapath: lw • Let’s start by implementing lw instruction
Single-Cycle Datapath: lw • First consider executing lw • How does lw work? • STEP 1: Fetch instruction
Single-Cycle Datapath: lw • STEP 2: Read source operands from register file
Single-Cycle Datapath: lw • STEP 3: Sign-extend the immediate
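Sign extension of the 16-bit immediate could look like the sketch below. The module name `signext_sketch` and its port names are assumptions, not taken from the slides.

```verilog
// Hypothetical helper: sign-extend a 16-bit immediate to 32 bits.
module signext_sketch(input  [15:0] a,
                      output [31:0] y);
  // Replicate the sign bit a[15] into the upper 16 bits.
  assign y = {{16{a[15]}}, a};
endmodule
```

In the datapath, the input would be the immediate field instr[15:0].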
Single-Cycle Datapath: lw • STEP 4: Compute the memory address (note the ALU control input)
Single-Cycle Datapath: lw • STEP 5: Read data from memory and write it back to register file
Single-Cycle Datapath: lw • STEP 6: Determine the address of the next instruction
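For a datapath that so far only handles lw, the next instruction is always the one at PC + 4. A minimal sketch, assuming an asynchronous reset to address 0 and hypothetical names `pc_next_sketch` and `pcplus4`:

```verilog
// Hypothetical sketch of the PC register and its PC + 4 increment.
// The reset style and reset address are assumptions.
module pc_next_sketch(input             clk, reset,
                      output reg [31:0] pc);
  // MIPS instructions are 4 bytes wide, so the next sequential
  // instruction lives at PC + 4.
  wire [31:0] pcplus4 = pc + 32'd4;

  always @(posedge clk or posedge reset)
    if (reset) pc <= 32'b0;
    else       pc <= pcplus4;
endmodule
```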
Let’s be Clear: CPU is Single-Cycle! • Although the slides said “STEP” … • … all that stuff is executed in one cycle!!! • Let’s look at sw next … • … and then R-type instructions
Single-Cycle Datapath: sw • Write data in rt to memory • nothing is written back into the register file
Single-Cycle Datapath: R-type instr • R-Type instructions: • Read from rs and rt • Write ALUResult to register file • Write to rd (instead of rt)
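Choosing between rt and rd as the destination register amounts to a 2-input mux driven by the regdst control signal. A sketch, with the module name `regdst_mux_sketch` and the output name `writereg` as assumptions:

```verilog
// Hypothetical sketch: select the register-file write address.
module regdst_mux_sketch(input  [31:0] instr,
                         input         regdst,
                         output [4:0]  writereg);
  // regdst = 0: write rt = instr[20:16] (e.g. lw)
  // regdst = 1: write rd = instr[15:11] (R-type)
  assign writereg = regdst ? instr[15:11] : instr[20:16];
endmodule
```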
Single-Cycle Datapath: beq • Determine whether values in rs and rt are equal • Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)
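The branch target address formula can be written out as the sketch below. The module name `bta_sketch` and the signal names are assumptions.

```verilog
// Hypothetical sketch: BTA = (sign-extended immediate << 2) + (PC + 4).
module bta_sketch(input  [31:0] pcplus4,
                  input  [15:0] imm,
                  output [31:0] bta);
  wire [31:0] signimm = {{16{imm[15]}}, imm};   // sign-extend
  // Shifting left by 2 turns a word offset into a byte offset.
  assign bta = pcplus4 + {signimm[29:0], 2'b00};
endmodule
```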
Note: Difference due to Flags • Our Control Unit will be slightly different • … because of the extra flags • All flags (Z, V, C, N) are inputs to the control unit • Signals such as PCSrc are produced inside the control unit
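As one illustration of why the flags feed the control unit, PCSrc for beq and bne could be derived from the Z flag as sketched here. The signals `branch_eq` and `branch_ne` are hypothetical decoder outputs, and the sketch assumes the ALU computes rs - rt for branches.

```verilog
// Hypothetical sketch: produce PCSrc inside the control unit.
module pcsrc_sketch(input        branch_eq,  // assumed: asserted for beq
                    input        branch_ne,  // assumed: asserted for bne
                    input  [3:0] flags,      // {Z, V, C, N}
                    output       pcsrc);
  wire z = flags[3];
  // Take the branch when beq sees a zero result, or bne a nonzero one.
  assign pcsrc = (branch_eq & z) | (branch_ne & ~z);
endmodule
```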
Control Unit • Generally as shown in the figure, but with some differences because our ALU is more sophisticated • (figure: control unit with flags[3:0] as an input and PCSrc as an output) • Note: ALUOp will be different for our full-feature ALU! • Note: ALUControl will be 5 bits for our full-feature ALU!
Review: Our “full feature” ALU • Full-feature ALU from COMP411 • (figure: inputs A and B feed the Add/Sub, Boolean, and Bidirectional Barrel Shifter blocks; output muxes are selected by Shft and Math; outputs are the result R, flags V and C, the N flag, and the Z flag) • 5-bit ALUFN:
    Sub  Bool  Shft  Math  OP
    0    XX    0     1     A+B
    1    XX    0     1     A-B
    X    X0    1     1     0
    X    X1    1     1     1
    X    00    1     0     B<<A
    X    10    1     0     B>>A
    X    11    1     0     B>>>A
    X    00    0     0     A & B
    X    01    0     0     A | B
    X    10    0     0     A ^ B
    X    11    0     0     ~(A | B)
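The ALUFN table can be read as a behavioral description. The sketch below is one possible rendering, not the structural COMP411 design: the module name, the port names, the bit order {Sub, Bool[1:0], Shft, Math} within `alufn`, and the reading of Bool = 11 in the Boolean unit as NOR are all assumptions. Flag generation is omitted.

```verilog
// Hypothetical behavioral sketch of the ALUFN table.
// Assumed bit order: alufn = {Sub, Bool[1:0], Shft, Math}.
module alu_sketch(input      [31:0] a, b,
                  input      [4:0]  alufn,
                  output reg [31:0] r);
  wire       sub     = alufn[4];
  wire [1:0] boolsel = alufn[3:2];
  wire       shft    = alufn[1];
  wire       math    = alufn[0];

  // Computed on a signed wire so that >>> shifts in copies of the sign bit.
  wire signed [31:0] sra = $signed(b) >>> a[4:0];

  always @(*)
    case ({shft, math})
      2'b01: r = sub ? (a - b) : (a + b);   // Add/Sub unit
      2'b11: r = {31'b0, boolsel[0]};       // constant 0 or 1 rows
      2'b10:                                // shifter rows
        case (boolsel)
          2'b00:   r = b << a[4:0];
          2'b10:   r = b >> a[4:0];
          2'b11:   r = sra;
          default: r = 32'bx;               // Bool = 01 is not listed
        endcase
      default:                              // Boolean unit (Shft = 0, Math = 0)
        case (boolsel)
          2'b00:   r = a & b;
          2'b01:   r = a | b;
          2'b10:   r = a ^ b;
          default: r = ~(a | b);            // assumed NOR
        endcase
    endcase
endmodule
```

The arithmetic shift is computed on its own signed wire because mixing it directly with unsigned operands in one expression would make Verilog treat the shift as logical.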
Review: R-Type instructions • Register-type • 3 register operands: • rs, rt: source registers • rd: destination register • Other fields: • op: the operation code or opcode (0 for R-type instructions) • funct: the function • together, op and funct tell the computer which operation to perform • shamt: the shift amount for shift instructions, otherwise it is 0
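The R-type fields occupy fixed bit positions in the 32-bit instruction word, so they can be pulled apart with a single concatenation. The module name `rtype_fields_sketch` is an assumption; the field widths follow the standard MIPS R-type layout (6, 5, 5, 5, 5, 6 bits).

```verilog
// Hypothetical sketch: split a 32-bit instruction into R-type fields.
module rtype_fields_sketch(input  [31:0] instr,
                           output [5:0]  op,
                           output [4:0]  rs, rt, rd, shamt,
                           output [5:0]  funct);
  // op = instr[31:26], rs = instr[25:21], rt = instr[20:16],
  // rd = instr[15:11], shamt = instr[10:6], funct = instr[5:0]
  assign {op, rs, rt, rd, shamt, funct} = instr;
endmodule
```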
Controller (2 modules)
    module controller(input [5:0] op, funct,
                      input [3:0] flags,
                      output memtoreg, memwrite,
                      output pcsrc, alusrc,
                      output regdst, regwrite,
                      output jump,
                      output [2:0] alucontrol); // 5 bits for our ALU!!
      wire [1:0] aluop; // This will be different for our ALU
      wire branch;

      maindec md(op, memtoreg, memwrite, branch,
                 alusrc, regdst, regwrite, jump, aluop);
      aludec ad(funct, aluop, alucontrol);

      assign pcsrc = branch & flags[3]; // flags = {Z, V, C, N}, so flags[3] is Z
    endmodule
Main Decoder
    module maindec(input [5:0] op,
                   output memtoreg, memwrite, branch, alusrc,
                   output regdst, regwrite, jump,
                   output [1:0] aluop); // different for our ALU
      reg [8:0] controls;

      assign {regwrite, regdst, alusrc, branch, memwrite,
              memtoreg, jump, aluop} = controls;

      // Combinational logic, so blocking assignments are used.
      always @(*)
        case(op)
          6'b000000: controls = 9'b110000010; // Rtype
          6'b100011: controls = 9'b101001000; // LW
          6'b101011: controls = 9'b001010000; // SW
          6'b000100: controls = 9'b000100001; // BEQ
          6'b001000: controls = 9'b101000000; // ADDI
          6'b000010: controls = 9'b000000100; // J
          default:   controls = 9'bxxxxxxxxx; // ???
        endcase
    endmodule
• Why do this? • This entire coding may be different in our design
ALU Decoder
    module aludec(input [5:0] funct,
                  input [1:0] aluop,
                  output reg [2:0] alucontrol); // 5 bits for our ALU!!
      // Combinational logic, so blocking assignments are used.
      always @(*)
        case(aluop)
          2'b00: alucontrol = 3'b010; // add
          2'b01: alucontrol = 3'b110; // sub
          default: case(funct)        // RTYPE
            6'b100000: alucontrol = 3'b010; // ADD
            6'b100010: alucontrol = 3'b110; // SUB
            6'b100100: alucontrol = 3'b000; // AND
            6'b100101: alucontrol = 3'b001; // OR
            6'b101010: alucontrol = 3'b111; // SLT
            default:   alucontrol = 3'bxxx; // ???
          endcase
        endcase
    endmodule
• This entire coding will be different in our design
Control Unit: ALU Decoder This entire coding will be different in our design
Note on controller • The actual number and names of control signals may be somewhat different in our/your design • compared to the one given in the book • because we are implementing more features/instructions • SO BE VERY CAREFUL WHEN YOU DESIGN YOUR CPU!
Extended Functionality: addi • No change to datapath
Review: Processor Performance
    Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
                           = # instructions × CPI × Tc
Single-Cycle Performance • Tc, the clock period, is limited by the critical path (lw)
Single-Cycle Performance • Single-cycle critical path:
    Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
• In most implementations, the limiting paths are memory, ALU, and register file, so tRFread dominates the max term:
    Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
Single-Cycle Performance Example
    Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
       = [30 + 2(250) + 150 + 25 + 200 + 20] ps
       = 925 ps
What’s the max clock frequency?
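The maximum clock frequency is the reciprocal of the minimum clock period, so the question can be worked out from the 925 ps figure:

```latex
f_{\max} = \frac{1}{T_c} = \frac{1}{925 \times 10^{-12}\,\text{s}} \approx 1.08 \times 10^{9}\,\text{Hz} = 1.08\ \text{GHz}
```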
Single-Cycle Performance Example • For a program with 100 billion instructions executing on a single-cycle MIPS processor:
    Execution Time = # instructions × CPI × Tc
                   = (100 × 10^9)(1)(925 × 10^-12 s)
                   = 92.5 seconds
Next Time • Next class: • We’ll look at multi-cycle MIPS • Adding functionality to our design • Next lab: • Implement single-cycle CPU!