1 / 23

Computer Architecture: A Constructive Approach Multi-cycle SMIPS Implementations Joel Emer

Computer Architecture: A Constructive Approach Multi-cycle SMIPS Implementations Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology. MemWrite. WBSrc. 0x4. clk. we. clk. rs1. rs2. ALU Control. we. rd1. addr. PC. ws. addr. inst. wd.

holland
Download Presentation

Computer Architecture: A Constructive Approach Multi-cycle SMIPS Implementations Joel Emer

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computer Architecture: A Constructive Approach Multi-cycle SMIPS Implementations Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology http://csg.csail.mit.edu/6.S078

  2. MemWrite WBSrc 0x4 clk we clk rs1 rs2 ALU Control we rd1 addr PC ws addr inst wd rd2 ALU GPRs z Inst. Memory rdata clk Data Memory Imm Ext wdata zero? OpCode RegDst ExtSel OpSel BSrc old way Harvard-Style Datapath for MIPS PCSrc RegWrite br rind jabs pc+4 Add Add 31 http://csg.csail.mit.edu/6.S078

  3. no yes ALU pc+4 Imm Op no yes ALU rt pc+4 sExt16 Imm + yes no * * pc+4 no no * * * * no no * * * * * no * * * * no no * * sExt16 * 0? no no * * yes PC rind R31 * * * no old way Hardwired Control Table * Reg Func no yes ALU rd pc+4 sExt16 Imm Op rt uExt16 sExt16 Imm + no yes Mem rt pc+4 sExt16 * 0? br pc+4 jabs yes PC R31 jabs rind BSrc = Reg / Imm WBSrc = ALU / Mem / PC RegDst = rt / rd / R31 PCSrc = pc+4 / br / rind / jabs http://csg.csail.mit.edu/6.S078

  4. Single-Cycle SMIPS new way 2 read & 1 write ports Register File PC Execute Decode +4 separate Instruction & Data memories Data Memory Inst Memory Datapath and control were derived automatically from a high-level rule-based description http://csg.csail.mit.edu/6.S078

  5. Single-Cycle SMIPS code structure module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkTwoPortedMemory; let iMem = mem.iport ; let dMem = mem.dport; rule doProc; let inst = iMem(MemReq{op:Ld, addr:pc, data:?}); let dInst = decode(inst); Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); let eInst = exec(dInst, rVal1, rVal2, pc); update rf, pc and dMem http://csg.csail.mit.edu/6.S078

  6. Decoding Instructions: input-output types decode 31:26, 5:0 iType IType 31:26 aluFunc 5:0 AluFunc instruction 31:26 brComp Bit#(32) BrType Type DecodedInst 20:16 rDst 15:11 Rindex Mux control logic not shown 25:21 rSrc1 Rindex 20:16 rSrc2 Rindex 15:0 imm Bit#(32) ext 25:0 Bool immValid http://csg.csail.mit.edu/6.S078

  7. Reading Registers Read registers RF RVal1 RSrc1 RSrc2 RVal2 Pure combinational logic http://csg.csail.mit.edu/6.S078

  8. Executing Instructions execute iType rDst dInst either for rf write or St rVal2 data ALU either for memory reference or branch target rVal1 ALUBr addr Pure combinational logic Branch Address brTaken pc http://csg.csail.mit.edu/6.S078

  9. Branch Address Calculation function Addr brAddrCalc(Address pc, Data val, IType iType, Data imm); let targetAddr = case (iType) J, Jal : {pc[31:28], imm[27:0]}; Jr, Jalr : val; default : pc + imm; endcase; return targetAddr; endfunction http://csg.csail.mit.edu/6.S078

  10. Some Useful Functions function Bool memType (IType i) return (i==Ld || i == St); endfunction function Bool regWriteType (IType i) return (i==Alu || i==Ld || i==Jal || i==Jalr); endfunction function Bool controlType (IType i) return (i==J || i==Jr || i==Jal || i==Jalr || i==Br); endfunction http://csg.csail.mit.edu/6.S078

  11. Execute Function function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc); ExecInst einst = ?; Data aluVal2 = (dInst.immValid)? dInst.imm : rVal2 let aluRes = alu(rVal1, aluVal2, dInst.aluFunc); let brAddr = brAddrCal(pc, rVal1, dInst.iType, dInst.imm); einst.itype = dInst.iType; einst.addr = (memType(dInst.iType)? aluRes : brAddr; einst.data = dInst.iType==St ? rVal2 : aluRes; einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp); einst.rDst = dInst.rDst; return einst; endfunction http://csg.csail.mit.edu/6.S078

  12. Single-Cycle SMIPS atomic state updates if(memType(eInst.iType)) eInst.data <- dMem(MemReq{ op: eInst.iType==Ld ? Ld : St, addr: eInst.addr, data: eInst.data}); if(regWriteType(eInst.iType)) rf.wr(eInst.rDst, eInst.data); pc <= eInst.brTaken ? eInst.addr : pc + 4; endrule endmodule http://csg.csail.mit.edu/6.S078

  13. Single-Cycle SMIPS: Clock Speed Register File PC Execute Decode +4 tClock > tM + tDEC + tRF + tALU+ tM+ tWB Data Memory Inst Memory We can improve the clock speed if we execute each instruction in two clock cycles tClock > max {tM , (tDEC +tRF + tALU+ tM+ tWB)} http://csg.csail.mit.edu/6.S078

  14. Two-Cycle SMIPS Register File stage PC Execute Decode ir +4 Data Memory Inst Memory Introduce register “ir” to hold a fetched instruction and register “stage” to remember which stage (fetch/execute) we are in http://csg.csail.mit.edu/6.S078

  15. ir: The instruction register • You may recall from our earlier discussion of pipelining that when we take multiple cycles to perform some operation (e.g., IFFT), there is a possibility that intermediate registers do not contain any meaningful data in some cycles • It is straight forward to convert ir into a pipeline register • We can associate (Valid/Invalid) bit with ir • Equivalently, we can think of ir as a single-element FIFO http://csg.csail.mit.edu/6.S078

  16. Additional Types typedef struct { Addr pc; Bit#(32) inst; } TypeFetch2Decode deriving (Bits, Eq); typedef enum {Fetch, Execute} TypeStage deriving (Bits, Eq); March 5, 2012 http://csg.csail.mit.edu/6.S078 L8-16

  17. Two-Cycle SMIPS module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkTwoPortedMemory; let iMem = mem.iport; let dMem = mem.dport; Reg#(TypeFetch2Decode) ir <- mkRegU; Reg#(TypeStage) stage <- mkReg(Fetch); rule doFetch (state==Fetch); let inst = iMem(MemReq{op:Ld, addr:pc, data:?}); ir <= TypeFetch2Decode{pc: pc, inst: inst}; stage <= Execute; endrule http://csg.csail.mit.edu/6.S078

  18. Two-Cycle SMIPS rule doExecute(stage==Execute); let irpc = ir.pc; let inst = ir.inst; let dInst = decode(inst); Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); let eInst = exec(dInst, rVal1, rVal2, irpc); if(memType(eInst.iType)) eInst.data <- dMem(MemReq{ op: eInst.iType==Ld ? Ld : St, addr: eInst.addr, data: eInst.data}); if(regWriteType(eInst.iType)) rf.wr(eInst.rDst, eInst.data); pc <= eInst.brTaken ? eInst.addr : pc + 4; stage <= Fetch; endrule endmodule no change from single-cycle http://csg.csail.mit.edu/6.S078

  19. Princeton versus Harvard Architecture • Harvard architecture uses different memories for instructions and data • needed for a single-cycle implementation • Princeton architecture uses the same memory for instruction and data and thus, requires at least two cycles to execute Load/Store instructions The two-cycle implementations of Princeton and Harvard architectures are almost the same http://csg.csail.mit.edu/6.S078

  20. SMIPS Princeton Architecture Register File stage PC Execute Decode ir +4 Since both the Fetch and Execute stages want to use the memory, there is a structural hazard in accessing memory Memory http://csg.csail.mit.edu/6.S078

  21. Two-Cycle SMIPS Princeton module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkOnePortedMemory; let uMem = mem.port; Reg#(TypeFetch2Decode) ir <- mkRegU; Reg#(TypeStage) stage <- mkReg(Fetch); rule doFetch (stage==Fetch); let inst <- uMem(MemReq{op:Ld, addr:pc, data:?}); ir <= TypeFetch2Decode{pc: pc, inst: inst}; stage <= Execute; endrule http://csg.csail.mit.edu/6.S078

  22. Two-Cycle SMIPS Princeton rule doExecute(stage==Execute); let irpc = ir.pc; let inst = ir.inst; let dInst = decode(inst); Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); let eInst = exec(dInst, rVal1, rVal2, irpc); if(memType(eInst.iType)) eInst.data <- uMem(MemReq{ op: eInst.iType==Ld ? Ld : St, addr: eInst.addr, data: eInst.data}); if(regWriteType(eInst.iType)) rf.wr(eInst.rDst, eInst.data); pc <= eInst.brTaken ? eInst.addr : pc + 4; stage <= Fetch; endrule endmodule http://csg.csail.mit.edu/6.S078

  23. Two-Cycle SMIPS: Analysis Register File Fetch Execute stage PC Execute Decode ir +4 Data Memory Inst Memory In any given clock cycle, lots of unused hardware! next lecture: Pipelining to increase the throughput http://csg.csail.mit.edu/6.S078

More Related