Computer Architecture: A Constructive Approach
Instruction Representation
Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
http://csg.csail.mit.edu/SNU
Single-Cycle SMIPS
[Figure: single-cycle SMIPS datapath with PC, +4, Inst Memory, Decode, Register File, Execute, and Data Memory]
The datapath is shown only for convenience; it will be derived automatically from the high-level textual description.
Decoding Instructions: extract the fields needed for execution from each instruction
[Figure: decode block mapping instruction bit ranges to decoded fields]
• instType: from bits 31:26
• aluFunc: from bits 31:26 and 5:0
• branchComp: from bits 31:26
• rDst: bits 20:16 or 15:11
• rSrc1: bits 25:21
• rSrc2: bits 20:16
• imm: bits 15:0 or 25:0, extended
• immValid
This is a lot of pure combinational logic; it will be derived automatically from the high-level description.
Decoding Instructions: input-output types
[Figure: the same decode block annotated with types; mux control logic not shown]
• Input: instruction, of type Instr
• Output: a DecBundle with the fields
  - instType: IType
  - aluFunc: Func
  - branchComp: BrType
  - rDst, rSrc1, rSrc2: Rindex
  - imm: Bits#(32)
  - immValid: Bool
Type defs
    typedef enum {RAlu, IAlu, Ld, St, …} IType deriving(Bits, Eq);
    typedef enum {Eq, Neq, Le, Lt, Ge, Gt, J, JR, N} BrType deriving(Bits, Eq);
    typedef enum {Add, Sub, And, Or, Xor, Nor, Slt, Sltu, LShift, RShift, Sra} Func deriving(Bits, Eq);
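As a quick check on what `deriving(Bits)` produces, here is a small Python sketch (not BSV) of the enum encoding rule: constructors get consecutive codes starting at 0 in declaration order, packed into the fewest bits that can hold them. It uses only the two enums whose constructor lists are complete above; the helper names `enum_width` and `enum_codes` are illustrative.

```python
# Constructor lists copied from the BrType and Func typedefs above.
# IType is left out because its constructor list is elided ("…").
BR_TYPE = ["Eq", "Neq", "Le", "Lt", "Ge", "Gt", "J", "JR", "N"]
FUNC = ["Add", "Sub", "And", "Or", "Xor", "Nor",
        "Slt", "Sltu", "LShift", "RShift", "Sra"]


def enum_width(labels):
    """Fewest bits that can hold the codes 0 .. len(labels) - 1."""
    return (len(labels) - 1).bit_length()


def enum_codes(labels):
    """Consecutive codes in declaration order, as deriving(Bits) assigns them."""
    return {label: code for code, label in enumerate(labels)}


if __name__ == "__main__":
    for name, labels in (("BrType", BR_TYPE), ("Func", FUNC)):
        width = enum_width(labels)
        codes = enum_codes(labels)
        print(name, width, {k: format(v, f"0{width}b") for k, v in codes.items()})
```

Both enums come out 4 bits wide, so each of `aluFunc` and `branchComp` costs 4 wires in the decoded bundle; `BrType` leaves 7 of its 16 codes unused.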
Instruction grouping
• Many instructions have the same implementation except for the ALU function they invoke.
• We can group such instructions and reduce the amount of code we have to write.
• Examples: R-Type ALU, I-Type ALU, Br Type, Memory Type, …
Decoding Instructions
    function DecodedInst decode(Bit#(32) instrBits);
      DecodedInst dInst = ?;
      let opcode = instrBits[31 : 26];
      let rs     = instrBits[25 : 21];
      let rt     = instrBits[20 : 16];
      let rd     = instrBits[15 : 11];
      let shamt  = instrBits[10 : 6];
      let funct  = instrBits[5 : 0];
      let imm    = instrBits[15 : 0];
      let target = instrBits[25 : 0];
      case (instType(opcode))
        ...
      endcase
      return dInst;
    endfunction
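The field extraction is plain bit slicing. A Python model of it is sketched below, assuming standard MIPS encodings (SMIPS is a MIPS subset); `bits` and `extract_fields` are hypothetical helper names. The test word 0x00221821 is `addu $3, $1, $2`.

```python
def bits(word, hi, lo):
    """Return word[hi:lo], both ends inclusive, like the BSV slice instrBits[hi : lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def extract_fields(instr_bits):
    """Split a 32-bit instruction into the fields named in decode."""
    return {
        "opcode": bits(instr_bits, 31, 26),
        "rs": bits(instr_bits, 25, 21),
        "rt": bits(instr_bits, 20, 16),
        "rd": bits(instr_bits, 15, 11),
        "shamt": bits(instr_bits, 10, 6),
        "funct": bits(instr_bits, 5, 0),
        "imm": bits(instr_bits, 15, 0),
        "target": bits(instr_bits, 25, 0),
    }


if __name__ == "__main__":
    # addu $3, $1, $2 in standard MIPS encoding (assumed to match SMIPS)
    print(extract_fields(0x00221821))
```

Every field is extracted for every instruction; which ones are meaningful depends on the instruction type, which is exactly what the `case` on `instType(opcode)` sorts out.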
Decoding Instructions: R-Type ALU
    case (instType(opcode))
      …
      RAlu: begin
        dInst.instType = Alu;
        dInst.aluFunc = case (funct)
                          fcADDU: Add;
                          fcSUBU: Sub;
                          ...
                        endcase;
        dInst.rDst = rd;
        dInst.rSrc1 = rs;
        dInst.rSrc2 = rt;
        dInst.immValid = False;
      end
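A table-driven Python model of the R-type case follows. The funct values are the standard MIPS codes for ADDU, SUBU, AND, OR, XOR, NOR, SLT and SLTU, assumed to be what the slide's `fcADDU`, `fcSUBU`, … constants stand for; only this subset of `Func` is modeled.

```python
# Hypothetical funct constants: standard MIPS values, assumed for SMIPS.
FC_ADDU, FC_SUBU, FC_AND, FC_OR, FC_XOR, FC_NOR, FC_SLT, FC_SLTU = (
    0x21, 0x23, 0x24, 0x25, 0x26, 0x27, 0x2A, 0x2B)

R_ALU_FUNC = {
    FC_ADDU: "Add", FC_SUBU: "Sub", FC_AND: "And", FC_OR: "Or",
    FC_XOR: "Xor", FC_NOR: "Nor", FC_SLT: "Slt", FC_SLTU: "Sltu",
}


def decode_r_alu(instr_bits):
    """Model of the RAlu branch of decode."""
    rs = (instr_bits >> 21) & 0x1F
    rt = (instr_bits >> 16) & 0x1F
    rd = (instr_bits >> 11) & 0x1F
    funct = instr_bits & 0x3F
    return {
        "instType": "Alu",
        "aluFunc": R_ALU_FUNC[funct],  # KeyError for functs not modeled here
        "rDst": rd,                    # R-type results go to rd
        "rSrc1": rs,
        "rSrc2": rt,
        "immValid": False,             # second ALU operand is rt, not an immediate
    }
```

The whole group shares one decode path; only the `funct` lookup differs, which is the saving the "Instruction grouping" slide describes.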
Decoding Instructions: I-Type ALU
    case (instType(opcode))
      …
      IAlu: begin
        dInst.instType = Alu;
        dInst.aluFunc = case (opcode)
                          opADDIU: Add;
                          ...
                        endcase;
        dInst.rDst = rt;
        dInst.rSrc1 = rs;
        dInst.imm = signedIAlu(opcode) ? signExtend(imm) : zeroExtend(imm);
        dInst.immValid = True;
      end
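The only subtle step in the I-type case is the choice between sign and zero extension. In the Python sketch below, `signed_i_alu` is a hypothetical stand-in for the slide's `signedIAlu`; it follows the usual MIPS rule that ADDIU, SLTI and SLTIU sign-extend their immediate while ANDI, ORI and XORI zero-extend it. The opcode values are the standard MIPS ones, assumed for SMIPS.

```python
# Standard MIPS opcodes, assumed for SMIPS.
OP_ADDIU, OP_SLTI, OP_SLTIU, OP_ANDI, OP_ORI, OP_XORI = 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E

I_ALU_FUNC = {OP_ADDIU: "Add", OP_SLTI: "Slt", OP_SLTIU: "Sltu",
              OP_ANDI: "And", OP_ORI: "Or", OP_XORI: "Xor"}


def sign_extend(value, width):
    """32-bit pattern of a `width`-bit value, sign-extended."""
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def zero_extend(value, width):
    """32-bit pattern of a `width`-bit value, zero-extended."""
    return value & ((1 << width) - 1)


def signed_i_alu(opcode):
    # Hypothetical model of signedIAlu: arithmetic and compare immediates are signed.
    return opcode in (OP_ADDIU, OP_SLTI, OP_SLTIU)


def decode_i_alu(instr_bits):
    """Model of the IAlu branch of decode."""
    opcode = (instr_bits >> 26) & 0x3F
    imm = instr_bits & 0xFFFF
    return {
        "instType": "Alu",
        "aluFunc": I_ALU_FUNC[opcode],
        "rDst": (instr_bits >> 16) & 0x1F,   # I-type results go to rt
        "rSrc1": (instr_bits >> 21) & 0x1F,
        "imm": sign_extend(imm, 16) if signed_i_alu(opcode) else zero_extend(imm, 16),
        "immValid": True,
    }
```

With the same 16-bit field 0xFFFF, `addiu $2, $1, -1` (0x2422FFFF) gets the operand 0xFFFFFFFF, while `ori $2, $1, 0xFFFF` (0x3422FFFF) gets 0x0000FFFF, which is why the extension must depend on the opcode.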
Decoding Instructions: Load & Store
    case (instType(opcode))
      LW: begin
        dInst.instType = Ld;
        dInst.aluFunc = Add;
        dInst.rDst = rt;
        dInst.rSrc1 = rs;
        dInst.imm = signExtend(imm);
        dInst.immValid = True;
      end
      SW: begin
        dInst.instType = St;
        dInst.aluFunc = Add;
        dInst.rSrc1 = rs;
        dInst.rSrc2 = rt;   // a store reads rt (the data to store) and writes no register
        dInst.imm = signExtend(imm);
        dInst.immValid = True;
      end
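Loads and stores share the address computation (rVal1 plus the sign-extended offset, done by the ALU's Add) and differ only in how they use `rt`: a load writes it, a store reads it as the data. A hedged Python model, with the standard MIPS opcodes for LW and SW assumed; the helper names are hypothetical.

```python
OP_LW, OP_SW = 0x23, 0x2B  # standard MIPS opcodes, assumed for SMIPS
MASK = 0xFFFFFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit offset to a 32-bit pattern."""
    return ((imm ^ 0x8000) - 0x8000) & MASK


def decode_mem(instr_bits):
    """Model of the LW/SW branches of decode."""
    opcode = (instr_bits >> 26) & 0x3F
    rs = (instr_bits >> 21) & 0x1F
    rt = (instr_bits >> 16) & 0x1F
    d = {"aluFunc": "Add", "rSrc1": rs,
         "imm": sign_extend16(instr_bits & 0xFFFF), "immValid": True}
    if opcode == OP_LW:
        d.update(instType="Ld", rDst=rt)   # load: rt is the destination
    elif opcode == OP_SW:
        d.update(instType="St", rSrc2=rt)  # store: rt supplies the data
    else:
        raise ValueError("not a load or store")
    return d


def effective_address(r_val1, imm):
    """The ALU Add on rVal1 and the offset, wrapping at 32 bits."""
    return (r_val1 + imm) & MASK
```

For `lw $2, -4($1)` (0x8C22FFFC) with `$1 = 0x1000`, the address is 0xFFC; for `sw $2, 8($1)` (0xAC220008) the decoded bundle has an `rSrc2` and no `rDst`.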
Decoding Instructions: Jump
    case (instType(opcode))
      …
      J, JAL: begin
        dInst.instType = opcode == opJ ? J : Jal;
        dInst.rDst = 31;
        dInst.imm = zeroExtend({target, 2'b00});
        dInst.immValid = True;
      end
      rJump: begin
        dInst.instType = funct == fcJR ? Jr : Jalr;
        dInst.rDst = rd;
        dInst.rSrc1 = rs;
      end
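A Python sketch of the jump decode, assuming the standard MIPS encodings (J = 0x02 and JAL = 0x03 as opcodes; JR = 0x08 and JALR = 0x09 as funct codes under opcode 0). The names `OP_J`, `FC_JR` and so on are hypothetical stand-ins for the constants the slide compares against.

```python
OP_J, OP_JAL = 0x02, 0x03      # assumed MIPS opcodes
FC_JR, FC_JALR = 0x08, 0x09    # assumed MIPS funct codes (opcode 0)


def decode_jump(instr_bits):
    """Model of the J/JAL and rJump branches of decode."""
    opcode = (instr_bits >> 26) & 0x3F
    if opcode in (OP_J, OP_JAL):
        target = instr_bits & 0x3FFFFFF
        return {"instType": "J" if opcode == OP_J else "Jal",
                "rDst": 31,
                "imm": target << 2,  # zeroExtend({target, 2'b00}): 28 significant bits
                "immValid": True}
    if opcode == 0:
        funct = instr_bits & 0x3F
        if funct in (FC_JR, FC_JALR):
            return {"instType": "Jr" if funct == FC_JR else "Jalr",
                    "rDst": (instr_bits >> 11) & 0x1F,
                    "rSrc1": (instr_bits >> 21) & 0x1F}
    raise ValueError("not a jump")
```

For `j` with target field 0x100004 (0x08100004) the decoded immediate is 0x400010; `jr $31` (0x03E00008) decodes to a register jump through `$31`.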
Decoding Instructions: Branch
    case (instType(opcode))
      …
      Branch: begin
        dInst.instType = Br;
        dInst.branchComp = case (opcode)
                             opBEQ: Eq;
                             opBLEZ: Le;
                             ...
                           endcase;
        dInst.rSrc1 = rs;
        dInst.rSrc2 = rt;
        dInst.imm = signExtend({imm, 2'b00});
        dInst.immValid = True;
      end
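The branch offset is the 16-bit immediate shifted left by two and sign-extended from 18 bits. The Python sketch models the two comparisons named on the slide plus BNE and BGTZ; the opcode values are standard MIPS, assumed for SMIPS. BLTZ and BGEZ, which MIPS encodes under a separate opcode using the rt field, are left out.

```python
OP_BEQ, OP_BNE, OP_BLEZ, OP_BGTZ = 0x04, 0x05, 0x06, 0x07  # assumed MIPS opcodes
BRANCH_COMP = {OP_BEQ: "Eq", OP_BNE: "Neq", OP_BLEZ: "Le", OP_BGTZ: "Gt"}


def branch_offset(imm16):
    """signExtend({imm, 2'b00}): an 18-bit byte offset as a 32-bit pattern."""
    v = imm16 << 2
    return ((v ^ 0x20000) - 0x20000) & 0xFFFFFFFF


def decode_branch(instr_bits):
    """Model of the Branch case of decode."""
    opcode = (instr_bits >> 26) & 0x3F
    return {"instType": "Br",
            "branchComp": BRANCH_COMP[opcode],
            "rSrc1": (instr_bits >> 21) & 0x1F,
            "rSrc2": (instr_bits >> 16) & 0x1F,
            "imm": branch_offset(instr_bits & 0xFFFF),
            "immValid": True}
```

`beq $1, $2, -1` (0x1022FFFF) gets the byte offset 0xFFFFFFFC (that is, -4); for `blez $1, 3` (0x18200003) the rt field is 0, so `rSrc2` names `$0` and the comparison is against zero.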
Decoding
• Not all fields of dInst are defined in each case.
• We may decide to pass more information from decode to execute for efficiency; in that case the definition of the decoded-instruction type has to be adjusted accordingly.
Single-Cycle SMIPS
    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkRFile;
      Memory mem <- mkMemory;

      rule fetchAndExecute;
        // fetch
        let instResp <- mem.iside(MemReq{op: Ld, addr: pc, data: ?});
        // decode
        let decInst = decode(instResp);
        Data rVal1 = rf.rd1(decInst.rSrc1);
        Data rVal2 = rf.rd2(decInst.rSrc2);
Single-Cycle SMIPS (cont.)
        // execute
        let execInst = exec(decInst, pc, rVal1, rVal2);
        if (execInst.instType == Ld || execInst.instType == St)
          execInst.data <- mem.dside(MemReq{op: execInst.instType, addr: execInst.addr, data: execInst.data});
        pc <= execInst.brTaken ? execInst.addr : pc + 4;
        // writeback
        if (execInst.instType == Alu || execInst.instType == Ld)
          rf.wr(execInst.rDst, execInst.data);
      endrule
    endmodule
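To see the one-rule structure end to end, here is a small Python interpreter in which each call to `step` does everything one firing of `fetchAndExecute` does: fetch, decode, register reads, execute, memory access, next-pc selection and writeback. It is a sketch under several assumptions: only ADDIU, ADDU, SW and BNE are modeled, encodings are standard MIPS, branch targets are relative to pc + 4, and there is no branch delay slot. `step`, `run` and `PROGRAM` are hypothetical names.

```python
MASK = 0xFFFFFFFF


def sext16(x):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    return ((x ^ 0x8000) - 0x8000) & MASK


def step(pc, regs, imem, dmem):
    """One firing of the fetchAndExecute rule; returns the next pc."""
    inst = imem[pc // 4]                                  # fetch
    op, funct = inst >> 26, inst & 0x3F                   # decode
    rs, rt, rd = (inst >> 21) & 31, (inst >> 16) & 31, (inst >> 11) & 31
    imm = sext16(inst & 0xFFFF)
    r_val1, r_val2 = regs[rs], regs[rt]                   # register-file reads
    next_pc, wr = (pc + 4) & MASK, None
    if op == 0x09:                                        # addiu rt, rs, imm
        wr = (rt, (r_val1 + imm) & MASK)
    elif op == 0x00 and funct == 0x21:                    # addu rd, rs, rt
        wr = (rd, (r_val1 + r_val2) & MASK)
    elif op == 0x2B:                                      # sw rt, imm(rs)
        dmem[(r_val1 + imm) & MASK] = r_val2
    elif op == 0x05:                                      # bne rs, rt, offset
        if r_val1 != r_val2:
            next_pc = (pc + 4 + (imm << 2)) & MASK
    else:
        raise ValueError(f"unsupported instruction {inst:#010x}")
    if wr is not None and wr[0] != 0:                     # writeback; $0 stays 0
        regs[wr[0]] = wr[1]
    return next_pc


def run(imem, max_steps=1000):
    """Fire the rule until pc runs off the end of instruction memory."""
    pc, regs, dmem = 0, [0] * 32, {}
    for _ in range(max_steps):
        if pc // 4 >= len(imem):
            break
        pc = step(pc, regs, imem, dmem)
    return regs, dmem


# Sums 3 + 2 + 1 into $2, then stores the sum at address 0.
PROGRAM = [
    0x24010003,  # addiu $1, $0, 3
    0x24020000,  # addiu $2, $0, 0
    0x00411021,  # loop: addu $2, $2, $1
    0x2421FFFF,  # addiu $1, $1, -1
    0x1420FFFD,  # bne $1, $0, loop
    0xAC020000,  # sw $2, 0($0)
]
```

Running `run(PROGRAM)` leaves 6 in `$2` and at data address 0. Each `step` corresponds to one clock cycle of the single-cycle machine, so the cycle time has to cover the longest path through all of these stages.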
Executing Instructions
[Figure: the execute block, pure combinational logic built from an ALU, a Branch unit, and an Address unit]
• Inputs: decInst, rVal1, rVal2, pc
• Outputs: instType, rDst; data (either for the register-file write or for a store); addr (either for a memory reference or as a branch target); brTaken
Executing Instructions
    function ExecInst exec(DecodedInst dInst, Addr pc, Data rVal1, Data rVal2);
      Data aluVal2 = dInst.immValid ? dInst.imm : rVal2;
      let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
      // branch conditions compare the two source registers, not the immediate
      let brRes = aluBr(rVal1, rVal2, dInst.branchComp);
      let brAddr = brAddrCal(pc, rVal1, dInst.instType, dInst.imm);
      return ExecInst{
        instType: dInst.instType,
        brTaken:  brRes,
        addr:     (dInst.instType == Ld || dInst.instType == St) ? aluRes : brAddr,
        data:     dInst.instType == St ? rVal2 : aluRes,
        rDst:     dInst.rDst};
    endfunction
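A Python model of `exec`, useful for checking the operand selection: `aluVal2` picks the immediate when `immValid` is set, but the branch condition always compares `rVal1` with `rVal2` (a branch carries `immValid = True` for its offset, so comparing against `aluVal2` would compare against the offset). The meanings given to `J`, `JR` and `N` (always, always, never taken), the default of `N` for instructions without a `branchComp`, and the ALU subset are assumptions; `br_addr` is passed in rather than computed, to keep the sketch focused.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# A subset of Func, enough for this sketch.
ALU_OPS = {
    "Add": lambda a, b: (a + b) & MASK,
    "Sub": lambda a, b: (a - b) & MASK,
    "And": lambda a, b: a & b,
    "Or": lambda a, b: a | b,
    "Slt": lambda a, b: int(to_signed(a) < to_signed(b)),
}


def alu_br(a, b, comp):
    """Branch condition on two register values (assumed BrType meanings)."""
    sa, sb = to_signed(a), to_signed(b)
    return {
        "Eq": a == b, "Neq": a != b,
        "Le": sa <= sb, "Lt": sa < sb, "Ge": sa >= sb, "Gt": sa > sb,
        "J": True, "JR": True, "N": False,
    }[comp]


def exec_inst(d_inst, r_val1, r_val2, br_addr):
    """Model of exec; br_addr stands in for the result of brAddrCal."""
    alu_val2 = d_inst["imm"] if d_inst["immValid"] else r_val2
    alu_res = ALU_OPS[d_inst.get("aluFunc", "Add")](r_val1, alu_val2)
    # The branch condition compares the two registers, never the immediate offset.
    br_taken = alu_br(r_val1, r_val2, d_inst.get("branchComp", "N"))
    is_mem = d_inst["instType"] in ("Ld", "St")
    return {
        "instType": d_inst["instType"],
        "brTaken": br_taken,
        "addr": alu_res if is_mem else br_addr,
        "data": r_val2 if d_inst["instType"] == "St" else alu_res,
        "rDst": d_inst.get("rDst"),
    }
```

A BEQ with both registers equal to 5 and offset -4 is taken here; had the comparison used `alu_val2`, it would have compared 5 against 0xFFFFFFFC and fallen through.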
Branch Resolution
    function Addr brAddrCal(Addr pc, Data val, IType iType, Data imm);
      // MIPS-style: jump and branch targets are formed from pc + 4
      Addr pcPlus4 = pc + 4;
      let targetAddr = case (iType)
                         J, Jal:   {pcPlus4[31:28], imm[27:0]};
                         Jr, Jalr: val;
                         default:  pcPlus4 + imm;
                       endcase;
      return targetAddr;
    endfunction
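A Python version of the target calculation, assuming MIPS semantics (targets formed from pc + 4, no delay slot modeled). For J and JAL the decoded immediate already holds `{target, 2'b00}`, so only its low 28 bits are used and the top 4 come from pc + 4; `br_addr_cal` is a hypothetical name.

```python
MASK = 0xFFFFFFFF


def br_addr_cal(pc, val, i_type, imm):
    """Model of brAddrCal: next-pc candidate for jumps and branches."""
    pc_plus4 = (pc + 4) & MASK
    if i_type in ("J", "Jal"):
        # {pcPlus4[31:28], imm[27:0]}: pseudo-direct jump within a 256 MB region
        return (pc_plus4 & 0xF0000000) | (imm & 0x0FFFFFFF)
    if i_type in ("Jr", "Jalr"):
        return val                      # register jump: target comes from rs
    return (pc_plus4 + imm) & MASK      # branch: pc + 4 + sign-extended offset
```

The region bits matter only at a region boundary: with pc = 0x1FFFFFFC, pc + 4 is 0x20000000, so a J with immediate 0x10 lands at 0x20000010 rather than 0x10000010.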
Single-Cycle SMIPS: performance?
[Figure: single-cycle SMIPS datapath with PC, +4, Inst Memory, Decode, Register File, Execute, and Data Memory]
The whole system was described using one rule, with lots of big combinational functions.
Next few lectures
• Can we build a faster machine?
• What if the program and data resided in the same memory?
• What if the register file did not have an adequate number of ports?