
Realistic Memories and Caches
Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
http://csg.csail.mit.edu/SNU

Three-Stage SMIPS

[Figure: three-stage pipeline with PC, +4, Epoch, Inst Memory, Decode, Register File, Execute, Data Memory, the stall? logic, and pipeline registers fr and wbr]

The use of magic memories makes this design unrealistic.

A Simple Memory Model

[Figure: MAGIC RAM block with inputs Clock, WriteEnable, Address and WriteData, and output ReadData]

• Reads and writes are always completed in one cycle
  • A read can be done at any time (i.e., it is combinational)
  • If enabled, a write is performed at the rising clock edge (the write address and data must be stable at the clock edge)

In a real DRAM the data will be available several cycles after the address is supplied.
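The contrast between the magic RAM and a real DRAM can be sketched as a small software model. This is only an illustrative Python model, not part of the Bluespec design: the class names `MagicRAM` and `DelayedRAM`, the `latency` parameter and the method names are all assumptions made for this sketch.

```python
from collections import deque


class MagicRAM:
    """Magic memory: combinational reads, writes applied at the clock edge."""

    def __init__(self, size):
        self.data = [0] * size
        self.pending_write = None  # (addr, value) captured while WriteEnable is high

    def read(self, addr):
        # The read data is visible in the same cycle as the address.
        return self.data[addr]

    def write(self, addr, value):
        # The write only takes effect at the next rising clock edge.
        self.pending_write = (addr, value)

    def clock_edge(self):
        if self.pending_write is not None:
            addr, value = self.pending_write
            self.data[addr] = value
            self.pending_write = None


class DelayedRAM:
    """Hypothetical DRAM-like memory: read data appears `latency` cycles later."""

    def __init__(self, size, latency):
        self.data = [0] * size
        self.latency = latency
        self.cycle = 0
        self.in_flight = deque()  # (cycle at which the data is ready, value)

    def request_read(self, addr):
        self.in_flight.append((self.cycle + self.latency, self.data[addr]))

    def clock_edge(self):
        self.cycle += 1

    def response(self):
        # Returns the oldest read result if it is ready, otherwise None.
        if self.in_flight and self.in_flight[0][0] <= self.cycle:
            return self.in_flight.popleft()[1]
        return None
```

With the delayed model the requester has to keep polling `response()` over several cycles, which is the situation the cache and the pipeline in the rest of these slides have to cope with.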

Memory Hierarchy

[Figure: CPU with RegFile connected (A) to a small, fast memory (SRAM), which is connected (B) to a big, slow memory (DRAM); the small, fast memory holds frequently used data]

• size: RegFile << SRAM << DRAM (why?)
• latency: RegFile << SRAM << DRAM (why?)
• bandwidth: on-chip >> off-chip (why?)

On a data access:
  hit (data ∈ fast memory) ⇒ low latency access
  miss (data ∉ fast memory) ⇒ long latency access (DRAM)
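The effect of hits and misses on the latency seen by the CPU can be estimated with a weighted average. The sketch below uses the standard expected-latency calculation, which is not stated on the slide; the function name and the cycle counts in the usage note are assumptions chosen only for illustration.

```python
def average_access_latency(hit_rate, hit_latency, miss_latency):
    """Expected cycles per access when a fraction `hit_rate` of accesses hit.

    hit_latency  -- cycles for an access served by the fast memory
    miss_latency -- total cycles for an access that has to go to DRAM
    """
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency
```

For example, with an assumed 1-cycle hit, a 100-cycle miss and a 90% hit rate, the average is roughly 10.9 cycles, so even a small miss rate dominates the average latency.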

Plan

• What do simple caches look like
• Incorporating caches in the processor pipeline

Data Cache - Interface

[Figure: the cache sits between the Processor and the DRAM. Processor side: guarded methods req and resp (resp is fed by hitQ). Memory side: guarded methods memReq (fed by mReqQ) and memResp (feeding mRespQ). The missReq register sits inside the cache.]

interface DCache;
  method Action req(MemReq r);
  method ActionValue#(MemResp) resp;
  method ActionValue#(MemReq) memReq;
  method Action memResp(MemResp r);
endinterface

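Direct-Mapped Cache

[Figure: the req address is split into Tag (t bits), Index (k bits) and Offset (b bits); Tag and Index together form the block number, and Offset is the block offset. The index selects one of 2^k lines, each holding V (valid bit), Tag and Data Block. The stored tag is compared (=) with the address tag to produce HIT, and the offset selects the data word or byte.]

What is a bad reference pattern? Strided at the size of the cache.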
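The address split and the bad reference pattern can be checked with a small software model. This is a sketch in Python under assumed parameters (k = 4 index bits and b = 2 offset bits, i.e. a 64-byte cache); the function names `split_address` and `count_misses` are hypothetical and do not come from the slides.

```python
def split_address(addr, k, b):
    """Split a byte address into (tag, index, offset) for 2**k lines of 2**b bytes."""
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << k) - 1)
    tag = addr >> (b + k)
    return tag, index, offset


def count_misses(addresses, k, b):
    """Count misses of a direct-mapped cache over a sequence of byte addresses."""
    valid = [False] * (1 << k)
    tags = [0] * (1 << k)
    misses = 0
    for addr in addresses:
        tag, index, _ = split_address(addr, k, b)
        if not (valid[index] and tags[index] == tag):
            misses += 1            # miss: the line is refilled with the new block
            valid[index] = True
            tags[index] = tag
    return misses


K, B = 4, 2                                   # assumed geometry: 16 lines of 4 bytes
CACHE_BYTES = (1 << K) * (1 << B)             # 64 bytes

# Two passes over 16 consecutive words: the working set fits in the cache.
sequential = [4 * i for i in range(16)] * 2

# Alternating between two addresses exactly one cache size apart:
# same index, different tags, so each access evicts the other block.
strided = [CACHE_BYTES * (i % 2) for i in range(32)]

sequential_misses = count_misses(sequential, K, B)   # only the first pass misses
strided_misses = count_misses(strided, K, B)         # every access misses
```

In this model the sequential trace misses only on its first pass, while the trace strided at the cache size misses on every access even though it touches just two blocks.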

Data Cache – code structure

module mkDCache(DCache);
  // state declarations
  Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));
  …
  rule doMiss … endrule
  method Action req(MemReq r) … endmethod
  method ActionValue#(MemResp) resp … endmethod
  method ActionValue#(MemReq) memReq … endmethod
  method Action memResp(MemResp r) … endmethod
endmodule

Data Cache state declarations

  Vector#(Rows, Reg#(Bool)) vArray    <- replicateM(mkReg(False));
  Vector#(Rows, Reg#(Tag))  tagArray  <- replicateM(mkRegU);
  Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU);
  FIFOF#(MemReq)   mReqQ   <- mkUGFIFOF1;
  FIFOF#(MemResp)  mRespQ  <- mkUGFIFOF1;
  PipeReg#(MemReq) hitQ    <- mkPipeReg;
  Reg#(MemReq)     missReq <- mkRegU;
  Reg#(Bit#(2))    status  <- mkReg(0);

Data Cache processor-side methods

  method Action req(MemReq req) if (status==0);
    Index idx = truncate(req.addr>>2);
    Tag tag = truncateLSB(req.addr);
    Bool valid = vArray[idx];
    Bool tagMatch = tagArray[idx]==tag;
    if(valid && tagMatch && hitQ.notFull)
      hitQ.enq(req);
    else begin
      missReq <= req;
      status <= 1;
    end
  endmethod

  method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0);
    hitQ.deq;
    let r = hitQ.first;
    Index idx = truncate(r.addr>>2);
    if(r.op==St) dataArray[idx] <= r.data;
    return dataArray[idx];
  endmethod

Data Cache memory-side methods

  method ActionValue#(MemReq) memReq if (mReqQ.notEmpty);
    mReqQ.deq;
    return mReqQ.first;
  endmethod

  method Action memResp(MemResp res) if (mRespQ.notFull);
    mRespQ.enq(res);
  endmethod

Data Cache: rule to process a cache miss

  rule doMiss (status!=0);
    Index idx = truncate(missReq.addr>>2);
    if(status==1 && mReqQ.notFull) begin
      if(vArray[idx])
        mReqQ.enq(MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]});
      status <= 2;
    end
    if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin
      if(vArray[idx]) mRespQ.deq;
      mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?});
      status <= 3;
    end

Data Cache: rule to process a cache miss (continued)

  rule doMiss (status!=0);
    …
    if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin
      let data = mRespQ.first;
      mRespQ.deq;
      Tag tag = truncateLSB(missReq.addr);
      vArray[idx] <= True;
      tagArray[idx] <= tag;
      dataArray[idx] <= data;
      hitQ.enq(missReq);
      status <= 0;
    end
  endrule
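The miss handling can be followed step by step in a software model of the status register. The sketch below is a Python approximation, not the Bluespec design: it uses a synchronous, hypothetical `Memory` class instead of the mReqQ/mRespQ FIFOs, so the wait for the store acknowledgement in state 2 disappears, and it returns nothing for stores. The names `DCacheModel`, `access` and `log` are assumptions made for the example. Like the rule above, it writes back every valid victim line because there is no dirty bit.

```python
class Memory:
    """Hypothetical backing store that serves each request immediately."""

    def __init__(self):
        self.words = {}

    def handle(self, op, addr, data=None):
        if op == "St":
            self.words[addr] = data
            return None
        return self.words.get(addr, 0)


class DCacheModel:
    """One-word-per-line direct-mapped cache with the 0/1/2/3 miss states."""

    def __init__(self, rows, mem):
        self.rows = rows
        self.valid = [False] * rows
        self.tag = [0] * rows
        self.data = [0] * rows
        self.mem = mem
        self.status = 0          # 0: ready, 1: write back, 2: fetch, 3: fill
        self.miss_req = None
        self.fill = None
        self.log = []            # memory traffic, for inspection

    def _split(self, addr):
        word = addr >> 2         # drop the 2 byte-offset bits
        return word // self.rows, word % self.rows   # (tag, index)

    def _do_miss(self):
        _, addr, _ = self.miss_req
        tag, idx = self._split(addr)
        if self.status == 1:
            if self.valid[idx]:  # evict the victim line (no dirty bit)
                victim = (self.tag[idx] * self.rows + idx) << 2
                self.mem.handle("St", victim, self.data[idx])
                self.log.append(("writeback", victim))
            self.status = 2
        elif self.status == 2:   # request the missing word
            self.fill = self.mem.handle("Ld", addr)
            self.log.append(("fetch", addr))
            self.status = 3
        elif self.status == 3:   # fill the line and return to ready
            self.valid[idx] = True
            self.tag[idx] = tag
            self.data[idx] = self.fill
            self.status = 0

    def access(self, op, addr, data=None):
        tag, idx = self._split(addr)
        if not (self.valid[idx] and self.tag[idx] == tag):
            self.miss_req = (op, addr, data)
            self.status = 1
            while self.status != 0:
                self._do_miss()
        if op == "St":
            self.data[idx] = data
            return None
        return self.data[idx]
```

A short usage example: after `c = DCacheModel(4, Memory())`, the sequence `c.access("St", 0, 11)`, `c.access("Ld", 16)`, `c.access("Ld", 0)` makes addresses 0 and 16 fight over line 0, and `c.log` records a write-back followed by a fetch for each conflict.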

Five-Stage SMIPS

[Figure: five-stage pipeline with PC, +4, Epoch, Inst Memory, Decode, Register File, Execute, Data Memory, the stall? logic, and pipeline registers fr, dr, er and wbr]

In this organization memory can take any amount of time.

Five-Stage SMIPS state elements

module mkProc(Proc);
  Reg#(Addr) pc    <- mkRegU;
  Reg#(Bool) epoch <- mkRegU;
  RFile      rf    <- mkRFile;
  Memory     mem   <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;
  PipeReg#(FBundle)  fr  <- mkPipeReg;
  PipeReg#(DBundle)  dr  <- mkPipeReg;
  PipeReg#(EBundle)  er  <- mkPipeReg;
  PipeReg#(WBBundle) wbr <- mkPipeReg;

  rule doProc;
    …

Five-Stage SMIPS instruction fetch

  rule doProc;
    Bool iAcc = False;
    if(fr.notFull && iMem.notFull) begin
      iMem.req(MemReq{op:Ld, addr:pc, data:?});
      iAcc = True;
      fr.enq(FBundle{pc:pc, epoch:epoch});
    end

Enqueue the instruction fetch request. If the request cannot be enqueued, we must remember (via iAcc) not to change the pc to pc+4.

Five-Stage SMIPS decode

    if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin
      let dInst = decode(iMem.resp);
      dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst});
      fr.deq;
      iMem.deq;
    end

Decode the fetched instruction.

Five-Stage SMIPS code structure for killing wrongly fetched instructions

    Addr redirPC = ?;
    Bool redirPCvalid = False;
    if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
      // the instruction being executed sits in dr, so its epoch is dr.first.epoch
      if(dr.first.epoch==epoch) begin
        Bool eStall = …
        Bool wbStall = …
        if(!eStall && !wbStall) begin
          let eInst = exec(dInst, …);
          if(memType(eInst.iType)) dMem.req(…);
          if(eInst.brTaken) begin redirPC … end
          er.enq(EBundle{…});
          dr.deq;
        end
      end
      else dr.deq;  // successful kill
    end

Five-Stage SMIPS stall signal

    Addr redirPC = ?;
    Bool redirPCvalid = False;
    if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
      // the instruction being executed sits in dr, so its epoch is dr.first.epoch
      if(dr.first.epoch==epoch) begin
        let dInst = dr.first.dInst;
        Bool eStall = er.notEmpty && er.first.rDstValid &&
                      ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) ||
                       (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst));
        Bool wbStall = wbr.notEmpty && wbr.first.rDstValid &&
                       ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) ||
                        (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst));
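The stall condition is a read-after-write check against every older instruction that has not yet written the register file. A compact way to see the logic is the following Python sketch; the dataclasses `Decoded` and `InFlight` and the function `must_stall` are hypothetical names, and `None` stands in for the `Valid` flags being False.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decoded:
    """Source registers of the instruction about to execute."""
    r_src1: Optional[int] = None   # None models rSrc1Valid == False
    r_src2: Optional[int] = None   # None models rSrc2Valid == False


@dataclass
class InFlight:
    """Destination register of an older instruction held in er or wbr."""
    r_dst: Optional[int] = None    # None models rDstValid == False


def must_stall(d_inst, older):
    """True if d_inst reads a register that an older in-flight instruction writes.

    `older` lists the occupied er / wbr entries; an empty pipeline register
    is simply left out, which plays the role of the notEmpty checks.
    """
    for entry in older:
        if entry.r_dst is None:
            continue
        if entry.r_dst in (d_inst.r_src1, d_inst.r_src2):
            return True
    return False
```

For instance, an instruction reading r1 and r2 must stall while er holds an instruction writing r1, but not while er holds a store (no destination) or an instruction writing r5.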

Five-Stage SMIPS Execute if not stall

        if(!eStall && !wbStall) begin
          Data rVal1 = rf.rd1(dInst.rSrc1);
          Data rVal2 = rf.rd2(dInst.rSrc2);
          let eInst = exec(dInst, rVal1, rVal2, dr.first.pc);
          if(memType(eInst.iType))
            dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data});
          if(eInst.brTaken) begin
            redirPC = eInst.addr;
            redirPCvalid = True;
          end
          er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data});
          dr.deq;  // successful execution
        end
      end
      else dr.deq;
    end

Five-Stage SMIPS execute and memory responses to WB

    if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin
      wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst,
                       data:er.first.iType==Ld ? dMem.resp : er.first.data});
      er.deq;
      if(dMem.notEmpty) dMem.deq;
    end

Five-Stage SMIPS writeback

    if(wbr.notEmpty) begin
      if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data);
      wbr.deq;
    end

    pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc;
    epoch <= redirPCvalid ? !epoch : epoch;
  endrule
endmodule
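The final pc and epoch updates encode a priority: a redirect from execute wins, otherwise the pc advances only if a fetch request was accepted this cycle. The same selection is written out below as a small Python function; the function and parameter names are assumptions that mirror redirPCvalid, redirPC and iAcc.

```python
def next_pc_and_epoch(pc, epoch, redir_valid, redir_pc, i_acc):
    """Return (next_pc, next_epoch) following the priority used in doProc.

    redir_valid -- a taken branch was resolved in execute this cycle
    redir_pc    -- the branch target (only meaningful when redir_valid)
    i_acc       -- the instruction memory accepted a fetch request this cycle
    """
    if redir_valid:
        # Redirect and flip the epoch so that wrong-path instructions are killed.
        return redir_pc, not epoch
    if i_acc:
        return pc + 4, epoch
    # Fetch could not be issued: hold the pc so no instruction is skipped.
    return pc, epoch
```

Holding the pc when `i_acc` is False is what allows the instruction memory to take any number of cycles without the processor skipping an instruction.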

Summary

• There is a lot of room for making errors ⇒ verification and testing are essential
• Dealing with memory systems and load latencies is a major aspect of computer design
• The 5-stage design presented here is different from H&P and is much more realistic

Next: branch prediction
