Revisiting the Processor Arvind Computer Science & Artificial Intelligence Lab

Presentation Transcript


  1. Revisiting the Processor. Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology

  2. An unpipelined multicycle architecture: only one instruction at a time. [Diagram: pc, PCGen, Exec, WB, RFile, dstReg, ICache, DCache]

  3. The Multi-stage Design • 3 Rules (one for each stage) • 1 instruction active at a time • 2-3 stages / instruction • Little inter-stage buffering • Multi-cycle memory interface

  4. Rules for multistage/multicycle design

         rule pcgen (stage == PC);
            imemReq.enq(Rd{a:pc});
            stage <= EX;
         endrule

         rule exec (stage == EX);
            let inst = imemResp.first();
            imemResp.deq();
            match {.nextpc, .rf_cmd, .mem_cmd} = doInst(pc,inst,rf);
            pc <= nextpc;
            case (rf_cmd) matches
               tagged RF {.dst,.val}: rf.upd(dst,val);
            endcase
            case (mem_cmd) matches
               tagged Ld {.dst,.addr}: begin
                  dmemReq.enq(Rd{a:addr});
                  dstReg <= dst;
               end
               tagged St {.addr,.val}: dmemReq.enq(Wr{a:addr,v:val});
            endcase
            stage <= (mem_cmd matches tagged Ld .*) ? WB : PC;
         endrule

         rule writeback (stage == WB);
            let dresp = dmemResp.first();
            dmemResp.deq();
            case (dresp) matches
               tagged RdResp {.val}: rf.upd(dstReg, val);
            endcase
            stage <= PC;
         endrule

         case (inst) …

  5. Discussion Points • Can these 3 rules be combined into one rule? • No, memory takes multiple cycles • What will happen if we forget to dequeue (imemResp)? • System will get stuck • Eventually, PCGen can no longer fire
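
The second discussion point can be checked with a small model. The sketch below is a Python simulation, not Bluespec, and it makes several assumptions that are not on the slides: the FIFOs have a capacity of 2, the instruction memory moves one request to the response queue per cycle, and only the pcgen and exec stages are modelled. The names `Fifo` and `run` are hypothetical.

```python
from collections import deque

class Fifo:
    """Bounded FIFO with the enq/first/deq operations used on the slides."""
    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
    def can_enq(self):
        return len(self.items) < self.capacity
    def can_deq(self):
        return len(self.items) > 0
    def enq(self, x):
        assert self.can_enq()
        self.items.append(x)
    def first(self):
        return self.items[0]
    def deq(self):
        self.items.popleft()

def run(cycles, forget_deq, capacity=2):
    """Return how many times pcgen fires in the given number of cycles."""
    imem_req, imem_resp = Fifo(capacity), Fifo(capacity)
    stage, pc, pcgen_firings = "PC", 0, 0
    for _ in range(cycles):
        # Assumed memory model: forward one request per cycle if there is room.
        if imem_req.can_deq() and imem_resp.can_enq():
            imem_resp.enq(imem_req.first())
            imem_req.deq()
        # Only one instruction is active, so at most one stage rule fires.
        if stage == "PC" and imem_req.can_enq():
            imem_req.enq(pc)
            pcgen_firings += 1
            stage = "EX"
        elif stage == "EX" and imem_resp.can_deq():
            imem_resp.first()          # read the fetched instruction
            if not forget_deq:
                imem_resp.deq()        # the step that must not be forgotten
            pc += 4
            stage = "PC"
    return pcgen_firings

print(run(20, forget_deq=False))  # pcgen keeps firing every other cycle
print(run(20, forget_deq=True))   # pcgen stops once both FIFOs are full
```

In this model the missing `deq` lets the response FIFO fill, then the request FIFO, after which the pcgen guard is never true again. Before it stalls, exec also keeps reading the same stale response.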

  6. Pipelining the Design. Step 1: Insert buffers. [Diagram: pc, PCGen, Exec, WB, RFile, dstReg, ICache, DCache]

  7. Pipelining the Design 2. It is problematic to write RFile from two stages: structural hazards & possible out-of-order writes (and reads) to RFile. Step 2: delay all RFile writes to WB stage (requires passing dst-val pairs to WB stage; subsumes dstReg). [Diagram: pc, PCGen, Exec, WB, RFile, ICache, DCache]

  8. Pipelining the Design 3. PC is read by PCGen and written by Exec. No parallelism without PC speculation. If speculation fails, we must reset the PC and discard false path instructions (may take many cycles). Epochs identify the speculative path to which an instruction belongs. [Diagram: pc, epoch, PCGen, Exec, WB, RFile, ICache, DCache]
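
The epoch idea can be illustrated with a small model. The sketch below is a Python simulation, not Bluespec, written under assumptions that are not on the slides: the prediction is always `pc + 4`, fetched entries become visible to execute after a fixed latency of 2 cycles, execute is evaluated before fetch within a cycle, and the epoch is an unbounded integer. The function name `run_with_epochs` and the `actual_next` map are hypothetical.

```python
from collections import deque

def run_with_epochs(actual_next, start_pc, cycles, latency=2, depth=4):
    """Model epoch-tagged fetch with discard of wrong-path instructions.

    actual_next maps a pc to its real successor; any pc that is not in
    the map falls through to pc + 4.
    """
    pc, epoch = start_pc, 0
    pc_q = deque()                 # entries: (pc, predicted pc, epoch tag, ready cycle)
    committed, discarded = [], []
    for cycle in range(cycles):
        # Execute: handle the oldest fetched entry once its latency has elapsed.
        if pc_q and pc_q[0][3] <= cycle:
            fetched_pc, pred_pc, tag, _ = pc_q.popleft()
            if tag != epoch:
                discarded.append(fetched_pc)      # wrong-path instruction
            else:
                committed.append(fetched_pc)
                next_pc = actual_next.get(fetched_pc, fetched_pc + 4)
                if next_pc != pred_pc:
                    # Misprediction: redirect the PC and start a new epoch.
                    pc, epoch = next_pc, epoch + 1
        # Fetch: speculate that the next instruction is at pc + 4.
        if len(pc_q) < depth:
            pc_q.append((pc, pc + 4, epoch, cycle + latency))
            pc += 4
    return committed, discarded

# A taken branch at pc 4 jumps to pc 16.
committed, discarded = run_with_epochs({4: 16}, start_pc=0, cycles=8)
print(committed)   # [0, 4, 16, 20, 24]
print(discarded)   # [8]
```

The instruction at pc 8 was fetched under epoch 0 before the branch resolved, so its tag no longer matches and it is dropped without touching any state.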

  9. Pipelining the Design 4. Final concern: Data hazards. [Diagram: pc, epoch, PCGen, Exec, WB, RFile, ICache, DCache]

  10. Isolating RFile Port Usage

         rule pcgen (stage == PC);
            imemReq.enq(Rd{a:pc});
            stage <= EX;
         endrule

         rule exec (stage == EX);
            let inst = imemResp.first();
            imemResp.deq();
            match {.nextpc, .rf_cmd, .mem_cmd} = doInst(pc,inst,rf);
            pc <= nextpc;
            case (rf_cmd) matches
               tagged RF {.dst,.val}: rf.upd(dst,val);
            endcase
            case (mem_cmd) matches
               tagged Ld {.dst,.addr}: begin
                  dmemReq.enq(Rd{a:addr});
                  dstReg <= dst;
               end
               tagged St {.addr,.val}: dmemReq.enq(Wr{a:addr,v:val});
            endcase
            stage <= (mem_cmd matches tagged Ld .*) ? WB : PC;
         endrule

         rule writeback (stage == WB);
            let dresp = dmemResp.first();
            dmemResp.deq();
            case (dresp) matches
               tagged RdResp {.val}: rf.upd(dstReg, val);
            endcase
            stage <= PC;
         endrule

         case (inst) …
            …
            Reg2RegOp: …wbData <= RF{dst,val};…
            LoadOp: …wbData <= Ld{dst};…

      Suppose we use a wbData register to pass information to the WB stage about updating the RFile:

         rule writeback (stage == WB);
            case (wbData) matches
               tagged RF {.dst,.val}: rf.upd(dst,val);
               tagged Ld {.dst}: begin
                  let dresp = dmemResp.first();
                  dmemResp.deq();
                  rf.upd(dst,memVal(dresp));
               end
            endcase
            stage <= PC;
         endrule

  11. Rules for the Pipelined machine

         rule pcgen;
            imemReq.enq(Rd{a:pc});
            pc <= predPC;
            pcQ.enq(tuple2(pc,epoch));
         endrule

         rule discard (epoch != eEpoch);
            pcQ.deq();
            imemResp.deq();
         endrule

         rule exec ((epoch == eEpoch) && !(stall(inst,wbQ)));
            // case based on the fetched instruction
         endrule

         rule writeback (True);
            wbQ.deq();
            case (wbQ.first()) matches
               tagged RF {.dst,.val}: rf.upd(dst,val);
               tagged Ld {.dst}: begin
                  let dresp = dmemResp.first();
                  dmemResp.deq();
                  rf.upd(dst, memVal(dresp));
               end
            endcase
         endrule

  12. The Execute Rule

         let inst = imemResp.first();
         match {.predPC,.eEpoch} = pcQ.first();

         rule exec (epoch == eEpoch && !stall(inst, wbQ));
            pcQ.deq();
            imemResp.deq();
            match {.nextPC, .rf_cmd, .mem_cmd} = doInst(pc,inst,rf);
            if (predPC != nextPC) begin
               pc <= nextPC;
               epoch <= epoch+1;
            end
            case (tuple2(rf_cmd, mem_cmd)) matches
               {tagged RF {.dst,.val}, .*}: wbQ.enq(RF{dst,val});
               {.*, tagged Ld {.dst,.addr}}: begin
                  wbQ.enq(Ld{dst,addr});
                  dmemReq.enq(Rd{a:addr});
               end
               {.*, tagged St {.addr,.val}}: begin
                  wbQ.enq(St);
                  dmemReq.enq(Wr{a:addr,v:val});
               end
            endcase
         endrule
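
The slides use `stall(inst, wbQ)` as a guard but do not define it. One plausible reading, sketched below as a Python model rather than Bluespec, is a read-after-write check: stall when any source register of the instruction matches a destination register still waiting in the writeback queue. The `WbEntry` record, the representation of an instruction as a tuple of source register numbers, and the use of `None` for entries without a destination (stores) are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical writeback-queue entry: kind is "RF", "Ld" or "St";
# dst is a register number, or None when nothing is written (stores).
WbEntry = namedtuple("WbEntry", ["kind", "dst"])

def stall(src_regs, wb_q):
    """Return True if any source register has a write pending in wb_q."""
    pending = {entry.dst for entry in wb_q if entry.dst is not None}
    return any(reg in pending for reg in src_regs)

wb_q = [WbEntry("Ld", 2), WbEntry("St", None)]
print(stall((1, 2), wb_q))  # True: register 2 is still waiting on a load
print(stall((1, 3), wb_q))  # False: no pending write to registers 1 or 3
```

A check like this is conservative: it holds the instruction in execute until the older write has left the queue, which avoids reading a stale register value at the cost of pipeline bubbles.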

  13. Design Flow • Are these rules correct, i.e., do they produce the correct results regardless of the order in which they are executed? • test – fix – test – fix … • Run the design and look at the traces to understand concurrency • traces tell you what is happening at each cycle but not why something is not happening • Does your design permit “concurrent firings”, i.e., multiple instructions in the pipeline? • Compiler output can tell you • Can multiple guards be true simultaneously? • Structural conflicts? • Permitted rule orderings within a cycle • You may want to split a rule into multiple rules

  14. Top down concurrency analysis • Determine the concurrent rule firings and rule ordering you want • Do hand analysis to determine the required concurrent behavior of methods of submodules • If this behavior is prohibited by a submodule, create a submodule with the desired behavior • this may require RWires and ConfigRegs

  15. Branch Prediction • Pipeline has simple speculation

         rule pcGen (True);
            pc <= pc + 4;
            otherActions;
         endrule

      Simplest prediction: Always not-taken

  16. Branch Predictors. [Diagram: pc, pred, epoch, PCGen, Exec, WB, RFile, ICache, DCache]

  17. Branch Prediction

         interface BranchPredictor;
            method Addr getNextPC(Addr pc);
            method Action update (Addr pc, Addr correct_next_pc);
         endinterface

      Make prediction:

         rule pcGen (True);
            pc <= pred.getNextPC(pc);
            otherActions;
         endrule

      Update predictions:

         rule execute …
            if (nextPC != correctPC) pred.update(curPc, nextPC);
            case (instr) matches …
               BzTaken: if (mispredicted) …
         endrule
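
To make the two-method interface concrete, the sketch below models one possible predictor in Python (not Bluespec): a table that remembers the last observed target for each pc and falls back to `pc + 4` otherwise. The slides do not say how the predictor is implemented, so the table-based design, the class name `TablePredictor`, and the unbounded table size are assumptions; only the two method roles mirror `getNextPC` and `update`.

```python
class TablePredictor:
    """Hypothetical predictor mirroring the getNextPC/update interface."""

    def __init__(self):
        self.targets = {}   # pc -> last correct next pc that was not pc + 4

    def get_next_pc(self, pc):
        # Default to not-taken (pc + 4) when nothing has been learned.
        return self.targets.get(pc, pc + 4)

    def update(self, pc, correct_next_pc):
        if correct_next_pc == pc + 4:
            # Fall-through is the default, so no table entry is needed.
            self.targets.pop(pc, None)
        else:
            self.targets[pc] = correct_next_pc

pred = TablePredictor()
print(pred.get_next_pc(8))   # 12: nothing learned yet, predict not-taken
pred.update(8, 32)           # execute reports that pc 8 actually went to 32
print(pred.get_next_pc(8))   # 32: the learned target is now predicted
```

With an empty table this behaves like the always not-taken scheme of slide 15. A hardware version would bound the table and index it with low-order pc bits, which this model leaves out.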
