Computer Architecture: A Constructive Approach Branch Direction Prediction – Pipeline Integration

Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
http://csg.csail.mit.edu/6.S078

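NA pred with decode feedback
[Figure: pipeline stages Fetch (F), Decode (D), RegRead (R), Execute (X), Memory (M), and Write-back (W), connected by the FIFOs fr, dr, rr, xr, and mr. Feedback paths xf (from Execute) and df (from Decode) return to Fetch, which holds NextAddressPrediction and DirectionPrediction.]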

Direction prediction recipe
• Execute
    • Send redirects on mispredicts (unchanged)
    • Send direction prediction training
• Decode
    • Check if next address matches direction pred
    • Send redirect if different (update naPred)
• Fetch
    • Generate prediction
    • Learn from feedback
    • Accept redirects from later stages

Epoch management recipe
• Execute
    • On exec epoch mismatch: poison instruction
    • Otherwise, on mispredict: change exec epoch and redirect
• Decode
    • On new exec epoch: update local exec/decode epochs
    • Otherwise, on decode epoch mismatch: drop instruction
    • If not dropped, on next-address mispredict: change decode epoch and redirect
• Fetch
    • On exec redirect: update local exec epoch
    • On decode redirect: if for current exec epoch, then update local decode epoch
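
The Execute part of the recipe can be modeled in a few lines. This is a behavioral sketch in Python rather than Bluespec; the function name, argument names, and the use of unbounded integers for epochs are assumptions made for illustration and do not come from the lab code.

```python
def execute_epoch_step(inst_epoch, exec_epoch, predicted_next, actual_next):
    """Model of the Execute epoch rules.

    Returns (poisoned, new_exec_epoch, redirect_target).
    redirect_target is None when no redirect is sent.
    """
    if inst_epoch != exec_epoch:
        # Instruction was fetched on a wrong path: poison it, change nothing.
        return True, exec_epoch, None
    if predicted_next != actual_next:
        # Mispredict on the good path: start a new epoch and redirect fetch.
        return False, exec_epoch + 1, actual_next
    # Correct prediction: epoch unchanged, no redirect.
    return False, exec_epoch, None
```

For example, `execute_epoch_step(0, 0, 0x104, 0x200)` models a taken branch that was predicted as fall-through: the instruction is kept, the epoch advances, and fetch is redirected to `0x200`.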

Add direction feedback
• Feedback needs information for training the direction predictor
• decFeedback carries (execute epoch, decode epoch, feedback)
• execFeedback carries (execute epoch, feedback)

    typedef struct {
      Bool    correct;
      NaInfo  naPredInfo;
      Addr    nextAddr;
      DirInfo dirPredInfo;
      Bool    taken;
    } Feedback deriving (Bits, Eq);

    // (execute epoch, decode epoch, feedback)
    FIFOF#(Tuple3#(Epoch, Epoch, Feedback)) decFeedback <- mkFIFOF;
    // (execute epoch, feedback)
    FIFOF#(Tuple2#(Epoch, Feedback)) execFeedback <- mkFIFOF;

Execute (branch analysis)
• Note: nextAddrPred may have been reset in decode
• Always send feedback

    // after executing instruction...
    let nextEeEpoch = eeEpoch;
    let cond = execData.execInst.cond;
    let nextPc = cond ? execData.execInst.addr : execData.pc + 4;
    // nextAddrPred may have been reset in decode
    let correctPred = (nextPc == execData.nextAddrPred);
    if (!correctPred) nextEeEpoch = nextEeEpoch + 1;
    eeEpoch <= nextEeEpoch;
    // always send feedback, whether or not the prediction was correct
    execFeedback.enq(tuple2(nextEeEpoch,
                            Feedback{correct: correctPred,
                                     taken: cond,
                                     dirPredInfo: execData.dirPredInfo,
                                     naPredInfo: execData.naPredInfo,
                                     nextAddr: nextPc}));
    // enqueue instruction to next stage

Decode with mispredict detect
• Determine if the epoch of the incoming instruction is on the good path: either a new exec epoch or the same dec epoch

    rule doDecode;
      let decData = newDecData(fr.first);
      let correctPath = (decData.execEpoch != deEpoch)   // new exec epoch
                     || (decData.decEpoch == ddEpoch);   // same dec epoch
      let instResp = decData.fInst.instResp;
      let pcPlus4 = decData.pc + 4;
      if (correctPath) begin
        decData.decInst = decode(instResp, pcPlus4);
        let target = knownTargetAddr(decData.decInst);
        let brClass = getBrClass(decData.decInst);
        let predTarget = decData.nextAddrPred;
        let predDir = decData.dirPred;

Decode with mispredict detect
• Calculate the target as best as decode can
• Wrong next addr? Start a new dec epoch, tell exec the addr of the next instruction, and send feedback
• Enqueue to next stage on correct path

        // calculate target as best as decode can
        let decodedTarget = case (brClass)
                              NonBranch:   pcPlus4;
                              UncondKnown: target;
                              CondBranch:  (predDir ? target : pcPlus4);
                              default:     decData.nextAddrPred;
                            endcase;
        if (decodedTarget != predTarget) begin             // wrong next addr?
          decData.decEpoch = decData.decEpoch + 1;         // new dec epoch
          decData.nextAddrPred = decodedTarget;            // tell exec addr of next instruction
          decFeedback.enq(                                 // send feedback
            tuple3(decData.execEpoch, decData.decEpoch,
                   Feedback{correct: False,
                            naPredInfo: decData.naPredInfo,
                            nextAddr: decodedTarget,
                            dirPredInfo: decData.dirPredInfo,
                            taken: decData.takenPred}));
        end
        dr.enq(decData);                                   // enqueue to next stage
      end // of correct path

Decode with mispredict detect
• Preserve the current epoch if the instruction is on the incorrect path
• decData.*Epoch have been set properly, so we always save them

      else begin // incorrect path
        decData.decEpoch = ddEpoch;
        decData.execEpoch = deEpoch;
      end
      ddEpoch <= decData.decEpoch;
      deEpoch <= decData.execEpoch;
      fr.deq;
    endrule
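
The target selection in doDecode can be checked in isolation. The following is a behavioral sketch in Python rather than Bluespec; the string encoding of the branch classes and the function signature are assumptions for illustration, and "Other" stands in for any class that decode cannot resolve.

```python
def decoded_target(br_class, pc, target, pred_dir, next_addr_pred):
    """Best next address decode can compute, mirroring the case expression."""
    pc_plus4 = pc + 4
    if br_class == "NonBranch":
        return pc_plus4
    if br_class == "UncondKnown":
        return target
    if br_class == "CondBranch":
        # Decode trusts the direction predictor for conditional branches.
        return target if pred_dir else pc_plus4
    # Unknown target (e.g. register-indirect): keep the fetch-time prediction.
    return next_addr_pred


def decode_redirects(br_class, pc, target, pred_dir, next_addr_pred):
    """True when decode must start a new dec epoch and redirect fetch."""
    return decoded_target(br_class, pc, target, pred_dir, next_addr_pred) != next_addr_pred
```

A non-branch whose next-address prediction pointed somewhere other than pc+4 is the simplest case that triggers a decode redirect.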

Integration into Fetch

    rule doFetch();
      function Action enqInst();
        action
          let d <- mem.side(MemReq{op: Ld, addr: fetchPc, data: ?});
          match {.nAddrPred, .naPredInfo} <- naPred.predict(fetchPc);
          match {.dirPred, .dirPredInfo} <- dirPred.predict(fetchPc);
          FBundle fInst = FBundle{instResp: d};
          FData fData = FData{pc: fetchPc, fInst: fInst,
                              inum: iNum, execEpoch: feEpoch,
                              naPredInfo: naPredInfo,
                              nextAddrPred: nAddrPred,
                              dirPredInfo: dirPredInfo,
                              dirPred: dirPred};
          iNum <= iNum + 1;
          fetchPc <= nAddrPred;
          fr.enq(fData);
        endaction
      endfunction

Handling redirect from execute
• Train and repair on redirect
• Just train on correct prediction

      if (execFeedback.notEmpty) begin
        match {.execEpoch, .fb} = execFeedback.first;
        execFeedback.deq;
        if (!fb.correct) begin
          // mispredict: repair and train, then redirect
          dirPred.repair(fb.dirPredInfo, fb.taken);
          dirPred.train(fb.dirPredInfo, fb.taken);
          naPred.repair(fb.naPredInfo, fb.nextAddr);
          naPred.train(fb.naPredInfo, fb.nextAddr);
          feEpoch <= execEpoch;
          fetchPc <= fb.nextAddr;
        end else begin
          // correct prediction: just train and keep fetching
          dirPred.train(fb.dirPredInfo, fb.taken);
          naPred.train(fb.naPredInfo, fb.nextAddr);
          enqInst;
        end
      end

Handling redirect from decode
• Just repair; never train on feedback from decode

      else if (decFeedback.notEmpty) begin
        decFeedback.deq;
        match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
        if (execEpoch == feEpoch) begin
          if (!fb.correct) begin // epoch unchanged
            fdEpoch <= decEpoch;
            dirPred.repair(fb.dirPredInfo, fb.taken);
            naPred.repair(fb.naPredInfo, fb.nextAddr);
            fetchPc <= fb.nextAddr;
          end
          else // dec feedback on correct prediction
            enqInst;
        end
        else // dec feedback, but fetch is in new exec epoch
          enqInst;
      end
      else // no feedback
        enqInst;
    endrule
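
The priority among execute feedback, decode feedback, and normal fetching can be summarized as a pure function. This is a behavioral sketch in Python rather than Bluespec; the tuple layouts and the name fetch_step are assumptions for illustration, and predictor training and repair are left out so that only the epoch and PC decisions remain.

```python
def fetch_step(exec_fb, dec_fb, fe_epoch, fd_epoch):
    """Decide what Fetch does this cycle.

    exec_fb: None or (exec_epoch, correct, next_addr)
    dec_fb:  None or (exec_epoch, dec_epoch, correct, next_addr)
    Returns (action, fe_epoch, fd_epoch, new_pc), where action is
    "redirect" or "fetch" and new_pc is None when fetching normally.
    """
    if exec_fb is not None:
        # Execute feedback has priority over decode feedback.
        epoch, correct, next_addr = exec_fb
        if not correct:
            return ("redirect", epoch, fd_epoch, next_addr)
        return ("fetch", fe_epoch, fd_epoch, None)
    if dec_fb is not None:
        exec_epoch, dec_epoch, correct, next_addr = dec_fb
        # A decode redirect only counts if it belongs to the current exec epoch.
        if exec_epoch == fe_epoch and not correct:
            return ("redirect", fe_epoch, dec_epoch, next_addr)
        return ("fetch", fe_epoch, fd_epoch, None)
    return ("fetch", fe_epoch, fd_epoch, None)
```

In this model a decode redirect tagged with a stale exec epoch is ignored and fetch simply continues, which matches the "fetch is in new exec epoch" branch of the rule.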

Immediate update issues
• Note: in the lab code we communicate the branch type of each instruction so that training and repair can decide, based on instruction type, whether to perform updates.
• If the direction predictor does not update immediately on predictions, things are easy. But if the predictor updates on prediction, we will predict and update the predictor on non-branches.
• Possible solutions:
    • Move direction prediction to decode, so we know not to update on non-branches. But this makes timing more critical.
    • Simply use the direction predictor even on non-branch instructions.
• Note: for superscalar issue designs this is a less significant problem.

Predictor Primitive
[Figure: table P of the given Width and Depth, with Index input (I), Update input (U), and Prediction output.]
• Indexed table holding values
• Operations
    • Predict
    • Update
• Algebraic notation
    Prediction = P[Width, Depth](Index; Update)
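
The primitive can be read as a small table class. This is a behavioral sketch in Python, not the lab's Bluespec interface; the class name, the modulo indexing, and passing the update as a function of the old value are assumptions for illustration.

```python
class Predictor:
    """P[width, depth]: a table of `depth` entries, each `width` bits wide."""

    def __init__(self, width, depth):
        self.mask = (1 << width) - 1
        self.depth = depth
        self.table = [0] * depth

    def predict(self, index):
        # Predict: read the entry selected by the index.
        return self.table[index % self.depth]

    def update(self, index, fn):
        # Update: replace the entry with fn(old value), truncated to `width` bits.
        i = index % self.depth
        self.table[i] = fn(self.table[i]) & self.mask
```

Usage: `p = Predictor(2, 4); p.update(5, lambda v: v + 1)` writes entry 1, because index 5 wraps around a table of depth 4.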

One-bit Predictor
• Simple temporal prediction
[Figure: 1-bit table P indexed (I) by PC and updated (U) with Taken, producing Prediction.]
    A21064(PC; T) = P[1, 2K](PC; T)
• What happens on loop branches? At best, it mispredicts twice for every use of the loop.
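
The loop-branch claim can be checked with a tiny simulation. This is a behavioral sketch in Python; the function name, the zero-initialized table, and the example loop of four taken iterations followed by one fall-through are assumptions for illustration.

```python
def one_bit_mispredicts(outcomes, pc=0, depth=2048):
    """Count mispredictions of a one-bit predictor: predict the last outcome."""
    table = [0] * depth
    misses = 0
    for taken in outcomes:
        i = pc % depth
        if bool(table[i]) != taken:
            misses += 1
        table[i] = int(taken)  # remember only the most recent outcome
    return misses


# Loop-closing branch: taken four times, then falls through on loop exit.
loop = [True] * 4 + [False]
```

Running the loop three times gives `one_bit_mispredicts(loop * 3) == 6`: one miss on the exit and one on the re-entry of every use of the loop.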

Two-bit Predictor
[Figure: 2-bit table P indexed (I) by PC; a +/- adder driven by Taken feeds the update (U); output is Prediction.]
    Counter[W, D](I; T) = P[W, D](I; if T then P+1 else P-1)
    A21164(PC; T) = MSB(Counter[2, 2K](PC; T))
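
A short simulation shows what the second bit buys on a loop branch. This is a behavioral sketch in Python; it assumes the counter saturates at 0 and 3 (the slide's formula does not say how overflow is handled) and starts at 0, and the function and variable names are hypothetical.

```python
def two_bit_mispredicts(outcomes, pc=0, depth=2048):
    """Count mispredictions of a 2-bit counter predictor (prediction = MSB)."""
    table = [0] * depth
    misses = 0
    for taken in outcomes:
        i = pc % depth
        pred = table[i] >= 2  # MSB of the 2-bit counter
        if pred != taken:
            misses += 1
        # Assumed saturating update: P+1 on taken, P-1 on not taken.
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
    return misses


# Loop-closing branch: taken four times, then falls through on loop exit.
loop = [True] * 4 + [False]
```

After warm-up the counter sits at 2 or 3, so only the loop exit mispredicts: each additional use of the loop adds one miss instead of the two seen with a single bit.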

History Register
[Figure: table P indexed (I) by PC; the update (U) is the old value concatenated with Taken; output is History.]
    History(PC; T) = P(PC; P || T)
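
The update P || T is a shift register: the old history moves left by one bit and the newest outcome enters at the bottom. The following Python sketch shows one step; the function name and the explicit bits parameter (the register width) are assumptions for illustration.

```python
def history_update(hist, taken, bits):
    """Shift `taken` into a `bits`-wide history register (P || T, truncated)."""
    return ((hist << 1) | int(taken)) & ((1 << bits) - 1)


# Shift in Taken, Taken, Not-taken, Taken with a 3-bit register.
h = 0
for t in [True, True, False, True]:
    h = history_update(h, t, 3)
```

After the four outcomes the register holds the last three, `0b101`, because the oldest outcome has been shifted out.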

Global History
[Figure: a single Global History register (history table at index 0, updated by concatenating Taken) indexes a table of +/- counters that produces Prediction.]
    GHist(; T) = MSB(Counter(History(0; T); T))
    Ind-Ghist(PC; T) = MSB(Counter(PC || Hist(GHist(; T); T)))
• Can we take advantage of a pattern at a particular PC?

Local History
[Figure: a Local History table indexed by PC (updated by concatenating Taken) indexes a table of +/- counters that produces Prediction.]
    LHist(PC; T) = MSB(Counter(History(PC; T); T))
• Can we take advantage of the global pattern at a particular PC?
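
A local-history predictor can learn a repeating pattern at one PC that a plain counter cannot. This is a behavioral sketch in Python; the table sizes, the 3-bit history, the saturating 2-bit counters, and the Taken, Taken, Not-taken pattern are all assumptions for illustration.

```python
def local_history_run(outcomes, pc=0, hist_bits=3, entries=16):
    """Simulate LHist at one PC; return a list of per-branch correctness flags."""
    hist_table = [0] * entries           # per-PC history registers
    counters = [0] * (1 << hist_bits)    # 2-bit counters indexed by history
    mask = (1 << hist_bits) - 1
    correct = []
    for taken in outcomes:
        h = hist_table[pc % entries]
        pred = counters[h] >= 2          # MSB of the selected counter
        correct.append(pred == taken)
        counters[h] = min(3, counters[h] + 1) if taken else max(0, counters[h] - 1)
        hist_table[pc % entries] = ((h << 1) | int(taken)) & mask
    return correct


# A branch that repeats Taken, Taken, Not-taken.
pattern = [True, True, False] * 10
```

Each of the three steady-state histories is always followed by the same outcome, so once the counters have warmed up every prediction is correct.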

Two-level Predictor
[Figure: Global History (history table at index 0, updated by concatenating Taken) is concatenated with PC to index a table of +/- counters that produces Prediction.]
    2Level(PC; T) = MSB(Counter(History(0; T) || PC; T))

Two-Level Branch Predictor
• Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)
• 2-bit global branch history shift register
• Shift in Taken/¬Taken results of each branch
[Figure: k bits of the Fetch PC index the BHT; the 2-bit global history shift register selects one of the four sets; output is Taken/¬Taken?]

Gshare Predictor
[Figure: Global History (history table at index 0, updated by concatenating Taken) is xor-ed with PC to index a table of +/- counters that produces Prediction.]
    Gshare(PC; T) = MSB(Counter(History(0; T) xor PC; T))
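
Gshare differs from the two-level predictor only in how the index is formed: the global history is xor-ed with the PC instead of concatenated. This is a behavioral sketch in Python; the 4-bit history, the use of the low PC bits directly, the saturating 2-bit counters, and the alternating test pattern are assumptions for illustration.

```python
def gshare_run(outcomes, pc=0x44, hist_bits=4):
    """Simulate gshare for one branch; return per-branch correctness flags."""
    mask = (1 << hist_bits) - 1
    counters = [0] * (1 << hist_bits)    # 2-bit counters
    ghist = 0                            # global history register
    correct = []
    for taken in outcomes:
        idx = (pc ^ ghist) & mask        # xor history with PC to form the index
        pred = counters[idx] >= 2        # MSB of the selected counter
        correct.append(pred == taken)
        counters[idx] = min(3, counters[idx] + 1) if taken else max(0, counters[idx] - 1)
        ghist = ((ghist << 1) | int(taken)) & mask
    return correct


# A branch that alternates Taken / Not-taken.
alternating = [True, False] * 20
```

With only one branch in the trace the xor is just a fixed permutation of the history, so the alternating pattern is learned after a short warm-up; the hashing matters when several branches share the table.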

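Choosing Predictors
[Figure: LHist and GHist each produce a prediction; a Chooser selects which one becomes Prediction.]
    Chooser = MSB(P(PC; P + (A==T) - (B==T)))
  or
    Chooser = MSB(P(GHist(PC; T); P + (A==T) - (B==T)))
• A and B are the predictions of the two component predictors; T is the actual outcome.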
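
The chooser update moves the counter toward whichever component predictor was right and leaves it alone when both agree with the outcome. This is a behavioral sketch in Python; the saturation at the counter limits and the convention that a set MSB selects predictor A are assumptions, since the slide gives only the update formula.

```python
def chooser_update(counter, a_pred, b_pred, taken, width=2):
    """P + (A==T) - (B==T), saturated to a `width`-bit counter (assumed)."""
    top = (1 << width) - 1
    counter += int(a_pred == taken) - int(b_pred == taken)
    return max(0, min(top, counter))


def choose(counter, a_pred, b_pred, width=2):
    """Select A's prediction when the counter MSB is set, else B's (assumed)."""
    return a_pred if (counter >> (width - 1)) & 1 else b_pred
```

Starting from 0, two outcomes on which only A is correct raise the counter to 2, after which `choose` follows A.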

Tournament Branch Predictor (Alpha 21264)
[Figure: PC indexes the Local history table (1,024 x 10b), which indexes Local prediction (1,024 x 3b); Global History (12b) indexes Global Prediction (4,096 x 2b) and Choice Prediction (4,096 x 2b); the choice selects the final Prediction.]
• Choice predictor learns whether it is best to use local or global branch history in predicting the next branch
• Global history is speculatively updated but restored on mispredict
• Claimed 90-100% success on a range of applications
