
The Inner Most Loop Iteration counter: a new dimension in branch history

Explore the use of the Inner Most Loop Iteration counter (IMLI) to capture correlations between branches in the innermost loop and neighboring iterations. Enhance prediction accuracy with the IMLI-SIC component, an add-on to existing predictors such as TAGE-GSC or GEHL.





Presentation Transcript


  1. The Inner Most Loop Iteration counter: a new dimension in branch history. André Seznec, Joshua San Miguel, Jorge Albericio

  2. For 25 years, branch predictors have exploited: Local history predictors: while (..) { if ((X % 3) || (X % 5)) { .. } X++; } Global history predictors: if (X < -2) {..} if (X > 1) {..} if (X == 0) {..}

  3. In practice, on real hardware: • just global history predictors • plus a loop predictor (sometimes) • local history is not very efficient: on CBP4, only ~5% misprediction reduction • and it is a mess to implement

  4. The messy management of speculative local history. The Local History Table is updated at commit time into the prediction tables, and a speculative history is kept for the most recent occurrence of branch B. With several (many) instances of the same branch in flight: wrong history, wrong prediction. (Figure: window of in-flight branches with histories Bh1..Bh4.)

  5. State-of-the-art global history predictors • Neural predictors: piecewise linear, hashed perceptron, SNAP, GEHL • TAGE-GSC: TAGE + a neural predictor. TAGE-GSC = TAGE-SC-L minus local history minus loop predictor.

  6. Neural predictors: the PC and the global history index tables of signed weights; the weights are summed, and the prediction is the sign of the sum.
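The sum-of-weights scheme on this slide can be sketched as a hashed-perceptron-style predictor. The history length, table size, hash function, and training rule below are illustrative assumptions, not the exact designs of the predictors named above:

```c
#include <assert.h>
#include <stdint.h>

#define HLEN  8          /* illustrative global-history length */
#define TBITS 10         /* illustrative table index width */

/* one weight table per history bit, plus a bias table indexed by PC */
static int8_t weight[HLEN + 1][1 << TBITS];

/* hash of PC and history-bit position: purely illustrative */
static unsigned idx(uint32_t pc, int i) {
    return (pc ^ (pc >> i) ^ (unsigned)i) & ((1u << TBITS) - 1);
}

/* prediction = sign of the weighted sum (1 = taken, 0 = not taken) */
int predict(uint32_t pc, uint32_t ghist) {
    int sum = weight[HLEN][pc & ((1u << TBITS) - 1)];   /* bias weight */
    for (int i = 0; i < HLEN; i++) {
        int bit = (ghist >> i) & 1;
        sum += bit ? weight[i][idx(pc, i)] : -weight[i][idx(pc, i)];
    }
    return sum >= 0;
}

/* perceptron-style training: move each weight toward the actual outcome */
void train(uint32_t pc, uint32_t ghist, int taken) {
    int dir = taken ? 1 : -1;
    weight[HLEN][pc & ((1u << TBITS) - 1)] += (int8_t)dir;
    for (int i = 0; i < HLEN; i++) {
        int bit = (ghist >> i) & 1;
        weight[i][idx(pc, i)] += (int8_t)(bit ? dir : -dir);
    }
}
```

Real implementations add threshold-based training and saturating weights; this sketch only shows the indexing and the sign-of-sum decision.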

  7. TAGE-GSC: the PC and the global history index the (main) TAGE predictor, which delivers a prediction plus a confidence estimate; a statistical corrector, indexed with the PC and the global history, is just a neural predictor with the TAGE prediction as an input.

  8. How predictors work • Evers98: branch B is correlated with a few past branches • There are not so many paths from the correlators to B • Try to capture every path to B: a kind of brute-force approach

  9. How to identify correlator branches. The loop predictor does it smoothly for loops. Albericio et al. 2014: correlation in multidimensional loops.

  10. Wormhole branch prediction (Albericio et al., MICRO 2014) • Correlation in multidimensional loops: for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • j == const: strong correlation • j+i == const: same output • j == const: weak correlation. Correlation with neighboring iterations, but in the previous outer iteration.

  11. Wormhole predictor: a side predictor • Monitor hard-to-predict branches in a loop with a constant iteration number N (use the loop predictor) • Monitor the local history for this branch: a very long local history • Predict with a few bits of that local history, taken from the previous outer iteration. (Figure: iterations J-1, J, J+1 of outer iteration i aligned with outer iteration i-1, loop length N.)
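The mechanism on this slide can be sketched as follows: for a monitored branch in a loop whose trip count N is known, record one outcome bit per inner iteration and predict iteration j from the bit recorded at the same iteration of the previous outer pass. The structure names and sizes are illustrative assumptions, not the actual Wormhole design:

```c
#include <assert.h>
#include <string.h>

#define NMAX 64   /* illustrative maximum inner-loop trip count */

/* very long local history for one monitored hard-to-predict branch,
 * split into the previous and the current outer iteration */
typedef struct {
    int  n;            /* inner-loop trip count, from the loop predictor */
    int  j;            /* current inner iteration */
    char prev[NMAX];   /* outcomes from outer iteration i-1 */
    char cur[NMAX];    /* outcomes being recorded in outer iteration i */
} wormhole;

/* predict iteration j from the same iteration of the previous outer pass */
int wh_predict(const wormhole *w) {
    return w->prev[w->j];
}

/* record the actual outcome; roll histories over at the end of the loop */
void wh_update(wormhole *w, int taken) {
    w->cur[w->j] = (char)taken;
    if (++w->j == w->n) {
        memcpy(w->prev, w->cur, (size_t)w->n);
        w->j = 0;
    }
}
```

Once the pattern has been seen in one full outer iteration, a branch whose outcome depends only on the inner index j is predicted perfectly from then on.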

  12. Wormhole predictor + state-of-the-art global history predictor • Captures the correlation with a small number of entries • On a few branches • On a few benchmarks: CBP4 traces: 2 benchmarks out of 40; CBP3 traces: 2 benchmarks out of 40 • But quite efficient on those traces

  13. Wormhole predictor: not worth the implementation • Requires a loop predictor • Requires the branch to be executed on each iteration • Leaves the issue of speculative local history management unresolved. But let us keep the seminal observation.

  14. Let us analyze the problem: for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • The correlation to be captured is: • for branches in the innermost loop • with neighboring iterations, but in previous outer iteration(s) • It would be nice to determine the iteration number!

  15. The Inner Most Loop Iteration counter • Most loops end with a conditional backward branch: …B0…B1…B3…B4…B5…B6 • On each backward branch: if taken, IMLIcount++; else IMLIcount = 0 • This perfectly counts the iteration number of the innermost loop.
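The update rule on this slide can be written directly in code. The backward test on the branch target and the choice of a 32-bit counter are straightforward assumptions consistent with the slide:

```c
#include <assert.h>
#include <stdint.h>

/* the IMLI counter: a taken backward branch closes an innermost-loop
 * iteration; a not-taken backward branch means the loop exited */
static uint32_t IMLIcount = 0;

void imli_update(uint64_t pc, uint64_t target, int taken) {
    if (target < pc) {              /* conditional backward branch */
        if (taken) IMLIcount++;     /* one more innermost-loop iteration */
        else       IMLIcount = 0;   /* loop exit: reset */
    }
    /* forward branches leave the counter unchanged */
}
```

Because the counter is updated by a simple deterministic rule, its speculative copy is trivially repaired on a pipeline flush, unlike speculative local histories.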

  16. Same Iteration Correlation: the IMLI-SIC component. for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } correlation with Out[..][j] • IMLI-SIC component: a predictor table indexed with IMLIcount and the PC • Simply added to the neural part of the predictor.
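A minimal sketch of such a component: one table of signed counters indexed by a hash of (PC, IMLIcount), whose value is simply added into the host predictor's neural sum. The table size, hash, and counter width are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define SIC_BITS 10
static int8_t sic_table[1 << SIC_BITS];   /* signed counters, illustrative size */

/* index with a hash of the branch PC and the IMLI counter (hash illustrative) */
static unsigned sic_index(uint32_t pc, uint32_t imli) {
    return (pc ^ (imli << 2)) & ((1u << SIC_BITS) - 1);
}

/* the counter value is simply added to the neural sum of the host predictor */
int sic_contribution(uint32_t pc, uint32_t imli) {
    return sic_table[sic_index(pc, imli)];
}

/* trained like any other neural weight: saturating, toward the outcome */
void sic_train(uint32_t pc, uint32_t imli, int taken) {
    int8_t *w = &sic_table[sic_index(pc, imli)];
    if (taken  && *w <  127) (*w)++;
    if (!taken && *w > -128) (*w)--;
}
```

A branch whose outcome depends only on the inner iteration number j maps to a stable (PC, IMLIcount) entry, so the counter quickly saturates in the right direction.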

  17. IMLI-SIC component • A simple add-on to TAGE-GSC or GEHL: • brings higher accuracy than WH • also captures most of the (small) benefit of the loop predictor: get rid of the loop predictor! • The speculative IMLI counter is easy to manage! • Works on different benchmarks than WH!

  18. What remains from WH? for (i = 0; i < Mmax; i++) for (j = 0; j < Nmax; j++) { if (B[i-j] > 0) {..} if (A[j] > 0) { A[j] = -A[j]; ..} } Branch 1: correlation with Out[i-1][j-1]. Branch 2: correlation Out[i][j] = 1 - Out[i-1][j]. Not the exact correlations, but their forms.

  19. IMLI-OH component. (Figure: an IMLI outcome-history table, indexed with (PC << 6) + IMLIcount, records per-iteration outcomes in the pipeline; its bits feed IMLI-OH prediction counters that are added to the neural sum alongside IMLI-SIC.) Provides Out[i-1][j] and Out[i-1][j-1].
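The outcome-history side of this component can be sketched as a table of one-bit entries: writing the outcome of iteration j in outer pass i, and reading the same entry while predicting in pass i+1, yields Out[i-1][j] (and the neighboring entry yields Out[i-1][j-1]). The table width is an illustrative assumption; the (PC << 6) + IMLIcount index follows the slide:

```c
#include <assert.h>
#include <stdint.h>

#define OH_BITS 12
static uint8_t oh_bit[1 << OH_BITS];   /* last outcome seen per (PC, IMLIcount) */

/* the slide suggests indexing with (PC << 6) + IMLIcount; width illustrative */
static unsigned oh_index(uint32_t pc, uint32_t imli) {
    return ((pc << 6) + imli) & ((1u << OH_BITS) - 1);
}

/* Out[i-1][j]: outcome of the same inner iteration, previous outer pass */
int oh_read(uint32_t pc, uint32_t imli) {
    return oh_bit[oh_index(pc, imli)];
}

/* written when the branch resolves, so the entry read while predicting
 * iteration j of outer pass i still holds the outcome from pass i-1 */
void oh_write(uint32_t pc, uint32_t imli, int taken) {
    oh_bit[oh_index(pc, imli)] = (uint8_t)(taken != 0);
}
```

The bits read out of this table are then used, like any other history bits, to index weight tables feeding the neural sum.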

  20. Yes, but IMLI-OH uses local history? • The targeted branches feature large iteration numbers • Use of effective OH history: entries with the same (PC, IMLIcount) are already committed • The other branches don't suffer: the beauty of neural predictors. With classic local history, several (many) in-flight instances of the same branch mean wrong history and wrong predictions; here, only in-flight instances of the branch with an equal IMLI counter would read wrong IMLI-OH entries.

  21. Accuracy improvement on TAGE-GSC: 80 benchmarks (CBP3 + CBP4), 6-7% average misprediction reduction.

  22. Shrinking the potential benefit of local history • Adding local history + loop predictor: • over TAGE-GSC: 5-6% misprediction reduction • over TAGE-GSC-IMLI: 3-4% misprediction reduction • Loop predictor alone: < 0.5% misprediction reduction

  23. Summary • Fundamental observation by Albericio et al.: correlation in multidimensional loops • IMLI-based components for TAGE-based and neural predictors • Simple implementation • Simple management of speculative states • Directly suitable for hardware implementation
