Explore the use of the Inner Most Loop Iteration counter (IMLI) to capture correlations between branches in the inner most loop and neighboring iterations. Enhance prediction accuracy with the IMLI-SIC component, an add-on to existing predictors like TAGE-GSC or GEHL.
The Inner Most Loop Iteration counter: a new dimension in branch history. André Seznec, Joshua San Miguel, Jorge Albericio
For 25 years, branch predictors have exploited: • Local history: while (..) { if ((X % 3) || (X % 5)) {..} X++; } • Global history: if (X < -2) {..} if (X > 1) {..} if (X == 0) {..}
In practice, on real hardware: • Just global history predictors • + a loop predictor (sometimes) • Local history is not very efficient: • CBP4: ~5 % misprediction reduction • A mess to implement
The messy management of speculative local history. [Figure: the Local History Table is updated at commit time and feeds the prediction tables; a window of in-flight branches (instances B h1 to B h4) holds the speculative history for the most recent occurrence of branch B.] Several (many) instances of the same branch in flight: wrong history, wrong prediction.
State-of-the-art global history predictors • Neural predictors: • Piecewise linear, Hashed perceptron, SNAP, GEHL • TAGE-GSC: • TAGE + a neural predictor • TAGE-GSC = TAGE-SC-L without the local history and the loop predictor
Neural predictors. [Figure: tables indexed with PC + global history feed an adder (+); prediction = sign of the sum.]
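The "prediction = sign of a sum" idea can be sketched as follows. This is a minimal illustration only, not the design of any predictor named on the slides: the number of tables, their size, the history lengths, the hash in `table_index`, the counter range and the always-train update are all assumptions made for the example.

```c
#include <stdint.h>

#define NUM_TABLES  4                 /* assumed number of counter tables */
#define LOG_ENTRIES 10                /* assumed: 1024 entries per table  */
#define ENTRIES     (1u << LOG_ENTRIES)

/* Signed saturating counters; names and sizes are illustrative. */
static int8_t tables[NUM_TABLES][ENTRIES];

/* Assumed global history length used by each table. */
static const unsigned hist_len[NUM_TABLES] = {0, 4, 8, 16};

/* Hypothetical hash of the PC with the hist_len[t] most recent history bits. */
static unsigned table_index(uint64_t pc, uint64_t ghist, int t)
{
    uint64_t h = hist_len[t] ? (ghist & ((1ull << hist_len[t]) - 1)) : 0;
    uint64_t x = pc ^ (h * 0x9E3779B97F4A7C15ull) ^ (uint64_t)t;
    x ^= x >> 29;
    return (unsigned)(x & (ENTRIES - 1));
}

/* Sum of the selected counters, centered as 2*c + 1 so a zero counter
   still votes weakly for "taken". */
int neural_sum(uint64_t pc, uint64_t ghist)
{
    int sum = 0;
    for (int t = 0; t < NUM_TABLES; t++)
        sum += 2 * tables[t][table_index(pc, ghist, t)] + 1;
    return sum;
}

/* The prediction is the sign of the sum: 1 = taken, 0 = not taken. */
int neural_predict(uint64_t pc, uint64_t ghist)
{
    return neural_sum(pc, ghist) >= 0;
}

/* Simplified training: always move every selected counter toward the
   outcome (real designs usually train only on a misprediction or when
   the sum is small). */
void neural_update(uint64_t pc, uint64_t ghist, int taken)
{
    for (int t = 0; t < NUM_TABLES; t++) {
        int8_t *c = &tables[t][table_index(pc, ghist, t)];
        if (taken && *c < 31) (*c)++;
        if (!taken && *c > -32) (*c)--;
    }
}
```

Because every table only contributes one term to the sum, an extra component (such as the IMLI tables introduced later in the talk) can be added as one more term without changing the rest of the predictor.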
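TAGE-GSC. [Figure: the (main) TAGE predictor, indexed with PC + global history, passes its prediction + confidence to the statistical corrector (Stat. Cor.), itself indexed with PC + global history.] The statistical corrector is just a neural predictor, with the TAGE prediction as an input.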
How predictors work • Evers98: branch B is correlated with a few past branches • Not so many paths from the correlators to B • Try to capture every path to B: a kind of brute-force approach
How to identify correlator branches • The loop predictor does it smoothly for loops • Albericio et al. 2014: correlation in multidimensional loops
Wormhole branch prediction (Albericio et al., MICRO 2014) • Correlation in multidimensional loops: for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • j = Const: strong correlation • j+i = Const: same output • j = Const: weak correlation • Correlation with neighboring iterations, but in the previous outer iteration
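To make the "same output" cases concrete, the following sketch records the outcomes of two of the branches from the loop nest and checks them against the previous outer iteration. The function name, the fixed trip counts `NMAX` and `MMAX`, and the choice of checking only the `A[j+i]` and `C[j]` branches are assumptions for illustration; the arrays are assumed not to be modified inside the loop.

```c
#define NMAX 8    /* outer trip count (illustrative) */
#define MMAX 16   /* inner trip count (illustrative) */

/* Returns 1 if, for every outer iteration i >= 1:
     - branch "C[j] > 0" at (i, j) repeats its outcome from (i-1, j), and
     - branch "A[j+i] > 0" at (i, j) repeats its outcome from (i-1, j+1).
   A must hold at least NMAX + MMAX - 1 elements, C at least MMAX. */
int check_outer_correlation(const int *A, const int *C)
{
    int outA[NMAX][MMAX], outC[NMAX][MMAX];

    /* Record the outcome of each branch for every (i, j). */
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < MMAX; j++) {
            outA[i][j] = A[j + i] > 0;
            outC[i][j] = C[j] > 0;
        }

    /* Compare with the neighboring iterations of the previous outer iteration. */
    for (int i = 1; i < NMAX; i++)
        for (int j = 0; j < MMAX; j++) {
            if (outC[i][j] != outC[i - 1][j])
                return 0;
            if (j + 1 < MMAX && outA[i][j] != outA[i - 1][j + 1])
                return 0;
        }
    return 1;
}
```

The check succeeds for any input data, since both branches test the same array element as their counterpart in the previous outer iteration. A global history predictor still struggles with them because the correlated outcome lies a whole inner loop's worth of branches back in the history.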
Wormhole predictor: a side predictor • Monitor hard-to-predict branches: • in a loop with a constant iteration number N (use the loop predictor) • Monitor the local history for this branch • Very long local history • Predict with a few bits in the local history (from the previous outer iteration) [Figure: iterations J-1, J and J+1 of outer iteration i-1 lie about N positions back in the local history from iteration J of outer iteration i.]
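The bit selection described above can be sketched like this, assuming the monitored branch executes exactly once per inner iteration and the loop has a constant trip count `n`. The buffer size and the function names (`wh_record`, `wh_past`, `wh_neighbors`) are hypothetical and are not taken from the wormhole predictor itself.

```c
#include <stdint.h>

#define WH_HIST 1024          /* assumed history capacity, must exceed n + 1 */

static uint8_t  wh_hist[WH_HIST];   /* circular buffer of past outcomes      */
static unsigned wh_head;            /* number of outcomes recorded so far    */

/* Append the resolved outcome of the monitored branch to its local history. */
void wh_record(int taken)
{
    wh_hist[wh_head % WH_HIST] = (uint8_t)(taken != 0);
    wh_head++;
}

/* Outcome recorded `dist` executions ago (dist = 1 is the most recent).
   Returns 0 when that much history is not available. */
int wh_past(unsigned dist)
{
    if (dist == 0 || dist > wh_head || dist > WH_HIST)
        return 0;
    return wh_hist[(wh_head - dist) % WH_HIST];
}

/* For the branch about to execute at inner iteration j of outer iteration i,
   fetch its neighbors in outer iteration i-1, given the trip count n. */
void wh_neighbors(unsigned n, int *left, int *same, int *right)
{
    *left  = wh_past(n + 1);   /* Out[i-1][j-1] */
    *same  = wh_past(n);       /* Out[i-1][j]   */
    *right = wh_past(n - 1);   /* Out[i-1][j+1] */
}
```

The distances only line up when the trip count is constant and known, and the history read must reflect all in-flight instances of the branch. Those are the two dependencies (loop predictor, speculative local history) that the later slides list as drawbacks.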
Wormhole predictor + state-of-the-art global history predictor • Captures correlation with a small number of entries • On a few branches • On a few benchmarks • CBP4 traces: 2 benchmarks out of 40 • CBP3 traces: 2 benchmarks out of 40 • But quite efficient on those traces
Wormhole predictor: not worth the implementation • Requires a loop predictor • Requires the branch to be executed on each iteration • Unresolved issue of speculative local history management • But let us keep the seminal observation
Let us analyze the problem: for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • The correlation to be captured is: • For branches in the inner most loop • With neighboring iterations, but in previous outer iteration(s) • It would be nice to determine the iteration number!
The Inner Most Loop Iteration counter • Most loops end with a conditional backward branch [Figure: branch sequence …B0…B1…B3…B4…B5…B6] • if (backward) { if (taken) IMLIcount++; else IMLIcount = 0; } • Perfectly counts the iteration numbers for the inner most loop
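The update rule on this slide translates almost directly into code. In the sketch below, the counter width, the saturation at `IMLI_MAX` and the function names are assumptions; only the "backward and taken increments, backward and not taken resets" rule comes from the slide.

```c
#include <stdint.h>

#define IMLI_BITS 10                       /* assumed counter width */
#define IMLI_MAX  ((1u << IMLI_BITS) - 1)

static unsigned imli_count;

/* Call once per conditional branch with its address, target and outcome. */
void imli_update(uint64_t pc, uint64_t target, int taken)
{
    if (target < pc) {                     /* backward conditional branch */
        if (taken) {
            if (imli_count < IMLI_MAX)     /* saturating is an assumption */
                imli_count++;
        } else {
            imli_count = 0;                /* loop exit: restart the count */
        }
    }
    /* Forward branches leave the counter unchanged. */
}

unsigned imli_get(void)
{
    return imli_count;
}
```

Since the state is a single small counter, a speculative copy can be checkpointed with each branch and restored on a misprediction, in the same way as a global history register.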
Same Iteration Correlation: the IMLI-SIC component. for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • Targets correlation with Out[..][j] • IMLI-SIC component: • A predictor table indexed with IMLIcount and PC • Just added to the neural part of the predictor [Figure: the IMLI SIC output feeds the adder (+).]
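A table "indexed with IMLIcount and PC" could look like the sketch below. The table size, the hash in `sic_index`, the counter range and the function names are assumptions for illustration; the slide only states that the table is indexed with both values and that its output is added to the neural sum.

```c
#include <stdint.h>

#define SIC_LOG     9                     /* assumed: 512 entries */
#define SIC_ENTRIES (1u << SIC_LOG)

static int8_t sic_table[SIC_ENTRIES];     /* signed saturating counters */

/* Hypothetical hash mixing the branch address with the iteration number. */
static unsigned sic_index(uint64_t pc, unsigned imli_count)
{
    uint64_t x = pc ^ (pc >> SIC_LOG) ^ ((uint64_t)imli_count << 3);
    return (unsigned)(x & (SIC_ENTRIES - 1));
}

/* Term to add to the sum of the neural part of the predictor. */
int sic_contribution(uint64_t pc, unsigned imli_count)
{
    return 2 * sic_table[sic_index(pc, imli_count)] + 1;
}

/* Move the selected counter toward the resolved outcome. */
void sic_update(uint64_t pc, unsigned imli_count, int taken)
{
    int8_t *c = &sic_table[sic_index(pc, imli_count)];
    if (taken && *c < 31) (*c)++;
    if (!taken && *c > -32) (*c)--;
}
```

With this indexing, a branch whose outcome depends mainly on the inner iteration number j (such as `C[j] > 0`) trains one counter per iteration, so the entry read at (i, j) was last trained at (i-1, j).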
IMLI-SIC component • A simple add-on to TAGE-GSC or GEHL: • Brings higher accuracy than WH • Also captures most of the (small) benefit of the loop predictor • Get rid of the loop predictor !! • Speculative IMLI counter easy to manage !! • Works on different benchmarks than WH !!
What remains from WH? for (i = 0; i < Mmax; i++) for (j = 0; j < Nmax; j++) { if (B[i-j] > 0) {..} if (A[j] > 0) { A[j] = -A[j]; ..} } • Branch 1: correlation with Out[i-1][j-1] • Branch 2: correlation Out[i][j] = 1 - Out[i-1][j] • Not the exact correlations, but their forms
IMLI-OH component. [Figure labels: IMLI history table indexed with (PC<<6) + IMLI; PIPE indexed with PC; IMLI OH prediction counter; IMLI SIC and IMLI OH outputs summed (+).] Provides Out[i-1][j] and Out[i-1][j-1]
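One way the two structures on this slide could deliver Out[i-1][j] and Out[i-1][j-1] is sketched below. The `(PC << 6) + IMLI` index and the per-branch PIPE entry come from the slide; the number of branch slots, the `pc_slot` hash, the function names and the exact update order are assumptions of this sketch.

```c
#include <stdint.h>

#define OH_PC_BITS   4     /* assumed: 16 tracked branch slots              */
#define OH_IMLI_BITS 6     /* matches the (PC << 6) + IMLI index on the slide */
#define OH_SIZE      (1u << (OH_PC_BITS + OH_IMLI_BITS))

static uint8_t imli_hist[OH_SIZE];           /* one outcome per (PC, IMLI)   */
static uint8_t pipe_vec[1u << OH_PC_BITS];   /* Out[i-1][j-1] per branch slot */

/* Hypothetical mapping of a branch address to a small slot number. */
static unsigned pc_slot(uint64_t pc)
{
    return (unsigned)(pc >> 2) & ((1u << OH_PC_BITS) - 1);
}

static unsigned oh_index(uint64_t pc, unsigned imli)
{
    return (pc_slot(pc) << OH_IMLI_BITS) + (imli & ((1u << OH_IMLI_BITS) - 1));
}

/* Read the two outer-history bits used to index the IMLI-OH counters. */
void oh_read(uint64_t pc, unsigned imli, int *prev_same, int *prev_left)
{
    *prev_same = imli_hist[oh_index(pc, imli)];   /* Out[i-1][j]   */
    *prev_left = pipe_vec[pc_slot(pc)];           /* Out[i-1][j-1] */
}

/* Record the resolved outcome of the branch at inner iteration `imli`. */
void oh_update(uint64_t pc, unsigned imli, int taken)
{
    unsigned idx = oh_index(pc, imli);
    /* Save Out[i-1][j] before it is overwritten: at iteration j+1 it is
       the Out[i-1][j-1] value that the table no longer holds. */
    pipe_vec[pc_slot(pc)] = imli_hist[idx];
    imli_hist[idx] = (uint8_t)(taken != 0);
}
```

The PIPE entry is needed because the table slot for iteration j-1 has already been overwritten with Out[i][j-1] by the time the branch executes at (i, j).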
Yes, but IMLI-OH uses local history? • The targeted branches feature large iteration numbers • Use of effective OH history: • Same (PC, IMLIcount) = already committed • The other branches don't suffer: • the beauty of neural predictors • Local history: several (many) instances of the same branch in flight: wrong history, wrong prediction • IMLI-OH: only instances of the branch with an equal IMLI counter read wrong IMLI OH entries (wrong history)
Accuracy improvement on TAGE-GSC: 80 benchmarks (CBP3 + CBP4), 6-7 % average misprediction reduction
Shrinking the potential benefit of local history • Add local history + loop predictor • Over TAGE-GSC: • 5-6 % misp. reduction • Over TAGE-GSC-IMLI: • 3-4 % misp. reduction • Loop predictor alone? • < 0.5 % misp. reduction
Summary • Fundamental observation by Albericio et al. : • Correlation in multidimensional loops • IMLI-based components for TAGE-based and neural predictors • Simple implementation • Simple management of speculative states • Directly suitable for hardware implementation