
The Inner Most Loop Iteration counter: a new dimension in branch history

Explore the use of the Inner Most Loop Iteration counter (IMLI) to capture correlations between branches in the innermost loop and neighboring iterations. Enhance prediction accuracy with the IMLI-SIC component, an add-on to existing predictors such as TAGE-GSC or GEHL.





Presentation Transcript


  1. The Inner Most Loop Iteration counter: a new dimension in branch history. André Seznec, Joshua San Miguel, Jorge Albericio

  2. For 25 years, branch predictors have exploited: Local history predictors: while (..) { if ((X % 3) || (X % 5)) { .. } X++; } Global history predictors: if (X < -2) {..} if (X > 1) {..} if (X == 0) {..}

  3. In practice, on real hardware: • just global history predictors • plus a loop predictor (sometimes) • local history is not very efficient: on CBP4, only ~5% misprediction reduction • and it is a mess to implement

  4. The messy management of speculative local history. The Local History Table is updated at commit time into the prediction tables, and a speculative history is kept for the most recent occurrence of branch B. With several (many) instances of the same branch in flight: wrong history, wrong prediction. (Figure: window of in-flight branches with histories Bh1..Bh4.)

  5. State-of-the-art global history predictors • Neural predictors: piecewise linear, hashed perceptron, SNAP, GEHL • TAGE-GSC: TAGE + a neural predictor. TAGE-GSC = TAGE-SC-L minus local history minus loop predictor.

  6. Neural predictors: the PC and the global history index tables of signed weights; the weights are summed, and the prediction is the sign of the sum.
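The sum-of-weights scheme on this slide can be sketched as a hashed-perceptron-style predictor. The history length, table size, hash function, and training rule below are illustrative assumptions, not the exact designs of the predictors named above:

```c
#include <assert.h>
#include <stdint.h>

#define HLEN  8          /* illustrative global-history length */
#define TBITS 10         /* illustrative table index width */

/* one weight table per history bit, plus a bias table indexed by PC */
static int8_t weight[HLEN + 1][1 << TBITS];

/* hash of PC and history-bit position: purely illustrative */
static unsigned idx(uint32_t pc, int i) {
    return (pc ^ (pc >> i) ^ (unsigned)i) & ((1u << TBITS) - 1);
}

/* prediction = sign of the weighted sum (1 = taken, 0 = not taken) */
int predict(uint32_t pc, uint32_t ghist) {
    int sum = weight[HLEN][pc & ((1u << TBITS) - 1)];   /* bias weight */
    for (int i = 0; i < HLEN; i++) {
        int bit = (ghist >> i) & 1;
        sum += bit ? weight[i][idx(pc, i)] : -weight[i][idx(pc, i)];
    }
    return sum >= 0;
}

/* perceptron-style training: move each weight toward the actual outcome */
void train(uint32_t pc, uint32_t ghist, int taken) {
    int dir = taken ? 1 : -1;
    weight[HLEN][pc & ((1u << TBITS) - 1)] += (int8_t)dir;
    for (int i = 0; i < HLEN; i++) {
        int bit = (ghist >> i) & 1;
        weight[i][idx(pc, i)] += (int8_t)(bit ? dir : -dir);
    }
}
```

Real implementations add threshold-based training and saturating weights; this sketch only shows the indexing and the sign-of-sum decision.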

  7. TAGE-GSC: the PC and the global history index the (main) TAGE predictor, which delivers a prediction plus a confidence estimate; a statistical corrector, indexed with the PC and the global history, is just a neural predictor with the TAGE prediction as an input.

  8. How predictors work • Evers98: branch B is correlated with a few past branches • There are not so many paths from the correlators to B • Try to capture every path to B: a kind of brute-force approach

  9. How to identify correlator branches. The loop predictor does it smoothly for loops. Albericio et al. 2014: correlation in multidimensional loops.

  10. Wormhole branch prediction (Albericio et al., MICRO 2014) • Correlation in multidimensional loops: for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • j == const: strong correlation • j+i == const: same output • j == const: weak correlation. Correlation with neighboring iterations, but in the previous outer iteration.

  11. Wormhole predictor: a side predictor • Monitor hard-to-predict branches in a loop with a constant iteration number N (use the loop predictor) • Monitor the local history for this branch: a very long local history • Predict with a few bits of that local history, taken from the previous outer iteration. (Figure: iterations J-1, J, J+1 of outer iteration i aligned with outer iteration i-1, loop length N.)
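The mechanism on this slide can be sketched as follows: for a monitored branch in a loop whose trip count N is known, record one outcome bit per inner iteration and predict iteration j from the bit recorded at the same iteration of the previous outer pass. The structure names and sizes are illustrative assumptions, not the actual Wormhole design:

```c
#include <assert.h>
#include <string.h>

#define NMAX 64   /* illustrative maximum inner-loop trip count */

/* very long local history for one monitored hard-to-predict branch,
 * split into the previous and the current outer iteration */
typedef struct {
    int  n;            /* inner-loop trip count, from the loop predictor */
    int  j;            /* current inner iteration */
    char prev[NMAX];   /* outcomes from outer iteration i-1 */
    char cur[NMAX];    /* outcomes being recorded in outer iteration i */
} wormhole;

/* predict iteration j from the same iteration of the previous outer pass */
int wh_predict(const wormhole *w) {
    return w->prev[w->j];
}

/* record the actual outcome; roll histories over at the end of the loop */
void wh_update(wormhole *w, int taken) {
    w->cur[w->j] = (char)taken;
    if (++w->j == w->n) {
        memcpy(w->prev, w->cur, (size_t)w->n);
        w->j = 0;
    }
}
```

Once the pattern has been seen in one full outer iteration, a branch whose outcome depends only on the inner index j is predicted perfectly from then on.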

  12. Wormhole predictor + state-of-the-art global history predictor • Captures the correlation with a small number of entries • On a few branches • On a few benchmarks: CBP4 traces: 2 benchmarks out of 40; CBP3 traces: 2 benchmarks out of 40 • But quite efficient on those traces

  13. Wormhole predictor: not worth the implementation • Requires a loop predictor • Requires the branch to be executed on each iteration • Leaves the issue of speculative local history management unresolved. But let us keep the seminal observation.

  14. Let us analyze the problem: for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } • The correlation to be captured is: • for branches in the innermost loop • with neighboring iterations, but in previous outer iteration(s) • It would be nice to determine the iteration number!

  15. The Inner Most Loop Iteration counter • Most loops end with a conditional backward branch: …B0…B1…B3…B4…B5…B6 • On each backward branch: if taken, IMLIcount++; else IMLIcount = 0 • This perfectly counts the iteration number of the innermost loop.
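The update rule on this slide can be written directly in code. The backward test on the branch target and the choice of a 32-bit counter are straightforward assumptions consistent with the slide:

```c
#include <assert.h>
#include <stdint.h>

/* the IMLI counter: a taken backward branch closes an innermost-loop
 * iteration; a not-taken backward branch means the loop exited */
static uint32_t IMLIcount = 0;

void imli_update(uint64_t pc, uint64_t target, int taken) {
    if (target < pc) {              /* conditional backward branch */
        if (taken) IMLIcount++;     /* one more innermost-loop iteration */
        else       IMLIcount = 0;   /* loop exit: reset */
    }
    /* forward branches leave the counter unchanged */
}
```

Because the counter is updated by a simple deterministic rule, its speculative copy is trivially repaired on a pipeline flush, unlike speculative local histories.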

  16. Same Iteration Correlation: the IMLI-SIC component. for (i = 0; i < Nmax; i++) for (j = 0; j < Mmax; j++) { if (A[j+i] > 0) {..} if ((B[i][j] - B[i-1][j]) > 0) {..} if (C[j] > 0) {..} } correlation with Out[..][j] • IMLI-SIC component: a predictor table indexed with IMLIcount and the PC • Simply added to the neural part of the predictor.
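A minimal sketch of such a component: one table of signed counters indexed by a hash of (PC, IMLIcount), whose value is simply added into the host predictor's neural sum. The table size, hash, and counter width are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define SIC_BITS 10
static int8_t sic_table[1 << SIC_BITS];   /* signed counters, illustrative size */

/* index with a hash of the branch PC and the IMLI counter (hash illustrative) */
static unsigned sic_index(uint32_t pc, uint32_t imli) {
    return (pc ^ (imli << 2)) & ((1u << SIC_BITS) - 1);
}

/* the counter value is simply added to the neural sum of the host predictor */
int sic_contribution(uint32_t pc, uint32_t imli) {
    return sic_table[sic_index(pc, imli)];
}

/* trained like any other neural weight: saturating, toward the outcome */
void sic_train(uint32_t pc, uint32_t imli, int taken) {
    int8_t *w = &sic_table[sic_index(pc, imli)];
    if (taken  && *w <  127) (*w)++;
    if (!taken && *w > -128) (*w)--;
}
```

A branch whose outcome depends only on the inner iteration number j maps to a stable (PC, IMLIcount) entry, so the counter quickly saturates in the right direction.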

  17. IMLI-SIC component • A simple add-on to TAGE-GSC or GEHL: • brings higher accuracy than WH • also captures most of the (small) benefit of the loop predictor: get rid of the loop predictor! • The speculative IMLI counter is easy to manage! • Works on different benchmarks than WH!

  18. What remains from WH? for (i = 0; i < Mmax; i++) for (j = 0; j < Nmax; j++) { if (B[i-j] > 0) {..} if (A[j] > 0) { A[j] = -A[j]; ..} } Branch 1: correlation with Out[i-1][j-1]. Branch 2: correlation Out[i][j] = 1 - Out[i-1][j]. Not the exact correlations, but their forms.

  19. IMLI-OH component. (Figure: an IMLI outcome-history table, indexed with (PC << 6) + IMLIcount, records per-iteration outcomes in the pipeline; its bits feed IMLI-OH prediction counters that are added to the neural sum alongside IMLI-SIC.) Provides Out[i-1][j] and Out[i-1][j-1].
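The outcome-history side of this component can be sketched as a table of one-bit entries: writing the outcome of iteration j in outer pass i, and reading the same entry while predicting in pass i+1, yields Out[i-1][j] (and the neighboring entry yields Out[i-1][j-1]). The table width is an illustrative assumption; the (PC << 6) + IMLIcount index follows the slide:

```c
#include <assert.h>
#include <stdint.h>

#define OH_BITS 12
static uint8_t oh_bit[1 << OH_BITS];   /* last outcome seen per (PC, IMLIcount) */

/* the slide suggests indexing with (PC << 6) + IMLIcount; width illustrative */
static unsigned oh_index(uint32_t pc, uint32_t imli) {
    return ((pc << 6) + imli) & ((1u << OH_BITS) - 1);
}

/* Out[i-1][j]: outcome of the same inner iteration, previous outer pass */
int oh_read(uint32_t pc, uint32_t imli) {
    return oh_bit[oh_index(pc, imli)];
}

/* written when the branch resolves, so the entry read while predicting
 * iteration j of outer pass i still holds the outcome from pass i-1 */
void oh_write(uint32_t pc, uint32_t imli, int taken) {
    oh_bit[oh_index(pc, imli)] = (uint8_t)(taken != 0);
}
```

The bits read out of this table are then used, like any other history bits, to index weight tables feeding the neural sum.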

  20. Yes, but IMLI-OH uses local history? • The targeted branches feature large iteration numbers • Use of effective OH history: entries with the same (PC, IMLIcount) are already committed • The other branches don't suffer: the beauty of neural predictors. With classic local history, several (many) in-flight instances of the same branch mean wrong history and wrong predictions; here, only in-flight instances of the branch with an equal IMLI counter would read wrong IMLI-OH entries.

  21. Accuracy improvement on TAGE-GSC: 80 benchmarks (CBP3 + CBP4), 6-7% average misprediction reduction.

  22. Shrinking the potential benefit of local history • Adding local history + loop predictor: • over TAGE-GSC: 5-6% misprediction reduction • over TAGE-GSC-IMLI: 3-4% misprediction reduction • Loop predictor alone: < 0.5% misprediction reduction

  23. Summary • Fundamental observation by Albericio et al.: correlation in multidimensional loops • IMLI-based components for TAGE-based and neural predictors • Simple implementation • Simple management of speculative states • Directly suitable for hardware implementation
