TAGE-SC-L Again, MTAGE-SC
André Seznec, INRIA/IRISA
Where do these predictors come from?
• GEHL: CBP 2004, ISCA 2005
• TAGE: JILP 2006, CBP 2006
• Statistical correlation: CBP 2011
• Combining more info: Micro 2011, CBP 2014, Micro 2015
• Optimizing everything: CBP 2016
• Unlimited: CBP 2014, CBP 2016
Around 2002
• Introduction of the perceptron predictor (Jimenez01)
• State of the art: the EV8 predictor
  • Lagging behind the perceptron on a few benchmarks
• With EV8-like predictors:
  • some applications would benefit from 100+ history bits
Both are able to handle « long » global histories: 30+ branches.
CBP 2004 GEOMETRIC HISTORY LENGTH PREDICTOR
A multiple-length global history predictor, with a limited number of tables
[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4); their outputs are summed (Σ) to produce the prediction.]
Underlying idea
• H and H’ are two history vectors equal on N bits, but differing on bit N+1
  • e.g. L(1) ≤ N < L(2)
• Branches (A,H) and (A,H’) are biased in opposite directions
Table T2 should allow the predictor to discriminate between (A,H) and (A,H’).
GEometric History Length predictor
The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
What is important: L(i) - L(i-1) increases drastically
Most of the storage is spent on short histories!!
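A small sketch of how such a series can be generated. The rounding rule L(i) = round(α^(i-1) × L(1)) and the name `geometric_lengths` are assumptions for illustration; with 8 tables, L(1) = 2 and a maximum of 128 bits it reproduces the series above. The increments show how fast L(i) - L(i-1) grows, while 4 of the 7 history-indexed tables use 16 bits or fewer.

```python
def geometric_lengths(n_tables: int, l1: int, l_max: int) -> list:
    """History lengths for T0..T(n_tables-1); T0 uses no history (L(0) = 0)."""
    # Assumed rule: L(i) = round(alpha**(i-1) * L(1)), with alpha chosen so
    # that the last table uses l_max history bits.
    alpha = (l_max / l1) ** (1.0 / (n_tables - 2))
    return [0] + [int(alpha ** (i - 1) * l1 + 0.5) for i in range(1, n_tables)]


lengths = geometric_lengths(8, 2, 128)
steps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
print(steps)    # [2, 2, 4, 8, 16, 32, 64]
```

With equally sized tables, half or more of the storage therefore sits in tables indexed with very short histories.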
GEHL (CBP 2004)
• Neural inspired
• Uses 200+ bits of global history
• Narrow counters
• Dynamic threshold update
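A minimal GEHL-style sketch, assuming details the slides do not give: 4-bit signed counters, 1K-entry tables, an XOR-folded PC/history hash, an initial threshold equal to the number of tables, and an O-GEHL-like threshold-fitting counter. `GEHLSketch` and its methods are hypothetical names. The prediction is the sign of the sum of one counter per table; counters are updated on a misprediction or when |sum| is at most the threshold.

```python
class GEHLSketch:
    """Minimal GEHL-style predictor: one narrow signed counter per table, summed."""

    def __init__(self, lengths, log_entries=10, ctr_bits=4):
        self.lengths = lengths                    # e.g. [0, 2, 4, 8, 16, 32, 64, 128]
        self.log_entries = log_entries
        self.cmax = (1 << (ctr_bits - 1)) - 1     # narrow signed counters
        self.cmin = -(1 << (ctr_bits - 1))
        self.tables = [[0] * (1 << log_entries) for _ in lengths]
        self.ghist = 0                            # newest outcome in bit 0
        self.theta = len(lengths)                 # update threshold (assumed initial value)
        self.tc = 0                               # threshold-fitting counter

    def _index(self, t, pc):
        h = self.ghist & ((1 << self.lengths[t]) - 1)   # L(t) most recent outcomes
        mask = (1 << self.log_entries) - 1
        folded = 0
        while h:                                  # XOR-fold the history to index width
            folded ^= h & mask
            h >>= self.log_entries
        return (pc ^ (pc >> self.log_entries) ^ folded) & mask

    def _sum(self, pc):
        return sum(tab[self._index(t, pc)] for t, tab in enumerate(self.tables))

    def predict(self, pc):
        return self._sum(pc) >= 0                 # taken iff the sum is non-negative

    def update(self, pc, taken):
        s = self._sum(pc)
        mispredicted = (s >= 0) != taken
        if mispredicted or abs(s) <= self.theta:
            for t, tab in enumerate(self.tables):
                i = self._index(t, pc)
                tab[i] = max(self.cmin, min(self.cmax, tab[i] + (1 if taken else -1)))
        # Dynamic threshold: raise theta when mispredictions dominate,
        # lower it when low-confidence correct updates dominate.
        if mispredicted:
            self.tc += 1
            if self.tc >= 8:
                self.theta, self.tc = self.theta + 1, 0
        elif abs(s) <= self.theta:
            self.tc -= 1
            if self.tc <= -8:
                self.theta, self.tc = max(0, self.theta - 1), 0
        self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << max(self.lengths)) - 1)


p = GEHLSketch([0, 2, 4, 8, 16, 32, 64, 128])
outcomes = [i % 2 == 0 for i in range(400)]       # alternating T, N, T, N, ...
misses = 0
for i, taken in enumerate(outcomes):
    if i >= 300 and p.predict(0x400) != taken:    # measure after warm-up
        misses += 1
    p.update(0x400, taken)
print(misses)  # 0
```

The PC-only table T0 cannot learn the alternating pattern, but the history-indexed tables outvote it once trained.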
TAgged GEometric history length predictor JILP 2006 TAGE
At CBP 2004, all predictors were neural, apart from the PPM-like predictor (Michaud 2004), but its update policy was poor.
TAGE (JILP 2006)
• Partial tag match
  • almost ..
• Geometric history lengths
• Very effective update policy
TAGE: tagged tables, prediction by the longest history matching entry
[Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, tag and u. A tag comparison (=?) per table detects a hit or a miss; the longest-history hit provides Pred, the next hit provides Altpred.]
Prediction computation
• General case:
  • The longest matching component provides the prediction
• Special case:
  • Many mispredictions on newly allocated entries (weak Ctr): on many applications, Altpred is more accurate than Pred
  • This property is dynamically monitored through 4-bit counters
A tagged table entry: Tag | U | Ctr
• Ctr: 3-bit prediction counter
• U: 1- or 2-bit counter
  • Was the entry recently useful?
• Tag: partial tag
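The prediction computation and entry layout described above can be sketched as follows. This is a simplified model, not the reference implementation: `TaggedEntry`, `tage_predict` and `update_use_alt_on_na` are hypothetical names, a single `use_alt_on_na` counter stands in for the 4-bit monitoring counters, and "newly allocated" is approximated as "provider counter is weak".

```python
from dataclasses import dataclass


@dataclass
class TaggedEntry:
    tag: int
    ctr: int = 0   # 3-bit signed counter in [-4, 3]; taken iff ctr >= 0
    u: int = 0     # useful counter (1 or 2 bits)


def tage_predict(base_taken, hits, use_alt_on_na):
    """hits: matching entries ordered from shortest to longest history."""
    if not hits:
        return base_taken                          # only the tagless base predictor
    provider = hits[-1]                            # longest matching component
    pred = provider.ctr >= 0
    altpred = hits[-2].ctr >= 0 if len(hits) > 1 else base_taken
    provider_weak = provider.ctr in (0, -1)        # likely a newly allocated entry
    if provider_weak and use_alt_on_na >= 0:       # Altpred has been more accurate
        return altpred
    return pred


def update_use_alt_on_na(counter, provider_weak, pred, altpred, taken):
    """4-bit signed saturating monitor: does Altpred beat a weak Pred?"""
    if provider_weak and pred != altpred:
        counter += 1 if altpred == taken else -1
        counter = max(-8, min(7, counter))
    return counter


hits = [TaggedEntry(tag=0x12, ctr=3), TaggedEntry(tag=0x2A, ctr=-1)]  # weak provider
print(tage_predict(False, hits, use_alt_on_na=-1))  # False: trust the provider
print(tage_predict(False, hits, use_alt_on_na=2))   # True: use Altpred
```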
Allocate entries on mispredictions
• Allocate entries in longer history length tables
  • On tables with U unset
  • Set Ctr to weak and U to 0
• Limited storage budget:
  • Allocate 2 entries (with 15 to 20 different history lengths)
Managing the (U)seful counter
• Increment when the entry avoids a misprediction:
  • (Pred = outcome) & (Altpred ≠ outcome): the entry becomes « useful »
• Global decrement when it becomes « difficult » to allocate:
  • Many possible heuristics (« difficult » ≈ 2/3 of the entries useful)
CBP 2016 heuristics: ≈ 0.5 % MPKI reduction
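Allocation and useful-counter management can be sketched as below. This is a simplified model under stated assumptions: the class and method names are hypothetical, U is a 2-bit counter, new entries get a weak counter in the direction of the outcome, and the "difficult to allocate" monitor (a `tick` counter comparing failed and successful allocations, followed by a global U decrement) is just one possible heuristic, not the CBP 2016 one.

```python
from dataclasses import dataclass

U_MAX = 3  # 2-bit useful counter (assumption)


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0   # 3-bit signed counter, taken iff ctr >= 0
    u: int = 0     # useful counter


class TageAllocSketch:
    def __init__(self, n_tables, n_entries, max_alloc=2):
        self.tables = [[Entry() for _ in range(n_entries)] for _ in range(n_tables)]
        self.max_alloc = max_alloc   # allocate up to 2 entries per misprediction
        self.tick = 0                # hypothetical "allocation is difficult" monitor
        self.tick_max = 64

    def update_useful(self, provider, pred, altpred, taken):
        # The entry avoided a misprediction: it becomes "useful".
        if pred == taken and altpred != taken:
            provider.u = min(U_MAX, provider.u + 1)

    def allocate(self, provider_table, indices, tags, taken):
        """On a misprediction, allocate in longer-history tables with U unset.
        provider_table: number of the providing table (-1 for the base predictor).
        indices/tags: per-table index and partial tag of the current branch."""
        allocated = failed = 0
        for t in range(provider_table + 1, len(self.tables)):
            e = self.tables[t][indices[t]]
            if e.u == 0:
                e.tag, e.ctr, e.u = tags[t], (0 if taken else -1), 0  # weak, not useful
                allocated += 1
                if allocated == self.max_alloc:
                    break
            else:
                failed += 1
        # When failures keep dominating, age every U counter (global decrement).
        self.tick = max(0, self.tick + failed - allocated)
        if self.tick >= self.tick_max:
            for table in self.tables:
                for entry in table:
                    entry.u = max(0, entry.u - 1)
            self.tick = 0
        return allocated


tage = TageAllocSketch(n_tables=4, n_entries=8)
idx, tags = [5, 5, 5, 5], [0x11, 0x22, 0x33, 0x44]
tage.tables[2][5].u = 1                              # a useful entry blocks table 2
n = tage.allocate(provider_table=0, indices=idx, tags=tags, taken=True)
print(n, [tage.tables[t][5].tag for t in range(4)])  # 2 [0, 34, 0, 68]

e = tage.tables[1][5]
tage.update_useful(e, pred=True, altpred=False, taken=True)
print(e.u)  # 1
```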
TAGE vs GEHL:
• At equal storage size: ≈ 10 % MPKI reduction
This may vary with individual benchmarks!
Optimizations for CBP 2016
• Sharing storage space
  • Small histories share a bank-interleaved table
    • Small tags (8 bits)
  • Long histories share a bank-interleaved table
    • Longer tags (12 bits)
• Partial associativity
  • 2 banks for medium history lengths
≈ 2 % MPKI reduction
Statistical Corrector (Global history) CBP2011 TAGE + (G)SC
From CBP 2011, « the Statistical Corrector targets »:
• Branches with poor correlation with history:
  • Sometimes better predicted by a single wide PC-indexed counter than by TAGE
• More generally, track cases such that:
  • « For this (PC, history, prediction), TAGE is statistically likely (> 50 %) to mispredict »
TAGE-GSC (CBP 2011; named a posteriori in Micro 2015): ≈ 3-5 % MPKI reduction
[Figure: PC + global history index the main TAGE predictor; its prediction + confidence feed the statistical corrector, which is also indexed with PC + global history.]
The SC is just a global history neural predictor, plus tables indexed with PC, TAGE prediction and confidence.
Confidence for TAGE (HPCA 2011)
• Given by the value of the counter providing the prediction:
  • Saturated = high confidence
  • Intermediate = medium confidence
  • Weak = low confidence
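A minimal sketch of this classification, assuming the provider counter is a 3-bit signed counter in [-4, 3] where 0 and -1 are the weak values; `tage_confidence` is a hypothetical name.

```python
def tage_confidence(ctr: int, bits: int = 3) -> str:
    """Classify a signed counter in [-2**(bits-1), 2**(bits-1) - 1]."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if ctr in (lo, hi):
        return "high"     # saturated
    if ctr in (-1, 0):
        return "low"      # weak
    return "medium"       # intermediate


print([tage_confidence(c) for c in range(-4, 4)])
# ['high', 'medium', 'medium', 'low', 'low', 'medium', 'medium', 'high']
```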
Why does it work?
• The bias tables are indexed with PC + TAGE outputs:
  • When TAGE is correct (most of the time):
    • High counter value
    • Dominates, not many updates
  • When TAGE is wrong:
    • Other counters can be trained
    • (Statistical) correlation, if it exists, can be captured
Optimizations for CBP 2016
• Use TAGE confidence for indexing the SC: ≈ 1 % MPKI reduction
• On (very) low SC confidence:
  • May use the TAGE prediction (if high confidence, ..): ≈ 0.4 % MPKI reduction
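A statistical corrector in this spirit can be sketched as a small neural sum. Everything here is a hypothetical simplification: `StatCorrectorSketch` is not the CBP code, TAGE confidence is encoded as 0/1/2 (low/medium/high), the bias table is indexed with (PC, TAGE prediction, TAGE confidence), the global-history tables with (PC, history, TAGE prediction), and the threshold is fixed. TAGE is reverted only when the SC disagrees with |sum| above the threshold, so a low-confidence SC leaves the TAGE prediction in place.

```python
class StatCorrectorSketch:
    """Hypothetical SC: a bias table indexed by (PC, TAGE pred, TAGE confidence)
    plus a few global-history tables indexed by (PC, history, TAGE pred)."""

    def __init__(self, hist_lengths=(0, 4, 8, 16), log_entries=9, ctr_bits=6, theta=6):
        self.hist_lengths = hist_lengths
        self.log = log_entries
        self.cmin, self.cmax = -(1 << (ctr_bits - 1)), (1 << (ctr_bits - 1)) - 1
        self.bias = [0] * (1 << (log_entries + 3))
        self.tables = [[0] * (1 << log_entries) for _ in hist_lengths]
        self.theta = theta      # fixed here; a real SC would adapt its threshold
        self.ghist = 0          # newest outcome in bit 0

    def _indices(self, pc, tage_pred, conf):
        mask = (1 << self.log) - 1
        bias_i = ((pc << 3) | (conf << 1) | int(tage_pred)) & ((1 << (self.log + 3)) - 1)
        hist_i = []
        for length in self.hist_lengths:
            h, folded = self.ghist & ((1 << length) - 1), 0
            while h:            # XOR-fold the history to index width
                folded ^= h & mask
                h >>= self.log
            hist_i.append((pc ^ (pc >> self.log) ^ folded ^ (int(tage_pred) << (self.log - 1))) & mask)
        return bias_i, hist_i

    def _sum(self, bias_i, hist_i):
        return self.bias[bias_i] + sum(tab[i] for tab, i in zip(self.tables, hist_i))

    def predict(self, pc, tage_pred, conf):
        s = self._sum(*self._indices(pc, tage_pred, conf))
        sc_pred = s >= 0
        # Revert TAGE only when the corrector disagrees with enough confidence.
        return sc_pred if (sc_pred != tage_pred and abs(s) > self.theta) else tage_pred

    def update(self, pc, tage_pred, conf, taken):
        bias_i, hist_i = self._indices(pc, tage_pred, conf)
        s = self._sum(bias_i, hist_i)
        if (s >= 0) != taken or abs(s) <= self.theta:
            step = 1 if taken else -1
            self.bias[bias_i] = max(self.cmin, min(self.cmax, self.bias[bias_i] + step))
            for tab, i in zip(self.tables, hist_i):
                tab[i] = max(self.cmin, min(self.cmax, tab[i] + step))
        self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << max(self.hist_lengths)) - 1)


sc = StatCorrectorSketch()
HIGH = 2
for _ in range(20):                              # TAGE says "taken, high confidence"...
    sc.update(0x40, True, HIGH, taken=False)     # ...but the branch is never taken
print(sc.predict(0x40, True, HIGH))  # False: the SC reverts TAGE
print(sc.predict(0x80, True, HIGH))  # True: untrained branch, TAGE kept
```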
The beauty of neural predictors Micro 2011, CBP 2014, Micro 2015 TAGE-SC
From Compaq in 1999, I learnt:
• Use global history
• Avoid local history
I did manage to submit only global history predictors at CBP 2004, 2006 and 2011 (OK, I cheated with loops).
Speculative history must be managed!?
• Local history:
  • A table of histories (non-speculatively updated)
  • Must maintain a speculative history per in-flight branch:
    • Associative search, etc. ?!?
• Global history:
  • Append a bit to a single history register
  • Use a circular buffer and just a pointer to speculatively manage the history
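The circular-buffer scheme for the global history can be sketched as follows; `SpeculativeGlobalHistory` and its methods are hypothetical names. Each branch pushes its predicted direction and keeps the resulting head pointer as a checkpoint; on a misprediction, restoring that pointer discards all younger (wrong-path) bits at once, and only the mispredicted bit is rewritten. The buffer is assumed larger than the longest history plus the number of in-flight branches; real designs also recover the folded histories used for indexing, which is not shown.

```python
class SpeculativeGlobalHistory:
    """Global history in a circular buffer; only the head pointer is speculative."""

    def __init__(self, size=2048):
        self.buf = [0] * size
        self.size = size
        self.head = 0                  # position of the most recent bit

    def push(self, taken):
        """Append a (speculative) predicted direction; return a checkpoint."""
        self.head = (self.head - 1) % self.size
        self.buf[self.head] = int(taken)
        return self.head

    def bit(self, i):
        """i-th most recent history bit (0 = newest)."""
        return self.buf[(self.head + i) % self.size]

    def recover(self, checkpoint, actual_taken):
        """On a misprediction: restore the branch's pointer, fix its bit."""
        self.head = checkpoint
        self.buf[self.head] = int(actual_taken)


gh = SpeculativeGlobalHistory(size=64)
gh.push(True)
cp = gh.push(False)                    # branch B predicted not-taken
gh.push(True)
gh.push(True)                          # younger, wrong-path branches
gh.recover(cp, actual_taken=True)      # B was actually taken
print([gh.bit(i) for i in range(2)])   # [1, 1]
```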
Would not have won CBP 2014 without using local history
How to use local histories with TAGE+(G)SC
• Add local history tables in the neural SC
  • as in the perceptron [Jimenez2002]
≈ 0.9 % MPKI reduction with 2 Kbits on the 8 KB predictor
≈ 2.5 % MPKI reduction with 28 Kbits on the 64 KB predictor
I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
The beauty of neural predictors
• TAGE-SC:
  • Just the right framework to test information vectors
  • Add extra tables: if there is some benefit, continue to explore!
Can add extra components in the SC
• IMLI-based components (Micro 2015)
  • Capture correlation in multidimensional loops
  • Very disappointing results: essentially no benefit on the CBP-5 traces
• Other forms of history:
  • e.g. only backward branches
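Two such extra history forms can be sketched as below. This is a simplified illustration under stated assumptions, not the Micro 2015 definition: `ExtraHistories` is a hypothetical name, a branch is treated as backward when its target is below its PC, the IMLI-style counter counts taken backward branches (inner-loop iterations) and resets when a backward branch falls through (loop exit), and the backward history simply ignores forward branches.

```python
class ExtraHistories:
    """Hypothetical helpers: an IMLI-style counter and a backward-only history."""

    def __init__(self, bhist_len=16):
        self.imli = 0                               # current inner-loop iteration
        self.bhist = 0                              # history of backward branches only
        self.bhist_mask = (1 << bhist_len) - 1

    def update(self, pc, target, taken):
        if target >= pc:                            # forward branch: not recorded
            return
        # Simplified IMLI heuristic: count taken back-edges, reset on loop exit.
        self.imli = self.imli + 1 if taken else 0
        self.bhist = ((self.bhist << 1) | int(taken)) & self.bhist_mask


eh = ExtraHistories()
trace = [(0x120, 0x100, True), (0x104, 0x110, False),   # back-edge, forward branch
         (0x120, 0x100, True), (0x120, 0x100, False)]   # back-edge, then loop exit
seen = []
for pc, target, taken in trace:
    eh.update(pc, target, taken)
    seen.append(eh.imli)
print(seen, bin(eh.bhist))  # [1, 1, 2, 0] 0b110
```

An SC table could then be indexed with (PC, IMLI count) or (PC, backward history), exactly like the global-history tables.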
+ a loop predictor (just in case) TAGE-SC-L
Loop predictor
• Can predict loop exits
  • for loops with large iteration numbers
  • and a regular number of iterations
• Limited storage budget (a few entries)
• But marginal benefit
I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
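A loop predictor of this kind can be sketched with a handful of entries. The structure is an assumption for illustration: `LoopPredictorSketch` and `LoopEntry` are hypothetical names, entries are direct-mapped by PC, the branch is treated as a loop back-edge (taken = one more iteration), and the predictor only answers once the same trip count has been seen for several consecutive loop executions.

```python
from dataclasses import dataclass


@dataclass
class LoopEntry:
    tag: int = 0
    trip: int = 0         # learned number of taken iterations before the exit
    current: int = 0      # taken iterations seen in the current loop execution
    confidence: int = 0   # saturating count of executions with the same trip count


class LoopPredictorSketch:
    CONF_MAX = 3

    def __init__(self, n_entries=4):
        self.entries = [LoopEntry() for _ in range(n_entries)]

    def predict(self, pc):
        """Return True/False when confident, else None (defer to the main predictor)."""
        e = self.entries[pc % len(self.entries)]
        if e.tag != pc or e.confidence < self.CONF_MAX:
            return None
        return e.current < e.trip          # taken until the learned trip count

    def update(self, pc, taken):
        e = self.entries[pc % len(self.entries)]
        if e.tag != pc:                    # replace unconditionally (assumption)
            e.tag, e.trip, e.current, e.confidence = pc, 0, 0, 0
        if taken:
            e.current += 1
            return
        # Loop exit: is the number of iterations regular?
        if e.current == e.trip:
            e.confidence = min(self.CONF_MAX, e.confidence + 1)
        else:
            e.trip, e.confidence = e.current, 0
        e.current = 0


lp = LoopPredictorSketch()
pattern = [True] * 9 + [False]             # 9 taken iterations, then the exit
for _ in range(5):                         # train on 5 executions of the loop
    for taken in pattern:
        lp.update(0x200, taken)
preds = []
for taken in pattern:
    preds.append(lp.predict(0x200))
    lp.update(0x200, taken)
print(preds == pattern)  # True: the exit is predicted
```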
TAGE-SC-L summary for CBP-5
Most of the budget is spent on global history correlation:
• TAGE with ≈ 1200 branches of history for 64 KB and ≈ 400 branches for 8 KB
• Optimize the storage sharing
• Optimize the allocation
Track the statistical correlation with a neural component:
• Use the TAGE prediction AND confidence
• Incorporate other forms of history (even local history if you are trying to win CBP-5)
TAGE-SC-L is still far from the predictability limits MTAGE-SC
poTAGE-SC, the previous champion: poTAGE + COLT (Michaud 2014) and TAGE-SC-L
poTAGE + COLT (Michaud 2014)
[Figure: TAGE predictors on global history, local histories 1 to 3 and frequency; COLT selection through a table indexed with (PC + 5 predictions).]
Use the TAGE concept on other forms of history.
Unlimited TAGE-SC
[Figure: a TAGE predictor on global history feeds a statistical corrector (bias, GEHL, RHSP, other GEHL and perceptron components ...), followed by a final chooser.]
poTAGE-SC
[Figure: TAGE predictors on global history, local histories 1 to 3 and frequency, with COLT selection, feed a statistical corrector (bias, GEHL, RHSP, other GEHL and perceptrons ...) and a final chooser.]
MTAGE-SC: ≈ 5 % MPKI reduction over poTAGE-SC
[Figure: TAGE predictors on global history, local histories 1 to 3, frequency and global backward history feed a TAGE prediction combiner, then a statistical corrector (bias, GEHL, RHSP, other GEHL and perceptrons ...) and a final chooser.]
• Leverages confidence from the SC and the TAGE prediction combiner
• TAGE prediction combiner: COLT prediction + neural combination of the outputs (prediction + confidence)
• Global backward history: to capture long path correlation, but eliminate intermediate branches
• A few extra history forms: IMLI, ..
Seems that I am not making progress!!
• CBP 2006 misprediction rate:
  • 32 KB L-TAGE ≈ 1.22 × GTL
• CBP 2014 misprediction rate:
  • 32 KB TAGE-SC-L ≈ 1.40 × poTAGE-SC
• CBP 2016 misprediction rate:
  • 64 KB TAGE-SC-L ≈ 1.55 × MTAGE-SC
Not the same traces, but ..
Conclusion
• TAGE-SC-L fits limited storage sizes:
  • Most significant optimizations over CBP 2014:
    • Use of TAGE confidence as index for the SC
    • Sharing and partial associativity
• MTAGE-SC:
  • Predictability limits are even (a little bit) further than previously expected