Storage Free Confidence Estimator for the TAGE predictor

Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA

Why confidence estimation for branch predictors • Energy/performance tradeoffs: • Guiding fetch gating or fetch throttling: • Dynamic speculative structures resizing • Controlling SMT resource allocation through fetch policies • Fetch the “most” useful instructions • Dual Path execution

What is confidence estimation ? • Assert a confidence to a prediction : • Is itlikelythat the predictionis correct ? • Generallydiscriminateonlylow and high confidence predictions: • High confidence: « very likely » to be correct • Low confidence: « not solikely » to be correct

Confidence estimation for branch predictors • 1981, Jim Smith: • weakcounterspredictions are more likely to mispredict • 1996, Jacobsen, Rotenberg, Smith: Gshare-like 4-bit counters • Increment on correct prediction, reset on misprediction • low confidence < threshold ≤ high confidence • 1998 Enhanced JRS Grunwald et al: • Use the prediction in the index • A few otherproposals: • Self confidence for perceptrons .. Most studiesstill use enhanced JRS confidence estimators

Metrics for confidence estimators(Grunwald et al 1998) • SENS Sensitivity: • Fraction of correct pred. classified as highconf. • PVP Predictive Value of a Positive test • Probability of highconf. to be correct • SPEC, Specificity: • Fraction of mispred. classified as lowconf. • PVN, Predictive Value of a Negative test • Probability of lowconf. to bemispredicted Differentqualities for different usages

The current limits of confidence prediction • Discriminatingbetweenhigh and low confidence isunsufficient: • Whatis the misp. rate on high and low confidence ? • Malik et al: • Use probability for eachcounter value on an enhanced JRS • Enhanced JRS and state-of-the art branchpredictors ? • Eachpredictoritsown confidence estimator

This study Cost-effective confidence estimatorfor TAGE • No storageoverhead • Discrimate: • Lowconf. pred. : ≈ 30 % misp. rate or more • Medium conf. pred.: 8-15% misp.rate • High conf. pred. : < 1 % misp rate

TAGE: multiple tables, global history predictor The set of history lengths forms a geometric series Capture correlation on very long histories {0, 2, 4, 8, 16, 32, 64, 128} most of the storage for short history !! What is important:L(i)-L(i-1) is drastically increasing

TAGEGeometric history length + PPM-like + optimized update policy h[0:L1] pc pc pc h[0:L2] pc h[0:L3] tag tag tag ctr ctr ctr u u u 1 1 1 1 1 1 1 =? =? =? 1 hash hash hash hash hash hash 1 prediction Tagless base predictor

Miss Hit Pred =? =? 1 1 1 1 1 1 1 =? 1 Hit 1 Altpred

Prediction computation • General case: • Longest matching component provides the prediction • Special case: • Many mispredictions on newly allocated entries: weak Ctr On many applications, Altpred more accuratethan Pred • Property dynamically monitored through a single 4-bit counter

A tagged table entry U Tag Ctr • Ctr: 3-bit prediction counter • U: 2-bit useful counter • Was the entry recently useful ? • Tag: partial tag

Updating the U counter • If (Altpred ≠ Pred) then • Pred = taken : U= U + 1 • Pred ≠ taken : U = U - 1 • Graceful aging: • Periodic shift of all U counters • implemented through the reset of a single bit

Allocating a new entry on a misprediction • Find a single “useless” entry with a longer history: • Priviledge the smallest possible history • To minimize footprint • But not too much • To avoid ping-pong phenomena • Initialize Ctr as weak and U as zero

Confidence by observation on TAGE • Apart the prediction, the predictordelivers: • The provider component and the value of the predictioncounter • High correlationwith the quality of the predictions • The history of mispredictionscanalsobeobserved • burst of mispredictionsmightindicatepredictorwarming or program phase changing

Experimental framework • 20 traces from the CBP-1 and 20 traces from the CBP-2 • 16Kbits TAGE : 5 tables, max hist 80 bits • 64Kbits TAGE : 8 tables, max hist 130 bits • 256Kbits TAGE : 9 tables, max hist 300 bits • Probability of misprediction as a metric of confidence: • Misprediction Per Kilopredictions (MKP)

Bimodal as the provider component • Providesmany (oftenmost) of the predictions: • Allocation of a tagged table entry happens on a misprediction • Generally bimodal prediction = the bias of the branch • 256Kbits TAGE, bimodal= veryaccurateprediction • Oftenlessthan 1 MKP, alwayssignificantlylowerthan the global misprediction rate • 16Kbits TAGE: • Often bimodal= veryaccurateprediction • On demandingapps: bimodal not betterthanaverage

Discriminating the bimodal predictions • Weakcounters: • Systematically more than250 MKP (generally more than 300 MKP) • Can beclassified as low confidence • « Identify » conflicts due to limitedpredictor size: • Wasthere a mispredictionprovided by the bimodal recently (10 last branches) ? • ≈80-150 MKP for 16Kbits, ≈50-70 MKP for 64Kbits • Can beclassified as medium confidence • The remaining: • High confidence: <10 MKP, generallymuchless

A tagged component as the provider • Discrimate on the values of the prediction counter

Tagged component as provider: a more thorough analysis • Weak, NearlyWeak , NearlySaturated: • For all benchmarks, for the three TAGE configurations in the range of 200 MKP or higher • Saturated: • Slightlylowerthan the global misprediction rate of the applications • Veryhigh confidence for predictable applications (< 10 MKP) • Not thathigh confidence for poorlypredictable applications (> 50 MKP) Problem: Saturatedoftenrepresents more than 50 % of the predictions

Intermediate summary • High confidence class: • (Bimodal saturated, no recentmisprediction by bimodal) • Low confidence class: • Bimodal weak and not saturatedtagged • Medium confidence class: • (Bimodal and recentmisprediction by bimodal) • Taggedsaturated: • Depends on applications, predictor size etc • Very large class ..

Tweaking the predictor to improve confidence

How to improve confidence on tagged counter saturated class • Widening the predictioncounter ? • Not that good: • Slightlydecreasedaccuracy • Only marginal improvement on accuracy on saturated class • Modifying the counter update: • Transition to saturated state with a verylowprobability • P=1/128 in ourexperiments • Marginal accuracyloss ( ≈ 0.02 MPKI)

Towards 3 confidence classes • Tagged Saturated is high confidence • Nearly Saturated is enlarged and is medium confidence

Towards 3 confidence classes • Low confidence: • Weak bimodal + Weaktagged + NearlyWeaktagged • Medium confidence: • Bimodal recentlymispredicted + NearlySaturatedtagged • High confidence: • Bimodal saturated + Saturatedtagged

Prediction and misprediction coverage Mispredictionrate Prediction coverage Misprediction coverage

Behavior examples, 64Kbits Mispredictionrate Prediction coverage Misprediction coverage

Predictions Mispredictions low medium high

Summary • Manystudies on applications of confidence estimations, but a very few on confidence estimators. • Eachpredictorrequires a different confidence estimator • A verycost-effective and efficient confidence estimator for TAGE • Storage free, verylimitedlogic • Discriminatebetween 3 confidence classes: • Medium + lowconf > 90 % of the mispredictions • High conf in the range of 1 % mispredictions or less

The End 

Storage Free Confidence Estimator for the TAGE predictor