Two research studies related to branch prediction and instruction sequencing
André Seznec, INRIA/IRISA

Storage Free Confidence Estimator for the TAGE predictor
Why confidence estimation for branch predictors
• Energy/performance tradeoffs:
  • Guiding fetch gating or fetch throttling
  • Dynamic resizing of speculative structures
• Controlling SMT resource allocation through fetch policies:
  • Fetch the “most” useful instructions
• Dual-path execution
What is confidence estimation?
• Assign a confidence to a prediction:
  • Is it likely that the prediction is correct?
• Generally discriminate only between low- and high-confidence predictions:
  • High confidence: “very likely” to be correct
  • Low confidence: “not so likely” to be correct
Confidence estimation for branch predictors
• 1981, Jim Smith:
  • Predictions from weak counters are more likely to be mispredicted
• 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters
  • Increment on correct prediction, reset on misprediction
  • low confidence < threshold ≤ high confidence
• 1998, enhanced JRS, Grunwald et al.:
  • Use the prediction in the index
• A few other proposals:
  • Self-confidence for perceptrons, ...
Most studies still use enhanced JRS confidence estimators
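The JRS scheme above can be sketched in a few lines. This is a minimal illustration, not the published design: the table size, the threshold value, the exact index hash, and the way the prediction bit is folded into the index for the "enhanced" variant are all assumptions made for the example.

```python
class JRSConfidenceEstimator:
    """Table of resetting counters, indexed gshare-style (PC xor global history)."""

    def __init__(self, log_size=12, ctr_bits=4, threshold=15, use_prediction=False):
        self.mask = (1 << log_size) - 1
        self.ctr_max = (1 << ctr_bits) - 1
        self.threshold = threshold            # assumed value, tunable
        self.use_prediction = use_prediction  # True: "enhanced" variant
        self.table = [0] * (1 << log_size)

    def _index(self, pc, history, prediction):
        idx = pc ^ history
        if self.use_prediction:
            # Assumed way of mixing the prediction into the index.
            idx = (idx << 1) | int(prediction)
        return idx & self.mask

    def is_high_confidence(self, pc, history, prediction):
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, prediction, correct):
        i = self._index(pc, history, prediction)
        # Increment on a correct prediction, reset on a misprediction.
        self.table[i] = min(self.table[i] + 1, self.ctr_max) if correct else 0
```

With the default threshold, an entry becomes high confidence only after 15 consecutive correct predictions, and a single misprediction sends it back to low confidence.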
Metrics for confidence estimators (Grunwald et al. 1998)
• SENS, Sensitivity:
  • Fraction of correct predictions classified as high confidence
• PVP, Predictive Value of a Positive test:
  • Probability that a high-confidence prediction is correct
• SPEC, Specificity:
  • Fraction of mispredictions classified as low confidence
• PVN, Predictive Value of a Negative test:
  • Probability that a low-confidence prediction is mispredicted
Different qualities for different usages
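As a worked illustration, the four metrics can be computed from the four possible outcomes (high or low confidence, correct or mispredicted). The function name and the counts in the example are made up for the illustration.

```python
def confidence_metrics(high_correct, high_wrong, low_correct, low_wrong):
    """Return SENS, PVP, SPEC and PVN from the four outcome counts."""
    correct = high_correct + low_correct
    wrong = high_wrong + low_wrong
    return {
        "SENS": high_correct / correct,                   # correct preds flagged high
        "PVP": high_correct / (high_correct + high_wrong),  # high conf that is correct
        "SPEC": low_wrong / wrong,                        # mispredictions flagged low
        "PVN": low_wrong / (low_wrong + low_correct),     # low conf that is mispredicted
    }

# Hypothetical counts for 1000 predictions.
m = confidence_metrics(high_correct=900, high_wrong=10, low_correct=60, low_wrong=30)
```

With these counts, SENS is 900/960 and SPEC is 30/40, while PVN is only 30/90: most low-confidence predictions are still correct, which is why the right metric depends on the usage.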
The current limits of confidence prediction
• Discriminating between high and low confidence is insufficient:
  • What is the misprediction rate on high and on low confidence?
• Malik et al.:
  • Use a probability for each counter value on an enhanced JRS
• Enhanced JRS and state-of-the-art branch predictors?
  • Each predictor needs its own confidence estimator
This study: cost-effective confidence estimator for TAGE
• No storage overhead
• Discriminate:
  • Low conf. pred.: ≈ 30% misp. rate or more
  • Medium conf. pred.: 8-15% misp. rate
  • High conf. pred.: < 1% misp. rate
TAGE: multiple tables, global history predictor
• The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
• Captures correlation on very long histories
• Most of the storage is devoted to short histories!
• What is important: L(i) - L(i-1) increases drastically with i
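A short sketch of how such a series can be generated from a minimum length, a maximum length, and a number of tagged tables. The function name and the rounding rule are assumptions for the illustration; the example reproduces the series shown on the slide.

```python
def geometric_history_lengths(min_len, max_len, n_tables):
    """History lengths L(1)..L(n) forming a geometric series (rounded)."""
    alpha = (max_len / min_len) ** (1.0 / (n_tables - 1))
    return [int(min_len * alpha ** i + 0.5) for i in range(n_tables)]

# Length 0 stands for the tagless base predictor.
lengths = [0] + geometric_history_lengths(2, 128, 7)
# Gap between consecutive lengths, i.e. L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

Here `lengths` is `[0, 2, 4, 8, 16, 32, 64, 128]` and `gaps` is `[2, 2, 4, 8, 16, 32, 64]`: half of the tables cover histories of 8 bits or less, while the last table alone extends the history by 64 bits.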
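TAGE: geometric history lengths + PPM-like + optimized update policy
[Figure: a tagless base predictor indexed with the pc, and tagged tables indexed with hashes of the pc and the global history h[0:L1], h[0:L2], h[0:L3]. Each tagged entry holds a prediction counter (ctr), a tag, and a useful counter (u); tag comparisons (=?) select the final prediction.]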
[Figure: tag comparisons (=?) across the tagged tables, with one miss and two hits; one hitting component supplies Pred, the other supplies Altpred.]
Prediction computation
• General case:
  • The longest matching component provides the prediction
• Special case:
  • Many mispredictions on newly allocated entries (weak Ctr)
  • On many applications, Altpred is more accurate than Pred for these entries
  • Property dynamically monitored through a single 4-bit counter
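The selection rule can be sketched as follows. This is a simplified illustration under stated assumptions: prediction counters are signed 3-bit values (-4..3, taken when non-negative), "newly allocated" is approximated by a weak counter, and the 4-bit monitoring counter is signed with Altpred preferred when it is non-negative. The function and label names are hypothetical.

```python
def tage_predict(base_pred, tagged_hits, use_alt_on_weak):
    """Select the final prediction.

    tagged_hits: one item per tagged table, shortest history first;
                 None on a tag miss, else the signed counter (-4..3).
    use_alt_on_weak: value of the single monitoring counter (assumed -8..7).
    Returns (prediction, name of the component that provided it).
    """
    hits = [(i, c) for i, c in enumerate(tagged_hits) if c is not None]
    if not hits:
        return base_pred, "base"
    provider, ctr = hits[-1]              # longest matching component
    pred = ctr >= 0
    if len(hits) > 1:
        alt_idx, alt_ctr = hits[-2]       # next longest matching component
        altpred, alt_src = alt_ctr >= 0, f"tagged{alt_idx}"
    else:
        altpred, alt_src = base_pred, "base"
    weak = ctr in (0, -1)
    if weak and use_alt_on_weak >= 0:
        return altpred, alt_src           # special case: trust Altpred
    return pred, f"tagged{provider}"
```

For example, `tage_predict(False, [2, None, -1], 5)` returns `(True, "tagged0")`: the longest match is weak and the monitoring counter currently favors Altpred.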
A tagged table entry
[Entry layout: U | Tag | Ctr]
• Ctr: 3-bit prediction counter
• U: 2-bit useful counter
  • Was the entry recently useful?
• Tag: partial tag
Confidence by observation on TAGE
• Apart from the prediction, the predictor delivers:
  • The provider component and the value of the prediction counter
  • High correlation with the quality of the predictions
• The history of mispredictions can also be observed:
  • A burst of mispredictions might indicate predictor warm-up or a program phase change
Experimental framework
• 20 traces from CBP-1 and 20 traces from CBP-2
• 16Kbits TAGE: 5 tables, max hist 80 bits
• 64Kbits TAGE: 8 tables, max hist 130 bits
• 256Kbits TAGE: 9 tables, max hist 300 bits
• Probability of misprediction as a metric of confidence:
  • Mispredictions Per Kilopredictions (MKP)
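MKP is simply the misprediction rate scaled to a thousand predictions, so the percentage rates quoted for the confidence classes convert directly. A one-line helper (the name is ours) makes the conversion explicit.

```python
def mkp(mispredictions, predictions):
    """Mispredictions per kilo-predictions."""
    return 1000.0 * mispredictions / predictions

low_class = mkp(30, 100)   # a 30% misprediction rate
high_class = mkp(1, 100)   # a 1% misprediction rate
```

A 30% misprediction rate is 300 MKP and a 1% rate is 10 MKP.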
Bimodal as the provider component
• Provides many (often most) of the predictions:
  • Allocation of a tagged table entry happens on a misprediction
  • Generally, bimodal prediction = the bias of the branch
• 256Kbits TAGE: bimodal = very accurate prediction
  • Often less than 1 MKP, always significantly lower than the global misprediction rate
• 16Kbits TAGE:
  • Often bimodal = very accurate prediction
  • On demanding apps: bimodal not better than average
Discriminating the bimodal predictions
• Weak counters:
  • Systematically more than 250 MKP (generally more than 300 MKP)
  • Can be classified as low confidence
• “Identify” conflicts due to limited predictor size:
  • Was there a misprediction provided by the bimodal recently (10 last branches)?
  • ≈ 80-150 MKP for 16Kbits, ≈ 50-70 MKP for 64Kbits
  • Can be classified as medium confidence
• The remaining:
  • High confidence: < 10 MKP, generally much less
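A minimal sketch of this three-way split for bimodal-provided predictions. It assumes a 2-bit bimodal counter in 0..3 whose middle states (1 and 2) are the weak ones, and it models the "10 last branches" test with a sliding window of flags; the class and method names are hypothetical, and a hardware version would use a few bits of logic rather than a queue.

```python
from collections import deque

class BimodalConfidence:
    def __init__(self, window=10):
        # One flag per recent branch: True if the bimodal table
        # provided the prediction and it was wrong.
        self.recent = deque([False] * window, maxlen=window)

    def classify(self, ctr):
        """ctr: 2-bit bimodal counter in 0..3 (assumed encoding)."""
        if ctr in (1, 2):
            return "low"        # weak counter
        if any(self.recent):
            return "medium"     # recent bimodal misprediction
        return "high"

    def record(self, bimodal_was_provider, mispredicted):
        """Call once per resolved branch."""
        self.recent.append(bimodal_was_provider and mispredicted)
```

A saturated counter is reported as medium confidence for the 10 branches that follow a bimodal misprediction, then returns to high confidence.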
A tagged component as the provider
• Discriminate on the values of the prediction counter
Tagged component as provider: a more thorough analysis
• Weak, Nearly Weak, Nearly Saturated:
  • For all benchmarks and for the three TAGE configurations, in the range of 200 MKP or higher
• Saturated:
  • Slightly lower than the global misprediction rate of the applications
  • Very high confidence for predictable applications (< 10 MKP)
  • Not that high confidence for poorly predictable applications (> 50 MKP)
Problem: Saturated often represents more than 50% of the predictions
Intermediate summary
• High confidence class:
  • Bimodal saturated, no recent misprediction by bimodal
• Low confidence class:
  • Bimodal weak, and tagged not saturated
• Medium confidence class:
  • Bimodal with a recent misprediction by bimodal
• Tagged saturated:
  • Depends on applications, predictor size, etc.
  • Very large class
Tweaking the predictor to improve confidence
How to improve confidence on the tagged-counter saturated class
• Widening the prediction counter?
  • Not that good:
    • Slightly decreased accuracy
    • Only marginal improvement in accuracy on the saturated class
• Modifying the counter update:
  • Transition to the saturated state with a very low probability
  • P = 1/128 in our experiments
  • Marginal accuracy loss (≈ 0.02 MPKI)
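The modified update can be sketched as a saturating counter whose last step is taken only with probability P. The signed 3-bit encoding (-4..3) and the injectable random source are assumptions of this sketch; hardware would more likely use a cheap pseudo-random bit source.

```python
import random

def update_ctr(ctr, taken, p_sat=1 / 128, rand=random.random):
    """Update a signed 3-bit prediction counter (-4..3, assumed encoding).

    The step from nearly saturated (2 or -3) to saturated (3 or -4)
    is taken only with probability p_sat.
    """
    if taken:
        if ctr == 3:
            return ctr
        if ctr == 2 and rand() >= p_sat:
            return ctr          # stay nearly saturated most of the time
        return ctr + 1
    if ctr == -4:
        return ctr
    if ctr == -3 and rand() >= p_sat:
        return ctr
    return ctr - 1
```

Passing a deterministic `rand` (for instance `lambda: 0.0`) makes the behavior reproducible in tests. Moving away from saturation is never delayed, so a saturated entry has seen a long run of consistent outcomes.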
Towards 3 confidence classes
• Tagged Saturated is high confidence
• Nearly Saturated is enlarged and is medium confidence
Towards 3 confidence classes
• Low confidence:
  • Weak bimodal + Weak tagged + Nearly Weak tagged
• Medium confidence:
  • Bimodal recently mispredicted + Nearly Saturated tagged
• High confidence:
  • Bimodal saturated + Saturated tagged
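The three classes can be written as a single function of what the predictor already exposes: the provider component and its counter value. This sketch assumes signed counters (2-bit, -2..1, for the bimodal table; 3-bit, -4..3, for tagged tables) and takes the "recently mispredicted" flag as an input; the names are hypothetical.

```python
def strength(ctr):
    """Distance of a signed counter from the taken/not-taken boundary.

    1 = weak; for a 3-bit counter, 3 = nearly weak,
    5 = nearly saturated, 7 = saturated.
    """
    return abs(2 * ctr + 1)

def tage_confidence(provider_is_bimodal, ctr, bimodal_recently_mispredicted=False):
    if provider_is_bimodal:
        # Signed 2-bit counter (assumed): strength 1 = weak, 3 = saturated.
        if strength(ctr) == 1:
            return "low"
        return "medium" if bimodal_recently_mispredicted else "high"
    s = strength(ctr)
    if s <= 3:
        return "low"            # weak or nearly weak tagged
    return "medium" if s == 5 else "high"
```

No table is added: the estimator only decodes existing state, which is what makes it storage free apart from the small amount of logic tracking recent bimodal mispredictions.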
Prediction and misprediction coverage
[Figure: misprediction rate, prediction coverage, and misprediction coverage.]
Behavior examples, 64Kbits
[Figure: misprediction rate, prediction coverage, and misprediction coverage.]
[Figure: predictions and mispredictions split across the low, medium, and high confidence classes.]
Summary on confidence estimation
• Many studies on applications of confidence estimation, but very few on confidence estimators
• Each predictor requires a different confidence estimator
• A very cost-effective and efficient confidence estimator for TAGE:
  • Storage free, very limited logic
  • Discriminates between 3 confidence classes:
    • Medium + low conf.: > 90% of the mispredictions
    • High conf.: in the range of 1% mispredictions or less
SYRANT: “moderate cost” control independence exploitation
with Nathanael Prémillieu
Why?
• Branch prediction accuracy is reaching a plateau:
  • TAGE 2006,
  • ?
• Try something else ...
Control flow reconvergence
[Figure: the instruction flow splits at a branch into the taken path (if) and the not-taken path (else), and the two paths meet again at the reconvergence point.]
Exploiting control flow reconvergence
Misprediction!
Can we save some useful work after the reconvergence point?
[Figure: instruction classes around a mispredicted branch]
• Control Dependent (CD)
• Reconvergence point: to be detected
• Control Independent Data Independent (CIDI): should be conserved
• Control Independent Data Dependent (CIDD): to invalidate
Difficulties
• Not the same renaming scheme on both paths:
  • How to conserve results?
• Identification of the reconvergence point:
  • Check against all previously fetched instructions on the wrong path?
• Identification of CIDI and CIDD instructions?
SYmmetric Resource Allocation on Not-taken and Taken paths
[Figure: the taken path and the not-taken path both map onto physical registers P0 to P8 (likewise LSQ entries and ROB entries). After the branch, the shorter path leaves a gap of unused registers so that both paths use the same registers from the reconvergence point on.]
• Insert gaps to reuse the same physical registers
Register validity through a tagging process
• On a misprediction, increment the tag: X to Y
[Figure: register tags assigned at the rename stage on the predicted path and at refetch on the corrected path, shown from the branch, through its execution, to the reconvergence point.]
Conserve tag and validity if
• same instruction
• same operands including tags
[Figure: animation over the register tags of the predicted path and the corrected path, showing which tags are conserved after the reconvergence point.]
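The conservation rule can be read as a simple comparison at refetch. The sketch below is only an illustration of that rule: the record layout, the names, and the use of the pc to identify "the same instruction" are assumptions, and tags are shown as plain strings such as "X" and "Y".

```python
from typing import NamedTuple, Optional, Tuple

class Renamed(NamedTuple):
    pc: int                               # identifies the instruction
    sources: Tuple[Tuple[str, str], ...]  # (register, tag) for each operand
    tag: str                              # tag attached to the result

def rename_at_refetch(old: Optional[Renamed], pc, sources, current_tag):
    """Entry for a refetched instruction that lands in the slot used by `old`."""
    if old is not None and old.pc == pc and old.sources == tuple(sources):
        return old                        # conserve tag and validity
    # Different instruction or different operand tags: result is new.
    return Renamed(pc, tuple(sources), current_tag)

old = Renamed(0x40, (("R5", "X"), ("R1", "X")), "X")
kept = rename_at_refetch(old, 0x40, [("R5", "X"), ("R1", "X")], "Y")
redo = rename_at_refetch(old, 0x40, [("R5", "Y"), ("R1", "X")], "Y")
```

`kept` still carries tag "X", so its result can be reused, while `redo` gets tag "Y" because one operand was produced on the corrected path; that new tag then propagates to its consumers through the same comparison.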
Reconvergence detection
• Precise detection would require checking every PC for each instruction
• Use approximate detection:
  • Detect the first branch after reconvergence
Approximate detection of the reconvergence point
Each entry: Branch, NbR, Direction

Active Branch List      Shadow Branch List
B1   1  T               B3  17  NT
B2  12  T               B4  22  NT
B3  17  NT              B5  23  T
B4  22  NT              B6  29  T
B5  23  T               B7  40  NT
B6  29  T
B7  40  NT

Copy the wrong path on branch misprediction detection
ABL                     SBL
B1    1  T              B3  17  NT
B2   12  NT             B4  22  NT
B'3  23  T              B5  23  T
B'4  27  NT             B6  29  T
B'5  28  NT             B7  40  NT
B6   32  T

Allows monitoring the resource consumption on both paths
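Using the two lists from this example, the approximate detection can be sketched as a search for the first branch on the corrected path that was also seen on the wrong path. Treating the difference between the two NbR values as the gap is an assumption of this sketch, as are the function name and the tuple layout.

```python
def find_reconvergence(abl, sbl):
    """Return (branch, NbR on corrected path, NbR on wrong path), or None.

    abl: Active Branch List, corrected path, in fetch order.
    sbl: Shadow Branch List, copy of the wrong path, in fetch order.
    Entries are (branch, nbr, direction) tuples.
    """
    wrong_path = {branch: nbr for branch, nbr, _ in sbl}
    for branch, nbr, _ in abl:
        if branch in wrong_path:
            return branch, nbr, wrong_path[branch]
    return None

abl = [("B1", 1, "T"), ("B2", 12, "NT"), ("B'3", 23, "T"),
       ("B'4", 27, "NT"), ("B'5", 28, "NT"), ("B6", 32, "T")]
sbl = [("B3", 17, "NT"), ("B4", 22, "NT"), ("B5", 23, "T"),
       ("B6", 29, "T"), ("B7", 40, "NT")]

hit = find_reconvergence(abl, sbl)
gap = hit[1] - hit[2]
```

Here B6 is the first branch common to both paths, reached at NbR 32 on the corrected path and 29 on the wrong path, a difference of 3.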
[Figure: WP and RP sequences (taken, taken, not-taken) through branches B1 and B2 and reconvergence points RP1 and RP2. Two steps: determine the gap, then use the gap.]
Gap size issue
• The two paths may be very different:
  • Waste of resources
  • Sometimes 100's of instructions
• Different filters:
  • Only try when the gap size is limited
  • Only try if the wrong path was the longest
  • Only try if branch confidence is low (or medium)
  • Only try if reconvergence point/gap confidence is high
Continue execution after branch misprediction resolution
• On “normal” superscalar processors:
  • Kill every instruction after the misprediction
• Control independence exploitation:
  • Let execution continue until resources are claimed back
Phantom execution
Preliminary performance evaluation
• 8-way superscalar
• Deep pipeline: 20 stages
• Very large instruction window
• TAGE predictor
• SPEC 2006