T-BAG: Bootstrap Aggregating the TAGE Predictor Ibrahim Burak Karsli, Resit Sendag University of Rhode Island
Bootstrap Aggregating • Statistical method introduced by Breiman in 1996 • Uses an ensemble of predictors • sub-predictors can be the same or different • Train each slightly differently and independently • Each predictor is trained on a data set resampled with replacement (bootstrapping) • Aggregate their predictions • The IDEA: many weak learners make a strong learner • Theoretically shown to perform better than a single learner in the ensemble
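The bagging recipe above (bootstrap resample, train independently, aggregate by majority) can be sketched as follows. This is a minimal illustration, not the T-BAG code; `MajorityLearner` is a deliberately weak, hypothetical learner invented for the example.

```python
import random

def bootstrap_sample(data, rng):
    # Resample the training set with replacement (bootstrapping).
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    # Aggregate the ensemble's predictions by simple majority.
    return sum(predictions) > len(predictions) / 2

class MajorityLearner:
    # Hypothetical weak learner: always predicts the majority label it saw.
    def fit(self, labels):
        self.pred = sum(labels) >= len(labels) / 2
        return self
    def predict(self):
        return self.pred

rng = random.Random(0)
labels = [True, True, False, True, False, True, True, False]

# Each learner trains on its own bootstrap resample; training sets differ
# slightly, so the learners disagree in a useful way.
ensemble = [MajorityLearner().fit(bootstrap_sample(labels, rng)) for _ in range(25)]
print(majority_vote([m.predict() for m in ensemble]))
```

In T-BAG the weak learners are full TAGE predictors and the aggregation is a weighted sum rather than a plain majority vote.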
TAGE Predictor • Winner of CBP3 • State-of-the-art branch predictor • Many parameters allow for variety
T-BAG: Prediction • [Diagram: PC -> 32 TAGE predictors -> prediction aggregation]
Predictor Aggregation • Bagging typically uses tens to hundreds of predictors, so we target the unlimited-storage track • The submitted predictor uses 32 TAGE predictors • A sliding window tracks the success of each predictor's last 16 predictions • Predictions are aggregated with a weighted sum
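The sliding-window aggregation can be sketched like this: each sub-predictor's weight is its accuracy over its last 16 predictions, and the final prediction is the sign of the weighted sum of +1/-1 votes. The class and the neutral starting weight of 0.5 are assumptions for illustration; the slide only specifies the 16-prediction window and the weighted sum.

```python
from collections import deque

WINDOW = 16  # track the last 16 outcomes per sub-predictor (from the slide)

class SubPredictorTracker:
    # Hypothetical wrapper holding one TAGE instance's recent success history.
    def __init__(self):
        self.history = deque(maxlen=WINDOW)

    def weight(self):
        # Weight = accuracy over the window; neutral before any outcomes.
        return sum(self.history) / len(self.history) if self.history else 0.5

    def record(self, correct):
        self.history.append(1 if correct else 0)

def aggregate(predictions, trackers):
    # Weighted sum of +1 (taken) / -1 (not taken) votes; sign decides.
    score = sum(t.weight() * (1 if p else -1)
                for p, t in zip(predictions, trackers))
    return score >= 0

trackers = [SubPredictorTracker() for _ in range(32)]
# Suppose predictors 0-9 have been perfectly accurate lately and predict
# "taken", while the other 22 have been wrong lately and predict "not taken".
for t in trackers[:10]:
    for _ in range(WINDOW):
        t.record(True)
for t in trackers[10:]:
    for _ in range(WINDOW):
        t.record(False)
preds = [True] * 10 + [False] * 22
print(aggregate(preds, trackers))
```

The recently accurate minority outvotes the recently inaccurate majority, which is the point of weighting by windowed success rather than counting heads.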
T-BAG: Update • [Diagram: PC & resolveDir -> update count -> 32 TAGE predictors]
Random Update • On each sample, each predictor is updated k times in a row, where k is drawn from a multinomial distribution • Max k = 2 (because the ctr width is 3 bits) • For the submission, each sample triggers 0, 1, or 2 updates 20%, 60%, and 20% of the time, respectively
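Drawing the per-sample update count from the slide's 20%/60%/20% distribution can be sketched as below. The `predictor_update_fn` callback stands in for the real TAGE update routine and is an assumption of this sketch.

```python
import random

def sample_update_count(rng):
    # k = 0, 1, or 2 repeats with probabilities 20%, 60%, 20% (slide values);
    # k is capped at 2 because the TAGE ctr is only 3 bits wide.
    return rng.choices([0, 1, 2], weights=[0.2, 0.6, 0.2])[0]

def update_predictor(predictor_update_fn, rng):
    # Apply the real TAGE update k times in a row (hypothetical callback).
    k = sample_update_count(rng)
    for _ in range(k):
        predictor_update_fn()
    return k

rng = random.Random(42)
counts = [sample_update_count(rng) for _ in range(10000)]
print(sorted(set(counts)))
```

Because each of the 32 predictors draws its own k per sample, their training histories diverge even when they start from identical configurations, which is what makes the `AllSame_RandUpd` variant in the results a genuine ensemble.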
Sub-Predictors • 32 predictors • Variability in min/max history lengths, number of tables, and use of the PC in table indexing • 3-bit ctr for all • Each predictor is about 15 MB (submitted predictor: 492 MB) • Min history varies between 3 and 13 • Max history varies between 1,200 and 100,000 • Number of tables varies between 20 and 38 • 16 predictors use the PC; the other 16 do NOT! • Using the PC when indexing the tables of a TAGE-like predictor is not significantly better!
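A configuration generator covering the parameter ranges above might look like the sketch below. The field names and the uniform sampling are assumptions; the slide gives only the ranges and the 16/16 PC split.

```python
import random

def make_configs(n=32, seed=1):
    # Hypothetical generator: n TAGE variants differing in min/max history
    # length, table count, and whether the PC is hashed into table indices.
    rng = random.Random(seed)
    configs = []
    for i in range(n):
        configs.append({
            "min_hist": rng.randint(3, 13),        # slide: 3..13
            "max_hist": rng.randint(1200, 100000), # slide: 1,200..100,000
            "num_tables": rng.randint(20, 38),     # slide: 20..38
            "use_pc": i < n // 2,                  # half use PC in indexing
            "ctr_bits": 3,                         # same ctr width everywhere
        })
    return configs

configs = make_configs()
print(len(configs), sum(c["use_pc"] for c in configs))
```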
Results • AllSame_RandUpd -> 1.952 misp/KI • AllDifferent -> 1.932 misp/KI • AllDifferent_RandUpd -> 1.919 misp/KI
Conclusion and Future Work • A simple idea • Future work: different types of predictors in the ensemble • Future work: an implementation within a realistic storage budget