Two-Level Adaptive Dynamic Branch Prediction Jeroen Lichtenauer
Contents
• The problem to be solved
• Solutions Overview
• Prediction counter
• Two-Level Adaptive Branch Prediction
• Summary
• Conclusion
• Related Work
Literature
• A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History. Tse-Yu Yeh and Yale N. Patt. Department of Electrical Engineering and Computer Science, The University of Michigan, 1993.
Also important:
• Target Prediction for Indirect Jumps. Po-Yung Chang, Eric Hao and Yale N. Patt. Department of Electrical Engineering and Computer Science, The University of Michigan, 1997.
Problem
• Increase in issue rate
• Increase in pipeline depth
• More speculative execution
• More penalty for misprediction of branches
Successful branch prediction becomes more and more important for reducing execution time!
Solutions overview
• Static prediction
  • During compilation
• Dynamic prediction
  • Hardware, for instance using branch history
• A combination
  • Compiler Synthesized Dynamic Branch Prediction
Prediction Counter
• Increment by 1 if the branch is taken
• Decrement by 1 if the branch is not taken
• Predict taken if the counter value is in the upper half of its range
2-bit counter:
00 and 01 = predict not taken
10 and 11 = predict taken
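The counter rules above can be sketched in a few lines of Python. This is only an illustration: the class name `SaturatingCounter` and its parameters are made up for this example, and it assumes the counter saturates at both ends of its range (it stops at 00 and 11 instead of wrapping around), which the slide does not state explicitly.

```python
class SaturatingCounter:
    """n-bit up/down prediction counter (hypothetical helper, not from the slides)."""

    def __init__(self, bits=2, value=0):
        self.max_value = (1 << bits) - 1   # 3 for a 2-bit counter
        self.value = value

    def predict(self):
        # Predict taken in the upper half of the range (10 and 11 for 2 bits).
        return self.value > self.max_value // 2

    def update(self, taken):
        # Assumption: saturate at 0 and max_value instead of wrapping around.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


counter = SaturatingCounter(bits=2, value=0)
predictions = []
for outcome in [True, True, True, False, True]:
    predictions.append(counter.predict())   # predict first, then learn the outcome
    counter.update(outcome)

print(predictions)  # [False, False, True, True, True]
```

Note that the single not-taken outcome only moves the counter from 11 to 10, so the next prediction is still "taken". That hysteresis is the reason for using two bits instead of one.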
Two-Level Adaptive Branch Prediction
• First level, execution history register(s): history of the last k branches encountered.
• Second level, pattern history table(s): the k-bit history pattern selects an entry in the pattern history table. That entry holds the prediction, which is based on the branch outcomes at the last j occurrences of that pattern. The predictor in each entry is, for instance, a prediction counter.
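The interplay of the two levels can be sketched as follows, here for the simplest arrangement with one global history register and one global pattern history table. The names `GAgPredictor` and `accuracy` are invented for this sketch, and the choices of k = 4, 2-bit saturating counters and an initial counter value of 01 are assumptions, not values taken from the slides.

```python
class GAgPredictor:
    """Two-level predictor sketch: one history register, one pattern history table."""

    def __init__(self, history_bits=4):
        self.k = history_bits
        self.history = 0                       # first level: last k outcomes, newest in bit 0
        self.pht = [1] * (1 << history_bits)   # second level: one 2-bit counter per pattern

    def predict(self):
        # The history pattern selects the counter; 10 and 11 mean "taken".
        return self.pht[self.history] >= 2

    def update(self, taken):
        c = self.pht[self.history]
        self.pht[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the new outcome into the history and keep only k bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)


def accuracy(predictor, outcomes):
    """Fraction of correct predictions over a sequence of branch outcomes."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken three times and then not taken, repeated 50 times.
pattern = [True, True, True, False] * 50
result = accuracy(GAgPredictor(history_bits=4), pattern)
print(result)
```

After a short warm-up, every history pattern that occurs is always followed by the same outcome, so the predictor stops mispredicting the loop exit. A single 2-bit counter without history would keep mispredicting every not-taken outcome of this pattern.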
1st Level: Branch History Register or Table (BHR or BHT)
• Global: history of all branches kept in a single register of length k.
• Per-address: table that contains a history register of length k for each branch.
• Per-set: table that contains a history register of length k for each set of branches.
2nd Level: Pattern History Table (PHT)
• Global: one table with a prediction for every possible k-bit history pattern.
• Per-address: one table per branch address, each with a prediction for every possible k-bit history pattern.
• Per-set: one table per branch set, each with a prediction for every possible k-bit history pattern.
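The global, per-address and per-set choices at each level differ only in which history register and which pattern table a branch address maps to. A minimal sketch of that idea, with invented names (`TwoLevelPredictor`, `select_global`, `select_per_address`, `select_per_set`) and with the assumption that a "set" is simply a block of neighbouring addresses, could look like this:

```python
from collections import defaultdict


def select_global(addr):
    return 0                      # every branch shares one register or table


def select_per_address(addr):
    return addr                   # one register or table per branch address


def select_per_set(addr, set_bits=10):
    # Assumption: a set is a block of 2**set_bits neighbouring addresses.
    return addr >> set_bits


class TwoLevelPredictor:
    """Generic sketch: selector functions choose the history register and the PHT."""

    def __init__(self, k, history_key, pht_key):
        self.k = k
        self.history_key = history_key
        self.pht_key = pht_key
        self.histories = defaultdict(int)                 # first level
        self.phts = defaultdict(lambda: [1] * (1 << k))   # second level, 2-bit counters

    def predict(self, addr):
        history = self.histories[self.history_key(addr)]
        return self.phts[self.pht_key(addr)][history] >= 2

    def update(self, addr, taken):
        h_key = self.history_key(addr)
        history = self.histories[h_key]
        pht = self.phts[self.pht_key(addr)]
        pht[history] = min(pht[history] + 1, 3) if taken else max(pht[history] - 1, 0)
        self.histories[h_key] = ((history << 1) | int(taken)) & ((1 << self.k) - 1)


gag = TwoLevelPredictor(4, select_global, select_global)        # global history, global PHT
pag = TwoLevelPredictor(4, select_per_address, select_global)   # per-address history, global PHT
pas = TwoLevelPredictor(4, select_per_address, select_per_set)  # per-address history, per-set PHTs
```

The first letter of each scheme name refers to the history level and the last letter to the pattern table level, so the nine variations correspond to the nine pairs of selector functions.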
Interpretation of results
• Pattern history tables (PHTs) are always best per-set (*As) or global (*Ag); per-address PHTs (*Ap) are useless.
• Global history schemes (GAs) perform best on integer programs, but only at high cost.
• Per-address history schemes (PAs) perform better on floating point programs, even at low cost.
• Per-set history schemes (SAs) can reach the best overall performance, but have the highest cost, so they are not cost-effective.
Summary
• Branch prediction is a very important factor in reducing CPI in modern processors that use extensive pipelining.
• A counter (typically 2 bits) is often used for prediction.
• Two-Level Adaptive Dynamic Branch Prediction ‘learns’ the outcome of branches in different program states.
• There are 9 variations of Two-Level Adaptive Branch Prediction (global, per-address and per-set for both levels), but only 4 are useful.
Conclusion
• Two-Level Adaptive Branch Prediction is very effective, with up to 97% average accuracy.
• BHR length has more influence on accuracy than the number of PHTs.
• Pattern history tables (PHTs) should always be per-set (*As) or even just a single global table (*Ag).
• At low cost PAs is best; at high cost GAs is best.
Related Work
• Compiler Synthesized Dynamic Branch Prediction. Scott Mahlke and Balas Natarajan. Hewlett-Packard Laboratories, Palo Alto, 1996.
  Uses not only branch history but also other information, for instance the contents of the architectural registers. This does not provide significantly better prediction results than two-level branch predictors.
• The Effect of Process Switches on Branch Prediction Accuracy. T. Kisuki, H. Corporaal and P.M.W. Knijnenburg. Department of Computer Science, Leiden University; Department of Electrical Engineering, Delft University of Technology, 1999.
  The effect of ‘cold starts’ after process switches is only significant if process switches occur more than once every 100K clock cycles.