This paper discusses combining bimodal, local, and global branch predictors to improve prediction accuracy. Results show that combining a local predictor with a gshare (global) predictor achieves a prediction accuracy of 98.1%. The paper also outlines future work on detecting conflicts, correlations, and common predictions through profiling and compiler analysis.
CS 7960-4 Lecture 7: Combining Branch Predictors • Scott McFarling • WRL Tech. Report TN-36, 1993
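Bimodal Branch Prediction • Identifies the most popular outcome of each branch in the recent past • Counters are updated at commit • (Diagram: a 10-bit index taken from the PC selects one of 1024 2-bit saturating counters, whose value gives a 1 (taken) or 0 (not taken) prediction)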
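A minimal Python sketch of the bimodal scheme above, assuming `pc` is already a word address (real designs drop the byte-offset bits first) and that counters start weakly not-taken; the class and method names are hypothetical. A counter value of 2 or 3 predicts taken, so a single misprediction does not flip a strongly biased counter.

```python
class Bimodal:
    """Bimodal predictor sketch: 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; >= 2 means "predict taken". Start weakly not-taken (1).
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Called with the resolved outcome (at commit); saturate at 0 and 3.
        i = pc & self.mask
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


b = Bimodal()
for outcome in (True, True, False):
    b.update(0x400, outcome)
print(b.predict(0x400))  # still taken after one not-taken outcome
```

Because only the low 10 bits are used, branches 1024 words apart share a counter; this is the contention that a larger table reduces.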
Results • SPEC’89 programs simulated for 10M instructions (modern studies use harder-to-predict programs) • A larger predictor reduces contention for counters • Prediction accuracy saturates at 93.5% at a size of 2 KB (Fig. 3)
Local Predictors • Two-level predictor: the first level holds branch history, the second level holds saturating counters • History is updated immediately • (Diagram: a 10-bit PC index selects one of 1024 entries in a table of 4-bit histories, e.g. 0111; that history indexes a 16-entry table of 2-bit saturating counters, which gives the 1/0 prediction)
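A Python sketch of the two-level local predictor in the diagram: 1024 per-branch 4-bit histories feeding 16 shared 2-bit counters (a PAg organization in the terminology used later). Names and initial values are assumptions, and for simplicity this sketch shifts the history when the outcome is known rather than immediately at prediction time as the slide describes.

```python
class LocalPredictor:
    """Two-level local (PAg-style) predictor sketch.

    Level 1: 1024-entry table of 4-bit per-branch histories, indexed by 10 PC bits.
    Level 2: 16 two-bit saturating counters, indexed by the 4-bit history.
    """

    def __init__(self, pc_bits: int = 10, hist_bits: int = 4) -> None:
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << hist_bits) - 1
        self.histories = [0] * (1 << pc_bits)
        self.counters = [1] * (1 << hist_bits)  # start weakly not-taken

    def predict(self, pc: int) -> bool:
        h = self.histories[pc & self.pc_mask]
        return self.counters[h] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.pc_mask
        h = self.histories[i]
        # Train the counter selected by this branch's current history.
        if taken:
            self.counters[h] = min(self.counters[h] + 1, 3)
        else:
            self.counters[h] = max(self.counters[h] - 1, 0)
        # Shift the outcome into this branch's history (newest bit is the LSB).
        self.histories[i] = ((h << 1) | int(taken)) & self.hist_mask
```

A branch with the repeating pattern taken, taken, taken, not-taken produces four distinct 4-bit histories, each mapped to its own counter, so after warm-up this sketch predicts it perfectly; a single bimodal counter for the same branch would keep mispredicting the not-taken outcome.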
Results • For small predictors, there can be contention at both levels, resulting in inaccurate predictions • Local predictors also take longer to warm up, e.g. after every context switch • They do very well for large predictors: accuracy saturates at 97.1%
Global Predictors • A single global history register: neighboring branches have correlated outcomes • However, the PC is not used • (Diagram: a 10-bit global history indexes a 1024-entry table of 2-bit saturating counters, which gives the 1/0 prediction)
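A Python sketch of the global predictor above, with hypothetical names: one 10-bit global history register (GHR) indexes 1024 two-bit counters, and the `pc` argument is accepted only to keep the same interface as the other sketches and is otherwise ignored.

```python
class GlobalPredictor:
    """GAg-style global predictor sketch: a 10-bit global history register
    indexes 1024 two-bit saturating counters. The PC is ignored."""

    def __init__(self, hist_bits: int = 10) -> None:
        self.mask = (1 << hist_bits) - 1
        self.ghr = 0  # outcomes of the most recent branches, newest in the LSB
        self.counters = [1] * (1 << hist_bits)  # start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.counters[self.ghr] >= 2  # pc deliberately unused

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters[self.ghr]
        self.counters[self.ghr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Every branch, whatever its PC, shifts into the same history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Two branches whose outcomes move together (say branch B always repeats branch A, with A alternating) are predicted correctly after warm-up, because A's latest outcome sits in the low bit of the history used to predict B.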
Do We Need the PC? • Note that the global history can reveal which branch is being examined • Hence, the global predictor outdoes the bimodal predictor when the transistor budget is large (Fig. 7) • The local predictor does better: identifying the PC and its local history matters more than the behavior of neighboring branches
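Gselect • Index the counters with a concatenation of PC bits and global history bits • Bimodal and global prediction are special cases (Fig. 9) • (Diagram: n PC bits and m global history bits are concatenated into an (n+m)-bit index into a 1024-entry table of 2-bit saturating counters; here m = 5, so n = 5)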
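A sketch of the gselect index computation in Python. `gselect_index` is a hypothetical helper that places the n PC bits above the m history bits (the opposite order would work equally well). Setting m = 0 reduces it to bimodal indexing and n = 0 to global indexing, which is the sense in which both are special cases.

```python
def gselect_index(pc: int, ghr: int, n: int, m: int) -> int:
    """Concatenate the low n PC bits with the low m global-history bits.

    The PC bits form the high part of the (n + m)-bit counter-table index.
    """
    pc_part = pc & ((1 << n) - 1)
    hist_part = ghr & ((1 << m) - 1)
    return (pc_part << m) | hist_part


# The slide's configuration: 5 PC bits + 5 history bits -> 1024 counters.
print(bin(gselect_index(0b1011011110, 0b10101, 5, 5)))  # 0b1111010101
# Special cases: m = 0 is bimodal (PC only), n = 0 is global (history only).
print(gselect_index(0x2AB, 0x155, 10, 0) == 0x2AB)  # True
print(gselect_index(0x2AB, 0x155, 0, 10) == 0x155)  # True
```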
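GShare • XOR-ing 10 history bits with 10 PC bits preserves more information than concatenating 5 bits of each, and more than either component alone • (Example: PC 01111110 and global history 00000001 give gselect index 11100001 using 4 bits of each, but gshare index 01111111 using all 8 bits XOR-ed)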
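A Python sketch reproducing the slide's 8-bit example, reading its four values as PC = 01111110, global history = 00000001, gselect (4 PC bits followed by 4 history bits) = 11100001, and gshare (8-bit XOR) = 01111111; that reading of the slide is an interpretation, and `gshare_index` is a hypothetical helper. The last lines show two PCs that differ only in their high bits: gselect 4/4 maps them to the same counter, while gshare keeps them apart (XOR can still collide for other pairs).

```python
def gshare_index(pc: int, ghr: int, bits: int) -> int:
    """XOR the low `bits` PC bits with the low `bits` global-history bits."""
    mask = (1 << bits) - 1
    return (pc ^ ghr) & mask


pc, ghr = 0b01111110, 0b00000001
gselect_4_4 = ((pc & 0xF) << 4) | (ghr & 0xF)  # 4 PC bits, then 4 history bits
gshare_8 = gshare_index(pc, ghr, 8)
print(f"{gselect_4_4:08b} {gshare_8:08b}")  # 11100001 01111111

# A second PC differing only in its high bits.
pc2 = 0b11111110
print(((pc2 & 0xF) << 4) | (ghr & 0xF) == gselect_4_4)  # True: gselect collides
print(gshare_index(pc2, ghr, 8) == gshare_8)  # False: gshare distinguishes them
```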
Terminology • GAG: Global history indexes into a global array of saturating counters • PAG: Per-address history indexes into a global array of saturating counters • GAP: Global history indexes into each PC’s private array of counters (gselect) • PAP: Per-address history indexes into each PC’s private array of counters
Trade-Offs • Some predictors warm up faster than others • Some programs benefit from global history, some from local history • Some programs have branches that interfere with each other • Note that a 64 KB local predictor has fewer saturating counters than a 64 KB bimodal predictor, because part of its budget holds the history table; the local predictor won’t be better for every program
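A rough count behind the last point, assuming storage is measured in bits with no tags or other overhead, and writing B for the total budget and H for the bits spent on the local predictor’s history table (H is not given on the slide):

```latex
\begin{aligned}
C_{\text{bimodal}} &= \frac{B}{2}, \qquad
C_{\text{local}} = \frac{B - H}{2} < \frac{B}{2} \quad (H > 0) \\
B = 64\,\text{KB} = 524{,}288 \text{ bits} &\;\Rightarrow\; C_{\text{bimodal}} = 262{,}144 \text{ counters}
\end{aligned}
```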
Combining Predictors • Use an array of 2-bit saturating counters to pick the better of two predictors for each PC • (Diagram: the PC indexes a 1024-entry table of 2-bit saturating counters whose 1/0 output selects Predictor A or Predictor B)
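A Python sketch of the selector, following the update rule usually attributed to McFarling: a chooser counter moves toward whichever component was right, and only when exactly one of them was right. The `Always` components and all names are hypothetical stand-ins (in the paper’s setting A and B would be, e.g., a local and a gshare predictor), and the initial chooser value is an assumption.

```python
class Always:
    """Hypothetical stand-in component that always predicts the same way."""

    def __init__(self, taken: bool) -> None:
        self.taken = taken

    def predict(self, pc: int) -> bool:
        return self.taken

    def update(self, pc: int, taken: bool) -> None:
        pass  # nothing to learn


class Combined:
    """Combining predictor sketch: 1024 PC-indexed 2-bit chooser counters.

    A chooser value >= 2 selects predictor A, otherwise predictor B.
    Components only need predict(pc) and update(pc, taken) methods.
    """

    def __init__(self, pred_a, pred_b, index_bits: int = 10) -> None:
        self.a, self.b = pred_a, pred_b
        self.mask = (1 << index_bits) - 1
        self.chooser = [2] * (1 << index_bits)  # start weakly favoring A

    def predict(self, pc: int) -> bool:
        use_a = self.chooser[pc & self.mask] >= 2
        return self.a.predict(pc) if use_a else self.b.predict(pc)

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        # Train the chooser only when exactly one component was right.
        if a_ok and not b_ok:
            self.chooser[i] = min(self.chooser[i] + 1, 3)
        elif b_ok and not a_ok:
            self.chooser[i] = max(self.chooser[i] - 1, 0)
        # Both components always train on the outcome.
        self.a.update(pc, taken)
        self.b.update(pc, taken)
```

Usage: `Combined(Always(True), Always(False))` initially follows A; after one not-taken outcome at a given PC, that PC’s chooser entry drops to 1 and the next prediction there comes from B, while other PCs still follow A.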
Results • The combination of local and gshare increases prediction accuracy to 98.1% (Fig. 16) • For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is made twice the size so that the total is a power of two) • A 1 KB combined predictor does as well as a 16 KB gselect predictor
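A quick check of the sizing remark, assuming the bimodal table and the chooser each hold N = 2^k counters (the slide does not state the exact split). Three equal tables would total 3N, which is never a power of two:

```latex
\underbrace{N}_{\text{bimodal}} + \underbrace{N}_{\text{chooser}} + \underbrace{2N}_{\text{gshare}}
  = 4N = 2^{k+2}
\qquad\text{whereas}\qquad
N + N + N = 3N = 3 \cdot 2^{k}
```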
Future Work • Detect conflicts, correlations, and common predictions through profiling/compiler analysis • Functions that compress the information in the history or PC • Pipelined predictions: predict two branches ahead • Hierarchical predictors: get a quick prediction in one cycle and a more accurate one two cycles later
Next Week’s Paper • “Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor”, Seznec et al., ISCA’02