CS 7960-4 Lecture 7

Combining Branch Predictors, Scott McFarling, WRL Tech. Report TN-36, 1993

Bimodal Branch Prediction
• Identifies the most common outcome of each branch in the recent past
• Updates happen during commit
• (Figure: 10 bits of the PC index a table of 1024 2-bit saturating counters)
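The bimodal scheme above can be sketched in a few lines. This is a minimal illustration, not McFarling's simulator: the class and function names, the initial counter value of 1, and the use of the lowest PC bits as the index are all assumptions made for the example.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts at 1 (weakly not-taken).
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: a loop branch taken 9 times, then not taken once,
# repeated 10 times.
loop_trace = [(0x40, i % 10 != 9) for i in range(100)]
loop_accuracy = accuracy(BimodalPredictor(), loop_trace)
```

On this trace the counter mispredicts once at warm-up and once at every loop exit, which is the behavior the 2-bit hysteresis is meant to give: a single not-taken outcome does not flip the next prediction.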

Results
• SPEC’89 programs simulated for 10M instructions
• (modern studies use hard-to-predict programs)
• A larger predictor reduces contention for counters
• Prediction rates saturate at 93.5% (at 2K bytes)
• (Fig.3)

Local Predictors
• Two-level predictor: the first level has history, the second level has saturating counters
• History gets updated immediately
• (Figure: 10 bits of the PC index a 1024-entry table of 4-bit histories; the selected history indexes 16 2-bit saturating counters)
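A rough sketch of the two-level structure, using the sizes from the figure (1024 histories of 4 bits, 16 counters). The names, the initial counter value, and the test pattern are assumptions for illustration; the sketch also updates history and counter together rather than modeling commit timing.

```python
class LocalPredictor:
    """Two-level predictor: per-PC histories select shared counters."""

    def __init__(self, pc_bits=10, history_bits=4):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)       # first level
        self.counters = [1] * (1 << history_bits)   # second level

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        i = pc & self.pc_mask
        history = self.histories[i]
        count = self.counters[history]
        self.counters[history] = min(3, count + 1) if taken else max(0, count - 1)
        # Shift the newest outcome into this branch's history.
        self.histories[i] = ((history << 1) | int(taken)) & self.hist_mask


def mispredictions(predictor, trace):
    """Return one boolean per branch: True where the prediction was wrong."""
    wrong = []
    for pc, taken in trace:
        wrong.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return wrong


# Hypothetical branch with a repeating taken, taken, taken, not-taken pattern.
pattern_trace = [(0x40, i % 4 != 3) for i in range(100)]
wrong = mispredictions(LocalPredictor(), pattern_trace)
```

Once the 4-bit history has seen a full period, each history value maps to a single outcome, so the repeating pattern is predicted without further misses. A bimodal counter would keep missing the not-taken outcome.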

Results
• For small predictors, there could be contention at both levels, resulting in inaccurate predictions
• Will also take longer to warm up after every context switch
• Does very well for large predictors: saturates at 97.1%

Global Predictors
• A single history register: neighboring branches have correlated results
• However, the PC is not used
• (Figure: a 10-bit global history indexes a table of 1024 2-bit saturating counters)
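A minimal sketch of a global-history predictor, assuming a single shift register shared by all branches. The `pc` argument is accepted only to show that it is ignored; the names, the 4-bit history used in the demo, and the alternating trace are illustrative assumptions.

```python
class GlobalPredictor:
    """One global history register indexes one table of 2-bit counters."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)

    def predict(self, pc=None):
        # The PC plays no part in selecting the counter.
        return self.counters[self.history] >= 2

    def update(self, pc, taken):
        count = self.counters[self.history]
        self.counters[self.history] = min(3, count + 1) if taken else max(0, count - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Hypothetical trace: outcomes alternate taken / not-taken, and the branches
# come from four different PCs.
predictor = GlobalPredictor(history_bits=4)
hits = []
for step in range(40):
    pc = 0x100 + 4 * (step % 4)
    taken = step % 2 == 0
    hits.append(predictor.predict(pc) == taken)
    predictor.update(pc, taken)
```

After a short warm-up every prediction in this trace is correct, even though no counter is tied to any particular PC: the pattern lives entirely in the shared history.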

Do We Need PC?
• Note that the global history often reveals which branch is being examined
• Hence, it outdoes bimodal predictors when the transistor budget is large (Fig.7)
• Local predictor does better: it is more important to identify the PC and local history than the behavior of neighboring branches

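Gselect
• Use a combination of PC and global history
• Bimodal and global prediction are special cases (Fig.9)
• (Figure: n PC bits concatenated with m global history bits form an (n+m)-bit index into a table of 2-bit saturating counters; the example shows 1024 entries and a 5-bit global history)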
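The index computation can be sketched as a concatenation of the two bit fields. The function name, argument order, and the sample values are assumptions made for illustration; placing the PC bits above the history bits is one possible layout.

```python
def gselect_index(pc, history, pc_bits, history_bits):
    """Concatenate low PC bits (high part) with low history bits (low part)."""
    pc_part = pc & ((1 << pc_bits) - 1)
    history_part = history & ((1 << history_bits) - 1)
    return (pc_part << history_bits) | history_part


# Illustrative 8-bit values with 4 bits taken from each source.
index = gselect_index(0b01111110, 0b00000001, pc_bits=4, history_bits=4)

# Special cases: no history bits gives a bimodal index,
# no PC bits gives a pure global-history index.
bimodal_like = gselect_index(0b01111110, 0b00000001, pc_bits=8, history_bits=0)
global_like = gselect_index(0b01111110, 0b00000001, pc_bits=0, history_bits=8)
```

The two degenerate calls show why bimodal and global prediction fall out as special cases: setting either field width to zero leaves only the other source in the index.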

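GShare
• Xor-ing 10 history bits and 10 PC bits has more info than the concatenation of 5 bits of each and more info than each individual component
• (Figure: 8-bit example values 01111110, 00000001, 11100001, 01111111; these are consistent with a branch address, a global history, their gselect 4/4 index, and their gshare 8/8 index, in that order)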
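A short sketch contrasting the two index functions on 8-bit values. The function names and the second pair of inputs are assumptions chosen to show one case where concatenation aliases and xor does not; it is not a claim that xor never aliases.

```python
def gselect_index(pc, history, pc_bits, history_bits):
    """Concatenate low PC bits with low history bits."""
    pc_part = pc & ((1 << pc_bits) - 1)
    history_part = history & ((1 << history_bits) - 1)
    return (pc_part << history_bits) | history_part


def gshare_index(pc, history, bits):
    """Xor the low `bits` of the PC with the low `bits` of the history."""
    return (pc ^ history) & ((1 << bits) - 1)


# Same branch address, two histories that differ only in their oldest bit.
pc = 0b11111111
history_a = 0b00000000
history_b = 0b10000000

# gselect 4/4 keeps only the 4 newest history bits, so both cases collide.
select_a = gselect_index(pc, history_a, 4, 4)
select_b = gselect_index(pc, history_b, 4, 4)

# gshare 8/8 uses all 8 history bits, so the two cases get separate counters.
share_a = gshare_index(pc, history_a, 8)
share_b = gshare_index(pc, history_b, 8)
```

Both schemes address a 256-entry table here; the difference is that xor lets every history bit and every PC bit influence the index instead of discarding half of each.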

Terminology
• GAG: Global history indexes into global array of saturating counters
• PAG: Per-address history indexes into global array of saturating counters
• GAP: Global history indexes into each PC’s private array of counters (gselect)
• PAP: Per-address history indexes into each PC’s private array of counters
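One way to read the four names is as two independent choices: which history is used, and whether the counter array is shared or private to a PC. The sketch below encodes only that reading; the function name, the dictionary of per-address histories, and the use of `None` for the shared array are hypothetical.

```python
def table_and_index(scheme, pc, global_history, local_histories):
    """Return (table, index) for scheme "GAG", "PAG", "GAP", or "PAP".

    table is None for the single shared counter array, or the PC when each
    PC owns a private array. index is the history used within that array.
    """
    if scheme[0] == "G":
        history = global_history
    else:
        history = local_histories.get(pc, 0)
    table = pc if scheme[2] == "P" else None
    return table, history


# Hypothetical state: one global history and one per-address history.
state = dict(pc=0x40, global_history=0b1011, local_histories={0x40: 0b0110})
choices = {name: table_and_index(name, **state)
           for name in ("GAG", "PAG", "GAP", "PAP")}
```

In hardware the "private array per PC" is usually approximated by folding some PC bits into the index, which is why the slide equates GAP with gselect.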

Trade-Offs
• Some predictors warm up faster than others
• Some programs benefit from global history, some from local history
• Some programs have branches that interfere with each other
• Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won’t be better for every program

Combining Predictors
• Use an array of saturating counters to pick the best available predictor for each PC
• (Figure: the PC indexes a 1024-entry table of 2-bit saturating counters, which selects between Predictor A and Predictor B)
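A minimal sketch of the selection mechanism, pairing a bimodal predictor with gshare. The class names, initial counter values, equal table sizes, and the rule of moving the chooser only when exactly one component is correct are assumptions for the example, not a transcription of the paper's configuration.

```python
class Bimodal:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)

    def index(self, pc):
        return pc & self.mask

    def predict(self, pc):
        return self.table[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


class Gshare(Bimodal):
    def __init__(self, bits=10):
        super().__init__(bits)
        self.history = 0

    def index(self, pc):
        return (pc ^ self.history) & self.mask

    def update(self, pc, taken):
        super().update(pc, taken)  # counter is chosen with the old history
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Combined:
    """Per-PC 2-bit chooser: values 2 and 3 select a, 0 and 1 select b."""

    def __init__(self, a, b, bits=10):
        self.a, self.b = a, b
        self.mask = (1 << bits) - 1
        self.chooser = [2] * (1 << bits)

    def predict(self, pc):
        use_a = self.chooser[pc & self.mask] >= 2
        return self.a.predict(pc) if use_a else self.b.predict(pc)

    def update(self, pc, taken):
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        i = pc & self.mask
        if a_ok and not b_ok:
            self.chooser[i] = min(3, self.chooser[i] + 1)
        elif b_ok and not a_ok:
            self.chooser[i] = max(0, self.chooser[i] - 1)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


# Hypothetical branch that alternates taken / not-taken.
combined = Combined(Bimodal(), Gshare())
hits = []
for step in range(100):
    taken = step % 2 == 0
    hits.append(combined.predict(0x40) == taken)
    combined.update(0x40, taken)
```

On the alternating branch the bimodal counter is wrong every time, so the chooser for that PC drifts toward gshare and the combined predictor ends up matching the better component.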

Results
• The combination of local and gshare increases the prediction accuracy to 98.1% (Fig.16)
• For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is twice the size to make sure the total is a power of two)
• A 1KB combined predictor does as well as a 16KB gselect predictor

Future Work
• Detect conflicts, correlations, and common predictions through profiling/compiler analysis
• Functions that compress information in history or PC
• Pipeline predictions: predict two branches ahead
• Hierarchical predictors: get a quick prediction in a cycle and a more accurate one two cycles later

Next Week’s Paper
• “Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor”, Seznec et al., ISCA’02
