Predicting Conditional Branches With Fusion-Based Hybrid Predictors
This research was funded by NSF Grant MIP-9702281.
The Branch Prediction Problem
[Figure: pipeline from PC compute to branch resolution]
• 1 out of 5 instructions is a branch
• A branch may require many cycles to resolve
• P4 has a 20-cycle branch resolution pipeline
• Future pipeline depths are likely to increase [Sprangle02]
• Predict branches to keep the pipeline full
Bigger Predictors = More Accurate (but Bigger Predictors = Slower)
• Larger predictors tend to yield more accurate predictions
• Faster cycle times force smaller branch predictors
• An overriding predictor couples a small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
  • Performs close to an ideal large, fast predictor
Hybrid Predictors
• Wide variety of branch prediction algorithms available
• A hybrid combines more than one “stand-alone” or component predictor [McFarling93]
[Figure: a meta-predictor chooses between P1 and P2 to produce the final prediction]
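Multi-Hybrids
• “Multi-Hybrid” [Evers96]: components P1, P2, …, Pn feed a priority encoder (Pr. Encoder) that produces the final prediction
• “Quad-Hybrid” [Evers00]: meta-predictor M1 covers P1 and P2, M2 covers P3 and P4, and M3 produces the final prediction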
Our Idea: Prediction Fusion
[Figure: two diagrams over components P1, P2, P3, …, Pn, one labeled “Prediction Selection” and one labeled “Prediction Fusion”]
Early Attempt from ML
• Weighted Majority algorithm [LW94]
• Better predictors get assigned larger weights
• Make the final prediction with the larger weight sum
• The predictor with the largest weight is not always correct
[Figure: eight predictors; P2, P6 and P7 say “not-taken”; P1, P3, P4, P5 and P8 say “taken”; the two weight sums shown are 0.487 and 0.513]
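The Weighted Majority idea on this slide can be sketched in a few lines. This is a minimal illustration, not the configuration used in the talk: the function names, the weights, and the penalty factor `beta` are assumptions chosen only so the example lines up with the slide's split of three "not-taken" voters against five "taken" voters.

```python
def wm_predict(predictions, weights):
    """Return True (taken) if the taken voters carry at least as much weight."""
    taken = sum(w for p, w in zip(predictions, weights) if p)
    not_taken = sum(w for p, w in zip(predictions, weights) if not p)
    return taken >= not_taken


def wm_update(predictions, weights, outcome, beta=0.5):
    """Multiply the weight of every predictor that was wrong by beta (assumed 0.5)."""
    return [w * beta if bool(p) != outcome else w
            for p, w in zip(predictions, weights)]


# P1..P8: P2, P6 and P7 vote not-taken (0), the rest vote taken (1).
votes = [1, 0, 1, 1, 1, 0, 0, 1]
# Hypothetical weights: P2 has the largest single weight but is outvoted.
weights = [0.11, 0.25, 0.11, 0.11, 0.11, 0.10, 0.10, 0.11]

first = wm_predict(votes, weights)            # taken side wins the weighted vote
weights_after = wm_update(votes, weights, outcome=False)
second = wm_predict(votes, weights_after)     # wrong voters were penalized
```

With these made-up weights the highest-weighted predictor (P2) is on the losing side of the first vote, which is the situation the last bullet describes.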
Outline
• COLT Predictor
• Choosing parameters and components
• Performance
• Prediction distributions, component choice
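COLT Organization
[Figure: the branch address and branch history select a mapping table from the VMT; the component predictions P1, P2, P3, …, Pn index into that mapping table, whose entry gives the final prediction]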
Pathological Example
• P1, P2 and P3 all predict 0 (not-taken)
• Actual outcome = 1 (taken)
Example (cont’d)
• Selection: every component predicts 0, so whichever one is selected, the outcome is always wrong
• COLT: the mapping table entry indexed by the pattern 0 0 0 can recognize and remember this pattern and predict 1
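The pathological case can be replayed with a toy mapping table. This is a sketch under stated assumptions: a single mapping table (no address or history selection), 2-bit saturating counters initialized to weakly not-taken, and class and function names invented for illustration.

```python
class MappingTable:
    """2**n saturating counters indexed by the n component predictions."""

    def __init__(self, n_components, counter_bits=2):
        self.max_value = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        # Assumed initial state: weakly not-taken.
        self.counters = [self.threshold - 1] * (1 << n_components)

    @staticmethod
    def index(preds):
        idx = 0
        for p in preds:              # P1 becomes the most significant bit
            idx = (idx << 1) | int(p)
        return idx

    def predict(self, preds):
        return self.counters[self.index(preds)] >= self.threshold

    def update(self, preds, taken):
        i = self.index(preds)
        if taken:
            self.counters[i] = min(self.max_value, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run_pathological(trials=10):
    preds = (0, 0, 0)                # P1, P2, P3 all say not-taken
    outcome = True                   # the branch is actually taken
    table = MappingTable(n_components=3)
    selection_wrong = fusion_wrong = 0
    for _ in range(trials):
        selected = bool(preds[0])    # any selected component says the same
        selection_wrong += selected != outcome
        fusion_wrong += table.predict(preds) != outcome
        table.update(preds, outcome)
    return selection_wrong, fusion_wrong


selection_wrong, fusion_wrong = run_pathological()
```

In this toy run, selection mispredicts on every trial, while the fused table mispredicts only until the counter for pattern 000 crosses the threshold.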
COLT Lookup Delay
[Figure: lookup timeline showing the component lookups P1, P2, …, Pn, the MT select, the critical delay, and the final prediction]
Design Choices
• # of branch address bits and # of branch history bits: determine the number of mapping tables
• # of components: determines the size of individual MTs
• Choice of components
  • gshare, PAs, gskewed, …
  • History length, PHT size, …
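As a rough worked example of how these choices translate into storage, the sketch below assumes the address and history bits are concatenated to select a mapping table (so the table count is 2^(address bits + history bits)) and that each table holds one counter per combination of component predictions. The function name and the parameter values are illustrative assumptions, not the configuration from the talk.

```python
def colt_storage_bits(addr_bits, hist_bits, n_components, counter_bits):
    """Storage of the mapping tables alone, excluding the component predictors."""
    num_tables = 1 << (addr_bits + hist_bits)   # assumed: bits are concatenated
    entries_per_table = 1 << n_components       # one entry per prediction pattern
    return num_tables * entries_per_table * counter_bits


# Hypothetical parameters: 4 address bits, 4 history bits,
# 4 components, 4-bit counters.
bits = colt_storage_bits(addr_bits=4, hist_bits=4, n_components=4, counter_bits=4)
size_in_bytes = bits // 8
```

Under these assumptions, each extra component doubles the size of every mapping table, and each extra address or history bit doubles the number of tables.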
Predictor Components
• Global History
  • gshare [McFarling93]
  • Bi-Mode [Lee97]
  • Enhanced gskewed [Michaud97]
  • YAGS [Eden98]
• Local History
  • PAs [Yeh94]
  • pskewed [Evers96]
• Other
  • 2bC (bimodal) [Smith81]
  • Loop [Chang95]
  • alloyed Perceptron [Jiménez02]
• History lengths optimized on test data sets
• Total of 59 configurations
• Sizes vary up to 64KB
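Huge Search Space
• 2^59 ways to choose components (each of the 59 configurations is either included or not)
• Further ways to choose COLT parameters
• We use a genetic search
• Gene format: one bit per component configuration, plus fields for the VMT size and the history length
  • bit-k = 0 means don’t include Pk
  • bit-k = 1 means do include Pk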
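The gene format can be illustrated with a small decoder. The 59 component bits come from the slide; the placement of the two parameter fields after the component bits and their 4-bit widths are assumptions made only for this sketch, as is the function name.

```python
N_CONFIGS = 59   # number of component configurations, from the slides


def decode_gene(bits, vmt_size_bits=4, hist_len_bits=4):
    """Split a gene (list of 0/1) into chosen components and two parameter fields.

    Field order and widths are hypothetical; only the per-component
    include/exclude bits are described in the slides.
    """
    assert len(bits) == N_CONFIGS + vmt_size_bits + hist_len_bits

    def to_int(field):
        value = 0
        for b in field:
            value = (value << 1) | b
        return value

    components = [k + 1 for k in range(N_CONFIGS) if bits[k] == 1]  # P1..P59
    vmt_field = to_int(bits[N_CONFIGS:N_CONFIGS + vmt_size_bits])
    hist_field = to_int(bits[N_CONFIGS + vmt_size_bits:])
    return components, vmt_field, hist_field


# Example gene: include P1 and P3, VMT field 1010, history field 0011.
gene = [0] * N_CONFIGS
gene[0] = 1
gene[2] = 1
gene += [1, 0, 1, 0] + [0, 0, 1, 1]

components, vmt_field, hist_field = decode_gene(gene)
component_subsets = 2 ** N_CONFIGS   # size of the component-choice space
```

Exhaustively evaluating every component subset is out of reach at this size, which is the motivation the slide gives for a genetic search.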
Methodology
• SPEC2000 integer benchmarks
  • For tuning/optimization: 10M branches from test
  • For evaluation: 500M branches from train
  • Skipped first 100M branches
• Compiled with cc -arch ev6 -O4 -fast -non_shared
• SimpleScalar simulator
  • sim-safe for trace collection
  • MASE for ILP simulations
ILP Performance
• Simulated CPU:
  • 6-issue
  • 20-cycle pipeline
  • Same functional units, latencies, caches as Intel P4/NetBurst microarchitecture
[Figure: ILP results; legend: 1-cycle 2bC + 4-cycle OR alpct, 1-cycle 2bC + 4-cycle OR COLT, ideal 1-cycle COLT]
COLT Parameter Sensitivity
• Mapping table counter widths
• Number of mapping tables
• Number of history bits for VMT index
Explaining Choice of Components
• Parameter sensitivity results show that the GA performed well for the COLT parameters
• Why did it choose the component predictors that it did?
Classifying COLT Predictions
• We examined the 32KB COLT configuration
• For each mapping table lookup, we examine the neighboring entries
[Figure: components P1 P2 P3 P4 predict 1 0 0 1, selecting entry 1001 = T; neighboring entries shown include entry 0001 = NT and entry 1101 = T]
Classifying Predictions (cont’d)
• 32KB COLT components: gshare (9), gshare (14), PAs (7), alpct (34/10)
• Classes:
  • easy: all neighboring entries agree
  • short: only gshare(9) distinguishes
  • long: only gshare(14) distinguishes
  • local: only PAs(7) distinguishes
  • perceptron: only alpct(34/10) distinguishes
  • multi-length: mix of gshare(9), (14) or alpct
  • mixed: both global and local components
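One way to read this classification is sketched below. It rests on several assumptions that the slides do not spell out: a "neighboring entry" is taken to be an entry whose index differs in exactly one component's bit, a component "distinguishes" when flipping its bit changes the entry's direction, the components map to index bits in the order listed (first component is the most significant bit), and only PAs(7) is counted as the local component.

```python
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
SINGLE_CLASS = {
    "gshare(9)": "short",
    "gshare(14)": "long",
    "PAs(7)": "local",
    "alpct(34/10)": "perceptron",
}


def classify(table, index):
    """Classify one lookup; `table` is a list of 16 directions (True = taken)."""
    n = len(COMPONENTS)
    here = table[index]
    distinguishing = []
    for k, name in enumerate(COMPONENTS):
        neighbor = index ^ (1 << (n - 1 - k))   # flip only component k's bit
        if table[neighbor] != here:
            distinguishing.append(name)
    if not distinguishing:
        return "easy"
    if len(distinguishing) == 1:
        return SINGLE_CLASS[distinguishing[0]]
    if "PAs(7)" in distinguishing:
        return "mixed"
    return "multi-length"


# Hypothetical mapping tables, given as the direction stored in each entry.
all_taken = [True] * 16
follows_short = [bool(i & 8) for i in range(16)]
follows_local = [bool(i & 2) for i in range(16)]
two_globals = [bool(i & 8) != bool(i & 4) for i in range(16)]
global_and_local = [bool(i & 8) != bool(i & 2) for i in range(16)]

labels = [classify(t, 0b1001) for t in
          (all_taken, follows_short, follows_local, two_globals, global_and_local)]
```

The lookup index 1001 matches the example on the previous slide; the table contents here are made up to exercise each class.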
Related Work/Issues
• Alloyed history [Skadron00]
• Variable path history length [Stark98]
• Dynamic history length fitting [Juan98]
• Interference reduction [lots…]
• COLT handles all of these cases*
• Doesn’t support partial update policies
Open Research
• Better individual components
  • Augment with SBI [Manne99], agree [Sprangle97]
• Better fusion algorithms
• Hybrid fusion/selection algorithms
• Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)
Summary
• Fusion is more powerful than selection
  • Combines multiple sources of information
• Branch behavior is very varied
  • Need long, short, global and local histories, multiple simultaneous lengths and types of history
• COLT is one possible fusion-based predictor
  • Combines multiple types of information
  • Current “best” purely dynamic predictor*