Predicting Conditional Branches With Fusion-Based Hybrid Predictors

This research was funded by NSF Grant MIP-9702281

The Branch Prediction Problem
[Diagram: pipeline from PC through Compute to Branch resolution]
• 1 out of 5 instructions is a branch
• May require many cycles to resolve
  • P4 has 20 cycle branch resolution pipeline
  • Future pipeline depths likely to increase [Sprangle02]
• Predict branches to keep pipeline full

Bigger Predictors = More Accurate (but bigger predictors = slower)
• Larger predictors tend to yield more accurate predictions
• Faster cycle times force smaller branch predictors
• Overriding predictor couples small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
  • performs close to ideal large-fast predictor

Hybrid Predictors
• Wide variety of branch prediction algorithms available
• Hybrid combines more than one “stand-alone” or component predictor [McFarling93]
[Diagram: a Meta-Predictor chooses between component predictors P1 and P2 to produce the Final Prediction]
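The selection scheme on this slide can be sketched as follows. This is a minimal illustration, not the exact design in [McFarling93]: the class name `SelectionHybrid`, the table size, the 2-bit counter encoding, and the update-on-disagreement rule are all assumptions made for the example.

```python
class SelectionHybrid:
    """Two-component hybrid: a meta-predictor picks P1 or P2 per branch."""

    def __init__(self, meta_entries=1024):
        # Hypothetical 2-bit saturating counters: 0-1 favor P1, 2-3 favor P2.
        self.meta = [1] * meta_entries

    def _index(self, pc):
        # Assumed indexing: low-order bits of the branch address.
        return pc % len(self.meta)

    def predict(self, pc, p1, p2):
        """p1 and p2 are the component predictions (True = taken)."""
        return p2 if self.meta[self._index(pc)] >= 2 else p1

    def update(self, pc, p1, p2, outcome):
        # Train the meta-counter only when the components disagree.
        if p1 == p2:
            return
        i = self._index(pc)
        if p2 == outcome:
            self.meta[i] = min(3, self.meta[i] + 1)
        else:
            self.meta[i] = max(0, self.meta[i] - 1)
```

Note that the final prediction is always a copy of one component's prediction, so when every component is wrong the hybrid is wrong as well.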

Multi-Hybrids
[Diagrams: “Quad-Hybrid” [Evers00] with components P1 to P4 and meta-predictors M1, M2, M3; “Multi-Hybrid” [Evers96] with components P1, P2, … Pn and a priority encoder; each produces a Final Prediction]

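Our Idea: Prediction Fusion
[Diagrams: Prediction Selection, where all but one of the component predictions P1, P2, P3, … Pn are discarded (X); Prediction Fusion, where all of P1, P2, P3, … Pn feed the final prediction]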

Early Attempt from ML
• Weighted Majority algorithm [LW94]
• Better predictors get assigned larger weights
• Make final prediction with larger sum
• Predictor with largest weight not always correct
[Figure: weights of P1 to P8, with sums 0.487 and 0.513. P2, P6 and P7 say “not-taken”; P1, P3, P4, P5 and P8 say “taken”]
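The Weighted Majority idea on this slide can be sketched as below. The function names, the tie-break toward taken, and the penalty factor `beta = 0.5` are assumptions for illustration; the slide does not give the parameters that were used.

```python
def weighted_majority_predict(weights, predictions):
    """Return True (taken) if the taken side has the larger weight sum."""
    taken = sum(w for w, p in zip(weights, predictions) if p)
    not_taken = sum(w for w, p in zip(weights, predictions) if not p)
    # Assumed tie-break: predict taken when the sums are equal.
    return taken >= not_taken


def weighted_majority_update(weights, predictions, outcome, beta=0.5):
    """Multiply the weight of every predictor that was wrong by beta."""
    return [w * beta if p != outcome else w
            for w, p in zip(weights, predictions)]
```

For example, with three equally weighted predictors saying taken, not-taken, not-taken, the vote is not-taken; if the branch is in fact taken, the two wrong predictors are halved and the next identical vote is a tie.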

Outline
• COLT Predictor
• Choosing parameters and components
• Performance
• Prediction distributions, component choice

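COLT Organization
[Diagram: the Branch Address and Branch History select a Mapping Table in the VMT; the component predictions P1, P2, P3, … Pn (e.g. 1 0 1 … 0) select an entry in it, which gives the Final Prediction]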

Pathological Example
[Diagram: P1, P2 and P3 all predict 0; actual outcome = 1 (taken)]

Example (cont’d)
• Selection: P1, P2 and P3 all say 0, so whichever one is selected, the outcome is always wrong
• COLT: the VMT entry for component predictions 0 0 0 can map to 1, so COLT can recognize and remember this pattern
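A minimal sketch of a COLT-style lookup and update, applied to the pathological example. The deck states that branch address and history bits pick a mapping table and that the component predictions pick an entry; everything else here (the name `ColtSketch`, the bit concatenation, 4-bit counters, the initial counter value) is an assumption for illustration.

```python
class ColtSketch:
    def __init__(self, n_components, addr_bits=2, hist_bits=2, counter_bits=4):
        self.addr_bits = addr_bits
        self.hist_bits = hist_bits
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)
        # One counter per combination of component predictions,
        # initialized just below the taken threshold (assumed).
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]

    def _locate(self, pc, history, preds):
        addr = pc & ((1 << self.addr_bits) - 1)
        hist = history & ((1 << self.hist_bits) - 1)
        table = (addr << self.hist_bits) | hist
        entry = 0
        for p in preds:  # P1 becomes the most significant bit
            entry = (entry << 1) | int(p)
        return table, entry

    def predict(self, pc, history, preds):
        t, e = self._locate(pc, history, preds)
        return self.vmt[t][e] >= self.threshold

    def update(self, pc, history, preds, outcome):
        t, e = self._locate(pc, history, preds)
        c = self.vmt[t][e]
        self.vmt[t][e] = min(self.max_count, c + 1) if outcome else max(0, c - 1)


colt = ColtSketch(n_components=3)
all_wrong = (False, False, False)      # P1, P2, P3 all say not-taken
first = colt.predict(0x40, 0b01, all_wrong)
colt.update(0x40, 0b01, all_wrong, True)
second = colt.predict(0x40, 0b01, all_wrong)
```

Here `first` is not-taken, but after a single update the 0 0 0 entry crosses the threshold and `second` is taken, which a selector over the same three components could never produce.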

COLT Lookup Delay
[Timing diagram over time: component predictors P1, P2, … Pn; MT Select; critical delay; Prediction]

Design Choices
• # of branch address bits and # of branch history bits: determine the number of mapping tables
• # of components: determines the size of individual MTs
• Choice of components
  • gshare, PAs, gskewed, …
  • History length, PHT size, …
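The sizing relationships above can be worked through in a few lines. This sketch assumes one counter per combination of component predictions (so a mapping table has 2^n entries for n components) and that the address and history bits are simply concatenated; the function name and the example parameter values are hypothetical, not configurations from the deck.

```python
def colt_vmt_size(addr_bits, hist_bits, n_components, counter_bits):
    """Return (number of mapping tables, entries per table, total bits)."""
    n_tables = 2 ** (addr_bits + hist_bits)
    entries_per_table = 2 ** n_components
    total_bits = n_tables * entries_per_table * counter_bits
    return n_tables, entries_per_table, total_bits


# Illustrative values only: 4 address bits, 4 history bits,
# 4 components, 4-bit counters.
tables, entries, bits = colt_vmt_size(4, 4, 4, 4)
kilobytes = bits / 8 / 1024
```

With these illustrative values there are 256 mapping tables of 16 entries each, 16384 bits or 2 KB in total; the component predictors themselves are counted separately.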

Predictor Components
• Global History
  • gshare [McFarling93]
  • Bi-Mode [Lee97]
  • Enhanced gskewed [Michaud97]
  • YAGS [Eden98]
• Local History
  • PAs [Yeh94]
  • pskewed [Evers96]
• Other
  • 2bC (bimodal) [Smith81]
  • Loop [Chang95]
  • alloyed Perceptron [Jiménez02]
• History lengths optimized on test data sets
• Total of 59 configurations
• Sizes vary up to 64KB

Huge Search Space
• 2^59 ways to choose components
• […] ways to choose COLT parameters
• We use a genetic search
• Gene format: one bit per component (bit-k = 0 means don’t include Pk, bit-k = 1 means do include Pk), plus fields for VMT size and history length
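One way to picture the gene format is as a bit list that is decoded into a component subset plus two parameter fields. The field order, the field widths, the helper names, and the bit-flip mutation operator below are all assumptions; the deck only says that bit-k selects Pk and that the gene also encodes VMT size and history length.

```python
import random


def _bits_to_int(field):
    """Interpret a list of 0/1 values as an unsigned integer (MSB first)."""
    value = 0
    for b in field:
        value = (value << 1) | b
    return value


def decode_gene(bits, n_components, vmt_bits, hist_bits):
    """Split a gene into (included components, VMT size field, history field)."""
    if len(bits) != n_components + vmt_bits + hist_bits:
        raise ValueError("gene length does not match the field widths")
    # Components are numbered from 1 to match P1 ... Pn on the slide.
    included = [k + 1 for k in range(n_components) if bits[k] == 1]
    vmt_field = bits[n_components:n_components + vmt_bits]
    hist_field = bits[n_components + vmt_bits:]
    return included, _bits_to_int(vmt_field), _bits_to_int(hist_field)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


# Toy gene: 3 components, 2-bit VMT size field, 3-bit history length field.
gene = [1, 0, 1, 1, 0, 0, 1, 1]
decoded = decode_gene(gene, n_components=3, vmt_bits=2, hist_bits=3)
child = mutate(gene, rate=0.1, rng=random.Random(0))
```

In this toy gene, `decoded` selects P1 and P3 with field values 2 and 3; how those field values map to an actual VMT size and history length is left open here.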

Methodology
• SPEC2000 integer benchmarks
  • For tuning/optimization: 10M branches from the test input set
  • For evaluation: 500M branches from the train input set
  • Skipped first 100M branches
• Compiled with cc -arch ev6 -O4 -fast -non_shared
• SimpleScalar simulator
  • sim-safe for trace collection
  • MASE for ILP simulations

Genetic Search COLT Results

Overall Predictor Performance

Per-Benchmark Performance

ILP Performance
• Simulated CPU:
  • 6-issue
  • 20 cycle pipeline
  • Same functional units, latencies, caches as Intel P4/NetBurst microarchitecture
• Predictor configurations: 1-cycle 2bC + 4-cycle alpct, OR 1-cycle 2bC + 4-cycle COLT, OR ideal 1-cycle COLT

ILP Impact

COLT Parameter Sensitivity
• Mapping table counter widths
• Number of mapping tables
• Number of history bits for VMT index

Counter Width

VMT Size

History Length

Explaining Choice of Components
• Parameter sensitivity results show that the GA performed well for the COLT parameters
• Why did it choose the component predictors that it did?

Classifying COLT Predictions
• We examined the b (32KB) COLT config.
• For each mapping table lookup, we examine the neighboring entries
[Diagram: component predictions P1 P2 P3 P4 = 1 0 0 1 select entry 1001 = T; neighboring entries shown include entry 0001 = NT and entry 1101 = T]

Classifying Predictions (cont’d)
32KB COLT: gshare (9), gshare (14), PAs (7), alpct (34/10)
Classes:
• easy: all neighboring entries agree
• short: only gshare(9) distinguishes
• long: only gshare(14) distinguishes
• local: only PAs(7) distinguishes
• perceptron: only alpct(34/10) distinguishes
• multi-length: mix of gshare(9), (14) or alpct
• mixed: both global and local components
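The classification can be sketched as below under one reading of the slides, which is an assumption: a neighboring entry is the one reached by flipping a single component's prediction bit, and a component "distinguishes" when that neighbor gives a different direction than the looked-up entry. The P1 to P4 ordering of the components, the handling of alpct in the multi-length class, and all names are likewise assumed.

```python
# Assumed order: P1 = gshare(9), P2 = gshare(14), P3 = PAs(7), P4 = alpct.
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
SINGLE_CLASS = {
    "gshare(9)": "short",
    "gshare(14)": "long",
    "PAs(7)": "local",
    "alpct(34/10)": "perceptron",
}


def classify_lookup(table, entry):
    """table[i] is the direction of entry i (True = taken)."""
    n = len(COMPONENTS)
    distinguishing = set()
    for k, name in enumerate(COMPONENTS):
        # Flip component k's bit; P1 is the most significant bit.
        neighbor = entry ^ (1 << (n - 1 - k))
        if table[neighbor] != table[entry]:
            distinguishing.add(name)
    if not distinguishing:
        return "easy"
    if len(distinguishing) == 1:
        return SINGLE_CLASS[next(iter(distinguishing))]
    if "PAs(7)" in distinguishing:
        return "mixed"
    return "multi-length"


# Entry 1001 = T and entry 0001 = NT as on the previous slide;
# all other entries are assumed taken for this example.
table = [True] * 16
table[0b0001] = False
label = classify_lookup(table, 0b1001)
```

Under these assumptions only flipping P1 changes the result, so `label` is "short".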

Prediction Classifications

Related Work/Issues
• Alloyed history [Skadron00]
• Variable path history length [Stark98]
• Dynamic history length fitting [Juan98]
• Interference reduction [lots…]
• COLT handles all of these cases*
  • Doesn’t support partial update policies

Open Research
• Better individual components
  • Augment with SBI [Manne99], agree [Sprangle97]
• Better fusion algorithms
• Hybrid fusion/selection algorithms
• Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)

Summary
• Fusion is more powerful than selection
  • Combines multiple sources of information
• Branch behavior is very varied
  • Need long, short, global and local histories, multiple simultaneous lengths and types of history
• COLT is one possible fusion-based predictor
  • Combines multiple types of information
  • Current “best” purely dynamic predictor*

Questions?
