Predicting Conditional Branches With Fusion-Based Hybrid Predictors


Presentation Transcript


  1. Predicting Conditional Branches With Fusion-Based Hybrid Predictors This research was funded by NSF Grant MIP-9702281

  2. The Branch Prediction Problem [Figure: PC → compute → branch resolution] • Roughly 1 in 5 instructions is a branch • A branch may require many cycles to resolve • The P4 has a 20-cycle branch-resolution pipeline • Future pipeline depths are likely to increase [Sprangle02] • Predict branches to keep the pipeline full

  3. Bigger Predictors = More Accurate (but Bigger Predictors = Slower) • Larger predictors tend to yield more accurate predictions • Faster cycle times force smaller branch predictors • An overriding predictor couples a small, fast predictor with a large, multi-cycle predictor [Jiménez2000] • It performs close to an ideal large, fast predictor
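To make the overriding trade-off concrete, here is a minimal cost model in Python. It is a sketch under stated assumptions, not the mechanism described in [Jiménez2000]: fetch follows the quick prediction, a disagreement from the slow predictor costs a refetch bubble of `slow_latency` cycles, and a wrong final prediction costs `mispredict_penalty` cycles. The function name and both defaults (loosely borrowed from the 4-cycle predictors and 20-cycle pipeline mentioned in these slides) are illustrative.

```python
from typing import Iterable, Tuple


def overriding_cost(events: Iterable[Tuple[bool, bool, bool]],
                    slow_latency: int = 4,
                    mispredict_penalty: int = 20) -> int:
    """Total stall cycles under a simplified overriding-predictor model.

    Each event is (quick_pred, slow_pred, outcome). Assumed model:
    - fetch proceeds down the path chosen by the 1-cycle quick predictor;
    - if the slow predictor disagrees, it overrides and fetch restarts,
      costing slow_latency cycles;
    - if the final (slow) prediction is wrong, the full misprediction
      penalty is paid when the branch resolves.
    """
    cycles = 0
    for quick, slow, outcome in events:
        if quick != slow:
            cycles += slow_latency        # override bubble
        if slow != outcome:
            cycles += mispredict_penalty  # resolved misprediction
    return cycles
```

For example, `overriding_cost([(True, True, True), (False, True, True), (True, True, False)])` charges 0, then 4 (override), then 20 (misprediction), for 24 cycles. The model shows why the pair approaches an ideal large, fast predictor: when the two predictors usually agree, the slow predictor's latency is rarely exposed.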

  4. Hybrid Predictors • A wide variety of branch prediction algorithms is available • A hybrid combines more than one “stand-alone” or component predictor [McFarling93] [Figure: a meta-predictor selects between components P1 and P2 to produce the final prediction]
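A selection-based hybrid can be sketched as a table of 2-bit saturating counters that learns which of two components to trust. This is a minimal Python sketch in the spirit of [McFarling93], assuming the meta-predictor is indexed only by low branch-address bits; the class and parameter names are hypothetical.

```python
class TournamentChooser:
    """Meta-predictor choosing between two components, P1 and P2.

    Assumption: a table of 2-bit saturating counters indexed by the low
    bits of the branch address. Counter >= 2 selects P2, otherwise P1.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly favoring P1

    def predict(self, pc: int, p1: bool, p2: bool) -> bool:
        # Selection: the final prediction is exactly one component's output.
        return p2 if self.counters[pc & self.mask] >= 2 else p1

    def update(self, pc: int, p1: bool, p2: bool, outcome: bool) -> None:
        i = pc & self.mask
        # Train only when exactly one component was correct.
        if p1 == outcome and p2 != outcome:
            self.counters[i] = max(0, self.counters[i] - 1)
        elif p2 == outcome and p1 != outcome:
            self.counters[i] = min(3, self.counters[i] + 1)
```

Note that when both components are wrong, no setting of the counter can produce the correct answer. That limitation of selection motivates the fusion approach later in the talk.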

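  5. Multi-Hybrids [Figure: a tree of meta-predictors (M1 selects between P1 and P2, M2 between P3 and P4, and M3 between their outputs), and a priority encoder over components P1 … Pn, each producing a final prediction; labeled “Multi-Hybrid” [Evers96] and “Quad-Hybrid” [Evers00]]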
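The priority-encoder style of multi-hybrid can be sketched with one 2-bit counter per component. This minimal Python sketch models a single branch and assumes one plausible update rule: if some component whose counter is saturated was correct, the counters of incorrect components are decremented; otherwise the counters of correct components are incremented. Treat the rule, the initial value of 3, and all names as assumptions rather than the exact design in [Evers96].

```python
from typing import List


class PriorityEncoderHybrid:
    """Simplified multi-hybrid selector for a single branch.

    Components are listed in priority order (P1 first). The first component
    whose 2-bit counter is saturated (3) supplies the final prediction.
    """

    def __init__(self, n_components: int) -> None:
        self.counters = [3] * n_components  # assumed initial value

    def predict(self, preds: List[bool]) -> bool:
        for c, p in zip(self.counters, preds):
            if c == 3:
                return p
        return preds[-1]  # fallback; unreachable while some counter stays at 3

    def update(self, preds: List[bool], outcome: bool) -> None:
        top_correct = any(c == 3 and p == outcome
                          for c, p in zip(self.counters, preds))
        for k, p in enumerate(preds):
            if top_correct and p != outcome:
                self.counters[k] = max(0, self.counters[k] - 1)
            elif not top_correct and p == outcome:
                self.counters[k] = min(3, self.counters[k] + 1)
```

Under this rule a correct component at 3 is never decremented, and no component at 3 is decremented when none of them was correct, so at least one counter always remains saturated and the encoder always has a candidate.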

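  6. Our Idea: Prediction Fusion [Figure: prediction selection uses the output of one of P1 … Pn and discards the rest (marked X); prediction fusion combines the outputs of all of P1 … Pn into the final prediction]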

  7. Early Attempt from ML • Weighted Majority algorithm [LW94] • Better predictors are assigned larger weights • The final prediction is the direction with the larger weight sum • The predictor with the largest weight is not always correct [Figure: P1, P3, P4, P5 and P8 say “taken”; P2, P6 and P7 say “not-taken”; the two weight sums shown are 0.487 and 0.513]
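The Weighted Majority rule from the slide can be sketched in a few lines of Python. This is a minimal version of the standard algorithm: every weight starts at 1, the final prediction is the direction with the larger weight sum, and each component that mispredicts has its weight multiplied by a penalty factor `beta`. The class name, the default `beta = 0.5`, and breaking ties toward "taken" are assumptions for illustration.

```python
from typing import List


class WeightedMajority:
    """Weighted Majority over n binary predictors (True = taken)."""

    def __init__(self, n: int, beta: float = 0.5) -> None:
        self.w = [1.0] * n   # all components start equally trusted
        self.beta = beta     # multiplicative penalty for a wrong prediction

    def predict(self, preds: List[bool]) -> bool:
        taken = sum(w for w, p in zip(self.w, preds) if p)
        not_taken = sum(w for w, p in zip(self.w, preds) if not p)
        return taken >= not_taken  # tie broken toward taken (assumption)

    def update(self, preds: List[bool], outcome: bool) -> None:
        # Shrink the weight of every component that mispredicted.
        self.w = [w * self.beta if p != outcome else w
                  for w, p in zip(self.w, preds)]
```

With predictions `[True, False, False]`, the initial vote is not-taken (1 vs. 2); after two taken outcomes the weights become `[1.0, 0.25, 0.25]` and the vote flips to taken. The final prediction is still always one of the two directions the components voted for, so if every component is wrong, the vote is wrong too.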

  8. Outline • COLT Predictor • Choosing parameters and components • Performance • Prediction distributions, component choice

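  9. COLT Organization [Figure: the branch address and branch history select one mapping table from the VMT; the component predictions of P1 … Pn (e.g., 1 0 1 … 0) index an entry of that mapping table, which supplies the final prediction]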
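The organization on this slide can be sketched as a table lookup in two steps. This minimal Python sketch assumes that the VMT index is simply the low address bits concatenated with the low history bits (no hashing), that the component predictions are concatenated into the mapping-table index with P1 as the most significant bit, and that each entry is an n-bit saturating counter initialized to weakly not-taken. The class and method names are hypothetical.

```python
from typing import List


class COLT:
    """Sketch of fusion through a vector of mapping tables (VMT)."""

    def __init__(self, n_components: int, addr_bits: int, hist_bits: int,
                 counter_bits: int = 3) -> None:
        self.n = n_components
        self.addr_bits = addr_bits
        self.hist_bits = hist_bits
        self.cmax = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)  # counter >= threshold means taken
        n_tables = 1 << (addr_bits + hist_bits)
        # One counter per combination of component predictions in each MT.
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]

    def _mt(self, pc: int, history: int) -> List[int]:
        a = pc & ((1 << self.addr_bits) - 1)
        h = history & ((1 << self.hist_bits) - 1)
        return self.vmt[(a << self.hist_bits) | h]  # assumed concatenation

    def _entry(self, preds: List[bool]) -> int:
        assert len(preds) == self.n, "one prediction per component"
        idx = 0
        for p in preds:  # P1 ends up as the most significant bit
            idx = (idx << 1) | int(p)
        return idx

    def predict(self, pc: int, history: int, preds: List[bool]) -> bool:
        return self._mt(pc, history)[self._entry(preds)] >= self.threshold

    def update(self, pc: int, history: int, preds: List[bool],
               outcome: bool) -> None:
        mt = self._mt(pc, history)
        e = self._entry(preds)
        mt[e] = min(self.cmax, mt[e] + 1) if outcome else max(0, mt[e] - 1)
```

Because the final prediction comes from a counter rather than from any single component, an entry can learn any mapping from the vector of component predictions to a direction, including mappings that no selector could express.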

  10. Pathological Example • P1, P2 and P3 all predict 0 (not-taken) • Actual outcome = 1 (taken)

  11. Example (cont’d) • Selection: whichever of P1, P2, P3 is selected predicts 0, so the outcome is always wrong • COLT: the mapping-table entry indexed by P1 P2 P3 = 0 0 0 can learn to predict 1, so COLT can recognize and remember this pattern
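The pathological case can be replayed directly. This small, self-contained Python script assumes a single mapping table of 2-bit counters starting at weakly not-taken; the variable names are illustrative. It checks that every possible selection is wrong, while the fused entry for prediction vector 000 learns to say taken after one update.

```python
# Pathological case: P1, P2, P3 all predict 0 (not-taken), outcome is 1.
preds = (0, 0, 0)
outcome = 1

# Selection can only output one component's prediction, so every choice is wrong.
selection_correct = [preds[k] == outcome for k in range(3)]

# Fusion: one mapping table of 2-bit counters indexed by the 3 prediction bits.
mt = [1] * 8  # counters start weakly not-taken (assumption)
idx = (preds[0] << 2) | (preds[1] << 1) | preds[2]

fusion_history = []
for _ in range(3):
    fusion_history.append(int(mt[idx] >= 2))  # predict from the current counter
    mt[idx] = min(3, mt[idx] + 1) if outcome else max(0, mt[idx] - 1)

print(selection_correct)  # [False, False, False]
print(fusion_history)     # [0, 1, 1]
```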

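  12. COLT Lookup Delay [Figure: timeline in which the MT select proceeds in parallel with the component predictors P1 … Pn; their predictions (e.g., 1 0 0 1 1 …) then index the selected MT to produce the prediction; the critical delay is marked]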

  13. Design Choices • Number of branch address bits and number of branch history bits: together these determine the number of mapping tables • Number of components: determines the size of each individual MT • Choice of components: gshare, PAs, gskewed, …, along with their history lengths, PHT sizes, …
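The sizing relationships on this slide reduce to simple arithmetic. Assuming that the VMT index uses all address and history bits directly (so there are 2^(a+h) mapping tables) and that each MT has one counter per combination of the n component predictions, the mapping-table storage is 2^(a+h) · 2^n · w bits for w-bit counters. This Python sketch computes it; the function name and the example parameters are hypothetical, and the component predictors' own storage is not included.

```python
def colt_mt_storage_bits(addr_bits: int, hist_bits: int,
                         n_components: int, counter_bits: int) -> int:
    """Mapping-table storage in bits (component predictors not included)."""
    n_tables = 2 ** (addr_bits + hist_bits)   # set by address + history bits
    entries_per_table = 2 ** n_components     # set by the number of components
    return n_tables * entries_per_table * counter_bits
```

For example, 6 address bits, 4 history bits, 4 components and 3-bit counters give 1024 · 16 · 3 = 49,152 bits, or 6KB. Each added component doubles every MT, which is one reason the choice of components matters so much.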

  14. Predictor Components • Global history: gshare [McFarling93], Bi-Mode [Lee97], enhanced gskewed [Michaud97], YAGS [Eden98] • Local history: PAs [Yeh94], pskewed [Evers96] • Other: 2bC (bimodal) [Smith81], Loop [Chang95], alloyed perceptron [Jiménez02] • History lengths optimized on the test data sets • Total of 59 configurations, with sizes up to 64KB
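As an example of one global-history component, here is a minimal gshare sketch in Python following the well-known scheme of [McFarling93]: the branch address is XORed with the global history register to index a table of 2-bit counters. For simplicity it assumes that the history length equals the index width and that the PC is used without dropping alignment bits; the class name is hypothetical.

```python
class Gshare:
    """gshare: PC XOR global history indexes a table of 2-bit counters."""

    def __init__(self, index_bits: int) -> None:
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # 2-bit counters, weakly not-taken
        self.ghr = 0                        # global history, newest outcome in bit 0

    def _index(self, pc: int) -> int:
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        return self.pht[self._index(pc)] >= 2

    def update(self, pc: int, outcome: bool) -> None:
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if outcome else max(0, self.pht[i] - 1)
        self.ghr = ((self.ghr << 1) | int(outcome)) & self.mask
```

A single branch that alternates taken/not-taken maps to two different counters once the history fills, so after a short warm-up this sketch predicts it perfectly, a pattern a bimodal (2bC) counter alone cannot capture.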

  15. Huge Search Space • 2⁵⁹ ways to choose components • The choices of COLT parameters multiply this count further • We use a genetic search [Figure: gene format: one bit per component (bit k = 1 means include Pk, bit k = 0 means don’t include Pk), followed by fields for the VMT size and the history length]
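A genetic search over this gene format can be sketched as follows. This is a minimal Python illustration, not the authors' search: it assumes truncation selection, uniform crossover, bit-flip mutation, and hypothetical ranges for the VMT-size and history-length fields. The fitness function is a toy stand-in; in the study it would be measured prediction accuracy on the tuning traces.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

N_COMPONENTS = 59  # one gene bit per candidate component configuration
MAX_VMT_BITS = 10  # hypothetical upper bound for the VMT-size field
MAX_HIST_LEN = 8   # hypothetical upper bound for the history-length field


@dataclass(frozen=True)
class Gene:
    include: Tuple[int, ...]  # include[k] == 1 means component Pk is used
    vmt_bits: int             # hypothetical encoding: log2 of the VMT size
    hist_len: int             # history bits used in the VMT index


def random_gene(rng: random.Random) -> Gene:
    return Gene(tuple(rng.randint(0, 1) for _ in range(N_COMPONENTS)),
                rng.randint(0, MAX_VMT_BITS), rng.randint(0, MAX_HIST_LEN))


def clamp(x: int, hi: int) -> int:
    return max(0, min(hi, x))


def mutate(g: Gene, rng: random.Random, p: float = 0.02) -> Gene:
    # Flip each inclusion bit with probability p; nudge integer fields by at most 1.
    include = tuple(b ^ int(rng.random() < p) for b in g.include)
    return Gene(include,
                clamp(g.vmt_bits + rng.choice((-1, 0, 1)), MAX_VMT_BITS),
                clamp(g.hist_len + rng.choice((-1, 0, 1)), MAX_HIST_LEN))


def crossover(a: Gene, b: Gene, rng: random.Random) -> Gene:
    # Uniform crossover on inclusion bits; each integer field comes from one parent.
    include = tuple(x if rng.random() < 0.5 else y
                    for x, y in zip(a.include, b.include))
    return Gene(include, rng.choice((a.vmt_bits, b.vmt_bits)),
                rng.choice((a.hist_len, b.hist_len)))


def evolve(fitness: Callable[[Gene], float], generations: int = 30,
           pop_size: int = 20, seed: int = 0) -> Gene:
    rng = random.Random(seed)
    pop = [random_gene(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best gene
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(g: Gene) -> float:
    # Hypothetical stand-in for accuracy on tuning traces: prefer 4 components,
    # vmt_bits = 6 and hist_len = 3.
    return -(abs(sum(g.include) - 4) + abs(g.vmt_bits - 6) + abs(g.hist_len - 3))
```

Because the best half of each generation survives unchanged, the best fitness never decreases from one generation to the next. Evaluating a gene is the expensive part, since each one requires simulating a full COLT configuration, which is why tuning used short traces.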

  16. Methodology • SPEC2000 integer benchmarks • For tuning/optimization: 10M branches from the test inputs • For evaluation: 500M branches from the train inputs • Skipped the first 100M branches • Compiled with cc -arch ev6 -O4 -fast -non_shared • SimpleScalar simulator • sim-safe for trace collection • MASE for ILP simulations

  17. Genetic Search COLT Results

  18. Overall Predictor Performance

  19. Per-Benchmark Performance

  20. ILP Performance • Simulated CPU: 6-issue, 20-cycle pipeline, with the same functional units, latencies and caches as the Intel P4/NetBurst microarchitecture [Figure: predictor configurations compared: 1-cycle 2bC + 4-cycle OR alpct; 1-cycle 2bC + 4-cycle OR COLT; ideal 1-cycle COLT]

  21. ILP Impact

  22. COLT Parameter Sensitivity • Mapping table counter widths • Number of mapping tables • Number of history bits for VMT index

  23. Counter Width

  24. VMT Size

  25. History Length

  26. Explaining the Choice of Components • The parameter-sensitivity results show that the GA chose the COLT parameters well • Why did it choose the component predictors that it did?

  27. Classifying COLT Predictions • We examined the b (32KB) COLT configuration • For each mapping-table lookup, we examine the neighboring entries [Figure: for component predictions P1 P2 P3 P4 = 1 0 0 1, the lookup reads entry 1001 = T and compares it with neighboring entries such as entry 0001 = NT and entry 1101 = T]

  28. Classifying Predictions (cont’d) • 32KB COLT components: gshare (9), gshare (14), PAs (7), alpct (34/10) • Classes: • easy: all neighboring entries agree • short: only gshare (9) distinguishes • long: only gshare (14) distinguishes • local: only PAs (7) distinguishes • perceptron: only alpct (34/10) distinguishes • multi-length: a mix of gshare (9), gshare (14) or alpct • mixed: both global and local components
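The classification can be sketched as a neighbor test on the mapping table. In this minimal Python sketch, a component "distinguishes" a lookup if flipping only its prediction bit selects a neighboring entry that predicts the opposite direction; this is our reading of the slides, not necessarily the authors' exact procedure. It assumes P1 through P4 are gshare (9), gshare (14), PAs (7) and alpct (34/10) in that order, with P1 as the most significant index bit; all names are hypothetical.

```python
from typing import List, Sequence

# Assumed component order: P1..P4, with P1 as the most significant index bit.
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
SINGLE = {"gshare(9)": "short", "gshare(14)": "long",
          "PAs(7)": "local", "alpct(34/10)": "perceptron"}
MULTI_LENGTH = {"gshare(9)", "gshare(14)", "alpct(34/10)"}


def distinguishing(mt_pred: Sequence[bool], entry: int) -> List[str]:
    """Components whose single-bit flip changes the predicted direction."""
    n = len(COMPONENTS)
    found = []
    for k, name in enumerate(COMPONENTS):
        neighbor = entry ^ (1 << (n - 1 - k))  # flip only component k's bit
        if mt_pred[neighbor] != mt_pred[entry]:
            found.append(name)
    return found


def classify(mt_pred: Sequence[bool], entry: int) -> str:
    d = distinguishing(mt_pred, entry)
    if not d:
        return "easy"
    if len(d) == 1:
        return SINGLE[d[0]]
    if set(d) <= MULTI_LENGTH:
        return "multi-length"
    return "mixed"  # PAs(7) together with at least one other component
```

For the slide's lookup of entry 1001 = T with entry 0001 = NT and entry 1101 = T, and assuming the unlisted neighbors also predict taken, only gshare (9) distinguishes, so `classify` returns `"short"`.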

  29. Prediction Classifications

  30. Related Work/Issues • Alloyed history [Skadron00] • Variable path history length [Stark98] • Dynamic history length fitting [Juan98] • Interference reduction [lots…] • COLT handles all of these cases* • Doesn’t support partial update policies

  31. Open Research • Better individual components • Augment with SBI [Manne99], agree [Sprangle97] • Better fusion algorithms • Hybrid fusion/selection algorithms • Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)

  32. Summary • Fusion is more powerful than selection • Combines multiple sources of information • Branch behavior is very varied • Need long, short, global and local histories, multiple simultaneous lengths and types of history • COLT is one possible fusion-based predictor • Combines multiple types of information • Current “best” purely dynamic predictor*

  33. Questions?