
Two-level Adaptive Branch Prediction


Presentation Transcript


  1. Two-level Adaptive Branch Prediction Colin Egan University of Hertfordshire Hatfield U.K. c.egan@herts.ac.uk

  2. Presentation Structure • Two-level Adaptive Branch Prediction • Cached Correlated Branch Prediction • Neural Branch Prediction • Conclusion and Discussion • Where next?

  3. Two-level Adaptive Branch Prediction Schemes • First level: • History register(s) record the outcomes of the last k branches encountered. • A global history register records the outcomes of the branches executed leading up to the current branch. • Local (Per-Address) history registers record the past outcomes of individual branches.

  4. Two-level Adaptive Branch Prediction Schemes • Second level: • Is termed the Pattern History Table (PHT). • The PHT consists of at least one array of two-bit up/down saturating counters that provide the prediction.

  5. Two-level Adaptive Branch Prediction Schemes • Global schemes: • Exploit correlation between the outcome of the current branch and neighbouring branches that were executed leading to this branch. • Local schemes: • Exploit correlation between the outcome of the current branch and its past behaviour.
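
  To make the mechanics concrete, here is a minimal Python sketch of a GAg-style predictor: a single k-bit global history register indexing one shared PHT of two-bit up/down saturating counters. The class name and the simple shift-register update are illustrative; the presentation itself gives no code.

    class GAgPredictor:
        def __init__(self, k):
            self.k = k
            self.history = 0                  # k-bit global history register
            self.pht = [1] * (2 ** k)         # two-bit counters, weakly not taken

        def predict(self):
            # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
            return self.pht[self.history] >= 2

        def update(self, taken):
            # Saturating increment/decrement of the selected two-bit counter.
            ctr = self.pht[self.history]
            self.pht[self.history] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
            # Shift the resolved outcome into the global history register.
            self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)

  The other variants differ only in which history register is used and how the PHT entry is selected, as the diagrams on the next slides show.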

  6. Global Two-level Adaptive Branch Prediction • GAg Predictor Implementation [Diagram: the k-bit global HR indexes a single global PHT of 2^k two-bit counters, which provides the prediction; the BTC is indexed by the branch address and holds the br-tag, target address and status.]

  7. Global Two-level Adaptive Branch Prediction • GAp / GAs Predictor Implementation [Diagram: n bits of the branch address select one of several local/set PHTs, which the k-bit global HR then indexes to provide the prediction; the BTC holds the br-tag, target address and status.]

  8. Local Two-level Adaptive Branch Prediction • PAg [Diagram: n bits of the branch address (PC) index the first-level BTC (BHT), whose per-branch history register HRl indexes a single global second-level PHT to provide the prediction; BTC entries hold the br-tag, target address and status.]

  9. Local Two-level Adaptive Branch Prediction • PAs / PAp [Diagram: as PAg, but the per-branch history register HRl from the first-level BTC (BHT) indexes one of several local or set second-level PHTs, selected by n bits of the branch (PC) address.]
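
  The four variants differ only in which history register supplies the pattern and which PHT it indexes. An illustrative helper (a flat-index encoding of my own; bit widths k and n are design parameters) makes the index formation explicit:

    def pht_index(scheme, pc, ghr, lhr, k, n):
        pc_bits = pc & ((1 << n) - 1)        # n low-order bits of the branch address
        if scheme == "GAg":                  # global history, single global PHT
            return ghr
        if scheme in ("GAs", "GAp"):         # global history, per-set/per-address PHT
            return (pc_bits << k) | ghr
        if scheme == "PAg":                  # per-address history, single global PHT
            return lhr
        if scheme in ("PAs", "PAp"):         # per-address history, per-set/per-address PHT
            return (pc_bits << k) | lhr
        raise ValueError(scheme)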

  10. Problems with Two-level Adaptive Branch Prediction • Size of PHT • Increases exponentially as a function of HR length (see the worked example below). • Use of uninitialised predictors • No tag fields are associated with PHT prediction counters. • Branch interference (aliasing) • In GAg and PAg all branches share a common PHT. • In GAs and PAs each PHT is shared by a set of branches.
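
  The exponential growth is easy to quantify: a GAg-style PHT needs 2^k two-bit counters for a k-bit history register, so each extra history bit doubles the cost. For example:

    for k in (16, 26, 30):
        counters = 2 ** k
        kib = counters * 2 / 8 / 1024        # two bits per counter
        print(f"k = {k}: {counters} counters, {kib:.0f} KiB")
    # k = 16: 65536 counters, 16 KiB
    # k = 26: 67108864 counters, 16384 KiB (16 MiB)
    # k = 30: 1073741824 counters, 262144 KiB (256 MiB)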

  11. Cached Correlated Branch Prediction • Minimises the number of initial mispredictions. • Eliminates branch interference. • Is used in a disciplined manner. • Is cost-effective.

  12. Cached Correlated Branch Prediction • Today we are going to look at two types of cached correlated predictors: • A Global Cached Correlated Predictor. • A Local Cached Correlated Predictor. • We have also developed a combined predictor that uses both global and local history information.

  13. Cached Correlated Branch Prediction • The first-level history register remains the same as in conventional two-level predictors. • A second-level Prediction Cache is used instead of a PHT.

  14. Cached Correlated Branch Prediction • Uses a secondary default predictor (BTC). • Both predictors provide a prediction. • A priority selector chooses the actual prediction.

  15. Cached Correlated Branch Prediction • The Prediction Cache predicts from the past behaviour of the branch under the current history register pattern. • The Default Predictor predicts from the overall past behaviour of the branch.

  16. Prediction Cache • Size is not a function of the history register length. • Size is determined by the number of prediction counters that are actually used. • Requires a tag field. • Is cost-effective as long as the cost of the redundant counters removed from a conventional PHT exceeds the cost of the added tags.
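
  Putting slides 13–16 together, here is a minimal sketch of the two-stage lookup. A Python dict stands in for the finite, tagged, LRU-managed Prediction Cache and BTC of the real design; names and the counter-seeding policy are my assumptions.

    class CachedCorrelatedPredictor:
        def __init__(self, k):
            self.k = k
            self.history = 0
            self.cache = {}      # (pc, history) -> two-bit counter (Prediction Cache)
            self.default = {}    # pc -> two-bit counter (default predictor in the BTC)

        def predict(self, pc):
            key = (pc, self.history)
            if key in self.cache:                    # correlated hit has priority
                return self.cache[key] >= 2
            return self.default.get(pc, 1) >= 2      # fall back to the default predictor

        def update(self, pc, taken):
            key = (pc, self.history)
            # Assumption: a new Prediction Cache counter is seeded from the
            # default counter; the presentation does not specify initialisation.
            ctr = self.cache.get(key, self.default.get(pc, 1))
            self.cache[key] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
            dflt = self.default.get(pc, 1)
            self.default[pc] = min(dflt + 1, 3) if taken else max(dflt - 1, 0)
            self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)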

  17. A Global Cached Correlated Branch Predictor [Diagram: the PC indexes the BTC (fields: br_tag, br_trgt, pred, vld, lru); the PC hashed with the global history register indexes the Prediction Cache (fields: br_tag, hrg_tag, pred, vld, lru); a priority selector chooses the final prediction.]

  18. A Local Cached Correlated Branch Predictor • Problem • A local predictor would require two sequential clock accesses: • One to access the BTC to furnish the local history register (HRl). • A second to access the Prediction Cache. • Solution • Cache the next prediction for each branch in the BTC. • Only one clock access is then needed (see the sketch below).
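
  A sketch of the resulting one-cycle lookup, with hypothetical field and method names: the prediction for the branch's next execution is computed off the critical path, when the branch resolves, and written back into its BTC entry.

    class BTCEntry:
        def __init__(self, tag, target):
            self.tag = tag
            self.target = target
            self.hrl = 0                # local history register, kept in the BTC
            self.next_pred = False      # prediction cached for the next execution

    def predict(btc, pc):
        entry = btc.get(pc)             # single BTC access on the critical path
        return entry.next_pred if entry else False

    def resolve(btc, prediction_cache, pc, taken):
        # Off the critical path: update the local history, then precompute
        # and cache the prediction for the branch's next execution.
        entry = btc[pc]
        entry.hrl = (entry.hrl << 1) | int(taken)
        entry.next_pred = prediction_cache.lookup(pc, entry.hrl)  # hypothetical interface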

  19. A Local Cached Correlated Branch Predictor [Diagram: the PC indexes the BTC, which supplies HRl and the default prediction; the PC hashed with HRl indexes the Prediction Cache, which supplies the correlated prediction; a priority selector combines the BTC-hit and correlated-hit signals to choose the actual prediction.]

  20. Simulations • Stanford Integer Benchmark suite. • These benchmarks are difficult to predict. • Instruction traces were obtained from the Hatfield Superscalar Architecture (HSA).

  21. Global Simulations • A comparative study of misprediction rates • A conventional GAg, a conventional GAs(16) and a conventional GAp. • Against a Global Cached Correlated predictor (1K – 64K).

  22. Global Simulation Results

  23. Global Simulation Results • For conventional global two-level predictors the best average misprediction rate of 9.23% is achieved by a GAs(16) predictor with a history register length of 26. • In general there is little benefit from increasing the history register length beyond 16 bits for GAg and 14 bits for GAs/GAp.

  24. Global Simulation Results • A 32K-entry Prediction Cache with a 30-bit history register achieved the best misprediction rate of 5.99% for the global cached correlated predictors. • This represents a 54% reduction over the best misprediction rate achieved by a conventional global two-level predictor.

  25. Global Simulations • We repeated the same simulations without the default predictor.

  26. Global Simulation Results (without default predictor)

  27. Global Simulation Results (without default predictor) • The best misprediction rate is now 9.12%. • The high performance of the cached predictor depends crucially on the provision of the two-stage mechanism.

  28. Local Simulations • A comparative study of misprediction rates. • A conventional PAg, a conventional PAs(16) and a conventional PAp. • Against a local cached correlated predictor (1K – 64K), with and without the default predictor.

  29. Local Simulation Results (with default predictor)

  30. Local Simulation Results (without default predictor)

  31. Local Simulation Results • For conventional local two-level predictors the best average misprediction rate of 7.35% is achieved by a PAp predictor with a history register length of 30. • The best misprediction rate achieved by a local cached correlated predictor (64K, HR = 28) is 6.19%. • This is a 19% improvement over the best conventional local two-level predictor. • However, without the default predictor the best misprediction rate achieved is 8.21% (32K, HR = 12).

  32. Three-Stage Predictor • Since the high performance of a cached predictor depends crucially on the provision of the two-stage mechanism, we were led to develop the three-stage predictor.

  33. Three-Stage Predictor • Stages • Primary Prediction Cache. • Secondary Prediction Cache. • Default Predictor. • The predictions from the two Prediction Caches are stored in the BTC so that a prediction is furnished in a single clock cycle.
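
  A sketch of the three-way priority selection; the function and signal names are hypothetical, but the ordering follows the stage list above.

    def select_prediction(primary_hit, primary_pred,
                          secondary_hit, secondary_pred,
                          default_pred):
        # The Primary Prediction Cache wins on a hit, then the Secondary,
        # and the default predictor supplies the fallback prediction.
        if primary_hit:
            return primary_pred
        if secondary_hit:
            return secondary_pred
        return default_pred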

  34. Three-Stage Predictor Simulations • We repeated the same set of simulations. • We varied the Primary Prediction Cache size (1K – 64K). • The Secondary Prediction Cache was always half the size of the Primary Prediction Cache and used exactly half of the history register bits.

  35. Global Three-Stage Predictor Simulation Results

  36. Global Three-Stage Predictor Simulation Results • The global three-stage predictor consistently outperforms the simpler global two-stage predictor. • The best misprediction rate is now 5.57%, achieved with a 32K Prediction Cache and a 30-bit HR. • This represents a 7.5% improvement over the best global two-stage predictor.

  37. Local Three-Stage Predictor Simulation Results

  38. Local Three-Stage Predictor Simulation Results • The local three-stage predictor consistently outperforms the simpler local two-stage predictor. • The best misprediction rate is now 6.00%, achieved with a 64K Prediction Cache and a 28-bit HR. • This represents a 3.2% improvement over the best local two-stage predictor.

  39. Conclusion So Far • Conventional PHTs require increasingly large amounts of hardware as history register length grows. • The history register length of a cached correlated predictor does not determine its cost. • A Prediction Cache can reduce the hardware cost compared with a conventional PHT.

  40. Conclusion So Far • Cached correlated predictors provide better prediction accuracy than conventional two-level predictors. • The role of the default predictor in a cached correlated predictor is crucial. • Three-stage predictors consistently record a small but significant improvement over their two-stage counterparts.

  41. Neural Network Branch Prediction • Dynamic branch prediction can be considered a specific instance of general time-series prediction. • Two-level Adaptive Branch Prediction is a very specific solution to the branch prediction problem. • An alternative approach is to look to other application areas and fields for novel solutions. • At Hatfield, we have examined the application of neural networks to the branch prediction problem.

  42. Neural Network Branch Prediction • Two neural networks are considered: • A Learning Vector Quantisation (LVQ) Network, • A Backpropagation Network. • One of our main research objectives is to use neural networks to identify new correlations that can be exploited by branch predictors. • We also wish to determine whether more accurate branch prediction is possible and to gain a greater understanding of the underlying prediction mechanisms.

  43. Neural Network Branch Prediction • As with Cached Correlated Branch Prediction, we retain the first level of a conventional two-level predictor. • The k-bit pattern of the history register is fed into the network as input. • In fact, we concatenate the 10 least-significant bits of the branch address with the HR as input to the network (see the sketch below).
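
  A sketch of the input-vector formation, assuming 0/1 components; the helper name is hypothetical.

    def make_input(pc, hr, k):
        # 10 least-significant branch-address bits followed by the k
        # history register bits, as 0/1 components.
        return ([(pc >> i) & 1 for i in range(10)]
                + [(hr >> i) & 1 for i in range(k)])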

  44. LVQ prediction • The idea of using an LVQ predictor was to see if respectable prediction rates could be delivered by a simple LVQ network that was dynamically trained after each branch prediction.

  45. LVQ prediction • The LVQ predictor contains two “codebook” vectors: • Vt, associated with a taken branch. • Vnt, associated with a not-taken branch. • The concatenated PC + HR bits form a single input vector. • We call this vector X.

  46. LVQ prediction • Modified Hamming distances are then computed between X and each of Vt and Vnt. • The winning vector, Vw, is the one with the smallest distance.

  47. LVQ prediction • Vw is used to predict the branch. • If Vt wins then the branch is predicted as taken. • If Vnt wins then the branch is predicted as not taken.

  48. LVQ prediction • LVQ network training • At branch resolution, Vw is adjusted: Vw(t + 1) = Vw(t) ± a(t)[X(t) - Vw(t)] • The winning vector is moved towards the input vector whenever the prediction proves correct (reinforcement) and away from it whenever the prediction proves incorrect. • The factor a(t) is the learning factor and was (usually) set to a small constant (< 0.1). • The losing vector remains unchanged.
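
  A minimal sketch of the whole LVQ loop, assuming real-valued codebook vectors and using the sum of absolute differences as a stand-in for the modified Hamming distance (the exact metric is not given here; all names are illustrative).

    ALPHA = 0.05                      # learning factor a(t), a small constant < 0.1

    def distance(x, v):
        return sum(abs(xi - vi) for xi, vi in zip(x, v))

    def predict(x, v_t, v_nt):
        # The codebook vector closest to the input wins; ties favour taken.
        return distance(x, v_t) <= distance(x, v_nt)

    def train(x, v_w, correct):
        # Move the winning vector towards the input if the prediction
        # proved correct, away from it if not; the loser is unchanged.
        sign = 1.0 if correct else -1.0
        return [vi + sign * ALPHA * (xi - vi) for xi, vi in zip(x, v_w)]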

  49. LVQ prediction • LVQ network training • Training is therefore dynamic. • Training is also adaptive since the “codebook” vectors reflect the outcomes of the most recently encountered branches.

  50. Backpropagation prediction • Prediction information is fed into a backpropagation network. [Diagram: the PC + HR inputs, with weights applied, feed an input layer, hidden layers and an output layer that delivers the prediction.]
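
  For completeness, a toy forward pass through a single-hidden-layer network; the weights would be adjusted by backpropagation at branch resolution, and all names here are illustrative.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def forward(x, w_hidden, w_out):
        # x is the 0/1 PC + HR input vector; each row of w_hidden holds
        # the weights of one hidden unit, w_out the output-unit weights.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
        return sigmoid(sum(w * h for w, h in zip(w_out, hidden))) >= 0.5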
