Perceptron Branch Prediction and its recent developments. Mostly based on "Dynamic Branch Prediction with Perceptrons" by Daniel A. Jiménez and Calvin Lin. By Shugen Li
Introduction • With deeper pipelines and faster clock cycles, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism. • Machine learning techniques offer the possibility of further improving performance by increasing branch prediction accuracy.
Introduction (cont’) • Figure 1. A conceptual system model for branch prediction. Adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, "Analysis of branch prediction via data compression".
Introduction (cont’) • We can improve accuracy by replacing these traditional predictors with neural networks, which provide good predictive capabilities. • The perceptron is one of the simplest possible neural networks: it is easy to understand, simple to implement, and has several attractive properties.
Why perceptrons? • The major benefit of perceptrons is that by examining their weights, i.e., the correlations that they learn, it is easy to understand the decisions that they make. • For many other neural networks it is difficult or impossible to determine exactly how the network arrives at its decision. • A perceptron's decision-making process is easy to understand, since it is the result of a simple mathematical formula.
Perceptron Model • The inputs x1 … xn are the bits of the global branch history shift register, encoded as 1 (taken) or -1 (not taken). • w0 … wn is the weight vector; w0 is the bias weight, whose input is always 1. • The output of the perceptron is y = w0 + Σ xi·wi; a non-negative y predicts taken, a negative y predicts not taken.
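As a concrete illustration of the formula above, here is a minimal sketch in C (the history length H and the array names are illustrative, not from the original slides):

```c
#include <stdio.h>

#define H 12   /* history length (illustrative) */

/* Compute the perceptron output y = w0 + sum(xi * wi).
 * history[i] is +1 (taken) or -1 (not taken); w[0] is the bias weight. */
int perceptron_output(const int w[H + 1], const int history[H])
{
    int y = w[0];
    for (int i = 0; i < H; i++)
        y += w[i + 1] * history[i];
    return y;
}

int main(void)
{
    int w[H + 1] = {0};
    int hist[H];
    for (int i = 0; i < H; i++)
        hist[i] = (i % 2) ? 1 : -1;          /* dummy history pattern */
    int y = perceptron_output(w, hist);
    printf("y = %d -> predict %s\n", y, y >= 0 ? "taken" : "not taken");
    return 0;
}
```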
Perceptron Training • Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter to the training algorithm used to decide when enough training has been done. • If the prediction was wrong, or if |y| did not exceed θ, each weight is updated as wi := wi + t·xi. These two pages and figures are adapted from F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
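A minimal sketch of this training rule in C, assuming the same ±1 encoding as above (the value of THETA is purely illustrative):

```c
#include <stdlib.h>   /* abs */

#define H     12      /* history length (illustrative)     */
#define THETA 37      /* training threshold (illustrative) */

/* Update the weights once the branch outcome t (+1 taken, -1 not taken)
 * is known.  y is the output that was used to make the prediction.
 * Train on a misprediction, or when |y| did not exceed the threshold.  */
void perceptron_train(int w[H + 1], const int history[H], int y, int t)
{
    if ((y >= 0) != (t > 0) || abs(y) <= THETA) {
        w[0] += t;                          /* bias input is always 1    */
        for (int i = 0; i < H; i++)
            w[i + 1] += t * history[i];     /* reinforce the correlation */
    }
}
```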
Perceptron Limitation • A perceptron is only capable of learning linearly separable functions. • This means a perceptron can learn the logical AND of two inputs, but not their exclusive-OR.
Experimental Results • Uses the SPEC2000 integer benchmarks and compares against gshare and bi-mode predictors. • Also compares against a hybrid gshare/perceptron predictor. • A key advantage is the perceptron's ability to make use of longer history lengths. • It does well when the branch being predicted exhibits linearly separable behavior.
• It can exploit much longer history lengths than traditional two-level schemes.
Implementation • Computing the perceptron output. • There is no need to compute a full dot product. • Instead, simply add the weight when the input bit is 1 and subtract it (add its two's complement) when the input bit is -1. • This is similar to the computation performed by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit. • Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without delaying the prediction.
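A sketch of this multiplication-free formulation (names are illustrative): because each input is ±1, each product reduces to adding or subtracting the corresponding weight.

```c
/* Multiplication-free perceptron output: since each input is +1 or -1,
 * xi * wi reduces to adding or subtracting the weight.  taken[i] is
 * nonzero when history bit i was taken.  Only the sign bit of the
 * result is needed on the critical path.                              */
int perceptron_output_no_mul(const int w[], const unsigned char taken[], int h)
{
    int y = w[0];                                /* bias weight     */
    for (int i = 0; i < h; i++)
        y += taken[i] ? w[i + 1] : -w[i + 1];    /* add or subtract */
    return y;
}
```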
Implementation (cont’) • Training: each weight is incremented when the branch outcome agrees with the corresponding history bit and decremented when it disagrees; all weights can be updated in parallel.
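Following the same idea, the update step can also avoid multiplications. A hedged sketch (the decision of whether to train at all, i.e., the misprediction / θ test, is assumed to have been made already; counter saturation is omitted):

```c
/* Multiplication-free training step: each weight is incremented when the
 * branch outcome agrees with its history bit and decremented otherwise.
 * In hardware all weights can be updated in parallel.                   */
void perceptron_train_no_mul(int w[], const unsigned char taken[],
                             int h, int outcome_taken)
{
    w[0] += outcome_taken ? 1 : -1;              /* bias weight */
    for (int i = 0; i < h; i++)
        w[i + 1] += ((taken[i] != 0) == (outcome_taken != 0)) ? 1 : -1;
}
```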
Limitations • Delay: large prediction latency, even with the simplified computation. • Poor accuracy on branches whose behavior is not linearly separable. • Aliasing and hardware cost.
Recent development (1): Low-power Perceptrons (selective weights) by Kaveh Aasaraai and Amirali Baniasadi • Non-Effective (NE): weights that have a sign opposite to the sign of the dot-product value. The summation of the NE weights is referred to as NE-SUM. • Semi-Effective (SE): weights that have the sign of the dot-product value, but an absolute value less than NE-SUM. • Highly-Effective (HE): weights that have the same sign as the dot-product value and a value greater than NE-SUM.
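To make the classification concrete, a small sketch that labels each weight's signed contribution (taken here to mean xi·wi, an interpretative assumption; names are illustrative) relative to the dot-product value y:

```c
#include <stdlib.h>   /* abs */

enum weight_class { NE, SE, HE };

/* Label each signed contribution c[i] relative to the dot-product value y.
 * Two passes: first accumulate NE-SUM, then classify everything against it. */
void classify_weights(const int c[], int n, int y, enum weight_class cls[])
{
    int ne_sum = 0;
    for (int i = 0; i < n; i++)
        if ((c[i] > 0) != (y > 0))
            ne_sum += abs(c[i]);          /* NE: opposes the output sign */

    for (int i = 0; i < n; i++) {
        if ((c[i] > 0) != (y > 0))
            cls[i] = NE;                  /* opposite sign               */
        else if (abs(c[i]) < ne_sum)
            cls[i] = SE;                  /* same sign, but small        */
        else
            cls[i] = HE;                  /* same sign, outweighs NE-SUM */
    }
}
```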
Recent development (2): The Combined Perceptron Branch Predictor by Matteo Monchiero and Gianluca Palermo • The predictor consists of two concurrent perceptron-like neural networks; one uses branch history information as its inputs, the other uses program counter bits.
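The slide does not spell out how the two networks are combined; a minimal sketch assuming their outputs are simply summed before taking the sign (an assumption, not necessarily the authors' exact scheme):

```c
/* Two concurrent perceptron-like networks: one indexed by global history
 * bits, the other by program-counter bits.  Here their outputs are simply
 * summed before the sign decides the prediction (an assumption about the
 * combination step).  Inputs are encoded as +1 / -1.                    */
int combined_predict(const int wh[], const int hist[], int nh,
                     const int wa[], const int pcbits[], int na)
{
    int y = 0;
    for (int i = 0; i < nh; i++)
        y += wh[i] * hist[i];            /* history-driven network */
    for (int i = 0; i < na; i++)
        y += wa[i] * pcbits[i];          /* address-driven network */
    return y >= 0;                       /* 1 = predict taken      */
}
```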
Recent development (3): Path-based Neural Prediction by Daniel A. Jiménez • In an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead, and the predictions for the next N branches are computed in parallel. • A row of N counters is read using the current instruction block address. On blocks featuring a branch, one of the read counters is added to each of the N partial sums. • The delay is the perceptron table read delay followed by a single multiply-add delay. • However, this does not consider the table read delay, nor the misprediction penalty.
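A simplified, hedged sketch of the partial-sum idea (the real design pipelines this across fetch stages and checkpoints state for misprediction recovery; NB, the array names, and the update order are illustrative):

```c
#define NB 8                /* branches predicted ahead (illustrative) */

static int partial[NB];     /* partial[j]: running sum for the branch
                               that is j fetch steps in the future     */

/* Predict the current branch and fold its weight row into the partial
 * sums of the next NB branches.  w[0..NB] is the row read with the
 * current block address; spec_taken is the speculative outcome.       */
int path_based_predict(const int w[NB + 1], int spec_taken)
{
    int y = w[0] + partial[0];       /* one final add on the critical path */

    for (int j = 0; j < NB - 1; j++) /* shift and accumulate */
        partial[j] = partial[j + 1] + (spec_taken ? w[j + 1] : -w[j + 1]);
    partial[NB - 1] = spec_taken ? w[NB] : -w[NB];

    return y >= 0;                   /* 1 = predict taken */
}
```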
Recent development (4): Revisiting the Perceptron Predictor by A. Seznec • The accuracy of perceptron predictors is further improved with the following extensions: • using pseudo-tags to reduce the impact of aliasing, • skewing the perceptron weight tables to improve table utilization, • introducing redundant history to handle linearly inseparable data sets. • The nonlinear redundant history also leads to a more efficient representation of the perceptron weights, Multiply-Add Contributions (MAC). • Drawback: increased hardware complexity.
Recent development (5): The O-GEometric History Length (O-GEHL) Branch Predictor by A. Seznec • The GEHL predictor features M distinct predictor tables Ti. • The predictor tables store predictions as signed saturating counters. • A single counter C(i) is read from each predictor table Ti (1 ≤ i ≤ M). • The prediction is computed as the sign of the sum S = Σ C(i) of the M counters (the first equation). • The prediction is taken when S is positive or zero, and not taken when S is negative.
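A minimal sketch of this prediction computation as described on the slide (names are illustrative):

```c
/* GEHL prediction: sum the M signed counters read from tables T1..TM
 * and take the sign of the sum.                                        */
int gehl_predict(const int C[], int M)
{
    int S = 0;
    for (int i = 0; i < M; i++)
        S += C[i];          /* signed saturating counters        */
    return S >= 0;          /* taken when S is positive or zero  */
}
```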
Recent development (5) (cont’): The O-GEometric History Length Branch Predictor by A. Seznec • The history lengths used to compute the indexing functions for the tables Ti form a geometric series (the second equation). • The entries of the Ti tables are easy to train, much as in the perceptron predictor. • Low hardware cost and better latency.
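Assuming the geometric-series form L(i) = α^(i-1) · L(1) for the second equation, a small sketch that computes the history lengths (α and L(1) values are illustrative):

```c
#include <math.h>
#include <stdio.h>

/* History lengths forming a geometric series: L(i) = alpha^(i-1) * L(1),
 * rounded to the nearest integer.                                       */
void gehl_history_lengths(int L[], int M, double alpha, int L1)
{
    for (int i = 1; i <= M; i++)
        L[i - 1] = (int)(pow(alpha, i - 1) * L1 + 0.5);
}

int main(void)
{
    int L[8];
    gehl_history_lengths(L, 8, 2.0, 3);      /* illustrative alpha and L(1) */
    for (int i = 0; i < 8; i++)
        printf("L(%d) = %d\n", i + 1, L[i]);
    return 0;
}
```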
Conclusion • The perceptron predictor is attractive because it uses long history lengths without requiring exponential resources. • Its weakness is the increased computational complexity, with the resulting latency and hardware cost. • As a new idea, it can be combined with traditional methods to obtain better performance. • Several methods are being developed to reduce the latency and handle mispredictions. • This technology will become more practical as hardware costs fall. • There is ample room for further development.
References • [1] D. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons", Proc. of the 7th Int. Symp. on High-Performance Computer Architecture (HPCA-7), 2001. • [2] D. Jiménez and C. Lin, "Neural methods for dynamic branch prediction", ACM Trans. on Computer Systems, 2002. • [3] A. Seznec, "Revisiting the perceptron predictor", Technical Report, IRISA, 2004. • [4] A. Seznec, "An optimized 2bcgskew branch predictor", Technical Report, IRISA, Sep. 2003. • [5] G. Loh, "The frankenpredictor", in The 1st JILP Championship Branch Prediction Competition (CBP-1), 2004. • [6] K. Aasaraai and A. Baniasadi, "Low-power Perceptrons". • [7] A. Seznec, "The O-GEometric History Length branch predictor". • [8] M. Monchiero and G. Palermo, "The Combined Perceptron Branch Predictor". • [9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, 1962.
Thank You! Questions?