Multiperspective Perceptron Predictor Daniel A. Jiménez Department of Computer Science & Engineering Texas A&M University
Branch-Predicting Perceptron • Inputs (x’s) come from branch history • n + 1 small integer weights (w’s) learned by on-line training • Output (y) is the dot product of the x’s and w’s; predict taken if y ≥ 0 • Training finds correlations between history and outcome • Keep a table of perceptron weight vectors selected by a hash of the PC
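The slide above can be sketched in a few lines of code. This is only an illustrative sketch of perceptron prediction and training, not the author's implementation; the threshold value and encodings are assumptions.

```python
# Minimal sketch of one branch-predicting perceptron.
# Outcomes are encoded as +1 (taken) / -1 (not taken); weights[0] is the
# bias weight, whose input is fixed at 1. THRESHOLD is a tuning parameter.

THRESHOLD = 14

def predict(weights, history):
    # y = w0 + sum of w_i * x_i over the history bits
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], history))
    return y >= 0, y

def train(weights, history, taken, y):
    t = 1 if taken else -1
    # Train only on a misprediction or when |y| is below the threshold
    if (y >= 0) != taken or abs(y) <= THRESHOLD:
        weights[0] += t
        for i, x in enumerate(history):
            weights[i + 1] += t * x   # reinforce agreeing history bits
```

Note how training pushes each weight toward the correlation between that history bit and the outcome: a weight near zero means "no correlation."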
Neural Prediction in Current Processors • We introduced the perceptron predictor [Jiménez & Lin 2001] • I and others improved it considerably through 2011 • Today, Oracle SPARC T4 contains S3 core with • “perceptron branch prediction” • “branch prediction using a simple neural net algorithm” • Their IEEE Micro paper cites our HPCA 2001 paper • You can buy one today • Today, AMD “Bobcat,” “Jaguar” and probably other cores • Have a “neural net logic branch predictor” • You can buy one today
Hashed Perceptron • Introduced by Tarjan and Skadron 2005 • Breaks the 1-1 correspondence between history bits and weights • Basic idea: • Hash segments of branch history into different tables • Sum weights selected by hash functions, apply threshold to predict • Update the weights using perceptron learning
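A minimal sketch of the hashed-perceptron idea described above: segments of the history are hashed into separate tables, and the selected weights are summed. The table size, segment bounds, and hash function here are illustrative assumptions, not the predictor's actual parameters.

```python
# Hashed perceptron sketch: each history segment indexes its own table.

TABLE_SIZE = 1024  # assumed table size (power of two)

def seg_hash(history_bits, a, b, pc):
    # Fold history bits [a, b) together with the PC into a table index
    # (toy hash; the real hash functions are a design choice)
    h = pc & (TABLE_SIZE - 1)
    for bit in history_bits[a:b]:
        h = ((h << 1) ^ bit ^ (h >> 9)) & (TABLE_SIZE - 1)
    return h

def hashed_predict(tables, segments, history_bits, pc):
    # Sum one weight per table, selected by that table's segment hash
    y = sum(t[seg_hash(history_bits, a, b, pc)]
            for t, (a, b) in zip(tables, segments))
    return y >= 0, y
```

Breaking the 1-1 bit-to-weight correspondence this way lets a few tables cover a much longer history than a classic perceptron could afford.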
Multiperspective Idea • Rather than just global/local history, use many features • Multiple perspectives on branch history • Multiperspective Perceptron Predictor • Hashed Perceptron • Sum weights indexed by hashes of features • Update weights using perceptron training • Contribution is a wide range of features
Traditional Features • GHIST(a,b) – hash of the a-th through b-th most recent branch outcomes • PATH(a,b) – hash of the a most recent branch PCs, each shifted by b • LOCAL – 11-bit local history • I DO ADVOCATE FOR LOCAL HISTORY IN REAL BRANCH PREDICTORS! • GHISTPATH – combination of GHIST and PATH • SGHISTPATH – an alternate formulation allowing a range • BIAS – bias of the branch to be taken regardless of history
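Plausible encodings of the GHIST and PATH features from the list above; the exact hash functions are free parameters of the predictor, so these are stand-ins, not the paper's definitions.

```python
# Hypothetical GHIST(a,b) and PATH(a,b) feature hashes.

def ghist(outcomes, a, b):
    # Fold the a-th..b-th most recent outcomes (1 = taken) into an index
    h = 0
    for bit in outcomes[a:b]:
        h = ((h << 1) | bit) ^ (h >> 8)
    return h

def path(pcs, a, b):
    # Hash the a most recent branch PCs, each shifted right by b
    h = 0
    for pc in pcs[:a]:
        h ^= pc >> b
        h = ((h << 3) | (h >> 13)) & 0xFFFF
    return h
```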
Novel Features • IMLI – from Seznec’s innermost loop iteration counter work: • When a backward branch is taken, count up • When a backward branch is not taken, reset counter • I propose an alternate IMLI • When a forward branch is not taken, count up • When a forward branch is taken, reset counter • This represents loops where the decision to continue is at the top • Typical in code compiled for size or by JIT compilers • Forward IMLI works better than backward IMLI on these traces • I use both forward and backward in the predictor
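The two IMLI counters described above can be sketched directly; this follows the slide's rules (backward IMLI from Seznec's formulation, plus the forward variant proposed here), with direction inferred from the branch target.

```python
# Sketch of backward and forward innermost-loop-iteration (IMLI) counters.

class IMLI:
    def __init__(self):
        self.backward = 0  # loops with the decision at the bottom
        self.forward = 0   # loops with the decision at the top

    def update(self, pc, target, taken):
        if target < pc:
            # Backward branch: taken -> count up, not taken -> reset
            self.backward = self.backward + 1 if taken else 0
        else:
            # Forward branch: not taken -> count up, taken -> reset
            self.forward = self.forward + 1 if not taken else 0
```

The forward counter tracks top-tested loops (typical in code compiled for size or by JITs), which is why it helps on these traces.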
Novel Features cont. • MODHIST – modulo history • Branch histories become misaligned when some branches are skipped • MODHIST records only branches where PC ≡ 0 (mod n) for some n. • Hopefully branches responsible for misalignment will not be recorded • Try many values of n to come up with a good MODHIST feature
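A sketch of the MODHIST update rule above; the history length cap is an assumed parameter.

```python
# MODHIST sketch: only branches with PC congruent to 0 mod n contribute,
# so occasionally-skipped branches elsewhere cannot misalign the history.

def modhist_update(history, pc, taken, n, max_len=16):
    if pc % n == 0:
        history.append(1 if taken else 0)
        if len(history) > max_len:
            history.pop(0)   # drop the oldest recorded outcome
    return history
```

In the predictor, many values of n would be tried off-line to find the one that best filters out the misaligning branches.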
Novel Features cont. • MODPATH – the same idea with the path of branch PCs • GHISTMODPATH – combines the two previous ideas • RECENCY • Keep a recency stack of n branch PCs managed with LRU replacement • Hash the stack to get the feature • RECENCYPOS • Position (0..n−1) of the current branch in the recency stack, or n if no match • Works surprisingly well
Novel Features cont. • BLURRYPATH • Shift higher-order bits of branch PC into an array • Only record the bits if they don’t match the current bits • Parameters are depth of array, number of bits to truncate • Indicates region a branch came from rather than the precise location
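One way to read the BLURRYPATH rules above is the following sketch; the depth and truncation amount are the tuning parameters the slide mentions, with illustrative defaults.

```python
# BLURRYPATH sketch: record truncated (region-level) PCs in a small array,
# shifting in a new entry only when it differs from the current head.

def blurrypath_update(regions, pc, depth=4, shift=6):
    region = pc >> shift          # drop low-order bits: region, not exact PC
    if not regions or regions[0] != region:
        regions.insert(0, region) # only record bits that changed
        del regions[depth:]       # keep at most `depth` regions
    return regions
```

Because consecutive branches in the same region collapse to one entry, the feature captures where control flow came from rather than the precise path.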
Novel Features cont. • ACYCLIC • The current PC indexes a small array, recording the branch outcome there • The array always has the latest outcome for a given bin of branches • “Acyclic” because loop or repetition behavior is not recorded • The parameter is the number of bits in the array
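A sketch of the ACYCLIC feature as described above; the array size and the way the array is folded into a feature value are assumptions.

```python
# ACYCLIC sketch: a small array indexed by low PC bits holds the latest
# outcome per bin, deliberately discarding loop/repetition structure.

def acyclic_update(table, pc, taken, bits=4):
    # Overwrite the bin for this PC with the newest outcome only
    table[pc & ((1 << bits) - 1)] = 1 if taken else 0

def acyclic_feature(table):
    # Fold the whole array into a single feature value (one possible choice)
    return int("".join(map(str, table)), 2)
```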
Putting it Together • Each feature is computed, hashed, and XORed with the current PC • The resulting index selects a weight from that feature’s table • The weights are summed and thresholded to make the prediction • The weights are updated with perceptron learning
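The whole prediction/update loop sketched end to end, under the same caveats as before: the table size, hash, and threshold are illustrative, not the tuned configuration from the paper.

```python
# End-to-end sketch: per-feature tables indexed by hash(feature) XOR PC.

TABLE_BITS = 12
MASK = (1 << TABLE_BITS) - 1

def mix(v):
    # Toy hash; the real predictor uses carefully chosen hash functions
    v = (v ^ (v >> 7)) * 0x9E3779B1
    return v & 0xFFFFFFFF

def predict(tables, feature_values, pc):
    y, indices = 0, []
    for table, f in zip(tables, feature_values):
        i = (mix(f) ^ pc) & MASK   # hash the feature, XOR with the PC
        indices.append(i)          # remember indices for the update step
        y += table[i]
    return y >= 0, y, indices

def train(tables, indices, taken, y, theta=32):
    t = 1 if taken else -1
    # Perceptron learning: update on mispredict or low-magnitude output
    if (y >= 0) != taken or abs(y) <= theta:
        for table, i in zip(tables, indices):
            table[i] += t
```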
Optimizations • Filter always/never-taken branches • Apply a sigmoidal transfer function to the weights before summing • Per-feature coefficients to emphasize the relatively more accurate features • Bit-width optimization for the tables • Shared magnitudes – two signs share one magnitude • Alternate prediction on low confidence (see paper) • Adaptive threshold training • Hashing some tables together with IMLI and RECENCYPOS
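Two of the optimizations listed above lend themselves to a short sketch: a sigmoid-like transfer applied to each weight before summing, and adaptive threshold training in the style of Seznec's O-GEHL predictor (raise the threshold after mispredictions, lower it after correct but low-confidence predictions). The constants are illustrative, not the tuned values from the paper.

```python
import math

def transfer(w, scale=2.0):
    # Compress large weights so no single table can dominate the sum
    return math.tanh(w / scale)

class AdaptiveTheta:
    def __init__(self, theta=32, speed=7):
        self.theta, self.counter, self.speed = theta, 0, speed

    def update(self, mispredicted, low_confidence):
        # Nudge theta toward balancing the two kinds of training events
        if mispredicted:
            self.counter += 1
        elif low_confidence:
            self.counter -= 1
        if self.counter >= self.speed:
            self.theta += 1
            self.counter = 0
        elif self.counter <= -self.speed:
            self.theta -= 1
            self.counter = 0
```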
Submit to HPCA 2017! http://hpca2017.org Note: Deadline is August 1, 2016!