Multiperspective Perceptron Predictor Daniel A. Jiménez Department of Computer Science & Engineering Texas A&M University
Branch-Predicting Perceptron • Inputs (x’s) come from branch history • n + 1 small integer weights (w’s) learned by on-line training • Output (y) is the dot product of the x’s and w’s; predict taken if y ≥ 0 • Training finds correlations between history and outcome • Keep a table of perceptron weight vectors selected by a hash of the PC
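The slide above can be sketched in a few lines of code. This is only an illustrative sketch of perceptron prediction and training, not the author's implementation; the threshold value and encodings are assumptions.

```python
# Minimal sketch of one branch-predicting perceptron.
# Outcomes are encoded as +1 (taken) / -1 (not taken); weights[0] is the
# bias weight, whose input is fixed at 1. THRESHOLD is a tuning parameter.

THRESHOLD = 14

def predict(weights, history):
    # y = w0 + sum of w_i * x_i over the history bits
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], history))
    return y >= 0, y

def train(weights, history, taken, y):
    t = 1 if taken else -1
    # Train only on a misprediction or when |y| is below the threshold
    if (y >= 0) != taken or abs(y) <= THRESHOLD:
        weights[0] += t
        for i, x in enumerate(history):
            weights[i + 1] += t * x   # reinforce agreeing history bits
```

Note how training pushes each weight toward the correlation between that history bit and the outcome: a weight near zero means "no correlation."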
Neural Prediction in Current Processors • We introduced the perceptron predictor [Jiménez & Lin 2001] • I and others improved it considerably through 2011 • Today, Oracle SPARC T4 contains S3 core with • “perceptron branch prediction” • “branch prediction using a simple neural net algorithm” • Their IEEE Micro paper cites our HPCA 2001 paper • You can buy one today • Today, AMD “Bobcat,” “Jaguar” and probably other cores • Have a “neural net logic branch predictor” • You can buy one today
Hashed Perceptron • Introduced by Tarjan and Skadron 2005 • Breaks the 1-1 correspondence between history bits and weights • Basic idea: • Hash segments of branch history into different tables • Sum weights selected by hash functions, apply threshold to predict • Update the weights using perceptron learning
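A minimal sketch of the hashed-perceptron idea described above: segments of the history are hashed into separate tables, and the selected weights are summed. The table size, segment bounds, and hash function here are illustrative assumptions, not the predictor's actual parameters.

```python
# Hashed perceptron sketch: each history segment indexes its own table.

TABLE_SIZE = 1024  # assumed table size (power of two)

def seg_hash(history_bits, a, b, pc):
    # Fold history bits [a, b) together with the PC into a table index
    # (toy hash; the real hash functions are a design choice)
    h = pc & (TABLE_SIZE - 1)
    for bit in history_bits[a:b]:
        h = ((h << 1) ^ bit ^ (h >> 9)) & (TABLE_SIZE - 1)
    return h

def hashed_predict(tables, segments, history_bits, pc):
    # Sum one weight per table, selected by that table's segment hash
    y = sum(t[seg_hash(history_bits, a, b, pc)]
            for t, (a, b) in zip(tables, segments))
    return y >= 0, y
```

Breaking the 1-1 bit-to-weight correspondence this way lets a few tables cover a much longer history than a classic perceptron could afford.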
Multiperspective Idea • Rather than just global/local history, use many features • Multiple perspectives on branch history • Multiperspective Perceptron Predictor • Hashed Perceptron • Sum weights indexed by hashes of features • Update weights using perceptron training • Contribution is a wide range of features
Traditional Features • GHIST(a,b) – hash of the a-th through b-th most recent branch outcomes • PATH(a,b) – hash of the a most recent branch PCs, each shifted by b • LOCAL – 11-bit local history • I DO ADVOCATE FOR LOCAL HISTORY IN REAL BRANCH PREDICTORS! • GHISTPATH – combination of GHIST and PATH • SGHISTPATH – an alternate formulation allowing a range • BIAS – bias of the branch to be taken regardless of history
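Plausible encodings of the GHIST and PATH features from the list above; the exact hash functions are free parameters of the predictor, so these are stand-ins, not the paper's definitions.

```python
# Hypothetical GHIST(a,b) and PATH(a,b) feature hashes.

def ghist(outcomes, a, b):
    # Fold the a-th..b-th most recent outcomes (1 = taken) into an index
    h = 0
    for bit in outcomes[a:b]:
        h = ((h << 1) | bit) ^ (h >> 8)
    return h

def path(pcs, a, b):
    # Hash the a most recent branch PCs, each shifted right by b
    h = 0
    for pc in pcs[:a]:
        h ^= pc >> b
        h = ((h << 3) | (h >> 13)) & 0xFFFF
    return h
```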
Novel Features • IMLI – from Seznec’s innermost loop iteration counter work: • When a backward branch is taken, count up • When a backward branch is not taken, reset counter • I propose an alternate IMLI • When a forward branch is not taken, count up • When a forward branch is taken, reset counter • This represents loops where the decision to continue is at the top • Typical in code compiled for size or by JIT compilers • Forward IMLI works better than backward IMLI on these traces • I use both forward and backward in the predictor
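The two IMLI counters described above can be sketched directly; this follows the slide's rules (backward IMLI from Seznec's formulation, plus the forward variant proposed here), with direction inferred from the branch target.

```python
# Sketch of backward and forward innermost-loop-iteration (IMLI) counters.

class IMLI:
    def __init__(self):
        self.backward = 0  # loops with the decision at the bottom
        self.forward = 0   # loops with the decision at the top

    def update(self, pc, target, taken):
        if target < pc:
            # Backward branch: taken -> count up, not taken -> reset
            self.backward = self.backward + 1 if taken else 0
        else:
            # Forward branch: not taken -> count up, taken -> reset
            self.forward = self.forward + 1 if not taken else 0
```

The forward counter tracks top-tested loops (typical in code compiled for size or by JITs), which is why it helps on these traces.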
Novel Features cont. • MODHIST – modulo history • Branch histories become misaligned when some branches are skipped • MODHIST records only branches where PC ≡ 0 (mod n) for some n. • Hopefully branches responsible for misalignment will not be recorded • Try many values of n to come up with a good MODHIST feature
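A sketch of the MODHIST update rule above; the history length cap is an assumed parameter.

```python
# MODHIST sketch: only branches with PC congruent to 0 mod n contribute,
# so occasionally-skipped branches elsewhere cannot misalign the history.

def modhist_update(history, pc, taken, n, max_len=16):
    if pc % n == 0:
        history.append(1 if taken else 0)
        if len(history) > max_len:
            history.pop(0)   # drop the oldest recorded outcome
    return history
```

In the predictor, many values of n would be tried off-line to find the one that best filters out the misaligning branches.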
Novel Features cont. • MODPATH – the same idea with the path of branch PCs • GHISTMODPATH – combines the two previous ideas • RECENCY • Keep a recency stack of n branch PCs managed with LRU replacement • Hash the stack to get the feature • RECENCYPOS • Position (0..n−1) of the current branch in the recency stack, or n if no match • Works surprisingly well
Novel Features cont. • BLURRYPATH • Shift higher-order bits of branch PC into an array • Only record the bits if they don’t match the current bits • Parameters are depth of array, number of bits to truncate • Indicates region a branch came from rather than the precise location
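One way to read the BLURRYPATH rules above is the following sketch; the depth and truncation amount are the tuning parameters the slide mentions, with illustrative defaults.

```python
# BLURRYPATH sketch: record truncated (region-level) PCs in a small array,
# shifting in a new entry only when it differs from the current head.

def blurrypath_update(regions, pc, depth=4, shift=6):
    region = pc >> shift          # drop low-order bits: region, not exact PC
    if not regions or regions[0] != region:
        regions.insert(0, region) # only record bits that changed
        del regions[depth:]       # keep at most `depth` regions
    return regions
```

Because consecutive branches in the same region collapse to one entry, the feature captures where control flow came from rather than the precise path.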
Novel Features cont. • ACYCLIC • The current PC indexes a small array, recording the branch outcome there • The array always has the latest outcome for a given bin of branches • “Acyclic” because loop or repetition behavior is not recorded • The parameter is the number of bits in the array
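A sketch of the ACYCLIC feature as described above; the array size and the way the array is folded into a feature value are assumptions.

```python
# ACYCLIC sketch: a small array indexed by low PC bits holds the latest
# outcome per bin, deliberately discarding loop/repetition structure.

def acyclic_update(table, pc, taken, bits=4):
    # Overwrite the bin for this PC with the newest outcome only
    table[pc & ((1 << bits) - 1)] = 1 if taken else 0

def acyclic_feature(table):
    # Fold the whole array into a single feature value (one possible choice)
    return int("".join(map(str, table)), 2)
```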
Putting it Together • Each feature is computed, hashed, and XORed with the current PC • The resulting index selects a weight from that feature’s table • The weights are summed and thresholded to make the prediction • The weights are updated with perceptron learning
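The whole prediction/update loop sketched end to end, under the same caveats as before: the table size, hash, and threshold are illustrative, not the tuned configuration from the paper.

```python
# End-to-end sketch: per-feature tables indexed by hash(feature) XOR PC.

TABLE_BITS = 12
MASK = (1 << TABLE_BITS) - 1

def mix(v):
    # Toy hash; the real predictor uses carefully chosen hash functions
    v = (v ^ (v >> 7)) * 0x9E3779B1
    return v & 0xFFFFFFFF

def predict(tables, feature_values, pc):
    y, indices = 0, []
    for table, f in zip(tables, feature_values):
        i = (mix(f) ^ pc) & MASK   # hash the feature, XOR with the PC
        indices.append(i)          # remember indices for the update step
        y += table[i]
    return y >= 0, y, indices

def train(tables, indices, taken, y, theta=32):
    t = 1 if taken else -1
    # Perceptron learning: update on mispredict or low-magnitude output
    if (y >= 0) != taken or abs(y) <= theta:
        for table, i in zip(tables, indices):
            table[i] += t
```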
Optimizations • Filter always/never-taken branches • Apply a sigmoidal transfer function to the weights before summing • Per-feature coefficients to emphasize the relatively more accurate features • Bit-width optimization for the tables • Shared magnitudes – two signs share one magnitude • Alternate prediction on low confidence (see paper) • Adaptive threshold training • Hashing some tables together with IMLI and RECENCYPOS
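Two of the optimizations listed above lend themselves to a short sketch: a sigmoid-like transfer applied to each weight before summing, and adaptive threshold training in the style of Seznec's O-GEHL predictor (raise the threshold after mispredictions, lower it after correct but low-confidence predictions). The constants are illustrative, not the tuned values from the paper.

```python
import math

def transfer(w, scale=2.0):
    # Compress large weights so no single table can dominate the sum
    return math.tanh(w / scale)

class AdaptiveTheta:
    def __init__(self, theta=32, speed=7):
        self.theta, self.counter, self.speed = theta, 0, speed

    def update(self, mispredicted, low_confidence):
        # Nudge theta toward balancing the two kinds of training events
        if mispredicted:
            self.counter += 1
        elif low_confidence:
            self.counter -= 1
        if self.counter >= self.speed:
            self.theta += 1
            self.counter = 0
        elif self.counter <= -self.speed:
            self.theta -= 1
            self.counter = 0
```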
Submit to HPCA 2017! http://hpca2017.org Note: Deadline is August 1, 2016!