Perceptron-based Global Confidence Estimation for Value Prediction
Master’s Thesis
Michael Black
June 26, 2003
Thesis Objectives
• To present a viable global confidence estimator using perceptrons
• To quantify predictability relationships between instructions
• To study the performance of the global confidence estimator when used with common value prediction methods
Presentation Outline
• Background:
  • Data Value Prediction
  • Confidence Estimation
• Predictability Relationships
• Perceptrons
• Perceptron-based Confidence Estimator
• Experimental Results and Conclusions
Value Locality
Suppose instruction 1 has been executed several times before:
  I1: 5 (A) = 3 (B) + 2 (C)
  . . .
  I1: 6 (A) = 4 (B) + 2 (C)
  . . .
  I1: 7 (A) = 5 (B) + 2 (C)
Next time, its outcome A will probably be 8.
Data Value Prediction
• A data value predictor predicts A from instruction 1’s past outcomes
• Instruction 2 speculatively executes using the prediction
Previous execution:
  1. ADD  7 (A) = 5 (B) + 2 (C)
Current execution (Predictor: +1, so A is predicted to be 8):
  1. ADD  A = 6 (B) + 2 (C)
  2. ADD  D = 5 (E) + 8 (A)
Types of Value Predictors
• Computational: performs a mathematical operation on past values
  • Last-Value: 5, 5, 5, 5 → 5
  • Stride: 1, 3, 5, 7 → 9
• Context: learns repeating sequences of numbers
  • 3, 6, 5, 3, 6, 5, 3 → 6
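To make the three schemes concrete, here is a minimal sketch; this is my own illustration rather than code from the thesis, and the function names and table-based context lookup are assumptions:

```python
# Toy illustrations of the three predictor types described above.

def last_value_predict(history):
    # Last-Value: repeat the most recent outcome, e.g. 5, 5, 5, 5 -> 5
    return history[-1]

def stride_predict(history):
    # Stride: extrapolate the difference between the last two outcomes,
    # e.g. 1, 3, 5, 7 -> 9
    return history[-1] + (history[-1] - history[-2])

def context_predict(history, table):
    # Context: look up what followed the most recent value pattern last time,
    # e.g. 3, 6, 5, 3, 6, 5, 3 -> 6
    context = tuple(history[-3:])            # recent value pattern
    return table.get(context, history[-1])   # fall back to the last value

print(last_value_predict([5, 5, 5, 5]))                        # 5
print(stride_predict([1, 3, 5, 7]))                            # 9
print(context_predict([3, 6, 5, 3, 6, 5, 3], {(6, 5, 3): 6}))  # 6
```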
Types of Value History
• Local history: predicts using data from past instances of the same instruction
• Global history: predicts using data from other instructions
Local value prediction is more conventional.
Are mispredictions a problem?
• If a prediction is incorrect, speculatively executed instructions must be re-executed
• This can result in:
  • Cycle penalties for detecting the misprediction
  • Cycle penalties for restarting dependent instructions
  • Incorrect resolution of dependent branch instructions
It is better to not predict at all than to mispredict.
Confidence Estimator
• Decides whether to make a prediction for an instruction
• Bases decisions on the accuracy of past predictions
• Common confidence estimation method: saturating up-down counter
Up-Down Counter
[State diagram: a saturating counter that starts in a “Don’t Predict” state, with several “Don’t Predict” states below a threshold and a “Predict” state above it. Each correct prediction moves the counter up one state; each incorrect prediction moves it down one state.]
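A minimal sketch of this kind of estimator, assuming a 2-bit counter that predicts only in its top state; the exact threshold placement and the decrement-on-misprediction behavior are assumptions, not details taken from the thesis:

```python
class UpDownCounter:
    """2-bit saturating counter: states 0-2 mean "don't predict", state 3 means "predict"."""
    def __init__(self):
        self.state = 0                            # start in the lowest "don't predict" state

    def should_predict(self):
        return self.state == 3                    # predict only once the threshold is reached

    def update(self, was_correct):
        if was_correct:
            self.state = min(3, self.state + 1)   # correct prediction: count up (saturating)
        else:
            self.state = max(0, self.state - 1)   # incorrect prediction: count down (saturating)
```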
Local vs. Global
• The up-down counter is local:
  • Only past instances of an instruction affect its counter
• Global confidence estimation uses the prediction accuracy (“predictability”) of past dynamic instructions
• Problem with global:
  • Not every past instruction affects the predictability of the current instruction
Example
  I1. A = B + C
  I2. F = G – H
  I3. E = A + A
• Instruction 3 depends on 1 but not on 2
• Instruction 3’s predictability is related to 1 but not 2
• If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly
Is global confidence worthwhile?
• Fewer mispredictions than local:
  • If an instruction mispredicts, its dependent instructions know not to predict
• Less warm-up time than local:
  • Instructions need not be executed several times before accurate confidence decisions can be made
How common are predictability relationships?
Simulation study:
• How many instructions in a program predict correctly only when a previous instruction predicts correctly?
• Which past instructions have the most influence?
Predictability Relationships
Over 70% of instructions for Stride and Last-Value, and over 90% for Context, have the same prediction accuracy as a past instruction 90% of the time.
Predictability Relationships
The most recent 10 instructions have the most influence.
Global Confidence Estimation
A global confidence estimator must:
• Identify, for each instruction, which past instructions have similar predictability
• Use their prediction accuracy to decide whether or not to predict
Neural Network
• Used to iteratively learn unknown functions from examples
• Consists of nodes and links
  • Each link has a numeric weight
• Data is fed to input nodes and propagated to output nodes by the links
• The desired output is used to adjust (“train”) the weights
Perceptron
• Perceptrons only have input and output nodes
• They are much easier to implement and train than larger neural networks
• They can only learn linearly separable functions
Perceptron Computation
• Each bit of input data is sourced to an input node
• The dot product is calculated between the input data and the weights
• The output is “1” if the dot product exceeds a threshold; otherwise “0”
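As a sketch, the computation amounts to a thresholded dot product; the variable names are mine, and a threshold of 0 is an assumption:

```python
def perceptron_output(inputs, weights, threshold=0):
    # Dot product between the input bits and the weights
    dot = sum(x * w for x, w in zip(inputs, weights))
    # Output "1" if the dot product exceeds the threshold, otherwise "0"
    return 1 if dot > threshold else 0
```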
Perceptron Training
• Weights are adjusted so that the perceptron output equals the desired output for the given input
• Error value (ε) = desired value – perceptron output
• ε times each input bit is added to the corresponding weight
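A sketch of that update rule, assuming the same thresholded dot product as above; whether the thesis clamps the weights to a fixed hardware range is not shown here:

```python
def perceptron_train(inputs, weights, desired, threshold=0):
    # Current perceptron output for this input
    output = 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0
    # Error value: desired output minus perceptron output
    error = desired - output
    # Add error times each input bit to the corresponding weight
    return [w + error * x for x, w in zip(inputs, weights)]
```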
Weights
• Weights determine the effect of each input on the output
• Positive weight: the output varies directly with the input bit
• Negative weight: the output varies inversely with the input bit
• Large weight: the input has a strong effect on the output
• Zero weight: the input bit has no effect on the output
Linear Separability
• An input may have a direct influence on the output
• An input may instead have an inverse influence on the output
• But an input cannot have a direct influence sometimes and an inverse influence at other times
• For example, XOR is not linearly separable: each input’s influence flips depending on the other input, so a single perceptron cannot learn it
Perceptron Confidence Estimator
• Each input node is a past instruction’s prediction outcome (1 = correct, –1 = incorrect)
• The output is the decision to predict (1 = predict, 0 = don’t predict)
• Weights determine a past instruction’s predictability influence on the current instruction:
  • Positive weight: the current instruction mispredicts when the past instruction mispredicts
  • Negative weight: the current instruction mispredicts when the past instruction predicts correctly
  • Zero weight: the past instruction does not affect the current one
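Putting the pieces together, a minimal sketch of the per-instruction confidence decision; how the weights are stored per instruction, the global history length, and the bias handling are all assumptions made for illustration:

```python
def confidence_decision(past_outcomes, weights, bias_weight, threshold=0):
    # past_outcomes: +1 if a past instruction's prediction was correct, -1 if it mispredicted
    dot = bias_weight + sum(o * w for o, w in zip(past_outcomes, weights))
    # 1 = make a value prediction, 0 = don't predict
    return 1 if dot > threshold else 0
```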
Perceptron Confidence Estimator
Example weights:
  bias weight = –1
  I1: A = B C      weight = 1
  I2: D = E + F    weight = 1
  I3: P = Q R      weight = 0
  I4: G = A + D    (current instruction)
Instruction 4 predicts correctly only when instructions 1 and 2 predict correctly.
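Working through the example numerically, assuming a decision threshold of 0 (the threshold value is my assumption):

```python
# Example weights from the slide: bias = -1, weight(I1) = 1, weight(I2) = 1, weight(I3) = 0
bias, weights = -1, [1, 1, 0]

def decide(outcomes):   # outcomes for I1..I3: +1 = predicted correctly, -1 = mispredicted
    dot = bias + sum(o * w for o, w in zip(outcomes, weights))
    return "predict" if dot > 0 else "don't predict"

print(decide([+1, +1, -1]))   # I1, I2 correct:  -1 + 1 + 1 + 0 =  1 -> predict
print(decide([-1, +1, +1]))   # I1 incorrect:    -1 - 1 + 1 + 0 = -1 -> don't predict
```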
Weight Value Distribution
Simulation study:
• What are typical perceptron weight values?
• How does the type of predictor influence the weight distribution?
• What minimum range do the weights need to have?
Simulation Methodology
• Measurements simulated using SimpleScalar 2.0a
• SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex
• Each benchmark is run for 500 million instructions
• Value predictors: Stride, Last-Value, Context
• Baseline confidence estimator: 2-bit up-down counter
Simulation Metrics
• P_CORRECT: number of correct predictions
• P_INCORRECT: number of incorrect predictions
• N: number of cases where no prediction was made
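The results below are quoted as coverage and accuracy; presumably these follow the usual definitions in terms of the three counts, along the lines of:

```latex
\text{Coverage} = \frac{P_{\mathrm{CORRECT}} + P_{\mathrm{INCORRECT}}}
                       {P_{\mathrm{CORRECT}} + P_{\mathrm{INCORRECT}} + N},
\qquad
\text{Accuracy} = \frac{P_{\mathrm{CORRECT}}}{P_{\mathrm{CORRECT}} + P_{\mathrm{INCORRECT}}}
```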
Stride Results Perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter
Last-Value Results Perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter
Context Results Perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% over the up-down counter
Coverage Sensitivity to the Unavailability of Past Instructions
Accuracy Sensitivity to the Unavailability of Past Instructions
Conclusions
• Mispredictions are a problem in data value prediction
• Benchmark programs exhibit strong predictability relationships between instructions
• Perceptrons enable confidence estimators to exploit these predictability relationships
• Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation