Perceptron-based Global Confidence Estimation for Value Prediction
Master’s Thesis
Michael Black
June 26, 2003
Thesis Objectives
• To present a viable global confidence estimator using perceptrons
• To quantify predictability relationships between instructions
• To study the performance of the global confidence estimator when used with common value prediction methods
Presentation Outline
• Background:
  • Data Value Prediction
  • Confidence Estimation
  • Predictability Relationships
  • Perceptrons
• Perceptron-based Confidence Estimator
• Experimental Results and Conclusions
Value Locality
Suppose instruction 1 has been executed several times before:
I1: 5 (A) = 3 (B) + 2 (C)
. . .
I1: 6 (A) = 4 (B) + 2 (C)
. . .
I1: 7 (A) = 5 (B) + 2 (C)
Next time, its outcome A will probably be 8.
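Data Value Prediction
• A data value predictor predicts A from instruction 1’s past outcomes
• Instruction 2 speculatively executes using the prediction
Previous instance: 1. ADD 7 (A) = 5 (B) + 2 (C)
Current instance: 1. ADD A = 6 (B) + 2 (C)
Dependent instruction: 2. ADD D = 5 (E) + 8 (A)
Predictor: +1 (the last outcome, 7, plus 1 gives the predicted A = 8 used by instruction 2)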
Types of Value Predictors
• Computational: Performs a mathematical operation on past values
  • Last-Value: 5, 5, 5, 5 → 5
  • Stride: 1, 3, 5, 7 → 9
• Context: Learns repeating sequences of numbers
  • 3, 6, 5, 3, 6, 5, 3 → 6
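The two computational predictors can be sketched in a few lines of Python. This is only an illustration of the idea on the slides, not the hardware design studied in the thesis: the class names, the `run` helper, and the choice to return `None` before any value has been seen are all assumptions made for the example.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.last = None  # no history yet (assumed convention)

    def predict(self):
        return self.last

    def update(self, actual):
        self.last = actual


class StridePredictor:
    """Predicts the last value plus the difference of the last two values."""

    def __init__(self):
        self.last = None
        self.stride = 0

    def predict(self):
        return None if self.last is None else self.last + self.stride

    def update(self, actual):
        if self.last is not None:
            self.stride = actual - self.last
        self.last = actual


def run(predictor, outcomes):
    """Feed past outcomes to a predictor and return its next prediction."""
    for value in outcomes:
        predictor.update(value)
    return predictor.predict()


# Outcomes of A from the Value Locality slide: 5, 6, 7
stride_guess = run(StridePredictor(), [5, 6, 7])
# Last-value sequence from the slide above: 5, 5, 5, 5
last_guess = run(LastValuePredictor(), [5, 5, 5, 5])
print(stride_guess, last_guess)
```

A context predictor would need a table keyed by recent value history and is left out of this sketch.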
Types of Value History
• Local History: Predicts using data from past instances of the same instruction
• Global History: Predicts using data from other instructions
Local value prediction is more conventional.
Are mispredictions a problem?
• If a prediction is incorrect, speculatively executed instructions must be re-executed
• This can result in:
  • Cycle penalties for detecting the misprediction
  • Cycle penalties for restarting dependent instructions
  • Incorrect resolution of dependent branch instructions
It is better to not predict at all than to mispredict.
Confidence Estimator
• Decides whether to make a prediction for an instruction
• Bases decisions on the accuracy of past predictions
• Common confidence estimation method: Saturating Up-Down Counter
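Up-Down Counter
[Figure: state diagram of a saturating up-down counter with four states. Three states are labeled “Don’t Predict” and one is labeled “Predict”, with a threshold separating them. “Correct” transitions move toward “Predict”, “Incorrect” transitions move away from it, and a “Start” marker indicates the initial state.]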
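A saturating up-down counter of this kind can be sketched as follows. The four states match a 2-bit counter; the starting state (0) and the rule that only the top state predicts are assumptions read off the diagram labels, and the class and method names are made up for the example.

```python
class UpDownCounter:
    """Saturating up-down counter used as a local confidence estimator."""

    def __init__(self, bits=2, threshold=None, start=0):
        self.max = (1 << bits) - 1  # 3 for a 2-bit counter
        # Assumption: predict only in the highest state unless told otherwise.
        self.threshold = self.max if threshold is None else threshold
        self.count = start          # assumption: start in the lowest state

    def should_predict(self):
        return self.count >= self.threshold

    def update(self, correct):
        """Count up on a correct prediction, down on an incorrect one."""
        if correct:
            self.count = min(self.count + 1, self.max)
        else:
            self.count = max(self.count - 1, 0)


counter = UpDownCounter()
decisions = []
for outcome in [True, True, True, True, False, True]:
    decisions.append(counter.should_predict())
    counter.update(outcome)
print(decisions)
```

Under these assumptions the counter needs three correct outcomes in a row before it allows a prediction, which is the warm-up cost that a global estimator tries to avoid.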
Local vs. Global
• Up-Down counter is local
  • Only past instances of an instruction affect its counter
• Global confidence estimation uses the prediction accuracy (“predictability”) of past dynamic instructions
• Problem with global:
  • Not every past instruction affects the predictability of the current instruction
Example
I1. A = B + C
I2. F = G – H
I3. E = A + A
• Instruction 3 depends on 1 but not on 2
• Instruction 3’s predictability is related to 1 but not 2
• If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly
Is global confidence worthwhile?
• Fewer mispredictions than local
  • If an instruction mispredicts, its dependent instructions know not to predict
• Less warm-up time than local
  • Instructions need not be executed several times before accurate confidence decisions can be made
How common are predictability relationships?
Simulation study:
• How many instructions in a program predict correctly only when a previous instruction predicts correctly?
• Which past instructions have the most influence?
Predictability Relationships
Over 70% of instructions for Stride and Last-Value, and over 90% for Context, have the same prediction accuracy as a past instruction 90% of the time.
Predictability Relationships
The most recent 10 instructions have the most influence.
Global Confidence Estimation
A global confidence estimator must:
• Identify for each instruction which past instructions have similar predictability
• Use their prediction accuracy to decide whether or not to predict
Neural Network
• Used to iteratively learn unknown functions from examples
• Consists of nodes and links
• Each link has a numeric weight
• Data is fed to input nodes and propagated to output nodes by the links
• Desired output is used to adjust (“train”) the weights
Perceptron
• Perceptrons have only input and output nodes
• They are much easier to implement and train than larger neural networks
• They can only learn linearly separable functions
Perceptron Computation
• Each bit of input data is sourced to an input node
• Dot product is calculated between input data and weights
• Output is “1” if dot product exceeds a threshold; otherwise “0”
Perceptron Training
• Weights are adjusted so that the perceptron output = the desired output for the given input
• Error value (ε) = desired value – perceptron output
• ε times each input bit is added to the corresponding weight
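The computation and training rules can be sketched together in Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the threshold of 0, the constant bias input of 1 placed first in each input vector, the AND target function, and the ten training passes are all choices made for the example.

```python
def perceptron_output(weights, inputs, threshold=0):
    """Return 1 if the dot product exceeds the threshold, otherwise 0."""
    dot = sum(w * x for w, x in zip(weights, inputs))
    return 1 if dot > threshold else 0


def train(weights, inputs, desired, threshold=0):
    """Apply one training step: add error * input to each weight."""
    error = desired - perceptron_output(weights, inputs, threshold)
    return [w + error * x for w, x in zip(weights, inputs)]


# Hypothetical training set: logical AND of two input bits.
# The leading 1 in each input vector is an assumed constant bias input.
samples = [
    ([1, 0, 0], 0),
    ([1, 0, 1], 0),
    ([1, 1, 0], 0),
    ([1, 1, 1], 1),
]

weights = [0, 0, 0]
for _ in range(10):  # a few passes are enough for this small example
    for inputs, desired in samples:
        weights = train(weights, inputs, desired)

learned = [perceptron_output(weights, inputs) for inputs, _ in samples]
print(weights, learned)
```

When the output already matches the desired value, the error is 0 and the weights are left unchanged, so training stops moving once every sample is classified correctly.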
Weights
• Weights determine the effect of each input on the output
• Positive weight: Output varies directly with input bit
• Negative weight: Output varies inversely with input bit
• Large weight: Input has strong effect on output
• Zero weight: Input bit has no effect on output
Linear Separability
• An input may have a direct influence on the output
• An input may instead have an inverse influence on the output
• But an input cannot have a direct influence sometimes and an inverse influence at other times
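A small brute-force check can make this limit concrete. The sketch below assumes two inputs encoded as 1 and –1, a bias weight, a threshold of 0, and a small integer weight range; the function names are hypothetical. It searches for weights that reproduce two target functions: "output 1 only when both inputs are 1" and "output 1 when exactly one input is 1" (an XOR-like function in which each input's influence flips depending on the other input).

```python
from itertools import product


def output(bias, w1, w2, x1, x2):
    """Perceptron output with an assumed threshold of 0."""
    return 1 if bias + w1 * x1 + w2 * x2 > 0 else 0


def representable(target, weight_range=range(-3, 4)):
    """Return True if some weight setting in the range matches the target."""
    cases = list(product((-1, 1), repeat=2))
    for bias, w1, w2 in product(weight_range, repeat=3):
        if all(output(bias, w1, w2, x1, x2) == target(x1, x2)
               for x1, x2 in cases):
            return True
    return False


def both_one(x1, x2):
    return 1 if (x1 == 1 and x2 == 1) else 0


def exactly_one(x1, x2):
    return 1 if x1 != x2 else 0


can_learn_and = representable(both_one)
can_learn_xor = representable(exactly_one)
print(can_learn_and, can_learn_xor)
```

The search finds weights for the first function but none for the second. The limited weight range makes this a demonstration rather than a proof, but the result agrees with the well-known fact that XOR is not linearly separable.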
Perceptron Confidence Estimator
• Each input node is a past instruction’s prediction outcome (1 = correct, –1 = incorrect)
• The output is the decision to predict (1 = predict, 0 = don’t predict)
• Weights determine each past instruction’s predictability influence on the current instruction:
  • Positive weight: current instruction mispredicts when past instruction mispredicts
  • Negative weight: current instruction mispredicts when past instruction predicts correctly
  • Zero weight: past instruction does not affect current instruction
Perceptron Confidence Estimator
Example weights:
• Bias: weight = –1
• I1: A = B C, weight = 1
• I2: D = E + F, weight = 1
• I3: P = Q R, weight = 0
• I4: G = A + D (current instruction)
Instruction 4 predicts correctly only when instructions 1 and 2 both predict correctly.
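The example weights can be checked by evaluating the perceptron for every combination of past outcomes. The sketch assumes a threshold of 0 and a bias input fixed at 1, neither of which is stated on the slide; the `decide` function and constant names are hypothetical.

```python
from itertools import product

BIAS_WEIGHT = -1
WEIGHTS = [1, 1, 0]  # weights for I1, I2, I3 from the slide


def decide(outcomes, threshold=0):
    """Return 1 (predict) or 0 (don't predict) for instruction 4.

    outcomes holds 1 (correct) or -1 (incorrect) for I1, I2, I3.
    The bias input is assumed to be a constant 1.
    """
    total = BIAS_WEIGHT + sum(w * x for w, x in zip(WEIGHTS, outcomes))
    return 1 if total > threshold else 0


table = {outcomes: decide(outcomes) for outcomes in product((1, -1), repeat=3)}
for outcomes, decision in table.items():
    print(outcomes, "predict" if decision else "don't predict")
```

With these weights the sum is positive only when both I1 and I2 were predicted correctly (–1 + 1 + 1 = 1); a single misprediction drives it to –1 or lower, and the outcome of I3 never changes the decision because its weight is 0.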
Weight Value Distribution
Simulation study:
• What are typical perceptron weight values?
• How does the type of predictor influence the weight distribution?
• What minimum range do the weights need to have?
Simulation Methodology
• Measurements simulated using SimpleScalar 2.0a
• SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex
• Each benchmark is run for 500 million instructions
• Value predictors: Stride, Last-Value, Context
• Baseline confidence estimator: 2-bit up-down counter
Simulation Metrics
• PCORRECT: # of correct predictions
• PINCORRECT: # of incorrect predictions
• N: # of cases where no prediction was made
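The slide lists only the three counts. One common way to turn them into the coverage and accuracy figures reported in the results, assumed here rather than taken from the thesis, is to define accuracy as the fraction of predictions made that were correct and coverage as the fraction of all cases that produced a correct prediction. The function names and the sample counts below are hypothetical.

```python
def accuracy(p_correct, p_incorrect):
    """Assumed definition: correct predictions / predictions made."""
    made = p_correct + p_incorrect
    return p_correct / made if made else 0.0


def coverage(p_correct, p_incorrect, n):
    """Assumed definition: correct predictions / all cases."""
    total = p_correct + p_incorrect + n
    return p_correct / total if total else 0.0


# Made-up counts for illustration only, not results from the thesis.
example_accuracy = accuracy(8, 2)
example_coverage = coverage(8, 2, 10)
print(example_accuracy, example_coverage)
```

Under these definitions a confidence estimator that refuses to predict more often can raise accuracy while lowering coverage, which is why the two are reported together.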
Stride Results
Perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter.
Last-Value Results
Perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter.
Context Results
Perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% relative to the up-down counter.
Coverage Sensitivity to the Unavailability of Past Instructions
Accuracy Sensitivity to the Unavailability of Past Instructions
Conclusions
• Mispredictions are a problem in data value prediction
• Benchmark programs exhibit strong predictability relationships between instructions
• Perceptrons enable confidence estimators to exploit these predictability relationships
• Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation