Applying Perceptrons to Speculation in Computer Architecture Michael Black Dissertation Defense April 2, 2007
Presentation Outline • Background and Objectives • Perceptron behavior • Local value prediction • Global value prediction • Criticality prediction • Conclusions
Motivation: Jimenez's Perceptron Branch Predictor
• 27% reduction in mispredictions over gshare
• 15.8% increase in performance over gshare¹
Why better? A perceptron can consider a longer history.
¹Jimenez and Lin, "Dynamic Branch Prediction with Perceptrons," 2002.
Problem of Lookup Tables • Size grows exponentially with history length • Result: can only consider a small subset of the available history
Global vs. Local • Local history: past iterations of the same instruction • Global history: all past dynamic instructions
Perceptron Predictions
• Compute the dot product of binary inputs and integer weights
• Apply a threshold: if positive, predict 1; if negative, predict 0
Learning objective: each weight's value should reflect its input's correlation with the outcome
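A minimal sketch of this prediction step (my illustration, not the dissertation's code; it assumes inputs are encoded as ±1 and that a tie predicts 1, neither of which the slide specifies):

```python
# Minimal perceptron prediction sketch. Inputs assumed encoded as +1/-1;
# a dot-product sum of exactly zero is treated as a prediction of 1.
def predict(weights, inputs):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

weights = [3, -1, 0, 2]          # one integer weight per history bit
inputs  = [+1, -1, +1, +1]       # history bits encoded as +1/-1
print(predict(weights, inputs))  # dot product = 3+1+0+2 = 6 -> predict 1
```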
Training Strategies
Training by correlation:
  if actual == input_k: w_k++
  else: w_k--
Training by error:
  error = actual - predicted
  w_k = w_k + input_k * error
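Both training rules as a sketch, keeping the ±1 encoding assumed above (function and variable names are mine):

```python
# Both training rules from the slide, for +1/-1-encoded inputs and outcomes.
def train_by_correlation(weights, inputs, actual):
    # Strengthen weights whose input agreed with the outcome, weaken the rest.
    for k, x in enumerate(inputs):
        weights[k] += 1 if x == actual else -1

def train_by_error(weights, inputs, actual, predicted):
    # Only moves weights when the prediction was wrong (error == 0 otherwise).
    error = actual - predicted
    for k, x in enumerate(inputs):
        weights[k] += x * error
```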
Linear Separability A weight can only learn one correlation: • direct (positive) • inverse (negative)
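For example (my illustration, not from the slides): XOR is the classic non-linearly-separable function, because the correlation of each input flips depending on the other input. A quick exhaustive check over small integer weights:

```python
# XOR is not linearly separable: no (w1, w2, threshold) classifies all
# four cases correctly, so a single perceptron can never learn it.
from itertools import product

cases = [((-1, -1), 0), ((-1, +1), 1), ((+1, -1), 1), ((+1, +1), 0)]  # XOR on +/-1 inputs
found = False
for w1, w2, t in product(range(-3, 4), repeat=3):
    if all((w1 * x1 + w2 * x2 >= t) == bool(y) for (x1, x2), y in cases):
        found = True
print(found)  # stays False: no linear threshold separates XOR
```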
Dissertation Objectives • Analyze the behavior of perceptrons when used to replace tables • Cope with the limitations of perceptrons and their implementations • Apply perceptrons to value prediction • Apply perceptrons to criticality prediction
Dissertation Contributions • Perceptron Local Value Predictor • can consider longer local histories • Perceptron Global-based Local Value Predictor • can use global information to choose local values • Two Perceptron Global Value Predictors • Perceptron Global Criticality Predictor • Comparison and analysis of: • perceptron training approaches • multiple-bit topologies • interference reduction strategies
Analyses • How perceptrons behave when replacing tables • What effect the training approach has • Design and behavior of different multiple-bit perceptrons • Dealing with history interference
Context-based Learning A concatenated history pattern (the "context") indexes a table
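A sketch of the table-based scheme that the perceptron replaces; the two-bit history encoding and table payload are my assumptions, since the slide only says the context indexes a table. Note the table needs 2**n entries for an n-bit context, which is the exponential growth problem above:

```python
# Context-indexed lookup table: an n-bit history selects one of 2**n entries.
N = 8                       # history length in bits (assumed)
table = [0] * (2 ** N)      # exponential in N: 2**8 = 256 entries here

def context_index(history_bits):
    # Concatenate the history bits into a single integer index.
    index = 0
    for b in history_bits:
        index = (index << 1) | b
    return index

history = [1, 0, 1, 1, 0, 0, 1, 0]
entry = table[context_index(history)]
```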
What affects perceptron learning? • Noise from uncorrelated inputs • Imbalance between pattern occurrences • False correlations Effects: • Perceptron takes longer to learn • Perceptron never learns
Noise
Training by correlation:
• weights grow large rapidly: less susceptible to noise
Training by error:
• weights don't grow until a misprediction occurs: more susceptible
Solution? Exponential weight growth
Studying Noise
• Perceptron modeled independently of any application
• p random patterns chosen for each level of correlation:
  • at n bits correlated, a random correlation direction (direct/inverse) is chosen for each of the n bits
  • a target is randomly chosen for each pattern; the correlation directions determine the first n bits of each pattern
  • the remaining bits are chosen randomly for each pattern
• Perceptron is trained on each pattern set
• Average training time over 1000 random pattern sets is plotted
Example generation for n=4, p=2 with directions d,d,i,d:
1101xxxx → 1
0010xxxx → 0
e.g. 11010101 → 1, 00101110 → 0
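A sketch of this pattern-set generation as I read the methodology (function and variable names are mine):

```python
import random

def make_pattern_set(total_bits, n_correlated, p_patterns):
    # One random correlation direction per correlated bit: True = direct.
    direct = [random.random() < 0.5 for _ in range(n_correlated)]
    patterns = []
    for _ in range(p_patterns):
        target = random.randint(0, 1)
        # Correlated bits are fixed by the target and direction...
        bits = [target if d else 1 - target for d in direct]
        # ...and the remaining bits are pure noise.
        bits += [random.randint(0, 1) for _ in range(total_bits - n_correlated)]
        patterns.append((bits, target))
    return patterns

# e.g. the slide's n=4, p=2 example with 8-bit patterns:
for bits, target in make_pattern_set(8, 4, 2):
    print("".join(map(str, bits)), "->", target)
```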
Findings • Increasing the history size hurts if the percentage of correlated inputs decreases • Training-by-error must be used when correlation is poor and pattern occurrences are imbalanced
Multibit Perceptron
Predicts values, not single bits.
What is a value correlation?
• An input value implies a particular output value
• e.g. 5 → 4
Approaches (compared in the code sketch below):
• Disjoint
• Fully Coupled
• Weight per Value
Disjoint Perceptron
Tradeoff:
+ small size
- each output bit can only learn from its own bit's history
Fully Coupled Perceptron
Tradeoff:
+ can learn from any past bit
- more weights
Weight-per-Value Perceptron
Tradeoff:
+ can always learn
- enormous number of weights
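A rough sketch contrasting the disjoint and fully coupled topologies for b-bit values (my own rendering of the idea; the dissertation's actual structures may differ):

```python
# Disjoint: output bit j sees only bit j of each past value.
# Fully coupled: output bit j sees every bit of every past value.
def predict_disjoint(weights, history, b):
    # weights[j][i]: weight linking output bit j to bit j of the i-th past value
    value = 0
    for j in range(b):
        s = sum(w * (1 if bits[j] else -1) for w, bits in zip(weights[j], history))
        value |= (s >= 0) << j
    return value

def predict_fully_coupled(weights, history, b):
    # weights[j]: one weight for every bit of every past value
    flat = [1 if bit else -1 for bits in history for bit in bits]
    value = 0
    for j in range(b):
        s = sum(w * x for w, x in zip(weights[j], flat))
        value |= (s >= 0) << j
    return value

# e.g. two past 4-bit values (LSB first), history length 2:
history = [[1, 0, 1, 0], [0, 1, 1, 0]]
w = [[1, -1] for _ in range(4)]      # disjoint: 4 perceptrons x 2 weights
print(predict_disjoint(w, history, 4))
```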
How does interference affect perceptrons? • constructive • destructive • neutral • weight-destructive • value-destructive
Coping: Assigned Seats
Tradeoff:
+ no additional size
- can't consider multiple iterations of the same instruction
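One plausible reading of "assigned seats" (the slide gives no details, so this sketch is strictly my assumption): each instruction's outcome goes into a fixed slot determined by its address rather than into a shift register, so different instructions stop competing for the same weight, but a second iteration of the same instruction overwrites its seat.

```python
# Hedged sketch of an "assigned seats" history buffer. Each PC hashes to a
# fixed slot, removing positional interference at no extra storage cost.
H = 16                          # number of seats / weights (assumed)
history = [0] * H               # one slot per seat; +/-1 outcomes, 0 = empty

def record(pc, outcome):        # outcome encoded as +1/-1
    history[pc % H] = outcome   # a repeat of the same pc overwrites its seat
```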
Weight for Each Interfering Branch ("Piecewise Linear")
Tradeoff:
+ interference is completely removed
- massive size
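A sketch of the weight-per-interfering-branch idea in the spirit of Jimenez's piecewise linear predictors; the table dimensions and modulo hashing are my assumptions:

```python
# One weight per (predicted instruction, history position, instruction that
# produced that history bit): interference-free, but storage is N*H*M weights.
N, H, M = 64, 16, 64            # table dimensions (assumed)
W = [[[0] * M for _ in range(H)] for _ in range(N)]

def predict(pc, history):
    # history: list of (producer_pc, outcome) pairs, outcomes encoded +1/-1
    s = sum(W[pc % N][i][ppc % M] * out
            for i, (ppc, out) in enumerate(history[:H]))
    return 1 if s >= 0 else 0
```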
Simulator A new superscalar, cycle-accurate, execution-driven simulator that can accurately model value prediction and criticality
Value Prediction What is it? • predicting instructions' data values to overcome data dependencies Why consider it? • requires a multiple-bit prediction, not a single-bit one
Table-based Predictor
Limitations:
• exponential growth in past values & value history
• can only consider local history
Storage: 70kB for 4 values, 34MB for 8 values, 74×10^18 B for 16 values
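To see where numbers like these come from: with v distinguishable past values and a value-history depth h, a pattern table needs v^h entries. A back-of-envelope check (the depth h = 16 and 4-byte entries are my assumptions, but the result matches the slide's 16-value figure):

```python
# Pattern-table storage for a table-based local value predictor:
# v past values at history depth h -> v**h entries (4 bytes each, assumed).
v, h = 16, 16
entries = v ** h                   # 16**16 = 2**64, about 1.8e19 entries
print(f"{entries * 4:.2e} bytes")  # ~7.4e19 B = 74*10^18 B, as on the slide
```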
Perceptron in Pattern Table (PPT)
Tradeoff:
+ Few perceptrons needed (for 4 past values)
+ Can consider longer histories
- Exponential growth with # of past values
Perceptron in Value Table (PVT)
Tradeoff:
+ Linear growth in both value history and # of past values
- More perceptrons needed
Results: PVT 2.4-5.6% accuracy increase, 0.5-1.2% performance increase 102kB-1.3MB storage needed
Results: PPT 1.4-2.8% accuracy decrease: not a good approach 72kB-115kB storage needed
Global-Local Value Prediction Uses global correlation to predict locally available values
Global-Global Prediction
Tradeoff:
+ Less value storage
- More bits needed per perceptron input
Global Bitwise
Tradeoff:
+ No value storage
+ Not limited to past values only
- Many more bits needed per perceptron input
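A sketch of bitwise global value prediction as I understand the idea (the value width, history length, and encoding are my assumptions): each bit of the predicted value gets its own perceptron over the bits of recent global history, so no table of stored values is needed.

```python
# Bitwise global value prediction sketch: predict each output bit directly
# from global-history bits. Because bits are predicted independently, the
# assembled value may be one that was never seen before.
B, G = 32, 64                          # value width and history bits (assumed)
weights = [[0] * G for _ in range(B)]  # one perceptron per output bit

def predict_value(global_bits):
    # global_bits: the last G bits of global (all-instruction) history, as 0/1
    xs = [1 if b else -1 for b in global_bits]
    value = 0
    for j in range(B):
        s = sum(w * x for w, x in zip(weights[j], xs))
        value |= (s >= 0) << j
    return value
```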
Global Predictors Compared
Global-Local: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage
Global-Global: 7.6% accuracy increase, 6.7% performance increase, 1.3MB storage
Bitwise: 12.7% accuracy increase, 5.3% performance increase, 4.2MB storage
Can Bitwise Predict New Values? • 5.0% of all predictions are correct values never seen before • A further 9.8% are correct values not seen in the local history
Multibit Topologies Compared
Disjoint: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage
Fully Coupled: 6.8% accuracy decrease, 1.5% performance decrease, 3.8MB storage
Weight per Value: 10.7% accuracy increase, 4.4% performance increase, 21.5MB storage