
Applying Perceptrons to Speculation in Computer Architecture


  1. Applying Perceptrons to Speculation in Computer Architecture Michael Black Dissertation Defense April 2, 2007

  2. Presentation Outline • Background and Objectives • Perceptron behavior • Local value prediction • Global value prediction • Criticality prediction • Conclusions

  3. Motivation: Jimenez’s Perceptron Branch Predictor • 27% reduction in mispredictions over gshare • 15.8% increase in performance over gshare¹ Why better? A perceptron can consider a longer history. ¹Jimenez and Lin, “Dynamic Branch Prediction with Perceptrons,” 2002.

  4. Problem of Lookup Tables • Size grows exponentially with history length • Result: only a small subset of the available history data can be considered

  5. Global vs. Local • Local history: past iterations of same instruction • Global history: all past dynamic instructions

  6. Perceptron Predictions: • Dot product of binary inputs and integer weights • Apply threshold: if positive, predict 1; if negative, predict 0 Learning objective: each weight’s value should come to reflect its input’s correlation with the outcome
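
A minimal sketch of this prediction step in Python, assuming the ±1 input encoding and the bias weight conventional in perceptron branch predictors (the slide itself specifies only binary inputs and integer weights):

    def perceptron_predict(weights, history):
        # Dot product of +/-1-encoded history bits and integer weights.
        total = weights[0]  # bias weight: learns the outcome's overall bias
        for w, bit in zip(weights[1:], history):
            total += w * (1 if bit else -1)
        return (1 if total >= 0 else 0), total  # prediction and its confidence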

  7. Training Strategies • Training by correlation: if actual == input_k then w_k++ else w_k-- • Training by error: error = actual - predicted; w_k = w_k + input_k * error
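
Both rules as runnable sketches, assuming inputs, the actual outcome, and the prediction all share the same ±1 encoding (the per-weight loop is implied by the slide’s subscript notation):

    def train_by_correlation(weights, inputs, actual):
        # Each weight moves toward agreement between its input and the
        # actual outcome, whether or not the prediction was correct.
        for k, x in enumerate(inputs):
            weights[k] += 1 if x == actual else -1

    def train_by_error(weights, inputs, actual, predicted):
        # Weights change in proportion to the prediction error, so a
        # correct prediction (error == 0) leaves them untouched.
        error = actual - predicted
        for k, x in enumerate(inputs):
            weights[k] += x * error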

  8. Linear Separability A weight can learn only one correlation per input: • direct (positive weight) • inverse (negative weight) A single perceptron therefore cannot learn functions that are not linearly separable, such as the XOR of two history bits.
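
A quick demonstration of that limit, using the standard mistake-driven perceptron update with ±1 encodings (a generic textbook setup, not the dissertation’s):

    def run(patterns, epochs=100):
        # Perceptron learning: update weights only on a misprediction.
        w = [0, 0, 0]  # bias + two input weights
        for _ in range(epochs):
            wrong = 0
            for x1, x2, target in patterns:
                s = w[0] + w[1] * x1 + w[2] * x2
                pred = 1 if s >= 0 else -1
                if pred != target:
                    wrong += 1
                    w[0] += target
                    w[1] += target * x1
                    w[2] += target * x2
            if wrong == 0:
                return True   # a separating set of weights was found
        return False          # never converged

    inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    and_set = [(a, b, 1 if (a, b) == (1, 1) else -1) for a, b in inputs]
    xor_set = [(a, b, 1 if a != b else -1) for a, b in inputs]
    print(run(and_set))  # True: AND is linearly separable
    print(run(xor_set))  # False: XOR is not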

  9. Dissertation Objectives • Analyze the behavior of perceptrons when used to replace tables • Cope with the limitations of perceptrons and their implementations • Apply perceptrons to value prediction • Apply perceptrons to criticality prediction

  10. Dissertation Contributions • Perceptron Local Value Predictor • can consider longer local histories • Perceptron Global-based Local Value Predictor • can use global information to choose local values • Two Perceptron Global Value Predictors • Perceptron Global Criticality Predictor • Comparison and analysis of: • perceptron training approaches • multiple-bit topologies • interference reduction strategies

  11. Analyses • How perceptrons behave when replacing tables • What effect the training approach has • Design and behavior of different multiple-bit perceptrons • Dealing with history interference

  12. Context-based Learning Concatenated history pattern (“context”) indexes table
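
As a concrete contrast to the perceptron approach, a sketch of context-based learning with a table of two-bit saturating counters (the counter width is an assumption; the slide only describes the indexing):

    HISTORY_BITS = 8
    table = [0] * (2 ** HISTORY_BITS)  # one 2-bit counter (0..3) per context

    def context_index(history):
        # Concatenate the last HISTORY_BITS outcomes into a table index;
        # the table grows as 2**HISTORY_BITS, the exponential cost of slide 4.
        idx = 0
        for bit in history[-HISTORY_BITS:]:
            idx = (idx << 1) | bit
        return idx

    def predict(history):
        return 1 if table[context_index(history)] >= 2 else 0

    def train(history, actual):
        i = context_index(history)
        table[i] = min(3, table[i] + 1) if actual else max(0, table[i] - 1)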

  13. Pattern Compatibility

  14. What affects perceptron learning? • Noise from uncorrelated inputs • Imbalance between pattern occurrences • False correlations Effects: • Perceptron takes longer to learn • Perceptron never learns

  15. Noise • Training by correlation: weights grow large rapidly, so it is less susceptible to noise • Training by error: weights don’t grow until a misprediction occurs, so it is more susceptible Solution? Exponential weight growth
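
The slide names the fix without details; a hypothetical sketch of one form exponential weight growth could take for training-by-error, where the update step scales with a weight’s current magnitude so consistently correlated inputs outgrow noise quickly:

    def train_by_error_exp(weights, inputs, actual, predicted):
        # Hypothetical variant: the step grows with |weight|, so repeated
        # agreement compounds exponentially rather than linearly.
        error = actual - predicted
        if error == 0:
            return
        for k, x in enumerate(inputs):
            step = max(1, abs(weights[k]))
            weights[k] += x * error * step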

  16. Studying Noise • Perceptron modeled independently of any application • p random patterns chosen for each level of correlation: at n correlated bits, a random correlation direction (direct/inverse) is chosen for each of the n bits • Target randomly chosen for each pattern; the correlation directions determine the first n bits of each pattern • Remaining bits chosen randomly for each pattern • Perceptron is trained on each pattern set • Average training time over 1000 random pattern sets is plotted Example generation for n=4, p=2 with directions “ddid” (direct, direct, inverse, direct): templates 1101xxxx → 1 and 0010xxxx → 0, yielding e.g. 11010101 → 1 and 00101110 → 0
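
A sketch of this generation procedure (the 8-bit pattern width matches the example; it is otherwise an assumption):

    import random

    def make_pattern_set(n, p, width=8):
        # One correlation direction per correlated bit:
        # 'd' = direct (bit equals target), 'i' = inverse (bit is flipped).
        directions = [random.choice("di") for _ in range(n)]
        patterns = []
        for _ in range(p):
            target = random.randint(0, 1)
            bits = [target if d == "d" else 1 - target for d in directions]
            bits += [random.randint(0, 1) for _ in range(width - n)]  # noise
            patterns.append((bits, target))
        return directions, patterns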

  17. How does noise affect training time?

  18. How does imbalance affect training time?

  19. How does imbalance affect learning?

  20. Why can’t training-by-correlation handle imbalance?

  21. Findings • Increasing history size is bad if the percentage of correlated inputs decreases • Training-by-error must be used when correlation is poor and patterns are imbalanced

  22. Multibit Perceptron Predicts values, not single bits What is a value correlation? • One input value implies a particular output value • e.g., 5 --> 4 Approaches: • Disjoint • Fully Coupled • Weight per Value

  23. Disjoint Perceptron Tradeoff: + small size - each output bit can learn only from that same bit’s history

  24. Fully Coupled Perceptron Tradeoff: + can learn from any past bit - more weights
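
Hedged sketches of the disjoint and fully coupled topologies from slides 23 and 24; the ±1 encoding and one output perceptron per predicted bit are assumptions carried over from the single-bit case:

    def sign_bit(total):
        return 1 if total >= 0 else 0

    def predict_disjoint(W, history):
        # W[i]: weights for output bit i over bit i's own history.
        # history: list of past values, each a list/tuple of bits.
        return [sign_bit(sum(w * (1 if v[i] else -1)
                             for w, v in zip(W[i], history)))
                for i in range(len(W))]

    def predict_fully_coupled(W, history):
        # W[i]: weights for output bit i over every bit of every past value,
        # so any past bit may contribute, at the cost of many more weights.
        flat = [b for v in history for b in v]
        return [sign_bit(sum(w * (1 if b else -1) for w, b in zip(W[i], flat)))
                for i in range(len(W))]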

  25. Learning abilities compared

  26. Weight-per-Value Perceptron Tradeoff: + Can always learn - Tons of weights
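
The slides give no internal structure for this topology; under one hypothetical reading, with one perceptron per candidate output value and an indicator input for every (history slot, possible value) pair, the count for a single instruction is already large:

    def weight_per_value_count(v, h):
        # v candidate output values, each with an indicator input for every
        # (history slot, possible past value) pair -- hypothetical layout.
        return v * (h * v)

    print(weight_per_value_count(16, 8))  # 2048 weights for one instruction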

  27. History Interference

  28. How common is interference?

  29. How does interference affect perceptrons? • constructive • destructive • neutral • weight-destructive • value-destructive

  30. Interference in Perceptron Branch Prediction

  31. Coping: Assigned Seats Tradeoff: + no additional size - can’t consider multiple iterations of an instruction

  32. Weight for each interfering branch (“Piecewise Linear”) Tradeoff: + interference is completely removed - massive size
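
A hedged sketch in the spirit of this idea (the table dimensions, hashing, and (address, outcome) history format are illustrative assumptions):

    N_PC, N_ADDR, HIST = 64, 64, 16
    # One weight per (predicted branch, interfering branch, history slot),
    # so branches that share a history slot no longer share a weight.
    W = [[[0] * HIST for _ in range(N_ADDR)] for _ in range(N_PC)]

    def predict(pc, ghist):
        # ghist: (branch_address, taken) pairs for the last HIST branches
        total = 0
        for i, (addr, taken) in enumerate(ghist[:HIST]):
            total += W[pc % N_PC][addr % N_ADDR][i] * (1 if taken else -1)
        return total >= 0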

  33. Simulator A new superscalar, cycle-accurate, execution-driven simulator that can accurately model value prediction and criticality

  34. Value Prediction What is it? • predicting instructions’ data values to overcome data dependencies Why consider it? • it requires a multiple-bit prediction, not a single-bit one

  35. Table-based Predictor Limitations: • exponential growth in number of past values and in value history length • can only consider local history Storage: 70kB for 4 values, 34MB for 8 values, 74*10^18 B for 16 values
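
A back-of-the-envelope sketch of that exponential trend (the slide does not give the exact table organization; indexing a table by a value history of h slots, each holding one of v distinct values, is one standard layout):

    def table_entries(v, h):
        # v distinct past values per slot, h history slots -> v**h entries
        return v ** h

    for v in (4, 8, 16):
        print(f"{v} values, history {v}: {table_entries(v, v):.2e} entries")
    # 16 values with a 16-deep history already needs ~1.8e19 entries,
    # the same order of magnitude as the 74*10^18 B quoted above.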

  36. Perceptron in Pattern Table (PPT) Tradeoff: + few perceptrons needed (for 4 past values) + can consider longer histories - exponential growth with the number of past values

  37. Perceptron in Value Table (PVT) Tradeoff: + Linear growth in both value history and # past values - More perceptrons needed

  38. Results: PVT 2.4-5.6% accuracy increase, 0.5-1.2% performance increase • 102kB-1.3MB storage needed

  39. Results: PPT 1.4-2.8% accuracy decrease (not a good approach) • 72kB-115kB storage needed

  40. Global-Local Value Prediction Uses global correlation to predict locally available values

  41. Global-Local Predictor

  42. Global-Global Prediction Tradeoff: + Less value storage - More bits needed per perceptron input

  43. Global Bitwise Tradeoff: + no value storage + not limited to past values only - many more bits needed per perceptron input
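
A hedged sketch of a bitwise global predictor, assuming one perceptron per output bit fed by the last GHIST_BITS bits of global value history (widths are illustrative):

    VALUE_BITS, GHIST_BITS = 16, 64
    W = [[0] * GHIST_BITS for _ in range(VALUE_BITS)]

    def predict_value(global_bits):
        # global_bits: the most recent GHIST_BITS bits of global history.
        # Each output bit is predicted independently, so the assembled
        # value may never have appeared in any history (cf. slide 45).
        value = 0
        for i in range(VALUE_BITS):
            s = sum(w * (1 if b else -1) for w, b in zip(W[i], global_bits))
            if s >= 0:
                value |= 1 << i
        return value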

  44. Global Predictors Compared • Global-Local: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage needed • Global-Global: 7.6% accuracy increase, 6.7% performance increase, 1.3MB storage needed • Bitwise: 12.7% accuracy increase, 5.3% performance increase, 4.2MB storage needed

  45. Can Bitwise Predict New Values? 5.0% of all predictions are correct values never seen before • A further 9.8% are correct values not seen in the local history

  46. Multibit Topologies Compared • Disjoint: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage needed • Fully Coupled: 6.8% accuracy decrease, 1.5% performance decrease, 3.8MB storage needed • Weight per Value: 10.7% accuracy increase, 4.4% performance increase, 21.5MB storage needed

  47. Training Approaches Compared: Global-Local

  48. Training Approaches Compared: PVT Local

  49. Final Weight Values: Distribution and Accuracy

  50. Anti-Interference Compared
