Branch Prediction with Neural-Networks: Hidden Layers and Recurrent Connections


Presentation Transcript


  1. Branch Prediction with Neural-Networks: Hidden Layers and Recurrent Connections Andrew Smith CSE Dept. June 10, 2004

  2. Outline • What is a Perceptron? • Learning? • What is a Feed-Forward Network? • Learning? • What is a Recurrent Network? • Learning? • How to do it on hardware??? • Results – Adding hidden units • Results – Modeling the latency of slow networks • Results – Varying the hardware budget

  3. The Perceptron Linear (affine) combination of inputs → DECISION
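
A minimal sketch of this decision in C (the history length, the integer weights, and the name predict() are illustrative assumptions, not the talk's code). Branch outcomes are encoded as -1/+1, w[0] is a bias weight on a constant +1 input, and the sign of the affine sum is the prediction:

```c
/* Perceptron branch predictor sketch (illustrative names and sizes).
 * x[j]: recent branch outcomes as -1/+1; w[0]: bias on a +1 input. */
#include <stdio.h>

#define HIST_LEN 8

int predict(const int w[HIST_LEN + 1], const int x[HIST_LEN]) {
    int sum = w[0];                  /* bias term */
    for (int j = 0; j < HIST_LEN; j++)
        sum += w[j + 1] * x[j];      /* linear (affine) combination */
    return sum >= 0 ? +1 : -1;       /* DECISION: +1 = predict taken */
}

int main(void) {
    int w[HIST_LEN + 1] = {1, 2, -1, 0, 3, 0, 0, -2, 1};
    int x[HIST_LEN]     = {+1, +1, -1, +1, -1, -1, +1, +1};
    printf("prediction: %+d\n", predict(w, x));
    return 0;
}
```

Predicting taken when the sum is non-negative makes the decision a single comparison after the dot product, which is part of what makes perceptrons attractive for hardware predictors.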

  4. Perceptron Learning Inputs xj, outputs yi, and targets ti are in {-1, +1}. Cycle through the training set: if xi = (x1, x2, …, xd) is misclassified, do wj ← wj + a * ti * xj end if
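
The slide's update rule as code, continuing the sketch above (again illustrative; the learning rate a is fixed at 1 so the weights stay integers):

```c
/* Perceptron learning: on a misclassified example (x, t), apply
 * w_j <- w_j + a * t * x_j with a = 1. Names are illustrative. */
#define HIST_LEN 8

void train(int w[HIST_LEN + 1], const int x[HIST_LEN], int t) {
    int sum = w[0];                  /* recompute the decision */
    for (int j = 0; j < HIST_LEN; j++)
        sum += w[j + 1] * x[j];
    int y = (sum >= 0) ? +1 : -1;

    if (y != t) {                    /* misclassified: update */
        w[0] += t;                   /* the bias input is a constant +1 */
        for (int j = 0; j < HIST_LEN; j++)
            w[j + 1] += t * x[j];    /* w_j <- w_j + a * t * x_j */
    }
}
```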

  5. Feed-Forward Network A network of perceptrons…
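
One way to read "a network of perceptrons" in code: a single hidden layer feeding one output unit. The layer sizes and the tanh activation below are illustrative choices, not taken from the slide:

```c
/* One-hidden-layer forward pass: each hidden unit is a perceptron
 * with a smooth activation; the output unit combines them. */
#include <math.h>

#define N_IN     8
#define N_HIDDEN 4

double forward(const double w_h[N_HIDDEN][N_IN + 1],
               const double w_o[N_HIDDEN + 1],
               const double x[N_IN]) {
    double h[N_HIDDEN];
    for (int i = 0; i < N_HIDDEN; i++) {       /* hidden layer */
        double net = w_h[i][0];                /* bias */
        for (int j = 0; j < N_IN; j++)
            net += w_h[i][j + 1] * x[j];
        h[i] = tanh(net);
    }
    double net = w_o[0];                       /* output unit */
    for (int i = 0; i < N_HIDDEN; i++)
        net += w_o[i + 1] * h[i];
    return tanh(net);                          /* sign gives the prediction */
}
```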

  6. Feed-Forward Network Learning Use a gradient-descent algorithm. The network output, the error, and the derivatives of the error are defined below.
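
The slide's three formulas are images that did not survive the transcript; for a single output unit with activation f, inputs x_j, weights w_j, and target t, the standard squared-error forms it is describing are (my reconstruction):

```latex
y = f\Big(\sum_j w_j x_j\Big), \qquad
E = \tfrac{1}{2}\,(t - y)^2, \qquad
\frac{\partial E}{\partial w_j} = -(t - y)\, f'\!\Big(\sum_k w_k x_k\Big)\, x_j
```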

  7. Feed-Forward Networks, BACKPROP • But no error is defined for the hidden units??? • Solution: assign responsibility for the output units' error to each hidden unit, then descend the gradient • This is called "back-propagation"
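
In symbols (standard back-propagation for squared error; the slide states the idea in words): the output unit's delta is its error times its slope, each hidden unit inherits a weighted share of the output deltas, and every weight then descends its gradient with learning rate a:

```latex
\delta_o = (t_o - y_o)\, f'(\mathrm{net}_o), \qquad
\delta_h = f'(\mathrm{net}_h) \sum_o w_{oh}\, \delta_o, \qquad
\Delta w_{hj} = a\, \delta_h\, x_j
```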

  8. Recurrent Networks Now it has state…
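
The state the slide's figure is showing, written as the standard recurrent update (an assumption, since the figure is not in the transcript): each unit sees the current inputs and the previous activations of the recurrent units.

```latex
y_k(t) = f\Big(\sum_j w_{kj}\, x_j(t) + \sum_l u_{kl}\, y_l(t-1)\Big)
```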

  9. Learning weights for an RNN Unroll it and use back-propagation? No! Too slow, and wrong: unrolling through time multiplies the work per update, and truncating the unrolled history distorts the gradient.

  10. Use Real-Time Recurrent Learning • Keep a table at each time T: • For each unit u • For each weight w • Keep the partial derivative ∂u/∂w • Update it with a recurrence relation:
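
The recurrence image is missing from the transcript; the standard RTRL relation (Williams & Zipser, 1989) matching the slide's description keeps one sensitivity p^k_ij(t) = ∂y_k(t)/∂w_ij per (unit, weight) pair and pushes it forward in time:

```latex
p^{k}_{ij}(t+1) = f'\big(\mathrm{net}_k(t)\big)
  \Big[\sum_l w_{kl}\, p^{l}_{ij}(t) + \delta_{ki}\, z_j(t)\Big]
```

Here z_j(t) is the j-th signal feeding the network (an external input or a previous activation) and δ_ki is the Kronecker delta.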

  11. But on hardware??? • Idea: represent real numbers in [-4, +4] with integers in [-4096, +4096] (scale factor 1024) • Adding is OK: 1024i + 1024j = (i + j)·1024 • Multiplying requires a divide (a shift): (1024i) · (1024j) = (i·j)·1024² • Compute the activation function by looking it up in a discretized table
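
A fixed-point sketch of this scheme in C (the names, the tanh choice, and the clamping are illustrative). Sums need nothing extra; products pick up one surplus factor of 1024, which a right shift by 10 removes since 1024 = 2^10; the activation becomes a table lookup:

```c
/* Fixed-point sketch: scale factor 1024 = 2^10, so reals in [-4, +4]
 * become integers in [-4096, +4096]. Names and tanh are illustrative. */
#include <math.h>
#include <stdint.h>

#define SCALE_SHIFT 10                     /* 1024 = 2^10 */
#define FIX_MAX     4096                   /* represents +4.0 */

typedef int32_t fix_t;                     /* value = fix / 1024.0 */

static fix_t fix_add(fix_t a, fix_t b) {   /* 1024x + 1024y = (x+y)*1024 */
    return a + b;
}

static fix_t fix_mul(fix_t a, fix_t b) {   /* (1024x)(1024y) = (xy)*1024^2, */
    return (fix_t)(((int64_t)a * b) >> SCALE_SHIFT);  /* so shift once */
}

/* Activation by table lookup: one entry per representable point. */
static fix_t act_table[2 * FIX_MAX + 1];

static void init_table(void) {             /* fill once at startup */
    for (int i = 0; i <= 2 * FIX_MAX; i++) {
        double v = (i - FIX_MAX) / 1024.0;
        act_table[i] = (fix_t)lrint(tanh(v) * 1024.0);
    }
}

static fix_t activation(fix_t v) {
    if (v < -FIX_MAX) v = -FIX_MAX;        /* clamp to [-4, +4] */
    if (v >  FIX_MAX) v =  FIX_MAX;
    return act_table[v + FIX_MAX];
}
```

The shift replaces the divide exactly because the scale is a power of two, and the table keeps the nonlinear activation off the arithmetic's critical path.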

  12. Results, different numbers of hidden units

  13. Results, different latencies

  14. Results, different HW budget (crafty)

  15. Results, different HW budgets (BZIP-PROGRAM)

  16. Conclusions • DON'T use an RNN! • Maybe use a NNet with a few hidden units, but don't overdo it • Future work: explore the trade-off between the number and size of hidden units and the number of inputs
