
Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ODEs

This research paper explores the use of stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations (ODEs). The study focuses on the SpiNNaker project, which uses a million mobile-phone processors to model large-scale neural systems. The paper also discusses the motivation for reduced precision, the implementation of ODE solvers, and the arithmetic error of different number formats relative to an algorithmic reference.

Presentation Transcript


  1. Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ODEs. Steve Furber, Mantas Mikaitis, Michael Hopkins, David Lester. Advanced Processor Technologies research group, The University of Manchester

  2. The SpiNNaker project • A million mobile phone processors in one computer • Able to model about 1% of the human brain… • …or 10 mice!

  3. SpiNNaker packet router • Hardware router on each node • Packets have a routing key • Router has a look-up table of {key, mask, data} triplets • If address matches a key-mask pair, the associated data tells router what to do with the packet

  4. SpiNNaker machines • SpiNNaker chip (18 ARM cores) • SpiNNaker board (864 ARM cores) • SpiNNaker racks (1M ARM cores) • HBP platform: 1M cores, 11 cabinets (including server); launch 30 March 2016, then 500k cores

  5. SpiNNaker-2 • 160 ARM-based processing elements (PEs) • 4 GByte LPDDR4 DRAM • 7 energy-efficient chip-to-chip links • Dynamic Power Management for enhanced energy efficiency • Memory sharing for flexible code, state and weight storage • Multiply-Accumulate accelerator for machine learning • Production-ready layout in 22nm FDSOI technology • Neuromorphic accelerators and random generators for synapse and neuron computation • Network-on-Chip for efficient spike communication • Adaptive Body Biasing for energy-efficient low-voltage operation

  6. Motivation for reduced precision • Energy efficiency • Memory footprint • Memory bandwidth • SpiNNaker: ARM968 has no floating-point hardware; Cortex-M4F has 32-bit FP • Energy per operation: 64-bit float MAF 20 pJ; 32-bit float MAF 6 pJ; 32-bit int MADD 4 pJ; 16-bit int MADD 1 pJ (data: Simon Knowles, GraphCore, 28nm 0v8)

  7. Machine learning • dense convolution kernels • abstract neurons • feed-forward connections • trained through backpropagation Gupta, Agrawal, Gopalakrishnan and Narayanan, “Deep learning with limited numerical precision”, Proc. 32nd Intl Conf on Machine Learning, 2015

  8. Fixed-point number representation Standard (non-floating-point) CPU arithmetic is integer • add, subtract, multiply (with accumulate?) • divide (sometimes – not on ARM968!) Fixed-point arithmetic maps onto these integer operations • interpret the integer as scaled by 2^-n • arithmetic operations work as expected (mainly!) • e.g. the GCC accum s16.15 type, sketched below:
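A minimal plain-C sketch of the s16.15 idea, assuming the layout described on the next slide (1 sign bit, 16 integer bits, 15 fractional bits in a 32-bit word); the accum_t type and helper names are illustrative only. On targets where GCC's fixed-point extension is available (such as ARM), the same value would simply be declared via <stdfix.h> as an accum.

    /* s16.15 fixed point in plain C: the stored integer is the real value * 2^15 */
    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t accum_t;
    #define ACCUM_ONE (1 << 15)        /* scale factor 2^15 */

    static accum_t accum_from_double(double x) { return (accum_t)(x * ACCUM_ONE); }
    static double  accum_to_double(accum_t a)  { return (double)a / ACCUM_ONE; }

    int main(void)
    {
        accum_t x = accum_from_double(1.5);
        accum_t y = accum_from_double(-2.25);

        /* addition and subtraction are just the underlying integer operations */
        printf("x + y = %f\n", accum_to_double(x + y));   /* -0.75 */
        printf("x - y = %f\n", accum_to_double(x - y));   /*  3.75 */
        return 0;
    }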

  9. Fixed-point number representation • Fixed-point type in GCC: accum s16.15 (1 sign bit, 16 integer bits, 15 fractional bits) • Range: -65536 to +65535.99997; machine epsilon = 2^-15 ≈ 3.05 x 10^-5 • WARNING: the standard integer "rounding" operation is truncation, i.e. round down! • GCC's "round to nearest" fixed-point option doesn't work (on ARM code)

  10. Fixed-point number representation: the multiplier • Multiplying two 32-bit accum values gives a 64-bit answer • Use the bits below the decimal point of the s16.15 result for rounding rather than throwing them away • Then round and saturate the result back to accum
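A sketch of that full-width multiply in plain C, under the same s16.15 assumptions as above: the 64-bit product carries 30 fractional bits, and the bottom 15 of them are the "residue" that the rounding schemes on the next slides make use of.

    #include <stdint.h>

    /* Full 64-bit product of two s16.15 accum values (stored in int32_t). */
    int64_t accum_mul_raw(int32_t a, int32_t b)
    {
        return (int64_t)a * (int64_t)b;        /* 30 fractional bits */
    }

    /* The 15 residue bits below the s16.15 result, i.e. residue/2^15 in [0,1). */
    uint32_t accum_mul_residue(int64_t product)
    {
        return (uint32_t)(product & 0x7FFF);
    }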

  11. Rounding down (RD) • Given the output from the multiplier, simply ignore the residue bits below the accum range • The exact answer lies somewhere in the gap between the kept result and the next representable value
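A minimal sketch of RD under the same assumptions (saturation to the accum range omitted for brevity; relies on arithmetic right shift, as on ARM/GCC):

    #include <stdint.h>

    /* Round down (truncation): drop the 15 residue bits. */
    int32_t accum_mul_rd(int32_t a, int32_t b)
    {
        int64_t p = (int64_t)a * (int64_t)b;   /* 64-bit intermediate */
        return (int32_t)(p >> 15);             /* keep only the s16.15 part */
    }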

  12. Rounding to nearest (RTN) • Given the output from the multiplier, look at the most significant residue bit: 1 means round up, 0 means round down • The result is the representable accum value nearest to the exact answer, which lies somewhere in the gap between two representable values
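A minimal sketch of RTN under the same assumptions (add half of one s16.15 LSB before truncating, so the top residue bit decides; saturation again omitted):

    #include <stdint.h>

    /* Round to nearest: bias by 2^14 (half an LSB of the result) then truncate. */
    int32_t accum_mul_rtn(int32_t a, int32_t b)
    {
        int64_t p = (int64_t)a * (int64_t)b;
        return (int32_t)((p + (1 << 14)) >> 15);
    }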

  13. Stochastic rounding (SR) • Given the output from the multiplier, use the residue bits as the probability of rounding up, in [0,1) • Draw a random number of the same width; if it is less than the residue, round up, else round down • The exact answer lies somewhere in the gap between two representable values
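A minimal sketch of SR under the same assumptions; rand() stands in for whatever hardware or software random source is actually used, and saturation is again omitted:

    #include <stdint.h>
    #include <stdlib.h>

    /* Stochastic rounding: round up with probability residue / 2^15. */
    int32_t accum_mul_sr(int32_t a, int32_t b)
    {
        int64_t  p       = (int64_t)a * (int64_t)b;
        uint32_t residue = (uint32_t)(p & 0x7FFF);      /* low 15 bits        */
        uint32_t r       = (uint32_t)rand() & 0x7FFF;   /* uniform 15-bit draw */
        return (int32_t)((p >> 15) + (r < residue ? 1 : 0));
    }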

  14. Ordinary Differential Equations (ODEs) • What are they? Define the time evolution of the state variables of a system via differential equations • Where are they used? Neuroscience, physics, chemistry, biology, ecology, pharmacology, meteorology, astronomy, epidemiology, population modelling, … • The Izhikevich ODE, used widely in computational neuroscience, with a reset applied on spiking (v > 30 mV), is reproduced below
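The equations on this slide appear as images in the original deck; the standard Izhikevich formulation they refer to, with parameters a, b, c, d and input current I, is:

    \begin{aligned}
    \frac{dv}{dt} &= 0.04v^{2} + 5v + 140 - u + I \\
    \frac{du}{dt} &= a\,(b v - u) \\
    \text{on spiking } (v > 30\ \mathrm{mV}):&\quad v \leftarrow c, \qquad u \leftarrow u + d
    \end{aligned}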

  15. Example neural ODE behaviour (Mathematica algorithmic reference) • RS neuron: a = 0.02, b = 0.2, c = -65.0, d = 8.0 • FS neuron: a = 0.1, b = 0.2, c = -65.0, d = 2.0 • 4.775 nA DC current input applied after 60 ms

  16. Arithmetic types used for comparison • 64-bit IEEE double-precision floating-point (reference) • 32-bit IEEE single-precision floating-point • 32-bit accum s16.15 fixed-point (with 3 rounding modes) Also discussed in the paper… • 32-bit s0.31 fixed-point (if mixed-precision required) • 16-bit s8.7 fixed-point (with 3 rounding modes)

  17. Solver algorithms and ESR We implement 4 explicit ODE solvers… • and combine with Izhikevich equation • using ESR (Explicit Solver Reduction) technique • (for details see: Hopkins & Furber (2015), Neural Computation) • Runge-Kutta 2nd order Midpoint • Runge-Kutta 2nd order Trapezoid • Runge-Kutta 3rd order Heun • Chan-Tsai hybrid
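For orientation, a generic double-precision sketch of one Runge-Kutta 2nd-order midpoint step for the Izhikevich equations is given below. It is not the paper's ESR-reduced fixed-point code (in ESR the solver and the equations are combined algebraically), and the names and test values are illustrative only.

    #include <stdio.h>

    typedef struct { double v, u; } state_t;

    static double dv(double v, double u, double I) { return 0.04*v*v + 5.0*v + 140.0 - u + I; }
    static double du(double v, double u, double a, double b) { return a*(b*v - u); }

    /* One RK2 midpoint step of size h. */
    state_t rk2_midpoint_step(state_t s, double h, double a, double b, double I)
    {
        /* half step to the midpoint... */
        double vm = s.v + 0.5*h*dv(s.v, s.u, I);
        double um = s.u + 0.5*h*du(s.v, s.u, a, b);
        /* ...then a full step using the midpoint derivative */
        state_t out = { s.v + h*dv(vm, um, I), s.u + h*du(vm, um, a, b) };
        return out;
    }

    int main(void)
    {
        state_t s = { -65.0, 0.2 * -65.0 };              /* RS neuron rest state  */
        s = rk2_midpoint_step(s, 1.0, 0.02, 0.2, 4.775); /* 1 ms step, 4.775 nA   */
        printf("v = %f, u = %f\n", s.v, s.u);
        return 0;
    }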

  18. Arithmetic detail • Rounding constants within ODE definitions • e.g. exact = 0.04 = 1310.72 / 2^15 • s16.15 (RD) = 0.0399780… = 1310 / 2^15 • s16.15 (RTN) = 0.0400085… = 1311 / 2^15 • s0.31 (RTN) = 0.0400000000372529 • float = 0.03999999910593… • GCC surprise – RTN not implemented for fixed-point • using RTN instead of RD above improves accuracy a lot! • Mixed-precision multiplies – accum x fract • Automated test bench using C macros to switch options
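A tiny check that reproduces the constant-rounding numbers above (plain C; double precision is used here only to recover the integers):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double exact = 0.04 * 32768.0;                       /* 1310.72 */
        long rd  = (long)floor(exact);                       /* 1310    */
        long rtn = (long)floor(exact + 0.5);                 /* 1311    */
        printf("RD : %ld -> %.10f\n", rd,  rd  / 32768.0);   /* 0.0399780273 */
        printf("RTN: %ld -> %.10f\n", rtn, rtn / 32768.0);   /* 0.0400085449 */
        return 0;
    }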

  19. Algorithmic Error vs Arithmetic Error • Algorithmic reference: Mathematica solver working to 30 decimal places • NB: only looking at arithmetic error here, i.e. choose an imperfect algorithm and compare the other arithmetics to its IEEE double reference • [Diagram: for each ODE solver algorithm (RK2 Midpoint, …, Chan-Tsai), the algorithmic error separates the Mathematica reference from that solver's IEEE double arithmetic reference; the arithmetic error results then compare IEEE single, s16.15 RD, s16.15 RTN and s16.15 SR against the double reference]

  20. Quantifying error • Spike time lead/lag relative to the arithmetic reference • [Plots: algorithmic error and arithmetic error]

  21. Lead/lag of the first 650 spikes (vs double reference) • [Plot: spike lag/lead compared to the double reference (ms) against spike number]

  22. Detail of 650th spike time results (ms)

  23. Comparison of SR resolutions • Given the output from the multiplier, use only SOME of the residue bits as the probability of rounding up, [0,1) • e.g. 6-bit SR vs 12-bit SR (sketched below)
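A sketch of one way the reduced-resolution variants could be realised under the same s16.15 assumptions: keep only the top few residue bits and compare them against an equally narrow random draw.

    #include <stdint.h>
    #include <stdlib.h>

    /* SR using only the top `bits` residue bits (e.g. 6 or 12). */
    int32_t accum_mul_sr_k(int32_t a, int32_t b, int bits)
    {
        int64_t  p       = (int64_t)a * (int64_t)b;
        uint32_t residue = (uint32_t)(p & 0x7FFF) >> (15 - bits);  /* top residue bits */
        uint32_t r       = (uint32_t)rand() & ((1u << bits) - 1);  /* narrow random    */
        return (int32_t)((p >> 15) + (r < residue ? 1 : 0));       /* saturation omitted */
    }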

  24. Neuron learning rules • A common feature of neural models: synaptic weights w are held at lowish precision, e.g. 16 bits • learning rules generate weight increments △w << 1 LSB • solution: add with stochastic rounding • if RAND[0:1] < △w, w = w + 1 • works because learning is pretty stochastic anyway!
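A minimal sketch of that rule, assuming a 16-bit weight and an increment already expressed as a fraction of one weight LSB; the names are illustrative.

    #include <stdint.h>
    #include <stdlib.h>

    /* Bump the weight by one LSB with probability dw_lsb (0 <= dw_lsb < 1). */
    void update_weight(uint16_t *w, double dw_lsb)
    {
        double r = (double)rand() / ((double)RAND_MAX + 1.0);   /* uniform [0,1) */
        if (r < dw_lsb && *w < UINT16_MAX)
            *w += 1;
    }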

  25. Dither • Idea adapted from audio and image dither • audio quantisation noise correlates with the signal • a small amount of added noise decorrelates the quantisation noise, improving subjective sound quality, and increases effective precision • A small amount of Gaussian noise added at the ODE input • significantly simpler/cheaper than full SR • on the input only, not at every multiply • Even a tiny amount, i.e. an SD of 1 machine epsilon, improves float (always) and RTN (usually) • Not as consistent or robust as full SR • Possible to identify the optimal amount? (convergence?) Vanderkooy and Lipshitz, "Dither in digital audio", J. Audio Eng. Soc. 1987
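A minimal sketch of the dither idea: add zero-mean Gaussian noise with a small standard deviation to the ODE input before each step. Box-Muller is used here only as a convenient way to get a Gaussian sample and is not necessarily what the paper uses.

    #include <math.h>
    #include <stdlib.h>

    /* Return input + N(0, sigma^2) noise. */
    double dither(double input, double sigma)
    {
        double u1 = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* (0,1) */
        double u2 = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double n  = sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);  /* N(0,1) */
        return input + sigma * n;
    }

For an s16.15 target, a sigma of about 2^-15 ≈ 3.05e-5 corresponds to the "SD of 1 machine epsilon" setting mentioned above.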

  26. Dither epsilon sweeps (float constants) • [Plots: RS neuron and FS neuron] • NB: the 1st step is highly significant for float!

  27. Conclusions • There is growing interest in reduced precision number representations in many areas of computing • To reduce memory footprint, energy requirements, etc. • Lower precision incurs accuracy issues • Stochastic rounding gives optimal accuracy • Stochastic rounding can also address scaling mismatch in weight increment • Dither techniques are simpler and may perform adequately?
