This presentation explores the use of stochastic rounding and reduced-precision fixed-point arithmetic for solving the ordinary differential equations (ODEs) that describe spiking neuron models. The work sits within the SpiNNaker project, which aims to model the human brain using a million mobile phone processors. The slides cover the motivation for reduced precision, the implementation of the ODE solvers, and the arithmetic error of the different number formats relative to a reference.
Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ODEs • Steve Furber, Mantas Mikaitis, Michael Hopkins, David Lester • Advanced Processor Technologies research group, The University of Manchester
The SpiNNaker project • A million mobile phone processors in one computer • Able to model about 1% of the human brain… • …or 10 mice!
SpiNNaker packet router • Hardware router on each node • Packets have a routing key • Router has a look-up table of {key, mask, data} triplets • If address matches a key-mask pair, the associated data tells router what to do with the packet
SpiNNaker machines • SpiNNaker chip (18 ARM cores) • SpiNNaker board (864 ARM cores) • SpiNNaker racks (1M ARM cores) • HBP platform • 1M cores • 11 cabinets (including server) • Launch 30 March 2016 (then 500k cores)
SpiNNaker-2 • 160 ARM-based processing elements (PEs) • 4 GByte LPDDR4 DRAM • 7 energy-efficient chip-to-chip links • Dynamic Power Management for enhanced energy efficiency • Memory sharing for flexible code, state and weight storage • Multiply-Accumulate accelerator for machine learning • Production-ready layout in 22nm FDSOI technology • Neuromorphic accelerators and random generators for synapse and neuron computation • Network-on-Chip for efficient spike communication • Adaptive Body Biasing for energy-efficient low-voltage operation
Motivation for reduced precision • Energy-efficiency • Memory footprint • Memory bandwidth • SpiNNaker: • ARM968 has no floating-point hardware • Cortex M4F has 32-bit FP Energy per operation: • 64-bit float MAF 20 pJ • 32-bit float MAF 6 pJ • 32-bit int MADD 4 pJ • 16-bit int MADD 1 pJ data: Simon Knowles (GraphCore), 28nm 0v8
Machine learning • dense convolution kernels • abstract neurons • feed-forward connections • trained through backpropagation Gupta, Agrawal, Gopalakrishnan and Narayanan, “Deep learning with limited numerical precision”, Proc. 32nd Intl Conf on Machine Learning, 2015
Fixed-point number representation Standard (non-floating-point) CPU arithmetic is integer • add, subtract, multiply (with accumulate?) • divide (sometimes – not on ARM968!) Fixed-point arithmetic maps onto these integer operations • interpret the integer as scaled by 2^-n • arithmetic operations work as expected (mainly!) • e.g. GCC accum s16.15 type:
Fixed-point number representation • Fixed-point type in GCC: accum, i.e. s16.15 (1 sign bit, 16 integer bits, 15 fractional bits) • Range: -65536 to just under +65536 (largest value 65536 - 2^-15); machine epsilon 2^-15 ≈ 0.0000305 • WARNING: the standard integer “rounding” operation is to truncate, = round down! • WARNING: the GCC “round to nearest” fixed-point option doesn’t work (on ARM code)
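Below is a minimal sketch (not from the slides) showing how the GCC / ISO TR 18037 accum type maps a decimal constant onto its underlying scaled 32-bit integer. The memcpy is just one portable way to peek at the representation (the TR 18037 accessors bitsk()/kbits() do the same where the library provides them); it assumes an ARM target where accum is s16.15, as on SpiNNaker.

```c
/* Minimal sketch of the s16.15 `accum` type: a decimal constant is
 * quantised to an underlying scaled 32-bit integer (value = raw / 2^15). */
#include <stdfix.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    accum a = 0.04k;                /* 0.04 * 2^15 = 1310.72: not exactly representable */

    int32_t raw;
    memcpy(&raw, &a, sizeof raw);   /* underlying integer: 1310 if the constant was
                                       truncated (RD, as the slides warn), 1311 if
                                       rounded to nearest (RTN)                       */

    printf("raw = %ld  value = %f\n", (long)raw, (double)a);
    return 0;
}
```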
Fixed-point number representation: multiplier • multiplying two 32-bit accum values as integers gives a 64-bit answer with 30 fractional bits (the decimal point moves) • use the bits below the accum’s least significant bit for rounding rather than throwing them away! • then round and saturate the result to accum
Rounding down (RD) • Given the output from the multiplier, simply ignore the bits below the accum range • The exact answer lies somewhere in the gap above the value kept
Rounding to nearest (RTN) • Given the output from the multiplier, look at the most significant discarded bit: 1 means round up, 0 means round down • The exact answer lies somewhere in the gap between the two neighbouring accum values
Stochastic rounding (SR) • Given the output from the multiplier, use the discarded bits (the residue) as the probability of rounding up, in [0,1) • Draw a uniform random number: if it is < residue, round up, else round down • The exact answer lies somewhere in the gap between the two neighbouring accum values
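The three rounding modes can be sketched directly on the 64-bit product of two s16.15 raw integers. This is an illustrative sketch, not the SpiNNaker library code: rng() is a placeholder for any uniform 32-bit PRNG, saturation to the accum range is omitted, and an arithmetic (sign-extending) right shift is assumed, so truncation rounds towards minus infinity, i.e. “down”.

```c
/* Round the 64-bit product of two s16.15 raw values back to s16.15
 * with the three rounding modes on the slides (saturation omitted). */
#include <stdint.h>

extern uint32_t rng(void);                 /* hypothetical uniform 32-bit PRNG */

#define FRAC_BITS 15                       /* s16.15 fraction bits */

static int32_t mul_s1615_rd(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;   /* s32.30 intermediate product    */
    return (int32_t)(p >> FRAC_BITS);      /* truncate = round down          */
}

static int32_t mul_s1615_rtn(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    p += (int64_t)1 << (FRAC_BITS - 1);    /* add half a result LSB          */
    return (int32_t)(p >> FRAC_BITS);      /* round to nearest, ties upward  */
}

static int32_t mul_s1615_sr(int32_t a, int32_t b)
{
    int64_t  p       = (int64_t)a * (int64_t)b;
    uint32_t residue = (uint32_t)(p & ((1 << FRAC_BITS) - 1));   /* discarded bits */
    uint32_t draw    = rng() & ((1u << FRAC_BITS) - 1);          /* uniform [0, 2^15) */
    if (draw < residue)
        p += (int64_t)1 << FRAC_BITS;      /* round up with probability residue / 2^15 */
    return (int32_t)(p >> FRAC_BITS);
}
```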
Ordinary Differential Equations (ODEs) • What are they? Define time evolution of the state variables of a system via differential equations • Where are they used? Neuroscience, physics, chemistry, biology, ecology, pharmacology, meteorology, astronomy, epidemiology, population modelling, … • The Izhikevich ODE is used widely in computational neuroscience, with a reset rule applied on spiking (V > 30 mV); see the equations below
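For reference, the standard Izhikevich (2003) formulation referred to on this slide (v in mV, t in ms; the parameters a, b, c, d are those listed on the next slide):

```latex
\begin{aligned}
\frac{dv}{dt} &= 0.04v^2 + 5v + 140 - u + I,\\
\frac{du}{dt} &= a(bv - u),\\
\text{on spiking } (v > 30\,\mathrm{mV}):&\quad v \leftarrow c,\qquad u \leftarrow u + d.
\end{aligned}
```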
Example neural ODE behaviour • RS neuron (a = 0.02, b = 0.2, c = -65.0, d = 8.0) • FS neuron (a = 0.1, b = 0.2, c = -65.0, d = 2.0) • 4.775 nA DC current input after 60 ms • [Plots: traces from the Mathematica algorithmic reference]
Arithmetic types used for comparison • 64-bit IEEE double-precision floating-point (reference) • 32-bit IEEE single-precision floating-point • 32-bit accum s16.15 fixed-point (with 3 rounding modes) Also discussed in the paper… • 32-bit s0.31 fixed-point (if mixed-precision required) • 16-bit s8.7 fixed-point (with 3 rounding modes)
Solver algorithms and ESR We implement 4 explicit ODE solvers… • and combine them with the Izhikevich equation • using the ESR (Explicit Solver Reduction) technique • (for details see: Hopkins & Furber (2015), Neural Computation) • Runge-Kutta 2nd order Midpoint (fixed-point sketch below) • Runge-Kutta 2nd order Trapezoid • Runge-Kutta 3rd order Heun • Chan-Tsai hybrid
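A rough fixed-point sketch of the RK2 Midpoint variant follows. It is not the production ESR code from Hopkins & Furber (2015); it simply writes the Izhikevich right-hand side out inline (the spirit of ESR: solver and ODE merged, no generic f(t, y) callback), uses the RS-neuron parameters from the earlier slide, and relies on the compiler’s default (truncating) accum multiplies unless a rounded multiply routine such as the ones sketched earlier is substituted.

```c
/* One RK2 Midpoint step of the Izhikevich equations in s16.15 `accum`
 * arithmetic, with the ODE right-hand side written out inline.        */
#include <stdfix.h>

typedef struct { accum v, u; } neuron_t;

/* RS neuron parameters from the earlier slide */
static const accum A = 0.02k, B = 0.2k, C = -65.0k, D = 8.0k;

static void izh_rk2_midpoint_step(neuron_t *n, accum i_in, accum h)
{
    accum v = n->v, u = n->u;

    /* derivatives at the start of the step */
    accum dv = 0.04k * v * v + 5.0k * v + 140.0k - u + i_in;
    accum du = A * (B * v - u);

    /* midpoint estimate of the state */
    accum vm = v + 0.5k * h * dv;
    accum um = u + 0.5k * h * du;

    /* full step using the midpoint derivatives */
    n->v = v + h * (0.04k * vm * vm + 5.0k * vm + 140.0k - um + i_in);
    n->u = u + h * (A * (B * vm - um));

    if (n->v > 30.0k) {        /* spike: apply the Izhikevich reset */
        n->v = C;
        n->u += D;
    }
}
```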
Arithmetic detail • Rounding constants within ODE definitions • e.g. exact = 0.04 = 1310.72 / 2^15 • s16.15 (RD) = 0.0399780… = 1310 / 2^15 • s16.15 (RTN) = 0.0400085… = 1311 / 2^15 • s0.31 (RTN) = 0.0400000000372529 • float = 0.03999999910593… • GCC surprise – RTN not implemented for fixed-point • using RTN instead of RD above improves accuracy a lot! • Mixed-precision multiplies: accum x fract • Automated test bench using C macros to switch options
Algorithmic Error vs Arithmetic Error • Algorithmic reference: Mathematica solver at 30 decimal places • NB: only looking at arithmetic error here, i.e. choose an (imperfect) algorithm and compare the other arithmetics to the IEEE double reference for that algorithm • For each solver algorithm (RK2 Midpoint, …, Chan-Tsai) the IEEE double implementation is the arithmetic reference, and IEEE single plus s16.15 with RD, RTN and SR are compared against it
Quantifying error • Spike time lead/lag relative to the arithmetic reference • [Diagram: distinguishing algorithmic error from arithmetic error]
Lead/lag of first 650 spikes (vs double ref) • [Plot: spike lag/lead compared to the double reference (ms) against spike number]
Comparison of SR resolutions • Given the output from the multiplier, use only SOME of the discarded bits as the probability of rounding up, [0,1) • e.g. 6-bit SR and 12-bit SR (sketch below)
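A reduced-resolution variant of the earlier mul_s1615_sr sketch: only the top k residue bits (k = 6 or 12 on this slide, 1 ≤ k ≤ 15) are compared against a k-bit random draw. As before, rng() is a placeholder PRNG and saturation is omitted.

```c
/* Stochastic rounding of an s32.30 product to s16.15 using only the
 * top k bits of the 15-bit residue as the round-up probability.       */
#include <stdint.h>

extern uint32_t rng(void);                    /* hypothetical uniform 32-bit PRNG */

static int32_t mul_s1615_sr_k(int32_t a, int32_t b, int k)   /* 1 <= k <= 15 */
{
    int64_t  p       = (int64_t)a * (int64_t)b;              /* s32.30 product   */
    uint32_t residue = (uint32_t)(p & 0x7FFF) >> (15 - k);   /* top k residue bits */
    uint32_t draw    = rng() >> (32 - k);                    /* uniform k-bit draw */
    if (draw < residue)
        p += (int64_t)1 << 15;                /* round up with probability residue / 2^k */
    return (int32_t)(p >> 15);
}
```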
Neuron learning rules • A common feature of neural models: • synaptic weights w are held at lowish precision, e.g. 16 bits • learning rules generate weight increments △w << 1 LSB • solution: add with stochastic rounding • if RAND[0,1) < △w, w = w + 1 (sketch below) • works because learning is pretty stochastic anyway!
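A sketch of the stochastically rounded weight update, under the assumption (not stated on the slide) that the sub-LSB increment △w is carried as an unsigned Q0.32 fraction of one weight LSB, so a 32-bit uniform draw can be compared against it directly; rng() is again a placeholder PRNG.

```c
/* Add a sub-LSB increment to a 16-bit weight with stochastic rounding:
 * increment by 1 LSB with probability dw (dw given as a Q0.32 fraction). */
#include <stdint.h>

extern uint32_t rng(void);                  /* hypothetical uniform 32-bit PRNG */

static void weight_update_sr(uint16_t *w, uint32_t dw_q32)
{
    /* P(rng() < dw_q32) = dw_q32 / 2^32, i.e. the fractional increment */
    if (rng() < dw_q32 && *w < UINT16_MAX)
        *w += 1;                            /* saturate at the 16-bit maximum */
}
```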
Dither • Idea adapted from audio and image dither • audio quantisation noise correlates with the signal • a small amount of added noise • decorrelates the noise, improving subjective sound quality • and increases effective precision • Small amount of Gaussian noise added at the ODE input (sketch below) • significantly simpler/cheaper than full SR • on input only, not every multiply • Even a tiny amount, i.e. an SD of 1 machine epsilon • improves float (always) • and RTN (usually) • Not as consistent or robust as full SR • Possible to identify optimal amount? (convergence?) Vanderkooy and Lipshitz, “Dither in digital audio”, J. Audio Eng. Soc. 1987
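One simple way to generate such dither, shown purely as a sketch (the slides do not say how the Gaussian noise is produced): the sum-of-12-uniforms (Irwin-Hall) approximation gives a roughly Gaussian sample with SD of about one s16.15 machine epsilon, added once at the ODE input; rng() is a placeholder uniform PRNG.

```c
/* Gaussian-ish dither with SD of roughly one s16.15 LSB (2^-15),
 * applied once at the ODE input rather than inside every multiply.    */
#include <stdfix.h>
#include <stdint.h>

extern uint32_t rng(void);                     /* hypothetical uniform 32-bit PRNG */

#define EPS 0.000030517578125k                 /* 2^-15, one s16.15 LSB */

static accum gaussian_eps(void)
{
    /* Sum of 12 uniform 12-bit draws: mean ~6*4096, SD ~4096.
       Dividing by 4096 (with rounding) leaves a roughly Gaussian
       integer with SD ~1, interpreted in units of one accum LSB.      */
    uint32_t sum = 0;
    for (int i = 0; i < 12; i++)
        sum += rng() & 0xFFFu;
    int32_t lsbs = (int32_t)((sum + 2048u) / 4096u) - 6;   /* zero-mean, in LSBs */
    return (accum)lsbs * EPS;
}

static accum dithered_input(accum i_dc)
{
    return i_dc + gaussian_eps();              /* dither at the input only */
}
```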
Dither epsilon sweeps (float constants) • [Plots: RS neuron and FS neuron] • NB: 1st step is highly significant for float!
Conclusions • There is growing interest in reduced-precision number representations in many areas of computing • to reduce memory footprint, energy requirements, etc. • Lower precision incurs accuracy issues • Stochastic rounding gives optimal accuracy • Stochastic rounding can also address the scaling mismatch in weight increments • Dither techniques are simpler and may perform adequately?