
Computing beyond a Million Processors - bio-inspired massively-parallel architectures

Computing beyond a Million Processors - bio-inspired massively-parallel architectures. Andrew Brown, The University of Southampton, adb@ecs.soton.ac.uk. Steve Furber, The University of Manchester, steve.furber@manchester.ac.uk. SBF is supported by a Royal Society-Wolfson Research Merit Award.


Presentation Transcript


  1. Computing beyond a Million Processors - bio-inspired massively-parallel architectures Andrew Brown, The University of Southampton, adb@ecs.soton.ac.uk Steve Furber, The University of Manchester, steve.furber@manchester.ac.uk SBF is supported by a Royal Society-Wolfson Research Merit Award

  2. Outline • Computer Architecture Perspective • Building Brains • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  3. Multi-core CPUs • High-end uniprocessors • diminishing returns from complexity • wire vs transistor delays • Multi-core processors • cut-and-paste • simple way to deliver more MIPS • Moore’s Law • more transistors • more cores … but what about the software?

  4. Multi-core CPUs • General-purpose parallelization • an unsolved problem • the ‘Holy Grail’ of computer science for half a century? • but imperative in the many-core world • Once solved • few complex cores, or many simple cores? • simple cores win hands-down on power-efficiency!

  5. Back to the future • Imagine… • a limitless supply of (free) processors • load-balancing is irrelevant • all that matters is: • the energy used to perform a computation • formulating the problem to avoid synchronisation • abandoning determinism • How might such systems work?

  6. Bio-inspiration • How can massively parallel computing resources accelerate our understanding of brain function? • How can our growing understanding of brain function point the way to more efficient parallel, fault-tolerant computation?

  7. Outline • Computer Architecture Perspective • Building Brains • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  8. Building brains • Brains demonstrate • massive parallelism (10^12 neurons) • massive connectivity (10^15 synapses) • excellent power-efficiency • much better than today’s microchips • low-performance components (~100 Hz) • low-speed communication (~ metres/sec) • adaptivity – tolerant of component failure • autonomous learning

  9. Neurons • Multiple inputs (dendrites) • Single output (axon) • digital “spike” • fires at 10s to 100s of Hz • output connects to many targets • Synapse at input/output connection (www.ship.edu/~cgboeree/theneuron.html)

  10. Neurons • A flexible biological control component • very simple animals have a handful • bees: 850,000 • humans: 10^12 (photo courtesy of the Brain Mind Institute, EPFL)

  11. Neurons • Regular high-level structure • e.g. 6-level cortical microarchitecture • low-level vision, to • language, etc. • Random low-level structure • adapts over time (faculty.washington.edu/rhevner/Miscellany.html)

  12. Neural Computation • [figure: inputs x1…x4, weighted by w1…w4, summed and passed through a non-linearity f to give output y] • To compute we need: • Processing • Communication • Storage • Processing: abstract model • linear sum of weighted inputs • ignores non-linear processes in dendrites • non-linear output function • learn by adjusting synaptic weights
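The abstract model above is a one-liner in code. A minimal sketch (the names `neuron_output` and `step` are mine, not the deck's):

```python
def neuron_output(weights, inputs, f):
    """Abstract neuron: non-linear function f of a weighted input sum."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return f(activation)

# A hard threshold as the output non-linearity
step = lambda a: 1 if a >= 1.0 else 0

print(neuron_output([0.5, 0.5, 0.5], [1, 1, 0], step))  # 0.5 + 0.5 >= 1.0: fires
```

Learning then amounts to adjusting the entries of `weights`.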

  13. Processing • Leaky integrate-and-fire model • inputs are a series of spikes • total input is a weighted sum of the spikes • neuron activation is the input with a “leaky” decay • when activation exceeds threshold, output fires • habituation, refractory period, …?
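The leaky integrate-and-fire dynamics fit in a few lines. A sketch with illustrative parameter values (the leak factor, threshold and input current are not from the deck):

```python
def lif_step(v, i_in, leak=0.9, v_thresh=1.0, v_reset=0.0):
    """One time step of a leaky integrate-and-fire neuron:
    the activation decays ("leaks"), accumulates input, and the
    neuron fires when it crosses threshold."""
    v = leak * v + i_in
    if v >= v_thresh:
        return v_reset, True      # spike: reset the activation
    return v, False

# Drive the neuron with a constant input and count spikes
v, spikes = 0.0, 0
for _ in range(20):
    v, fired = lif_step(v, 0.3)
    spikes += fired

print(spikes)  # fires every 4 steps: 5 spikes in 20 steps
```

Habituation and a refractory period would be extra state on top of this core loop.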

  14. Processing (www.izhikevich.com) • Izhikevich model • two variables, one fast (v), one slow (u): v' = 0.04v^2 + 5v + 140 − u + I, u' = a(bv − u) • neuron fires when v > 30; then: v ← c, u ← u + d • a, b, c & d select behaviour
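A minimal sketch of the Izhikevich model, using the Euler scheme from the original 2003 paper (two 0.5 ms sub-steps for v per 1 ms step); the parameter defaults a=0.02, b=0.2, c=−65, d=8 give the regular-spiking behaviour:

```python
def izhikevich(I, a=0.02, b=0.2, c=-65.0, d=8.0, ms=1000):
    """Izhikevich two-variable neuron: v is the fast membrane variable,
    u the slow recovery variable. Returns the spike count over `ms` ms."""
    v, u, spikes = c, b * c, 0
    for _ in range(ms):
        if v >= 30.0:                # fired on the previous step:
            v, u, spikes = c, u + d, spikes + 1   # reset v, bump u
        for _ in range(2):           # two 0.5 ms sub-steps for v
            v += 0.5 * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += a * (b * v - u)
    return spikes

print(izhikevich(10.0))  # constant input: repetitive firing
print(izhikevich(0.0))   # no input: stays at rest
```

Changing a, b, c and d selects bursting, chattering and other firing patterns without changing the equations.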

  15. Communication • Spikes • biological neurons communicate principally via ‘spike’ events • asynchronous • information is only: • which neuron fires, and • when it fires
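Since the only information is *which* neuron fired and *when*, a spike can travel as a bare routing key identifying its source; arrival time carries the "when". The field layout below is invented for illustration and is not SpiNNaker's actual packet format:

```python
def make_key(x, y, core, neuron):
    """Pack a spike source into a 32-bit routing key: 8-bit chip x and y,
    5-bit core, 11-bit neuron index. (Illustrative layout only.)"""
    return (x << 24) | (y << 16) | (core << 11) | neuron

def split_key(key):
    """Recover (x, y, core, neuron) from a routing key."""
    return key >> 24, (key >> 16) & 0xFF, (key >> 11) & 0x1F, key & 0x7FF

key = make_key(3, 5, 17, 1000)
print(split_key(key))  # (3, 5, 17, 1000)
```

The payload-free packet is what makes asynchronous, fire-and-forget delivery practical.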

  16. Storage • Synaptic weights • stable over long periods of time • with diverse decay properties? • adaptive, with diverse rules • Hebbian, anti-Hebbian, LTP, LTD, ... • Axon ‘delay lines’ • Neuron dynamics • multiple time constants • Dynamic network states

  17. Outline • Building Brains • Computer Architecture Perspective • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  18. The Good News... • Transistors per Intel chip • [chart: millions of transistors per chip, 1970–2000, log scale — 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III, Pentium 4 — rising from ~0.001M to ~100M]

  19. ...and the Bad News • Device variability • & • Component failure

  20. Atomic-scale devices • the simulation paradigm now • a 22 nm MOSFET: in production 2008 • a 4.2 nm MOSFET: in production 2023

  21. A view from Intel • The Good News: • we will have 100 billion transistor ICs • The Bad News: • billions will fail in manufacture • unusable due to parameter variations • billions more will fail over the first year of operation • intermittent and permanent faults (Shekhar Borkar, Intel Fellow)

  22. A view from Intel • Conclusions: • one-time production test will be out • burn-in to catch infant mortality will be impractical • test hardware will be an integral part of the design • dynamically self-test, detect errors, reconfigure, adapt, ... (Shekhar Borkar, Intel Fellow)

  23. Outline • Building Brains • Computer Architecture Perspective • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  24. Design principles • Virtualised topology • physical and logical connectivity are decoupled • Bounded asynchrony • time models itself • Energy frugality • processors are free • the real cost of computation is energy

  25. Outline • Building Brains • Computer Architecture Perspective • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  26. SpiNNaker project • Multi-core CPU node • 20 ARM968 processors • to model large-scale systems of spiking neurons • Scalable up to systems with 10,000s of nodes • over a million processors • >10^8 MIPS total • Power ~25 mW/neuron

  27. SpiNNaker project

  28. SpiNNaker project • Fault-tolerant architecture for large-scale neural modelling • A billion neurons in real time • A step-function increase in the scale of neural computation • Cost- and energy-efficient

  29. SpiNNaker system

  30. CMP node

  31. ARM968 subsystem

  32. GALS organization • clocked IP blocks • self-timed interconnect • self-timed inter-chip links

  33. Outline • Building Brains • Computer Architecture Perspective • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  34. Circuit-level concurrency • [figure: Tx → Rx link carrying data with ack return; din (2 phase), dout (4 phase), ¬reset, ¬ack] • Delay-insensitive comms • 3-of-6 RTZ on chip • 2-of-7 NRZ off chip • Deadlock resistance • Tx & Rx circuits have high deadlock immunity • Tx & Rx can be reset independently • each injects a token at reset • true transition detector filters surplus token
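The coding choice is easy to check by enumeration: an m-of-n delay-insensitive code has C(n, m) codewords, so 3-of-6 gives 20 and 2-of-7 gives 21 — both enough for 16 data symbols plus control. A small sketch:

```python
from itertools import combinations

def codewords(n, m):
    """All n-bit words with exactly m wires asserted (an m-of-n DI code)."""
    return [sum(1 << i for i in c) for c in combinations(range(n), m)]

print(len(codewords(6, 3)), len(codewords(7, 2)))  # 20 21
```

Because every valid codeword has exactly m transitions, the receiver knows a symbol is complete without any timing assumption — the essence of delay-insensitive signalling.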

  35. System-level concurrency • Breaking symmetry • any processor can be Monitor Processor • local ‘election’ on each chip, after self-test • all nodes are identical at start-up • addresses are computed relative to node with host connection (0,0) • system initialised using flood-fill • nearest-neighbour packet type • boot time (almost) independent of system scale
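The flood-fill boot can be sketched as a breadth-first wave spreading from the host node at (0,0), each configured node configuring its nearest neighbours. The point is that boot depth tracks the machine's diameter, not its node count. (A rectangular-mesh simplification — the real SpiNNaker fabric is a triangular torus.)

```python
from collections import deque

def boot_depth(width, height):
    """Number of flood-fill waves needed to configure a width x height
    mesh from the host node at (0, 0)."""
    hops = {(0, 0): 0}
    q = deque([(0, 0)])
    while q:
        x, y = q.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # NN links
            n = (x + dx, y + dy)
            if 0 <= n[0] < width and 0 <= n[1] < height and n not in hops:
                hops[n] = hops[(x, y)] + 1   # configured one wave later
                q.append(n)
    return max(hops.values())

print(boot_depth(4, 4), boot_depth(16, 16))  # 6 30: 16x the nodes, only 5x the waves
```

This is why boot time is (almost) independent of system scale: waves grow with the perimeter while node count grows with the area.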

  36. Application-level concurrency • [figure: sleeping core woken by Packet Received, DMA Completion and Timer Millisecond interrupts (priorities 1–3), invoking fetch_Synaptic_Data(), update_Neurons(), update_Stimulus(), then goto_Sleep() when idle] • Event-driven real-time software • spike packet arrived • initiate DMA • DMA of synaptic data completed • process inputs • insert axonal delay • 1 ms Timer interrupt
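The priority scheme can be sketched as a tiny event kernel: pending events are drained highest-priority-first, and the core sleeps when nothing is left. Event names echo the slide's diagram, but the kernel itself is an illustrative sketch, not the real SpiNNaker API:

```python
import heapq

# Priority 1 runs first (spike arrival pre-empts everything else)
PRIORITY = {"packet_received": 1, "dma_complete": 2, "timer_1ms": 3}

def drain_events(pending):
    """Run queued event handlers in priority order, then sleep."""
    heap = [(PRIORITY[e], i, e) for i, e in enumerate(pending)]
    heapq.heapify(heap)
    log = []
    while heap:
        _, _, event = heapq.heappop(heap)
        log.append(event)            # the handler body would run here
    log.append("sleep")              # nothing pending: low-power sleep
    return log

print(drain_events(["timer_1ms", "packet_received", "dma_complete"]))
```

There is no main loop blocking on anything: all work is a reaction to an event, which is what keeps a million cores busy without global synchronisation.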

  37. Application-level concurrency • Cross-system delay << 1ms • hardware routing • ‘emergency’ routing • failed links • Congestion • if all else fails • drop packet

  38. Biological concurrency • Firing rate population codes • N neurons • diverse tuning • collective coding of a physical parameter • accuracy • robust to neuron failure (Neural Engineering, Eliasmith & Anderson 2003)
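A population-vector sketch of the idea, with cosine tuning curves assumed purely for illustration: each neuron prefers a direction, an angle is decoded from the whole population, and knocking out a few neurons barely moves the estimate:

```python
import math

def population_decode(theta, n=64, failed=frozenset()):
    """Encode angle theta with n cosine-tuned neurons, then decode it
    as a population vector. Dead neurons simply contribute nothing."""
    est_x = est_y = 0.0
    for i in range(n):
        if i in failed:
            continue                              # failed neuron
        pref = 2 * math.pi * i / n                # preferred direction
        rate = max(0.0, math.cos(theta - pref))   # firing-rate tuning curve
        est_x += rate * math.cos(pref)
        est_y += rate * math.sin(pref)
    return math.atan2(est_y, est_x)

# Decoding survives the loss of a few neurons
print(population_decode(1.0), population_decode(1.0, failed={3, 7, 11}))
```

No single neuron is essential, which is exactly the fault tolerance the deck argues silicon will need.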

  39. Biological concurrency • Single spike/neuron codes • choose N to fire from a population of M • order of firing may or may not matter
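The capacity of such a code is simple combinatorics: C(M, N) distinct codes if only the firing set matters, M!/(M−N)! if firing order matters too. For example, choosing 10 firing neurons from a population of 100:

```python
from math import comb, perm, log2

M, N = 100, 10
unordered = comb(M, N)   # only *which* N neurons fired matters
ordered = perm(M, N)     # the *order* of firing matters too
print(round(log2(unordered)), round(log2(ordered)))  # 44 66 bits per code
```

Order sensitivity buys roughly 22 extra bits here, which is why whether order matters is worth asking.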

  40. Outline • Building Brains • Computer Architecture Perspective • Living with Failure • Design Principles • The SpiNNaker system • Concurrency • Conclusions

  41. Software progress • ARM SoC Designer SystemC model • 4 chip x 2 CPU top-level Verilog model • Running: • boot code • Izhikevich model • PDP2 codes • Basic configuration flow • …it all works!

  42. Where might this lead? • Robots • iCub EU project • open humanoid robot platform • mechanics, but no brain!

  43. Conclusions • Many-core processing is coming • soon we will have far more processors than we can program • When (if?) we crack parallelism… • more small processors are better than fewer large processors • synchronization, coherent global memory, determinism, are all impediments • Biology suggests a way forward! • but we need new theories of biological concurrency

  44. UoM SpiNNaker team
