A Programmable Adaptive Router for a GALS Parallel System
Jian Wu, APT Group, University of Manchester
May 2009
SpiNNaker System for Neural Simulation
• Node = SpiNNaker CMP + large off-chip memory
• Massively parallel (1 million ARM processors)
• Massive neural-net simulations (1 billion neurons in real time)
• GALS infrastructure
• Fault-tolerant
Router Requirements
• Operation requirements:
  - Route multicast, point-to-point and nearest-neighbour packets.
  - Reprogrammable at run-time.
  - Provide an external interface to system resources.
  - Fault-tolerant operation.
  - Power efficiency.
• Bandwidth requirements: ~7.4 Gb/s (1.368 Gb/s on-chip + 6 Gb/s inter-chip)
  - On-chip traffic: (20 - 1) processors × 1000 neurons × 72 bits × 1000 Hz = 1.368 Gb/s
  - Inter-chip traffic: 1 Gb/s × 6 links = 6 Gb/s
• Bandwidth target: 72 bits × 200 MHz = 14.4 Gb/s
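The bandwidth figures above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch: the constant names are ours, and reading "72 bits × 200 MHz" as "one 72-bit packet accepted per router clock cycle" is an assumption.

```python
# Back-of-envelope check of the router bandwidth figures (values from the slide).
PROCESSORS = 20 - 1            # (20 - 1) processors per chip
NEURONS_PER_PROCESSOR = 1000
PACKET_BITS = 72
SPIKE_RATE_HZ = 1000           # per-neuron rate used on the slide
LINK_RATE_BPS = 1_000_000_000  # 1 Gb/s per inter-chip link
LINKS = 6
ROUTER_CLOCK_HZ = 200_000_000  # 200 MHz

on_chip = PROCESSORS * NEURONS_PER_PROCESSOR * PACKET_BITS * SPIKE_RATE_HZ
inter_chip = LINK_RATE_BPS * LINKS
required = on_chip + inter_chip
# Assumption: the target means one 72-bit packet per router clock cycle.
target = PACKET_BITS * ROUTER_CLOCK_HZ

for name, bps in [("on-chip", on_chip), ("inter-chip", inter_chip),
                  ("required", required), ("target", target)]:
    print(f"{name:>10}: {bps / 1e9:6.3f} Gb/s")
print(f"headroom: {target / required:.2f}x")
```

The target is roughly twice the estimated requirement, which leaves margin for bursts and for the stalls introduced by adaptive routing.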
Router Architecture
• Packet checking:
  - Check packets for errors and enable the appropriate routing engine.
• Multicast (MC) router:
  - Route neural spikes according to their source address.
• Point-to-point (P2P) router:
  - Route system-management and control packets.
• Nearest-neighbour (NN) router:
  - Route system boot-up and debugging information.
  - Provide an external interface to system resources.
• Adaptive routing:
  - Redirect blocked packets.
• Router interface to the System NoC:
  - AHB master and slave interfaces.
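A minimal sketch of the packet-checking stage and a multicast lookup, under loudly stated assumptions: the packet-type codes, the odd-parity convention and the `McEntry` key/mask table layout are all hypothetical, chosen only to illustrate "check for errors, then enable the right engine" and "route by source address". A masked key match is one common way to implement such a source-address lookup; the slide does not say how the MC router matches.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical packet-type codes, for illustration only.
MC, P2P, NN = 0, 1, 2

@dataclass
class Packet:
    ptype: int  # MC, P2P or NN
    key: int    # MC: source (neuron) address; P2P: destination chip id
    word: int   # raw packet bits, including a parity bit

@dataclass
class McEntry:
    key: int    # value the masked source address must equal
    mask: int   # which source-address bits take part in the match
    route: int  # bit vector of output links / local processors

def parity_ok(word: int) -> bool:
    # Assumed convention: odd parity over the whole packet word.
    return bin(word).count("1") % 2 == 1

def mc_lookup(table: List[McEntry], key: int) -> Optional[int]:
    """Return the route of the first matching entry, or None on a miss."""
    for entry in table:
        if (key & entry.mask) == entry.key:
            return entry.route
    return None

def check_and_dispatch(pkt: Packet) -> str:
    """Packet checking: flag errored packets, else pick the routing engine."""
    if not parity_ok(pkt.word):
        return "error"
    engines = {MC: "mc_router", P2P: "p2p_router", NN: "nn_router"}
    return engines.get(pkt.ptype, "error")
```

A miss in `mc_lookup` returns `None`, which is the point where default routing would take over.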
Default and Adaptive Routing
• Route packets “across chip” by default (this saves routing-table entries).
• Automatically re-route packets destined for congested or failed links.
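The default and adaptive policies might be sketched as follows. Assumptions: the six inter-chip links are numbered 0 to 5 around the hexagon so that link `(i + 3) mod 6` is opposite link `i`; the output is simplified to a single link; and the wait threshold (`wait_limit`) and the choice of "the adjacent link" as the redirect target are hypothetical, not taken from the slide.

```python
from typing import Optional, Set

NUM_LINKS = 6  # hexagonal mesh; links numbered 0..5 (hypothetical numbering)

def default_route(in_link: int) -> int:
    """'Across chip': leave by the link opposite the one the packet arrived on."""
    return (in_link + NUM_LINKS // 2) % NUM_LINKS

def choose_output(in_link: int, table_route: Optional[int], blocked: Set[int],
                  waited: int, wait_limit: int = 16) -> Optional[int]:
    """Pick an output link, or return None to keep the packet waiting.

    table_route is None on a routing-table miss (no entry needed for
    packets that simply continue across the chip).
    """
    out = table_route if table_route is not None else default_route(in_link)
    if out not in blocked:
        return out
    if waited < wait_limit:
        return None  # give the congested link a chance to clear
    alt = (out + 1) % NUM_LINKS  # assumed redirect target: the adjacent link
    return alt if alt not in blocked else None
```

Waiting briefly before redirecting avoids sending packets on a detour because of short-lived congestion.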
Interfacing with the System NoC
• Nearest-neighbour packets are diverted to the System NoC.
• Programming data is sourced from the System NoC.
Elastic Buffering
• The spiking rate of the great majority of neurons is low (just a few Hz), so there are pipeline “bubbles” between valid packets.
• More than one request to the datapath can be issued in the same clock cycle.
• The adaptive routing mechanism stalls the pipeline while it finds an alternative path for a congested packet.
Simple, synthesizable design:
• Use ordinary flip-flops for data latching.
• Use a global combinatorial circuit to generate the stall signals.
Elastic Buffering
[Figure: pipeline stages Pipeline1, Pipeline2 and Pipeline3, each with a Pipeline Control block, a Disable signal and a flag (Flag1, Flag2, Flag3), plus a Back Pressure signal]
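A cycle-level sketch of the elastic-buffering idea, assuming one packet per stage and a single output that may be blocked. The stall logic is one global combinational pass from the output backwards: a stage holds its packet only if it is valid and the stage after it is stalled, so empty stages (bubbles) keep accepting data. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

def stall_signals(valid: List[bool], out_blocked: bool) -> List[bool]:
    """Global combinational stall logic: stall[i] = valid[i] and stall[i+1]."""
    stall = [False] * len(valid)
    downstream = out_blocked
    for i in reversed(range(len(valid))):
        stall[i] = valid[i] and downstream
        downstream = stall[i]
    return stall

def step(stages: List[Optional[str]], incoming: Optional[str],
         out_blocked: bool) -> Tuple[List[Optional[str]], Optional[str], bool]:
    """Advance one clock cycle; return (new stages, output packet, input accepted)."""
    valid = [s is not None for s in stages]
    stall = stall_signals(valid, out_blocked)
    output = stages[-1] if valid[-1] and not out_blocked else None
    new = list(stages)
    for i in range(len(stages)):
        if not stall[i]:
            # An enabled stage latches whatever the previous stage (or the input) holds.
            new[i] = stages[i - 1] if i > 0 else incoming
    accepted = not stall[0]  # back pressure to the sender when the first stage stalls
    return new, output, accepted
```

With stages `["A", None, "B"]` and a blocked output, only the last stage stalls; `A` moves into the bubble and a new packet enters, so the bubble is squeezed out instead of the whole pipeline freezing.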
Input Interchangeable Buffer
• Used for flow control at the head of the pipeline.
• One register is used in normal operation.
• The second is used when a stall occurs in the next stage.
• The delay is re-introduced when the stall is removed.
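One way to model the interchangeable buffer, assuming the stall from the next stage reaches the sender a cycle late, so one extra packet may arrive after a stall begins. The two registers act as a two-slot ring: in normal operation the same register is reused; during a stall the arriving packet goes into the spare, and when the stall clears the spare simply becomes the active register instead of having its contents copied across. The class and method names are hypothetical.

```python
from typing import List, Optional

class InterchangeableBuffer:
    def __init__(self) -> None:
        self.regs: List[Optional[str]] = [None, None]
        self.head = 0   # register currently presented to the next stage
        self.count = 0  # occupied registers: 0, 1 or 2

    def ready(self) -> bool:
        # Back pressure seen by the sender on the next cycle:
        # it may send as long as the spare register is still free.
        return self.count < 2

    def cycle(self, incoming: Optional[str], stall: bool) -> Optional[str]:
        """One clock edge: deliver to the next stage unless it stalls, then latch input."""
        out = None
        if self.count > 0 and not stall:
            out = self.regs[self.head]
            self.regs[self.head] = None
            self.count -= 1
            if self.count == 1:
                self.head ^= 1  # spare holds the next packet: swap roles, no copy
        if incoming is not None:
            if self.count == 2:
                raise RuntimeError("overflow: sender ignored back pressure")
            self.regs[(self.head + self.count) % 2] = incoming
            self.count += 1
        return out
```

Swapping the register roles avoids a register-to-register copy after a stall, at the cost of a small index register.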
Parallel-Path Synchronizer
• Avoid the 2-cycle synchronizer penalty to increase throughput.
Power Distribution
[Figure: power distribution under full traffic load]
[Figure: power distribution under 10% traffic load]