Fault-Tolerant Delay-Insensitive Inter-Chip Communication Yebin Shi Apt Group The University of Manchester
Outline • SpiNNaker Inter-Chip interconnect • Basic Transmitter and Receiver • Potential Problems with the Designs • Robust Transmitter and Receiver • Future work and conclusion
Research Aims • Investigate the impact of transient glitches on the inter-chip wires on the link interface circuits. • Redesign the link interface circuits to increase glitch resistance and avoid deadlock.
SpiNNaker • Network infrastructure: • 6 bidirectional inter-chip links • delay-insensitive on-chip and inter-chip communication • Packets are variable-length, serialized in 4-bit flits, with an end-of-packet marker • 1 Gb/s throughput per link
Inter-Chip Communication • On-Chip Network: • 3-of-6 data encoding • 4-phase (RTZ) handshake • separate data and control channels • Inter-Chip Network: • 2-of-7 data encoding • 2-phase (NRZ) handshake • data and control in a single stream
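As a rough illustration of the two encodings and of 2-phase (NRZ) signalling, the sketch below enumerates the m-of-n codewords and sends symbols by toggling wires rather than returning them to zero. The assignment of flit values and the end-of-packet (EoP) symbol to particular codewords is hypothetical; the slides do not give the real SpiNNaker mapping.

```python
from itertools import combinations

def m_of_n_codes(m, n):
    """Return all n-bit words with exactly m bits set, as sorted integers."""
    codes = []
    for wires in combinations(range(n), m):
        word = 0
        for w in wires:
            word |= 1 << w
        codes.append(word)
    return sorted(codes)

CODES_2OF7 = m_of_n_codes(2, 7)  # 21 codewords: enough for 16 flit values plus EoP
CODES_3OF6 = m_of_n_codes(3, 6)  # 20 codewords

# Hypothetical mapping (NOT the real SpiNNaker table): the first 16 codewords
# carry the 4-bit flit values 0..15 and the 17th stands for end-of-packet.
EOP = "EoP"
SYMBOL_TO_CODE = {value: CODES_2OF7[value] for value in range(16)}
SYMBOL_TO_CODE[EOP] = CODES_2OF7[16]
CODE_TO_SYMBOL = {code: sym for sym, code in SYMBOL_TO_CODE.items()}

def nrz_send(wire_state, symbol):
    """2-phase (NRZ): toggle the two wires of the codeword; no return to zero."""
    return wire_state ^ SYMBOL_TO_CODE[symbol]

def nrz_receive(prev_state, new_state):
    """Recover the symbol from the set of wires that changed level."""
    toggled = prev_state ^ new_state
    return CODE_TO_SYMBOL.get(toggled)  # None if not an assigned codeword
```

Because the receiver looks only at which wires changed, the absolute wire levels carry no information; EoP travels in the same stream as the data symbols.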
Link Transmitter • data channel: pipeline for code and phase conversion • ctrl channel: merge EoP symbol into the data stream
Link Receiver • data channel: pipeline for phase and code conversion • ctrl channel: extract EoP symbols from the data stream
Glitch Impact on Simulation • Automatic packet data generation • CRC scheme included for result verification • Random generation of transient glitches injected onto the inter-chip link • Single Event Upset (SEU) scenarios • Configurable frequency and duration of glitches • Frequency: up to ½ glitch per packet • Duration: 0.1-2 ns • Extensive simulation • a large number of densely packed glitches over 1M packets, to speed up fault simulation
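The glitch-injection setup described above could be sketched along the following lines. This is only an assumed software model, not the testbench used for the slides: the function name, the per-packet time window, the wire count, and the uniform distributions are all hypothetical choices.

```python
import random

def glitch_schedule(n_packets, packet_time_ns, glitches_per_packet=0.5,
                    min_dur_ns=0.1, max_dur_ns=2.0, n_wires=8, seed=0):
    """Return a list of (start_ns, duration_ns, wire) glitch events.

    Assumptions (not from the slides): each packet occupies a fixed
    window of packet_time_ns, at most one glitch lands in a window,
    and n_wires counts every wire of the link that can be hit.
    """
    if not 0.0 <= glitches_per_packet <= 0.5:
        raise ValueError("glitch rate must be between 0 and 0.5 per packet")
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    schedule = []
    for p in range(n_packets):
        if rng.random() < glitches_per_packet:
            start = (p + rng.random()) * packet_time_ns
            duration = rng.uniform(min_dur_ns, max_dur_ns)
            wire = rng.randrange(n_wires)
            schedule.append((start, duration, wire))
    return schedule
```

Seeding the generator keeps a run reproducible, which helps when a particular glitch pattern needs to be replayed to study a deadlock.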
Fault Effects in the Transmitter • Deadlock risks: • A transient glitch may corrupt a 2-of-7 symbol, leading to handshaking failure. • Phase-sensitive phase converter. • Independent resetting.
Fault Effects in the Receiver • Deadlock risks: • A corrupted 2-of-7 symbol may prevent completion of conversion to 3-of-6. • Independent resetting.
Deadlock in Receiver • A glitch occurs while dout_cd is in transition • A wrong value is stored in the bottom latch • The next data conversion fails
Robust 2-ph to 4-ph Conversion (reset signal not shown) • Phase-insensitive converter: • Used in the 2-phase ack input to the Transmitter. • Used in the 2-phase data inputs to the Receiver.
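One way to picture why phase sensitivity matters is the behavioural sketch below. It is not the circuit from the slide: both classes and the stall model are assumptions made for illustration. The phase-sensitive model accepts edges only in strict rising/falling alternation, while the phase-insensitive model treats any edge as a token.

```python
class PhaseSensitiveConverter:
    """Hypothetical model: accepts edges only in rising, falling, rising... order."""
    def __init__(self):
        self.expect_rising = True  # state after reset

    def on_edge(self, rising):
        if rising != self.expect_rising:
            return False  # wrong phase: the edge is ignored and the token is lost
        self.expect_rising = not self.expect_rising
        return True

class PhaseInsensitiveConverter:
    """Hypothetical model: any transition counts as one token."""
    def on_edge(self, rising):
        return True

def tokens_delivered(converter, start_level, n_tokens):
    """Send n_tokens transitions on one 2-phase wire, stopping at the first lost one."""
    level = start_level
    delivered = 0
    for _ in range(n_tokens):
        level ^= 1  # 2-phase: each token is a single transition
        if not converter.on_edge(rising=(level == 1)):
            break  # no handshake completes, so the sender stalls (deadlock)
        delivered += 1
    return delivered
```

In this model, a wire left high when only one side is reset makes the phase-sensitive converter drop the first (falling) edge and stall, whereas the phase-insensitive one keeps going.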
Robust Receiver Design • Phase-insensitive phase converter • Enhanced code converter and completion detector • Independent reset capability
Receiver Phase Converter • acki also triggers the ack signal back to the transmitter
Code conversion with Priority Arbitration • support the full set of 2-of-7 codes • convert invalid symbols into a valid one • stop propagation of invalid symbols containing more than 2 transitions
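The slide does not say which priority rule the arbiter uses, so the following is only one possible reading, with a fixed lowest-index-wins priority chosen purely for illustration. The function takes the set of wires seen toggling and reduces it to a valid 2-of-7 pattern, so that a third, glitch-induced transition is not propagated.

```python
def arbitrate(toggled_wires, n_wires=7):
    """Reduce a set of toggled wires to a valid 2-of-n pattern.

    Hypothetical rule: the two lowest-numbered wires win. Returns None
    while fewer than two wires have toggled (symbol still incomplete).
    """
    ordered = sorted(w for w in set(toggled_wires) if 0 <= w < n_wires)
    if len(ordered) < 2:
        return None
    # Any transitions beyond the first two in priority order are dropped.
    return tuple(ordered[:2])
```

The symbol that results from a corrupted input may still be the wrong one, which is why a packet-level check such as the CRC mentioned in the simulation setup remains useful; the aim here is only that the converter always completes.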
Independent Reset • An extra, possibly redundant, transition is created after reset in case the Tx is waiting for an acknowledge token. • The phase-insensitive converter for ack2 in the Tx absorbs the extra token if it is not needed.
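The reset scheme above can be modelled at token level as follows. This is a sketch under assumed names (the Transmitter class, its counters, and receiver_reset are hypothetical), intended only to show why a possibly redundant acknowledge token is safe.

```python
class Transmitter:
    """Hypothetical token-level model of the Tx side of one link."""
    def __init__(self):
        self.waiting_for_ack = False
        self.sent = 0
        self.absorbed = 0  # redundant ack tokens that were discarded

    def try_send(self):
        if self.waiting_for_ack:
            return False  # blocked until an acknowledge token arrives
        self.sent += 1
        self.waiting_for_ack = True
        return True

    def on_ack_token(self):
        if self.waiting_for_ack:
            self.waiting_for_ack = False  # token needed: unblock the Tx
        else:
            self.absorbed += 1  # token not needed: absorb it harmlessly

def receiver_reset(tx, send_token_on_reset):
    """Model an independent receiver reset; any symbol in flight is lost."""
    if send_token_on_reset:
        tx.on_ack_token()  # the extra, possibly redundant, transition
```

Without the extra token, a Tx that has already sent a symbol waits forever for an acknowledge the reset receiver will never produce; with it, the Tx is either unblocked or simply discards the token.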
Simulation results (for 1 million packets sent) • Significantly reduced deadlock occurrence. • worse packet loss. • trivial area overhead. • increased throughput.
Conclusions and Future work • Enhance the resistance to transient glitches in inter-chip links by replacing phase converters. • Avoid deadlocks by hardening completion detection modules in the receiver. • Remove corrupt symbols by applying an arbitration scheme for symbol conversions. • Allow independent chip resets without introducing deadlocks by sending safe, possibly redundant tokens (data or ack) on reset. • A generalized approach for circuit evaluation, including the computation of safety margins. • Investigation into the impact of back-pressure on glitch resistance.