
Fault-Tolerant Delay-Insensitive Inter-Chip Communication

Yebin Shi, APT Group, The University of Manchester.


Presentation Transcript


  1. Fault-Tolerant Delay-Insensitive Inter-Chip Communication • Yebin Shi • APT Group • The University of Manchester

  2. Outline • SpiNNaker Inter-Chip interconnect • Basic Transmitter and Receiver • Potential Problems with the Designs • Robust Transmitter and Receiver • Future work and conclusion

  3. Research Aims • Investigate the impact of transient glitches on the inter-chip wires on the link interface circuits. • Redesign the link interface circuits to increase glitch resistance and avoid deadlock.

  4. SpiNNaker • Network infrastructure: • 6 bidirectional inter-chip links • delay-insensitive on-chip and inter-chip communication • Packets are variable-length, serialized in 4-bit flits, with an end-of-packet marker • 1 Gb/s throughput per link
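The serialization described on this slide can be illustrated with a minimal sketch. The nibble order (low nibble first) and the `EOP` placeholder are assumptions made for illustration only; the slides do not specify either.

```python
# Hypothetical stand-in for the end-of-packet marker; not the real symbol.
EOP = "EoP"


def packet_to_flits(packet: bytes) -> list:
    """Split a variable-length packet into 4-bit flits and append the marker."""
    flits = []
    for byte in packet:
        flits.append(byte & 0xF)  # low nibble first (assumed order)
        flits.append(byte >> 4)   # then the high nibble
    flits.append(EOP)
    return flits


def flits_to_packet(flits: list) -> bytes:
    """Rebuild the packet; the marker tells the receiver where it ends."""
    if not flits or flits[-1] != EOP:
        raise ValueError("flit stream does not end with the end-of-packet marker")
    data = flits[:-1]
    return bytes(lo | (hi << 4) for lo, hi in zip(data[0::2], data[1::2]))
```

Because packets are variable-length, the receiver cannot count flits; it relies on the marker, which is why a corrupted or lost marker matters for the fault analysis later in the talk.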

  5. Inter-Chip Communication • On-Chip Network: • 3-of-6 data encoding • 4-phase (RTZ) handshake • separate data and control channels • Inter-Chip Network: • 2-of-7 data encoding • 2-phase (NRZ) handshake • data and control in a single stream
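A short sketch may help show why a 2-of-7 code can carry data and control in one stream. There are 21 two-of-seven codewords, enough for 16 data values plus an end-of-packet symbol, while a 3-of-6 code has 20 codewords. The codeword assignment in `SYMBOL_TO_WIRES` below is hypothetical (the slides do not give the real mapping), and `send_nrz` models 2-phase signalling as one toggle on each of the two selected wires.

```python
from itertools import combinations


def m_of_n_codewords(m: int, n: int) -> list:
    """All ways of choosing m active wires out of n."""
    return [frozenset(c) for c in combinations(range(n), m)]


TWO_OF_SEVEN = m_of_n_codewords(2, 7)   # 21 codewords
THREE_OF_SIX = m_of_n_codewords(3, 6)   # 20 codewords

# Hypothetical assignment: data values 0..15 plus "EoP" take the first 17
# codewords; the remaining 4 are unused in this sketch.
SYMBOL_TO_WIRES = dict(zip(list(range(16)) + ["EoP"], TWO_OF_SEVEN))


def send_nrz(levels: tuple, symbol) -> tuple:
    """2-phase (NRZ): a symbol is one transition on each of its two wires."""
    wires = SYMBOL_TO_WIRES[symbol]
    return tuple(level ^ (i in wires) for i, level in enumerate(levels))
```

In NRZ signalling the wires do not return to zero between symbols, so the receiver must detect transitions rather than levels. Sending the same symbol twice in this model brings the wires back to their starting levels.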

  6. Link Transmitter • data channel: pipeline for code and phase conversion • ctrl channel: merges the end-of-packet (EoP) symbol into the data stream

  7. Link Receiver • data channel: pipeline for phase and code conversion • ctrl channel: extracts EoP symbols from the stream

  8. Glitch Impact on Simulation • Automatic packet data generation • CRC scheme included for result verification • Random generation of transient glitches injected onto the inter-chip link • Single Event Upset (SEU) scenarios • Configurable frequency and duration of glitches • Frequency: up to ½ glitch/packet • Duration: 0.1-2 ns • Extensive simulation • a large number of densely packed glitches over 1M packets • speed-up fault simulation
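The glitch-injection setup above could be scripted roughly as follows. This is a sketch under stated assumptions, not the authors' testbench: the `Glitch` record, the wire count of 8 (seven data wires plus one ack wire), the `packet_time_ns` parameter, and the choice of at most one glitch per packet are all hypothetical. Only the rate limit (up to ½ glitch per packet) and the duration range (0.1-2 ns) come from the slide.

```python
import random
from dataclasses import dataclass


@dataclass
class Glitch:
    wire: int           # index of the inter-chip wire that is disturbed
    start_ns: float     # absolute start time of the pulse
    duration_ns: float  # pulse width


def generate_glitches(n_packets, glitches_per_packet, packet_time_ns,
                      n_wires=8, min_ns=0.1, max_ns=2.0, seed=0):
    """Draw at most one random glitch per packet slot at the given rate."""
    if not 0.0 <= glitches_per_packet <= 0.5:
        raise ValueError("rate must be between 0 and 0.5 glitches per packet")
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    glitches = []
    for packet in range(n_packets):
        if rng.random() < glitches_per_packet:
            glitches.append(Glitch(
                wire=rng.randrange(n_wires),
                start_ns=packet * packet_time_ns + rng.uniform(0.0, packet_time_ns),
                duration_ns=rng.uniform(min_ns, max_ns),
            ))
    return glitches
```

Seeding the generator is a design choice for this sketch: a deadlock found after many packets can then be reproduced exactly.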

  9. Fault Effects in the Transmitter • Deadlock risks: • A transient glitch may corrupt a 2-of-7 symbol, leading to handshaking failure. • Phase-sensitive phase converter. • Independent resetting.

  10. Fault Effects in the Receiver • Deadlock risks: • A corrupted 2-of-7 symbol may prevent completion of conversion to 3-of-6. • Independent resetting.

  11. Deadlock in Receiver • a glitch occurs while dout_cd is in transit • a wrong value is stored in the bottom latch • conversion of the next data symbol fails

  12. Robust 2-ph to 4-ph Conversion (reset signal not shown) • Phase-insensitive converter: • used in the 2-phase ack input to the Transmitter • used in the 2-phase data inputs to the Receiver
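One reading of "phase-insensitive" is that the converter reacts to any transition on a 2-phase wire, whether rising or falling, and so does not depend on which polarity it expects next. The behavioural sketch below illustrates that reading only; it is not the circuit on the slide, and the function names and the `req+`/`req-` labels are hypothetical.

```python
def two_phase_events(samples):
    """Indices where the sampled wire level differs from the previous level."""
    events = []
    previous = samples[0]
    for i, level in enumerate(samples[1:], start=1):
        if level != previous:  # rising or falling: both count as one event
            events.append(i)
            previous = level
    return events


def to_four_phase(samples):
    """Expand each 2-phase event into a return-to-zero request pulse."""
    pulses = []
    for i in two_phase_events(samples):
        pulses.append((i, "req+"))  # request rises
        pulses.append((i, "req-"))  # request returns to zero
    return pulses
```

Inverting every sample leaves the detected events unchanged, which is the property that matters when the two ends of a link can disagree about the current phase, for example after one chip has been reset.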

  13. Robust Receiver Design • Phase-insensitive phase converter • Enhanced code converter and completion detector • Independent reset capability

  14. Receiver Phase Converter • acki also triggers the ack signal back to the transmitter

  15. Code Conversion with Priority Arbitration • supports the full set of 2-of-7 codes • converts invalid symbols into a valid one • stops propagation of invalid symbols containing more than 2 transitions
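The arbitration idea can be sketched in a few lines. The fixed priority used here (the lowest-numbered wires win) is an assumption chosen for illustration; the slide does not state the actual priority order or how the unused codewords are mapped.

```python
def arbitrate(toggled_wires):
    """Reduce the set of wires that toggled to a well-formed 2-of-7 codeword.

    Returns None while fewer than two wires have toggled (symbol incomplete).
    If a glitch added extra transitions, only the two highest-priority wires
    are kept, so a symbol with more than 2 transitions never propagates.
    """
    wires = sorted(set(toggled_wires))
    if len(wires) < 2:
        return None
    return frozenset(wires[:2])  # assumed priority: lowest wire index first
```

For example, `arbitrate({1, 4, 6})` yields the codeword on wires 1 and 4. The resulting data may be wrong, but the symbol is well-formed, so the downstream conversion can complete and the CRC check is left to catch the corruption.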

  16. Independent Reset • An extra, possibly redundant, transition is created after reset in case the Tx is waiting for an acknowledge token. • The phase-insensitive converter for ack2 in the Tx absorbs the extra token if it is not needed.
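The reset behaviour described above can be modelled with a small state machine. This is a behavioural sketch only: the class `TxAckInput` and its fields are hypothetical names, and the real design works on wire transitions rather than method calls.

```python
class TxAckInput:
    """Toy model of the transmitter side of the acknowledge handshake."""

    def __init__(self):
        self.waiting_for_ack = False
        self.absorbed = 0  # redundant ack tokens that were discarded

    def send_symbol(self):
        if self.waiting_for_ack:
            raise RuntimeError("previous symbol has not been acknowledged")
        self.waiting_for_ack = True

    def ack_transition(self):
        """Handle one ack token; return True if it released the transmitter."""
        if self.waiting_for_ack:
            self.waiting_for_ack = False
            return True
        self.absorbed += 1  # token not needed: absorb it instead of stalling
        return False


# Case 1: the receiver was reset after the Tx sent a symbol, so the normal
# ack was lost; the extra token generated after reset unblocks the Tx.
tx = TxAckInput()
tx.send_symbol()
released = tx.ack_transition()

# Case 2: the Tx was idle when the receiver reset; the extra token is absorbed.
idle_tx = TxAckInput()
ignored = idle_tx.ack_transition()
```

In both cases the transmitter ends up ready to send, which is the point of sending a safe, possibly redundant token on reset.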

  17. Simulation Results • significantly reduced deadlock occurrence • worse packet loss • trivial area overhead • increased throughput • (Simulation results for 1 million packets sent)

  18. Conclusions and Future work • Enhance the resistance to transient glitches in inter-chip links by replacing phase converters. • Avoid deadlocks by hardening completion detection modules in the receiver. • Remove corrupt symbols by applying an arbitration scheme for symbol conversions. • Allow independent chip resets without introducing deadlocks by sending safe, possibly redundant tokens (data or ack) on reset. • A generalized approach for circuit evaluation, including the computation of safety margins. • Investigation into the impact of back-pressure on glitch resistance.
