Fault Tolerant NoC Communication

Presentation Transcript


  1. Fault Tolerant NoC Communication Greg Link

  2. Benefits and Drawbacks of NoC • Benefits • Exploits parallelism with scalable interconnect • Employs heavy design reuse • Removes complexity of bus interconnect • Drawbacks • Tight area constraints on routers • Low latency imperative • Uncertain node-node delays

  3. Why Fault Tolerance? • Offers many advantages: • Avoids costly packet retransmissions • Avoids catastrophic data loss • Can increase chip yield • Allows higher speed operation • In NoC specifically • Ensures success of interconnect • Grows in importance as technology scales

  4. Previous Work • Significant research has been done in reliable and fault tolerant communication for large-scale interconnect • Fault-Tolerant NoC work is limited • Mostly flooding style algorithms • consume large amounts of energy • network congestion severely limits performance • no guaranteed delivery even in fault-free case • Stochastic algorithms have difficult-to-predict delays

  5. My Goal • Develop a non-stochastic (and hence lower power) means of fault-tolerant communication for NoC • Target Goals: • Consume relatively minor die area • Predictable communication behaviour • Limit the number of unnecessary packet duplicates • Target Applications: • 14x14 2d network for LDPC decoding • 4x4 2d network used in Neural Network-on-Chip

  6. A Two-Part Problem • Permanent faults • Nodes that • Improperly route packets • Don’t route at all • Can be detected at boot-time • Must be routed ‘around’ to avoid problems

  7. The Second Half of the Problem • Transient faults: • Transient faults occur between nodes, and represent bit-errors in the interconnect • As technology scales, interconnect can have more than 1000 bits • With a bit-error rate of 10^-12 on a 16x16 torus, there will be a word-error every 2x10^6 cycles (< 6 ms) • To avoid costly source-dest retransmissions and their required storage, we must ensure no word-errors during transmit
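
The word-error figure on this slide can be reproduced with a short calculation. This is a sketch under assumptions the slides do not state: a 16x16 torus is taken to have 2 * 16 * 16 = 512 links, each link is taken to be 1024 bits wide and fully driven every cycle, and bit errors are treated as independent. The function name `cycles_per_word_error` is illustrative only.

```python
def cycles_per_word_error(num_links, bits_per_link, bit_error_rate):
    """Expected cycles between word errors, assuming every link bit is
    driven each cycle and bit errors are independent and rare."""
    expected_errors_per_cycle = num_links * bits_per_link * bit_error_rate
    return 1.0 / expected_errors_per_cycle

# Assumed values (not given on the slide): 512 links of 1024 bits each.
links = 2 * 16 * 16
cycles = cycles_per_word_error(links, 1024, 1e-12)
print(f"{cycles:.2e} cycles between word errors")  # roughly 1.9 million cycles
```

Under these assumptions the result lands close to the 2x10^6 cycles quoted above. The "< 6 ms" figure additionally implies a clock faster than about 333 MHz.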

  8. The more ‘permanent’ solution… • Since permanent faults can be detected statically, use a series of ‘bootstrap’ operations. • Routers are configured to ‘ignore’ certain neighbors if they don’t respond properly to queries. • Results in “X-Preferred-Y” routing

  9. X-Preferred-Y Routing • In nominal case use standard deadlock-free X-Y routing • If normal direction choice is unavailable, send packet in secondary direction of choice. • If both choices are unavailable, copy the message to all functional output channels. • Packets sent in X-directions are henceforth routed using Y-Preferred-X routing to prevent backtracking • All packets sent in ‘improper’ directions have their lifetime values reduced by 1. This prevents livelock situations and never-dying oscillations. (Lifetime empirically determined to be optimal near 2-3)
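
The three cases on this slide can be sketched as a single per-router decision function. This is an interpretation, not the author's implementation. It assumes that "improper" means any output other than the normal choice, that the secondary choice is the productive direction on the other axis, that the caller excludes the arrival port from `working`, and that the preference switch applies symmetrically to both axes. All names (`xpy_outputs`, `working`, `prefer_x`) are hypothetical.

```python
def xpy_outputs(cur, dst, working, prefer_x=True, lifetime=3):
    """Return a list of (direction, prefer_x, lifetime) tuples for one hop.

    cur, dst: (x, y) node coordinates. working: set of functional output
    directions drawn from {"E", "W", "N", "S"}. An empty list means the
    packet is delivered here or dropped.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    x_dir = "E" if dx > 0 else "W" if dx < 0 else None
    y_dir = "N" if dy > 0 else "S" if dy < 0 else None
    ordered = (x_dir, y_dir) if prefer_x else (y_dir, x_dir)
    choices = [d for d in ordered if d is not None]
    if not choices:
        return []  # already at the destination
    # Nominal case: the normal direction is available.
    if choices[0] in working:
        return [(choices[0], prefer_x, lifetime)]
    # Any other output is a misroute and costs one unit of lifetime.
    if lifetime == 0:
        return []  # expired packets are dropped (assumed behaviour)
    # Assumed symmetric rule: a packet misrouted along X becomes
    # Y-preferred, and one misrouted along Y becomes X-preferred.
    def misroute(d):
        return (d, d in ("N", "S"), lifetime - 1)
    # Secondary choice: the productive direction on the other axis.
    if len(choices) > 1 and choices[1] in working:
        return [misroute(choices[1])]
    # Both choices unavailable: copy to all functional output channels.
    return [misroute(d) for d in sorted(working)]
```

For example, `xpy_outputs((0, 0), (2, 0), {"N", "S"})` falls into the third case and returns two copies, each with its lifetime reduced to 2.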

  10. What Did He Just Say? [Figure: routing example with source (S) and destination (D) nodes]

  11. Notes on X-P-Y Routing • Does not handle all fault patterns. But most! • Requires many faulty nodes in local area to fail (and in a ‘C’ shape) • Deadlock Avoidance • The packet attempts to return to X-Preferred-Y routing as soon as possible • Can only have ‘misrouting’ packets in the local vicinity of faulty nodes • These faulty nodes make certain turns impossible

  12. Possible Transient Solutions • Use SECDED encoding/decoding for each packet hop • Verifies and recovers any errors at every hop • Requires two additional pipeline stages (normally only three total) • Use SECDED encoding for the entire packet at the NIC level, and decode only the destination during transmission • Two extra cycles required overall, but less tolerance
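
To illustrate what SECDED (single-error correction, double-error detection) provides at each hop, here is a toy extended Hamming (8,4) code. It is a minimal sketch for illustration only: a real link would use a much wider code, and the slides do not say which code the author used. The function names are hypothetical.

```python
def secded_encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword.
    Returns a list of bits; index 0 holds the overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d      # data positions
    code[1] = code[3] ^ code[5] ^ code[7]       # covers positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]       # covers positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]       # covers positions 4,5,6,7
    code[0] = sum(code[1:]) % 2                 # overall parity
    return code

def secded_decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if overall == 1:
        # Odd parity: treat as a single-bit error. A zero syndrome
        # means the overall parity bit itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        return None, "double"   # even parity, nonzero syndrome
    else:
        status = "ok"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

Flipping any single bit of `secded_encode(n)` still decodes to `n` with status `"corrected"`, while flipping any two bits is reported as `"double"` and would require a retransmission.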

  13. A New Solution? • Use shadow registers (see Razor project) to detect soft-errors and crosstalk delay-induced timing errors • Secondary ‘extra’ set of registers resamples the link shortly after the primary clock fires • Discrepancies must be due to timing violation or bitflip • Reissue the flit, and allow two cycles for retransmission this time
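
The shadow-register check can be modelled behaviourally as a comparison between two samples of the same link. This is a sketch only: the cycle counts assume one cycle for the failed attempt plus the two-cycle reissue described above, the reissued transfer is assumed to arrive intact, and `transfer_flit` is a hypothetical name.

```python
def transfer_flit(sent, primary_sample, shadow_sample):
    """Model one link hop guarded by a shadow register.

    sent: the flit the upstream router drove onto the link.
    primary_sample: value latched on the primary clock edge.
    shadow_sample: value latched shortly afterwards by the shadow register.
    Returns (received_flit, cycles_used).
    """
    if primary_sample == shadow_sample:
        # No discrepancy: accept the flit at full speed. An error that
        # corrupts both samples identically would go undetected here.
        return primary_sample, 1
    # Discrepancy: a timing violation or bit flip occurred. The flit is
    # reissued with two cycles allowed (assumed to succeed).
    return sent, 1 + 2
```

In the error-free case the hop costs a single cycle, which is why the technique adds little delay when errors are rare.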

  14. Evaluating the Solution - Area • The first question: Is it even feasible? • XPY routing consumes an additional 17% switch area in a 14x14 LDPC application • Only a 7% switch area penalty in 4x4 Neural Network application • Doesn’t affect clock rate • Bootstrap operations still included off chip

  15. Evaluating the Solution – Area (2) • Transient Solutions: • Full Node-Node SECDED has a 34% area penalty in the 14x14 app, and a 41% penalty in the 4x4 NN app • NIC-based SECDED increases area by only 20% and 17% respectively • Shadow registers require an extra flit of buffering at each port, imposing nearly a 70% area penalty • As technology scales, one flit is less and less of a problem

  16. Evaluation – Energy Consumption • Max replications of a given message is 3*lifetime copies • Finite and bounded number of duplicates • Less than similarly effective redundant random walk in worst case • In best case, no copies are generated at all • As energy is dominated by hop count, this results in nearly an order-of-magnitude lower energy consumption than the RRW method (which is an order of magnitude lower than stochastic flooding)

  17. Evaluation – Energy Consumption • Work still in progress on transient solutions • Energy consumption in NoC switches is dominated by buffer activity normally • SECDED blocks require significant switching activity • Additional bits (and hence larger buffers) required to send SECDED data • Shadow latches must be clocked as well • Still an open question

  18. Evaluation – Performance Impact • Due to a power outage, I cannot show you the effectiveness graph. BUT… • For 5% faults, more than 99% of all paths on the chip are routable • For small networks (4x4), over 50% of the chips have all valid paths routable, even with 10% faults • Effectiveness of the algorithm reduces as network size increases – the longer average paths increase the odds of stumbling across a non-routable section

  19. Performance Impact – Transient Solutions • Normal packet latency in 14x14 LDPC application (3 stage pipeline) is roughly 16 cycles • Full SECDED increases this to nearly 27 cycles • NIC-based SECDED increases it to only 18 cycles • Shadow register technique imposes little delay penalty, as the error-free case (the majority of cases) runs at optimal speed • For small networks (2 stage pipeline), the average latency is much smaller, nearing four cycles. • Two cycle penalty imposed by NIC-based SECDED becomes much more significant • Shadow registers are much faster for small networks with short pipelines
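
The relative cost of these penalties is easier to see as percentages. The sketch below uses only the cycle counts quoted on this slide; the small-network figure assumes a base latency of four cycles plus the two-cycle NIC-based SECDED penalty. The helper name `overhead_pct` is illustrative.

```python
def overhead_pct(base_cycles, new_cycles):
    """Latency increase relative to the baseline, in percent."""
    return 100.0 * (new_cycles - base_cycles) / base_cycles

full_secded_large = overhead_pct(16, 27)   # full SECDED, 14x14 LDPC
nic_secded_large = overhead_pct(16, 18)    # NIC-based SECDED, 14x14 LDPC
nic_secded_small = overhead_pct(4, 4 + 2)  # NIC-based SECDED, small network

print(full_secded_large, nic_secded_large, nic_secded_small)
# prints 68.75 12.5 50.0
```

The same two-cycle penalty that costs 12.5% on the large network costs 50% on the small one, which is the sense in which it "becomes much more significant".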

  20. Conclusions • X-Preferred-Y routing can be implemented in almost all designs and offers a nearly-deterministic and resource-friendly solution • NIC-based SECDED is an effective means of preventing soft errors but the performance impact is significant for small networks • Shadow registers especially useful for performance-critical apps • Get even better as technology scales • Area penalty minimized • Soft errors and Crosstalk problems increasing

  21. Further Work • On-chip bootstrap nodes to determine and reconfigure around faults • Energy consumption of transient solutions • Non-homogeneous irregular topologies
