High-Performance Networks for Dataflow Architectures

Pravin Bhat, Andrew Putnam. Overview: Motivation & Design Constraints, Network design, Performance, Adaptive Routing, Conclusion.

  1. High-Performance Networks for Dataflow Architectures • Pravin Bhat, Andrew Putnam

  2. Overview • Motivation & Design Constraints • Network design • Performance • Adaptive Routing • Conclusion

  3. Overview • Motivation & Design Constraints • Network design • Performance • Adaptive Routing • Conclusion

  4. Motivation • Signal delay on wires matters more than transistor switching speed • Reliability will decrease significantly in future processes • Factory testing will not be possible • Expect 20% of transistors to be dead on arrival (DOA) • Expect a further 10% to fail over several months • Dataflow is an answer, but the network is currently a bottleneck

  5. Dataflow Characteristics • Unpredictable traffic • Cannot pre-allocate resources • Highly bursty traffic • Quick delivery of bursts is critical • Nodes are not guaranteed to consume messages • Potential for livelock & deadlock

  6. Overview • Motivation & Design Constraints • Network design • Performance • Adaptive Routing • Conclusion

  7. Network Requirements • High performance during bursts • Area efficient • Guaranteed message delivery • Deadlock- and livelock-free • Fault tolerant • Regular 2-D physical structure

  8. Topology • On-chip: must be implementable in 2-D • A regular tiled structure suggests: • Grid • Torus • Hypercube • Fat Tree • Hypercube is difficult to route and to scale • Fat Tree has a single point of failure
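The following sketch makes the grid/torus difference concrete by counting minimal hops. It assumes a k×k array of tiles addressed as (x, y). In a torus, each row and column wraps around. The 8×8 size and the function names are hypothetical and are not taken from the slides.

```python
# Hypothetical k x k array of tiles; coordinates are (x, y) with 0 <= x, y < k.

def grid_hops(src, dst):
    """Minimal hop count on a 2-D grid (mesh): the Manhattan distance."""
    (x0, y0), (x1, y1) = src, dst
    return abs(x1 - x0) + abs(y1 - y0)


def torus_hops(src, dst, k):
    """Minimal hop count on a k x k torus: each dimension can wrap around."""
    (x0, y0), (x1, y1) = src, dst
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    return min(dx, k - dx) + min(dy, k - dy)


def average_hops(k, hops):
    """Mean minimal hop count over all ordered pairs of distinct tiles."""
    tiles = [(x, y) for x in range(k) for y in range(k)]
    total = sum(hops(s, d) for s in tiles for d in tiles if s != d)
    return total / (len(tiles) * (len(tiles) - 1))


if __name__ == "__main__":
    k = 8  # hypothetical array size, not taken from the slides
    print("grid :", round(average_hops(k, grid_hops), 2))
    print("torus:", round(average_hops(k, lambda s, d: torus_hops(s, d, k)), 2))
```

Consider the corner-to-corner pair (0, 0) to (7, 7) in an 8×8 array. The grid needs 14 hops, but the torus needs only 2 because both dimensions wrap around.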

  9. Routing • Static routing does not provide the necessary fault tolerance • Use a modified Virtual Channel (VC) algorithm • VCs guarantee deadlock freedom if nodes consume messages • Dynamically adaptive to handle transient faults & congestion • Initial studies used static routing

  10. Flow Control • Resource reservation is not possible • Long-latency wires prohibit handshakes • Send messages assuming they will be accepted • Buffer just enough to let the receiver send a reject signal on the next clock cycle
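Below is a cycle-level sketch of "send assuming accept" flow control on a single link. The slide does not say where the buffering sits. This sketch assumes the sender keeps a one-entry copy of the message it sent last cycle and resends that copy when the receiver's reject arrives one cycle later. The queue capacity, the `consumer_ready` schedule, and all names are hypothetical.

```python
from collections import deque


def simulate_link(messages, capacity, consumer_ready, max_cycles=10_000):
    """Cycle-level sketch of send-assuming-accept flow control on one link.

    messages       : payloads to send, in order
    capacity       : slots in the receiver's input queue (hypothetical)
    consumer_ready : cycle -> bool, whether the receiving node consumes a
                     message that cycle (hypothetical)
    Returns (delivered, rejects): messages consumed in order, and how many
    reject signals the receiver raised.
    """
    pending = deque(messages)  # not yet sent
    held = None                # sender's copy of the message sent last cycle
    reject = False             # reject raised by the receiver last cycle
    rx_queue = deque()         # receiver input buffer
    delivered, rejects = [], 0

    for cycle in range(max_cycles):
        if not (pending or rx_queue or reject):
            break
        # Sender: no handshake. A reject from the previous cycle means the held
        # copy must be resent; otherwise the copy is discarded and the next
        # message is sent on the assumption that it will be accepted.
        if reject:
            msg = held
        else:
            msg = pending.popleft() if pending else None
        # Receiver: the node consumes the head of its queue when it is ready.
        if consumer_ready(cycle) and rx_queue:
            delivered.append(rx_queue.popleft())
        # Receiver: accept if there is room, otherwise raise reject; the sender
        # sees it on the next clock cycle.
        reject = msg is not None and len(rx_queue) >= capacity
        if msg is not None and not reject:
            rx_queue.append(msg)
        rejects += reject
        held = msg
    return delivered, rejects


if __name__ == "__main__":
    # Sender offers one message per cycle; the node consumes every other cycle.
    out, n_rejects = simulate_link(range(10), capacity=2,
                                   consumer_ready=lambda c: c % 2 == 0)
    print(out, n_rejects)
```

Because a reject always refers to the message sent on the previous cycle, one retained copy is enough to avoid losing or reordering messages in this model.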

  11. Deadlock-Free Operation • Nodes cannot always consume messages • Add a dedicated channel to and from memory • Adds 8% area overhead • Rotate stalled operands out of processing elements (PEs) to ensure forward progress • Send the first operand back at a faster rate to avoid livelock
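The slide states the idea but not the mechanism. The following is a hypothetical sketch of "rotating stalled operands out." The PE has a small table of operands waiting for their partners. When the table is full, it evicts the oldest waiting operand to the memory channel instead of refusing input, so the network never backs up behind it. The slot count, the queue standing in for the memory channel, and all names are assumptions. The faster return rate mentioned on the slide is not modeled.

```python
from collections import OrderedDict, deque


class ProcessingElement:
    """Hypothetical PE with a small two-operand matching table.

    An instruction fires when both operands carrying the same tag have
    arrived. If the table is full, the oldest unmatched operand is rotated
    out to the memory channel (modeled here as a queue) so incoming
    messages are always consumed.
    """

    def __init__(self, slots):
        self.slots = slots
        self.waiting = OrderedDict()  # tag -> first operand, oldest first
        self.to_memory = deque()      # operands rotated out, to be re-sent later
        self.fired = []               # (tag, first, second) for each firing

    def receive(self, tag, value):
        if tag in self.waiting:
            # Partner operand is present: fire the instruction.
            self.fired.append((tag, self.waiting.pop(tag), value))
            return
        if len(self.waiting) >= self.slots:
            # Table full: evict the oldest stalled operand rather than stall.
            self.to_memory.append(self.waiting.popitem(last=False))
        self.waiting[tag] = value

    def reinject(self):
        """Send one rotated-out operand back into the PE from memory."""
        if self.to_memory:
            self.receive(*self.to_memory.popleft())
```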

  12. Overview • Motivation & Design Constraints • Network design • Performance • Adaptive Routing • Conclusion

  13. Performance • Ran network-centric simulations • 20 billion instructions • SPEC2000, SPLASH-2, and dataflow benchmarks • Goal: find the optimal balance of: • Number of virtual channels • Queue length • Link bandwidth • Packets per message

  14. ASIC Model • Performance must be balanced against area • Developed an RTL model of the WaveScalar network architecture • 90 nm ASIC standard-cell library • Timing per link: • Grid links: 2.76 ns • Torus links: 6.16 ns • The network switch occupies 11.6% of chip area
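As a quick check of what the link timings imply, assume a link must be traversed within a single clock cycle. The slide does not say whether torus links were pipelined. Under that assumption, each link delay sets an upper bound on clock frequency.

```python
# Link delays reported on the slide (90 nm standard-cell RTL model).
GRID_LINK_NS = 2.76
TORUS_LINK_NS = 6.16


def max_clock_mhz(link_ns):
    """Upper bound on clock frequency if a link must be crossed in one cycle."""
    return 1_000.0 / link_ns  # 1 / (t ns) = 1000 / t MHz


grid_mhz = max_clock_mhz(GRID_LINK_NS)
torus_mhz = max_clock_mhz(TORUS_LINK_NS)
print(f"grid : {grid_mhz:.0f} MHz")   # grid : 362 MHz
print(f"torus: {torus_mhz:.0f} MHz")  # torus: 162 MHz
print(f"torus link is {TORUS_LINK_NS / GRID_LINK_NS:.2f}x slower")  # 2.23x
```

Under this single-cycle assumption, torus links are about 2.23× slower than grid links. The conclusion slide still favors the torus on performance per area, so link timing is only part of the trade-off.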

  15. Overview • Motivation & Design Constraints • Network design • Performance • Adaptive Routing • Conclusion

  16. Virtual Channel Flow Control • In hardware, only the head of a queue can be dequeued in one clock cycle • If the first message in a queue is blocked, every message behind it is blocked (head-of-line blocking) • Network utilization suffers because links sit idle

  17. Virtual Channel Flow Control • Virtual channels: several small queues instead of one long queue • Decouple buffer resources from link resources • Increase network throughput by increasing link utilization
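The following sketch shows head-of-line blocking and how virtual channels relieve it. One physical link carries at most one message per cycle, and only the head of each queue can be dequeued. The split of messages across VCs is done by hand, and the congested destination is hypothetical. For simplicity the arbiter scans queues in a fixed order, whereas real routers often arbitrate round-robin.

```python
from collections import deque


def drain_link(queues, blocked, cycles):
    """Send at most one message per cycle over a single physical link.

    queues  : deques that share the link; one deque models a single FIFO,
              several deques model virtual channels
    blocked : predicate, True if a message cannot advance downstream
    Returns the messages that crossed the link, in order.
    """
    sent = []
    for _ in range(cycles):
        for q in queues:
            # Only the head of each queue can be dequeued in one cycle.
            if q and not blocked(q[0]):
                sent.append(q.popleft())
                break  # the link carries one message per cycle
    return sent


if __name__ == "__main__":
    def blocked(m):
        return m == "to_A"  # hypothetical: the path toward A is congested

    fifo = [deque(["to_A", "to_B", "to_B", "to_B"])]
    vcs = [deque(["to_A"]), deque(["to_B", "to_B", "to_B"])]
    print("single FIFO:", drain_link(fifo, blocked, cycles=4))
    print("two VCs    :", drain_link(vcs, blocked, cycles=4))
```

With a single FIFO, nothing crosses the link in four cycles because the blocked head stalls the messages behind it. With two virtual channels, the three B-bound messages use the otherwise idle link.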

  18. Dimension-Order Routing • Old WaveScalar routing protocol • Network topology is a static grid • Packets first travel to the correct x-coordinate and then to the correct y-coordinate • Low network utilization because not all available paths are used • Not fault tolerant
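Dimension-order (XY) routing can be sketched in a few lines. The sketch assumes (x, y) tile coordinates on a grid without wraparound. The route is fully determined by source and destination, which explains the slide's last two points. Other minimal paths stay idle, and one failed link on the route cuts that pair off. The function name is hypothetical.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2-D grid: fix x first, then y.

    Returns the list of (x, y) tiles visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # travel along x to the correct column
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then along y to the correct row
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```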

  19. Adaptive Routing • Progressively chooses longer routes instead of waiting for an unavailable resource • High network utilization • Fault tolerant • Can cause deadlock
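The following is a minimal sketch of one adaptive routing decision on a 2-D grid. The router prefers a productive direction, meaning one that moves closer to the destination. If every productive link is busy or faulty, it takes a longer route instead of waiting. The `available` set, the direction names, and the preference order are assumptions. The slide's "progressively" suggests misrouting escalates step by step, which this sketch does not model.

```python
ALL_DIRECTIONS = ("E", "W", "N", "S")  # +x, -x, +y, -y


def choose_output(cur, dst, available):
    """Pick an output direction for a packet at tile `cur` heading to `dst`.

    available: directions whose link is free and not faulty this cycle
               (the caller excludes links that do not exist at grid edges).
    Returns None only if every link is unavailable.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    productive = [d for d, ok in (("E", dx > 0), ("W", dx < 0),
                                  ("N", dy > 0), ("S", dy < 0)) if ok]
    for d in productive:
        if d in available:
            return d  # minimal route
    for d in ALL_DIRECTIONS:
        if d not in productive and d in available:
            return d  # misroute: longer path instead of waiting
    return None       # every link busy or faulty: wait this cycle
```

Unrestricted misrouting like this is exactly what can cause deadlock, which the dimension-reversal rules on the next slide address.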

  20. Deadlock-Free Adaptive Routing • Some virtual channels are reserved for dimension-order routing; the rest are used for adaptive routing • Every time a packet is routed in the wrong direction, its dimension-reversal (DR) count is incremented • No packet may wait on a virtual channel occupied by a packet with a lower DR count • Mathematically proven to be deadlock free
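The slide gives the rules but not how they are encoded. The sketch below assumes a 2-D network in which dimension-order routing moves in X and then Y. It treats an X hop taken after a Y hop as "routed in the wrong direction." The slide does not say how equal counts are handled, so this sketch follows its wording literally and lets a packet wait behind an equal DR count. When waiting is not allowed, the sketch assumes the packet falls back to the VCs reserved for dimension-order routing. `Packet`, `may_wait`, and `select_vc_class` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

X, Y = 0, 1  # dimension of a hop; dimension-order routing goes X then Y


@dataclass
class Packet:
    dr: int = 0                     # dimension-reversal (DR) count
    last_dim: Optional[int] = None  # dimension of the previous hop

    def hop(self, dim):
        # Assumption: a hop in X after a hop in Y reverses the X-then-Y
        # order and counts as "routed in the wrong direction".
        if self.last_dim == Y and dim == X:
            self.dr += 1
        self.last_dim = dim


def may_wait(packet, occupant):
    """Slide rule: never wait on a VC held by a packet with a lower DR count."""
    return occupant.dr >= packet.dr


def select_vc_class(packet, adaptive_occupant):
    """Choose where a blocked packet queues.

    adaptive_occupant: packet holding the adaptive VC, or None if it is free.
    Assumption: when waiting there is not allowed, the packet falls back to
    the VCs reserved for dimension-order routing.
    """
    if adaptive_occupant is None or may_wait(packet, adaptive_occupant):
        return "adaptive"
    return "dimension-order"
```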

  21. Conclusion • Best performance per area with: • 2 virtual channels • 2 links • 2-4 entries per queue • Torus topology • Adaptive routing • Dataflow chip networks can achieve high performance at reasonable area cost
