
George Michelogiannakis William J. Dally Stanford University

Router Designs for Elastic-Buffer On-Chip Networks. George Michelogiannakis William J. Dally Stanford University. Introduction. EB flow-control was recently proposed. Uses the channels as distributed FIFOs. EB routers are bufferless packet-switched routers.


Presentation Transcript


  1. Router Designs for Elastic-Buffer On-Chip Networks
     George Michelogiannakis, William J. Dally
     Stanford University

  2. Introduction
     • EB flow-control was recently proposed.
     • It uses the channels as distributed FIFOs.
     • EB routers are bufferless packet-switched routers.
     • They have the benefits of circuit-switched routers, without the overhead of setting up and tearing down circuits.
     • This work explores the EB router design space by evaluating three representative designs.
     SC09: Routers for EB NoCs

  3. The EB Flow-control Idea
     [Figure labels: Pipelined channel; Channel as FIFO; Elastic buffer; Master-slave FF]

  4. How Elastic Buffer Channels Work
     • Ready/valid handshake between elastic buffers:
        • Ready: at least one free storage slot.
        • Valid: non-empty (driving valid data).
     [Figure: handshake example, Cycle 1 to Cycle 6]
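The ready/valid handshake on slide 4 can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the two-slot depth, the `ElasticBuffer` class, and the `step` function are hypothetical names chosen for illustration, and the model ignores wire delay and circuit-level detail.

```python
from collections import deque


class ElasticBuffer:
    """Small FIFO with a ready/valid interface (two slots is an assumption)."""

    def __init__(self, slots=2):
        self.slots = slots
        self.fifo = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.fifo) < self.slots

    @property
    def valid(self):
        # Valid: non-empty, so the head flit is being driven downstream.
        return len(self.fifo) > 0


def step(channel, source, sink_ready):
    """Advance a chain of EBs by one clock cycle.

    Returns the flit handed to the sink this cycle, or None.
    """
    # Sample all handshake signals before changing any state,
    # which mimics a synchronous update at the clock edge.
    ready = [eb.ready for eb in channel]
    valid = [eb.valid for eb in channel]
    delivered = None
    # The last EB hands its head flit to the sink if the sink is ready.
    if valid[-1] and sink_ready:
        delivered = channel[-1].fifo.popleft()
    # A flit advances when the upstream EB is valid and the downstream EB is ready.
    for i in range(len(channel) - 1, 0, -1):
        if valid[i - 1] and ready[i]:
            channel[i].fifo.append(channel[i - 1].fifo.popleft())
    # The source injects when the first EB was ready at the start of the cycle.
    if source and ready[0]:
        channel[0].fifo.append(source.popleft())
    return delivered
```

With a stalled sink, flits accumulate in the channel until every EB is full and the source is back-pressured; once the sink is ready again, the flits drain in order. This is the sense in which the channel acts as a distributed FIFO.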

  5. Use EB Flow-Control Through the Router
     [Figure: VC input-buffered router compared with the EB router]
     • Input buffer replaced by an input EB.
     • VC and SW allocators removed; per-output arbiters used instead.
     • Three-slot output EB to cover for arbitration done one cycle in advance.
     • Look-ahead (LA) routing is also applicable to EB networks.
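To make the "per-output arbiters" idea on slide 5 concrete, here is a minimal sketch of one arbiter per output port. The round-robin policy, the rule that a grant is held until the packet's tail flit leaves, and the names `OutputArbiter`, `arbitrate`, and `release` are all assumptions made for illustration; the slides do not specify the arbitration policy.

```python
class OutputArbiter:
    """Round-robin arbiter for a single output port (policy is an assumption)."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.last = num_inputs - 1  # so that input 0 has priority first
        self.owner = None           # input currently holding the output

    def arbitrate(self, requests):
        """requests: one bool per input. Returns the granted input or None."""
        if self.owner is not None:
            # The output stays with the current packet until it is released.
            return self.owner
        # Search inputs starting just after the most recent winner.
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last + offset) % self.num_inputs
            if requests[candidate]:
                self.owner = candidate
                self.last = candidate
                return candidate
        return None

    def release(self):
        """Call when the owning packet's tail flit has left the output."""
        self.owner = None
```

A router in this sketch would instantiate one `OutputArbiter` per output, with each input raising a request only at the arbiter of the output it needs.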

  6. Baseline Router - Issues
     • Issues constraining the clock cycle time:
        • The three-slot EB FSM is too complicated, so the output EB is implemented as a FIFO.
        • Routing is performed serially with switch arbitration.

  7. Enhanced Two-Stage Router
     • Look-ahead routing shortens the critical path.
     • Two-slot EBs are used at the output and for pipelining.
     • Flits are stored in the intermediate EB and wait for a grant.
     • The decision to traverse the switch is made in the same cycle.
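Look-ahead routing, as used on slide 7, means that each router computes the output port the flit will need at the next router, so the route is already known on arrival and routing no longer sits in series with arbitration. The sketch below assumes dimension-order (XY) routing on a 2D mesh with hypothetical port names; the slides do not state which routing function the authors use.

```python
# Coordinate change for each mesh direction (port names are assumptions).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(cur, dest):
    """Output port at router `cur` for a packet headed to `dest` (XY order)."""
    (cx, cy), (dx, dy) = cur, dest
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "EJECT"


def lookahead_route(cur, out_port, dest):
    """Given the port already chosen for `cur`, compute the port for the next hop."""
    if out_port == "EJECT":
        return None  # the packet leaves the network here
    sx, sy = STEP[out_port]
    next_router = (cur[0] + sx, cur[1] + sy)
    return xy_route(next_router, dest)
```

For a packet travelling from (0, 0) to (2, 1), the source picks "E" with `xy_route`, and each router then calls `lookahead_route` to produce the port that the flit carries to the next hop.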

  8. Enhanced Two-Stage Router - Sync Module
     • The synchronization module maintains alignment between flits and grants.
     • It contains an output-port EB.
     • It stores the chosen output port of the current packet and of any other packets in router stage 1 and the intermediate EB.

  9. Enhanced Two-Stage Router - Sync Module
     • When the current packet’s tail flit is departing:
        • The sync module propagates the next output to the arbiters, from the appropriate location.
        • The sync module propagates an update to all outputs.
        • An output receiving an update from the input it is granting clocks the arbiter output registers at the next edge.
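The bookkeeping role of the sync module can be pictured as a short queue of output ports, one entry per packet currently inside the input side of the router. The following is a loose behavioral sketch, not the hardware described on the slides: the class `SyncModule` and its methods are hypothetical, and the update signalling to the output arbiters is reduced to a return value.

```python
from collections import deque


class SyncModule:
    """Tracks the output port of each packet held at one router input."""

    def __init__(self):
        self.ports = deque()  # chosen output ports, oldest packet first

    def head_arrives(self, out_port):
        # A new packet's head flit enters the input; remember its output port.
        self.ports.append(out_port)

    @property
    def current_request(self):
        # Output port requested on behalf of the packet at the front.
        return self.ports[0] if self.ports else None

    def tail_departs(self):
        # The current packet is done; expose the next packet's output port.
        self.ports.popleft()
        return self.current_request
```

Keeping the ports in arrival order is what keeps grants aligned with flits in this sketch: the request presented to the arbiters always belongs to the oldest packet still waiting at the input.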

  10. Single-Stage Router
     • Merges the two router stages to:
        • Reduce router latency.
        • Avoid pipelining overhead.

  11. Evaluation Methodology
     • 45 nm worst-case low-power commercial library.
     • Synopsys DC and Cadence Encounter.
     • 64-bit router datapath. 70% initial area utilization ratio.
     • A cycle-accurate network simulator was used.
     • Each router is assumed to run either at its own maximum post-place-and-route (P&R) frequency or at a common frequency.
     • 8x8 2D mesh. 2 mm-long wires. 1-cycle latency.
     • Constant packet size of 512 bits.
     • Results are averaged over a set of six traffic patterns.
     • Datapath width was swept from 28 to 171 bits.
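Because the packet size is fixed at 512 bits while the datapath width varies, the number of flits per packet changes across the sweep. The short calculation below assumes that one flit is exactly as wide as the datapath and that a partly filled last flit still occupies a full flit; both are assumptions, and `flits_per_packet` is a hypothetical helper.

```python
PACKET_BITS = 512  # constant packet size from slide 11


def flits_per_packet(width_bits, packet_bits=PACKET_BITS):
    """Ceiling of packet_bits / width_bits using integer arithmetic."""
    return -(-packet_bits // width_bits)


# Endpoints of the sweep plus the 64-bit datapath used for synthesis.
for width in (28, 64, 171):
    print(f"{width}-bit datapath: {flits_per_packet(width)} flits per packet")
```

Under these assumptions a packet takes 19 flits at 28 bits, 8 flits at 64 bits, and 3 flits at 171 bits.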

  12. Placement and Routing Cycle Time
     • The enhanced two-stage router has a 26% shorter cycle time than the single-stage router and a 42% shorter cycle time than the baseline two-stage router.
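The two percentages on slide 12 can be put on a common scale. The sketch below assumes that "X% reduced cycle time compared to R" means T_enhanced = (1 - X) * T_R; that reading, and the helper name `normalized_cycle_times`, are assumptions rather than something the slides state.

```python
# Relative reductions quoted on slide 12 for the enhanced two-stage router.
REDUCTION_VS_SINGLE = 0.26
REDUCTION_VS_BASELINE = 0.42


def normalized_cycle_times():
    """Cycle times with the enhanced two-stage router normalized to 1.0."""
    enhanced = 1.0
    # Assumed reading: T_enhanced = (1 - reduction) * T_other.
    single = enhanced / (1.0 - REDUCTION_VS_SINGLE)
    baseline = enhanced / (1.0 - REDUCTION_VS_BASELINE)
    return {"enhanced": enhanced, "single": single, "baseline": baseline}


times = normalized_cycle_times()
for name, t in times.items():
    print(f"{name}: cycle time {t:.2f}, relative frequency {1.0 / t:.2f}")
```

Under this reading the single-stage cycle time is about 1.35 times, and the baseline two-stage cycle time about 1.72 times, that of the enhanced two-stage router.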

  13. Placement and Routing Energy per Bit
     • The baseline two-stage router requires 9% less energy per bit than the single-stage router and 35% less than the enhanced two-stage router.

  14. Placement and Routing Area
     • The single-stage router occupies 30% less area than the enhanced two-stage router and 44% less than the baseline two-stage router.

  15. Latency-Throughput, Max Frequencies
     • Latency increase: enhanced two-stage +1%, baseline two-stage +46%.

  16. Latency-Throughput, Equal Frequencies
     • Latency increase: enhanced two-stage +34%, baseline two-stage +32%.

  17. Which Router is the Optimal Choice?

  18. Conclusion
     • Improved EB router designs can widen the gap compared to VC networks.
     • This makes EB networks look even more attractive.
     • EB routers are simple designs, and simple designs have numerous advantages.
     • Much of the complexity of VC networks is ignored by some area and power models.
     • Overall, compared to VC networks: a 43% reduction in power per unit throughput, a 67% reduction in cycle time, and a 22% improvement in throughput per unit area.

  19. Questions?