Router Designs for Elastic-Buffer On-Chip Networks. George Michelogiannakis, William J. Dally. Stanford University.
Introduction • EB flow-control was recently proposed. • Uses the channels as distributed FIFOs. • EB routers are bufferless packet-switched routers. • They have the benefits of circuit-switched routers, without the overhead of setting up and tearing down circuits. • This work explores the EB router design space. • By evaluating three representative designs. SC09: Routers for EB NoCs
The EB Flow-Control Idea [Figure panels: pipelined channel; channel as FIFO; master-slave FF; elastic buffer]
How Elastic Buffer Channels Work • Ready/valid handshake between elastic buffers. • Ready: at least one free storage slot. • Valid: non-empty (driving valid data). [Figure: channel timing example, cycles 1 to 6]
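The ready/valid handshake above can be sketched as a small cycle-by-cycle simulation. This is only an illustrative model, not the authors' implementation: the class `ElasticBuffer`, the functions `step` and `run`, the two-slot capacity, and the stall pattern are all assumptions introduced here.

```python
from collections import deque


class ElasticBuffer:
    """Small FIFO standing in for one elastic buffer (capacity of 2 is an assumption)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot
        return len(self.slots) < self.capacity

    @property
    def valid(self):
        # Valid: non-empty (driving valid data)
        return len(self.slots) > 0


def step(channel, source, sink_ready):
    """Advance the channel by one clock cycle; return the flit delivered to the sink, or None."""
    # Sample every handshake signal from the state at the start of the cycle
    valid = [eb.valid for eb in channel]
    ready = [eb.ready for eb in channel] + [sink_ready]
    delivered = None
    for i in reversed(range(len(channel))):
        # A flit advances only when the sender is valid and the receiver is ready
        if valid[i] and ready[i + 1]:
            flit = channel[i].slots.popleft()
            if i + 1 < len(channel):
                channel[i + 1].slots.append(flit)
            else:
                delivered = flit
    # The source injects a new flit whenever the first buffer was ready
    if source and ready[0]:
        channel[0].slots.append(source.popleft())
    return delivered


def run(num_flits=4, num_ebs=3, stall_cycles=(3, 4, 5), max_cycles=30):
    """Push flits through a chain of elastic buffers while the sink stalls for a few cycles."""
    channel = [ElasticBuffer() for _ in range(num_ebs)]
    source = deque(range(num_flits))
    received = []
    for cycle in range(max_cycles):
        flit = step(channel, source, sink_ready=cycle not in stall_cycles)
        if flit is not None:
            received.append((cycle, flit))
        # No buffer may ever hold more flits than it has slots
        assert all(len(eb.slots) <= eb.capacity for eb in channel)
    return received


if __name__ == "__main__":
    for cycle, flit in run():
        print(f"cycle {cycle}: sink received flit {flit}")
```

While the sink is stalled, flits keep advancing until every buffer between them and the sink is full, which is how the channel itself behaves as a distributed FIFO.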
Use EB Flow-Control Through the Router • Starting point: a VC input-buffered router. • Input buffer replaced by an input EB. • VC and switch (SW) allocators removed; per-output arbiters are used instead. • Three-slot output EB to cover for arbitration being done one cycle in advance. • Look-ahead (LA) routing is also applicable to EB networks. [Figure: VC input-buffered router next to the EB router]
Baseline Router - Issues • Issues constraining the clock cycle time: • The three-slot EB FSM is too complicated, so the output EB is implemented as a FIFO. • Routing is performed serially with switch arbitration.
Enhanced Two-Stage Router • Look-ahead routing to shorten the critical path. • Use two-slot EBs at the output and for pipelining. • Flits are stored in the intermediate EB and wait for a grant. • The decision to traverse the switch is made in the same cycle.
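Look-ahead routing means each router computes the output port a flit will need at the next router, so the flit arrives with its route already decided and routing drops off the critical path. The sketch below illustrates the idea under assumptions that are not in the slides: dimension-order (XY) routing on a 2D mesh, `(x, y)` router coordinates, and the port names and helper functions shown.

```python
# Hypothetical port names and mesh coordinates, chosen only for illustration
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(current, dest):
    """Output port at `current` for a flit heading to `dest` (XY routing, an assumption)."""
    (cx, cy), (dx, dy) = current, dest
    if dx != cx:
        return "E" if dx > cx else "W"
    if dy != cy:
        return "N" if dy > cy else "S"
    return "EJECT"


def neighbor(current, out_port):
    """Coordinates of the router reached by leaving `current` through `out_port`."""
    sx, sy = STEP[out_port]
    return (current[0] + sx, current[1] + sy)


def lookahead_route(current, out_port, dest):
    """Computed at `current`: the output port the flit will use at the NEXT router."""
    if out_port == "EJECT":
        return None
    return xy_route(neighbor(current, out_port), dest)


def trace(src, dest):
    """Follow one flit and return the list of (router, output port) pairs it uses."""
    hops = []
    current = src
    out_port = xy_route(src, dest)  # the first port is computed at injection
    while True:
        hops.append((current, out_port))
        if out_port == "EJECT":
            return hops
        # The next router's port is computed here, one hop in advance
        next_port = lookahead_route(current, out_port, dest)
        current = neighbor(current, out_port)
        out_port = next_port


if __name__ == "__main__":
    print(trace((0, 0), (2, 1)))
```

In this model no router ever runs `xy_route` for its own position on an arriving flit; it only uses the port carried with the flit, which is the property that lets routing overlap with arbitration.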
Enhanced Two-Stage Router – Sync Module • Synchronization module maintains alignment between flits and grants. • Contains an output port EB. • Stores the chosen output port of the current packet and of any other packets in router stage 1 and the intermediate EB.
Enhanced Two-Stage Router – Sync Module • When the current packet’s tail flit is departing: • Sync. module propagates the next output to the arbiters. • From the appropriate location. • Sync. module propagates an update to all outputs. • An output receiving an update from the input it is granting clocks the arbiter output regs at the next edge.
Single-Stage Router • Merges the two router stages to: • Reduce router latency. • Avoid pipelining overhead.
Evaluation Methodology • 45nm worst-case low-power commercial library. • Synopsys DC and Cadence Encounter. • 64-bit router datapath. 70% initial area utilization ratio. • Used a cycle-accurate network simulator. • We assume each router at its maximum post-P&R frequency, or all at the same frequency. • 8x8 2D mesh. 2mm-long wires. 1 cycle latency. • Constant packet size of 512 bits. • Averaged over a set of six traffic patterns. • Swept datapath width from 28 to 171 bits.
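With a constant 512-bit packet, the datapath width determines how many flits each packet occupies. A quick sketch of that arithmetic follows, assuming one flit is exactly one datapath width and ignoring any header overhead (both are assumptions made here; `flits_per_packet` is a hypothetical helper).

```python
import math

PACKET_BITS = 512  # constant packet size stated in the methodology


def flits_per_packet(width_bits):
    """Flits needed for one packet, assuming one flit equals one datapath width."""
    return math.ceil(PACKET_BITS / width_bits)


if __name__ == "__main__":
    # Endpoints of the swept range plus the 64-bit datapath used for synthesis
    for width in (28, 64, 171):
        print(f"{width:3d}-bit datapath: {flits_per_packet(width)} flits per packet")
```

Under these assumptions the sweep spans from 19 flits per packet at 28 bits down to 3 flits at 171 bits, with 8 flits at the 64-bit design point.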
Placement and Routing Cycle Time • The enhanced two-stage router has a 26% shorter cycle time than the single-stage router, and a 42% shorter cycle time than the baseline two-stage router.
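The two percentages above also imply how the single-stage and baseline routers compare with each other. The sketch below works that out with cycle times normalized to the baseline; the normalization and the function names are assumptions made here, and the derived figures are arithmetic consequences of the two stated reductions rather than separately reported results.

```python
def relative_cycle_times(vs_single=0.26, vs_baseline=0.42):
    """Cycle times normalized to the baseline two-stage router (baseline = 1.0)."""
    baseline = 1.0
    enhanced = baseline * (1 - vs_baseline)  # 42% shorter than the baseline
    single = enhanced / (1 - vs_single)      # enhanced is 26% shorter than single-stage
    return {"baseline": baseline, "enhanced": enhanced, "single": single}


def frequency_ratio(times, fast, slow):
    """How many times higher the clock frequency of `fast` is than that of `slow`."""
    return times[slow] / times[fast]


if __name__ == "__main__":
    t = relative_cycle_times()
    print(f"single-stage cycle time relative to baseline: {t['single']:.3f}")
    print(f"enhanced vs single-stage frequency: {frequency_ratio(t, 'enhanced', 'single'):.2f}x")
    print(f"enhanced vs baseline frequency: {frequency_ratio(t, 'enhanced', 'baseline'):.2f}x")
```

Taken together, the two reductions put the single-stage cycle time at roughly 0.78 of the baseline, and the enhanced router's clock at roughly 1.35x the single-stage clock and 1.72x the baseline clock.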
Placement and Routing Energy per Bit • The baseline two-stage router requires 9% less energy per bit than the single-stage router, and 35% less than the enhanced two-stage router.
Placement and Routing Area • Single-stage occupies 30% less area than the enhanced two-stage and 44% less than the baseline two-stage.
Latency-Throughput, Max Frequencies. Latency increase: Enhanced: +1% Baseline: +46%
Latency-Throughput, Equal Frequencies. Latency increase: Enhanced: +34% Baseline: +32%
Which Router is the Optimal Choice?
Conclusion • Improved EB router designs can widen the gap compared to VC networks. • Makes EB look even more attractive. • EB routers are simple designs. Simple designs have numerous advantages. • A lot of the complexity of VC networks is ignored by some area and power models. • Overall compared to VC, 43% reduction in power per unit throughput, 67% reduction in cycle time and 22% throughput per unit area.
Questions?