Flattened Butterfly Topology for On-Chip Networks. John Kim, James Balfour, and William J. Dally. Presented by Jun Pang.
Motivation & Goal • Most on-chip networks (e.g., 2D mesh) are low-radix • Pros: simple, short wires • Cons: large network diameter and energy inefficiency (many hops) • High-radix networks • Far fewer intermediate routers • Lower latency and lower power • Goal: show how an on-chip network can use high-radix routers to reduce latency and energy
On-chip network • Bandwidth is plentiful because wires are inexpensive, while buffers are expensive • Lower cost: from shorter distances • By reducing the number of channels and buffers • Concentration: several terminal nodes share resources (routers) • Latency: • Reduce hop count at the expense of higher serialization latency (Ts) to get lower overall latency
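The latency trade-off on this slide can be illustrated with the standard zero-load latency model (hop count times router delay, plus wire delay, plus serialization latency). This is a minimal sketch, not the paper's evaluation: the function name and every number below (router delay, wire delay, packet size, channel widths) are hypothetical values chosen only to show that fewer hops can outweigh a larger Ts.

```python
def zero_load_latency(hops, router_delay, wire_delay, packet_bits, channel_bits):
    """Zero-load latency in cycles: router delay per hop + wire delay + serialization."""
    # Serialization latency Ts: cycles needed to push the packet through one channel.
    serialization = -(-packet_bits // channel_bits)  # ceiling division
    return hops * router_delay + wire_delay + serialization

# Hypothetical low-radix case: many hops, wide channels (small Ts).
mesh = zero_load_latency(hops=6, router_delay=3, wire_delay=6,
                         packet_bits=512, channel_bits=256)

# Hypothetical high-radix case: few hops, narrower channels (larger Ts).
fbfly = zero_load_latency(hops=2, router_delay=3, wire_delay=6,
                          packet_bits=512, channel_bits=128)

print(mesh, fbfly)  # 26 16
```

With these assumed numbers, Ts doubles from 2 to 4 cycles, but the total still falls because four router traversals are removed.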
On-chip Flattened Butterfly (Fig. 3a) • Topology • Radix = 10 (concentration factor 4; 3 channels in dimension 1; 3 channels in dimension 2) • At most 2 hops between routers • Longer wires → deeper buffers • Non-minimal global adaptive routing (UGAL) • Load balance and performance through path diversity • Routes each packet minimally or non-minimally • Non-minimal: minimal direction-ordered routing (prevents deadlock) • Only 2 VCs needed
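The radix and hop-count figures on this slide can be checked with a short sketch. It assumes a 4×4 grid of routers in which each router links directly to every other router in its row and column, which matches the slide's 4 + 3 + 3 = 10 ports. The mesh comparison uses the same 4×4 router grid, and the `ugal_choice` rule (compare queue length times hop count for the two candidate paths) is a commonly described formulation of UGAL, used here as an assumption rather than the authors' exact implementation. All function names are hypothetical.

```python
from itertools import product

def fbfly_radix(concentration, k, dims=2):
    """Ports per router: local terminals plus (k - 1) links in each dimension."""
    return concentration + dims * (k - 1)

def fbfly_hops(src, dst):
    """Minimal hops in a flattened butterfly: one hop per differing coordinate."""
    return sum(1 for s, d in zip(src, dst) if s != d)

def mesh_hops(src, dst):
    """Minimal hops in a mesh: Manhattan distance between routers."""
    return sum(abs(s - d) for s, d in zip(src, dst))

def ugal_choice(q_min, hops_min, q_nonmin, hops_nonmin):
    """Pick the path with the smaller estimated delay (queue length x hops)."""
    if q_min * hops_min <= q_nonmin * hops_nonmin:
        return "minimal"
    return "non-minimal"

routers = list(product(range(4), repeat=2))  # 4x4 router grid
fbfly_diameter = max(fbfly_hops(a, b) for a in routers for b in routers)
mesh_diameter = max(mesh_hops(a, b) for a in routers for b in routers)

print(fbfly_radix(4, 4))              # 10
print(fbfly_diameter, mesh_diameter)  # 2 6
print(ugal_choice(10, 1, 2, 2))       # non-minimal: the minimal path is congested
```

The last call shows the load-balancing idea: when the minimal path's queue is long enough, the longer non-minimal path has the smaller estimated delay and is chosen instead.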
Bypass Channels & Microarchitecture • Goal: reduce the distance traveled by packets to reduce latency and energy • Two types of muxes • Input muxes: select between bypass inputs and direct inputs • Output muxes: select between direct outputs and bypass inputs • Yield arbiter to guarantee global fairness • If the primary input is idle, the non-primary input is chosen • Control packets prevent starvation • Combination of minimal and non-minimal routing
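The yield-arbiter behavior described above can be sketched as follows. This is an illustrative model only: the class name, the `starvation_limit` threshold, and the idea of counting how many cycles the non-primary input has waited before requesting a control packet are assumptions made for the example, not details taken from the slides.

```python
class YieldArbiter:
    """Primary input wins whenever it has a flit; non-primary is served when primary is idle."""

    def __init__(self, starvation_limit=4):
        # Hypothetical threshold: cycles a non-primary flit may wait
        # before a control packet is requested to throttle primary traffic.
        self.starvation_limit = starvation_limit
        self.waited = 0

    def grant(self, primary_valid, nonprimary_valid):
        """Return (winner, send_control_packet) for one arbitration cycle."""
        if not nonprimary_valid:
            self.waited = 0
            return ("primary" if primary_valid else None), False
        if not primary_valid:
            # Primary is idle, so the non-primary input is chosen.
            self.waited = 0
            return "non-primary", False
        # Both inputs compete: non-primary yields and its wait is counted.
        self.waited += 1
        return "primary", self.waited >= self.starvation_limit

arb = YieldArbiter(starvation_limit=2)
print(arb.grant(True, True))   # ('primary', False)
print(arb.grant(True, True))   # ('primary', True): request a control packet
print(arb.grant(False, True))  # ('non-primary', False)
```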
Bypass Channels (continued) • Switch architecture • Minimal: simplified crossbar switch • Non-minimal: more complex • Non-minimal with bypass channels: less complex • Flow control & routing • Buffers for non-primary inputs • Separate buffers for the destination of control packets • UGAL is modified to support bypass channels
Evaluation • Throughput: up to 50% throughput increase compared to concentrated mesh • Power: about 38% power reduction compared to mesh • Latency: about 28% latency reduction compared to mesh
Scalability • Channel count grows more slowly than in a hypercube • Three ways to scale • Concentration factor • Dimension of the flattened butterfly • Hybrid approach • Future technology helps long wires • Increasing the number of VCs slightly reduces latency
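The first two scaling options can be compared with a small sketch that reuses the radix arithmetic from the 64-node example (concentration 4, 4 routers per dimension, 2 dimensions, radix 10). The 256-node configurations below are hypothetical illustrations rather than configurations reported in the slides, and the hybrid approach is not modeled.

```python
def fbfly_nodes(concentration, k, dims):
    """Terminal nodes: 'concentration' terminals on each of k**dims routers."""
    return concentration * k ** dims

def fbfly_radix(concentration, k, dims):
    """Ports per router: local terminals plus (k - 1) links in each dimension."""
    return concentration + dims * (k - 1)

configs = {
    "baseline (64 nodes)":      (4, 4, 2),
    "raise concentration":      (16, 4, 2),  # hypothetical 256-node option
    "add a dimension":          (4, 4, 3),   # hypothetical 256-node option
}

for name, (c, k, n) in configs.items():
    print(name, fbfly_nodes(c, k, n), fbfly_radix(c, k, n))
```

Under these assumptions, raising the concentration factor reaches 256 nodes with radix-22 routers and the same two-hop maximum, while adding a dimension needs only radix 13 but allows up to three hops.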
Conclusion & Concerns • Flattened butterfly: • Interesting idea • Maximum distance between nodes = 2 • Non-minimal routing to balance load • Bypass channels to reduce latency • Lower latency and power, higher throughput compared to mesh • Concerns: • High channel count? (higher than mesh and torus) • Low channel utilization? (due to the high channel count) • Control complexity? (arbitration, control packets) • Bypass channels: a good idea? (Why not just use minimal or non-minimal routing?)