Augmenting FPGAs with Embedded Networks-on-Chip
Mohamed ABDELFATTAH, Vaughn BETZ
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Comparison Against Buses
1. Why NoCs on FPGAs? Motivation
FPGA components:
• Logic Blocks
• Interconnect: Switch Blocks and Wires
• Hard Blocks: Memory, Multiplier, Processor
• Hard Interfaces: DDR/PCIe ..
Figure labels: 1600 MHz, 800 MHz, 200 MHz
Interconnect still the same
1. Why NoCs on FPGAs? Motivation
Hard interfaces: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet
Figure labels: 1600 MHz, 800 MHz, 200 MHz
Problems:
• Bandwidth requirements for hard logic/interfaces
• Timing closure
• High interconnect utilization:
  • Huge CAD problem
  • Slow compilation
  • Power/area utilization
• Wire speed not scaling:
  • Delay is interconnect-dominated
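The bandwidth problem can be made concrete with a little arithmetic. The sketch below assumes the 1600 MHz label belongs to a hard interface and the 200 MHz label to the soft fabric, and it uses a hypothetical 64-bit interface width that does not appear on the slides. Under those assumptions it shows how wide a soft datapath would have to be to carry the same bandwidth.

```python
def fabric_width_bits(interface_bits: int, interface_mhz: float, fabric_mhz: float) -> float:
    """Datapath width the slower fabric needs to match the interface bandwidth."""
    return interface_bits * interface_mhz / fabric_mhz


def bandwidth_gbytes(width_bits: float, mhz: float) -> float:
    """Bandwidth in GB/s for one transfer of width_bits per cycle at mhz."""
    return (width_bits / 8) * mhz / 1000


# Hypothetical 64-bit interface (assumption, not from the slides).
width = fabric_width_bits(interface_bits=64, interface_mhz=1600, fabric_mhz=200)
print(f"fabric datapath width: {width:.0f} bits")
print(f"bandwidth to carry: {bandwidth_gbytes(64, 1600):.1f} GB/s")
```

An 8x frequency gap turns into an 8x wider soft bus, which is one way to read the "high interconnect utilization" and "timing closure" bullets.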
Keep the “roads”, but add “freeways”.
[Figure: Los Angeles and Barcelona (source: Google Earth); FPGA labels: Logic Cluster, Hard Blocks]
1. Why NoCs on FPGAs? FPGA with NoC
NoC components: Routers and Links
• Router forwards data packet
• Router moves data to local interconnect
Hard interfaces: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet
Problems:
• Bandwidth requirements for hard logic/interfaces
• Timing closure
• High interconnect utilization:
  • Huge CAD problem
  • Slow compilation
  • Power/area utilization
• Wire speed not scaling:
  • Delay is interconnect-dominated
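To illustrate "router forwards data packet" and "router moves data to local interconnect", here is a minimal sketch of hop-by-hop forwarding on a mesh. The slides do not say which routing algorithm the routers use; dimension-ordered (XY) routing is assumed here only because it is a common choice for meshes, and the function name and coordinate scheme are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers a packet visits under XY routing (assumed algorithm).

    The packet first travels along X until the column matches, then along Y.
    The last router in the list is the one that would hand the data to the
    local interconnect.
    """
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route, "hops:", len(route) - 1)
```

Each intermediate router only forwards the packet; only the destination router ejects it into the fabric.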
1. Why NoCs on FPGAs? FPGA with NoC
Hard interfaces: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet
Problems:
• Bandwidth requirements for hard logic/interfaces
• Timing closure
• High interconnect utilization:
  • Huge CAD problem
  • Slow compilation
  • Power/area utilization
• Wire speed not scaling:
  • Delay is interconnect-dominated
• Abstraction favors modularity:
  • Parallel compilation
  • Partial reconfiguration
  • Multi-chip interconnect
With an embedded NoC:
• High-bandwidth endpoints known: pre-design NoC to requirements
• NoC links are “re-usable”
• NoC is heavily “pipelined”
• Latency-tolerant communication
• NoC abstraction favors modularity
1. Why NoCs on FPGAs? Compute Acceleration
Speedups (slide column labels: GPU, CPU):
• Maxeler
  • Geoscience (14x, 70x)
  • Financial analysis (5x, 163x)
• Altera OpenCL
  • Video compression (3x, 114x)
  • Information filtering (5.5x)
1. Why NoCs on FPGAs? Compute Acceleration NoC
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
  • Mixed NoCs
  • Hard NoCs
3. Comparison Against Buses
2. Embedded NoCs
• “Soft” NoC = Soft Routers + Soft Links
• “Mixed” NoC = Hard Routers + Soft Links
• “Hard” NoC = Hard Routers + Hard Links
Methodology
[Figure: evaluation flow for Soft, Mixed and Hard NoCs. Labels: FPGA CAD Tools; ASIC CAD Tools; Design Compiler; HSPICE; Area; Speed; Power; Gate-level simulation; Toggle rates]
2. Embedded NoCs Mixed NoCs
“Mixed” NoC = Hard Routers + Soft Links
[Figure: FPGA with logic blocks, programmable “soft” interconnect, and a baseline router]
2. Embedded NoCs Mixed NoCs
Special Feature: Configurable topology
• Assumed a mesh
• Can form any topology
2. Embedded NoCs Hard NoCs
“Hard” NoC = Hard Routers + Hard Links
[Figure: FPGA with logic blocks, programmable “soft” interconnect, dedicated “hard” interconnect, and a router]
2. Embedded NoCs Hard NoCs
“Hard” NoC = Hard Routers + Hard Links
Special Feature: Low-V mode (1.1 V to 0.9 V)
• Save 33% Dynamic Power
• ~15% slower
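The slide does not say how the 33% figure was obtained. As a rough cross-check, the sketch below applies the textbook CMOS relation that dynamic power scales with V² · f, assuming unchanged capacitance and switching activity. The function name and the `freq_ratio` parameter are illustrative.

```python
def dynamic_power_ratio(v_low: float, v_nominal: float, freq_ratio: float = 1.0) -> float:
    """Dynamic power at v_low relative to v_nominal, using P ~ C * V^2 * f.

    freq_ratio is f_low / f_nominal; 1.0 means the clock is left unchanged.
    """
    return (v_low / v_nominal) ** 2 * freq_ratio


# Same clock, lower supply: voltage scaling alone.
saving_same_clock = 1 - dynamic_power_ratio(0.9, 1.1)
print(f"saving at the same clock: {saving_same_clock:.1%}")

# If the clock were also reduced by 15% (an assumption about how the mode is run).
saving_slower_clock = 1 - dynamic_power_ratio(0.9, 1.1, freq_ratio=0.85)
print(f"saving with a 15% slower clock: {saving_slower_clock:.1%}")
```

Voltage scaling alone gives roughly a one-third reduction, in line with the figure on the slide.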
3. Area/Power Analysis Soft, Mixed and Hard [65 nm]
64-node NoC on Stratix III

                 Hard        Mixed       Soft
Area             448 LBs     576 LBs     ~12,500 LBs
Share of FPGA    ~1.5% (Hard and Mixed)  33%
Speed            730 – 940 MHz (Hard and Mixed)  166 MHz
Bisection BW     ~50 GB/s (Hard and Mixed)       ~10 GB/s

Callouts (on the build that labels the first column “Hard (Low-V)”):
• Provides ~50 GB/s peak bisection bandwidth
• Very cheap! Less than the cost of 3 soft nodes
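Two of the numbers on this slide can be sanity-checked with simple arithmetic. The sketch below takes the area figures directly from the slide. The bisection estimate additionally assumes an 8x8 mesh, 32-bit links, and that links in both directions across the cut are counted; none of these three assumptions is stated on the slide.

```python
NODES = 64
AREA_LBS = {"hard": 448, "mixed": 576, "soft": 12_500}  # from the slide


def soft_cost_of_nodes(n: int) -> float:
    """Area in LBs of n soft NoC nodes, taking the soft NoC as 64 equal nodes."""
    return AREA_LBS["soft"] / NODES * n


def bisection_bw_gbytes(mesh_side: int, link_bits: int, mhz: float) -> float:
    """Peak bisection bandwidth of a square mesh in GB/s.

    Cutting a mesh_side x mesh_side mesh in half severs mesh_side links
    per direction, so 2 * mesh_side unidirectional links (assumption).
    """
    links = 2 * mesh_side
    return links * (link_bits / 8) * mhz / 1000


print(f"3 soft nodes: {soft_cost_of_nodes(3):.0f} LBs vs 448 (hard) and 576 (mixed)")
for label, mhz in [("soft, 166 MHz", 166), ("embedded, 730 MHz", 730), ("embedded, 940 MHz", 940)]:
    print(f"{label}: {bisection_bw_gbytes(8, 32, mhz):.1f} GB/s")
```

Under these assumptions the estimates land near the ~10 GB/s and ~50 GB/s figures on the slide, and both embedded NoCs come in under the area of three soft nodes.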
3. Area/Power Analysis NoC Power Budget
250 GB/s total bandwidth
How much is used for system-level communication?
Typical FPGA Dynamic Power: 17.4 W (largest Stratix-III device)
NoC share (chart labels, added build by build): 15%, 11%, 7%
Other chart label: 123%
3. Area/Power Analysis Bandwidth in Perspective
Endpoints: DDR3, PCIe, Module 1, Module 2
Flows (figure labels): 4 × 17 GB/s and 4 × 14.6 GB/s
• Full theoretical BW
• Cross whole chip!
Aggregate Bandwidth: 126 GB/s
NoC Power Budget: 3.5%
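The aggregate figure follows from adding the eight flow labels on the slide. The second calculation is only a plausibility check: it assumes NoC power scales linearly with bandwidth and that the 7% label from the 250 GB/s power-budget slide applies to the same NoC variant, neither of which the slides state.

```python
def aggregate_bandwidth(flows_gbytes):
    """Sum of the individual flow bandwidths in GB/s."""
    return sum(flows_gbytes)


def scaled_power_share(share_at_ref: float, ref_gbytes: float, actual_gbytes: float) -> float:
    """Power share at actual_gbytes, assuming linear scaling with bandwidth."""
    return share_at_ref * actual_gbytes / ref_gbytes


flows = [17.0] * 4 + [14.6] * 4  # flow labels from the slide
total = aggregate_bandwidth(flows)
print(f"aggregate: {total:.1f} GB/s")

# Assumption: 7% at 250 GB/s is the relevant reference point.
print(f"scaled power share: {scaled_power_share(7.0, 250.0, total):.1f}%")
```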
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Comparison Against Buses
  • Area/Power Efficiency
  • Design Effort
4. Comparison DDR3: Qsys Bus vs. NoC
• Embedded NoC: 16 nodes, hard routers & links
• Qsys bus: build logical bus from fabric
Reference: “The Case for Embedded Networks-on-Chip on FPGAs”, to appear in IEEE Micro Magazine (February)
4. Comparison Design Effort
• Steps to close timing using Qsys
[Figure: FPGA, builds labelled “close” and “far”]
• Timing closure can be simplified with an embedded NoC
4. Comparison Area Comparison
• Entire NoC smaller than bus for 3 modules!
• Even with only 1/8 of the hard NoC's bandwidth used, the NoC already takes less area for most systems
4. Comparison Power Comparison Hard NoC saves power for even the simplest systems
1. Why NoCs on FPGAs?
  • Big city needs freeways to handle traffic
2. Embedded NoCs: Mixed & Hard
  • Power: 9-15X, Area: 20-23X, Speed: 5-6X
  • Area Budget for 64 nodes: ~1%
  • Power Budget for 100 GB/s: 3-7%
3. Comparison Against P2P/Buses
  • Raw efficiency close to simplest P2P links
  • NoC more efficient & lower design effort
Thank You! www.eecg.utoronto.ca/~mohamed