OpenSMART is a generator for single-cycle, multi-hop NoCs that addresses challenges in scalability, flexibility, and design cost. It features a low-latency network with dynamic bypass of routers.
OpenSMART: An Opensource Single-cycle Multi-hop NoC Generator
Hyoukjun Kwon and Tushar Krishna
Georgia Institute of Technology
Synergy Lab (http://synergy.ece.gatech.edu)
OpenSMART (https://tinyurl.com/Get-OpenSMART)
hyoukjun@gatech.edu
Nov 12, 2017
Challenges for NoCs
• Scalability
  - Supporting many-IP heterogeneous systems
  - Lower latency
  - Lower area & energy
• Flexibility
  - Support diverse connectivity for custom heterogeneous systems
  - Support diverse latency/throughput requirements
• Design cost
  - Automating the design of high-performance, low-energy NoCs
  - Lowering design/verification costs of SoCs with NoCs
OpenSMART
SMART NoC
• Single-cycle Multi-hop Asynchronous Repeated Traversal
• SMART: achieve the performance of dedicated connections over a network of shared links
• [Figure: SSRs (SMART Setup Requests) sent from source S toward destination D; HPCmax hops in 1 cycle (no other traffic)]
• References: Krishna et al., HPCA 2013; Chen et al., DATE 2013; Krishna et al., IEEE Micro Top Picks 2014
Features of SMART
• Low-latency network
  - Dynamic bypass of intermediate routers between any two routers
  - Limit: HPCmax (hops per cycle max), the maximum number of hops that the underlying wire allows a flit to traverse within a clock cycle
• Separate control path
  - HPCmax bits from every router along each direction
  - Arbitration of multiple bypass requests on the same link
  - No ACK required
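The HPCmax limit above can be illustrated with a small Python sketch. It is a simplified model, not part of OpenSMART: it assumes a flit travelling along one dimension with no competing traffic, and the function name `smart_segments` is hypothetical. It only counts how many multi-hop traversals are needed when each traversal covers at most HPCmax hops.

```python
import math


def smart_segments(hops, hpc_max):
    """Number of multi-hop traversals needed to cover `hops` hops.

    Assumes no contention, so every traversal covers the full
    HPCmax hops except possibly the last one.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    if hpc_max < 1:
        raise ValueError("hpc_max must be at least 1")
    return math.ceil(hops / hpc_max)


if __name__ == "__main__":
    # 3 hops with HPCmax = 3 fit in a single traversal.
    print(smart_segments(3, 3))
    # 7 hops with HPCmax = 6 need a second traversal for the last hop.
    print(smart_segments(7, 6))
```

With contention, a flit can be stopped earlier than HPCmax hops, so this count is a lower bound under the stated assumptions.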
How to Get the Source Code
• Go to the Synergy Lab homepage (synergy.ece.gatech.edu)
• In the released tools tab, click OPENSMART
• You will be forwarded to the access request form page
• Fill in and submit the form; you will then get a link to the OpenSMART repository
• Use the link to access the repository
Source Tree (Under Backend/BSV)
• Frontend: Configuration Parser (under development)
• Backend/BSV: BSV implementation (Main files)
  - src: Building blocks
  - Network.bsv : Connectivity configuration (default: Mesh)
  - Types/Types.bsv : Topology (Number of routers), VC, Routing algorithm, SMART (HPCmax) configuration
  - lib: Fundamental BSV libraries (FIFOs and CReg)
  - testbenches: Include synthetic traffic-based simulation
• Backend/Chisel: Chisel implementation (Router only)
How to Specify a Topology
• In Backend/BSV/src/Network.bsv:
    ...
    // Interconnect all the data/credit West -> East links in a mesh network
    for(Integer i=0; i < meshHeight; i++) begin
      for(Integer j=0; j < meshWidth -1; j++) begin
        mkConnection(routers[i][j].dataLinks[East].getFlit, routers[i][j+1].dataLinks[West].putFlit);
        mkConnection(routers[i][j].controlLinks[East].putCredit, routers[i][j+1].controlLinks[West].getCredit);
      end
    end
• Connectivity can be changed by using "mkConnection" with different routers/links
• Automation of this process is under development
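To see which router pairs the nested loops above connect, the following Python sketch enumerates the same West -> East index pairs. It is only an illustration of the loop bounds, not OpenSMART code; the function name `west_east_links` and the `wrap` option (a hypothetical example of a different connectivity, adding row wrap-around links) are assumptions.

```python
def west_east_links(mesh_height, mesh_width, wrap=False):
    """List ((row, col), (row, col)) pairs for West -> East links.

    Mirrors the loop bounds of the mesh example: every router
    [i][j] is linked to its East neighbour [i][j+1].
    `wrap` is a hypothetical option that also links the last
    column back to the first one in each row.
    """
    links = []
    for i in range(mesh_height):
        for j in range(mesh_width - 1):
            links.append(((i, j), (i, j + 1)))
        if wrap and mesh_width > 1:
            links.append(((i, mesh_width - 1), (i, 0)))
    return links


if __name__ == "__main__":
    links = west_east_links(8, 8)
    print(len(links))   # number of West -> East links in an 8x8 mesh
    print(links[:3])    # first few router pairs
```

For an 8x8 mesh each row contributes 7 links, so the plain mesh has 56 West -> East links; each would correspond to one data connection and one credit connection in the BSV loop.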
How to Configure OpenSMART
• In Backend/BSV/Types/types.bsv:
    typedef 100000 Benchmark Cycle
    typedef 32 DataSz                    // Flit data size
    typedef 4 NumFlitsPerDataMessage     // Determines number of flits in a packet

    typedef 6 UserHPCMax                 // Determines HPCmax (SMART feature)
    typedef 8 MeshWidth                  // Mesh dimension (determines number of routers)
    typedef 8 MeshHeight

    typedef 4 NumUserVCs                 // Determines number of VCs

    currentRoutingAlgorithm = XY_;       // Determines routing algorithm
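As a quick sanity check on the values above, this Python sketch derives two quantities from them. It is not part of OpenSMART: the class `NocConfig` and its field names are hypothetical mirrors of the typedefs, and it assumes `DataSz` is measured in bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NocConfig:
    """Hypothetical mirror of the typedef values shown on the slide."""
    data_sz: int = 32                     # DataSz (assumed to be bits per flit)
    num_flits_per_data_message: int = 4   # NumFlitsPerDataMessage
    user_hpc_max: int = 6                 # UserHPCMax
    mesh_width: int = 8                   # MeshWidth
    mesh_height: int = 8                  # MeshHeight
    num_user_vcs: int = 4                 # NumUserVCs

    @property
    def num_routers(self):
        # The mesh dimensions determine the number of routers.
        return self.mesh_width * self.mesh_height

    @property
    def data_bits_per_message(self):
        # Flit data size times flits per packet.
        return self.data_sz * self.num_flits_per_data_message


if __name__ == "__main__":
    cfg = NocConfig()
    print(cfg.num_routers)            # 8 * 8 routers
    print(cfg.data_bits_per_message)  # 32 * 4 data bits per message
```

With the default values this gives 64 routers and 128 data bits per message.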
OpenSMART Building Blocks
• Input buffer + Input VC arbitration
• Output VC selection + Output port arbitration + Credit management
• Switching (via crossbar) + Routing calculation
• SSR communication & Arbitration + Bypass flag
OpenSMART Router
• [Figure labels: Arbiter; Flit Size; Number of VCs/VC Depth; Routing Algorithm]
OpenSMART Router (SMART)
• HPCmax
• SSR Prioritization: by distance -> an SSR from a nearer router gets the higher priority (Local (distance = 0) has the highest priority)
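The distance-based rule can be sketched in Python as a single-link arbiter. This is a behavioral illustration only, not the OpenSMART arbiter; the function name `arbitrate_ssrs` and the `(source, distance)` request format are assumptions.

```python
def arbitrate_ssrs(requests):
    """Pick the winning requester for one output link.

    `requests` is a list of (source_name, distance) pairs, where
    distance is the number of hops between the requesting router
    and this router (0 means the local router). The nearest
    requester wins; returns None if there are no requests.
    """
    if not requests:
        return None
    source, _distance = min(requests, key=lambda req: req[1])
    return source


if __name__ == "__main__":
    # A local request (distance 0) beats an SSR from one hop away.
    print(arbitrate_ssrs([("r4", 1), ("r5", 0)]))
    # Without a local request, the nearer remote router wins.
    print(arbitrate_ssrs([("r4", 2), ("r5", 1)]))
```

In this model, requests arriving from one direction along a single dimension always have distinct distances, so no tie-breaking rule is needed.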
Walk-through Example
• Router r4 sends a flit to router r7
• Router r5 sends a flit to router r7
• HPCmax = 3
• Cycle 0: SSR Send (SSR = SMART Setup Request)
• Cycle 1: Multi-hop Bypass
• [Figure labels: SSR values 110, 110, 110 and 100, 100; Winner; SMART Unit in r5]
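A possible reading of this example, sketched in Python: each flit requests every East link on its path within HPCmax hops, each link is granted to the nearest requester, and a flit stops at the first router where it loses. This is a simplified one-dimensional model under those assumptions, not the OpenSMART implementation; `link_winner` and `positions_after_one_traversal` are hypothetical names.

```python
def link_winner(link, flits, hpc_max):
    """Source of the flit that wins the East output link of router `link`.

    `flits` is a list of (src, dst) router indices with src < dst.
    A flit requests a link if the link is on its path and within
    HPCmax hops of its source; the nearest source wins.
    """
    candidates = [
        (link - src, src)
        for src, dst in flits
        if src <= link < dst and link - src < hpc_max
    ]
    return min(candidates)[1] if candidates else None


def positions_after_one_traversal(flits, hpc_max):
    """Map each source router to the router its flit reaches."""
    result = {}
    for src, dst in flits:
        pos = src
        # Advance while this flit holds the next link on its path.
        while (pos < dst and pos - src < hpc_max
               and link_winner(pos, flits, hpc_max) == src):
            pos += 1
        result[src] = pos
    return result


if __name__ == "__main__":
    # r4 -> r7 and r5 -> r7 with HPCmax = 3
    print(positions_after_one_traversal([(4, 7), (5, 7)], 3))
```

Under these assumptions the flit from r5 reaches r7, while the flit from r4 loses the link at r5 to the local flit and stops there, to continue in a later traversal.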
How to Run OpenSMART
• In Backend/BSV/
    > ./OpenSMART -c        Compile synthetic traffic-based simulation
    > ./OpenSMART -r        Run compiled simulation
    > ./OpenSMART -v        Generate Verilog code
    > ./OpenSMART -clean    Clean up build files
• Simulation compilation print-out messages
• Simulation print-out messages
  - Simulation ticks: printed every 10,000 cycles; they indicate whether the simulation is still alive
  - Send/receive counts for every router
  - Summary of the total statistics
• Generating Verilog files
  - Print-out messages are similar to those of simulation compilation
  - Verilog files are generated in ./Verilog
Thank you!
OpenSMART (https://tinyurl.com/Get-OpenSMART)
Outline
• Motivation and Background
• OpenSMART
  - Getting source code
  - Changing topology
  - Modifying other configurations
  - Building blocks
• Conclusions
Outline
• Motivation and Background
• OpenSMART
  - Design Flow
  - Building Blocks
  - Walk-through Examples
• OpenSMART: User guide
  - Source tree
  - Commands
• Conclusions
Walk-through Example 1
• Router r4 sends a flit to router r7
• HPCmax = 3
• Cycle 0: SSR Send (SSR = SMART Setup Request)
• Cycle 1: Multi-hop Bypass (bypass, bypass, stop)
• [Figure labels: SSR values 110, 110, 110]
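The "bypass, bypass, stop" pattern can be reproduced with a short Python sketch. It assumes a single flit with no competing traffic moving East along one dimension; the function name `hop_actions` is hypothetical and this is not OpenSMART code.

```python
def hop_actions(src, dst, hpc_max):
    """Per-router action for one multi-hop traversal from src toward dst.

    Returns (router, action) pairs for every router the flit reaches
    in this traversal: intermediate routers are bypassed, and the flit
    stops at the destination or after HPCmax hops, whichever is first.
    """
    if dst <= src:
        raise ValueError("this sketch assumes dst is East of src")
    if hpc_max < 1:
        raise ValueError("hpc_max must be at least 1")
    last = min(dst, src + hpc_max)
    return [
        (router, "stop" if router == last else "bypass")
        for router in range(src + 1, last + 1)
    ]


if __name__ == "__main__":
    # r4 -> r7 with HPCmax = 3
    print(hop_actions(4, 7, 3))
```

For r4 to r7 with HPCmax = 3 this yields bypass at r5, bypass at r6, and stop at r7, matching the pattern on the slide.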
Latency
• [Figure: latency under (a) Uniform Random and (b) Bit-complement traffic; annotations: 5X, 4X]
Energy Consumption
• Repeaters require less energy than clocked latches
HPCmax
• [Figure: (a) HPCmax on ASIC, (b) HPCmax on FPGA]
Outline
• Motivation: Scalable, Flexible, and Low-cost NoCs
• Background: SMART NoCs
• OpenSMART
  - Design Flow
  - Building Blocks
  - Walk-through Examples
• Case Studies
  - Mesh vs. SMART
  - High-radix vs. Low-radix
• Conclusions
Router Area
Router Power
• [Figure: router power vs. number of ports; (a) ASIC, (b) FPGA]
Conclusion
• NoCs are crucial components for supporting many-IP heterogeneous systems
  - They provide connectivity while satisfying the systems' diverse requirements
• OpenSMART provides automatic generation of NoCs for many-IP heterogeneous systems
  - Supports the recent low-latency SMART NoC as well as highly optimized 1-cycle routers
  - Written in high-level HDLs
Announcement
• OpenSMART contributes to the open-source hardware ecosystem!
• Source code will be available in May 2017
• Please sign up via our webpage to request the source code: http://synergy.ece.gatech.edu/tools/opensmart/
Thank you!
Is 1-cycle Network Possible? Yes
• Is wire fast enough to support a 1-cycle network?
  - Wire traversal length within 1 ns (1 GHz): 10-16 mm
  - Wire delay over technology: constant
  - Chip dimension: remains similar (~20 mm)
  - Clock frequency: remains similar (1~3 GHz)
  - Tile dimension: decreases over technology
• On-chip wires are fast enough to transmit across the chip (~20 mm) within 1-2 cycles at 1 GHz, even if technology scales
Hardware Development Cost
• Low cost challenge
• (source: Todd Austin, Micro-49 keynote)
Many-IP Heterogeneous System
• Network-on-Chip (NoC)
• Scalability challenge
• Flexibility challenge
Diverse System Requirements
• Throughput Critical
• Latency Critical
• (source: MNIST, Engadget, TheStack)
OpenSMART Router (1cycle)
• [Figure labels: Cycle 0, Cycle 1]