
Networks-on-Chip


Presentation Transcript


  1. Networks-on-Chip ECE 111

  2. Many-Core Processor Roadmap
     • Number of cores is quadrupling every 3 years
                               ‘02   ‘05   ‘08   ‘11   ‘14
       Research (cores)         16    64   256  1024  4096
       Industry (cores)          4    16    64   256  1024
     • Examples: Intel’s 80-core, Tilera’s 100-core, Cisco’s 188-core chips
     Source: Agarwal, MIT

  3. SoC & Many-Core Convergence • Application-Specific Systems-on-Chips (SoCs) are evolving to look like many-core processors with custom hardware cores Source: Arvind, MIT General-purposeprocessors Application-specificprocessing cores On-chipmemory cores Structured on-chip interconnection network

  4. The Need for On-Chip Networks
     • Scalable communication
     • Efficient use of wires
     • Modular design
     • A new way to organize and build VLSI systems
     (Figure: tiled array of compute or memory cores, each attached to a router)

  5. Power and Performance Both Critical
     • For most applications, low energy consumption and high performance are both first-order design goals!
     • e.g. 28% of total power in Intel’s 80-core Teraflops chip is due to the interconnection network (routers + links)
     • Network latency plays a central role in performance
     (Figure: tiled array of compute or memory cores, each attached to a router)

  6. Flits
     • Packets are decomposed into “flits”
       • Basic data units
       • Flit size is usually the same as the “bus width”, e.g. 128 bits
       • Head flit carries destination address information
     • Flow control
       • For on-chip networks, data loss is usually not acceptable
       • Credit-based flow control ensures the next-hop router has buffer space before a flit is forwarded
       • Flits proceed forward like a “train” or “worm” through a path of routers
       • No store-and-forward needed, which leads to much lower latencies
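     A minimal Python sketch of the credit-based flow control idea (class and
     method names are illustrative, not from the slides): a flit may be
     forwarded only while the sender holds credits, i.e. guaranteed free
     buffer slots at the next-hop router.

       class CreditLink:
           def __init__(self, downstream_buffer_slots):
               # one credit per free flit slot in the next-hop router's input buffer
               self.credits = downstream_buffer_slots

           def can_send(self):
               return self.credits > 0

           def send_flit(self, flit):
               assert self.can_send(), "no credits: downstream buffer is full"
               self.credits -= 1      # consume a credit when a flit leaves
               return flit            # flit travels over the link

           def credit_return(self):
               self.credits += 1      # downstream router freed a buffer slot

       link = CreditLink(downstream_buffer_slots=4)
       if link.can_send():
           link.send_flit("head flit")   # forwarded only when buffer space is guaranteed
       link.credit_return()              # arrives once the flit drains downstream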

  7. Using Virtual Channels to Avoid Deadlocks
     • Coupling between channels and buffers causes head-of-line blocking as well as deadlocks
       • Adds false dependencies between packets, possibly leading to deadlocks
       • Limits channel utilization
       • Increases latency
     • Solution: implement virtual channels (VCs)
     Source: Becker et al., Stanford
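     A rough sketch of why VCs help, assuming a hypothetical InputPort with one
     FIFO per virtual channel: a packet stalled in one VC no longer blocks
     flits waiting in the other VCs of the same physical channel.

       from collections import deque

       class InputPort:
           def __init__(self, num_vcs=4):
               self.vcs = [deque() for _ in range(num_vcs)]   # one flit FIFO per VC

           def accept(self, vc_id, flit):
               self.vcs[vc_id].append(flit)    # all flits of a packet stay in one VC

           def sendable(self, blocked_vcs):
               # any non-empty VC whose downstream path is not blocked may compete for the switch
               return [vc for vc, q in enumerate(self.vcs) if q and vc not in blocked_vcs]

       port = InputPort()
       port.accept(0, "packet A flit")            # packet A occupies VC 0
       port.accept(1, "packet B flit")            # packet B occupies VC 1
       print(port.sendable(blocked_vcs={0}))      # -> [1]: B proceeds although A is blocked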

  8. VC Router Pipeline
     • Route Computation (RC), performed per packet
       • Determine candidate output port(s) and VC(s)
       • Can be precomputed at the upstream router (lookahead routing)
     • Virtual Channel Allocation (VA), performed per packet
       • Assign available output VCs to waiting packets at input VCs
     • Switch Allocation (SA), performed per flit
       • Assign switch time slots to buffered flits
     • Switch Traversal (ST), performed per flit
       • Send flits through the crossbar switch to the appropriate output
     Source: Becker et al., Stanford
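     An illustrative sketch of the per-packet vs. per-flit split (stage names
     follow the slide; the flit dictionary is a made-up representation): only
     the head flit needs RC and VA, while every flit needs SA and ST.

       from enum import Enum, auto

       class Stage(Enum):
           RC = auto()   # route computation   (once per packet, on the head flit)
           VA = auto()   # VC allocation       (once per packet, on the head flit)
           SA = auto()   # switch allocation   (every flit)
           ST = auto()   # switch traversal    (every flit)

       def stages_for(flit):
           """Return the pipeline stages this flit must pass through."""
           if flit["is_head"]:
               return [Stage.RC, Stage.VA, Stage.SA, Stage.ST]
           return [Stage.SA, Stage.ST]   # body/tail flits reuse the packet's route and VC

       print(stages_for({"is_head": True}))
       print(stages_for({"is_head": False}))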

  9. Allocation Basics
     • Arbitration: multiple requestors, a single resource; request + grant vectors
     • Allocation: multiple requestors, multiple equivalent resources; request + grant matrices
     • Matching:
       • Each grant must satisfy a request
       • Each requestor gets at most one grant
       • Each resource is granted at most once
     Source: Becker et al., Stanford
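     A minimal round-robin arbiter sketch for the single-resource case (the
     class and its fields are hypothetical): the priority pointer rotates past
     the most recent winner, so every requestor is eventually served.

       class RoundRobinArbiter:
           def __init__(self, num_requestors):
               self.n = num_requestors
               self.last = self.n - 1                 # index granted most recently

           def arbitrate(self, requests):
               """requests: list of bools; returns a one-hot grant list."""
               grant = [False] * self.n
               for offset in range(1, self.n + 1):
                   i = (self.last + offset) % self.n  # search starting just after the last winner
                   if requests[i]:
                       grant[i] = True
                       self.last = i
                       break
               return grant

       arb = RoundRobinArbiter(4)
       print(arb.arbitrate([True, False, True, False]))   # grants requestor 0
       print(arb.arbitrate([True, False, True, False]))   # then requestor 2 (round-robin fairness)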

  10. Separable Allocators
      • Matchings have at most one grant per row and per column
      • Implemented via two phases of arbitration
        • Column-wise and row-wise, performed in either order (input-first or output-first)
        • Arbiters in each stage are fully independent
        • Fast and cheap
      • But bad choices in the first phase can prevent the second stage from generating a good matching!
      Source: Becker et al., Stanford
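      A toy input-first separable allocator, assuming a simple fixed-priority
      arbiter in both phases (a real allocator would use fairer arbiters). The
      example at the end shows how a bad phase-1 choice throws away a grant
      the matching could have had.

        def arbitrate(requests):
            """Fixed-priority arbiter: grant the lowest-indexed requestor, if any."""
            for i, r in enumerate(requests):
                if r:
                    return i
            return None

        def separable_input_first(R):
            """R[i][j] is True if input i requests output j; returns (input, output) grants."""
            n_in, n_out = len(R), len(R[0])
            # phase 1 (input arbitration): each input picks one of its requested outputs
            picked = [arbitrate(R[i]) for i in range(n_in)]
            # phase 2 (output arbitration): each output picks one input that chose it
            grants = []
            for j in range(n_out):
                winner = arbitrate([picked[i] == j for i in range(n_in)])
                if winner is not None:
                    grants.append((winner, j))
            return grants

        # inputs 0 and 1 both ask for output 0; input 1 could also use output 1,
        # but its greedy phase-1 choice makes it lose everything: only one grant results
        print(separable_input_first([[True, False], [True, True]]))   # -> [(0, 0)]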

  11. Oblivious Routing
      • Route packets without knowledge of the state (e.g. congestion) of the network
      • Objectives
        • Maximize worst-case and average-case throughput
        • Minimize latency (hop count)

  12. Routing Algorithm Affects Channel Loads Source: Seo et al., Purdue

  13. DOR
      • DOR = Dimension-Ordered Routing (X-Y routing)
      • Minimal latency (hop count), but no path diversity; hence poor worst-case throughput (average-case throughput is good)
      (Figure: the single X-then-Y path from Source to Destination on the mesh)
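      A minimal sketch of the X-Y routing decision made at each router (port
      names such as EAST/NORTH are illustrative): traverse the X dimension
      completely before turning into Y, so there is exactly one path.

        def dor_xy_next_port(cur, dst):
            (cx, cy), (dx, dy) = cur, dst
            if cx != dx:                        # finish the X dimension first
                return "EAST" if dx > cx else "WEST"
            if cy != dy:                        # then the Y dimension
                return "NORTH" if dy > cy else "SOUTH"
            return "LOCAL"                      # arrived: eject to the local core

        print(dor_xy_next_port((0, 0), (2, 3)))   # -> EAST
        print(dor_xy_next_port((2, 0), (2, 3)))   # -> NORTH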

  14. VAL
      • VAL = Valiant’s routing algorithm: route to a random intermediate node, then to the destination
      • Average latency is twice that of DOR; optimal worst-case throughput, poor average-case throughput
      (Figure: two-leg path from Source through a random intermediate node to Destination)
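      A sketch of the VAL idea, assuming each leg is then routed minimally
      (e.g. with DOR): the intermediate node is drawn uniformly from the whole
      k x k mesh, which is what doubles the average hop count.

        import random

        def val_route(src, dst, k):
            mid = (random.randrange(k), random.randrange(k))   # uniformly random intermediate node
            return [("leg 1", src, mid), ("leg 2", mid, dst)]  # each leg routed minimally, e.g. with DOR

        print(val_route((0, 0), (3, 3), k=4))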

  15. ROMM
      • ROMM: route through a random intermediate node chosen within the source-to-destination bounding box
      • Minimal latency, good average-case throughput, but poor worst-case throughput
      (Figure: two-leg path from Source through an intermediate node inside the bounding box to Destination)
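      A sketch of the ROMM idea under the same assumptions: the random
      intermediate node is restricted to the source-destination bounding box,
      so both legs together still form a minimal path.

        import random

        def romm_route(src, dst):
            (sx, sy), (dx, dy) = src, dst
            mid = (random.randint(min(sx, dx), max(sx, dx)),   # x chosen inside the bounding box
                   random.randint(min(sy, dy), max(sy, dy)))   # y chosen inside the bounding box
            return [("leg 1", src, mid), ("leg 2", mid, dst)]  # each leg routed minimally, e.g. with DOR

        print(romm_route((0, 0), (2, 3)))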

  16. O1TURN
      • O1TURN = Orthogonal 1-TURN routing: X-Y and Y-X routing with equal probability
      • Minimal routing, optimal worst-case throughput, good average-case throughput
      (Figure: the two minimal one-turn paths, X-Y and Y-X, from Source to Destination)
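      A sketch of O1TURN: the dimension order (X-Y or Y-X) is chosen once per
      packet with equal probability, and each hop then follows DOR in that
      order, so every path is minimal with at most one turn.

        import random

        def o1turn_next_port(cur, dst, order):
            """order is 'XY' or 'YX', fixed for the whole packet."""
            (cx, cy), (dx, dy) = cur, dst
            dims = [("X", cx, dx), ("Y", cy, dy)] if order == "XY" else [("Y", cy, dy), ("X", cx, dx)]
            for name, c, d in dims:
                if c != d:
                    if name == "X":
                        return "EAST" if d > c else "WEST"
                    return "NORTH" if d > c else "SOUTH"
            return "LOCAL"

        order = random.choice(["XY", "YX"])                # 50/50 split between the two orders
        print(order, o1turn_next_port((0, 0), (2, 3), order))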

  17. Worst-Case Throughput Trends Source: Seo et al., Purdue

  18. Average-Case Analysis Source: Seo et al., Purdue

  19. Comparison
                               DOR   VAL   ROMM   O1TURN
      Minimal hop count         X            X      X
      Worst-case throughput           X             X
      Average-case throughput   X            X      X
