
On-Chip Communication: Networks on Chip (NoCs)



  1. On-Chip Communication: Networks on Chip (NoCs) Sudeep Pasricha Colorado State University CS/ECE 561 Fall 2011

  2. Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems

  3. Introduction • Evolution of on-chip communication architectures

  4. Introduction • Network-on-chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology • NoCs use packets to route data from the source to the destination PE via a network fabric that consists of • network interfaces (NI) • switches (routers) • interconnection links (wires) [Figure: a PE (registers, ALU, MEM) attached to the network through an NI]

  5. Introduction • NoCs are an attempt to scale down the concepts of large-scale networks and apply them to the system-on-chip (SoC) domain • NoC Properties • Reliable and predictable electrical and physical properties • Regular geometry that is scalable • Flexible QoS guarantees • Higher bandwidth • Reusable components • Buffers, arbiters, routers, protocol stack

  6. Introduction • ISO/OSI network protocol stack model

  7. Building Blocks: NI • Session-layer (P2P) interface with nodes • Back end manages the interface with switches [Figure: NI split into a front end speaking a standard P2P node protocol and a back end speaking a proprietary link protocol to the switches, with decoupling logic & synchronization between them] • NoC-specific back end (layers 1-4) • Physical channel interface • Link-level protocol • Network layer (packetization) • Transport layer (routing) • Standardized node interface @ session layer; initiator vs. target distinction is blurred • Supported transactions (e.g. QoS read…) • Degree of parallelism • Session protocol control flow & negotiation

  8. Building Blocks: Switch • Router or switch: receives and forwards packets • Buffers have a dual function: synchronization & queueing [Figure: input buffers with flow control, crossbar, allocator/arbiter with QoS & routing logic, output buffers with flow control, and data ports with flow control wires]
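
Of these blocks, the arbiter is the easiest to illustrate in code. Below is a minimal sketch of a round-robin arbiter (one common policy; the slide does not commit to any particular one). The class and method names are illustrative.

```python
# Minimal sketch of a round-robin arbiter for a switch output port.
# Illustrative only: names and structure are not from the slides.

class RoundRobinArbiter:
    """Grants one of several requesting input ports access to an output."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.last_grant = num_ports - 1   # so port 0 has priority initially

    def arbitrate(self, requests):
        """requests: one bool per input port. Returns the granted port, or None."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last_grant + offset) % self.num_ports
            if requests[port]:
                self.last_grant = port    # rotate priority away from the winner
                return port
        return None

arb = RoundRobinArbiter(4)
print(arb.arbitrate([True, False, True, False]))  # -> 0
print(arb.arbitrate([True, False, True, False]))  # -> 2 (priority rotated)
```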

  9. On-Chip vs. Off-Chip Networks • Cost • Off-chip: cost is channels, huge pads, expensive connectors, cables, optics • On-chip: cost is Si area and power (storage!); wires are not infinite, but plentiful • Channel characteristics • On-chip: wires are short → latency is comparable with logic, huge amount of bandwidth, can put logic in links; a lot of uncertainty (process variations) and interference (e.g., noise) • Off-chip: wires are long → link latency dominates, bandwidth is precious, links are strongly decoupled • Workload • On-chip: non-homogeneous traffic; much is known • Off-chip: very little is known, and it may change • Design issues • On-chip: must fit in the floorplan (die area constraint) • Off-chip: dictates board/rack organization

  10. NoC Concepts • Topology • How the nodes are connected together • Switching • Allocation of network resources (bandwidth, buffer capacity, …) to information flows • Routing • Path selection between a source and a destination node in a particular topology • Flow control • How the downstream node communicates forwarding availability to the upstream node

  11. Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems

  12. NoC Topology • 1. Direct Topologies • each node has a direct point-to-point link to a subset of other nodes in the system, called neighboring nodes • as the number of nodes in the system increases, the total available communication bandwidth also increases • the fundamental trade-off is between connectivity and cost

  13. NoC Topology • Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space • e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon • 2D mesh is most popular topology • all links have the same length • eases physical design • area grows linearly with the number of nodes • must be designed in such a way as to avoid traffic accumulating in the center of the mesh
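
As a quick illustration of mesh connectivity (a sketch, not from the slides): each node links to its orthogonal neighbors, and edge and corner nodes simply have fewer links, which is why total area and link count grow linearly with the node count.

```python
# Illustrative sketch: neighbors of node (x, y) in an n x n 2D mesh.
# The function name is hypothetical.

def mesh_neighbors(x, y, n):
    """Return the coordinates of the nodes directly linked to (x, y)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for (i, j) in candidates if 0 <= i < n and 0 <= j < n]

print(mesh_neighbors(0, 0, 3))  # corner node, 2 links: [(1, 0), (0, 1)]
print(mesh_neighbors(1, 1, 3))  # interior node, 4 links
```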

  14. NoC Topology • Torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension • k-ary 1-cube (1-D torus) is essentially a ring network with k nodes • limited scalability, as performance decreases when more nodes are added • k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular 2D mesh • except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels • long end-around connections can, however, lead to excessive delays
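
The wrap-around channels are a one-line change to the mesh sketch above (again illustrative): coordinates are taken modulo k, so for k ≥ 3 every node, including the edge nodes, has exactly four neighbors.

```python
# Sketch of k-ary 2-cube (2D torus) connectivity: edges wrap around.
# Assumes k >= 3 so the four neighbors are distinct.

def torus_neighbors(x, y, k):
    """Neighbors of node (x, y) in a k x k torus."""
    return [((x - 1) % k, y), ((x + 1) % k, y),
            (x, (y - 1) % k), (x, (y + 1) % k)]

print(torus_neighbors(0, 0, 3))  # [(2, 0), (1, 0), (0, 2), (0, 1)]
```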

  15. NoC Topology • Folded torus topology overcomes the long-link limitation of a 2-D torus • all links have the same length • Meshes and tori can be extended by adding bypass links to increase performance, at the cost of higher area

  16. NoC Topology • Octagon topology is another example of a direct network • messages sent between any 2 nodes require at most two hops • more octagons can be tiled together to accommodate larger designs • by using one of the nodes as a bridge node
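
The at-most-two-hops property is easy to check with a breadth-first search, assuming the usual octagon connectivity in which each of the 8 nodes links to its two ring neighbors and to the node directly across:

```python
# Verify (illustratively) that the octagon's diameter is 2.
from collections import deque

def octagon_links(i):
    """Node i links to its ring neighbors and to the node across the ring."""
    return {(i - 1) % 8, (i + 1) % 8, (i + 4) % 8}

def hops(src, dst):
    """Shortest hop count from src to dst, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in octagon_links(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist[dst]

print(max(hops(s, d) for s in range(8) for d in range(8)))  # -> 2
```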

  17. NoC Topology • 2. Indirect Topologies • each node is connected to an external switch, and switches have point-to-point links to other switches • switches do not perform any information processing, and correspondingly nodes do not perform any packet switching • e.g. SPIN, crossbar topologies • Fat tree topology • nodes are connected only to the leaves of the tree • more links near root, where bandwidth requirements are higher

  18. NoC Topology • k-ary n-fly butterfly topology • blocking multi-stage network: packets may be temporarily blocked or dropped in the network if contention occurs • k^n nodes, and n stages of k^(n-1) k x k crossbars • e.g. 2-ary 3-fly butterfly network
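
A quick arithmetic check of that sizing for the 2-ary 3-fly example:

```python
# Sizing of a k-ary n-fly butterfly: k**n nodes, n stages,
# each stage built from k**(n-1) k x k crossbars.

def butterfly_size(k, n):
    return {"nodes": k**n, "stages": n, "crossbars_per_stage": k**(n - 1)}

print(butterfly_size(2, 3))
# {'nodes': 8, 'stages': 3, 'crossbars_per_stage': 4}  (4 2x2 crossbars/stage)
```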

  19. NoC Topology • 3. Irregular or ad hoc network topologies • customized for an application • usually a mix of shared bus, direct, and indirect network topologies • e.g. reduced mesh, cluster-based hybrid topology

  20. Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems

  21. Messaging Units • Data is transmitted based on a hierarchical data structuring mechanism • Messages → packets → flits → phits • flits and phits are fixed size; packets and data may be variable sized • a phit is the unit of data transferred on a link in a single cycle • typically, phit size = flit size • Switching: determines "when" messaging units are moved [Figure: a message is split into packets; packets into flits (flow control digits), from head flit to tail flit; flits into phits (physical flow control digits); the head flit carries type, dest info, seq #, and misc fields]
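
A minimal sketch of this hierarchy in code (the field names follow the head-flit fields in the figure; the function itself is hypothetical):

```python
# Split a message into fixed-size flits: a head flit carrying routing
# fields, body flits carrying data, and a tail flit closing the packet.

def packetize(message: bytes, flit_size: int, dest: int, seq: int):
    head = {"type": "head", "dest": dest, "seq": seq}
    body = [{"type": "body", "data": message[i:i + flit_size]}
            for i in range(0, len(message), flit_size)]
    if body:
        body[-1]["type"] = "tail"   # last flit doubles as the tail flit
    return [head] + body

for flit in packetize(b"hello, NoC!", flit_size=4, dest=7, seq=0):
    print(flit)
```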

  22. Circuit Switching • Hardware path set up by a routing header or probe • End-to-end acknowledgment initiates transfer at full hardware bandwidth • System is limited by the signaling rate along the circuits • Routing, arbitration, and switching overhead experienced once per message [Timing diagram: header probe, acknowledgment, then data; each link is busy for t_setup (per-hop t_r and t_s) followed by t_data]
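
One simple latency model using the notation from the timing diagram (the exact formula is an assumption for illustration, not from the slides): the probe pays per-hop routing and switching delay on the way out, the acknowledgment returns over the same hops, and the data then streams with no further per-hop overhead.

```python
# Assumed circuit-switching latency model: setup (probe out + ack back)
# followed by a pipelined data transfer on the reserved circuit.

def circuit_switching_latency(D, t_r, t_s, L, b):
    """D hops, per-hop routing delay t_r and switch delay t_s,
    message of L bits on links carrying b bits/cycle."""
    t_setup = D * (t_r + t_s) + D * t_s  # probe out, then ack back
    t_data = L / b                       # full-bandwidth streaming
    return t_setup + t_data

print(circuit_switching_latency(D=4, t_r=3, t_s=1, L=512, b=32))  # 36.0
```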

  23. Circuit Switching [Figure: source end node connected to a destination end node through switches; the switch buffers hold "request" tokens]

  24. Circuit Switching • Request for circuit establishment (routing and arbitration are performed during this step) [Figure: the request token propagates through the switch buffers from source to destination]

  25. Circuit Switching • Acknowledgment and circuit establishment (as the token travels back to the source, connections are established) [Figure: the "ack" token returns hop by hop, reserving the circuit]

  26. Circuit Switching • Packet transport (neither routing nor arbitration is required) [Figure: data streams from source to destination over the established circuit]

  27. Circuit Switching • High contention, low utilization → low throughput [Figure: a competing request for circuit establishment is blocked (X) while the established circuit holds its links]

  28. Virtual Circuit Switching • Goal: reduce the cost associated with circuit switching • Multiple virtual circuits (channels) multiplexed on a single physical link • Allocate one buffer per virtual link • can be expensive due to the large number of shared buffers • Allocate one buffer per physical link • uses time-division multiplexing (TDM) to statically schedule usage • less expensive routers [Figure: each flit of a packet carries a type field and a VC identifier]

  29. Virtual Circuit Switching Example • Use slots to • avoid contention • divide up bandwidth [Figure: network interfaces and routers 1-3 connected by numbered ports (e.g., input 2 of router 1 is output 1 of router 2); each router holds a slot table recording, per time slot, the input routed to each output]
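
A minimal sketch of such a slot table (the entries are invented for illustration): the router statically maps (time slot, input port) to an output port, so two flows can share a physical link without ever contending.

```python
# TDM slot table sketch: a static map from (slot, input) to output.
# Entries are made up; a real table is derived from the flows' schedules.

SLOTS = 4
slot_table = {
    (0, "i1"): "o2",
    (1, "i3"): "o2",   # i1 and i3 share output o2 in different slots
    (2, "i1"): "o4",
}

def route(cycle, input_port):
    """Output reserved for this input in this cycle's slot, if any."""
    return slot_table.get((cycle % SLOTS, input_port))

for cycle in range(4):
    print(cycle, route(cycle, "i1"), route(cycle, "i3"))
```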

  30. Packet Switching • Packets are transmitted from the source and make their way independently to the receiver • possibly along different routes and with different delays • Zero start-up time, followed by a variable delay due to contention in routers along the packet's path • QoS guarantees are harder to make • Three main packet switching scheme variants: • Store and Forward (SAF) switching • Virtual Cut-Through (VCT) switching • Wormhole (WH) switching

  31. Packet Switching (Store and Forward) • Routing, arbitration, and switching overheads experienced for each packet • Increased storage requirements at the nodes • Packetization and in-order delivery requirements • Alternative buffering schemes • Use of local processor memory • Central (to the switch) queues [Timing diagram: message header and message data occupy each link in turn; every hop pays t_r plus the full t_packet]

  32. Packet Switching (Store and Forward) • Packets are completely stored before any portion is forwarded ("store") [Figure: a packet accumulating in a switch's data-packet buffers between source and destination end nodes]

  33. Packet Switching (Store and Forward) • Packets are completely stored before any portion is forwarded ("forward") • Requirement: buffers must be sized to hold an entire packet (MTU) [Figure: the stored packet advances to the next switch]

  34. Packet Switching (Virtual Cut-Through) • Messages cut through to the next router when feasible • In the absence of blocking, messages are pipelined • Pipeline cycle time is the larger of the intra-router and inter-router flow control delays • When the header is blocked, the complete message is buffered at a switch • High-load behavior approaches that of SAF [Timing diagram: the packet header cuts through the router; per-hop delays t_r, t_s, t_w, plus t_blocking when blocked]

  35. Packet Switching (Virtual Cut-Through) • Portions of a packet may be forwarded ("cut through") to the next switch before the entire packet is stored at the current switch [Figure: routing proceeds at each switch while the rest of the packet is still arriving]

  36. Packet Switching (Wormhole) • Messages are pipelined, but buffer space is on the order of a few flits • Small buffers + message pipelining → small, compact switches/routers • Supports variable-sized messages • Messages cannot be interleaved over a channel: routing information is only associated with the header • Base latency is equivalent to that of virtual cut-through [Timing diagram: a header flit followed by single flits crosses each link; per-hop delays t_r, t_s, and t_wormhole]

  37. Virtual Cut-Through vs. Wormhole • Virtual cut-through: buffers hold whole data packets; requirement: buffers must be sized to hold an entire packet (MTU) • Wormhole: buffers hold flits; packets can be larger than the buffers [Figure: the two schemes side by side between source and destination end nodes]

  38. Virtual Cut-Through vs. Wormhole • Virtual cut-through: on blocking, the packet is completely stored at one switch • Wormhole: on blocking, the packet is stored along the path, keeping links busy [Figure: busy links in each scheme between source and destination end nodes]

  39. Comparison of Packet Switching Techniques • SAF packet switching and virtual cut-through • consume network bandwidth proportional to network load • VCT behaves like wormhole at low loads and like SAF packet switching at high loads • high buffer costs • Wormhole switching • provides low (unloaded) latency • lower saturation point • higher variance of message latency than SAF packet or VCT switching
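
A back-of-the-envelope comparison of unloaded latency, under a simple assumed model (D hops, per-hop router delay t_r, packet of L bits on links carrying b bits per cycle): SAF pays the serialization delay L/b at every hop, while wormhole, and VCT when not blocked, pays it only once.

```python
# Assumed unloaded-latency models for SAF vs. wormhole/VCT.

def saf_latency(D, t_r, L, b):
    return D * (t_r + L / b)      # whole packet stored at each of D hops

def pipelined_latency(D, t_r, L, b):
    return D * t_r + L / b        # header per hop + one serialization delay

D, t_r, L, b = 4, 2, 512, 32      # 4 hops, 512-bit packet, 32-bit phits
print(saf_latency(D, t_r, L, b))        # 72.0 cycles
print(pipelined_latency(D, t_r, L, b))  # 24.0 cycles
```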

  40. Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems

  41. Generic Router Architecture [Figure: generic router block diagram]

  42. Pipelined Router Microarchitecture • Five pipeline stages: • Stage 1: LT & IB (input buffering) • Stage 2: RC (routing computation) • Stage 3: VCA (VC allocation) • Stage 4: SA (switch allocation) • Stage 5: ST & output buffering [Figure: physical channels feed input buffers through link control and DEMUXes; the crossbar, governed by routing computation, VC allocation, and switch allocation, drives MUXes into output buffers and outgoing physical channels]

  43. Virtual Circuit Switching Router • By using prediction/speculation, the pipeline can be made more compact • RC: route computation; VCA: VC allocation; SA: switch allocation; ST: switch traversal; LT: link traversal [Figure: successive routers, each running the LT, RC, VCA, SA, ST pipeline, connected by links]

  44. Router Power Dissipation • Results are for a 6x6 mesh • 4-stage pipeline, > 3 GHz • Wormhole switching • Source: Partha Kundu, "On-Die Interconnects for Next-Generation CMPs", talk at On-Chip Interconnection Networks Workshop, Dec 2006

  45. Head-of-Line (HOL) Blocking • Can be a problem in NoC routers! • limits the throughput of input-queued switches to 58.6% (under uniform random traffic) • Solution: Virtual Output Queues (VOQ) • input queuing strategy in which each input port maintains a separate queue for each output port
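
A minimal sketch of the VOQ idea (an illustrative class, not a real router implementation): because each input keeps one queue per output, a packet blocked on one output never sits in front of packets headed elsewhere.

```python
# Virtual output queues: one queue per output port at each input port.
from collections import deque

class InputPort:
    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, output_port):
        self.voqs[output_port].append(packet)

    def dequeue_for(self, output_port):
        """Called when the allocator grants this input that output."""
        q = self.voqs[output_port]
        return q.popleft() if q else None

port = InputPort(num_outputs=4)
port.enqueue("pkt A", output_port=2)  # suppose output 2 is congested...
port.enqueue("pkt B", output_port=3)  # ...pkt B is not stuck behind pkt A
print(port.dequeue_for(3))            # -> 'pkt B'
```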

  46. Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems

  47. Routing Algorithms • Responsible for correctly and efficiently routing packets or circuits from the source to the destination • Path selection between a source and a destination node in a particular topology • Ensure load balancing • Latency minimization • Flexibility w.r.t. faults in the network • Deadlock- and livelock-free solutions [Figure: source S and destination D in a network, with candidate paths between them]

  48. Static vs. Dynamic Routing • Static routing: fixed paths are used to transfer data between a particular source and destination • does not take into account the current state of the network • advantages of static routing: • easy to implement, since very little additional router logic is required • in-order packet delivery if a single path is used • Dynamic routing: routing decisions are made according to the current state of the network • considering factors such as availability and load on links • path between source and destination may change over time • as traffic conditions and requirements of the application change • more resources needed to monitor the state of the network and dynamically change routing paths • able to better distribute traffic in a network

  49. Example: Dimension-Order Routing • Static XY routing (commonly used): a deadlock-free shortest-path routing which routes packets in the X dimension first and then in the Y dimension • Used for tori and meshes • Destination address expressed as absolute coordinates • For a torus, a preferred direction may have to be selected; for a mesh, the preferred direction is the only valid direction [Figure: 3x4 mesh with nodes labeled by (x, y) coordinates 00-23, with the +y and -x directions marked]
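
XY routing is simple enough to state as a routine (a sketch; the port names are illustrative): correct the X coordinate first, then Y, then eject at the destination.

```python
# Static XY (dimension-order) routing step on a mesh.

def xy_route(cur, dest):
    """Return the output direction for one hop from cur toward dest."""
    (cx, cy), (dx, dy) = cur, dest
    if cx != dx:
        return "+x" if dx > cx else "-x"
    if cy != dy:
        return "+y" if dy > cy else "-y"
    return "eject"   # arrived: deliver to the local node

hop, path = (0, 0), []
while hop != (2, 3):
    step = xy_route(hop, (2, 3))
    path.append(step)
    hop = (hop[0] + (step == "+x") - (step == "-x"),
           hop[1] + (step == "+y") - (step == "-y"))
print(path)  # ['+x', '+x', '+y', '+y', '+y'] -- X first, then Y
```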

  50. Example: Adaptive Routing • Chooses the shortest path to the destination with minimum congestion • Path can change over time • because congestion changes over time [Figure: the same 3x4 mesh of nodes 00-23]
