On-Chip Communication: Networks on Chip (NoCs) Sudeep Pasricha Colorado State University CS/ECE 561 Fall 2011
Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems
Introduction • Evolution of on-chip communication architectures
Introduction • Network-on-chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology • NoCs use packets to route data from the source to the destination PE via a network fabric that consists of • network interfaces (NI) • switches (routers) • interconnection links (wires) • (figure: a PE with registers, ALU, and memory attached to the network through an NI)
Introduction • NoCs are an attempt to scale down the concepts of large-scale networks and apply them to the system-on-chip (SoC) domain • NoC Properties • Reliable and predictable electrical and physical properties • Regular geometry that is scalable • Flexible QoS guarantees • Higher bandwidth • Reusable components • Buffers, arbiters, routers, protocol stack
Introduction • ISO/OSI network protocol stack model
Building Blocks: NI • Front end: standardized node interface at the session layer (P2P); initiator vs. target distinction is blurred • Supported transactions (e.g. QoS read…) • Degree of parallelism • Session protocol control flow & negotiation • Back end: NoC-specific (layers 1–4), manages the interface with the switches via a proprietary link protocol, with decoupling logic & synchronization • Physical channel interface • Link-level protocol • Network layer (packetization) • Transport layer (routing)
Building Blocks: Switch • Router or switch: receives and forwards packets • Buffers have a dual function: synchronization & queueing • (figure: input and output buffers with flow control, a crossbar, an allocator/arbiter, QoS & routing logic, and data ports with control flow wires)
On-chip vs. Off-chip Networks • Cost • Off-chip: cost is in channels: huge pads, expensive connectors, cables, optics • On-chip: cost is Si area and power (storage!); wires are not infinite, but plentiful • Channel Characteristics • On-chip: wires are short, so latency is comparable with logic; huge amount of bandwidth; can put logic in links; a lot of uncertainty (process variations) and interference (e.g., noise) • Off-chip: wires are long, so link latency dominates; bandwidth is precious; links are strongly decoupled • Workload • On-chip: non-homogeneous traffic; much is known • Off-chip: very little is known, and it may change • Design issues • On-chip: must fit in the floorplan (die area constraint) • Off-chip: dictates board/rack organization
NoC Concepts • Topology • How the nodes are connected together • Switching • Allocation of network resources (bandwidth, buffer capacity, …) to information flows • Routing • Path selection between a source and a destination node in a particular topology • Flow control • How the downstream node communicates forwarding availability to the upstream node
Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems
NoC Topology • 1. Direct Topologies • each node has direct point-to-point link to a subset of other nodes in the system called neighboring nodes • as the number of nodes in the system increases, the total available communication bandwidth also increases • fundamental trade-off is between connectivity and cost
NoC Topology • Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space • e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon • 2D mesh is most popular topology • all links have the same length • eases physical design • area grows linearly with the number of nodes • must be designed in such a way as to avoid traffic accumulating in the center of the mesh
NoC Topology • Torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension • k-ary 1-cube (1-D torus) is essentially a ring network with k nodes • limited scalability, as performance degrades when more nodes are added • k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular 2D mesh • except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels • long end-around connections can, however, lead to excessive delays
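The benefit of the wrap-around channels can be seen in a first-order hop-count model. A minimal sketch (node coordinates and k are hypothetical):

```python
# Per-dimension hop distance in a mesh vs. a k-ary torus.

def mesh_hops(a, b):
    """Hops between coordinates a and b in one mesh dimension."""
    return abs(a - b)

def torus_hops(a, b, k):
    """Hops in one dimension of a k-node torus: the wrap-around
    channel lets traffic take the shorter way around the ring."""
    d = abs(a - b)
    return min(d, k - d)

# In an 8-node ring (k-ary 1-cube), node 0 -> node 7 is 7 hops in a
# linear array but only 1 hop with the wrap-around link.
print(mesh_hops(0, 7))      # 7
print(torus_hops(0, 7, 8))  # 1
```

This also shows why the long end-around wire matters physically: the single "1 hop" link spans the whole die unless the torus is folded.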
NoC Topology • Folded torus topology overcomes the long-link limitation of a 2-D torus • all links have the same length • Meshes and tori can be extended by adding bypass links to increase performance at the cost of higher area
NoC Topology • Octagon topology is another example of a direct network • messages being sent between any 2 nodes require at most two hops • more octagons can be tiled together to accommodate larger designs • by using one of the nodes as a bridge node
NoC Topology • 2. Indirect Topologies • each node is connected to an external switch, and switches have point-to-point links to other switches • switches do not perform any information processing, and correspondingly nodes do not perform any packet switching • e.g. SPIN, crossbar topologies • Fat tree topology • nodes are connected only to the leaves of the tree • more links near root, where bandwidth requirements are higher
NoC Topology • k-ary n-fly butterfly topology • blocking multi-stage network: packets may be temporarily blocked or dropped in the network if contention occurs • k^n nodes, and n stages of k^(n-1) k × k crossbars • e.g. 2-ary 3-fly butterfly network (8 nodes, 3 stages of four 2 × 2 crossbars)
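Routing in a butterfly is typically destination-tag based: at stage i, the switch output port is the i-th base-k digit of the destination address. A minimal sketch (the digit-per-stage convention, most significant first, is an assumption):

```python
def butterfly_route(dest, k, n):
    """Output port taken at each of the n stages for destination `dest`
    in a k-ary n-fly, using destination-tag (digit-by-digit) routing:
    stage i consumes the i-th base-k digit, most significant first."""
    digits = []
    for i in range(n - 1, -1, -1):
        digits.append((dest // k**i) % k)
    return digits

# 2-ary 3-fly (8 nodes, 3 stages of 2x2 crossbars): destination 5 = 0b101,
# so the packet exits port 1, then port 0, then port 1.
print(butterfly_route(5, 2, 3))  # [1, 0, 1]
```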
NoC Topology • 3. Irregular or ad hoc network topologies • customized for an application • usually a mix of shared bus, direct, and indirect network topologies • e.g. reduced mesh, cluster-based hybrid topology
Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems
Messaging Units • Data is transmitted based on a hierarchical data structuring mechanism • Messages → packets → flits (flow control digits) → phits (physical flow control digits) • flits and phits are fixed size; packets and data may be variable sized • a phit is the unit of data transferred on a link in a single cycle • typically, phit size = flit size • each packet carries a head flit (with fields such as type, destination info, sequence number, misc) and a tail flit • Switching: determines "when" messaging units are moved
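The message → packet → flit hierarchy above can be sketched as a simple segmentation routine. Sizes here are hypothetical (16-byte packets, 4-byte flits), chosen only to illustrate the layering:

```python
def packetize(message, packet_size=16, flit_size=4):
    """Split a message into packets, each split into fixed-size flits;
    the first flit of each packet would be the head flit (carrying
    routing info), the last the tail flit."""
    packets = []
    for p in range(0, len(message), packet_size):
        payload = message[p:p + packet_size]
        flits = [payload[f:f + flit_size]
                 for f in range(0, len(payload), flit_size)]
        packets.append(flits)
    return packets

msg = bytes(range(36))              # 36-byte message
pkts = packetize(msg)
print(len(pkts))                    # 3 packets (16 + 16 + 4 bytes)
print([len(p) for p in pkts])       # [4, 4, 1] flits per packet
```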
Circuit Switching • Hardware path setup by a routing header or probe • End-to-end acknowledgment initiates transfer at full hardware bandwidth • System is limited by the signaling rate along the circuits • Routing, arbitration, and switching overhead experienced once per message • (timing diagram: header probe, acknowledgment, then data; t_setup covers per-hop routing delay t_r and switch delay t_s, followed by t_data, with links busy throughout)
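A first-order latency model for the setup-then-transfer behavior described above can be sketched as follows. The exact decomposition (probe forward plus acknowledgment return, data fully pipelined over the established circuit) is a simplifying assumption, and all parameter values are hypothetical:

```python
# t_r = per-hop routing delay, t_s = per-hop switch/link delay,
# D = hop count, L = message length in phits.

def circuit_latency(D, t_r, t_s, L):
    t_setup = D * (t_r + t_s) + D * t_s   # probe forward + ack return
    t_data = L * t_s                      # pipelined over the circuit
    return t_setup + t_data

print(circuit_latency(D=3, t_r=2, t_s=1, L=10))  # 3*3 + 3*1 + 10 = 22
```

Note that t_setup is paid once per message, which is why circuit switching pays routing and arbitration overhead only once.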
Circuit Switching • (figure sequence: source end node, switches whose buffers hold "request"/"ack" tokens, destination end node) • Request for circuit establishment (routing and arbitration are performed during this step) • Acknowledgment and circuit establishment (as the token travels back to the source, connections are established) • Packet transport (neither routing nor arbitration is required) • Under high contention: low utilization, hence low throughput
Virtual Circuit Switching • Goal: reduce the cost associated with circuit switching • Multiple virtual circuits (channels) multiplexed on a single physical link • Allocate one buffer per virtual link • can be expensive due to the large number of shared buffers • Allocate one buffer per physical link • uses time division multiplexing (TDM) to statically schedule usage • less expensive routers
Virtual Circuit Switching Example • Use TDM slots to avoid contention and to divide up bandwidth • (figure: four network interfaces connected through routers 1–3; each router holds a slot table mapping, per time slot, which input is routed to which output, e.g. input 2 of router 1 is output 1 of router 2)
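The router slot tables in this scheme can be sketched as a static lookup: each entry says which input feeds which output in a given time slot, so two circuits sharing a link can never collide. The table contents below are hypothetical:

```python
# slot_table[slot][output_port] = input_port feeding it (None = idle)
slot_table = [
    {1: 0, 2: None},   # slot 0: output 1 carries input 0
    {1: None, 2: 0},   # slot 1: output 2 carries input 0
]

def forward(slot, output_port):
    """Which input (if any) is routed to `output_port` in this slot.
    The schedule repeats with the table's period."""
    return slot_table[slot % len(slot_table)][output_port]

print(forward(0, 1))  # 0    (this circuit owns output 1 in even slots)
print(forward(1, 1))  # None (output 1 is idle in odd slots)
```

Because the schedule is fixed at design time, no arbitration logic is needed at runtime, which is what makes these routers less expensive.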
Packet Switching • Packets are transmitted from the source and make their way independently to the receiver • possibly along different routes and with different delays • Zero start-up time, followed by a variable delay due to contention in routers along the packet path • QoS guarantees are harder to make • Three main packet switching scheme variants: • Store and Forward (SAF) switching • Virtual Cut Through (VCT) switching • Wormhole (WH) switching
Packet Switching (Store and Forward) • Routing, arbitration, and switching overheads experienced for each packet • Increased storage requirements at the nodes • Packetization and in-order delivery requirements • Alternative buffering schemes • Use of local processor memory • Central (to the switch) queues • (timing diagram: message header and data occupy each link in turn for t_r + t_packet per hop)
Packet Switching (Store and Forward) • Packets are completely stored at each switch before any portion is forwarded • Requirement: buffers must be sized to hold an entire packet (MTU)
Packet Switching (Virtual Cut Through) • Messages cut through to the next router when feasible • In the absence of blocking, messages are pipelined • Pipeline cycle time is the larger of the intra-router and inter-router flow control delays • When the header is blocked, the complete message is buffered at a switch • High-load behavior approaches that of SAF • (timing diagram: packet header cuts through the router; on blocking, t_blocking is added before t_r + t_s per hop)
Packet Switching (Virtual Cut Through) • Portions of a packet may be forwarded ("cut through") to the next switch before the entire packet is stored at the current switch
Packet Switching (Wormhole) • Messages are pipelined, but buffer space is on the order of a few flits • Small buffers + message pipelining → small, compact switches/routers • Supports variable-sized messages • Messages cannot be interleaved over a channel: routing information is only associated with the header flit • Base latency is equivalent to that of virtual cut-through • (timing diagram: header flit followed by single flits pipelined over the links, t_r + t_s + t_wormhole)
Virtual Cut Through vs. Wormhole • Virtual cut through: buffers hold data packets and must be sized to hold an entire packet (MTU); on blocking, the packet is completely stored at one switch • Wormhole: buffers hold flits, so packets can be larger than the buffers; on blocking, the packet is stored along the path, keeping the traversed links busy
Comparison of Packet Switching Techniques • SAF packet switching and virtual cut-through • Consume network bandwidth proportional to network load • VCT behaves like wormhole at low loads and like SAF packet switching at high loads • High buffer costs • Wormhole switching • Provides low (unloaded) latency • Lower saturation point • Higher variance of message latency than SAF packet or VCT switching
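The comparison above follows from first-order, no-contention latency models: SAF pays the full packet serialization at every hop, while VCT and wormhole pipeline the packet so serialization is paid only once. A minimal sketch (routing and switching overheads are folded into one per-flit cycle time for simplicity; the numbers are hypothetical):

```python
# D = hops, L = packet length in flits, t = per-flit cycle time.

def saf_latency(D, L, t):
    """Store-and-forward: the whole packet is stored at each hop."""
    return D * L * t

def pipelined_latency(D, L, t):
    """VCT / wormhole: header traverses D hops, then the body drains."""
    return (D + L) * t

D, L, t = 4, 16, 1
print(saf_latency(D, L, t))        # 64
print(pipelined_latency(D, L, t))  # 20
```

The models diverge under load: blocked wormhole packets hold buffers along the path (hence the lower saturation point), while VCT degrades gracefully toward the SAF figure.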
Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems
Pipelined Router Microarchitecture • Stage 1: LT & IB (link traversal & input buffering) • Stage 2: RC (routing computation) • Stage 3: VCA (VC allocation) • Stage 4: SA (switch allocation) • Stage 5: ST (switch traversal) & output buffering • (figure: input buffers feed a crossbar under control of routing computation, VC allocation, and switch allocation; output buffers drive the physical channels through link control, with MUX/DEMUX at each port)
Virtual Circuit Switching Router • Each router repeats the LT → RC → VCA → SA → ST pipeline, with link traversal between routers • By using prediction/speculation, the pipeline can be made more compact • RC: Route Computation; VCA: VC Allocation; SA: Switch Allocation; ST: Switch Traversal; LT: Link Traversal
Router Power Dissipation • Data for a 6x6 mesh • 4-stage pipeline, > 3 GHz • Wormhole switching • Source: Partha Kundu, "On-Die Interconnects for Next-Generation CMPs", talk at On-Chip Interconnection Networks Workshop, Dec 2006
Head of Line (HOL) Blocking • Can be a problem in NoC routers! • limits the throughput of input-queued switches to 58.6% under uniform random traffic • Solution: Virtual Output Queues (VOQ) • input queuing strategy in which each input port maintains a separate queue for each output port
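The VOQ idea can be sketched as follows: instead of one FIFO per input port (where a blocked head packet stalls everything behind it), each input keeps one queue per output, so a packet waiting for a busy output never blocks packets destined elsewhere. Port counts and packet labels are hypothetical:

```python
from collections import deque

class VOQInput:
    """One router input port with a virtual output queue per output."""

    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, output_port):
        self.queues[output_port].append(packet)

    def dequeue_for(self, output_port):
        """Serve a packet for `output_port` if one is waiting; packets
        bound for other outputs are untouched (no HOL blocking)."""
        q = self.queues[output_port]
        return q.popleft() if q else None

inp = VOQInput(num_outputs=4)
inp.enqueue("A->out2", 2)
inp.enqueue("B->out0", 0)
print(inp.dequeue_for(0))  # 'B->out0' even though A arrived first
```

In a real router an allocator would match input VOQs to free outputs each cycle; the point here is only the per-output queue structure.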
Outline • Introduction • NoC Topology • Switching strategies • Router Microarchitecture • Routing algorithms • Flow control schemes • Clocking schemes • QoS • NoC Architecture Examples • Status and Open Problems
Routing Algorithms • Responsible for correctly and efficiently routing packets or circuits from the source to the destination • Path selection between a source and a destination node in a particular topology • Ensure load balancing • Latency minimization • Flexibility with respect to faults in the network • Deadlock- and livelock-free solutions
Static vs. Dynamic Routing • Static routing: fixed paths are used to transfer data between a particular source and destination • does not take into account current state of the network • advantages of static routing: • easy to implement, since very little additional router logic is required • in-order packet delivery if single path is used • Dynamic routing: routing decisions are made according to the current state of the network • considering factors such as availability and load on links • path between source and destination may change over time • as traffic conditions and requirements of the application change • more resources needed to monitor state of the network and dynamically change routing paths • able to better distribute traffic in a network
Example: Dimension-order Routing • Static XY routing (commonly used): a deadlock-free shortest-path routing which routes packets in the X-dimension first and then in the Y-dimension • Used for tori and meshes • Destination address expressed as absolute coordinates • For a torus, a preferred direction may have to be selected; for a mesh, the preferred direction is the only valid direction • (figure: 3 × 4 mesh with nodes labeled 00–23 by (x, y) coordinates, +y up, -x left)
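XY routing reduces to a small per-node decision: correct X first, then Y. A minimal sketch using the coordinate labels from the mesh figure:

```python
def xy_next_hop(cur, dst):
    """Return the output direction at node `cur` for destination `dst`
    (both (x, y) tuples), or 'local' on arrival. Routing fully in X
    before Y forbids the Y-to-X turns that could close a deadlock
    cycle on a mesh."""
    cx, cy = cur
    dx, dy = dst
    if cx != dx:
        return '+x' if dx > cx else '-x'
    if cy != dy:
        return '+y' if dy > cy else '-y'
    return 'local'

# Path from node 00 to node 23: two hops in X, then three hops in Y.
print(xy_next_hop((0, 0), (2, 3)))  # '+x'
print(xy_next_hop((2, 0), (2, 3)))  # '+y'
print(xy_next_hop((2, 3), (2, 3)))  # 'local'
```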
Example: Adaptive Routing • Chooses the shortest path to the destination with minimum congestion • Path can change over time • because congestion changes over time • (figure: same 3 × 4 mesh with nodes labeled 00–23)
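One common way to realize this is minimal adaptive routing: among the productive directions (those that shorten the path), pick the one whose outgoing link currently reports the least congestion. A hedged sketch; the congestion values are hypothetical stand-ins for what a real router would read from neighbor buffer occupancy:

```python
def adaptive_next_hop(cur, dst, congestion):
    """`congestion` maps a direction ('+x', '-x', '+y', '-y') to a
    load estimate; choose the least-loaded productive direction."""
    cx, cy = cur
    dx, dy = dst
    productive = []
    if dx != cx:
        productive.append('+x' if dx > cx else '-x')
    if dy != cy:
        productive.append('+y' if dy > cy else '-y')
    if not productive:
        return 'local'
    return min(productive, key=lambda d: congestion[d])

load = {'+x': 5, '-x': 0, '+y': 1, '-y': 2}
# From (0, 0) to (2, 3) both '+x' and '+y' are productive;
# '+y' wins because its link is less congested right now.
print(adaptive_next_hop((0, 0), (2, 3), load))  # '+y'
```

Because only minimal (productive) directions are considered, every hop still makes progress; livelock is avoided, though deadlock freedom needs extra care (e.g. escape channels) not shown here.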