Routing • Z. Lu / A. Jantsch / I. Sander • Dally and Towles, Chapters 8, 9, 10, 11
Overview • Interconnect Network Introduction • Topology (regular, irregular) • Deadlock, Livelock • Router Architecture (pipelined, classic) • Routing (algorithms, mechanics) • Network Interface (message passing, shared memory) • Flow Control (circuit switching, packet switching (SAF, wormhole, virtual channel)) • Performance Analysis and QoS • Implementation • Evaluation • Summary • Concepts SoC Architecture
Outline • A motivating example: different routing algorithms make a difference • Routing algorithms: deterministic, oblivious, adaptive • Routing mechanics: table-based (source-table and node-table), algorithmic
Routing • Objectives of a routing algorithm: • Find a path from a source node to a destination node on a given topology • Reduce the number of hops and the overall latency, which also saves energy • Balance the load across network channels to maximize throughput • Minimal routes are not always the best choice, because they can concentrate traffic on a few channels
An Introductory Example [Figure: 8-node ring, nodes 0 to 7, with labels Shanghai and Stockholm] • There are only two directions a packet can take: clockwise and counterclockwise • There are plenty of possible routing algorithms
What about performance? [Figure: 8-node ring, nodes 0 to 7] • The performance of a routing algorithm depends on the topology and the traffic pattern • For further investigation, we assume the tornado traffic pattern • Every node i sends a packet to node (i+3) mod 8 • 0->3, 1->4, ..., 5->0, 6->1, 7->2 • Lower bound on channel load (8 packets, 3 hops, 16 channels): $\gamma \ge (8 \cdot 3)/16 = 1.5$ • Is the bound tight?
An Introductory Example [Figure: 8-node ring, nodes 0 to 7, with labels Shanghai and Stockholm] • Greedy: Always send a packet in the shorter direction. If the distance is the same in both directions, pick a direction at random.
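Performance of Greedy Algorithm [Figure: 8-node ring, nodes 0 to 7] • No traffic is routed counterclockwise, so the counterclockwise channels stay idle (poor channel utilization) • Every clockwise channel carries three flows; thus $\gamma = 3$ and $\Theta = b/3$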
An Introductory Example [Figure: 8-node ring, nodes 0 to 7, with labels Shanghai and Stockholm] • Uniform Random: Send each packet either clockwise or counterclockwise at random (p = 0.5 for each direction)
Performance of Uniform Random Algorithm [Figure: 8-node ring, nodes 0 to 7, with p = 0.5 in each direction] • Half of the traffic goes clockwise and half goes counterclockwise • The bottleneck is the counterclockwise traffic, where half of the traffic traverses five links • Thus $\gamma = 5/2$ and $\Theta = 2b/5$
An Introductory Example [Figure: 8-node ring, nodes 0 to 7, with labels Shanghai and Stockholm] • Weighted Random: Randomly pick a direction for each packet, but choose the short direction with probability 1 - p/8 and the long direction with probability p/8, where p is the minimum distance between source and destination
Performance of Weighted Random Algorithm [Figure: 8-node ring, nodes 0 to 7, with p = 0.625 clockwise and p = 0.375 counterclockwise] • 5/8 of the traffic goes clockwise (3 links) and 3/8 of the traffic goes counterclockwise (5 links) • In both directions $\gamma = 15/8$, thus $\Theta = 8b/15$
An Introductory Example [Figure: 8-node ring, nodes 0 to 7, with labels Shanghai and Stockholm] • Adaptive: Send the packet in the direction for which the local channel has the lowest load • The load can be estimated either by measuring the length of the queue in this direction, or by recording how many packets the channel has transmitted over the last T slots
Performance of Adaptive Routing [Figure: 8-node ring, nodes 0 to 7] • In steady state, adaptive routing matches the balanced load split between the two directions • Thus, in theory, it allows $\Theta = 8b/15$ in both directions
What is the best Algorithm? • Lower bound on channel load: • Upper bound on throughput: SoC Architecture
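The channel loads of the three oblivious algorithms in the example can be checked with a short calculation. The sketch below is an illustration only and is not part of the original slides: it assumes an 8-node ring with one channel per direction between neighbours and the tornado pattern (i to (i+3) mod 8), and the names `direction_probs` and `max_channel_load` are hypothetical. It accumulates the expected number of flows crossing each channel and returns the maximum, $\gamma$; throughput then follows as $\Theta = b/\gamma$.

```python
from fractions import Fraction

N = 8  # ring size assumed from the introductory example


def direction_probs(algorithm, src, dst):
    """Return (p_clockwise, p_counterclockwise) for one packet."""
    cw = (dst - src) % N          # hops when going clockwise
    ccw = N - cw                  # hops when going counterclockwise
    if algorithm == "uniform":
        return Fraction(1, 2), Fraction(1, 2)
    if algorithm == "greedy":
        if cw == ccw:             # tie: pick a direction at random
            return Fraction(1, 2), Fraction(1, 2)
        return (Fraction(1), Fraction(0)) if cw < ccw else (Fraction(0), Fraction(1))
    if algorithm == "weighted":
        p_short = 1 - Fraction(min(cw, ccw), N)   # 1 - p/8 on the short side
        return (p_short, 1 - p_short) if cw <= ccw else (1 - p_short, p_short)
    raise ValueError(f"unknown algorithm: {algorithm}")


def max_channel_load(algorithm):
    """Maximum expected number of flows on any channel under tornado traffic."""
    load = {}  # (node, direction) -> expected flows on the channel leaving node
    for src in range(N):
        dst = (src + 3) % N       # tornado traffic pattern
        p_cw, p_ccw = direction_probs(algorithm, src, dst)
        cw = (dst - src) % N
        for hop in range(cw):     # channels used on the clockwise route
            key = ((src + hop) % N, +1)
            load[key] = load.get(key, 0) + p_cw
        for hop in range(N - cw): # channels used on the counterclockwise route
            key = ((src - hop) % N, -1)
            load[key] = load.get(key, 0) + p_ccw
    return max(load.values())


for name in ("greedy", "uniform", "weighted"):
    gamma = max_channel_load(name)
    print(name, "gamma =", gamma, "throughput =", 1 / gamma, "* b")
```

Under these assumptions the maximum loads are 3, 5/2 and 15/8, compared with the counting bound of 8 · 3 / 16 = 3/2 from the earlier slide.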
Let’s analyze the algorithms • 4 algorithms: greedy, uniform random, weighted random, adaptive • Properties • In terms of adaptivity: deterministic, oblivious, adaptive • In terms of shortest path: minimal, non-minimal • How would you classify the algorithms according to these properties?
Taxonomy of Routing Algorithms • Deterministic algorithms always choose the same path between two nodes • Easy to implement and to make deadlock-free • Do not exploit path diversity and therefore balance load poorly • The tornado example: the greedy algorithm
Taxonomy of Routing Algorithms • Oblivious (i.e., unaware) algorithms choose a route without using any information about the state of the network • Randomized algorithms that do not consider the network state are oblivious algorithms • Deterministic routing algorithms are a subset of the oblivious algorithms • The tornado example: uniform random, weighted random
Taxonomy of Routing Algorithms • Adaptive algorithms use information about the state of the network to make routing decisions • Length of queues • Historical channel load
Taxonomy of Routing Algorithms • Minimal algorithms only consider minimal routes (shortest paths) • The tornado example: the greedy algorithm • Non-minimal algorithms may also use non-minimal routes • The tornado example: uniform random, weighted random, adaptive
Deterministic Routing • A packet from a source node s to a destination node d is always sent over exactly the same route • Advantages • Simple and inexpensive to implement • Deterministic routing is usually minimal, which leads to short paths • Packets arrive in order • Disadvantage • Does not exploit path diversity and can create large load imbalances
Deterministic Routing Example: Destination-Tag Routing in Butterfly Networks • Binary tag (3-digit radix-2): a packet routed to destination 5 carries tag 101 (down, up, down) • Quaternary tag (2-digit radix-4): a packet routed to destination 11 carries tag 23 • Destination-tag routing depends on the destination address only (not on the source)
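The tag computation can be sketched in a few lines. This is a minimal illustration rather than the slides' own code: the function name `destination_tag` is hypothetical, and the mapping of binary digit 0 to "up" and 1 to "down" is inferred from the 101 = (down, up, down) example. The tag is simply the destination address written in the switch radix, one digit per stage, most significant digit first.

```python
def destination_tag(dest, radix, stages):
    """Digits of dest in the given radix, most significant first.

    Digit i selects the output port of the switch in stage i.
    """
    digits = []
    for _ in range(stages):
        digits.append(dest % radix)
        dest //= radix
    if dest:
        raise ValueError("destination does not fit in the given number of stages")
    return digits[::-1]


# Assumed port naming for a radix-2 butterfly: 0 = up, 1 = down.
PORT_NAME = {0: "up", 1: "down"}

binary_tag = destination_tag(5, radix=2, stages=3)
print(binary_tag, [PORT_NAME[d] for d in binary_tag])
print(destination_tag(11, radix=4, stages=2))
```

Because the tag is derived from the destination alone, every source would use the same digit sequence to reach a given output.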
Deterministic Routing Example: Dimension-Order Routing in Tori and Meshes • Also called e-cube routing • The digits of the destination address are used one at a time to direct the routing • Since a torus can be traversed in the clockwise or counterclockwise direction, the preferred direction in each dimension has to be calculated first • The packet is routed along these directions { +x / -x, +y / -y } until it reaches its destination
Dimension-Order Routing Example • Compute the relative address $\Delta_i$ for each digit i • Here: $\Delta = (2, -1)$ • Thus the preferred direction is $D = (+1, -1)$ • The packet is first routed along the first dimension (-x) and then along the second dimension (+y)
Dimension-Order Routing Example • What happens if the packet is to be routed from 03 to 32? • The packet now needs three hops in either the +y or the -y direction • To balance the load, the direction should be chosen at random
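The relative address and preferred direction can be computed per digit as sketched below. This is an illustration under stated assumptions, not the slides' own code: the function name `preferred_directions` is hypothetical, digit vectors are taken in the order in which the address is written (the mapping of digits to x and y is not assumed), and the first call uses an assumed source 03 and destination 22 in a 6-ary 2-cube, chosen because it reproduces $\Delta = (2, -1)$.

```python
import random


def preferred_directions(src, dst, k, rng=random):
    """Relative address and preferred direction per digit on a k-ary n-cube.

    A direction of 0 means the digit already matches the destination.
    When both ways round the ring are equally long, one is picked at random.
    """
    delta, direction = [], []
    for s, d in zip(src, dst):
        m = (d - s) % k                       # distance in the + direction
        rel = m if m <= k // 2 else m - k     # shortest signed offset
        delta.append(rel)
        if rel == 0:
            direction.append(0)
        elif k % 2 == 0 and abs(rel) == k // 2:
            direction.append(rng.choice((+1, -1)))   # tie: random choice
        else:
            direction.append(1 if rel > 0 else -1)
    return delta, direction


print(preferred_directions((0, 3), (2, 2), k=6))   # assumed example pair
print(preferred_directions((0, 3), (3, 2), k=6))   # 03 to 32: tie in one digit
```

For the 03 to 32 case the first digit is exactly half-way round the ring, so the sketch returns a randomly chosen sign for it, which is the random decision the slide recommends.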
Oblivious Routing • Packets are routed without regard to the state of the network • The main tradeoff is between • Locality (short path lengths) • Load balance • Oblivious routing is easy to analyze because its channel load functions are linear
Valiant’s Randomized Routing Algorithm • Valiant’s algorithm can balance the load for any traffic pattern on any connected topology (a topology with at least one path between each terminal pair) • The delivery of a packet is divided into two phases: • Phase 1: The packet is first sent from s to a random node x • Phase 2: It is then sent from x to its destination d
Valiant’s Algorithm on Torus [Figure: 6-ary 2-cube (6-by-6 torus)] • A packet is to be routed from 00 to 12 • The packet is first routed to a random intermediate node (31) • Dimension-order routing is used in this example • The packet is then routed to its final destination 12 • Again, dimension-order routing is used in this example
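The two phases can be sketched as follows. This is a minimal illustration, not the slides' implementation: the helper names `ring_steps`, `dimension_order_path` and `valiant_path` are hypothetical, dimensions are corrected in the order the digits are written, and ties between the two ring directions are resolved toward the + direction for simplicity.

```python
import random


def ring_steps(a, b, k):
    """Coordinates visited going from a to b the shorter way round a k-node ring."""
    m = (b - a) % k
    step = 1 if m <= k // 2 else -1      # ties go in the + direction (assumption)
    hops = m if step == 1 else k - m
    return [(a + step * i) % k for i in range(1, hops + 1)]


def dimension_order_path(src, dst, k):
    """Nodes visited from src to dst, correcting one dimension at a time."""
    cur = list(src)
    path = [tuple(cur)]
    for dim in range(len(src)):
        for coord in ring_steps(cur[dim], dst[dim], k):
            cur[dim] = coord
            path.append(tuple(cur))
    return path


def valiant_path(src, dst, k, rng=random):
    """Phase 1: src to a random node x. Phase 2: x to dst."""
    x = tuple(rng.randrange(k) for _ in src)
    phase1 = dimension_order_path(src, x, k)
    phase2 = dimension_order_path(x, dst, k)
    return x, phase1 + phase2[1:]        # drop the duplicate copy of x


x, path = valiant_path((0, 0), (1, 2), k=6, rng=random.Random(0))
print("intermediate node:", x)
print("path:", path)
```

Because x is drawn uniformly and independently of the destination, each phase looks like uniform random traffic, which is the source of the load balance discussed on the following slides.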
Valiant’s Algorithm on Torus [Figure: 6-ary 2-cube] • Torus • Channel load under uniform traffic (50% of traffic crosses the bisection): $\gamma_{T,U} = k/8$ • Channel load under worst-case traffic (100% of traffic crosses the bisection): $\gamma_{T,W} = k/4$ • Average minimum hop count (k even): $H_{min,T} = nk/4$
Valiant’s Algorithm on Torus [Figure: 6-ary 2-cube] • Valiant’s algorithm gives good worst-case performance on k-ary n-cubes (for any traffic pattern) • Each packet travels on average $H_{min} = 2k/4$ hops in each phase (n = 2), resulting in $H_{Total} = k$ • Bisection load is $\gamma = k/4$ (twice the load of uniform random traffic, $\gamma_{T,U}$) • Throughput is $\Theta = 4b/k$ (half the throughput of uniform random traffic)
Valiant’s Algorithm on Torus [Figure: 6-ary 2-cube] • Good performance on worst-case traffic patterns comes at the expense of • Locality • Overhead (2 routing phases) • Valiant’s algorithm performs very badly for nearest-neighbor traffic • The average hop count increases from $H_{min} = 1$ to $nk/2$
Minimal oblivious routing • Minimal oblivious routing attempts to achieve the load balance of randomized routing without giving up locality • It is a minimal version of Valiant’s algorithm • This is done by restricting routes to minimal paths • Routing is again done in two steps: route to a random node, then route to the destination
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • Idea: For each packet, randomly pick a node x inside the minimal quadrant, so that the packet is routed from source node s to x and then to destination node d • Assumption: At each node, routing in the x or y direction is allowed
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • For each node x in the quadrant (00, 10, 20, 01, 11, 21), determine a minimal route via x • Start with x = 00 • Three possible routes: (00, 01, 11, 21) (p=0.33), (00, 10, 20, 21) (p=0.33), (00, 10, 11, 21) (p=0.33)
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • x = 01 • One possible route: (00, 01, 11, 21) (p=1)
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • x = 10 • Two possible routes: (00, 10, 20, 21) (p=0.5), (00, 10, 11, 21) (p=0.5)
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • x = 11 • Two possible routes: (00, 10, 11, 21) (p=0.5), (00, 01, 11, 21) (p=0.5)
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • x = 20 • One possible route: (00, 10, 20, 21) (p=1)
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • x = 21 • Three possible routes: (00, 01, 11, 21) (p=0.33), (00, 10, 20, 21) (p=0.33), (00, 10, 11, 21) (p=0.33)
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33] • Adding the probabilities on each channel • Example, link (00,01): P = 1/3 for x = 00; P = 1 for x = 01; P = 0 for x = 10; P = 1/2 for x = 11; P = 0 for x = 20; P = 1/3 for x = 21 • $P(00,01) = (2 \cdot 1/3 + 1/2 + 1)/6 = 2.17/6$
Minimal oblivious routing [Figure: 4-by-4 network, nodes 00 to 33, with channel probabilities p = 2.17/6, 3.83/6, 2.17/6, 1.67/6, 2.17/6, 3.83/6, 2.17/6 on the seven channels of the quadrant] • Results: The load is not very balanced • The channel between nodes 10 and 11 is used least (p = 1.67/6) • The load imbalance can be fixed by weighting the choice of the intermediate nodes, resulting in Load-Balanced Oblivious Routing
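The per-channel probabilities can be reproduced by enumerating every route. The sketch below is an illustration under the assumptions used in the worked example: the intermediate node x is drawn uniformly from the six quadrant nodes, all minimal paths within each phase are equally likely, the first address digit is the x coordinate, and the function names `minimal_paths` and `channel_probabilities` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def minimal_paths(a, b):
    """All minimal paths from a to b on a 2D mesh (a <= b in both coordinates)."""
    if a == b:
        return [[a]]
    x, y = a
    paths = []
    if x < b[0]:
        paths += [[a] + rest for rest in minimal_paths((x + 1, y), b)]
    if y < b[1]:
        paths += [[a] + rest for rest in minimal_paths((x, y + 1), b)]
    return paths


def channel_probabilities(src, dst):
    """Probability that a packet from src to dst crosses each channel."""
    quadrant = list(product(range(src[0], dst[0] + 1), range(src[1], dst[1] + 1)))
    prob = {}
    for mid in quadrant:                       # uniform choice of intermediate node
        first = minimal_paths(src, mid)        # phase 1 candidates
        second = minimal_paths(mid, dst)       # phase 2 candidates
        weight = Fraction(1, len(quadrant) * len(first) * len(second))
        for p1 in first:
            for p2 in second:
                route = p1 + p2[1:]
                for hop in zip(route, route[1:]):
                    prob[hop] = prob.get(hop, 0) + weight
    return prob


probs = channel_probabilities((0, 0), (2, 1))
for (a, b), p in sorted(probs.items()):
    print(a, "->", b, "p =", p, "=", round(float(p) * 6, 2), "/ 6")
```

Under these assumptions the link (00,01) comes out at 13/36, which is the 2.17/6 of the worked example, and the link from 10 to 11 at 5/18, which is 1.67/6.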
Adaptive Routing • Adaptive routing algorithms use information about the network state, typically queue occupancies, to route packets over the network • In theory, adaptive routing should perform better than oblivious routing • In practice it often does not, since only local information is used for adaptive decisions • Locally, load is balanced • Globally, load may be imbalanced
Adaptive Routing: Local Information Is Not Enough • In each cycle node 5 sends a packet to node 6 • Node 3 sends a packet to node 7
Adaptive Routing: Local Information Is Not Enough • Node 3 does not learn about the traffic between nodes 5 and 6 until the input buffers between node 3 and node 5 are completely filled with packets
Adaptive Routing: Local Information Is Not Enough • Adaptive routing works better with smaller buffers, since small buffers fill faster and congestion is therefore propagated earlier to the sensing node (stiff backpressure)
Adaptive Routing • How does the adaptive routing algorithm sense the state of the network? • It can only sense current, local information • Global information is based on historic local information • Global information is also difficult to collect • Changes in the traffic flow in the network are observed much later
Minimal Adaptive Routing [Figure: 4-by-4 network, nodes 00 to 33] • Minimal adaptive routing chooses among the minimal routes from source s to destination d • At each hop, a routing function generates a productive output vector that identifies which output channels of the current node will move the packet closer to its destination • The network state is then used to select one of these channels for the next hop
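One hop of this scheme can be sketched as follows. This is a minimal illustration under assumptions that are not in the slides: the network is treated as a 2D mesh without wrap-around links, the network state is a per-port local queue length, and the names `productive_outputs` and `select_output` are hypothetical.

```python
def productive_outputs(cur, dst):
    """Output ports at cur that move a packet closer to dst on a 2D mesh."""
    outs = []
    if dst[0] > cur[0]:
        outs.append("+x")
    if dst[0] < cur[0]:
        outs.append("-x")
    if dst[1] > cur[1]:
        outs.append("+y")
    if dst[1] < cur[1]:
        outs.append("-y")
    return outs


def select_output(cur, dst, queue_len):
    """Pick the productive port with the shortest local queue (None on arrival)."""
    outs = productive_outputs(cur, dst)
    if not outs:
        return None
    return min(outs, key=lambda port: queue_len[port])


# Hypothetical local queue occupancies at node 00.
queues = {"+x": 4, "-x": 0, "+y": 1, "-y": 0}
print(select_output((0, 0), (2, 1), queues))   # two productive ports to choose from
print(select_output((0, 0), (2, 0), queues))   # only one productive port
```

In the second call only one port is productive, so the queue lengths cannot influence the choice; this is the case with no minimal path diversity, in which a minimal adaptive algorithm cannot route around congestion.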
Minimal Adaptive Routing [Figure: two cases, one where local congestion can be avoided and one where local congestion cannot be avoided] • Good at locally balancing load • Poor at globally balancing load • Minimal adaptive routing algorithms are unable to avoid congestion for source-destination pairs with no minimal path diversity
Minimal Adaptive Routing [Figure: 4-by-4 network, nodes 00 to 33] • A local optimum can result in a global non-optimum • How can the situation be improved?
Fully Adaptive Routing [Figure: 4-by-4 network, nodes 00 to 33] • Fully adaptive routing does not restrict packets to shortest paths • Misrouting is allowed • This can help to avoid congested areas and improve load balance • What problem may fully adaptive routing lead to?