310 likes | 798 Views
Routing Algorithms. ECE 284 On-Chip Interconnection Networks Spring 2014. Routing. Will assume 2D mesh in this talk How flits are routed from source to destination can greatly impact network congestion Two types of routing:
E N D
Routing Algorithms ECE 284 On-Chip Interconnection Networks Spring 2014
Routing • Will assume 2D mesh in this talk • How flits are routed from sourceto destination can greatly impactnetwork congestion • Two types of routing: • Oblivious routing: routing does notconsider or depend on the currenttraffic condition • Adaptive routing: takes intoconsideration current traffic condition to determine the routing path (tries to get around congested areas) • Oblivious routing simpler (less expensive) to implement • This talk will review existing oblivious routing algorithms
Routing Algorithm Objectives • Maximize throughput • How much load the network can handle • Minimize latency • Minimize routing delay between source and destination
Dimension-Ordered Routing (DOR)(also called XY routing) D S either minimal XYorYX routing to the destination (here it uses XY route with probability 1.0)
DOR (XY) Routing with Uniform Traffic • For an N = K x K mesh, N/2 nodes are in the top half. • 1/2 of its traffic will cross the bisection. • Traffic crossing bisection uniformly distributed across K channels. • Therefore, maximum channel load for DOR with uniform traffic is: ϒ(DOR, uniform) = [ (N/2) * (1/2) ] / K = K/4
Problem with DOR (XY) Routing • Minimal hop count • But, in the worst-case, the links can get overly congested. e.g., transpose traffic pattern. (0, 0) -> (3, 3) (1, 0) -> (3, 2) (2, 0) -> (3, 1) (0, 0) -> (3, 3) (1, 0) -> (3, 2) (2, 0) -> (3, 1) ϒwc(DOR) = K – 1 >> ϒ(DOR, uniform) in the worst-case.
Valiant Load-Balancing (VAL) [1981] D S randomly chosen intermediate node minimal XY routing to any intermediate node, then minimal XY routing to destination node
Valiant Load-Balancing (VAL) • Works by turning any traffic pattern into2 phases of uniform traffic patterns, even adversarial or worst-case traffic patterns. • In effect, it evenly load-balances the traffic. • Worst-case channel loadϒwc(VAL) = 2 * ϒ(DOR, uniform) = 2 * (K/4) = K/2which is 1/2 network capacity relative to DOR and uniform traffic.
Valiant Load-Balancing (VAL) • Effective network capacity normalized throughput is 1/2 capacity. • However, average hop count is 2X DOR. • 1/2 capacity was thought to be the optimal worst-case throughput for any routing algorithm.
ROMM [1995] D intermediate node randomly chosen only in the minimal direction to destination S minimal XY routing to an intermediate node only in the minimal direction, then minimal XY routing to the final destination node
ROMM • Tries to load-balance traffic by randomly distributing traffic along all possible minimal paths. • Good that minimal number of hops is guaranteed. • But, turns out in the worst-case, ROMM performs about as bad as DOR.
O1TURN [2005] D S use both minimal XYandYX routing to the destination (0.5 XY + 0.5 YX)
O1TURN • Even though it only considers XY or YX path, not all possible paths in VAL or all possible minimal paths in ROMM, it is guaranteed to achieve 1/2 capacity for the even radix case, which has been shown to be optimal. • For the odd radix case, O1TURN is very near the optimal 1/2 capacity. • Unlike VAL, O1TURN only uses minimal routing paths, thus no penalty in hop count.
Comparison Even Radix : Opt * 1 Odd Radix : Opt * (1 - 1 / K2) 0.5 0.4 0.3 0.2 VAL DOR 0.1 ROMM O1TURN [adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results • 4 x 4 2D MESH – Uniform Random Traffic Pattern [adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results • 4 x 4 2D MESH – Matrix Transpose Traffic Pattern • One of the worst-case traffic pattern for DOR [adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results • 4 x 4 2D MESH – Bit Complement Traffic Pattern • Already balanced traffic pattern [adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results • 4 x 4 2D MESH – HOT SPOT Traffic Pattern • 2 nodes have 20% of traffic [adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results • Delay penalty of adaptive routing • How the complexity of router implementation affects on latency • Hot Spot Traffic Pattern [adapted from Seo et al., O1TURN talk, ISCA 2005]
U2TURN [2012] • 1/2 capacity has been thought to be optimal worst-case throughput for both odd and even radices, and O1TURN is the state-of-the-art for achieving this for the even radix case. • But, turns out 1/2 capacity is not optimal for the odd radix case.
U2TURN • U2TURN considers all possible XYX and YXY2-TURN paths, and selects these paths with equal probability. • XYX paths: randomly select a node on the same row and route to it, followed by minimal YX routing to final destination. • YXYpaths: randomly select a node on the same column and route to it, followed by minimal XYrouting to final destination.
Analytical Results • For the even radix case, worst-case capacity of U2TURN = 1/2, same as VAL and O1TURN, which is optimal. • But, for the odd radix case, worst-case capacity of U2TURN = (K+1)/(2K+1) > 1/2which is better than any existing routing algorithm.
Worst-Case Throughput ROMM VAL DOR U2TURN O1TURN Optimal routing
Latency • A potential concern about U2TURN is that it uses non-minimal routing paths. • However, U2TURN does a better job of load-balancing traffic than O1TURN for difficult traffic patterns. • Hence, the queuing delay can be very high for O1TURN for difficult traffic patterns, hence longer latency despite fewer number of hops. • Surprisingly, latency better for both odd and even radix cases for difficult traffic patterns.
References • Valiant [L.G.Valiant et. al, ACM 1981] • ROMM [T.Nesson et. al, ACM 1995] • O1TURN [D. Seo et. al, ISCA 2005] • U2TURN [G. Sun et. Al, ICCD 2012]