Destination-Based Adaptive Routing for 2D Mesh Networks

Authors: Rohit Ramanujam (UCSD), Bill Lin (UCSD)
Presenter: Ketan Supanekar (ECE284 course)

Classes of Routing Algorithms
• Oblivious routing
  • Simple and fast router designs
  • Poor load balancing under bursty traffic
• Adaptive routing
  • Better performance (throughput, latency)
  • Better fault tolerance
  • Higher router complexity

Outline
• Introduction
• Motivation
• Destination-Based Adaptive Routing (DAR)
• Evaluation

Minimal Adaptive Routing • Model: adaptive routing along minimal directions from source S to destination D

Granularity of Congestion Estimation (coarse to fine): local congestion

Local Congestion
• Local adaptive routing
• Measures a local congestion metric (free VCs, free buffers)
[Figure: paths from S to D over links with low, moderate, and high congestion; the optimal path vs. the path chosen by local adaptive routing]
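A minimal sketch of the local adaptive baseline, assuming (x, y) router coordinates with x growing East and y growing North, and the number of free VCs per output port as the local metric. The function names and the tie-breaking rule are hypothetical, not taken from the slides:

```python
def minimal_ports(cur, dst):
    """Productive (minimal) output ports from cur=(x, y) toward dst=(x, y)."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    elif dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    elif dy < cy:
        ports.append("S")
    return ports or ["Ej"]  # already at the destination: eject


def local_adaptive(cur, dst, free_vcs):
    """Pick the minimal port with the most free VCs (assumed metric).

    Only congestion at the current router is consulted, which is why this
    scheme can walk into congestion a few hops downstream.
    Ties go to the first port in minimal_ports() order.
    """
    return max(minimal_ports(cur, dst), key=lambda p: free_vcs.get(p, 0))


print(local_adaptive((0, 0), (3, 2), {"E": 1, "N": 4}))  # N
```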

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion

Dimension-based Congestion
• RCA-1D (Gratz et al., HPCA '08)
• Exponential moving average of congestion to all nodes along a dimension
[Figure: paths from S to D over links with low, moderate, and high congestion; the optimal path vs. the path chosen by RCA-1D]
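To illustrate why such an estimate weights nearby routers most heavily, the sketch below assumes that each router along a dimension blends its own congestion metric with the aggregate received from the next router, using a fixed weight `alpha = 0.5` (an assumed value). Unrolling the recurrence gives weights that decay geometrically with distance. This is a simplified model, not the exact RCA hardware:

```python
def aggregate_along_dimension(local, alpha=0.5):
    """local[k]: congestion metric of the router k hops away (k = 0 is self).

    Returns agg[k], the aggregate the router at position k would hold, using
    the hypothetical recurrence agg[k] = alpha*local[k] + (1-alpha)*agg[k+1].
    """
    agg = [0.0] * len(local)
    agg[-1] = float(local[-1])  # last router in the dimension: nothing beyond it
    for k in range(len(local) - 2, -1, -1):
        agg[k] = alpha * local[k] + (1 - alpha) * agg[k + 1]
    return agg


print(aggregate_along_dimension([2, 4, 8]))  # [4.0, 6.0, 8.0]
```

With local metrics [2, 4, 8], the router itself ends up with 0.5·2 + 0.25·4 + 0.25·8 = 4.0: its own metric counts for half, and each further router counts for progressively less.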

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion

Quadrant-based Congestion
• RCA-Quadrant (Gratz et al., HPCA '08)
• Exponential moving average of congestion to all nodes in the destination quadrant
[Figure: paths from S to D over links with low, moderate, and high congestion; the optimal path vs. the path chosen by RCA-Quadrant]

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion, destination-based congestion

Ideally …
• On a per-destination basis:
  • Estimate the end-to-end delay along all minimal paths to the destination
  • Choose the path with the least delay
[Figure: paths from S to D over links with low, moderate, and high congestion; the optimal path highlighted]
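The ideal scheme can be sketched as an oracle that knows every link's current delay and enumerates all minimal paths. For brevity, this sketch assumes the destination lies to the North-East of the source and uses a hypothetical table `link_delay[(x, y, move)]`; a real router has no such global view:

```python
import itertools


def minimal_paths(dx, dy):
    """Yield every minimal path as a list of 'E'/'N' moves (dx east, dy north)."""
    for east_steps in itertools.combinations(range(dx + dy), dx):
        east = set(east_steps)
        yield ["E" if i in east else "N" for i in range(dx + dy)]


def path_delay(src, moves, link_delay):
    """Sum the (assumed known) per-link delays along one path."""
    x, y = src
    total = 0
    for m in moves:
        total += link_delay[(x, y, m)]
        x, y = (x + 1, y) if m == "E" else (x, y + 1)
    return total


def ideal_route(src, dst, link_delay):
    """Return the minimal path with the least end-to-end delay."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return min(minimal_paths(dx, dy),
               key=lambda mv: path_delay(src, mv, link_delay))


delays = {(0, 0, "E"): 5, (1, 0, "N"): 5, (0, 0, "N"): 1, (0, 1, "E"): 2}
print(ideal_route((0, 0), (1, 1), delays))  # ['N', 'E']
```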

Challenges to Ideal Routing
• Limited bandwidth for congestion updates
• Congestion notification is not instantaneous
• Limited storage in on-chip routers
• Exponential number of paths to each destination
• Limited hardware resources for computation
How can we practically emulate ideal adaptive routing?
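The "exponential number of paths" point follows from a standard counting argument: a minimal path between nodes dx and dy hops apart is any ordering of dx horizontal and dy vertical moves, so there are C(dx + dy, dx) of them. A short check for corner-to-corner traffic in square meshes:

```python
from math import comb


def num_minimal_paths(dx, dy):
    """Number of minimal paths between two mesh nodes dx, dy hops apart."""
    return comb(dx + dy, dx)


for n in (4, 8, 16):
    print(f"{n}x{n} mesh, corner to corner: {num_minimal_paths(n - 1, n - 1)}")
```

This gives 20 paths in a 4x4 mesh, 3432 in an 8x8 mesh, and 155117520 in a 16x16 mesh, far too many to track per destination in router hardware.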

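Destination-based adaptive routing (DAR)
• Every T cycles, a node estimates the delay to every other node through each candidate output port
[Figure: from S, the estimated delays to D are L[N][D] = 20 through the North port and L[E][D] = 30 through the East port]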

DAR: High Level
• Traffic distribution to output ports is controlled by per-destination split ratios W
  • Start with an initial set of split ratios
  • Estimate the delay to the destination through each candidate output
  • Shift traffic from the more congested port to the less congested port
[Figure: at S, L[N][D] = 20 and L[E][D] = 30; the split ratios shift from W[N][D] = 0.6, W[E][D] = 0.4 to W[N][D] = 0.8, W[E][D] = 0.2]
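The slides do not say how a router turns split ratios into per-packet decisions. One simple possibility, sketched here as an assumption, is to pick each packet's output port at random with probability W[p][D]. Hardware might instead use weighted round-robin or counters; `pick_output` is a hypothetical name:

```python
import random


def pick_output(split, rng=random):
    """split maps candidate port -> W[p][D] (assumed to sum to 1).

    Returns one port, chosen with probability equal to its split ratio.
    """
    ports = list(split)
    return rng.choices(ports, weights=[split[p] for p in ports], k=1)[0]


rng = random.Random(0)
picks = [pick_output({"N": 0.8, "E": 0.2}, rng) for _ in range(10_000)]
print(picks.count("N") / len(picks))  # close to 0.8
```

Over many packets, about 80% leave through North once W[N][D] = 0.8, which matches the shifted split shown on the slide.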

Outline
• Introduction
• Motivation
• Destination-Based Adaptive Routing (DAR)
  • Distributed delay measurement
  • Split ratio adaptation
  • Scaling
• Evaluation

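Distributed Delay Measurement
• A node maintains:
  • Per-destination traffic split ratios through candidate output ports: W[p][j]
  • Delay to the next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p]
• Other notations:
  • A[p][j]: delay update for destination j received from the neighbor on port p
  • L[p][j] = A[p][j] + l[p]: estimated delay to destination j through port p
  • $\mathrm{Avg}_i[j]$: average delay from node i to destination j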
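The per-node state above can be sketched as a small record. The class and field names are hypothetical, and plain dictionaries keyed by (port, destination) stand in for the router's tables:

```python
from dataclasses import dataclass, field


@dataclass
class DarNodeState:
    # l[p]: measured delay to the next-hop router (or ejection interface) via port p
    local_delay: dict = field(default_factory=dict)
    # W[p][j]: fraction of traffic for destination j sent through port p
    split: dict = field(default_factory=dict)     # (p, j) -> ratio
    # A[p][j]: latest delay-to-j update received from the neighbor on port p
    received: dict = field(default_factory=dict)  # (p, j) -> delay

    def delay_via(self, p, j):
        """L[p][j] = A[p][j] + l[p]."""
        return self.received[(p, j)] + self.local_delay[p]

    def avg_delay(self, j, candidates):
        """Avg[j] = sum over candidate ports p of W[p][j] * L[p][j]."""
        return sum(self.split[(p, j)] * self.delay_via(p, j) for p in candidates)


node5 = DarNodeState(
    local_delay={"E": 2, "N": 3},
    split={("E", 10): 0.5, ("N", 10): 0.5},
    received={("E", 10): 10, ("N", 10): 12},
)
print(node5.avg_delay(10, ["E", "N"]))  # 13.5
```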

Distributed Delay Measurement
• Every node estimates the average delay to all other nodes in the network (example: a 4x4 mesh with nodes 0–15 numbered row by row from the bottom-left corner)
• Delay from node 10 to itself: $\mathrm{Avg}_{10}[10] = l_{10}[\mathrm{Ej}]$
• $\mathrm{Avg}_{10}[10]$ is propagated to neighbors 6, 9, 11, and 14
• Nodes 6, 9, 11, and 14 add their local delay to $\mathrm{Avg}_{10}[10]$ to compute their delay to node 10
• For example, at node 9: $L[E][10] = l[E] + \mathrm{Avg}_{10}[10]$, and since East is node 9's only minimal port toward node 10, $\mathrm{Avg}_9[10] = L[E][10]$

Distributed Delay Measurement
• Every node estimates the delay to all other nodes in the network
• Nodes 6, 9, 11, and 14 propagate their estimated delay to node 10 to their upstream neighbors
• For example, node 5 receives two delay updates, from nodes 6 and 9: $A[E][10] = \mathrm{Avg}_6[10]$ and $A[N][10] = \mathrm{Avg}_9[10]$
• Node 5 adds the local link delay to each received update: $L[E][10] = A[E][10] + l[E]$ and $L[N][10] = A[N][10] + l[N]$
• Finally, node 5 computes its average delay to node 10 as $\mathrm{Avg}_5[10] = W[E][10]\,L[E][10] + W[N][10]\,L[N][10]$
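The walkthrough above can be emulated centrally for one destination. This sketch processes nodes in order of increasing hop count from the destination, so each node's neighbors on its minimal ports are already computed. It produces in one sweep the values the hop-by-hop updates would give if delays and split ratios stayed fixed; the real scheme runs distributed and periodically. Names, the 4x4 size, and the even default split are assumptions:

```python
N = 4  # 4x4 mesh, node id = y * N + x (node 0 at the bottom-left corner)
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def candidate_ports(node, dst):
    x, y, dx, dy = node % N, node // N, dst % N, dst // N
    ports = []
    if dx != x:
        ports.append("E" if dx > x else "W")
    if dy != y:
        ports.append("N" if dy > y else "S")
    return ports


def neighbor(node, port):
    sx, sy = STEP[port]
    return (node // N + sy) * N + (node % N + sx)


def delays_to(dst, link_delay, eject_delay, split=None):
    """Avg_i[dst] for every node i.

    link_delay[(i, p)] is l_i[p]; split[(i, p)] is W[p][dst] at node i
    (default: even split over the candidate ports).
    """
    def hops(i):
        return abs(i % N - dst % N) + abs(i // N - dst // N)

    avg = {dst: eject_delay}  # Avg_dst[dst] = l_dst[Ej]
    for i in sorted(range(N * N), key=hops):
        if i == dst:
            continue
        ports = candidate_ports(i, dst)
        total = 0.0
        for p in ports:
            w = split[(i, p)] if split else 1 / len(ports)
            # L[p][dst] = A[p][dst] + l[p], with A[p][dst] = Avg_neighbor[dst]
            total += w * (avg[neighbor(i, p)] + link_delay[(i, p)])
        avg[i] = total
    return avg


links = {(i, p): 1 for i in range(N * N) for p in STEP}
print(delays_to(10, links, eject_delay=2)[5])  # 4.0
```

With unit link delays and an ejection delay of 2, node 5 (two hops from node 10) gets 4.0. Slowing node 5's East link to 5 raises it to 0.5·(3 + 5) + 0.5·(3 + 1) = 6.0.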

Adaptation of Split Ratios
• Objective: equalize the delay on candidate output ports
• If there is only one candidate output, its split ratio is 1
• If there are two candidate outputs:
  • Let $p_h$ be the port with the higher delay to destination $j$
  • Let $p_l$ be the port with the lower delay to destination $j$
  • $W[p_h][j] + W[p_l][j] = 1$
  • Every T cycles, $\Delta$ traffic is shifted from $p_h$ to $p_l$
  • $\Delta \propto (L[p_h][j] - L[p_l][j]) / L[p_h][j]$
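One adaptation step can be sketched as follows. The slides only state that Δ is proportional to the relative delay difference, so the proportionality constant `gain` is a hypothetical parameter. Clamping Δ so that $W[p_h][j]$ never goes negative is also an assumption:

```python
def adapt_split(w, L, j, gain=0.1):
    """One adaptation step for destination j (called every T cycles).

    w[(p, j)]: split ratios (summing to 1); L[(p, j)]: delay estimates.
    gain is a hypothetical proportionality constant.
    """
    ports = [p for (p, d) in w if d == j]
    if len(ports) == 1:
        w[(ports[0], j)] = 1.0  # single candidate output: split ratio is 1
        return w
    p_h, p_l = sorted(ports, key=lambda p: L[(p, j)], reverse=True)
    delta = gain * (L[(p_h, j)] - L[(p_l, j)]) / L[(p_h, j)]
    delta = min(delta, w[(p_h, j)])  # cannot shift more than p_h carries
    w[(p_h, j)] -= delta
    w[(p_l, j)] += delta
    return w


w = {("N", "D"): 0.6, ("E", "D"): 0.4}
L = {("N", "D"): 20, ("E", "D"): 30}
print(adapt_split(w, L, "D"))  # N rises by 1/30, E falls by 1/30
```

Because Δ shrinks as the two delays converge, the split ratios settle where the candidate ports have roughly equal delay, which is the stated objective.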

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion, destination-based congestion • Destination-based congestion estimation does not scale!

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion, scalable destination-based congestion, destination-based congestion

Look-ahead Window
• Node S maintains delay estimates for an M x M window centered at S
• Any node outside the window is mapped to the closest node within the window
• A packet's look-ahead window shifts as the packet is routed from source to destination
[Figure: look-ahead window centered at S; destinations A, B, and C outside the window are mapped to in-window proxies P(A), P(B), and P(C)]
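A minimal sketch of the mapping, assuming "closest" means closest in Manhattan distance and M is odd. Clamping each coordinate of the destination into the window yields that closest node. The clamped coordinate always lies between the source's and the destination's coordinates, so the proxy is a real mesh node even near the mesh edge. `proxy_destination` is a hypothetical name:

```python
def proxy_destination(src, dst, m):
    """Node whose delay estimate S uses for dst, given an m x m window at src.

    In-window destinations map to themselves; others map to the closest
    in-window node, found by clamping each coordinate into the window.
    """
    r = (m - 1) // 2

    def clamp(v, center):
        return max(center - r, min(center + r, v))

    return (clamp(dst[0], src[0]), clamp(dst[1], src[1]))


print(proxy_destination((5, 5), (12, 3), 7))  # (8, 3)
```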

Window Size
• Destination D is guaranteed to be within the window once the packet is within (M-1)/2 hops of D
• Intuition: the packet has (M-1)/2 hops to route around congestion hot spots
• A 7x7 look-ahead window in a 16x16 mesh performs comparably to full DAR (which is equivalent to a 31x31 look-ahead window)
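Both numbers on this slide can be checked directly. If D is within r = (M-1)/2 hops of the packet, then |dx| + |dy| ≤ r, so each offset is at most r and D lies inside the window. Covering an n x n mesh from a corner needs a window of side 2(n-1) + 1, which is 31 for n = 16. The helper names below are illustrative:

```python
def in_window(src, dst, m):
    r = (m - 1) // 2
    return abs(dst[0] - src[0]) <= r and abs(dst[1] - src[1]) <= r


def guarantee_holds(m, span=20):
    """Brute-force check: every dst within (m-1)//2 hops lies in the window."""
    r = (m - 1) // 2
    return all(
        in_window((0, 0), (x, y), m)
        for x in range(-span, span + 1)
        for y in range(-span, span + 1)
        if abs(x) + abs(y) <= r
    )


def full_dar_window(n):
    """Window side needed to cover an n x n mesh from any node (e.g., a corner)."""
    return 2 * (n - 1) + 1


print(guarantee_holds(7), full_dar_window(16))  # True 31
```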

Outline
• Introduction
• Related work
• Destination-Based Adaptive Routing (DAR)
• Evaluation

Experimental Setup
• Compare DAR with RCA-1D, RCA-Quadrant, and local adaptive routing
• SPLASH-2 benchmarks + synthetic traffic patterns (uniform, transpose, shuffle)
• Cycle-accurate NoC simulator modeling a 3-stage router pipeline
  • 8 VCs, each 5 flits deep
  • 1 VC used as an escape VC for deadlock prevention
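The synthetic patterns can be sketched using their common textbook definitions: transpose maps (x, y) to (y, x), and shuffle rotates the node's binary address left by one bit. These definitions are assumptions, since the slides do not define the patterns and the simulator's exact variants may differ. Node ids follow `y * k + x` on a k x k mesh:

```python
import random


def transpose(node, k):
    """(x, y) -> (y, x) on a k x k mesh."""
    x, y = node % k, node // k
    return x * k + y


def shuffle(node, k):
    """Rotate the b-bit node address left by one (k*k must be a power of two)."""
    n = k * k
    if n & (n - 1):
        raise ValueError("shuffle needs a power-of-two node count")
    b = n.bit_length() - 1
    return ((node << 1) | (node >> (b - 1))) & (n - 1)


def uniform(src, k, rng=random):
    """Uniform random destination (src is ignored in this simple variant)."""
    return rng.randrange(k * k)


def random_permutation(k, rng=random):
    """A random permutation pattern: node i sends to dests[i]."""
    dests = list(range(k * k))
    rng.shuffle(dests)
    return dests


print(transpose(1, 8), shuffle(32, 8))  # 8 1
```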

SPLASH-2 results – 7x7 mesh [Chart: highlighted improvement of 41%]

SPLASH-2 results – 7x7 mesh [Chart: highlighted improvement of 65%]

Improvements
• Contended traces (fft, waters, waterns, and lu):
  • DAR outperforms the best RCA algorithm by up to 41% (on the waters trace) and by 21% on average over the four traces.
  • DAR outperforms local adaptive routing by up to 65% (on the waters trace) and by 30% on average over the four traces.
• DAR also outperforms O1TURN by up to 94% and by 32% on average over all eight benchmarks.

Uniform traffic – 8x8 mesh

Transpose traffic – 8x8 mesh

Shuffle traffic – 8x8 mesh

SDAR – 16x16 mesh, 7x7 window
• [Figure: average latency over 100 permutation traffic patterns at 18% injection load]
• [Figure: network saturation statistics at 18% injection load]

Summary
• Destination-Based Adaptive Routing (DAR) for 2D mesh networks
• Scalable DAR (SDAR) uses a look-ahead window and easily scales to large networks
• DAR outperforms existing adaptive (RCA) and oblivious routing
• SDAR achieves comparable performance with significantly lower overhead

Key Implementation Details
• Simple router implementation for moderate-sized networks: low storage, low bandwidth
• Synchronize delay updates to reuse the delay computation and weight adaptation hardware
• Approximate computations to simplify the implementation

Router Architecture

Thank you!!

Destination-Based Adaptive Routing for 2D Mesh Networks (ANCS 2010)

Orthogonal Rendezvous Routing Protocol for Wireless Mesh Networks

Potential-Based Entropy Adaptive Routing for Disruption Tolerant Networks

Analysis of Routing Metrics for Wireless Mesh Networks

Routing Metrics for Wireless Mesh Networks

Multipath Routing in Wireless Mesh Networks

Destination-Based Adaptive Routing for 2D Mesh Networks ANCS 2010

Routing in Mesh Networks

Routing in Wireless Mesh Networks

Routing in Wireless Mesh Networks

Adaptive backup routing for ad-hoc networks

Cross Layer Adaptive Control for Wireless Mesh Networks

Routing Metrics for Wireless Mesh Networks

Routing Metrics and Protocols for Wireless Mesh Networks

A Position-based Deployment and Routing Approach for Directional Wireless Mesh Networks

Cross Layer Adaptive Control for Wireless Mesh Networks