280 likes | 473 Views
Indirect Adaptive Routing on Large Scale Interconnection Networks. Nan Jiang, William J. Dally Computer System Laboratory Stanford University. John Kim Korean Advanced Institute of Science and Technology. Overview. Indirect adaptive routing (IAR)
E N D
Indirect Adaptive Routing on Large Scale Interconnection Networks Nan Jiang, William J. Dally Computer System Laboratory Stanford University John Kim Korean Advanced Institute of Science and Technology
Overview Indirect adaptive routing (IAR) Allow adaptive routing decision to be based on local and remote congestion information Main contributions Three new IAR algorithms for large scale networks Steady state and transient performance evaluations Impact of network configurations Cost of implementation
Presentation Outline Background The dragonfly network Adaptive routing Indirect adaptive routing algorithms Performance results Implementation considerations
Global Network … Group 0 Group 1 Group 2 p 0 … … … Router 0 Router 1 Router 2 p 1 … Local Network The Dragonfly Network • High Radix Network • High radix routers • Small network diameter • Each router • Three types of channels • Directly connected to a few other groups • Each group • Organized by a local network • Large number of global channels (GC) • Large network with a global diameter of one
Congestion Routing on the Dragonfly • Minimal Routing (MIN) • Source local network • Global network • Destination local network • Some Adversarial traffic congests the global channels • Each group i sends all packets to group i+1 • Oblivious solution: Valiant’s Algorithm (VAL) • Poor performance on benign traffic … Group 0 Group 1 Group 2 p 0 … … … Router 0 Router 1 Router 2 p 1 …
q 0 q 1 Congestion Adaptive Routing • Choose between the MIN path and a VAL path at the packet source [Singh'05] • Decision metric: path delay • Delay: product of path distance and path queue depth • Measuring path queue length is unrealistic • Use local queues length to approximate path • Require stiff backpressure MIN VAL GC GC q 2 q 3 Source Router
Adaptive Routing: Worst Case Traffic 450 400 350 300 Packet Latency (Simulation cycles) 250 200 Valiant’s 150 Minimal Adaptive 100 0 0.1 0.2 0.3 0.4 0.5 Throughput (Flit Injection Rate)
Indirect Adaptive Routing Improve routing decision through remote congestion information Previous method: Credit round trip [Kim et. al ISCA’08] Three new methods: Reservation Piggyback Progressive
Congestion Credits Delayed Credits Credit Round Trip (CRT) • Delay the return of local credits to the congested router • Creates the illusion of stiffer backpressure • Drawbacks • Remote congestion is still inferred through local queues • Information not up to date MIN VAL GC GC Source Router • [Kim et. al ISCA’08]
Reservation (RES) Each global channel track the number of incoming MIN packets Injected packets creates a reservation flit Routing decision based on the reservation outcome Drawbacks Reservation flit flooding Reservation delay RES Failed RES Flit MIN VAL GC GC Congestion Source Router
Piggyback (PB) Local congestion broadcast Piggybacking on each packet Send on idle channels Congestion data compression Drawbacks Consumes extra bandwidth Congestion information not up to date (broadcast delay) GC GC Free Busy MIN VAL GC GC Congestion Source Router
Progressive (PAR) • MIN routing decisions at the source are not final • VAL decisions are final • Switch to VAL when encountering congestion • Draw backs • Need an additional virtual channel to avoid deadlock • Add extra hops MIN VAL GC GC Congestion Source Router
Experimental Setup Fully connected local and global networks 33 groups 1,056 nodes 10 cycle local channel latency 100 cycle global channel latency 10-flit packets
Steady State Traffic: Uniform Random 300 Piggyback 280 Credit Round Trip Progressive 260 Reservation Minimal 240 220 Packet Latency (Simulation cycles) 200 180 160 140 120 100 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Throughput (Flit Injection Rate)
Steady State Traffic: Worst Case 450 Piggyback Credit Round Trip 400 Progressive Reservation Valiant’s 350 300 Packet Latency (Simulation cycles) 250 200 150 100 0 0.1 0.2 0.3 0.4 0.5 Throughput (Flit Injection Rate)
Transient Traffic: Uniform Random to Worst Case Progressive Progressive Piggyback Piggyback Average Packet Latency per Cycle - UR to WC 500 400 Packet Latency 300 200 100 0 20 40 60 80 100 Cycles After Transition % Packets Routing Non-minimally per Cycle - UR to WC 100 % of Packets Routing Nonminimally 50 0 0 20 40 60 80 100 Cycles After Transition
Network Configuration Considerations Packet size RES requires long packets to amortize reservation flit cost Routing decision is done on per packet basis Channel latency Affects information delay (CRT, PB) Affects packet delay (PAR, RES) Network size Affects information bandwidth overhead (RES, PB) Global diameter greater than one Need to exchange congestion information on the global network
Cost Considerations Credit round trip Credit delay tracker for every local channel Reservation Reservation counter for every global channel Additional buffering at the injection port to store packets waiting for reservation Piggyback Global channel lookup table for every router Increase in packet size Progressive Extra virtual channel for deadlock avoidance
Conclusion • Three new indirect adaptive routing algorithms for large scale networks • Performance and design evaluation of the algorithms • Best Algorithm? • Piggyback performed the best under steady state traffic • Progressive responded fastest to transient changes • Network configurations will affect some algorithm performance • Cost of implementation
Thank You! Questions?
300 VAL 280 MIN Adaptive 260 240 220 200 Packet Latency - Simulation cycles 180 160 140 120 100 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Throughput - Flit Injection Rate Adaptive Routing: Uniform Traffic
1000 Random Permutation Traffic PB CRT PAR 30 30 30 25 25 25 20 20 20 % of 1K Permutations % of 1K Permutations % of 1K Permutations 15 15 15 10 10 10 5 5 5 0 0 0 200 300 200 300 200 300 Packet Latency Packet Latency Packet Latency RES VAL 30 30 25 25 20 20 % of 1K Permutations % of 1K Permutations 15 15 10 10 5 5 0 0 200 300 200 300 Packet Latency Packet Latency
Effect of Packet size on RES: Worst Case Traffic 550 500 450 400 350 300 Latency - Simulation cycles 250 200 150 1 Flit 100 2 Flits 4 Flits 50 8 Flits 0 0 0.1 0.2 0.3 0.4 0.5 Throughput - Flit Injection Rate
Large local network: Uniform Random 400 350 300 250 200 Packet Latency - Simulation cycles 150 PB 100 CRT MIN 50 PAR RES 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Throughput - Flit Injection Rate
Large local network: Worst Case 600 500 400 Packet Latency - Simulation cycles 300 200 PB CRT 100 PAR RES VAL 0 0 0.1 0.2 0.3 0.4 0.5 Throughput - Flit Injection Rate