DARD: Distributed Adaptive Routing for Datacenter Networks Xin Wu, Xiaowei Yang
Multiple equal cost paths in DCN
(Figure: fat-tree topology with core, Agg, and ToR layers; src and dst hosts in different pods)
• Scale-out topology -> horizontal expansion -> more paths
Suboptimal scheduling -> hot spot
(Figure: flows src1->dst1 and src2->dst2 scheduled onto a shared, congested link)
• Unavoidable intra-datacenter traffic
• Common services: DNS, search, storage
• Auto-scaling: dynamic application instances
To prevent hot spots
• Distributed
  • ECMP & VL2: flow-level hashing in switches
• Centralized
  • Hedera: compute optimal scheduling in ONE server
Design space:
• Distributed: Robust but Not Efficient
• Centralized: Efficient but Not Robust
Goal: practical, efficient, robust
• Practical: using well-proven technologies
• Efficient: close-to-optimal traffic scheduling
• Robust: no single point of failure
Design space:
• Centralized: Efficient but Not Robust
• Distributed: Robust but Not Efficient
• Distributed: Robust and Efficient
Contributions
• Explore the possibility of distributed yet close-to-optimal flow scheduling in DCNs.
• A working implementation on a testbed.
• A proven upper bound on convergence.
Intuition: minimize the maximum number of flows via a link
(Figure: flows between src1–src3 and dst1–dst3, rescheduled step by step)
• Step 0: maximum # of flows via a link = 3
• Step 1: maximum # of flows via a link = 2
• Step 2: maximum # of flows via a link = 1
Architecture
• Control loop runs on every server independently:
  monitor network states -> compute next scheduling -> change flow's path -> repeat
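To make the loop concrete, here is a minimal Python sketch of the per-server control cycle described on this slide; the three helper functions are placeholders for the monitor/compute/change steps, not DARD's actual API, and the 10-second cycle length is taken from the convergence slide later in the deck.

```python
import time

CONTROL_CYCLE = 10.0  # seconds; the convergence slide suggests one cycle is about 10 s

def monitor_network_states(dst):
    """Placeholder: ask the switches on the paths to dst for per-link flow counts."""
    return {}

def compute_next_move(dst, link_states):
    """Placeholder: return (flow, new_path) if moving one flow helps, else None."""
    return None

def change_flow_path(flow, new_path):
    """Placeholder: rebind the flow to a different src-dst address pair."""

def control_loop(dsts):
    # Runs independently on every server: monitor -> compute -> change path -> repeat.
    while True:
        for dst in dsts:
            states = monitor_network_states(dst)
            move = compute_next_move(dst, states)
            if move is not None:
                change_flow_path(*move)
        time.sleep(CONTROL_CYCLE)
```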
Monitor network states
• src asks switches for the #_of_flows and bandwidth of each link to dst.
• src assembles the link states to identify the most and least congested paths to dst.
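A minimal sketch of how the assembled link states could be turned into the "most and least congested path" choice, assuming a path's congestion is the largest number of flows on any of its links (the metric the intuition slides use); the data layout is illustrative only.

```python
def path_congestion(path, link_flows):
    """Congestion of a path = the largest # of flows on any of its links."""
    return max(link_flows[link] for link in path)

def most_and_least_congested(paths, link_flows):
    """Among the equal-cost paths to one destination, return (p_busy, p_free)."""
    p_busy = max(paths, key=lambda p: path_congestion(p, link_flows))
    p_free = min(paths, key=lambda p: path_congestion(p, link_flows))
    return p_busy, p_free

# Toy example: two equal-cost paths that share no links.
link_flows = {("tor1", "agg1"): 3, ("agg1", "core1"): 1,
              ("tor1", "agg2"): 1, ("agg2", "core2"): 1}
paths = [(("tor1", "agg1"), ("agg1", "core1")),
         (("tor1", "agg2"), ("agg2", "core2"))]
print(most_and_least_congested(paths, link_flows))  # busiest path first
```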
Distributed computation
• Runs on every server:

    for each dst {
        p_busy: the most congested path from src to dst;
        p_free: the least congested path from src to dst;
        if (moving one flow from p_busy to p_free would not create a path more congested than p_busy)
            move one flow from p_busy to p_free;
    }

• The number of steps to convergence is bounded
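A minimal sketch of the per-destination step above, using the same path-congestion metric as the previous sketch. The exact threshold in the if-condition (move only if p_free, after gaining the flow, stays strictly less congested than p_busy was) is my reading of the pseudocode, not a verified reproduction of DARD's condition.

```python
def selfish_reschedule(my_flows_by_path, paths, link_flows):
    """One DARD step for one destination: try to move a single flow
    from the most congested path (p_busy) to the least congested one (p_free)."""
    congestion = {p: max(link_flows[l] for l in p) for p in paths}
    p_busy = max(paths, key=congestion.get)
    p_free = min(paths, key=congestion.get)
    if not my_flows_by_path.get(p_busy):
        return None                       # this server has no flow on the busiest path
    # Move only if p_free, after gaining one flow, is still less congested than p_busy was.
    if congestion[p_free] + 1 < congestion[p_busy]:
        flow = my_flows_by_path[p_busy].pop()
        return flow, p_free
    return None
```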
Change path: using a different src-dst pair
(Figure: fat-tree with core1–core4 owning prefixes 1.0.0.0/8–4.0.0.0/8; agg1 owns 1.1.0.0/16 and 2.1.0.0/16; tor1 owns 1.1.1.0/24–4.1.1.0/24)
• src's addresses: 1.1.1.2, 2.1.1.2, 3.1.1.2, 4.1.1.2
• dst's addresses: 1.2.1.2, 2.2.1.2, 3.2.1.2, 4.2.1.2

agg1's down-hill table (dst -> next hop):
  1.1.1.0/24 -> tor1
  1.1.2.0/24 -> tor2
  2.1.1.0/24 -> tor1
  2.1.2.0/24 -> tor2

agg1's up-hill table (src -> next hop):
  1.0.0.0/8 -> core1
  2.0.0.0/8 -> core2

• A src-dst address pair uniquely encodes a path
• Static forwarding tables
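As a sketch of what "change path = change address pair" could look like on an end host. The mapping below, where the k-th source address paired with the k-th destination address selects the tree rooted at core k, is my reading of the matching first octets in the figure, not something stated explicitly on the slide.

```python
# Hypothetical address lists for one src/dst host pair, taken from the figure.
SRC_ADDRS = ["1.1.1.2", "2.1.1.2", "3.1.1.2", "4.1.1.2"]
DST_ADDRS = ["1.2.1.2", "2.2.1.2", "3.2.1.2", "4.2.1.2"]

def address_pair_for_core(k):
    """Pick the src-dst address pair that (under this sketch's assumption)
    encodes the path through core k (1-based)."""
    return SRC_ADDRS[k - 1], DST_ADDRS[k - 1]

# Moving a flow to another path = re-sending it with a different address pair.
print(address_pair_for_core(1))  # ('1.1.1.2', '1.2.1.2')
print(address_pair_for_core(3))  # ('3.1.1.2', '3.2.1.2')
```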
Forwarding example: E2 -> E1
(Figure: same topology and agg1 tables as the previous slide; E1 = 1.1.1.2, E2 = 1.2.1.2)
• Packet header: src: 1.2.1.2, dst: 1.1.1.2
• At agg1, the down-hill table matches dst 1.1.1.2 against 1.1.1.0/24, so the packet is forwarded to tor1.
Forwarding example: E1 -> E2
(Figure: same topology and agg1 tables as the previous slide)
• Packet header: src: 1.1.1.2, dst: 1.2.1.2
• At agg1, no down-hill entry matches dst 1.2.1.2, so the up-hill table matches src 1.1.1.2 against 1.0.0.0/8 and the packet is forwarded to core1.
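The two forwarding examples suggest a two-stage lookup: try the down-hill table first (longest-prefix match on the destination), and only if nothing matches fall back to the up-hill table (match on the source). Below is a minimal sketch using agg1's tables from the slides; in the real system these are static OpenFlow rules installed in the switch, not end-host Python.

```python
import ipaddress

# agg1's static tables from the slides: down-hill matches on dst, up-hill on src.
DOWN_HILL = {"1.1.1.0/24": "tor1", "1.1.2.0/24": "tor2",
             "2.1.1.0/24": "tor1", "2.1.2.0/24": "tor2"}
UP_HILL = {"1.0.0.0/8": "core1", "2.0.0.0/8": "core2"}

def lpm(addr, table):
    """Longest-prefix match of addr against a prefix -> next-hop table."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in table if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return table[max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)]

def forward(src, dst):
    """Down-hill table first (match on dst); otherwise up-hill table (match on src)."""
    return lpm(dst, DOWN_HILL) or lpm(src, UP_HILL)

print(forward("1.2.1.2", "1.1.1.2"))  # E2 -> E1: down-hill match -> 'tor1'
print(forward("1.1.1.2", "1.2.1.2"))  # E1 -> E2: no down-hill match -> up-hill -> 'core1'
```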
Randomness: prevent path oscillation • Add a random time interval to the control cycle
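A minimal sketch of the jittered control cycle; the base interval and jitter range below are illustrative placeholders, not DARD's actual parameters.

```python
import random
import time

BASE_CYCLE = 10.0   # seconds; illustrative base control interval
MAX_JITTER = 5.0    # seconds; illustrative random extra delay

def sleep_until_next_cycle():
    """Desynchronize servers: each waits the base interval plus a random extra delay,
    so they do not all reschedule flows at the same instant and chase each other's moves."""
    time.sleep(BASE_CYCLE + random.uniform(0, MAX_JITTER))
```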
Implementation
• DeterLab testbed
  • 16-end-host fat-tree
  • Monitoring: OpenFlow API
  • Computation: daemon on end hosts
  • One NIC, multiple addresses: IP alias
  • Static routes: OpenFlow forwarding table
  • Multipath: IP-in-IP encapsulation
• ns-2 simulator
  • For different & larger topologies
DARD fully utilizes the bisection bandwidth
• Simulation, 1024-end-host fat-tree
• pVLB: periodic flow-level VLB
(Chart: bisection bandwidth (Gbps) achieved under different traffic patterns)
DARD improves large file transfer time
• Testbed, 16-end-host fat-tree
(Chart: DARD vs. ECMP improvement as a function of # of new files per second, for inter-pod-dominant, intra-pod-dominant, and random traffic)
DARD converges in 2~3 control cycles
• Simulation, 1024-end-host fat-tree, static traffic patterns
• One control cycle ≈ 10 seconds
(Chart: convergence time (seconds) for inter-pod-dominant, intra-pod-dominant, and random traffic)
Randomness prevents path oscillation
• Simulation, 128-end-host fat-tree
(Chart: number of times a flow switches its paths, for inter-pod-dominant, intra-pod-dominant, and random traffic)
DARD's control overhead is bounded by the topology
• control_traffic = #_of_servers x #_of_switches
• Simulation, 128-end-host fat-tree
(Chart: control traffic (MB/s) of DARD vs. Hedera as the # of simultaneous flows grows)
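As a rough illustration of the bound above (my arithmetic, not a number from the slides): a 128-end-host fat-tree is built from 8-port switches, giving 16 core + 32 aggregation + 32 ToR = 80 switches, so each control cycle sends at most 128 x 80 = 10,240 server-to-switch queries, regardless of how many flows are active.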
Conclusion
• DARD: Distributed Adaptive Routing for Datacenters
• Practical: well-proven end-host-based technologies
• Efficient: close-to-optimal traffic scheduling
• Robust: no single point of failure
(Control loop: monitor network states -> compute next scheduling -> change flow's path)
Thank You! Questions and comments: xinwu@cs.duke.edu