
Data Center Routing – Traffic Engineering

Data Center Routing – Traffic Engineering. Yao Lu Rui Zhang. ECE 260C VLSI Advanced Topics. Outline. What is routing/traditional routing algorithm What is data center Difference between data center and the Internet Some Recent work in data center TE Open questions/proposals.


Presentation Transcript


  1. Data Center Routing – Traffic Engineering Yao Lu Rui Zhang ECE 260C VLSI Advanced Topics

  2. Outline • What is routing / traditional routing algorithms • What is a data center • Differences between a data center and the Internet • Some recent work in data center TE • Open questions/proposals

  3. What is routing

  4. Traditional routing algorithms • RIP (Routing Information Protocol) • IGRP (Interior Gateway Routing Protocol) • EIGRP (Enhanced Interior Gateway Routing Protocol) • OSPF (Open Shortest Path First) • IS-IS (Intermediate System-to-Intermediate System) • BGP (Border Gateway Protocol)

  5. What is a data center • As of 2013, about 40% of total Internet traffic goes to Google [1]

  6. Differences between a data center and the Internet • Design goals • Latency, reliability, throughput, energy, etc. • Properties • Well-structured topology • Movability of the locations of sources and destinations • Global knowledge of the whole data center network

  7. Recent work • Equal-Cost Multi-Path (ECMP)[7] • Valiant Load Balancing (VLB)[6] • CamCube[5] • Hedera[8] • Joint VM Placement and Routing (JVMPR)[4]

  8. ECMP • Many equal-cost paths going up to the core switches • Only one path down from each core switch • Randomly allocate paths to flows using a hash of the flow
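The hash-based path choice on this slide can be sketched as follows. This is a minimal illustration, not a switch implementation: the function name `ecmp_pick_path`, the use of SHA-256, and the example addresses and path labels are all assumptions made for the sketch (real switches typically use a much cheaper hardware hash).

```python
import hashlib

def ecmp_pick_path(flow, paths):
    """Map a flow to one equal-cost path by hashing its header fields."""
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 4 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]

# Hypothetical uplinks toward four core switches.
core_paths = ["core-1", "core-2", "core-3", "core-4"]
# (src IP, dst IP, protocol, src port, dst port): illustrative values only.
flow = ("10.0.1.2", "10.2.0.3", 6, 40000, 80)
chosen = ecmp_pick_path(flow, core_paths)
print(chosen)
```

Because the choice depends only on the flow's header fields, every packet of a flow follows the same path, but two large flows can still hash onto the same path regardless of their size.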

  9. VLB • Goal • Guarantee equal-spread load-balancing in a mesh network • Method • Bouncing individual packets from a source switch in the mesh off of randomly chosen intermediate “core” switches, which finally forward those packets to their destination switch.
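A minimal sketch of the per-packet bounce described above, assuming a uniform random choice among intermediate switches. The switch names and the function `vlb_route` are hypothetical and only serve the illustration.

```python
import random

def vlb_route(src, dst, core_switches, rng=random):
    """Return the path of one packet: src -> random intermediate -> dst."""
    intermediate = rng.choice(core_switches)
    return [src, intermediate, dst]

rng = random.Random(0)  # seeded only to make the example repeatable
cores = ["core-1", "core-2", "core-3"]
paths = [vlb_route("edge-A", "edge-B", cores, rng) for _ in range(3000)]

# Count how many packets were bounced off each intermediate switch.
counts = {c: sum(1 for p in paths if p[1] == c) for c in cores}
print(counts)
```

With enough packets the counts come out roughly equal, which is the equal-spread property the slide refers to.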

  10. CamCube • 3D torus topology • Offers the CamCube API • Lets each service/application design its own routing protocol • Core services • Basic routing algorithm • Link state-based protocol

  11. Hedera (pipeline: Detect Large Flows → Estimate Flow Demands → Place Flows) • Detect Large Flows • Flows that need bandwidth but are network-limited • Estimate Flow Demands • Use max-min fairness to allocate flows between src-dst pairs • Place Flows • Use estimated demands to heuristically find a better placement of large flows on the ECMP paths

  12. Hedera • Large Flow Detection • Scheduler continually polls edge switches for flow byte-counts • Flows exceeding a B/s threshold are “large” • > 10% of a host’s link capacity (i.e. > 100 Mbps on a 1 Gbps link)
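The threshold test above can be sketched as a comparison of two successive byte-count polls. The function name, the dictionary layout of the counters, and the 5-second polling interval are assumptions for illustration; only the 10% threshold and the 100 Mbps figure come from the slide.

```python
def detect_large_flows(prev_bytes, curr_bytes, interval_s,
                       link_bps=1_000_000_000, fraction=0.10):
    """Return IDs of flows whose rate exceeds `fraction` of the host link."""
    threshold_bps = fraction * link_bps  # 100 Mbps for a 1 Gbps link
    large = []
    for flow_id, count in curr_bytes.items():
        delta_bytes = count - prev_bytes.get(flow_id, 0)
        rate_bps = 8 * delta_bytes / interval_s  # bytes -> bits per second
        if rate_bps > threshold_bps:
            large.append(flow_id)
    return large

# Two polls taken 5 seconds apart (hypothetical counters).
prev = {"f1": 0, "f2": 0}
curr = {"f1": 100_000_000, "f2": 10_000_000}
large_flows = detect_large_flows(prev, curr, interval_s=5.0)
print(large_flows)  # f1 runs at 160 Mbps, f2 at 16 Mbps
```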

  13. Hedera • Demand Estimation • Goal • Estimate available bandwidth to allocate • Method • Using max-min fairness, given the traffic matrix of large flows, modify each flow’s size at its source and destination iteratively… • Senders equally distribute bandwidth among outgoing flows that are not receiver-limited • Network-limited receivers decrease exceeded capacity equally among incoming flows • Repeat until all flows converge
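The sender/receiver iteration described above can be sketched as below. This is a simplified reading of the procedure, not the authors' code: demands are expressed as fractions of a host's link capacity (1.0 = full link), and the function name, data layout, and iteration cap are assumptions.

```python
def estimate_demands(flows, max_iters=100):
    """flows: list of (src, dst). Returns per-flow demand as a link fraction."""
    demand = [0.0] * len(flows)
    converged = [False] * len(flows)  # True once a receiver has capped the flow
    for _ in range(max_iters):
        before = list(demand)
        # Sender step: split leftover capacity equally among uncapped flows.
        for src in sorted({s for s, _ in flows}):
            idx = [i for i, (s, _) in enumerate(flows) if s == src]
            fixed = sum(demand[i] for i in idx if converged[i])
            free = [i for i in idx if not converged[i]]
            for i in free:
                demand[i] = (1.0 - fixed) / len(free)
        # Receiver step: if oversubscribed, cap the largest flows equally.
        for dst in sorted({d for _, d in flows}):
            idx = [i for i, (_, d) in enumerate(flows) if d == dst]
            if sum(demand[i] for i in idx) <= 1.0 + 1e-12:
                continue
            limited, small_total = set(idx), 0.0
            while True:
                share = (1.0 - small_total) / len(limited)
                small = [i for i in limited if demand[i] < share]
                if not small:
                    break
                for i in small:  # flows below the fair share keep their demand
                    small_total += demand[i]
                    limited.discard(i)
            for i in limited:
                demand[i] = share
                converged[i] = True
        if all(abs(a - b) < 1e-9 for a, b in zip(demand, before)):
            break
    return demand

# Hypothetical example: A sends to X and Y; B and C both send to Y.
flows = [("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")]
demands = estimate_demands(flows)
print(demands)
```

In this example receiver Y is oversubscribed, so its three incoming flows are capped at 1/3 each, and sender A then gives the remaining 2/3 of its link to the flow toward X.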

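  14. Hedera • Figure: demand estimation example with hosts A, X, B, Y, C (Senders step)

  15. Hedera • Figure: demand estimation example with hosts A, X, B, Y, C (Receivers step)

  16. Hedera • Figure: demand estimation example with hosts A, X, B, Y, C (Senders step)

  17. Hedera • Figure: demand estimation example with hosts A, X, B, Y, C (Receivers step)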

  18. Hedera • Flow Placement • Goal • Find a good allocation of paths for the set of large flows, such that the average bisection bandwidth of the flows is maximized • Method • Global First Fit: • Greedily choose path that has sufficient unreserved b/w • Simulated Annealing: • Iteratively find a globally better mapping of paths to flows

  19. Hedera • Global First Fit • When a new flow is detected, linearly search all possible paths from S to D • Place the flow on the first path whose component links can all fit that flow
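The linear search described on this slide can be sketched as below. The link names, the capacity table, and the function `global_first_fit` are hypothetical; demands and capacities are given as fractions of a link.

```python
def global_first_fit(demand, paths, reserved, capacity):
    """Reserve `demand` on the first path whose links all have room."""
    for path in paths:
        fits = all(reserved.get(link, 0.0) + demand <= capacity[link]
                   for link in path)
        if fits:
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + demand
            return path
    return None  # no path can carry the flow; the caller leaves it unplaced

# Two candidate paths from S to D, each a tuple of links (illustrative names).
paths = [("S-a1", "a1-c1", "c1-D"), ("S-a2", "a2-c2", "c2-D")]
capacity = {link: 1.0 for path in paths for link in path}
reserved = {"a1-c1": 0.8}  # an earlier flow already occupies this link

first = global_first_fit(0.5, paths, reserved, capacity)   # skips path 0
second = global_first_fit(0.1, paths, reserved, capacity)  # fits on path 0
third = global_first_fit(0.9, paths, reserved, capacity)   # fits nowhere
print(first, second, third)
```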

  20. Hedera • Simulated Annealing • 4 specifications • State space • Neighboring states • Energy • Temperature • Simple example: Minimizing f(x)

  21. Hedera • State: All possible mappings of flows to paths • Constrained to reduce state space size • Flows to a destination constrained to use the same core • Neighbor State: Swap paths between 2 hosts • Within the same pod • Function/Energy: Total exceeded b/w capacity • Using the estimated demand of flows • Minimize the exceeded capacity • Temperature: Iterations left • Fixed number of iterations (1000s)
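The four ingredients listed on this slide can be put together in a small sketch. It is heavily simplified and rests on several assumptions: each destination host is mapped to one core, each core has a single capacity value, the same-pod restriction on swaps is ignored, and the acceptance rule with its `scale` constant is a generic simulated-annealing choice rather than something taken from the slide.

```python
import math
import random

def energy(assignment, flows, core_capacity):
    """Total demand exceeding capacity, summed over all cores."""
    load = {}
    for dst, demand in flows:
        core = assignment[dst]
        load[core] = load.get(core, 0.0) + demand
    return sum(max(0.0, used - core_capacity) for used in load.values())

def anneal(assignment, flows, core_capacity, iterations=1000,
           scale=100.0, rng=random):
    """Search host->core mappings; temperature = iterations left."""
    current = dict(assignment)
    e_cur = energy(current, flows, core_capacity)
    best, e_best = dict(current), e_cur
    hosts = sorted(current)
    for temperature in range(iterations, 0, -1):
        a, b = rng.sample(hosts, 2)
        neighbor = dict(current)
        neighbor[a], neighbor[b] = current[b], current[a]  # swap two hosts
        e_new = energy(neighbor, flows, core_capacity)
        # Always accept improvements; accept worse states with a probability
        # that shrinks as the temperature drops (assumed acceptance rule).
        accept = e_new < e_cur or rng.random() < math.exp(
            scale * (e_cur - e_new) / temperature)
        if accept:
            current, e_cur = neighbor, e_new
            if e_cur < e_best:
                best, e_best = dict(current), e_cur
    return best, e_best

# Hypothetical input: two heavy destinations start on the same core.
initial = {"h1": 0, "h2": 0, "h3": 1, "h4": 1}
flows = [("h1", 0.8), ("h2", 0.8), ("h3", 0.1), ("h4", 0.1)]
e_initial = energy(initial, flows, core_capacity=1.0)
best, e_best = anneal(initial, flows, core_capacity=1.0,
                      rng=random.Random(1))
print(e_initial, e_best, best)
```

Tracking the best state seen so far means the returned mapping is never worse than the starting one, even though the search itself may pass through worse states.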

  22. Hedera

  23. JVMPR • Joint VM Placement and Routing • Goal: Efficient traffic engineering under dynamic arrivals and departures of jobs • One method: Localizing traffic by flexible VM placement (affects node utilization) • Another method: Avoiding congestion by intelligent routing (affects link utilization) • The two are coupled with each other

  24. JVMPR • Figure 1 (legend: existing VM; VM we need to add) • The left structure shows the existing VMs and traffic • The middle structure is a good VM placement with high congestion • The right structure is a worse placement with lower congestion

  25. JVMPR • JVMPR considers placement and routing at the same time • It develops an approximation algorithm that leverages the specific structure of the joint design problem

  26. JVMPR • Placement and Route Selection • Placement: The feasible decision space for VM placement is [formula not preserved] • Routing: The feasible decision space for routing is [formula not preserved]

  27. JVMPR • Optimize Resource Utilization • cost_net: Network cost • Measures the congestion • cost_node: Node cost • Operating cost induced by a switch or a machine • Goal: Minimize the total cost

  28. JVMPR • Any problem? • Yes! • The number of jobs is not fixed • Jobs enter or depart the system dynamically • Better way: Online solution • Extend the static problem setting to a dynamic environment • Key idea: Perform local re-optimization

  29. JVMPR • Online solution algorithm • Upon a new job arrival, assign the new job to one configuration according to the transition probability • Upon a job departure, pick one job and migrate it to new machines according to the transition probability
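The slide does not give the transition probability itself. Purely as an illustration of "assign according to a probability", the sketch below assumes a rule in which cheaper configurations are exponentially more likely, with a hypothetical sharpness parameter `beta`. The configuration names and cost values are made up.

```python
import math
import random

def transition_probabilities(costs, beta=1.0):
    """ASSUMED rule: probability proportional to exp(-beta * cost)."""
    low = min(costs.values())  # subtract the minimum for numerical stability
    weights = {name: math.exp(-beta * (cost - low))
               for name, cost in costs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def pick_configuration(costs, beta=1.0, rng=random):
    """Sample one configuration for an arriving job."""
    probs = transition_probabilities(costs, beta)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

# Hypothetical candidate placements for a new job and their total costs.
costs = {"rack-local": 1.0, "cross-pod": 3.0}
probs = transition_probabilities(costs)
choice = pick_configuration(costs, rng=random.Random(0))
print(probs, choice)
```

Under this assumed rule a larger `beta` concentrates the choice on the cheapest configuration, while a smaller one keeps more randomness in the placement.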

  30. JVMPR • Why is the dynamic JVMPR solution appealing? • No VM migrations are required when new jobs arrive, and at most one job migration is required when a job departs • The computation of the migration probability only requires local information

  31. JVMPR • Fig.: Performance comparison (axes: max core switch utilization; percentage of elephant flows)

  32. JVMPR • What is the price we pay for it? • The approximated Markov chain no longer converges to the exact stationary distribution • But to a neighborhood around it • Needs a lot of computation

  33. Summary

  34. Summary

  35. Open questions/proposals • Imperfections of current algorithms • Hedera • Large flow detection is too simple • Demand estimation only considers TCP flows • JVMPR • Demands a lot of computation • It is an approximation • Do not take full advantage of the nice features of the data center • Combine topology, movability and VM placement together • Add VM placement consideration into Hedera

  36. Reference
    [1] http://www.forbes.com/sites/timworstall/2013/08/17/fascinating-number-google-is-now-40-of-the-internet/
    [2] Moy, John T. OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley Professional, 1998.
    [3] Chen, Kai, Chengchen Hu, Xin Zhang, Kai Zheng, Yan Chen, and Athanasios V. Vasilakos. "Survey on routing in data centers: insights and future directions." IEEE Network 25, no. 4 (2011): 6-10.
    [4] Jiang, Joe Wenjie, Tian Lan, Sangtae Ha, Minghua Chen, and Mung Chiang. "Joint VM placement and routing for data center traffic engineering." In INFOCOM, 2012 Proceedings IEEE, pp. 2876-2880. IEEE, 2012.
    [5] Abu-Libdeh, Hussam, Paolo Costa, Antony Rowstron, Greg O'Shea, and Austin Donnelly. "Symbiotic routing in future data centers." ACM SIGCOMM Computer Communication Review 41, no. 4 (2011): 51-62.
    [6] Farrington, Nathan, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat. "Helios: a hybrid electrical/optical switch architecture for modular data centers." ACM SIGCOMM Computer Communication Review 41, no. 4 (2011): 339-350.
    [7] Hopps, Christian E. "Analysis of an equal-cost multi-path algorithm." RFC 2992 (2000).
    [8] Al-Fares, Mohammad, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. "Hedera: Dynamic Flow Scheduling for Data Center Networks." In NSDI, vol. 10, pp. 19-19. 2010.

  37. Thank you!
