1 / 35

Responsive Yet Stable Traffic Engineering

Responsive Yet Stable Traffic Engineering . Srikanth Kandula. Dina Katabi, Bruce Davie, and Anna Charny. ISPs needs to map traffic to underlying topology. Good Mapping  Good Performance & Low Cost. Good mapping requires Load Balancing. Egress. Ingress. 100%.

Leo
Download Presentation

Responsive Yet Stable Traffic Engineering

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Responsive Yet Stable Traffic Engineering Srikanth Kandula Dina Katabi, Bruce Davie, and Anna Charny

  2. ISPs needs to map traffic to underlying topology • Good Mapping  Good Performance & Low Cost • Good mapping requires Load Balancing Egress Ingress 100%

  3. More, ISPs want to re-balance load when an unexpected event causes congestion Egress Ingress 100%

  4. Traffic Change More, ISPs want to re-balance load when an unexpected event causes congestion  failure, BGP reroute, flash crowd, or attack Egress Ingress 100%

  5. Traffic Change More, ISPs want to re-balance load when an unexpected event causes congestion failure, BGP reroute, flash crowd, or attack Move Traffic Egress Ingress 100%

  6. But, rebalancing load in realtime is risky Traffic Change Ingress2 Ingress1 • Need to rebalance load ASAP • Remove congestion before it affects user’s performance • But, moving quickly  may overshoot  congestion on a different path  more drops … Congestion!

  7. But, rebalancing load in realtime is risky Traffic Change • Need to rebalance load ASAP • Remove congestion before it affects user’s performance • But, moving quickly  may overshoot  congestion on a different path  more drops … Ingress2 Congestion! Ingress1

  8. But, rebalancing load in realtime is risky • Need to rebalance load ASAP • Remove congestion before it affects user’s performance • But, moving quickly  may overshoot  congestion on a different path  more drops … • Problem: How to make Traffic Engineering: • Responsive: reacts ASAP • Stable: converges to balanced load without overshooting or generating new congestion

  9. Long-Term Demands OSPF-TE Link Weights Current Approaches Offline TE (e.g., OSPF-TE) • Avoids the risk of instability caused by realtime adaptation, but also misses the benefits • Balances the load in steady state • Deal with failures and change in demands by computing routes that work under most conditions • Overprovision for unanticipated events Online TE (e.g., MATE) • Try to adapt to unanticipated events • But, can overshoot causing drops and instability

  10. This Talk • TeXCP: Responsive & Stable Online TE • Idea: • Use adaptive load balancing • But add explicit-feedback congestion control to prevent overshoot and drops • TeXCP keeps utilization always within a few percent of optimal • Compare to MATE and OSPF-TE, showing that TeXCP outperforms both

  11. Typical Formalization of the TE Problem Find a routing that: Min Max-Utilization • Removes hot spots and balances load • High Max-Utilization is an indicator that the ISP should upgrade its infrastructure

  12. Online TE involves solving 2 sub-problems • Find the traffic split that minimizes the Max-Utilization • Converge to the balanced traffic splits in a stable manner Also, an implementation mechanism to force traffic to follow the desired splits

  13. Force traffic along the right paths Solution: Egress Ingress TeXCP Agent Implementation: • A TeXCP agent per IE, at ingress node • ISP configures each TeXCP agent with paths between IE • Paths are pinned (e.g., MPLS tunnels)

  14. TeXCP Load Balancer Solution: Distributedly, TeXCP agents find balanced traffic splits Sub-Problem: • Periodically, TeXCP agent probes a path for its utilization U1 = 0.4 U2 = 0.7 Egress Ingress x Probes follow the slow path like ICMP messages

  15. Distributedly, TeXCP agents find balanced traffic splits Sub-Problem: TeXCP Load Balancer Solution: • Periodically, TeXCP agent probes a path for its utilization • A TeXCP agent iteratively moves traffic from over-utilized paths to under-utilized paths • rp is this agent’s traffic on path p • Deal with different path capacity • Deal with inactive paths (rp =0)

  16. Distributedly, TeXCP agents find balanced traffic splits Sub-Problem: TeXCP Load Balancer Solution: • Periodically, TeXCP agent probes a path for its utilization • A TeXCP agent iteratively moves traffic from over-utilized paths to under-utilized paths • rp is this agent’s traffic on path p • Deal with different path capacity • Deal with inactive paths (rp =0) Proof in paper

  17. Converge to balanced load in a stable way Use Experience from Congestion Control (XCP) Solution: Sub-Problem: Congestion Control • Flow from sender to receiver • Senders share the bottleneck; need coordination to prevent oscillations Online TE • Flow from ingress to egress • TeXCP agents share physical link; need coordination to prevent oscillations Move in really small increments  No Overshoot! Challenge is to move traffic quickly w/o overshoot

  18. Ingress-Egress Demands Analogous to an Application Load Balancer Path Controller Path Controller Congestion Management layer Congestion Management Layer between Load Balancer and Data Plane • Set of light-weight per-path congestion controllers Unlike prior online TE, Load Balancer can push a decision to the data plane only as fast as the Congestion Management Layer allows it

  19. Utilization Feedback Ingress Egress Probe Per-Path Light-Weight Congestion Controller • Explicit feedback from core routers (like XCP) • Periodically, collects feedback in ICMP-like probes

  20. U = .8 F = 100kbps Ingress Egress Probe Per-Path Light-Weight Congestion Controller • Explicit feedback from core routers (like XCP) • Periodically, collects feedback in ICMP-like probes

  21. U = .8 F = 100kbps Ingress Egress Probe Per-Path Light-Weight Congestion Controller • Explicit feedback from core routers (like XCP) • Periodically, collects feedback in ICMP-like probes U = .2 F = 500kbps

  22. Ingress Egress Per-Path Light-Weight Congestion Controller • Explicit feedback from core routers (like XCP) • Periodically, collects feedback in ICMP-like probes • Core router computes aggregate feedback Δ = Spare BW – Queue / Max-RTT • Estimates number of IE-flows by counting probes, and divides feedback between them Occasional explicit feedback in probes… Need software changes only

  23. Stability Idea Ingress Path Controller Theorem 1: Given a particular load split, the path controller stabilizes the traffic on each link Theorem 2: Given stable path controllers, • Every TeXCP agent sees balanced load on all paths • Unused paths have higher utilization than used paths Load Balancer Egress Path Controller Per-path controller works at a faster timescale than load balancer  Can decouple components  Stabilizeseparately Informally stated:

  24. Performance

  25. Simulation Setup Standard for TE • Rocketfuel topologies • Average demands follow gravity model • IE-traffic consists of large # of Pareto on-off sources TeXCP Parameters: • Each agent is configured with 10 shortest paths • Probe for explicit feedback every 0.1s • Load balancer re-computes a split every 0.5s Compare to Optimal Max-Utilization • Obtained with a centralized oracle that has Immediate and exact demands info, and uses as many paths as necessary

  26. TeXCP Balances Load Without Oscillations Maximum Link Utilization Time (s) TeXCP converges to a few percent of optimal

  27. TeXCP Balances Load Without Oscillations Link Utilization Time (s) Utilizations of all links in the network change without oscillations

  28. Egress 1 Ingress 1 L1 L2 Egress 2 Ingress 2 L3 Egress 3 Ingress 3 Comparison with MATE • MATE is the state-of-the-art in online TE • All simulation parameters are from the MATE paper

  29. decrease in cross traffic decrease in cross traffic TeXCP balances load better than MATE MATE TeXCP Offered Link Load Time (s) Time (s) Avg. drop rate in MATE is 20% during convergence Explicit feedback allows TeXCP to react faster and without oscillations

  30. Comparison with OSPF-TE • OSPF-TE is the most-studied offline TE scheme • It computes link weights, which when used in OSPF balance the load • OSPF-TE-FAIL is an extension that optimizes for failures • OSPF-TE-Multi-TM is an extension that optimizes for variations in traffic demands

  31. Comparison with OSPF-TE under Static Load Optimal Ratio of Max-U to Opt. 1.6 1.4 1.2 1 Abovenet Genuity Sprint Tiscali AT&T

  32. Comparison with OSPF-TE under Static Load OSPF-TE TeXCP Ratio of Max-U to Opt. 1.6 1.4 1.2 1 Abovenet Genuity Sprint Tiscali AT&T TeXCP is within a few percent of optimal, outperforming OSPF-TE

  33. Comparison with OSPF-TE-Fail 3 2.8 2.6 2.4 2.2 2 1.8 1.6 1.4 1.2 1 OSPF-TE-Fail TeXCP Ratio of Max-U to Opt. Abovenet Genuity Sprint Tiscali AT&T TeXCP allows an ISP to support same failure resilience with about ½ the capacity !

  34. Performance When Traffic Deviates From Long-term Averages OSPF-TE-Multi-TM TeXCP 1.8 1.6 1.4 1.2 1 Ratio of Max-U to Opt. 1 1.5 2 2.5 3 3.5 4 4.5 5 Deviation from Long-term Average Demands TeXCP reacts better to realtime demands!

  35. Conclusion • TeXCP: Responsive & Stable Online TE • Combines load balancing with a Cong. Mngt. Layer to prevent overshoot and drops • TeXCP keeps utilization always within a few percent of optimal • Compared to MATE, it is faster and does not overshoot • Compared to OSPF-TE • it keeps utilization 20% to 100% lower • it supports the same failure resilience with ½ the capacity  major savings for the ISP http://nms.lcs.mit.edu/projects/texcp/

More Related