
The Impact of Policy and Topology on Internet Routing Convergence NANOG 20 October 23, 2000



Presentation Transcript


  1. The Impact of Policy and Topology on Internet Routing Convergence
     NANOG 20, October 23, 2000
     Abha Ahuja, InterNap, ahuja@umich.edu
     Craig Labovitz, Microsoft Research, labovit@microsoft.com
     *In collaboration with Roger Wattenhofer, Srinivasan Venkatachary, Madan Musuvathi

  2. Background
     In NANOG 19, we showed BGP exhibits poor convergence behavior:
     • Measured convergence times of up to 20 minutes for BGP path changes/failures
     • Factorial (N!) theoretical upper bound on BGP convergence complexity (explore all paths of all possible lengths)
     Open question: In practice, what topological and policy factors impact convergence delay?
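To give a feel for why exploring "all paths of all possible lengths" grows factorially, here is a minimal sketch that counts loop-free paths between two nodes of a full mesh. The full-mesh model and the function names `count_simple_paths` and `closed_form` are illustrative assumptions, not part of the talk.

```python
from math import factorial


def count_simple_paths(n_nodes: int) -> int:
    """Count loop-free paths from node 0 to node n_nodes - 1 in a full mesh (DFS)."""
    dest = n_nodes - 1

    def dfs(node: int, visited: frozenset) -> int:
        if node == dest:
            return 1
        total = 0
        for nxt in range(n_nodes):
            if nxt not in visited:
                total += dfs(nxt, visited | {nxt})
        return total

    return dfs(0, frozenset({0}))


def closed_form(n_nodes: int) -> int:
    """Same count: choose and order k of the m intermediate nodes, for k = 0..m."""
    m = n_nodes - 2
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, count_simple_paths(n), closed_form(n))
```

The count is dominated by the (N-2)! term, so even a modest fully meshed set of ASes has far more candidate paths than any router could usefully explore.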

  3. This Talk
     Goal: Understand BGP convergence behavior under real topologies/policies
     • Given a physical topology and ISP policies, can we estimate the time required for convergence?
     • Do convergence behaviors of ISPs differ?
     • How does steady-state topology compare to paths explored during failure?
     • Can we change policies/topology to improve BGP convergence times?

  4. Experiments
     • Analyzed secondary paths between 20 source/destination AS pairs
     • Inject and monitor BGP faults
     • Survey providers to determine policies behind paths
     • To provide intuition, we will focus on faults injected into three ISPs at Mae-West
     • Observed faults via a fourth ISP (in Japan)
     • Three ISPs roughly map onto tier1, tier2, tier3 providers
     • Results from these three ISPs are representative of all data

  5. Comparing ISP Convergence Latencies
     • CDF of faults injected into three Mae-West providers and observed at Japanese ISP
     • Significant variations between providers
     • Not related to geography

  6. Observed Fault Injection Topologies
     [Figure: steady-state paths from ISP 1 (R1), ISP 2 (R2) and ISP 3 (R3) to ISP 4 across MAE-WEST, with a FAULT injected at each of the three ISPs]
     • In steady state, the topologies between ISP1, ISP2, ISP3 are similar; all are direct BGP peers of ISP4. This does not explain the variation on the previous slide…

  7. Factors Impacting BGP Propagation
     [Figure: routers A, B, C, D connected via iBGP]
     • Topology and policy shape the propagation graph (usually a DAG)
     • Each AS router adds between 0 and 45 seconds of MinRouteAdver delay
     • iBGP/Route Reflector
     • MinRouteAdver and path race conditions affect which routes are chosen as backup routes
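As a rough illustration of the per-AS delay figure above, the sketch below computes the range of MinRouteAdver delay an update could accumulate along an AS path. It assumes every AS on the path applies the timer exactly once, which is a simplification; the constant and function name are hypothetical.

```python
# Upper end of the 0-45 second per-AS range quoted on the slide.
MAX_PER_AS_DELAY_S = 45


def min_route_adver_bounds(as_path):
    """Return (min, max) accumulated MinRouteAdver delay in seconds.

    Assumption: each AS in as_path delays the update once, by 0 to 45 seconds.
    """
    hops = len(as_path)
    return (0, hops * MAX_PER_AS_DELAY_S)


if __name__ == "__main__":
    print(min_route_adver_bounds(["AS4", "AS5", "AS1"]))
```

Under this assumption a three-AS path can contribute anywhere from no timer delay to a little over two minutes, before iBGP and route-reflector effects are considered.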

  8. ISP1-ISP4 Paths During Failure
     [Figure: topology with labels ISP 1 (R1, FAULT), ISP 4, ISP 5; steady-state path and backup path P2]
     • Only one backup path (length 3)
     • 96%: average 92 (min/max 63/140) seconds
       Announce AS4 AS5 AS1 (44 seconds)
       Withdraw (92 seconds)
     • 4%: average 32 (min/max 27/38) seconds
       Withdraw (32 seconds)

  9. ISP2-ISP4 Paths During Failure
     [Figure: topology with labels ISP 2 (R2, FAULT), ISP 4, ISP 5, ISP 6, ISP 10, ISP 11, ISP 12, ISP 13, Vagabond; steady-state path and backup paths P2, P3, P4]
     • 63%: average 79 (min/max 44/208) seconds
       Announce AS4 AS5 AS2 (35 seconds)
       Withdraw (79 seconds)
     • 7%: average 88 (min/max 80/94) seconds
       Announce AS4 AS5 AS2 (33 seconds)
       Announce AS4 AS6 AS5 AS2 (61 seconds)
       Withdraw (88 seconds)
     • 7%: average 54 (min/max 29/9) seconds
       Withdraw (54 seconds)
     • 23%: Other

  10. ISP3-ISP4 Paths During Failure
      [Figure: topology with labels ISP 3 (R3, FAULT), ISP 1, ISP 4, ISP 5, ISP 7, ISP 8, ISP 9; steady-state path and backup paths P2 through P7]
      • 36%: average 110 (min/max 78/135) seconds
        Announce AS4 AS5 AS (52 seconds)
        Withdraw (110 seconds)
      • 35%: average 107 (min/max 91/133) seconds
        Announce AS4 AS1 AS3 (39 seconds)
        Announce AS4 AS5 AS3 (68 seconds)
        Withdraw (107 seconds)
      • 2%: average 140 (min/max 120/142) seconds
        Announce AS4 AS5 AS8 AS7 AS3 (27 seconds)
        Announce AS4 AS5 AS9 AS8 AS7 AS3 (86 seconds)
        Withdraw (140 seconds)
      • 27%: Other

  11. Why the Different Levels of Complexity?
      • Provider relationship taxonomy
        • Transit relationships
          • customer/provider
          • customer sends their customer routes
          • provider sends default-free routing info (or default)
        • Peer relationships
          • Bilateral exchange of customer routes
        • Back-up transit
          • peer relationship becomes transit relationship based on failure
      • These relationships constrain topology (no N! states) and determine the number of possible backup paths
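The relationship taxonomy above can be read as a set of route-export rules. The following is a minimal sketch of one such reading; the `Rel` enum, the `should_export` function, and the way back-up transit is modeled (a peer temporarily receiving full routes) are all assumptions made for illustration, not rules stated in the talk.

```python
from enum import Enum


class Rel(Enum):
    CUSTOMER = "customer"
    PROVIDER = "provider"
    PEER = "peer"


def should_export(learned_from: Rel, send_to: Rel,
                  backup_transit_active: bool = False) -> bool:
    """Decide whether a route learned from one neighbor is advertised to another.

    learned_from: our relationship to the neighbor that gave us the route.
    send_to: our relationship to the neighbor we would advertise it to.
    Locally originated routes can be treated like customer routes (assumption).
    """
    if send_to is Rel.CUSTOMER:
        # A provider sends default-free routing info to its customers.
        return True
    if learned_from is Rel.CUSTOMER:
        # Customer routes are sent to providers and exchanged with peers.
        return True
    if send_to is Rel.PEER and backup_transit_active:
        # Assumed model of back-up transit: on failure the peer is
        # temporarily given transit, so it receives all routes.
        return True
    return False


if __name__ == "__main__":
    print(should_export(Rel.PEER, Rel.PEER))          # peer routes are not re-exported to peers
    print(should_export(Rel.CUSTOMER, Rel.PROVIDER))  # customer routes go upstream
```

Because routes learned from peers and providers are not passed on to other peers or providers, many of the paths that exist in the physical topology are never advertised, which is what keeps the state space well below N!.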

  12. Convergence in the Real World
      [Figure: five-node topology (nodes 1 to 5, destination X) with customer and peer links]
      Longest path: 3 4 5 2 1
      Possible paths for node 3:
      • 2 1 x
      • 4 2 1 x
      • (4 5 2 1 x)
      Possible paths for node 4:
      • 2 1 x
      • 3 2 1 x
      • 5 2 1 x

  13. Convergence in the Real World
      Hierarchy eliminates some states
      [Figure: the same five-node topology (nodes 1 to 5, destination X) with customer and peer links; one node marked "Tier 1?"]
      Longest path: 3 4 5 2 1
      Possible paths for node 3:
      • 2 1 x
      • 4 5 2 1 x
      Possible paths for node 4:
      • 3 2 1 x
      • 5 2 1 x
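The path lists on slides 12 and 13 can be reproduced by enumerating loop-free paths over a small graph. The edge lists below are an assumption reconstructed from the listed paths (the original diagram is not available), and treating the hierarchical case as "the same graph without the 2-4 adjacency" is only one reading that happens to match the slide.

```python
def simple_paths(edges, src, dst):
    """Enumerate all loop-free paths from src to dst, shortest first."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    results = []

    def walk(node, path):
        if node == dst:
            results.append(path)
            return
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:
                walk(nxt, path + [nxt])

    walk(src, [src])
    return sorted(results, key=len)


# Assumed adjacency for slide 12 (reconstructed from the listed paths).
FLAT = [("x", "1"), ("1", "2"), ("2", "3"), ("2", "4"),
        ("2", "5"), ("3", "4"), ("4", "5")]
# Assumed adjacency for slide 13: hierarchy removes the 2-4 option.
HIERARCHICAL = [e for e in FLAT if e != ("2", "4")]

if __name__ == "__main__":
    for name, edges in (("flat", FLAT), ("hierarchical", HIERARCHICAL)):
        for node in ("3", "4"):
            for path in simple_paths(edges, node, "x"):
                print(name, "node", node, ":", " ".join(path[1:]))
```

Under these assumed edges, nodes 3 and 4 each have three candidate paths in the flat case and two in the hierarchical case, while the longest path (3 4 5 2 1 x) is present in both.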

  14. Policy and Convergence
      • Strict hierarchical relationships eliminate the exploration of some extra states
      • Policy controls the number of possible paths to explore
      • But it turns out the number of paths does not matter…

  15. Relationship Between Backup Paths and Convergence
      • Convergence is related to the length of the longest possible backup ASPath between two nodes
      [Figure: convergence plotted against Longest Observed ASPath Between AS Pair]

  16. So, what does all of this mean for convergence time?
      • Convergence time is related to the length of the longest path that needs to be explored
      • Before fail-over, all alternative paths need to be withdrawn
      • This is bounded by O(n), where n is the length of the longest alternative path in the system
      • This longest path is related to policy
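To make the link between explored path length and convergence time concrete, the sketch below tabulates a few of the update sequences reported on slides 8 to 10. The figures are transcribed from those slides (the "AS5AS9" entry is read as "AS5 AS9", and the sequence with a truncated AS path is omitted); the `OBSERVED` structure and `summarize` function are hypothetical names used only for illustration.

```python
# (label, [(announced AS path, seconds after fault)], withdraw time in seconds)
OBSERVED = [
    ("ISP1, 96% case",
     [(["AS4", "AS5", "AS1"], 44)], 92),
    ("ISP2, 7% case",
     [(["AS4", "AS5", "AS2"], 33), (["AS4", "AS6", "AS5", "AS2"], 61)], 88),
    ("ISP3, 35% case",
     [(["AS4", "AS1", "AS3"], 39), (["AS4", "AS5", "AS3"], 68)], 107),
    ("ISP3, 2% case",
     [(["AS4", "AS5", "AS8", "AS7", "AS3"], 27),
      (["AS4", "AS5", "AS9", "AS8", "AS7", "AS3"], 86)], 140),
]


def summarize(cases):
    """Return (label, longest announced ASPath length, seconds until final withdraw)."""
    rows = []
    for label, announcements, withdraw_s in cases:
        longest = max(len(path) for path, _ in announcements)
        rows.append((label, longest, withdraw_s))
    return rows


if __name__ == "__main__":
    for label, longest, withdraw_s in summarize(OBSERVED):
        print(f"{label}: longest path {longest} ASes, withdrawn after {withdraw_s} s")
```

Four sequences are far too few to show a trend on their own; the sketch only shows how the longest explored path and the final withdrawal time can be read off each observed sequence.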

  17. Towards Millisecond BGP Convergence
      Three possible solutions:
      • Entirely new protocol
      • Turn off MinRouteAdver timer
      • "Tag" BGP updates
        • Provide hint so nodes can detect bogus state information

  18. Further Information
      • C. Labovitz, R. Wattenhofer, A. Ahuja, S. Venkatachary, "The Impact of Topology and Policy on Delayed Internet Routing Convergence." MSR Technical Report (number pending), June 2000.
      • C. Labovitz, A. Ahuja, A. Bose, F. Jahanian, "Internet Delayed Routing Convergence." To appear in Proceedings of ACM SIGCOMM, August 2000.
      Send email to ipma-support@merit.edu for more information or to participate in the policy survey.
