Staying Connected in a Connected World
Dina Katabi, Nate Kushman, Srikanth Kandula, and Bruce Maggs
Losing Connectivity Because of BGP Dynamics
• Route changes cause up to 30% packet loss for more than 2 minutes [Labovitz00]
• Routing events can cause multiple loss bursts, and a single loss burst can last up to 20 s [Wang06]
• Both popular and unpopular prefixes experience losses due to BGP dynamics [Wang05]
• VoIP outages are highly correlated with BGP updates [Kushman06]
Links, Links Everywhere, but Not a Path to Forward!
Our goal: keep ASs connected as long as the AS graph is connected.
Focus on Forwarding
• Leave BGP’s routing as it is
• During convergence, BGP can create transient loops and transient no-paths
• Address transient loops and no-paths by forwarding packets on pre-computed failover paths
• This talk describes our solution to the transient no-path problem
Why Forwarding?
• Convergence is unlikely to be fast enough
• Even a few seconds of disconnectivity affects real-time applications such as VoIP and gaming
• Strict timing constraints limit innovation
  • E.g., they would prevent a future BGP that considers path capacity
Transient No-Path Problem
[Figure: ASes AT&T, Sprint, Jen, and Tim, with Tim connected to the destination MIT]
All of Tim’s neighbors are using him to get to MIT.
Nobody tells Tim an alternate path.
Transient No-Path Problem: Link Down
[Figure: the link between Tim and MIT fails]
Tim knows no alternate path to MIT.
Tim drops AT&T’s and Jen’s packets to MIT, as well as his own. LOSS!
Transient No-Path Problem: Link Down
Eventually, Tim withdraws his path from AT&T and Jen.
AT&T and Jen stop sending packets to Tim.
Transient No-Path Problem: Link Down
Eventually, Tim withdraws his path from AT&T and Jen.
AT&T and Jen stop sending packets to Tim.
AT&T announces the Sprint path to Tim and Jen, and traffic flows again.
The transient no-path causes temporary disconnectivity.
How do we solve Tim’s problem?
Tell Tim a failover path before the link fails, rather than after it fails, as is often the case in current BGP.
Help Tim Help You!
AT&T advertises to Tim “AT&T Sprint MIT” as a failover path.
When Tim’s link to MIT fails, Tim immediately sends traffic on the failover path.
Internally, AT&T tunnels Tim’s traffic toward Sprint.
No loss!
Can AT&T advertise a failover path to every neighbor? No, because:
• It causes excessive overhead
• AT&T can’t tell whether packets are meant for the primary or the failover path
Constraint: an AS can advertise only one failover path, and only to its next-hop AS.
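A minimal Python sketch of the failover mechanism on the Tim/AT&T example, assuming paths are lists of AS names and a failed link is an unordered pair of ASs. `pick_path` and `upstream_next_hop` are hypothetical helpers for illustration, not part of any BGP implementation. The second helper assumes a rule consistent with the constraint above: because an AS advertises its failover path only to its next-hop AS, and that next hop would not normally send traffic for the destination back to it, AT&T can treat packets arriving from Tim as failover traffic and tunnel them toward Sprint.

```python
from typing import FrozenSet, List, Optional, Set

Link = FrozenSet[str]


def link(a: str, b: str) -> Link:
    """An undirected AS-level link."""
    return frozenset((a, b))


def links_of(path: List[str]) -> Set[Link]:
    """All AS-level links traversed by a path such as ["Tim", "MIT"]."""
    return {link(a, b) for a, b in zip(path, path[1:])}


def pick_path(primary: List[str], failover: Optional[List[str]],
              failed: Set[Link]) -> Optional[List[str]]:
    """Path the packet takes from the AS, or None if the packet is dropped."""
    if not links_of(primary) & failed:
        return primary   # primary path is intact
    if failover is not None and not links_of(failover) & failed:
        return failover  # switch at once, without waiting for BGP to converge
    return None          # transient no-path: the packet is lost


def upstream_next_hop(arrived_from: str, primary_next_hop: str,
                      failover_next_hop: str) -> str:
    """Hypothetical rule at the AS that advertised the failover path: traffic
    arriving from its own primary next hop is treated as failover traffic
    and sent (e.g., tunneled) toward the failover next hop."""
    if arrived_from == primary_next_hop:
        return failover_next_hop
    return primary_next_hop


failed = {link("Tim", "MIT")}
tim_primary = ["Tim", "MIT"]
failover_from_att = ["Tim", "AT&T", "Sprint", "MIT"]

print(pick_path(tim_primary, None, failed))               # → None (current BGP)
print(pick_path(tim_primary, failover_from_att, failed))  # → ['Tim', 'AT&T', 'Sprint', 'MIT']
print(upstream_next_hop("Tim", "Tim", "Sprint"))          # → Sprint (AT&T's decision)
```

Without a failover path, Tim has nothing to forward on until BGP re-converges; with one, the switch is a purely local decision at the moment the link fails.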
Goal: Staying Connected
[Figure: Tim, AT&T, and Sprint, with Tim’s link to the destination marked as failed (X)]
• If Tim’s link to the destination fails, and
• after convergence Tim will have a path to the destination,
then Tim should have a failover path to the destination when the link fails.
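The precondition of the goal ("after convergence Tim will have a path") can be approximated on a toy graph with a breadth-first search that skips the failed link. This sketch ignores routing policy, so it checks only plain graph connectivity, which is weaker than having a policy-compliant path after convergence. The topology is an assumption loosely following the figures (Tim linked to MIT, AT&T, and Jen; Jen linked to AT&T; AT&T linked to Sprint; Sprint linked to MIT), and `build_graph` and `has_path` are hypothetical helper names.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected AS graph as an adjacency map."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def has_path(graph: Graph, src: str, dst: str,
             failed: Optional[Tuple[str, str]] = None) -> bool:
    """BFS reachability with one link removed; routing policy is ignored."""
    down = frozenset(failed) if failed else frozenset()
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nbr in graph.get(node, ()):
            if frozenset((node, nbr)) == down or nbr in seen:
                continue  # skip the failed link and already-visited ASs
            seen.add(nbr)
            queue.append(nbr)
    return False


toy = build_graph([("Tim", "MIT"), ("Tim", "AT&T"), ("Tim", "Jen"),
                   ("Jen", "AT&T"), ("AT&T", "Sprint"), ("Sprint", "MIT")])
# Tim can still reach MIT via AT&T and Sprint, so the goal says Tim
# should already hold a failover path when the Tim-MIT link fails.
print(has_path(toy, "Tim", "MIT", failed=("Tim", "MIT")))  # → True
```

If removing the Sprint-MIT edge disconnects Tim from MIT, the goal places no requirement on Tim, since no scheme could keep it connected.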
How do we achieve the goal given the constraint?
We can only pick which failover path an AS advertises to its next-hop AS.
[Figure: ASes AT&T, Jen, Tim, Nick, and Dest, with a failed link marked x]
The most-disjoint path protects against more link failures.
Resilient BGP (R-BGP)
Each AS advertises to its next-hop AS a failover path: the path that is most disjoint from its primary path.
Theorem 1: If any AS using the down link will have a path after convergence, then R-BGP guarantees that the AS immediately above the down link knows a failover path when the link fails.
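A sketch of choosing the failover path an AS advertises to its next-hop AS. It assumes disjointness is measured as the number of AS-level links a candidate shares with the primary path, with ties broken by shorter path length; the slides do not define the metric, so treat this scoring (and the helper name `most_disjoint_failover`) as illustrative. The example uses AT&T, whose primary path to MIT goes through Tim.

```python
from typing import FrozenSet, List, Optional, Set


def links_of(path: List[str]) -> Set[FrozenSet[str]]:
    """Undirected AS-level links along a path."""
    return {frozenset((a, b)) for a, b in zip(path, path[1:])}


def most_disjoint_failover(primary: List[str],
                           candidates: List[List[str]]) -> Optional[List[str]]:
    """Pick the candidate sharing the fewest links with the primary path.

    Assumption: disjointness = number of shared AS-level links; ties are
    broken by shorter AS-path length. Returns None if there is no candidate.
    """
    if not candidates:
        return None
    primary_links = links_of(primary)
    return min(candidates,
               key=lambda p: (len(links_of(p) & primary_links), len(p)))


att_primary = ["AT&T", "Tim", "MIT"]
att_candidates = [
    ["AT&T", "Jen", "Tim", "MIT"],  # shares the Tim-MIT link with the primary
    ["AT&T", "Sprint", "MIT"],      # shares no link with the primary
]
print(most_disjoint_failover(att_primary, att_candidates))
# → ['AT&T', 'Sprint', 'MIT']
```

The path through Jen would fail together with the primary if the Tim-MIT link went down, which is why a disjointness criterion, rather than plain BGP preference, drives the choice.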
Theorem 2: All ASs that will eventually learn a valley-free path to the destination are guaranteed to experience no BGP-caused packet loss during convergence.
A path is valley-free if no AS on it transits traffic between two non-customer ASs.
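The valley-free definition translates directly into a check over each AS in the interior of a path: if both of its neighbors on the path are non-customers (providers or peers), that AS would transit traffic between two non-customers. The relationship encoding and the AS names below (ISP1, StubA, and so on) are hypothetical, chosen only to exercise the rule.

```python
from typing import Dict, List, Tuple

# rel[(x, y)] says what neighbor y is to AS x: "customer", "provider", or "peer".
Relationships = Dict[Tuple[str, str], str]


def add_customer(rel: Relationships, provider: str, customer: str) -> None:
    rel[(provider, customer)] = "customer"
    rel[(customer, provider)] = "provider"


def add_peers(rel: Relationships, a: str, b: str) -> None:
    rel[(a, b)] = "peer"
    rel[(b, a)] = "peer"


def is_valley_free(path: List[str], rel: Relationships) -> bool:
    """True if no AS on the path transits traffic between two non-customers."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if rel[(mid, prev)] != "customer" and rel[(mid, nxt)] != "customer":
            return False  # mid would carry traffic between two non-customers
    return True


rel: Relationships = {}
add_customer(rel, "ISP1", "StubA")
add_customer(rel, "ISP2", "StubB")
add_customer(rel, "ISP3", "StubA")  # StubA is multihomed
add_peers(rel, "ISP1", "ISP2")

print(is_valley_free(["StubA", "ISP1", "ISP2", "StubB"], rel))  # → True
print(is_valley_free(["ISP1", "StubA", "ISP3"], rel))           # → False
```

The second path is a valley: the multihomed StubA would carry traffic between its two providers.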
Experimental Results
[Figure: dual-homed destination MIT with one failed link (X)]
• Event-driven simulation
• A dual-homed AS loses one link
• Find the percentage of ASs that see temporary disconnectivity to the dual-homed AS
• AS graph from Routeviews
• State-of-the-art policy inference based on [Xia04] and [Subramanian02]
Compared Schemes
• Current BGP
• Most-disjoint failover path
• The most-disjoint path may not be policy-compliant. Still, an AS may want to advertise it because:
  • It is temporary
  • The AS protects its own traffic
Compared Schemes
• Current BGP
• Most-disjoint failover path
• Most-disjoint policy-compliant failover path
Results: Percentage of ASs with Transient Disconnectivity
• Current BGP: 9%
• Most-disjoint failover path: 0%
Results: Percentage of ASs with Transient Disconnectivity
• Current BGP: 9%
• Policy-compliant most-disjoint failover path: 0.5%
• Most-disjoint failover path: 0%
Policy-compliant failover paths may be sufficient.
Conclusion
• BGP loses connectivity even when the graph is connected
• R-BGP solves this problem by advertising a single failover path downstream
• BGP’s convergence stays unaffected
• Simple and powerful