R-BGP: Staying Connected in a Connected World. Nate Kushman, Srikanth Kandula, Dina Katabi, and Bruce Maggs
The Problem: BGP Convergence Causes Packet Loss • When a route changes, up to 30% packet loss for more than 2 minutes [Labovitz00] • Even domains dual-homed to tier-1 providers see many loss bursts on a route change [Wang06] • Even popular prefixes experience losses due to BGP convergence [Wang05] • 50% of VoIP disruptions are highly correlated with BGP updates [Kushman06]
Links, Links Everywhere But Not a Path to Forward! Goal: Ensure ASes stay connected as long as the physical network is connected
We Focus on Forwarding • Don’t worry about BGP’s routing • Ensure forwarding works by forwarding packets on pre-computed failover paths
Why Focus on Forwarding? • Convergence is unlikely to be fast enough • Strict timing constraints limit innovation
Our Contribution • Guarantee: No BGP-caused packet loss • Low overhead: Just like BGP, each AS advertises at most one path to each neighbor • On link failure, we reduce disconnected ASes from 22% to zero
What Causes Transient Disconnection? • BGP rule: An AS advertises only its current forwarding path • All of Hari’s providers use him to get to MIT • Nobody offers Hari an alternate path [Figure: topology of AT&T, Sprint, Peter, Hari, and MIT]
What Causes Transient Disconnection? • The Hari-MIT link goes down • Hari knows no path to MIT • Hari drops Peter’s and AT&T’s packets in addition to his own: loss!
What Causes Transient Disconnection? • Hari withdraws his path • AT&T and Peter move to alternate paths
What Causes Transient Disconnection? • Hari withdraws his path • AT&T and Peter move to alternate paths • AT&T announces the Sprint path to Hari • Traffic flows again • Net effect: transient packet loss
How do failover paths solve the problem? BGP: An AS advertises only its current path. It advertises an alternate only after a link fails R-BGP: Advertises an alternate, i.e. failover path, before a link fails
Failover Paths • AT&T advertises “AT&T Sprint MIT” to Hari as a failover path • When the Hari-MIT link fails, Hari immediately sends traffic on the failover path • No loss! [Figure: topology of AT&T, Sprint, Peter, Hari, and MIT, with the Hari-MIT link down]
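The contrast between BGP and R-BGP on this slide can be illustrated with a small forwarding-table sketch. The `ForwardingEntry` class and its fields are hypothetical names chosen for illustration, not part of R-BGP itself; the only thing taken from the slides is that Hari has a pre-advertised failover path via AT&T.

```python
from typing import Optional


class ForwardingEntry:
    """Forwarding state for one destination at one AS (hypothetical structure)."""

    def __init__(self, primary: str, failover: Optional[str] = None) -> None:
        self.primary = primary      # next hop on the current path
        self.failover = failover    # next hop on the pre-advertised failover path, if any
        self.primary_up = True

    def link_down(self) -> None:
        """Record that the link to the primary next hop has failed."""
        self.primary_up = False

    def next_hop(self) -> Optional[str]:
        """Return the next hop to use now; None means the packet is dropped."""
        return self.primary if self.primary_up else self.failover


# Hari's entry for MIT under plain BGP (no failover) and under R-BGP.
hari_bgp = ForwardingEntry(primary="MIT")
hari_rbgp = ForwardingEntry(primary="MIT", failover="AT&T")

for entry in (hari_bgp, hari_rbgp):
    entry.link_down()

print(hari_bgp.next_hop())   # None: packets are dropped until BGP converges
print(hari_rbgp.next_hop())  # AT&T: traffic moves to the failover path at once
```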
Two Challenges Challenge 1: Minimize the number of failover paths, while ensuring an AS always has a usable path Challenge 2: Transition from usable path to converged path without creating forwarding loops
Challenge 1: Minimize number of failover paths • Claim: Just like BGP, advertise one path per neighbor, either current or failover • Insight: Replace the path advertised to the downstream AS with a failover path [Figure: AT&T, Peter, Sprint, Hari, and MIT, with arrows labeled “current path” and one arrow labeled “failover path”]
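A minimal sketch of the one-path-per-neighbor idea, under the assumption that the "downstream AS" is the next hop on the current path. The function name `advertisements` and the path representation (lists of AS names) are illustrative choices, not taken from the talk.

```python
from typing import Dict, List, Optional


def advertisements(
    me: str,
    neighbors: List[str],
    current_path: List[str],
    failover_path: Optional[List[str]],
) -> Dict[str, List[str]]:
    """Pick the single path advertised to each neighbor.

    current_path and failover_path start at the next hop and end at the
    destination. The downstream neighbor (next hop of the current path)
    gets the failover path; every other neighbor gets the current path.
    """
    downstream = current_path[0]
    adverts: Dict[str, List[str]] = {}
    for neighbor in neighbors:
        if neighbor == downstream:
            # The current path runs through this neighbor, so it is useless
            # to it; advertise the failover path instead, if one is known.
            if failover_path is not None:
                adverts[neighbor] = [me] + failover_path
        else:
            adverts[neighbor] = [me] + current_path
    return adverts


att = advertisements(
    me="AT&T",
    neighbors=["Hari", "Peter", "Sprint"],
    current_path=["Hari", "MIT"],
    failover_path=["Sprint", "MIT"],
)
print(att["Hari"])   # ['AT&T', 'Sprint', 'MIT']
print(att["Peter"])  # ['AT&T', 'Hari', 'MIT']
```

Each neighbor still receives exactly one path, so the number of advertisements matches plain BGP in this sketch.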
Which failover path should it advertise? • The most disjoint path • Lemma: Advertising the most disjoint path is equivalent to advertising all paths. [Figure: AT&T, John, Bob, Joe, and Dest, with a failed link marked X]
Challenge 1: Minimize number of failover paths • R-BGP Rule: Advertise to the downstream AS, as a failover path, the path most disjoint from the current path • When a link fails: Theorem 1: The AS upstream of the down link knows a failover path if it will know a path at convergence
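The rule above can be sketched as a selection over candidate paths. The slides do not define the disjointness metric, so this sketch assumes it means "fewest AS-level links shared with the current path", with ties broken by shorter path length; both choices are assumptions.

```python
from typing import FrozenSet, List, Set


def links(path: List[str]) -> Set[FrozenSet[str]]:
    """Return the undirected AS-level links traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def most_disjoint(current: List[str], candidates: List[List[str]]) -> List[str]:
    """Pick the candidate sharing the fewest links with the current path.

    Tie-break on path length (an assumption, not from the talk).
    """
    current_links = links(current)
    return min(candidates, key=lambda p: (len(links(p) & current_links), len(p)))


current = ["AT&T", "Hari", "MIT"]
candidates = [
    ["AT&T", "Peter", "Hari", "MIT"],  # shares the Hari-MIT link
    ["AT&T", "Sprint", "MIT"],         # shares no link with the current path
]
print(most_disjoint(current, candidates))  # ['AT&T', 'Sprint', 'MIT']
```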
Challenge 2: Transition without loops • The Hari-MIT link goes down • Hari withdraws his path [Figure: topology of AT&T, Sprint, Peter, Hari, and MIT]
Challenge 2: Transition without loops • Hari withdraws his path • Peter may choose to route through AT&T • AT&T may choose to route through Peter • Forwarding loop!
Challenge 2: Transition without loops • Solution 2: Root Cause Information • Hari includes root cause information (Hari->MIT link down) with the withdrawal • AT&T recognizes that the Peter->Hari->MIT path is down and routes through Sprint instead • Theorem 2: No forwarding loops will form
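As a rough illustration of how root cause information could be used, the sketch below discards every known path that traverses the failed link named in the withdrawal. The helper names `uses_link` and `safe_paths` are hypothetical, and the sketch ignores how the information is actually encoded in BGP messages.

```python
from typing import List, Tuple


def uses_link(path: List[str], link: Tuple[str, str]) -> bool:
    """True if the path traverses the given link in either direction."""
    a, b = link
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))


def safe_paths(paths: List[List[str]], failed_link: Tuple[str, str]) -> List[List[str]]:
    """Keep only the paths that avoid the link named as the root cause."""
    return [p for p in paths if not uses_link(p, failed_link)]


# Paths AT&T knows to MIT when Hari's withdrawal arrives.
att_paths = [
    ["AT&T", "Peter", "Hari", "MIT"],
    ["AT&T", "Sprint", "MIT"],
]
root_cause = ("Hari", "MIT")
print(safe_paths(att_paths, root_cause))  # [['AT&T', 'Sprint', 'MIT']]
```

Because the path through Peter is rejected immediately, AT&T never points at Peter while Peter points back at AT&T, which is the loop shown on the previous slide.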
R-BGP • Solution 1: Advertise the most disjoint path to the downstream AS • Solution 2: Include Root Cause Information • Final Theorem: No AS will see BGP-caused packet loss if it will have a path at convergence
Setup • AS-Level Simulation over the full Internet • AS-graph with 24,142 ASes from Routeviews BGP Data • Use inference algorithm to annotate links with customer-provider or peer relationships
Single Link Failure Results • A dual-homed AS loses one link • Find the percentage of ASes that see transient disconnection to the destination • Run for all dual-homed ASes
Single Link Failure Results • Percentage of ASes transiently disconnected: BGP 22%, R-BGP zero • R-BGP eliminates all transient disconnection
Cost of Policy Compliance • The most disjoint path may not be compliant with BGP routing policies • Still, an AS may want to advertise it: to protect its own traffic, and because it is temporary • What if we choose the most disjoint among policy-compliant paths?
Cost of Policy Compliance • Percentage of ASes transiently disconnected: BGP 22%, R-BGP zero
Cost of Policy Compliance • Percentage of ASes transiently disconnected: BGP 22%, R-BGP (policy compliant) 1.4%, R-BGP zero • Policy-compliant failover paths may be sufficient
Multiple Link Failure Results • All proofs are for single link failure • Randomly choose a second link
Multiple Link Failure Results • Percentage of ASes transiently disconnected: BGP 22%, R-BGP (policy compliant) 1.4%, R-BGP 0% • Multiple link failures are unlikely to interact
Worst Case Scenario • Fail a link on the current path • Fail a link on the corresponding failover path
Multiple Link Failure Results • Percentage of ASes transiently disconnected: BGP 33%
Multiple Link Failure Results • Percentage of ASes transiently disconnected: BGP 33%, R-BGP (policy compliant) 12%
Worst Case Scenario • Percentage of ASes transiently disconnected: BGP 33%, R-BGP (policy compliant) 12%, R-BGP 7% • Eliminates 80% of disconnection even in the worst case of link failures on both the current and failover paths
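The 80% figure appears to be the relative reduction from BGP's 33% to R-BGP's 7%, rounded; as a quick check under that reading:

```latex
\[
\frac{33\% - 7\%}{33\%} = \frac{26}{33} \approx 0.79 \approx 80\%
\]
```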
Conclusion • BGP loses connectivity even when the physical network is connected • R-BGP uses a few failover paths to ensure forwarding works throughout convergence • Guarantees no packet loss • Just like BGP, one path per neighbor • Reduces disconnected ASes from 22% to zero • Working with Cisco on prototype feasibility
Multiple Link Failure Results • Joe forwards on his second-best path, not the most disjoint one • Packets on Bob’s failover path follow Joe’s second-best path to the destination [Figure: Joe, Bob, and Destination, with two failed links marked X]
Practical • Requires only a few modifications to BGP • Currently working with Cisco to prototype • Advertises only one path per neighbor, just like BGP • Convergence time 1/3 that of BGP
Challenge 1: A few Strategic Failover Paths Solution 1: Most Disjoint Path Theorem 1: If any AS using the down link will have a path after convergence, then R-BGP guarantees that the AS immediately above the down link knows a failover path when the link fails.
Link Down, No Available Loop-Free Path • Root cause information: the Hari->MIT link is down • AT&T can immediately move to the Sprint path • Peter is left without any usable path, so Peter continues to use the old path and moves away from it only after receiving an advertisement from AT&T • Mechanism 3: If no path without the down link is available, continue to use the old path until such a path becomes available or it is certain that no such path will become available. [Figure: topology of AT&T, Sprint, Peter, Hari, and MIT]
Putting it all together • Mechanism 1: Ensure the failover AS knows an alternate path • Mechanism 2: Allow ASes to recognize safe paths that are guaranteed to be loop-free • Mechanism 3: Continue to forward along the old path to the failover AS until a safe path is learned • Key idea: Disconnect forwarding from routing; ensure that forwarding continues to work regardless of what happens at the routing layer
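Mechanism 3 can be sketched as a path-selection rule with a fallback. The function `choose_path` is a hypothetical helper; it covers only the "keep the old path until a safe one arrives" part and does not model how an AS becomes sure that no such path will ever appear.

```python
from typing import List, Tuple


def uses_link(path: List[str], link: Tuple[str, str]) -> bool:
    """True if the path traverses the given link in either direction."""
    a, b = link
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))


def choose_path(
    old_path: List[str],
    candidates: List[List[str]],
    failed_link: Tuple[str, str],
) -> List[str]:
    """Switch to the first candidate avoiding the failed link, else keep the old path."""
    for path in candidates:
        if not uses_link(path, failed_link):
            return path
    return old_path  # keep forwarding toward the AS that holds the failover path


failed = ("Hari", "MIT")
peter_old = ["Peter", "Hari", "MIT"]

# Before AT&T's new advertisement, Peter only knows paths over the down link.
before = [["Peter", "AT&T", "Hari", "MIT"]]
print(choose_path(peter_old, before, failed))  # ['Peter', 'Hari', 'MIT']

# After AT&T advertises its Sprint path, a safe path exists and Peter moves.
after = before + [["Peter", "AT&T", "Sprint", "MIT"]]
print(choose_path(peter_old, after, failed))   # ['Peter', 'AT&T', 'Sprint', 'MIT']
```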
Final Theorem: When a link fails, if an AS will eventually have a path, it will see no BGP-caused packet loss
Final Theorem: When a single link fails, all ASes that will eventually learn a valley-free path to the destination are guaranteed no BGP-caused packet loss during convergence • A path is valley-free if no AS transits between two non-customer ASes
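The valley-free definition above translates directly into a check over each intermediate AS on a path. The relationship table `rel` and the AS names below are made up for illustration; `rel[(a, b)]` is assumed to give b's role from a's point of view.

```python
from typing import Dict, List, Tuple

Relationships = Dict[Tuple[str, str], str]  # (a, b) -> "customer" | "provider" | "peer"


def is_valley_free(path: List[str], rel: Relationships) -> bool:
    """True if no AS on the path transits between two non-customer ASes."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        # mid carries traffic between prev and nxt; at least one must be its customer.
        if rel[(mid, prev)] != "customer" and rel[(mid, nxt)] != "customer":
            return False
    return True


# Hypothetical relationships: Stub buys transit from both providers, which peer.
rel: Relationships = {
    ("Stub", "ProviderA"): "provider",
    ("Stub", "ProviderB"): "provider",
    ("ProviderA", "Stub"): "customer",
    ("ProviderB", "Stub"): "customer",
    ("ProviderA", "ProviderB"): "peer",
    ("ProviderB", "ProviderA"): "peer",
}

print(is_valley_free(["ProviderA", "Stub", "ProviderB"], rel))  # False
print(is_valley_free(["Stub", "ProviderA", "ProviderB"], rel))  # True
```

In the first path Stub would carry traffic between its two providers, which is the "valley" the definition rules out.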
Little Additional Overhead • Less than 10% more updates network-wide (chart values: 22K vs. 20K)
Faster Convergence Times • Convergence times are 1/3 of those with BGP (chart values: 13 vs. 4)
Compared Schemes • Current BGP • Most-disjoint failover path • Most-disjoint policy-compliant failover path
Goal: Staying Connected • If an AS’s link to the destination fails, and the AS will have a path to the destination after convergence, then the AS should know a failover path to the destination when the link fails
Goal: Staying Connected • The AS immediately upstream of a down link can protect all traffic • Without a failover path, all ASes see disconnection • The AS upstream of the down link must know a failover path when the link fails
Goal: Staying Connected • The AS immediately upstream of a down link can protect all traffic • If this AS has no failover path, all ASes using the link see disconnection • The AS upstream of the down link must know a failover path when the link fails
Challenge 2: Consistency during convergence • Inconsistency across ASes: routing loops and ASes unaware of available paths • Strong consistency: expensive • Balance between providing enough consistency and maintaining BGP’s scalability
Challenge 1: Which Failover Paths to Advertise • The AS immediately upstream of a down link can protect all traffic • If this AS has no failover path, all ASes using the link see disconnection (loss!) • The AS upstream of the down link must know a failover path when the link fails