
R-BGP: Staying Connected in a Connected World

R-BGP: Staying Connected in a Connected World. Nate Kushman, Srikanth Kandula, Dina Katabi, and Bruce Maggs. The Problem: BGP Convergence Causes Packet Loss. When a route changes, there can be up to 30% packet loss for more than 2 minutes [Labovitz00].

Presentation Transcript


  1. R-BGP: Staying Connected in a Connected World • Nate Kushman, Srikanth Kandula, Dina Katabi, and Bruce Maggs

  2. The Problem: BGP Convergence Causes Packet Loss • When a route changes, up to 30% packet loss for more than 2 minutes [Labovitz00] • Even domains dual-homed to tier-1 providers see many loss bursts on a route change [Wang06] • Even popular prefixes experience losses due to BGP convergence [Wang05] • 50% of VoIP disruptions are highly correlated with BGP updates [Kushman06]

  3. Links, Links Everywhere But Not a Path to Forward! Goal: Ensure ASes stay connected as long as the physical network is connected

  4. We Focus on Forwarding • Don’t worry about BGP’s routing • Ensure forwarding works by forwarding packets on pre-computed failover paths

  5. Why Focus on Forwarding? • Convergence is unlikely to be fast enough • Strict timing constraints limit innovation

  6. Our Contribution • Guarantee: No BGP-caused packet loss • Low overhead: Just like BGP, each AS advertises at most one path to each neighbor • On link failure, we reduce disconnected ASes from 22% to zero

  7. What Causes Transient Disconnection? [Figure: AS topology with AT&T, Sprint, Peter, Hari, and MIT] • All of Hari's providers use him to get to MIT • BGP Rule: An AS advertises only its current forwarding path • Nobody offers Hari an alternate path

  8. What Causes Transient Disconnection? [Figure: same topology; the Hari->MIT link goes down] • Hari knows no path to MIT • Hari drops Peter's and AT&T's packets in addition to his own: LOSS!

  9. What Causes Transient Disconnection? [Figure: same topology; the Hari->MIT link is down] • Hari withdraws the path • AT&T and Peter move to alternate paths

  10. What Causes Transient Disconnection? [Figure: same topology; the Hari->MIT link is down] • Hari withdraws the path • AT&T and Peter move to alternate paths • AT&T announces the Sprint path to Hari • Traffic flows again, but only after transient packet loss

  11. How do failover paths solve the problem? • BGP: An AS advertises only its current path; it advertises an alternate only after a link fails • R-BGP: An AS advertises an alternate, i.e., a failover path, before a link fails

  12. Failover Paths [Figure: AT&T, Sprint, Peter, Hari, and MIT] • AT&T advertises "AT&T->Sprint->MIT" to Hari as a failover path • When the Hari->MIT link fails, Hari immediately sends traffic on the failover path • No loss!
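
A minimal sketch of slide 12, assuming each AS keeps a per-destination forwarding entry that stores the pre-computed failover path next to the current path. The names `ForwardingEntry`, `path_links`, and `choose_path` are hypothetical illustrations, not part of BGP or of the authors' implementation; the point is only that switching to the failover path is a local lookup that needs no new routing message.

```python
# Hypothetical sketch: a forwarding entry that keeps a pre-computed failover
# path beside the current path (not the authors' implementation).
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Path = Tuple[str, ...]


def path_links(path: Path) -> Set[FrozenSet[str]]:
    """Undirected links traversed by an AS path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


@dataclass
class ForwardingEntry:
    current: Path             # path in use before any failure
    failover: Optional[Path]  # path learned in advance from a neighbor


def choose_path(entry: ForwardingEntry,
                down_links: Iterable[Tuple[str, str]]) -> Optional[Path]:
    """Pick the path to forward on now, without waiting for BGP to converge."""
    down = {frozenset(link) for link in down_links}
    if not path_links(entry.current) & down:
        return entry.current
    if entry.failover is not None and not path_links(entry.failover) & down:
        return entry.failover
    return None  # no usable path known: packets would be dropped


hari = ForwardingEntry(current=("Hari", "MIT"),
                       failover=("Hari", "AT&T", "Sprint", "MIT"))
print(choose_path(hari, []))                 # ('Hari', 'MIT')
print(choose_path(hari, [("Hari", "MIT")]))  # ('Hari', 'AT&T', 'Sprint', 'MIT')
```

Under plain BGP, Hari's entry would have `failover=None`, so the second call would return `None` until AT&T re-announced the Sprint path, which is the transient loss of slides 8 to 10.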

  13. Two Challenges • Challenge 1: Minimize the number of failover paths while ensuring an AS always has a usable path • Challenge 2: Transition from the usable path to the converged path without creating forwarding loops

  14. Challenge 1: Minimize number of failover paths • Claim: Just like BGP, advertise one path per neighbor, either the current path or a failover path [Figure: AT&T, Peter, Sprint, Hari, and MIT; most advertisements are labeled "current path", one is labeled "failover path"] • Insight: Replace the path advertised to the downstream AS with a failover path
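
The per-neighbor rule on slide 14 can be sketched as follows. This is a simplified illustration under stated assumptions: paths start at the advertising AS, the "downstream AS" is taken to be the next hop on the current path, and policy filtering is ignored. The rationale it encodes is standard BGP loop detection: an AS discards any path that already contains its own AS number, so the downstream neighbor has no use for the current path, and that advertisement slot is free to carry a failover path instead. The function name `advertisements` is hypothetical.

```python
# Hypothetical sketch of "one path per neighbor, current or failover".
# Policy filtering is deliberately left out.
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]  # AS path starting at the advertising AS


def advertisements(current: Path, failover: Optional[Path],
                   neighbors: List[str]) -> Dict[str, Optional[Path]]:
    """Return the single path this AS advertises to each neighbor."""
    downstream = current[1]  # next hop on the current path
    ads: Dict[str, Optional[Path]] = {}
    for neighbor in neighbors:
        if neighbor == downstream:
            # The downstream AS would reject the current path (it already
            # contains that AS), so this slot carries the failover path.
            ads[neighbor] = failover
        else:
            ads[neighbor] = current
    return ads


ads = advertisements(current=("AT&T", "Hari", "MIT"),
                     failover=("AT&T", "Sprint", "MIT"),
                     neighbors=["Hari", "Peter"])
print(ads["Hari"])   # ('AT&T', 'Sprint', 'MIT')
print(ads["Peter"])  # ('AT&T', 'Hari', 'MIT')
```

Each neighbor still receives exactly one path, which is why the message overhead stays close to BGP's.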

  15. Which failover path should it advertise? [Figure: AT&T, John, Bob, Joe, and Dest, with a failed link and the most disjoint path highlighted] • Lemma: Advertising the most disjoint path is equivalent to advertising all paths

  16. Challenge 1: Minimize number of failover paths • R-BGP Rule: Advertise to the downstream AS, as a failover path, the path most disjoint from the current path • Theorem 1: When a link fails, the AS upstream of the down link knows a failover path if it will know a path at convergence
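
The selection rule can be sketched as below. The slides do not define "most disjoint" precisely, so this sketch assumes it means "shares the fewest links with the current path", breaking ties by shorter length; the paper's exact definition and tie-breaking may differ. Candidates that pass through the downstream AS are excluded because that AS would reject them. The AS names and topology are hypothetical.

```python
# Hypothetical sketch: pick the failover path for the downstream AS.
# Assumption: "most disjoint" = fewest links shared with the current path.
from typing import FrozenSet, List, Optional, Set, Tuple

Path = Tuple[str, ...]


def path_links(path: Path) -> Set[FrozenSet[str]]:
    return {frozenset(hop) for hop in zip(path, path[1:])}


def most_disjoint(current: Path, candidates: List[Path]) -> Optional[Path]:
    """Pick the failover path to advertise to the downstream AS (current[1])."""
    downstream = current[1]
    # A path through the downstream AS is useless to it (and it would reject it).
    eligible = [p for p in candidates if downstream not in p]
    if not eligible:
        return None
    shared = path_links(current)
    return min(eligible, key=lambda p: (len(path_links(p) & shared), len(p)))


current = ("Bob", "Joe", "John", "Dest")
candidates = [("Bob", "AT&T", "John", "Dest"),  # shares the John->Dest link
              ("Bob", "AT&T", "Dest")]          # shares no link
print(most_disjoint(current, candidates))  # ('Bob', 'AT&T', 'Dest')
```

Preferring a path that shares as little as possible with the current path maximizes the chance that the failover path survives whichever link on the current path fails.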

  17. Challenge 2: Transition without loops [Figure: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down] • Hari withdraws the path

  18. Challenge 2: Transition without loops [Figure: same topology; the Hari->MIT link is down] • Hari withdraws the path • Peter may choose to route through AT&T • AT&T may choose to route through Peter • Forwarding loop!
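
To make the loop on slide 18 concrete, the sketch below follows per-AS next hops for one destination and reports whether a packet is delivered, dropped, or trapped in a loop. The next-hop table is a hypothetical snapshot of the moment when Peter and AT&T have each picked the other; the function name `trace` is illustrative.

```python
# Hypothetical sketch: follow per-AS next hops toward one destination.
from typing import Dict, List, Tuple


def trace(next_hop: Dict[str, str], src: str, dst: str) -> Tuple[List[str], str]:
    hops = [src]
    seen = {src}
    node = src
    while node != dst:
        nxt = next_hop.get(node)
        if nxt is None:
            return hops, "dropped"  # no route: black hole
        hops.append(nxt)
        if nxt in seen:
            return hops, "loop"     # revisited an AS: forwarding loop
        seen.add(nxt)
        node = nxt
    return hops, "delivered"


# Slide 18: after Hari's withdrawal, Peter and AT&T each pick the other.
tables = {"Peter": "AT&T", "AT&T": "Peter", "Sprint": "MIT"}
print(trace(tables, "Peter", "MIT"))  # (['Peter', 'AT&T', 'Peter'], 'loop')
```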

  19. Challenge 2: Transition without loops • Solution 2: Root Cause Information • Hari includes root cause information ("Hari->MIT link down") with the withdrawal • AT&T recognizes that the Peter->Hari->MIT path is down and routes through Sprint instead • Theorem 2: No forwarding loops will form
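
The effect of root cause information can be sketched as a filter. This assumes the withdrawal carries the failed link as a pair of AS names and that a receiver simply discards every candidate path that uses that link; the encoding and the name `usable_paths` are hypothetical, not R-BGP's wire format. Without the root cause, AT&T could still pick the path through Peter, which reaches MIT over the failed link and can produce the loop of slide 18.

```python
# Hypothetical sketch: drop every candidate path that uses the failed link
# named in a withdrawal's root cause information.
from typing import FrozenSet, List, Set, Tuple

Path = Tuple[str, ...]
Link = Tuple[str, str]


def path_links(path: Path) -> Set[FrozenSet[str]]:
    return {frozenset(hop) for hop in zip(path, path[1:])}


def usable_paths(candidates: List[Path], root_cause: Link) -> List[Path]:
    failed = frozenset(root_cause)
    return [p for p in candidates if failed not in path_links(p)]


# AT&T's candidate paths when Hari's withdrawal arrives with root cause Hari->MIT.
att_candidates = [("AT&T", "Peter", "Hari", "MIT"),
                  ("AT&T", "Sprint", "MIT")]
print(usable_paths(att_candidates, ("Hari", "MIT")))  # [('AT&T', 'Sprint', 'MIT')]
```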

  20. R-BGP • Solution 1: Advertise the most disjoint path to the downstream AS • Solution 2: Include root cause information • Final Theorem: No AS will see BGP-caused packet loss if it will have a path at convergence

  21. Experimental Results

  22. Setup • AS-level simulation over the full Internet • AS graph with 24,142 ASes from Routeviews BGP data • Use an inference algorithm to annotate links with customer-provider or peer relationships

  23. Single Link Failure Results • A dual-homed AS loses one link • Find the percentage of ASes that see transient disconnection to the destination • Run for all dual-homed ASes [Figure: dual-homed AS with one failed link toward the destination]

  24. Single Link Failure Results [Chart: percentage of ASes transiently disconnected: BGP 22%, R-BGP zero] • R-BGP eliminates all transient disconnection

  25. Cost of Policy Compliance • The most disjoint path may not comply with BGP routing policies • Still, an AS may want to advertise it: to protect its own traffic, and because it is only temporary • What if we choose the most disjoint path among policy-compliant paths?

  26. Cost of Policy Compliance [Chart: percentage of ASes transiently disconnected: BGP 22%, R-BGP zero]

  27. Cost of Policy Compliance [Chart: percentage of ASes transiently disconnected: BGP 22%, R-BGP (policy compliant) 1.4%, R-BGP zero] • Policy-compliant failover paths may be sufficient

  28. Multiple Link Failure Results • All proofs are for a single link failure • Randomly choose a second link to fail [Figure: two failed links on the way to the destination]

  29. Multiple Link Failure Results [Chart: percentage of ASes transiently disconnected: BGP 22%, R-BGP (policy compliant) 1.4%, R-BGP 0%] • Multiple link failures are unlikely to interact

  30. Worst Case Scenario • Fail a link on the current path • Fail a link on the corresponding failover path [Figure: Hari with failed links on both its current and failover paths to the destination]

  31. Multiple Link Failure Results [Chart: percentage of ASes transiently disconnected: BGP 33%]

  32. Multiple Link Failure Results [Chart: percentage of ASes transiently disconnected: BGP 33%, R-BGP (policy compliant) 12%]

  33. Worst case Scenario [Chart: percentage of ASes transiently disconnected: BGP 33%, R-BGP (policy compliant) 12%, R-BGP 7%] • R-BGP eliminates 80% of disconnection even in the worst case of link failures on both the current and the failover path
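
The "80%" figure on slide 33 is a relative reduction against BGP's 33%. The short calculation below uses the rounded percentages shown on the slides, so the results are approximate. The name `relative_reduction` is only illustrative.

```python
# Worked arithmetic for slide 33, using the (rounded) percentages on the slides.
def relative_reduction(baseline_pct: float, new_pct: float) -> float:
    """Fraction of the baseline disconnection that a scheme removes."""
    return (baseline_pct - new_pct) / baseline_pct


worst_case = {"R-BGP: policy compliant": 12.0, "R-BGP": 7.0}
for scheme, pct in worst_case.items():
    print(f"{scheme}: {relative_reduction(33.0, pct):.0%} fewer disconnected ASes than BGP")
# R-BGP: policy compliant: 64% fewer disconnected ASes than BGP
# R-BGP: 79% fewer disconnected ASes than BGP
```

26/33 is about 79%, which the slide rounds to 80%.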

  34. Conclusion • BGP loses connectivity even when the physical network is connected • R-BGP uses a few failover paths to ensure forwarding works throughout convergence • Guarantees no BGP-caused packet loss • Just like BGP, one path per neighbor • Reduces disconnected ASes from 22% to zero • Working with Cisco on prototype feasibility

  35. The End

  36. Multiple Link Failure Results [Figure: Bob's failover path passes through Joe; a link fails on each path] • Joe forwards on his second-best path, not on the most disjoint path • Packets on Bob's failover path follow Joe's second-best path to the destination

  37. Practical • Requires only a few modifications to BGP • Currently working with Cisco to prototype • Advertises only one path per neighbor, just like BGP • Convergence time is 1/3 that of BGP

  38. Challenge 1: A few Strategic Failover Paths Solution 1: Most Disjoint Path Theorem 1: If any AS using the down link will have a path after convergence, then R-BGP guarantees that the AS immediately above the down link knows a failover path when the link fails.

  39. Link Down, No Available Loop-Free Path [Figure: AT&T, Sprint, Peter, Hari, and MIT; the Hari->MIT link is down] • AT&T can immediately move to the Sprint path • Peter is left without any usable path, so Peter continues to use the old path • Peter moves away from the old path only after receiving an advertisement from AT&T • Mechanism 3: If no path without the down link is available, continue to use the old path until such a path becomes available or it is certain that no such path will become available
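
Mechanism 3 can be sketched as a small decision function. Assumptions, labeled plainly: a "safe" path is simplified to "a path that does not use the down link", the shortest safe path is chosen (real BGP would apply policy preferences), and certainty that no path will appear is modeled as a hypothetical boolean `no_path_will_appear`. The function name `next_path` is also hypothetical.

```python
# Hypothetical sketch of Mechanism 3: keep the old path until a safe path
# is learned or it is certain that none will appear.
from typing import FrozenSet, List, Optional, Set, Tuple

Path = Tuple[str, ...]


def path_links(path: Path) -> Set[FrozenSet[str]]:
    return {frozenset(hop) for hop in zip(path, path[1:])}


def next_path(old_path: Path, candidates: List[Path], down_link: Tuple[str, str],
              no_path_will_appear: bool) -> Optional[Path]:
    failed = frozenset(down_link)
    safe = [p for p in candidates if failed not in path_links(p)]
    if safe:
        return min(safe, key=len)  # switch as soon as a safe path is known
    if not no_path_will_appear:
        return old_path            # keep forwarding toward the failover AS
    return None                    # destination unreachable after convergence


old = ("Peter", "Hari", "MIT")
print(next_path(old, [], ("Hari", "MIT"), False))
# ('Peter', 'Hari', 'MIT'): Hari reroutes these packets onto its failover path
print(next_path(old, [("Peter", "AT&T", "Sprint", "MIT")], ("Hari", "MIT"), False))
# ('Peter', 'AT&T', 'Sprint', 'MIT')
```

Keeping the old path works because it leads to Hari, the AS immediately upstream of the failure, which already holds a failover path.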

  40. Putting it all together • Mechanism 1: Ensure the failover AS knows an alternate path • Mechanism 2: Allow ASes to recognize safe paths that are guaranteed to be loop-free • Mechanism 3: Continue to forward along the old path to the failover AS until a safe path is learned • Key Idea: Disconnect forwarding from routing, ensuring that forwarding continues to work regardless of what happens at the routing layer

  41. Final Theorem: When a link fails, if an AS will eventually have a path, it will see no BGP-caused packet loss

  42. Final Theorem: When a single link fails, all ASes that will eventually learn a valley-free path to the destination are guaranteed no BGP-caused packet loss during convergence • A path is valley-free if no AS on it transits traffic between two non-customer ASes
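
The valley-free definition on slide 42 translates directly into a check over consecutive triples of a path: every AS in the middle of the path must have a customer as at least one of its two neighbors on that path. The relationship table below is hypothetical (customer mapped to its set of providers, with any unlisted pair treated as non-customers, e.g. peers), and the function names are illustrative.

```python
# Hypothetical sketch of the valley-free test stated on slide 42.
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]


def is_customer_of(a: str, b: str, providers: Dict[str, Set[str]]) -> bool:
    return b in providers.get(a, set())


def valley_free(path: Path, providers: Dict[str, Set[str]]) -> bool:
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if not (is_customer_of(prev, mid, providers) or
                is_customer_of(nxt, mid, providers)):
            return False  # mid would transit between two non-customers
    return True


# customer -> set of its providers; pairs not listed are treated as non-customers
providers = {"Hari": {"AT&T", "Peter"}, "MIT": {"Hari", "Sprint"}}
print(valley_free(("Peter", "Hari", "MIT"), providers))   # True
print(valley_free(("AT&T", "Hari", "Peter"), providers))  # False
```

In the second path, Hari would carry traffic between its two providers, which is exactly the "valley" the definition rules out.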

  43. Little Additional Overhead [Chart: network-wide update counts, bars labeled 22K and 20K] • Less than 10% more updates network-wide

  44. Faster Convergence Times [Chart: convergence times, bars labeled 13 and 4] • Convergence times are 1/3 of those with BGP
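
The bar labels on slides 43 and 44 can be checked against their captions. The slides do not say which bar belongs to which scheme, so this sketch assumes the reading that matches the captions: 20K updates for BGP and 22K for R-BGP, and a convergence time of 13 for BGP and 4 for R-BGP (units are not given).

```python
# Worked arithmetic for slides 43-44.
# Assumption: 20K/13 are BGP's values and 22K/4 are R-BGP's values.
bgp_updates, rbgp_updates = 20_000, 22_000
bgp_time, rbgp_time = 13, 4

overhead = (rbgp_updates - bgp_updates) / bgp_updates
time_ratio = rbgp_time / bgp_time
print(f"extra updates: {overhead:.0%}")                  # extra updates: 10%
print(f"R-BGP/BGP convergence time: {time_ratio:.2f}")   # R-BGP/BGP convergence time: 0.31
```

With these labels the overhead comes out at exactly 10% rather than "less than 10%", presumably because the 22K and 20K labels are rounded. The time ratio of about 0.31 matches the "1/3" claim.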

  45. Compared Schemes • Current BGP • Most-disjoint failover path • Most-disjoint policy-compliant failover path

  46. Goal: Staying Connected • If an AS's link to the destination fails and, after convergence, the AS will have a path to the destination, then the AS should know a failover path to the destination when the link fails [Figure: an AS with a failed link toward the destination]

  47. Goal: Staying Connected • The AS immediately upstream of a down link can protect all traffic • Without a failover path, all ASes see disconnection • The AS upstream of the down link must know a failover path when the link fails [Figure: a failed link on the path to the destination]

  48. Goal: Staying Connected • The AS immediately upstream of a down link can protect all traffic • If this AS has no failover path, all ASes using the link see disconnection • The AS upstream of the down link must know a failover path when the link fails [Figure: a failed link on the path to the destination]

  49. Challenge 2: Consistency during convergence • Routing loops and ASes unaware of available paths • Inconsistency across ASes • Strong consistency is expensive • Balance: provide enough consistency while maintaining BGP's scalability

  50. Challenge 1: Which Failover Paths to Advertise • The AS immediately upstream of a down link can protect all traffic • If this AS has no failover path, all ASes using the link see disconnection: LOSS! • The AS upstream of the down link must know a failover path when the link fails [Figure: a failed link on the path to the destination]
