
Transient BGP Loops: Do they matter, and what can be done about them?

Nate Kushman (MIT/Akamai); Srikanth Kandula, Dina Katabi, and John Wroclawski (MIT).


Presentation Transcript


  1. Transient BGP Loops: Do they matter, and what can be done about them? Nate Kushman (MIT/Akamai); Srikanth Kandula, Dina Katabi, and John Wroclawski

  2. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  3. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  4. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  5. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Routing Loop]

  6. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  7. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  8. What causes: “Transient BGP Loops” [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  9. How common are: “Transient Inter-domain Routing Loops” • Sprint Study (IMC 2003, IMW 2002): • Looked at packet traces from the Sprint backbone • Up to 90% of the observed packet-loss was caused by routing loops • 60-100% of the loops attributable to BGP

  10. Routing Loop Damage • Our Study: • 20 vantage points with BGP feeds • 2 Months • 70,000 unique prefixes • Pinged once every 2 minutes • Trace-routed once every 30 minutes • TTL Exceeded responses to detect loops • Additional pings and traceroutes when loops detected
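
The slide says loops were detected from TTL Exceeded responses and traceroutes, but gives no further detail. As a rough illustration only, one way to flag a loop in a single traceroute is to check whether the same router address answers at two non-adjacent hops. The function name `find_loop` and the example addresses (documentation range 192.0.2.0/24) are assumptions for this sketch, not the study's tooling.

```python
from typing import List, Optional, Sequence


def find_loop(hops: Sequence[Optional[str]]) -> Optional[List[str]]:
    """Return the repeated segment if a traceroute revisits a router, else None.

    `hops` holds the address that answered each TTL probe, in order;
    None marks a probe that got no reply. (Hypothetical input format.)
    """
    # Ignore silent hops so they cannot be mistaken for part of a cycle.
    answered = [h for h in hops if h is not None]
    last_seen = {}
    for i, hop in enumerate(answered):
        # The same address at adjacent positions is not treated as a loop;
        # a revisit after at least one other router in between is.
        if hop in last_seen and i - last_seen[hop] > 1:
            return answered[last_seen[hop]:i]
        last_seen[hop] = i
    return None


looping = ["192.0.2.1", "192.0.2.9", "192.0.2.17", "192.0.2.9", "192.0.2.17"]
clean = ["192.0.2.1", "192.0.2.9", None, "192.0.2.17"]
print(find_loop(looping))
print(find_loop(clean))
```

A real measurement would also need to rule out path changes between probes, which is presumably why the study issued additional pings and traceroutes once a loop was suspected.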

  11. Routing Loop Damage 10-15% of updates cause routing loops

  12. Collateral Damage [Diagram labels: AS F, AS C, AS A, AS D, AS B, AS E]

  13. Collateral Damage [Diagram labels: AS F, X, AS C, AS A, AS D, AS B, AS E]

  14. Collateral Damage Prefixes sharing a loopy link see 19% loss

  15. What should be done? We should prevent forwarding loops

  16. A loop occurs because: One AS pushes a route update to the data plane, but other ASes, not yet aware of the change, still send packets along the old route
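
To make the cause concrete, here is a minimal sketch of per-AS next-hop tables toward MIT. The topology (Joe reaching MIT via Bob, AT&T reaching MIT via Joe, Sprint connected to MIT directly) is an assumption inferred from the slide text, and `trace` is a hypothetical helper, not part of BGP or of the authors' work.

```python
def trace(next_hop, src, dst="MIT", ttl=6):
    """Follow next-hop entries from src toward dst, giving up after ttl hops."""
    path = [src]
    node = src
    for _ in range(ttl):
        if node == dst:
            return path, "delivered"
        node = next_hop.get(node)
        if node is None:
            return path, "no route"
        path.append(node)
    return path, ("delivered" if node == dst else "ttl exceeded")


# Assumed state before maintenance: each AS maps to its next hop toward MIT.
before = {"Bob": "MIT", "Joe": "Bob", "AT&T": "Joe", "Sprint": "MIT"}

# Joe hears "Withdraw MIT" from Bob and moves its data plane to AT&T,
# but AT&T has not heard of the change yet and still forwards to Joe.
during = dict(before, Joe="AT&T")

# Once AT&T learns of the change it moves to the Sprint path.
after = dict(during, **{"AT&T": "Sprint"})

print(trace(during, "Joe"))  # packet bounces between Joe and AT&T
print(trace(after, "Joe"))   # packet reaches MIT via AT&T and Sprint
```

In the `during` state, every packet that Joe or AT&T carries toward MIT circulates until its TTL runs out, which is what the TTL Exceeded measurements pick up.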

  17. How can we avoid Routing Loops? [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  18. How can we avoid Routing Loops? AT&T still thinks Joe is routing through Bob [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  19. How can we avoid Routing Loops? What if: AT&T knew about Joe’s change before making its own? [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  20. Suspension • Continue to route traffic • Tell control system not to propagate the route

  21. How can we avoid Routing Loops? [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  22. How can we avoid Routing Loops? What if: Joe sends its update before changing its forwarding table? [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance, Withdraw MIT]

  23. How can we avoid Routing Loops? [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  24. How can we avoid Routing Loops? And also waits for an Ack from AT&T before updating its forwarding table? [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  25. How can we avoid Routing Loops? Then we can be sure that AT&T knows about the path change before it happens and will not use the path [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]

  26. How can we avoid Routing Loops? Instead, AT&T will move immediately to the Sprint path and the loop is avoided. [Diagram labels: MIT, Bob, Joe, AT&T, Sprint, Maintenance]
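
The ordering described on these slides (announce first, wait for the Ack, then change the forwarding table) can be sketched as a sequence of forwarding-state snapshots. This is a toy model under an assumed topology and assumed step names; `has_loop` and `ordered_switch` are hypothetical helpers, and the authors' actual mechanism and proofs are at the URL given on the "More Generally" slide.

```python
def has_loop(next_hop, src, dst="MIT"):
    """True if following next hops from src revisits an AS before reaching dst."""
    seen = set()
    node = src
    while node is not None and node != dst:
        if node in seen:
            return True
        seen.add(node)
        node = next_hop.get(node)
    return False


def ordered_switch(next_hop):
    """Yield (step, forwarding state) for the announce / ack / switch ordering."""
    state = dict(next_hop)
    # Step 1: Joe announces its upcoming change but keeps forwarding via Bob.
    yield "Joe announces change", dict(state)
    # Step 2: AT&T stops using the path through Joe, moves to Sprint, and acks.
    state["AT&T"] = "Sprint"
    yield "AT&T moves to Sprint and acks", dict(state)
    # Step 3: only after the ack does Joe update its own forwarding table.
    state["Joe"] = "AT&T"
    yield "Joe switches to AT&T", dict(state)


# Assumed starting state: each AS maps to its next hop toward MIT.
before = {"Bob": "MIT", "Joe": "Bob", "AT&T": "Joe", "Sprint": "MIT"}

for step, state in ordered_switch(before):
    loops = [src for src in state if has_loop(state, src)]
    print(step, "-> ASes in a loop:", loops)

# For contrast: Joe switching before AT&T has heard of the change.
unordered = dict(before, Joe="AT&T")
print("unordered -> loop from Joe:", has_loop(unordered, "Joe"))
```

In this model no intermediate snapshot contains a loop, whereas the unordered change does. The sketch assumes Bob keeps forwarding to MIT until Joe has switched, which matches the planned-maintenance case rather than an unplanned link failure.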

  27. More Generally • We have proven: • Loops are prevented in the general case • Convergence properties similar to normal BGP • All sorts of good proofs and stuff: • http://nms.lcs.mit.edu/~nkushman/

  28. Your feedback • Clearly: • Planned Maintenance events • 20% of update events caused by planned maintenance • Link up events • What about? • Unplanned Link down events • Trade-off between loss on current path and collateral damage

  29. In Short • Routing loops cause significant performance problems • Even prefixes with no BGP updates are significantly affected by loops • A simple change to BGP can avoid all routing loops

  30. Questions?
