Transient BGP Loops: Do they matter, and what can be done about them?
Nate Kushman (MIT/Akamai), Srikanth Kandula, Dina Katabi and John Wroclawski (MIT)
What causes “Transient BGP Loops”?
[Animated diagram, 7 frames: ASes MIT, Bob, Joe, AT&T and Sprint. A link is taken down for maintenance, “Withdraw MIT” messages propagate, and a transient routing loop forms.]
How common are “Transient Inter-domain Routing Loops”?
• Sprint Study (IMC 2003, IMW 2002):
  • Looked at packet traces from the Sprint backbone
  • Up to 90% of the observed packet loss was caused by routing loops
  • 60-100% of the loops were attributable to BGP
Routing Loop Damage
• Our Study:
  • 20 vantage points with BGP feeds
  • 2 months
  • 70,000 unique prefixes
  • Pinged once every 2 minutes
  • Tracerouted once every 30 minutes
  • TTL Exceeded responses used to detect loops
  • Additional pings and traceroutes when loops were detected
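One way the TTL Exceeded idea can be turned into a loop check: if the same router answers at two non-adjacent TTLs of a single traceroute, packets are cycling. The sketch below is only an illustration under that assumption and is not the study's actual tooling; the function name `find_loop` and the sample addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from typing import Dict, List, Optional


def find_loop(hops: List[Optional[str]]) -> Optional[List[str]]:
    """Return the repeating cycle of router addresses in a traceroute, or None.

    `hops` holds the address that answered with TTL Exceeded at each TTL;
    None marks a hop that did not answer.
    """
    last_seen: Dict[str, int] = {}  # address -> index of its latest appearance
    for i, addr in enumerate(hops):
        if addr is None:
            continue
        # Ignore back-to-back repeats, which some routers produce without looping.
        if addr in last_seen and i - last_seen[addr] > 1:
            # The same router answered again at a larger TTL: packets are cycling.
            return [h for h in hops[last_seen[addr]:i] if h is not None]
        last_seen[addr] = i
    return None


looping = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.2", "192.0.2.3"]
clean = ["192.0.2.1", None, "192.0.2.4"]
print(find_loop(looping))  # the two routers that bounce the packet back and forth
print(find_loop(clean))    # None
```

A real measurement would also need to rule out load balancing and path changes that happen mid-traceroute, which is presumably why the study followed up with additional pings and traceroutes.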
Routing Loop Damage
• 10-15% of updates cause routing loops
Collateral Damage
[Diagram, 2 frames: AS A through AS F; the second frame adds an X mark and a “Collateral Damage” label.]
Collateral Damage
• Prefixes sharing a link that is caught in a routing loop see 19% loss
What should be done? We should prevent forwarding loops
A loop occurs because:
One AS pushes a route update to its data plane while other ASes, not yet aware of the change, keep sending packets along the old route.
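The mechanism can be reproduced with a toy next-hop table. This is a minimal sketch, not anything from the talk: the AS names are borrowed from the deck's diagram, but the exact topology (AT&T reaching MIT through Joe, Joe through Bob, Sprint connected directly to MIT) and the helper `trace` are assumptions made for illustration.

```python
from typing import Dict, List


def trace(next_hop: Dict[str, str], src: str, dst: str, ttl: int = 8) -> List[str]:
    """Follow per-AS next hops from src until dst is reached or the TTL runs out."""
    path = [src]
    while path[-1] != dst and ttl > 0:
        path.append(next_hop[path[-1]])
        ttl -= 1
    return path


# Before the change (assumed topology): AT&T -> Joe -> Bob -> MIT.
next_hop = {"AT&T": "Joe", "Joe": "Bob", "Bob": "MIT", "Sprint": "MIT"}
before = trace(next_hop, "AT&T", "MIT")

# Joe's data plane moves to its backup route through AT&T,
# but AT&T has not heard about it yet and still forwards to Joe.
next_hop["Joe"] = "AT&T"
during = trace(next_hop, "AT&T", "MIT")

# Once AT&T learns of the change it switches to Sprint and the loop clears.
next_hop["AT&T"] = "Sprint"
after = trace(next_hop, "AT&T", "MIT")

print(before)  # reaches MIT
print(during)  # bounces between AT&T and Joe until the TTL expires
print(after)   # reaches MIT via Sprint
```

In the middle state every packet from AT&T toward MIT is dropped when its TTL expires, which is the TTL Exceeded signal the measurement study relied on.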
How can we avoid Routing Loops?
[Diagram, 3 frames: MIT, Bob, Joe, AT&T, Sprint; maintenance event; “Withdraw MIT” message.]
• AT&T still thinks Joe is routing through Bob
• What if AT&T knew about Joe’s change before making its own?
Suspension
• Continue to route traffic
• Tell the control system not to propagate the route
How can we avoid Routing Loops?
[Diagram, 6 frames: MIT, Bob, Joe, AT&T, Sprint; maintenance event; “Withdraw MIT” message.]
• What if Joe sends its update before changing its forwarding table?
• And also waits for an Ack from AT&T before updating its forwarding table?
• Then we can be sure that AT&T knows about the path change before it happens and will not use the old path.
• Instead, AT&T will move immediately to the Sprint path and the loop is avoided.
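The ordering argument can be checked on the same two-AS example. The sketch below only compares the two orderings of events (Joe switching first versus AT&T learning first); it does not model the actual protocol messages or the general-case proofs. The topology, the function names `has_loop` and `joe_reroutes`, and the `wait_for_ack` flag are all assumptions introduced for illustration.

```python
from typing import Dict, List


def has_loop(next_hop: Dict[str, str], src: str, dst: str) -> bool:
    """True if following next hops from src revisits an AS before reaching dst."""
    seen = set()
    node = src
    while node != dst:
        if node in seen:
            return True
        seen.add(node)
        node = next_hop[node]
    return False


def joe_reroutes(wait_for_ack: bool) -> List[bool]:
    """Return, after each step of Joe's route change, whether any AS's traffic loops."""
    # Assumed starting state: AT&T -> Joe -> Bob -> MIT, Sprint -> MIT.
    next_hop = {"AT&T": "Joe", "Joe": "Bob", "Bob": "MIT", "Sprint": "MIT"}

    def att_learns() -> None:
        # AT&T hears Joe's update, moves to the Sprint path (and acks).
        next_hop["AT&T"] = "Sprint"

    def joe_switches() -> None:
        # Joe's forwarding table moves to its route through AT&T.
        next_hop["Joe"] = "AT&T"

    # With the ack, Joe's forwarding change happens only after AT&T has moved.
    steps = [att_learns, joe_switches] if wait_for_ack else [joe_switches, att_learns]
    observed = []
    for step in steps:
        step()
        observed.append(any(has_loop(next_hop, s, "MIT") for s in next_hop))
    return observed


print(joe_reroutes(wait_for_ack=False))  # a loop appears in the first step
print(joe_reroutes(wait_for_ack=True))   # no loop at any step
```

While Joe waits for the Ack it keeps forwarding along its old route, so the approach as sketched fits cases where the old path still works for a short time, such as planned maintenance.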
More Generally
• We have proven:
  • Loops are prevented in the general case
  • Convergence properties are similar to normal BGP
• All sorts of good proofs and stuff:
  • http://nms.lcs.mit.edu/~nkushman/
Your feedback
• Clearly:
  • Planned maintenance events
  • 20% of update events are caused by planned maintenance
  • Link-up events
• What about?
  • Unplanned link-down events
  • Trade-off between loss on the current path and collateral damage
In Short
• Routing loops cause significant performance problems
• Even prefixes with no BGP updates are significantly affected by loops
• A simple change to BGP can avoid all routing loops