
Navigating BGP Complexity: Pitfalls, Misconfigurations, and Inefficiencies

Explore the complexities of the BGP protocol, from misconfigurations that affect connectivity to inefficiencies in routing, with insights on performance, reliability, and more.


Presentation Transcript


  1. BGP Inefficiencies Supplemental slides 02/14/2007 Aditya Akella

  2. BGP Complexity • BGP is a very complicated protocol • Too many knobs • Need to accommodate (sub-optimal) ISP policies • Requires complex, human configuration • For all its complexity, BGP offers no guarantees • Performance?? • Reliability?? • Correctness?? • Reachability?? • All of BGP's complexity begets… Headache!

  3. BGP Pitfalls and Problems • Pitfalls and problems • Misconfiguration • Convergence • Performance • Reliability • Stability • Security • And the list goes on…

  4. Favorite Scapegoat! • Networking community → BGP

  5. Misconfiguration [Mahajan02Sigcomm] • Origin misconfiguration: accidentally inject routes for prefixes into global BGP tables

  6. Misconfiguration • Export misconfiguration: export route to a peer in violation of policy

  7. Interesting Observations • Origin misconfig • 72% of new routes may be misconfigurations • 11-13% of misconfig incidents affect connectivity (verified via pings and e-mail checks) • Self de-aggregation is the main cause • Export misconfig • Up to 500 misconfiguration incidents per day • All forms are prevalent, although provider-AS-provider is more likely

  8. Effects and Causes • Effects • Routing load • Connectivity disruption • Extra traffic • Policy violation • Causes (origin misconfig) • Router vendor software bugs: announce and withdraw routes on reboot • Reliance on upstream filtering • New configuration not saved to stable storage (separate command and no autosave!) • Hijacks of address space • Forgetting to install a filter • Human operators and poor interfaces • Export misconfig example (AS A with customer C and providers P1, P2) • Intended policy: provide transit to C through link A-C • Configured policy: export all routes originated by C to P1 and P2 • Correct policy: export only when the AS path is “C” (see the sketch below)
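The export rule in this example can be made concrete. Below is a minimal Python sketch (not from the slides; the prefixes, AS names, and helper function are hypothetical) of the correct policy: announce to providers P1 and P2 only those routes whose AS path is exactly “C”.

```python
# Hypothetical sketch of the correct export policy from the example above.
def should_export(as_path, neighbor, customer_as="C", providers=("P1", "P2")):
    """Announce a route to a provider only if it was originated by the
    customer, i.e. the AS path is exactly [customer_as]."""
    if neighbor not in providers:
        return True  # exports to other neighbors are governed by other policy
    return as_path == [customer_as]

# The misconfigured policy exported routes learned from C even when C was not
# the only AS on the path, unintentionally turning A into a transit provider.
routes = {"10.1.0.0/16": ["C"], "10.2.0.0/16": ["C", "P2"]}
for prefix, as_path in routes.items():
    print(prefix, as_path, "-> export to P1?", should_export(as_path, "P1"))
```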

  9. BGP Convergence [Labovitz00Sigcomm] • Conventional beliefs • Path vector converges faster than traditional DV (eliminates the count-to-infinity problem) • Internet path restoration takes on the order of tens of seconds • Convergence findings • Recovery after a fault may take as much as ten minutes • A single routing fault can result in multiple announcements and withdrawals • Loss and RTT around times of faults are much worse • Upon route withdrawal, BGP explores paths of increasing length • In the worst case, it could explore n! paths (see the sketch below) • Behavior depends on which messages are processed and when • Rate-limiting the interval between update messages can reduce the message count • It forces all outstanding messages to be processed together
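To see where the n! worst case comes from, here is a small illustrative Python count (a toy under the assumption of a full mesh of ASes, not the paper's model): the number of simple AS paths between a source and a destination that a path-vector protocol could, in principle, explore grows factorially with n.

```python
from itertools import permutations

def candidate_paths(n):
    """Simple paths between two fixed ASes in a full mesh of n ASes:
    choose k of the remaining n-2 ASes as ordered intermediaries."""
    rest = range(n - 2)
    return sum(1 for k in range(len(rest) + 1) for _ in permutations(rest, k))

for n in range(3, 9):
    print(f"n = {n}: up to {candidate_paths(n)} candidate paths")
```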

  10. End-to-End Routing Behavior [Paxson96Sigcomm] • Large-scale routing behavior as seen by end-hosts, based on analysis of traceroutes • Pathologies: persistent routing loops, routing failures, and long connectivity outages • Stability: 9% of routes changed every tens of minutes, about 30% every ~6 hours, and 68% persisted for a few days • Symmetry: more than half of the paths probed were asymmetric at the router level (see the sketch below)
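As a small illustration of the symmetry measurement (hypothetical router names, not Paxson's data), a host pair counts as router-level asymmetric when the reverse traceroute does not retrace the forward one:

```python
def is_symmetric(forward_hops, reverse_hops):
    """Router-level symmetry: the reverse path visits the same routers
    in the opposite order."""
    return forward_hops == list(reversed(reverse_hops))

fwd = ["r1", "r2", "r3", "r4"]
rev = ["r4", "r5", "r2", "r1"]  # returns via r5 instead of r3
print("symmetric?", is_symmetric(fwd, rev))  # False -> asymmetric pair
```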

  11. Inefficiencies in BGP & Internet Routing • Route convergence and oscillations • Poor reliability • No way to exploit redundancy in Internet paths • Inefficiency: sub-optimal RTTs and throughputs • What are some of the causes? • Policies in routing: inter-domain and intra-domain • Lack of direct routes, “sparseness” of the Internet graph

  12. Inefficiency of Routes [Spring03Sigcomm] • Three classes of reasons for poor performance (“inflation”; a simple metric is sketched below) • Intra-domain topology and policy • Topology: not every pair of cities has a direct link • Routing policy: “shortest paths” may be avoided due to traffic engineering • ISP peering • Peering topology: limited peering between ISPs • Peering policy: hot-potato or early-exit routing • Inter-domain • Topology: the AS graph is sparse • Inter-domain policies: policies are policies
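A simple way to express the “inflation” being summarized here (a sketch with made-up numbers, not the paper's methodology) is the ratio of the latency of the path actually taken to that of the hypothetical direct path:

```python
def inflation(actual_ms, direct_ms):
    """Path inflation: how much longer the chosen path is than the direct one."""
    return actual_ms / direct_ms

# e.g. early-exit (hot-potato) routing detours through a distant peering point:
print(f"inflation = {inflation(35.0, 20.0):.2f}x")  # 1.75x
```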

  13. Path Inflation Summary

  14. Internet Bottlenecks • Today most bottlenecks are last-mile: slow, flaky home connections and slow access links limit transfer bandwidth, while the high-speed “core” consists of big, fat pipes • As access technology improves (e.g., 100 Mbps home connections), do the bottlenecks shift to non-access or wide-area links?

  15. Wide-Area Bottlenecks • Wide-area bottleneck → where an unconstrained TCP flow sees delays and losses • Not the “traditional” bottlenecks → may not be congested • The link with the least available bandwidth along the path • (Figure: an unconstrained TCP flow crossing the wide-area Internet / high-speed “core” through a hierarchy of large ISPs such as ATT, Sprint, and UUNet, plus small and tiny ISPs)

  16. Measurement Tool: BFind • Ideally: monitor queues along the path from source to destination and identify where queues build up → that link is the bottleneck • But there is no control over the destination → emulate the whole process from the source!

  17. Measurement Tool: BFind • BFind functions like TCP: gradually increase the send rate until it hits the bottleneck • Can identify key properties of the bottleneck • Location, latency, available bandwidth (== send rate of BFind just before quitting) • Single-ended control • Quits after 180 s or before the send rate hits 50 Mbps • BFind validation: wide-area experiments and simulations • Operation (figure): a rate-controlled UDP stream starts at 1 Mbps and increases by d Mbps per round (1+d, 1+2d, …) while rounds of traceroutes monitor each link for queueing; when a hop (e.g., hop #2) shows queueing, it is flagged and the current rate is kept for the next round; if the same hop is flagged too many times, BFind quits and identifies it as the bottleneck (see the sketch below)
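A rough Python sketch of the BFind control loop described above; the probing primitives (send_udp_at_rate, traceroute_hop_delays) and the thresholds are placeholders, not the real tool.

```python
import time

MAX_RATE_MBPS = 50.0    # quit before the send rate reaches 50 Mbps
MAX_DURATION_S = 180    # quit after 180 seconds
FLAG_THRESHOLD = 3      # hop flagged this many times -> declare bottleneck
RATE_STEP_MBPS = 1.0

def bfind(dest, send_udp_at_rate, traceroute_hop_delays, baseline_delays):
    rate = 1.0
    flags = {}                                  # hop index -> times flagged
    start = time.time()
    while rate < MAX_RATE_MBPS and time.time() - start < MAX_DURATION_S:
        send_udp_at_rate(dest, rate)            # rate-controlled UDP stream
        delays = traceroute_hop_delays(dest)    # one round of traceroutes
        queued = [h for h, d in enumerate(delays)
                  if d > 1.5 * baseline_delays[h]]   # delay growth => queueing
        for h in queued:
            flags[h] = flags.get(h, 0) + 1
            if flags[h] >= FLAG_THRESHOLD:
                # available bandwidth ~= send rate just before quitting
                return {"bottleneck_hop": h, "avail_bw_mbps": rate}
        if not queued:
            rate += RATE_STEP_MBPS              # no queueing: ramp up next round
        # if a hop queued but isn't confirmed yet, keep the current rate
    return {"bottleneck_hop": None, "avail_bw_mbps": rate}
```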

  18. Results: Location • Bottlenecks split roughly evenly between intra-ISP links (≈51%) and inter-ISP links (≈49%), compared against each category's share of all links • Example: if the bottleneck falls on a peering link with 50% chance and the path has two peering links, each peering link has probability 0.25 of being the bottleneck; with four intra-ISP (non-peering) links sharing the other 50%, each has probability 0.125 (worked below)
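Worked form of the per-link example on this slide (arithmetic only):

```python
# Bottleneck on a peering link with 50% chance, spread over two peering links;
# the other 50% spread over four intra-ISP (non-peering) links.
p_peering_link = 0.5 / 2
p_intra_link = 0.5 / 4
print(p_peering_link, p_intra_link)  # 0.25 0.125
```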

  19. Results: Available Bandwidth • Intra-ISP links: tier-1 ISPs are the best; tier-3 ISPs have slightly higher available bandwidth than tier-2 • Inter-ISP links: tier-1 to tier-1 peering is the best; peering involving tiers 2 and 3 is similar

  20. Performance: End-to-End Perspective • From an end-to-end view… • Is there a way of extracting better performance? • Is there scope? • How do we realize this? • Scope: Savage99, CMU Multihoming work • Reality: UW’s “Detour” system, MIT’s RON, Akamai’s SureRoute, CMU’s Route Control implementation

  21. Quantifying Performance Loss [Savage99Sigcomm] • Measure round-trip time (RTT) and loss rate between pairs of hosts • Alternate path characteristics • For 30-55% of hosts, an alternate path had lower latency • 10% of alternate routes have 50% lower latency • 75-85% have lower loss rates

  22. Bandwidth Estimation • RTT & loss for a multi-hop path • RTT: add per-hop RTTs • Loss: either the worst hop or a combination across hops – why? • Large number of flows → combine the per-hop loss probabilities • Small number of flows → the worst hop dominates • Bandwidth calculation • TCP bandwidth is based primarily on loss and RTT (see the sketch below) • 70-80% of paths have better bandwidth • 10-20% of paths see a 3x improvement
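A minimal sketch of these estimates in Python (hypothetical hop values; the specific throughput formula is an assumption, since the slide only says TCP bandwidth is based primarily on loss and RTT):

```python
from math import sqrt

def path_rtt(hop_rtts_ms):
    """RTT of a multi-hop path: sum of per-hop RTTs."""
    return sum(hop_rtts_ms)

def path_loss(hop_losses, many_flows=True):
    """Loss of a multi-hop path: combined across hops (many flows)
    or dominated by the worst hop (few flows)."""
    if many_flows:
        keep = 1.0
        for p in hop_losses:
            keep *= (1.0 - p)
        return 1.0 - keep
    return max(hop_losses)

def tcp_throughput_mbps(rtt_ms, loss, mss_bytes=1460):
    # Mathis-style approximation (assumed here): BW ~ MSS / (RTT * sqrt(p))
    return (mss_bytes * 8 / 1e6) / ((rtt_ms / 1000.0) * sqrt(loss))

rtt = path_rtt([20.0, 35.0])        # two-hop alternate path
loss = path_loss([0.01, 0.005])
print(f"RTT={rtt} ms, loss={loss:.3f}, "
      f"est. throughput={tcp_throughput_mbps(rtt, loss):.1f} Mbps")
```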

  23. Possible Sources of Alternate Paths • A few really good or bad ASes? • No – the benefit from the top ten hosts is not large • Better congestion or better propagation delay? • How to measure? • Propagation ≈ 5th percentile of measured delays (sketched below) • Both contribute to the performance improvement
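Sketch of the propagation-delay heuristic with made-up samples: take roughly the 5th percentile of measured RTTs as the propagation component and attribute the rest of each sample to queueing (congestion).

```python
def propagation_estimate_ms(rtt_samples_ms, percentile=5):
    """Estimate propagation delay as a low percentile of the RTT samples."""
    ordered = sorted(rtt_samples_ms)
    idx = max(0, int(len(ordered) * percentile / 100) - 1)
    return ordered[idx]

samples = [31.2, 30.8, 45.0, 33.1, 30.5, 60.2, 31.0, 30.6, 38.4, 30.9]
print("propagation ~", propagation_estimate_ms(samples), "ms")
```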

  24. Overlay Networks • Basic idea: • Treat multiple hops through the IP network as one hop in the overlay network • Run a routing protocol on the overlay nodes (see the sketch below) • Why? • For performance – as the Savage99 paper showed • For efficiency – can make core routers very simple (e.g., CSFQ) • Also aids deployment (e.g., active networks) • For functionality – can provide new features such as multicast and active processing
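A minimal sketch of the basic idea (hypothetical node names and latencies): each overlay edge stands for an entire IP path with a measured latency, and the overlay runs its own shortest-path routing on top, so traffic can detour through an intermediate overlay node when the direct IP path performs poorly.

```python
import heapq

def overlay_shortest_path(latency_ms, src, dst):
    """Dijkstra over overlay edges; latency_ms[a][b] is a measured path RTT."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in latency_ms.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

overlay = {"A": {"B": 80.0, "C": 30.0}, "C": {"B": 35.0}, "B": {}}
print(overlay_shortest_path(overlay, "A", "B"))  # detour A->C->B (65 ms) beats A->B (80 ms)
```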

  25. Future of Overlay • Application specific overlays • Why should overlay nodes only do routing? • Caching • Intercept requests and create responses • Transcoding • Changing content of packets to match available bandwidth • Peer-to-peer applications

  26. Overlay Challenges • “Routers” no longer have complete knowledge about the links they are responsible for • How do you build an efficient overlay? • Probably don’t want all N² links – which links to create? (a neighbor-selection sketch follows) • Without direct knowledge of the underlying topology, how do you know what’s nearby and what’s efficient? • Do we need overlays for performance?
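One common answer to the “which links to create” question, sketched with made-up RTTs: keep only each node's k lowest-latency neighbors, using measured RTT as a crude proxy for “nearby” when the underlying topology is unknown.

```python
def pick_neighbors(rtt_to_others_ms, k=2):
    """Keep overlay links only to the k lowest-latency peers."""
    return sorted(rtt_to_others_ms, key=rtt_to_others_ms.get)[:k]

measured = {"B": 12.0, "C": 85.0, "D": 30.0, "E": 140.0}
print("node A keeps overlay links to:", pick_neighbors(measured))  # ['B', 'D']
```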

  27. Number of Route Choices • Flexible control of the end-to-end path → many candidate paths (overlays) • BGP: a single path, or with multihoming one path via each ISP → choices limited to the number of ISPs • A few more route choices…?

  28. Route Selection Mechanism • BGP: simple, coarse metrics such as least AS hops and policy compliance – not necessarily the best-performing path • Overlays: complex, performance-oriented selection of the best-performing path • “Multihoming route control”: smart, sophisticated selection among the multiple BGP routes – picks the current best-performing BGP path (compared in the sketch below)
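The contrast between the two selection mechanisms, as a toy Python sketch with hypothetical candidate routes:

```python
candidates = [
    {"via": "ISP1", "as_hops": 3, "rtt_ms": 80.0, "policy_ok": True},
    {"via": "ISP2", "as_hops": 4, "rtt_ms": 45.0, "policy_ok": True},
    {"via": "ISP3", "as_hops": 2, "rtt_ms": 120.0, "policy_ok": False},
]
allowed = [r for r in candidates if r["policy_ok"]]          # policy first
bgp_choice = min(allowed, key=lambda r: r["as_hops"])        # coarse metric
smart_choice = min(allowed, key=lambda r: r["rtt_ms"])       # performance metric
print("BGP-style pick:", bgp_choice["via"], "| route-control pick:", smart_choice["via"])
```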

  29. Overlay Routing vs. Multihoming Route Control • Cost: route control pays connectivity fees to its ISPs (e.g., Genuity, Sprint, ATT); overlay routing pays connectivity fees plus an overlay fee to the overlay provider • Overlay routing: an overlay node can force an intermediate ISP to provide transit → bad interactions with policies • Route control: a multihomed site announces /20 sub-blocks of its /18 netblock to its ISPs → routing table expansion if all multihomed end networks do this
