Delayed Internet Routing Convergence
Craig Labovitz, Abha Ahuja, Abhijit Bose, Farnam Jahanian
Presented by Harpal Singh Bassali
Introduction
• Conventional wisdom: rapid restoration and rerouting in the event of a link or router failure.
• Actual convergence time is on the order of minutes.
• What happens to data packets until then?
  • Loss of connectivity
  • Packet loss
  • Latency
Infrastructure
• Used both passive data collection and fault-injection machines.
• Data collected over a two-year period.
• Injected over 250,000 routing faults from diverse locations.
• Used RouteViews probes to monitor BGP updates in core Internet routers.
• Active probe machines measured end-to-end performance by sending ICMP echo messages to random web sites.
Taxonomy
Tup: A previously unavailable route is announced as available.
Tdown: A previously available route is withdrawn.
Tshort: An active route with a long ASPath is implicitly replaced by a new route with a shorter ASPath.
Tlong: An active route with a short ASPath is implicitly replaced by a new route with a longer ASPath.
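The taxonomy can be read as a rule that labels a route change by comparing the old and new ASPath. The sketch below is only an illustration: the function name `classify_transition` and the convention that `None` means "no route" are assumptions, and it takes Tlong to mean replacement by a longer ASPath.

```python
from typing import Optional, Sequence


def classify_transition(old_path: Optional[Sequence[int]],
                        new_path: Optional[Sequence[int]]) -> Optional[str]:
    """Label a route change with one of the four taxonomy events.

    A path is a sequence of AS numbers; None means "no route"
    (an assumption of this sketch). Returns None when the change
    fits none of the four categories, e.g. equal-length paths.
    """
    if old_path is None and new_path is None:
        return None                      # nothing was or is reachable
    if old_path is None:
        return "Tup"                     # unavailable -> announced
    if new_path is None:
        return "Tdown"                   # available -> withdrawn
    if len(new_path) < len(old_path):
        return "Tshort"                  # replaced by a shorter ASPath
    if len(new_path) > len(old_path):
        return "Tlong"                   # replaced by a longer ASPath
    return None                          # same length: outside the taxonomy
```

For example, `classify_transition([1, 0], [1, 2, 3, 0])` labels a fail-over onto a longer backup path as Tlong.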
Routing Measurements Latency Vs Number of BGP updates
Observations
• Long-tailed distribution.
• 20% of Tlong and 40% of Tdown events take more than 3 minutes to converge.
• (Tshort, Tup) and (Tlong, Tdown) form equivalence classes.
• A 20-second separation between Tlong and Tdown.
• Tdown and Tlong had twice as many update messages as Tshort and Tup.
• Strong correlation between number of updates and latency.
Routing Measurements Latency Vs Type of BGP update
Observations
• Significant variation in convergence latencies for the ISPs.
• No correlation between convergence latency and geographic or network distance.
• Factors contributing to Internet fail-over delay are independent of network load and congestion.
Observations
Packet Loss Vs Type of BGP update
• Less than 1% packet loss throughout the 10-minute period.
• Tlong event has 17% and Tshort event has 32% packet loss.
• Wider curve of Tlong due to the slower speed of routing table convergence.
Observations
Latency Vs Type of BGP update
• Wider curve of Tlong due to the slower speed of routing table convergence.
• The Tup event had all its packets within 1 minute.
BGP Convergence Upper Bound on Convergence
Assumptions
• Each AS is a single node.
• The ASes form a complete graph.
• MinRouteAdver is excluded from the analysis.
• BGP processing is modeled as a single, linear, global queue.
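To make these assumptions concrete, here is a rough simulation sketch of a Tdown event under the simplified model: every AS is one node, the graph is complete, there is no MinRouteAdver timer, and all messages pass through one global FIFO queue. The function name `simulate_tdown`, the shortest-path preference with a lexicographic tie-break, and receiver-side loop detection are assumptions of this sketch, not details taken from the slides.

```python
from collections import deque


def simulate_tdown(n):
    """Withdraw the origin's route in a complete graph of n ASes.

    Returns (messages_processed, final_best_paths). AS 0 is the
    origin; paths are tuples of AS ids ending at the origin.
    """
    origin = 0
    others = range(1, n)

    # rib_in[i][j]: path most recently announced to i by neighbor j.
    rib_in = {i: {j: None for j in range(n) if j != i} for i in others}
    for i in others:
        rib_in[i][origin] = (origin,)          # direct route
        for j in others:
            if j != i:
                rib_in[i][j] = (j, origin)     # steady-state advert of j

    def best(i):
        # Receiver-side loop detection: ignore paths containing i.
        usable = [p for p in rib_in[i].values()
                  if p is not None and i not in p]
        # Assumed policy: shortest path, ties broken lexicographically.
        return min(usable, key=lambda p: (len(p), p)) if usable else None

    current = {i: best(i) for i in others}

    # The fault: the origin withdraws its route from every neighbor.
    queue = deque((origin, i, None) for i in others)
    processed = 0
    while queue:
        sender, receiver, path = queue.popleft()   # single global queue
        processed += 1
        rib_in[receiver][sender] = path
        new_best = best(receiver)
        if new_best != current[receiver]:
            current[receiver] = new_best
            advert = None if new_best is None else (receiver,) + new_best
            for nb in others:
                if nb != receiver:
                    queue.append((receiver, nb, advert))
    return processed, current
```

Calling `simulate_tdown(n)` for a few small values of `n` gives a feel for how the number of processed messages grows with the number of ASes, even though every node ends with no route.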
BGP Convergence Upper Bound on Convergence
Results
• If loop detection is performed on both the sender and receiver sides, all mutual dependencies can be discovered and eliminated in a single round.
• Convergence latency is independent of geographic and network distance.
• Variations in convergence latency are directly related to topological factors such as the length and number of possible paths between ASes.
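The difference between receiver-side and sender-side loop detection can be sketched in a few lines. The helper names `receiver_accepts` and `sender_targets` are hypothetical; the sketch only illustrates the check itself, not how any router implements it.

```python
from typing import Iterable, List, Tuple

ASPath = Tuple[int, ...]


def receiver_accepts(own_as: int, path: ASPath) -> bool:
    """Receiver-side check: discard any path that already contains us."""
    return own_as not in path


def sender_targets(path: ASPath, neighbors: Iterable[int]) -> List[int]:
    """Sender-side check: skip neighbors that already appear in the path,
    since they would reject the announcement anyway."""
    return [nb for nb in neighbors if nb not in path]
```

With sender-side detection, a path that depends on a neighbor is never announced to that neighbor, so the neighbor does not have to receive and process the update before the dependency is discovered.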
The Impact of Internet Policy and Topology on Delayed Routing Convergence
Craig Labovitz, Roger Wattenhofer, Srinivasan Venkatachary and Abha Ahuja
Major Results
• Internet fail-over convergence = , where n is the length of the longest backup path between source and destination.
• Customers of bigger ISPs exhibit faster convergence.
• Errant paths are frequently explored during delayed convergence.
Methodology
• Injected BGP route transitions into more than 10 geographically and topologically diverse providers.
• A set of probe machines actively injected faults at random intervals of roughly 2 hours.
• Generated faults over a six-month period.
• The cooperating providers treated the address space as a customer with respect to policy and filtering.
• Logged periodic routing table snapshots and all BGP updates from an additional 20 ISPs.
Inter-provider Relationships
Peer: Bilateral exchange of customer and backbone routing information. Routes learnt from other peers and upstream providers are not exchanged.
Customer/Transit: The customer announces its backbone and downstream routes to an upstream provider.
Backup transit: A peer relationship in which a provider only provides transit after detection of a fault. Both are peers in steady-state but after a failure, the backup transit peer begins advertising its now downstream peer’s backbone and customer routes.
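These relationships amount to an export policy: which routes an AS announces to which neighbor. The sketch below is one possible reading. The names `Rel`, `effective` and `should_export` are hypothetical, and two rules are assumptions that the slide does not state: a provider exports all routes to its customers, and after a fault a backup transit peer is treated like a customer in both directions.

```python
from enum import Enum


class Rel(Enum):
    OWN = "own"                  # our own backbone routes
    CUSTOMER = "customer"
    PROVIDER = "provider"
    PEER = "peer"
    BACKUP_PEER = "backup_peer"  # peer that receives backup transit


def effective(rel: Rel, fault_detected: bool) -> Rel:
    """A backup transit peer is a plain peer in steady state and is
    treated as a downstream customer once a fault is detected."""
    if rel is Rel.BACKUP_PEER:
        return Rel.CUSTOMER if fault_detected else Rel.PEER
    return rel


def should_export(learned_from: Rel, to_neighbor: Rel,
                  fault_detected: bool = False) -> bool:
    """Decide whether a route learned from one kind of neighbor
    is announced to another kind of neighbor."""
    if to_neighbor is Rel.OWN:
        raise ValueError("OWN is a route source, not a neighbor type")
    source = effective(learned_from, fault_detected)
    target = effective(to_neighbor, fault_detected)
    if target is Rel.CUSTOMER:
        return True  # assumption: customers receive all routes
    # Peers and upstream providers only see backbone and customer routes.
    return source in (Rel.OWN, Rel.CUSTOMER)
```

For example, `should_export(Rel.BACKUP_PEER, Rel.PEER)` is false in steady state and becomes true with `fault_detected=True`, which corresponds to the backup transit peer starting to advertise its now downstream peer's routes.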
Conclusions
• Vagabond paths are responsible for delays in convergence.
• The more densely a router is peered, the longer it takes to converge.
• MinRouteAdver is responsible for significant additional latency during delayed convergence.
Observations
• Long-tailed distribution due to vagabond paths.
• ISP3 exhibits significantly slower convergence times.
• Average convergence latency for a route failure corresponds to the longest possible backup path allowed by policy and topology.
Observations(contd.) Latency Vs Longest ASPath explored
Observations(contd.) Provider Type Vs Observed ASPath length
Conclusions
• Customers sensitive to fail-over latency should multi-home to larger providers.
• Smaller providers should limit their number of transit and backup transit interconnections.
• The large number of vagabond paths suggests a need for a better route validation and authentication mechanism.