Modeling Inter-Domain Routing Protocol Dynamics, ISMA 2000, December 6, 2000. Craig Labovitz, Merit Network/Microsoft Research, labovit@merit.edu. In collaboration with Abha Ahuja, Roger Wattenhofer, Srinivasan Venkatachary, Madan Musuvathi.
Routing Dynamics Goal: Develop a model of Internet inter-domain routing protocol dynamics. Easy, right? Subgoals • Model the impact of failures and topological changes on end-to-end paths • Predict/measure the reliability of inter-AS links, routers, etc. • Compare steady-state topologies to topologies under failure • Figure out where all of those darn BGP updates come from
Stuff • Old stuff • Measurements of BGP updates and convergence • Model BGP convergence (upper and lower bounds) • New Stuff • Protocol timer trade-offs • Improvements to BGP (BGP-CT)
Data Sets & Tools • Default-free BGP peering sessions • (routeviews.merit.edu, 2 Equinix probes, 1 Mae-West, several iBGP probes, Merit RSNG route servers) • Daily tables and all BGP updates/events sent to RS over last five years • Daily default-free dumps (and all updates/events) for 20-30 peers for last two years • Fault injection probes (OSPF/BGP) • Analysis/Tools • MRT/Perl (playing with SSFNet) • RouteTracker (whois.routetracker.net)
Internet BGP Update Volume • Withdraws in millions until 2/1998 due to withdraw looping/Cisco bug. Dramatic drop after IOS release • Announcements growing after 6/98 due to MED policy and convergence?
MTTF of Backbone Networks • Informally: How long before a network becomes unreachable? • The majority of Internet routes experience a failure (become unreachable) within 30 days
Mean Time to Fail-Over • How long before traffic is re-routed? • The majority of Internet routes that have backup paths fail over every 3 days
Internet Route Repair • How long before a network is reachable again? • Long-tailed distribution with plateau at 30 minutes. Why this plateau?
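The three questions above (time to failure, time to fail-over, time to repair) all come down to measuring the intervals between state changes of each route. A minimal sketch, assuming a hypothetical per-route event list of (timestamp in seconds, "up" or "down") pairs already extracted from the update logs; the function name and input format are illustrative and are not the MRT tooling's. Fail-over could be measured the same way using path-change events instead of reachability events.

```python
from statistics import mean


def up_down_intervals(events):
    """Split one route's time-ordered (timestamp, state) events into durations.

    Hypothetical input: timestamps in seconds, state is "up" or "down".
    Repeated events with the same state (e.g. duplicate announcements)
    do not close the current interval.
    Returns (up_durations, down_durations).
    """
    up, down = [], []
    start, state = None, None
    for t, new_state in events:
        if state is None:
            start, state = t, new_state
        elif new_state != state:
            # Close the interval that just ended and open a new one.
            (up if state == "up" else down).append(t - start)
            start, state = t, new_state
    return up, down


# Made-up event log for a single route (not measured data).
DAY = 86400
events = [(0, "up"), (DAY, "down"), (DAY + 1800, "up"),
          (3 * DAY + 1800, "down"), (3 * DAY + 2400, "up")]
up, down = up_down_intervals(events)
print(f"time to failure: mean {mean(up) / DAY:.1f} days")    # MTTF
print(f"time to repair: mean {mean(down) / 60:.1f} minutes")  # MTTR
```

For this made-up log the route stays up for one day and then two days, and is down for 30 minutes and then 10 minutes.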
BGP Convergence • For a complete graph of N ASes: a theoretical upper bound of N! and a lower bound of 30*(N-3) seconds • In practice, the Internet has hierarchy and customer/provider/sibling relationships, so convergence is bounded by the length of the longest possible path
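The two complete-graph bounds can be evaluated directly to see how quickly they diverge. A small sketch; it assumes the 30 in 30*(N-3) is the 30-second MinRouteAdver timer, and it evaluates N! exactly as written on the slide without attaching a unit to it.

```python
import math

MIN_ROUTE_ADVER = 30  # seconds; assumed to be the "30" in 30*(N-3)


def complete_graph_bounds(n):
    """Evaluate the slide's bounds for a complete graph of n ASes (n >= 3)."""
    upper = math.factorial(n)                  # N! theoretical upper bound
    lower_seconds = MIN_ROUTE_ADVER * (n - 3)  # 30*(N-3) lower bound
    return upper, lower_seconds


for n in (4, 6, 10):
    upper, lower = complete_graph_bounds(n)
    print(f"N={n:2d}  N!={upper:>9,}  30*(N-3)={lower} s")
```

The lower bound grows linearly with N while the upper bound grows factorially, so the gap widens very quickly.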
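BGP Convergence Example (figure: AS3 originates route R, and AS0, AS1, and AS2 peer with AS3 and with each other; * marks each AS's selected path. In steady state each AS selects *B R via 3 and holds two-hop backups: AS0 via 13 and 23, AS1 via 03 and 23, AS2 via 03 and 13. After the withdrawal the tables show longer paths such as via 013, 103, and 203.)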
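The path exploration in this example can be reproduced with a toy simulation. A minimal sketch under assumptions that do not come from the slides: every update takes exactly one synchronous round, there is no MinRouteAdver timer, the shortest AS path wins, and ties go to the lowest neighbor AS number. All function and variable names are hypothetical.

```python
"""Toy synchronous path-vector run of the AS0-AS3 convergence example."""

ORIGIN = 3                   # AS3 originates route R
ASES = [0, 1, 2]             # full mesh, each also peering with AS3
PEERS = {a: [b for b in ASES + [ORIGIN] if b != a] for a in ASES}


def best_path(rib_in):
    # Assumed policy: shortest AS path, ties broken by lowest neighbor AS.
    return min(rib_in.values(), key=lambda p: (len(p), p), default=None)


def fmt(path):
    return "via " + "".join(map(str, path)) if path else "unreachable"


def simulate_withdrawal():
    # Steady state from the figure: AS a hears R directly from AS3 and
    # from each other AS j over the path j,3.
    rib_in = {a: {ORIGIN: [ORIGIN], **{j: [j, ORIGIN] for j in ASES if j != a}}
              for a in ASES}
    best = {a: best_path(rib_in[a]) for a in ASES}
    # AS3 withdraws R; an update is (sender, path), with path None = withdraw.
    inbox = {a: [(ORIGIN, None)] for a in ASES}
    history, messages = [], 0
    while any(inbox.values()):
        # Deliver every update queued in the previous round.
        for a in ASES:
            for sender, path in inbox[a]:
                if path is None or a in path:   # withdrawal or AS-path loop
                    rib_in[a].pop(sender, None)
                else:
                    rib_in[a][sender] = path
        # Re-run best-path selection and advertise any change to all peers.
        outbox = {a: [] for a in ASES}
        changed = False
        for a in ASES:
            new = best_path(rib_in[a])
            if new != best[a]:
                best[a], changed = new, True
                advert = [a] + new if new else None
                for peer in PEERS[a]:
                    if peer in outbox:          # AS3 itself is not simulated
                        outbox[peer].append((a, advert))
                        messages += 1
        if changed:
            history.append({a: fmt(best[a]) for a in ASES})
        inbox = outbox
    return history, messages


history, messages = simulate_withdrawal()
for rnd, snapshot in enumerate(history, 1):
    print(f"round {rnd}: {snapshot}")
print(f"updates exchanged: {messages}")
```

Under these assumptions the three ASes first move to two-hop backups (AS0 via 13, AS1 and AS2 via 03), then AS1 and AS2 try three-hop paths (via 203 and via 013), and only then does every AS declare R unreachable, after 16 updates in total.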
Observed Fault Injection Topologies (figure: single-homed routes R1, R2, and R3 behind ISP 1, ISP 2, and ISP 3 at MAE-WEST, shown in steady state and during withdrawal, as observed from ISP 4) • In steady state, the topologies between ISP1, ISP2, ISP3 and ISP4 are similar: all three are direct BGP peers of ISP4. • Repeatedly withdrew the single-homed routes (R1, R2, R3)
Comparing ISP Convergence Latencies • CDF of faults injected into three Mae-West providers and observed at an ISP router in Japan • Significant variations between providers
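A CDF like the one on this slide can be computed from per-fault convergence times as an empirical CDF. A short sketch; the latency values below are made up for illustration and are not the measured Mae-West data.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return F where F(x) is the fraction of samples <= x."""
    data = sorted(samples)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


# Hypothetical convergence latencies (seconds) for two providers.
provider_a = ecdf([32, 63, 92, 140])
provider_b = ecdf([54, 79, 88, 110, 208])
for x in (60, 120):
    print(f"converged within {x}s: A={provider_a(x):.2f}  B={provider_b(x):.2f}")
```

Evaluating each provider's CDF at the same thresholds is one simple way to compare them side by side.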
ISP1-ISP4 Paths During Failure (figure: fault at R1 in ISP 1, observed at ISP 4 in steady state and during the failure; path P2 through ISP 5) • Only one backup path (length 3) • 96% of faults: Announce AS4 AS5 AS1 (44 seconds), Withdraw (92 seconds); average 92 seconds (min/max 63/140) • 4% of faults: Withdraw (32 seconds); average 32 seconds (min/max 27/38)
ISP2-ISP4 Paths During Failure (figure: fault at R2 in ISP 2, observed at ISP 4 in steady state and during the failure; alternate paths P2 to P4 through ISPs 5, 6, 10, 11, 12, 13 and Vagabond) • 63% of faults: Announce AS4 AS5 AS2 (35 seconds), Withdraw (79 seconds); average 79 seconds (min/max 44/208) • 7% of faults: Announce AS4 AS5 AS2 (33 seconds), Announce AS4 AS6 AS5 AS2 (61 seconds), Withdraw (88 seconds); average 88 seconds (min/max 80/94) • 7% of faults: Withdraw (54 seconds); average 54 seconds (min/max 29/9) • 23%: Other
ISP3-ISP4 Paths During Failure (figure: fault at R3 in ISP 3, observed at ISP 4 in steady state and during the failure; alternate paths P2 to P7 through ISPs 1, 5, 7, 8, and 9) • 36% of faults: Announce AS4 AS5 AS (52 seconds), Withdraw (110 seconds); average 110 seconds (min/max 78/135) • 35% of faults: Announce AS4 AS1 AS3 (39 seconds), Announce AS4 AS5 AS3 (68 seconds), Withdraw (107 seconds); average 107 seconds (min/max 91/133) • 2% of faults: Announce AS4 AS5 AS8 AS7 AS3 (27 seconds), Announce AS4 AS5 AS9 AS8 AS7 AS3 (86 seconds), Withdraw (140 seconds); average 140 seconds (min/max 120/142) • 27%: Other
Race Conditions and Paths • T(shortest path) <= Tdown <= T(longest path), where Tdown is the convergence time after a route is withdrawn (figure: nodes A and B)
Relationship Between Backup Paths and Convergence • Convergence time is related to the length of the longest possible backup ASPath between two nodes (figure: Longest Observed ASPath Between AS Pair)
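Both ends of the race-condition bound can be estimated by enumerating the simple paths between two ASes. A sketch assuming an undirected AS graph with no routing policy, so it over-counts the paths that the customer/provider hierarchy would actually allow. Path length is counted in ASes, so "AS4 AS5 AS1" has length 3. Exhaustive search is exponential, which limits this to small graphs like the examples here; the function names are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (AS, AS) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length_range(graph, src, dst):
    """(shortest, longest) simple path from src to dst, counted in ASes."""
    lengths = []

    def dfs(node, path):
        if node == dst:
            lengths.append(len(path))
            return
        for nxt in graph[node]:
            if nxt not in path:       # simple paths only, like BGP loop detection
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    return (min(lengths), max(lengths)) if lengths else None


# Full mesh of AS0..AS3 from the convergence example.
mesh = build_graph(combinations(range(4), 2))
print(path_length_range(mesh, 0, 3))

# ISP1-ISP4 case, simplified to the direct link plus the one backup via ISP 5.
isp = build_graph([(4, 1), (4, 5), (5, 1)])
print(path_length_range(isp, 4, 1))
```

In the full mesh, the longest possible backup path (length 4) is twice as long as the direct one, which matches the paths explored in the convergence example.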
Towards Fast BGP Convergence Four possible solutions • No transit/One-hop topology (peer and filter everyone) • Turn off/Change MinRouteAdver timer • “Tag” BGP updates and provide a hint so nodes can detect bogus state information • Entirely new protocol
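The MinRouteAdver option is easier to reason about with a concrete model of what the timer does. A minimal sketch on a simulated clock, assuming per-prefix timing toward a single peer and a default 30-second interval. The sketch does not commit to whether withdrawals are also rate-limited and makes that a parameter instead. The class and method names are hypothetical.

```python
class MinRouteAdverQueue:
    """Hypothetical per-peer MinRouteAdver model on a simulated clock.

    Updates for a prefix are sent at most once per interval; an update that
    arrives too early is held, and a newer one replaces it while it waits.
    """

    def __init__(self, interval=30, rate_limit_withdrawals=False):
        self.interval = interval
        self.rate_limit_withdrawals = rate_limit_withdrawals
        self.last_sent = {}  # prefix -> time the last update was sent
        self.pending = {}    # prefix -> newest path waiting for the timer
        self.sent = []       # (time, prefix, path or None for withdraw)

    def submit(self, now, prefix, path):
        """Queue an announcement (path list) or a withdrawal (path=None)."""
        limited = path is not None or self.rate_limit_withdrawals
        last = self.last_sent.get(prefix)
        if not limited or last is None or now - last >= self.interval:
            self._send(now, prefix, path)
        else:
            self.pending[prefix] = path

    def tick(self, now):
        """Release held updates whose timer has expired."""
        for prefix, path in list(self.pending.items()):
            if now - self.last_sent[prefix] >= self.interval:
                self._send(now, prefix, path)

    def _send(self, now, prefix, path):
        self.pending.pop(prefix, None)  # anything held is now superseded
        self.last_sent[prefix] = now
        self.sent.append((now, prefix, path))


q = MinRouteAdverQueue()
q.submit(0, "R", [1, 3])         # first update goes out immediately
q.submit(5, "R", [2, 1, 3])      # too early: held
q.submit(12, "R", [2, 0, 1, 3])  # replaces the held path
q.tick(30)                       # timer expires: only the newest path is sent
q.submit(31, "R", None)          # withdrawal bypasses the timer by default
print(q.sent)
```

In this run the intermediate path [2, 1, 3] is never sent. The timer batches transient states, which hides some path exploration from the peer but also holds back each new answer for up to one interval.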
BGP-CT • Incremental addition to BGP4 • Capability negotiation • Tags carried as a multi-protocol NLRI extension • Invalidate alternative paths if the tag matches (and other necessary conditions are met) • Details • New state machine additions (temporary invalidation) • Works with iBGP • Implemented in MRT and deployed on CAIRN • Improves BGP convergence by an order of magnitude in most cases (in a few cases, behavior is worse)
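The tag-matching rule described for BGP-CT can be illustrated with a toy RIB for a single prefix. A minimal sketch with several assumptions not stated on the slide: a tag is an opaque value attached when the route is originated, it is copied onto every path derived from that origination, and it is carried on the withdrawal. The slide's "other necessary conditions" and the BGP-CT state machine are not modeled, and all names are hypothetical.

```python
class TaggedRib:
    """Hypothetical tag-based invalidation for one prefix (not the BGP-CT spec).

    Each stored path carries an opaque tag; a withdrawal that carries tag T
    temporarily invalidates every alternative path stored with the same tag.
    """

    def __init__(self):
        self.paths = {}       # neighbor AS -> (path, tag)
        self.invalid = set()  # neighbors whose path is temporarily invalid

    def announce(self, neighbor, path, tag):
        self.paths[neighbor] = (path, tag)
        self.invalid.discard(neighbor)  # a fresh announcement revalidates

    def withdraw(self, neighbor, tag):
        self.paths.pop(neighbor, None)
        self.invalid.discard(neighbor)
        # Alternatives derived from the same (now withdrawn) origination are stale.
        for nbr, (_, t) in self.paths.items():
            if t == tag:
                self.invalid.add(nbr)

    def best(self):
        valid = [p for nbr, (p, _) in self.paths.items() if nbr not in self.invalid]
        return min(valid, key=lambda p: (len(p), p), default=None)


# AS0's view in the AS0-AS3 example: every path to R was derived from the
# same origination, so all of them carry the (hypothetical) tag "t1".
rib = TaggedRib()
rib.announce(3, [3], "t1")
rib.announce(1, [1, 3], "t1")
rib.announce(2, [2, 3], "t1")
print(rib.best())   # [3]
rib.withdraw(3, "t1")
print(rib.best())   # None
rib.announce(1, [1, 3], "t2")
print(rib.best())   # [1, 3]
```

Without the tag check, AS0 would fall back to a stale backup such as via 13 after the withdrawal. With the check, the matching tag lets it discard all stale backups in one step, while a fresh announcement with a new tag is still accepted.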