Study on the activity of BGP prefixes over a 3-year period to understand highly active prefixes, their causes, and impact on routing stability. Analysis of BGP log data to identify patterns and suggest future enhancements.
Measurement of Highly Active Prefixes in BGP
Ricardo V. Oliveira, Rafit Izhak-Ratzin, Beichuan Zhang, Lixia Zhang
GLOBECOM’05
Motivation and Goals
• Previous Internet routing measurement studies ([Rexford’02], [Broido’02], and [Wang’02]) observed that a small number of prefixes contributed a large number of routing updates
• However, those observations were made at specific ISPs and over short time periods
• Question: Is this a common phenomenon in the Internet, or is it specific to individual ISPs and limited time periods?
• We conducted a systematic study of prefix activity by analyzing BGP log data over a 3-year period
Internet and Autonomous Systems
• Autonomous System (AS): a set of routers or networks under the same administration
• Border routers exchange routing updates via the Border Gateway Protocol (BGP)
[Figure: autonomous systems AS X, AS Y, and AS Z]
BGP and Network Dynamics
• Link C-A fails
• Node C sends a withdraw to node D
• Node D sends an announcement to all neighbors except B; to B it sends a withdraw (poison)
• Node C sends a withdraw to node D (poison)
• If link C-A is unstable, multiple updates will be generated ...
[Figure: nodes A, B, C, D and the Internet, with prefix P = 131.179.0.0/16; messages shown are W(P) and A(P, [D B A])]
How to capture the unstable prefixes?
• Divide time into 1-day slots
• Count the number of updates associated with each BGP prefix in each slot
• Introduce the activity function A(d,P), a 0/1 indicator that compares the daily update count against a threshold, where:
  • Nu(d,P): number of updates on day d for prefix P
  • Tu: activity threshold
• A prefix P is highly active (HA) on day d if A(d,P) = 1
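The counting step above can be sketched in a few lines of Python. This is only an illustration: the input shape (pairs of Unix timestamp and prefix), the function names, and the strict ">" comparison against Tu are assumptions, since the slide does not show the formula or the log format.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_update_counts(updates):
    """Return Nu(d, P): number of updates per (day, prefix).

    `updates` is an iterable of (unix_timestamp, prefix) pairs. This input
    shape is an assumption, not the format of the original BGP logs.
    """
    counts = Counter()
    for ts, prefix in updates:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()  # 1-day slots (UTC)
        counts[(day, prefix)] += 1
    return counts


def activity(counts, day, prefix, threshold):
    """A(d, P): 1 if Nu(d, P) is above Tu, else 0 (strict '>' is assumed)."""
    return 1 if counts[(day, prefix)] > threshold else 0


# Toy data: three updates for one prefix and one for another on the same day.
updates = [
    (0, "10.0.0.0/8"),
    (60, "10.0.0.0/8"),
    (120, "10.0.0.0/8"),
    (30, "192.0.2.0/24"),
    (86400, "192.0.2.0/24"),  # falls into the next 1-day slot
]
counts = daily_update_counts(updates)
day0 = datetime.fromtimestamp(0, tz=timezone.utc).date()
```

With a toy threshold of 2, `activity(counts, day0, "10.0.0.0/8", 2)` flags the first prefix as HA on that day, while the second prefix stays at 0.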
How to obtain Tu?
• Be conservative: take the worst-case 99th percentile
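One possible reading of "worst-case 99th percentile" is: compute the 99th percentile of per-prefix update counts for each day, then take the largest of those values as Tu. The sketch below follows that reading; the interpretation, the nearest-rank percentile method, and the function names are assumptions, not the authors' stated procedure.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer in 1..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def worst_case_threshold(counts_by_day, pct=99):
    """Largest per-day percentile across all days (assumed meaning of 'worst case')."""
    return max(
        percentile_nearest_rank(day_counts, pct)
        for day_counts in counts_by_day.values()
        if day_counts
    )


# Toy data: per-prefix update counts for two days.
counts_by_day = {
    "day1": list(range(1, 101)),  # 99th percentile is 99
    "day2": list(range(1, 201)),  # 99th percentile is 198
}
tu = worst_case_threshold(counts_by_day)
```

Taking the maximum rather than an average keeps the threshold conservative: a prefix must stand out even against the busiest day.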
Prefix Activity (Sprint router)
• Number of HA prefixes roughly bounded between 100 and 200 per day
Prefix Activity Across Different Monitors
• 33 monitors, 90% confidence interval
Prefix Activity Across Different Monitors (cont’d)
• 31 days of May 2004, 95% confidence intervals
HA Life Time
• We define Life Time as the total number of days during which a prefix is highly active, counted over D = 1040 days
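The slide's formulas are not shown here; a reconstruction consistent with the definitions given is sketched below. The symbol L(P) for Life Time and the strict inequality in A(d,P) are assumptions, as is the use of the `amsmath` package for `cases` and `\text`.

```latex
A(d,P) =
\begin{cases}
  1 & \text{if } N_u(d,P) > T_u \\
  0 & \text{otherwise}
\end{cases}
\qquad
L(P) = \sum_{d=1}^{D} A(d,P), \quad D = 1040
```

Under this reading, a prefix that is HA on a single day has L(P) = 1.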
Cause #1: Sporadic link failures
• On April 13, 2004, one of the Internet2 routers experienced several outages in a short time period
• This router had direct connections to some of the RouteViews monitors
• Using LinkRank (http://linkrank.cs.ucla.edu), we discovered that one monitor switched paths for approx. 1,500 prefixes
• The BGP updates caused by these path changes made these prefixes appear as HA
• We believe this case represents most of the HA cases, as more than 75% of HA prefixes have a lifetime of only one day
Cause #2: BGP Path Exploration
• A BGP router may try several backup paths before converging to a stable route (path exploration)
• How to measure path exploration?
• Beacon prefixes: periodic announcements and withdraws of prefixes; the root cause is known, and we know how many updates to expect if there were no path exploration: 12 BGP updates/day for each router
[Figure: timeline of alternating A(P) and W(P) events spaced 2 h apart]
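The figure of 12 updates/day follows from the beacon schedule: one event (announcement or withdraw) every 2 hours gives 24 / 2 = 12 events per day, and without path exploration each event would produce one update per router. A minimal sketch of that arithmetic, with hypothetical function names and the one-update-per-event baseline stated as an assumption:

```python
def expected_beacon_updates_per_day(event_interval_hours=2):
    """Updates per day if each beacon event triggers exactly one update.

    Assumes the interval divides 24 hours evenly, as the 2 h schedule does.
    """
    if event_interval_hours <= 0 or 24 % event_interval_hours != 0:
        raise ValueError("interval must be a positive divisor of 24")
    return 24 // event_interval_hours


def excess_updates(observed_per_day, event_interval_hours=2):
    """Updates beyond the baseline; a rough proxy for path exploration."""
    return observed_per_day - expected_beacon_updates_per_day(event_interval_hours)


baseline = expected_beacon_updates_per_day()  # 12 for the 2 h schedule
```

For example, a router that logs 20 updates for a beacon prefix in one day shows `excess_updates(20)` = 8 updates beyond the baseline.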
Cause #2: Path Exploration (cont’d)
• Beacon 195.80.227.0/24 seen at one monitor:

Time(s)  Type  AS_PATH
0        W
7,114    A     1239 3257 3257 28747 12654
7,144    A     1239 8928 25232 12654
14,254   A     1239 3356 25232 12654
14,280   A     1239 701 6762 12654
14,337   A     1239 701 6762 12654 (community change)
14,362   A     1239 7018 8220 513 3320 702 13030 12654
14,397   A     1239 7018 8220 513 3320 702 13030 12654
14,420   W

• Annotations: 30 sec (7,114 to 7,144); took almost 3 min (14,254 to 14,420)
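The durations called out on this slide can be recomputed from the trace. The sketch below groups updates into bursts whenever consecutive updates are close together; the 300-second gap used to separate bursts and the function names are assumptions chosen for illustration, while the timestamps and paths are copied from the slide.

```python
# (time in seconds, type, AS_PATH) as listed on the slide
TRACE = [
    (0, "W", ""),
    (7114, "A", "1239 3257 3257 28747 12654"),
    (7144, "A", "1239 8928 25232 12654"),
    (14254, "A", "1239 3356 25232 12654"),
    (14280, "A", "1239 701 6762 12654"),
    (14337, "A", "1239 701 6762 12654"),  # community change
    (14362, "A", "1239 7018 8220 513 3320 702 13030 12654"),
    (14397, "A", "1239 7018 8220 513 3320 702 13030 12654"),
    (14420, "W", ""),
]


def split_bursts(events, max_gap=300):
    """Group time-ordered events; a gap above `max_gap` seconds starts a new burst."""
    bursts = []
    for event in events:
        if bursts and event[0] - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append(event)
        else:
            bursts.append([event])
    return bursts


def summarize(burst):
    """Start time, duration in seconds, and number of updates in one burst."""
    return {
        "start": burst[0][0],
        "duration": burst[-1][0] - burst[0][0],
        "updates": len(burst),
    }


summaries = [summarize(b) for b in split_bursts(TRACE)]
durations = [s["duration"] for s in summaries]
```

On this trace the bursts last 0, 30, and 166 seconds, which matches the "30 sec" and "almost 3 min" annotations; the last burst carries six updates for what is a single withdraw event.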
Cause #3: Router (mis)configurations
• BGP has two built-in mechanisms to reduce the instability caused by update surges: MRAI and Route Flap Damping
• Not all routers have these mechanisms configured by default…
• Juniper routers don’t have the MRAI timer configured by default, and Cisco routers don’t have Route Flap Damping configured by default
• A /24 prefix was HA for 12 consecutive days with 6011 updates/day; on one of the days there were 12,000 updates for this prefix; we discovered that this router didn’t have the MRAI timer configured
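A rough sanity check shows why the update counts above point at a missing MRAI timer. If consecutive updates for a prefix on one peering session must be at least MRAI seconds apart, the daily count is bounded by 86,400 / MRAI. The sketch below assumes the 30-second value that RFC 4271 suggests for external peers, ignores timer jitter, and assumes all observed updates arrived over a single session; none of these details are given on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_updates_per_day(mrai_seconds):
    """Upper bound on updates per prefix per session per day.

    Assumes consecutive updates are spaced at least `mrai_seconds` apart
    (jitter ignored); a simplification, not a model of any router.
    """
    if mrai_seconds <= 0:
        raise ValueError("MRAI must be positive")
    return SECONDS_PER_DAY // mrai_seconds


bound = max_updates_per_day(30)     # 30 s: value suggested in RFC 4271 (assumed here)
observed = 6011                     # updates/day reported on the slide
exceeds_bound = observed > bound
```

Under these assumptions the bound is 2,880 updates/day, so both the 6011 updates/day average and the 12,000-update day are well above what a 30-second MRAI would allow.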
Conclusion
• The existence of HA prefixes is a common phenomenon
  • Observed everywhere
  • Observed all the time
• Causes
  • Mainly sporadic link failures (75% of the cases)
  • Slow convergence
  • Lack of strict adherence to the existing protocol mechanisms (mainly MRAI timer and route flap damping)
• Future work:
  • Further identification of the causes
  • Investigation of solutions
  • Creation of a tool for automatic detection of HA prefixes