Quantifying Path Exploration in the Internet. Ricardo Oliveira, Rafit Izhak-Ratzin, Lixia Zhang (UCLA); Beichuan Zhang (UArizona); Dan Pei (AT&T Labs -- Research). IMC’06, Rio de Janeiro.
Motivation
• There has been extensive work measuring BGP convergence; however, most of that work:
  • was done in controlled simulation environments, e.g. [Labovitz’00]
  • used a small number of beacon-like prefixes, e.g. [Labovitz’00, Labovitz’01, Mao’03]
• We did a systematic measurement of path exploration in the operational Internet
Talk Outline
• Background on BGP convergence
• Measurement methodology
• Event characterization
• Impact of policy and topology on observed convergence
BGP Background and Monitoring
• BGP is a path-vector protocol
• Collectors gather BGP routing tables + BGP updates
• Example: UCLA (X = AS52) announcing prefix 131.179/16
[Figure: AS X, monitors Y and Z, and a collector; AS-path labels for 131.179/16 on the links: [X], [Y X], [Z Y X], [Y X]]
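To illustrate the path-vector behavior named on this slide, here is a minimal sketch in which each AS prepends its own identifier before re-advertising a route. The function name `advertise` and the simple chain X, Y, Z are assumptions made for illustration; they are not taken from the talk.

```python
def advertise(origin, chain):
    """Return the AS path each AS advertises as a route travels along `chain`.

    Hypothetical helper: `origin` announces the prefix, then every AS in
    `chain` prepends itself to the path it received (path-vector behavior).
    """
    advertised = {origin: [origin]}
    path = [origin]
    for asn in chain:
        if asn in path:
            # Loop detection: an AS discards a path that already contains it.
            break
        path = [asn] + path
        advertised[asn] = path
    return advertised


routes = advertise("X", ["Y", "Z"])
print(routes["Y"])  # ['Y', 'X']
print(routes["Z"])  # ['Z', 'Y', 'X']
```

The printed paths match the labels in the slide's example, where monitor Y reports [Y X] and monitor Z reports [Z Y X].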
What is path exploration?
[Figure: topology with nodes A, B, C, D, E, F, G connected by peer-peer and provider-customer links, with link F-G marked as failed; a time axis labeled 2, 3, W shows the relative convergence time]
• Q: What happens if link F-G fails?
  • A: Node E explores 2 paths before declaring G unreachable…
• Q: Why is this a problem?
  • Delays and loss of data packets
  • Extra router processing
Methodology
• Data set: 50 monitors from RouteViews (RV) and RIPE, 1 month of data (Jan’06)
• Pipeline: raw BGP feed → Preprocessing → Event Identification (timeout T) → Event Classification (path rank heuristic)
• Preprocessing: removed session resets; cleaned beacons using anchor prefixes
• Event Identification: grouped updates for the same (monitor, prefix) pair across time using a relative timeout T
• Event Classification: classified events according to the explored paths and the output of the path rank heuristic
• BGP beacons were used to calibrate our event identification scheme and to evaluate different path rank heuristics
BGP Beacons
• Periodic BGP announcements and withdrawals that are artificially injected into the network [Mao’03, RIPE]
[Figure: time axis alternating beacon announcements (A) and beacon withdrawals (W), 2h apart]
• Used as calibration points:
  • clean signals: no noise caused by sporadic events
  • beacon event times are known
Event Identification
• A single event can trigger multiple updates
• Need to cluster BGP updates along the time dimension for each (monitor, prefix) pair
• Q: What relative timeout T should we use?
  • A: T = 240s (4 min)
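The clustering step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "relative timeout" means the gap between consecutive updates for the same (monitor, prefix) pair, and that a gap strictly greater than T starts a new event. The tuple layout and the function name `identify_events` are hypothetical.

```python
from collections import defaultdict

T = 240  # seconds; the relative timeout given on the slide


def identify_events(updates, timeout=T):
    """Group BGP updates into events per (monitor, prefix) pair.

    updates: iterable of (timestamp, monitor, prefix, path) tuples
             (assumed layout; path may be None for a withdrawal).
    Consecutive updates for the same pair stay in one event while the gap
    between them is at most `timeout` seconds (assumed interpretation).
    """
    by_key = defaultdict(list)
    for ts, monitor, prefix, path in sorted(updates, key=lambda u: u[0]):
        by_key[(monitor, prefix)].append((ts, path))

    events = {}
    for key, seq in by_key.items():
        groups = [[seq[0]]]
        for prev, cur in zip(seq, seq[1:]):
            if cur[0] - prev[0] > timeout:
                groups.append([])  # gap exceeded: start a new event
            groups[-1].append(cur)
        events[key] = groups
    return events


sample = [
    (0, "m1", "131.179.0.0/16", "Y X"),
    (100, "m1", "131.179.0.0/16", "Z Y X"),
    (400, "m1", "131.179.0.0/16", "Y X"),  # 300s after the previous update
]
print(len(identify_events(sample)[("m1", "131.179.0.0/16")]))  # 2
```

With T = 240s the first two updates form one event and the third, arriving 300s later, starts a second one.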
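Event Classification
[Figure: one event on a time axis with updates p1, p2, p3, p4, p5; initial path p0, final path p5]
• p0 = p5 (initial and final paths are the same)
  • p0 = … = p5: Tspath
  • otherwise: Tpdist
• p0 ≠ p5
  • p5 > p0 (final path preferred): Tshort
  • p0 > p5 (initial path preferred): Tlong
  • p0 = ∅ (no initial path): Tup
  • p5 = ∅ (no final path): Tdown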
Classifying Tlong and Tshort events: the problem of path comparison
[Figure: one event on a time axis with updates p1, p2, p3; initial path p0, final path p3]
• This event is classified as:
  • Tshort: if pref(p3) > pref(p0)
  • Tlong: if pref(p3) < pref(p0)
• Because of policy routing, the shorter path is not always the preferred path…
• Q: Which path does the router prefer: p0 or p3?
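A minimal sketch of the classification logic follows. The Tshort and Tlong rules come from the slide; the conditions for Tup, Tdown, Tspath and Tpdist are my reading of the class names used in the talk and should be treated as assumptions. `classify_event` and the `pref` argument are hypothetical names; `pref` stands in for whatever path rank heuristic is chosen.

```python
def classify_event(p0, updates, pref):
    """Classify one routing event.

    p0:      path in use before the event (tuple of ASes), or None if no route.
    updates: paths announced during the event, in order; None is a withdrawal.
    pref:    hypothetical function returning a comparable score for a path
             (higher means more preferred), i.e. a path rank heuristic.
    """
    pn = updates[-1]  # final path of the event
    if p0 is None and pn is None:
        return "unclassified"  # case not covered on the slides
    if p0 is None:
        return "Tup"    # assumed: prefix becomes reachable
    if pn is None:
        return "Tdown"  # assumed: prefix becomes unreachable
    if p0 == pn:
        # assumed: Tspath if the path never changed, Tpdist if it was
        # disturbed and then returned to the initial path
        return "Tspath" if all(p == p0 for p in updates) else "Tpdist"
    if pref(pn) > pref(p0):
        return "Tshort"
    if pref(pn) < pref(p0):
        return "Tlong"
    return "unclassified"  # tie under the chosen heuristic


# Demonstration only: rank by path length, which the slide warns is not
# always what the router prefers.
by_length = lambda path: -len(path)
print(classify_event(("Y", "X"), [("W", "Z", "Y", "X"), ("Z", "Y", "X")], by_length))  # Tlong
```

Swapping `by_length` for a different heuristic changes only the Tshort/Tlong decision; the other classes do not depend on path preference.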
Evaluating Path Rank Heuristics
[Figure: heuristic evaluation on beacons’ Tdown events]
• c_right: # of matches with calibration list
• c_wrong: # of mismatches
• Extending this method to all prefixes, the accuracy of each heuristic is:
  • Policy: 17%
  • Length: 65%
  • Policy + Length: 73%
  • Usage time: 95%
• Usage time is the most accurate heuristic for determining path preference
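The slide does not define usage time, so the following is only a sketch under an assumption: that the usage time of a path is the total time it was the selected route for a given (monitor, prefix) pair during the observation window, and that a path with longer usage time is taken to be more preferred. The names `usage_time` and `preferred` are hypothetical.

```python
from collections import defaultdict


def usage_time(timeline, end):
    """Total time each path was in use for one (monitor, prefix) pair.

    timeline: list of (timestamp, path) route changes sorted by time;
              path is a tuple of ASes, or None when there is no route.
    end:      timestamp at which the observation window closes.
    """
    usage = defaultdict(float)
    following = timeline[1:] + [(end, None)]
    for (ts, path), (next_ts, _) in zip(timeline, following):
        if path is not None:
            usage[path] += next_ts - ts
    return dict(usage)


def preferred(path_a, path_b, usage):
    """Return the path with the larger usage time, or None on a tie."""
    ua, ub = usage.get(path_a, 0.0), usage.get(path_b, 0.0)
    if ua == ub:
        return None
    return path_a if ua > ub else path_b


timeline = [
    (0, ("Y", "X")),
    (1000, ("Z", "Y", "X")),  # briefly explored alternative
    (1100, ("Y", "X")),
    (5000, None),             # withdrawal
]
usage = usage_time(timeline, end=6000)
print(preferred(("Y", "X"), ("Z", "Y", "X"), usage))  # ('Y', 'X')
```

In this example the stable path accumulates 4900s of use and the alternative only 100s, so the stable path is ranked as preferred.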
Characterizing Events
• Tshort < Tspath ~ Tup < Tlong << Tdown < Tpdist
• Tdown convergence time is significantly higher than Tlong convergence time, in contrast with the worst-case analysis of [Labovitz’01]
The impact of policy and topology on observed convergence
• How is the convergence process perceived by monitors in different locations in the Internet?
• What about the MRAI timer?
  • The BGP RFC specifies that the MRAI timer should have a base of 30s, multiplied by a jitter factor between 0.75 and 1
  • Not all ISPs follow the RFC…
[Figure: panels labeled "Non-MRAI" and "MRAI"]
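As a small worked example of the timer rule quoted above, the sketch below draws a jittered MRAI interval by multiplying the 30s base by a random factor uniformly distributed between 0.75 and 1. The function name `mrai_interval` is hypothetical, and this shows only the arithmetic, not how any particular router implements the timer.

```python
import random

MRAI_BASE = 30.0  # seconds; base value quoted on the slide


def mrai_interval(base=MRAI_BASE, rng=random):
    """Return one jittered MRAI interval in seconds.

    The base is multiplied by a factor drawn uniformly from [0.75, 1.0],
    so with a 30s base the result lies between 22.5s and 30s.
    """
    return base * rng.uniform(0.75, 1.0)


print(round(mrai_interval(), 1))
```

Under this rule, a router that applies the timer spaces successive advertisements for a destination roughly 22.5s to 30s apart.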
Impact of monitor location on observed convergence
• Set of MRAI monitors: 4 core (tier-1), 15 middle (transit) and 3 edge (stub)
• Convergence time by monitor location: core < middle < edge
Impact of monitor location on observed convergence
[Figure: example topology with nodes 1 to 7 arranged in core, middle and edge tiers, connected by peer-peer and provider-customer links]
• Monitors at lower tiers have more paths to explore
Further breaking down events by origin-monitor pair
• Worst case: edge-to-{edge, middle}
The Impact of Tdown Convergence
[Figure: nodes A, B, C with prefixes 131.179.100/24 and 131.179/16]
• In a Tdown the destination becomes unreachable, therefore we don’t care about routing convergence time… or do we?
• Q: What happens when the /24 prefix is withdrawn?
  • A: Routers will experience Tdown convergence, even though the destination is still reachable via the /16 prefix…
• According to recent measurements, about 1/3 of the prefixes in the routing table are in the same scenario as the /24 in this example
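The covered-prefix scenario can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the function name `covering_prefixes` and the sample table are assumptions, and the slide's shorthand prefixes are written out in full dotted form because that is what the module parses.

```python
import ipaddress


def covering_prefixes(prefix, table):
    """Return the less-specific prefixes in `table` that cover `prefix`.

    If this list is non-empty, the destination stays reachable through a
    covering prefix even after `prefix` itself is withdrawn.
    """
    net = ipaddress.ip_network(prefix)
    covers = []
    for entry in table:
        candidate = ipaddress.ip_network(entry)
        if candidate != net and net.subnet_of(candidate):
            covers.append(entry)
    return covers


# Hypothetical IPv4 routing table; 131.179/16 is 131.179.0.0/16 in full form.
table = ["131.179.0.0/16", "131.179.100.0/24", "10.0.0.0/8"]
print(covering_prefixes("131.179.100.0/24", table))  # ['131.179.0.0/16']
```

Here the /24 is covered by the /16, which is the situation the slide describes: withdrawing the /24 triggers Tdown convergence although traffic could still follow the /16.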
Origin of Tdown events
• Networks in the core are the most stable; edge networks are the most unstable (core : middle : edge proportion of 1:2:3)
Conclusions
• Usage time: a new path ranking heuristic that provides 95% accuracy in determining routers’ path preference
• Tdown convergence is by far the longest, even when compared with Tlong
• Core-to-core convergence is the fastest case; edge-to-{edge, middle} is the slowest
• Core networks are three times more stable than edge networks