A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance. Feng Wang (1), Zhuoqing Morley Mao (2), Jia Wang (3), Lixin Gao (1), Randy Bush (4). (1) University of Massachusetts, Amherst; (2) University of Michigan; (3) AT&T Labs-Research; (4) Internet Initiative Japan.
Presentation modified with permission • Presenter: Young-Rae Kim • Date: Feb. 24, 2009
Table of Contents • Background • Motivation • Open Questions • Our Work • Methodology • How Routing Failures Occur • Summary • Conclusions • R-BGP • Appendix
Background : Border Gateway Protocol (BGP) • The Border Gateway Protocol (BGP) is the core routing protocol of the Internet. It maintains a table of IP networks, or "prefixes", which designate network reachability among autonomous systems (ASes). • Most Internet users do not use BGP directly. However, most ISPs must use BGP to establish routing between one another.
Background : Border Gateway Protocol (BGP) Beacons • BGP Beacons are for research purposes to improve our understanding of BGP dynamics. • A BGP Beacon is an unused prefix which has a well-defined schedule for announcement and withdrawal. • Given the known schedule of announcements and withdrawals, we can study the dynamics of BGP using publicly available BGP update data.
Background : MRAI timer • The MRAI (Minimum Route Advertisement Interval) timer is specified in BGP. It rate-limits updates on a per-destination basis. • BGP-4 suggests intervals of 30 s for external BGP (eBGP) and 5 s for internal BGP (iBGP). • The MRAI suppresses messages that BGP would otherwise send to describe transitory states, and so allows BGP to converge with significantly fewer messages sent.
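The rate-limiting idea behind the MRAI timer can be sketched in a few lines of Python. This is an illustration only, not how any particular router implements it: the class name `MraiLimiter` and its methods are hypothetical, time is passed in explicitly as seconds, and the sketch keeps one timer per prefix as the slide describes.

```python
class MraiLimiter:
    """Per-prefix update rate limiter in the spirit of MRAI (hypothetical sketch)."""

    def __init__(self, interval_s=30.0):
        self.interval_s = interval_s
        self.last_sent = {}  # prefix -> time the last update was sent
        self.pending = {}    # prefix -> most recent route being held back

    def submit(self, prefix, route, now):
        """Return the route if it may be sent now, else hold it and return None."""
        last = self.last_sent.get(prefix)
        if last is None or now - last >= self.interval_s:
            self.last_sent[prefix] = now
            self.pending.pop(prefix, None)
            return route
        # Overwrite any held route: transitory states are never sent.
        self.pending[prefix] = route
        return None

    def flush(self, now):
        """Send held routes whose interval has expired; return {prefix: route}."""
        sent = {}
        for prefix in list(self.pending):
            if now - self.last_sent[prefix] >= self.interval_s:
                sent[prefix] = self.pending.pop(prefix)
                self.last_sent[prefix] = now
        return sent
```

If three route changes for one prefix arrive within 30 s, only the first and the last are sent; the intermediate state is dropped, which is the message saving the slide refers to.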
Background : Internet Control Message Protocol (ICMP) • Chiefly used by the operating systems of networked computers to send error messages (e.g., indicating that a requested service is not available or that a host or router could not be reached). • It differs in purpose from TCP and UDP in that it is typically not used to send and receive data between end systems. • Users can exercise ICMP directly with ping and traceroute.
Motivation • Real-time services have made high availability of end-to-end Internet paths of paramount importance. • low packet loss rate, low delay, high network availability, and fast reaction time • Internet path failures are widespread [Labovitz:98, Markopoulou:04,Feamster:03]. • can last as long as 10 minutes • Degraded end-to-end path performance is correlated with routing dynamics.
Open Questions • How do routing changes result in degraded end-to-end path performance? • What kinds of routing dynamics cause degraded end-to-end performance? • How do factors such as topological properties or routing policies affect performance degradation?
Our Work • Study end-to-end performance under realistic topologies. • Investigate several metrics to characterize the end-to-end loss, delay, and out-of-order packets. • Characterize the kinds of routing changes that impact end-to-end path performance. • Analyze the impact of topology, routing policies, MRAI timer and iBGP configurations on end-to-end path performance.
Methodology • A multi-homed prefix • BGP Beacon prefix: 192.83.230.0/24 • Controlled routing changes • Failover events: the Beacon changes from having both providers to having only a single provider. • Recovery events: the Beacon changes from having a single provider for connectivity to having both providers. [Figure: Beacon connectivity to Provider 1 and Provider 2 across a failover event and a recovery event]
Active Probing • From 37 PlanetLab hosts to the Beacon host (a host within the Beacon prefix) • Back-to-back traceroutes • Back-to-back pings • UDP probing (50 ms interval) • Data plane performance metrics [Figure: hosts A, B, and C probing the Beacon host across the Internet via Provider 1 and Provider 2]
Packet Loss • Loss burst: a run of consecutive UDP probe packets lost during a routing change event. [Figure: panels Failover and Recovery]
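The loss burst definition can be made concrete with a short Python sketch. It assumes probes carry sequence numbers 0 to `sent_count - 1` and that the receiver logs which ones arrived; the function names and this log format are hypothetical, not taken from the study. The duration helper relies on the 50 ms probing interval stated in the deck.

```python
def loss_bursts(sent_count, received_seqs):
    """Return (first_lost_seq, length) for each run of consecutive lost probes."""
    received = set(received_seqs)
    bursts = []
    start = None
    for seq in range(sent_count):
        if seq not in received:
            if start is None:
                start = seq  # a new burst begins here
        elif start is not None:
            bursts.append((start, seq - start))  # burst ended just before seq
            start = None
    if start is not None:
        bursts.append((start, sent_count - start))  # burst runs to the last probe
    return bursts


def burst_duration_s(length, interval_ms=50):
    """Approximate burst duration, given one probe every interval_ms."""
    return length * interval_ms / 1000
```

With one probe every 50 ms, a burst of 480 packets corresponds to about 24 seconds without connectivity.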
Correlating Packet Loss with Routing Failures • ICMP replies • temporary loss of reachability (!N or !H) • forwarding loops (TTL exceeded) • Routing failures • temporary loss of reachability and transient routing loops • Correlate loss bursts with ICMP messages • time window [-1 sec, 1 sec] • This underestimates the number of loss bursts due to routing failures • because some ICMP packets are missing.
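A minimal sketch of the correlation step follows. Several details are assumptions rather than the authors' procedure: the window is taken to extend 1 s before the burst starts and 1 s after it ends, ICMP messages are represented as hypothetical `(timestamp, kind)` pairs, and a burst that matches both kinds is labeled a forwarding loop.

```python
def classify_burst(burst, icmp_msgs, window_s=1.0):
    """Label a loss burst using ICMP messages seen near it in time.

    burst:     (start_time, end_time) in seconds
    icmp_msgs: iterable of (timestamp, kind), where kind is "unreachable"
               (!N or !H) or "ttl_exceeded"
    """
    start, end = burst
    kinds = {kind for t, kind in icmp_msgs
             if start - window_s <= t <= end + window_s}
    if "ttl_exceeded" in kinds:
        return "forwarding_loop"       # assumed tie-break when both kinds appear
    if "unreachable" in kinds:
        return "loss_of_reachability"
    return "unidentified"              # no matching ICMP message was captured
```

Bursts labeled `unidentified` may still be caused by routing failures whose ICMP messages were never received, which is why the method gives a lower bound.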
An Example planet02.csc.ncsu.edu experiences packet loss on July 30, 2005
Loss Bursts due to Routing Failures • Failover events: 76% of packets lost • Recovery events: 26% of packets lost [Figure: panels Failover and Recovery]
How Routing Failures Occur? (Failover) • Prefer-customer routing policy: routes received from a provider's customers are always preferred over those received from its peers. [Figure: routers R1 to R6 across Provider 1 and Provider 2, joined by a peer link and attached to Beacon AS 0 by customer links, annotated with AS paths]
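The prefer-customer policy can be sketched as a route selection function. The slide only states that customer routes beat peer routes; ranking provider routes last and breaking ties by shorter AS path are common conventions assumed here for illustration, and the `(learned_from, as_path)` route format is hypothetical.

```python
# Lower rank wins. Only "customer before peer" comes from the slide;
# placing "provider" last is an assumption.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}


def best_route(routes):
    """Pick the preferred route from a list of (learned_from, as_path) tuples."""
    if not routes:
        return None  # no route: the destination is unreachable
    # Compare by neighbor relationship first, then by AS path length.
    return min(routes, key=lambda r: (PREFERENCE[r[0]], len(r[1])))
```

Under this policy a router keeps using a longer customer route even when a shorter peer route exists, and only falls back to the peer route once the customer route is withdrawn.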
How Routing Failures Occur? (Failover) (contd.) • No-valley routing policy: peers do not transit traffic from one peer to another. [Figure: routers R1 to R9 across Provider 1, Provider 2, and Provider 3, joined by peer links and attached to Beacon AS 0, annotated with AS paths]
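The no-valley policy is usually expressed as an export rule. The sketch below uses the common formulation (routes from customers go to everyone; routes from peers or providers go only to customers), which is broader than the single case the slide states. The function name and the relationship labels, including `"self"` for locally originated routes, are hypothetical.

```python
def may_export(learned_from, send_to):
    """Return True if a route learned from one neighbor type may be
    announced to another neighbor type under a no-valley export policy."""
    # Own and customer routes may be announced to any neighbor.
    if learned_from in ("self", "customer"):
        return True
    # Peer and provider routes are announced to customers only.
    return send_to == "customer"
```

The case from the slide is `may_export("peer", "peer")`, which is `False`: an AS will not carry traffic between two of its peers, so a peer path that exists physically may be unusable after a failover.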
How Routing Failures Occur? (Recovery) • iBGP constraint: a route received from an iBGP router cannot be re-advertised to another iBGP router. • 1. The path recovers. • 2. R3 sends the path (0) to R2. • 3. R2 sends a withdrawal of (2 0) to R1. • 4. R3 sends the recovery path to R1. • 5. R1 regains its connection to the Beacon. [Figure: iBGP routers R1, R2, R3, and R4 in Provider 1, with Provider 2 and Beacon AS 0, annotated with the numbered steps]
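The recovery sequence can be replayed in a small Python sketch to show where R1 briefly loses its route. It assumes that R1 initially reaches the Beacon only through the path (2 0) learned from R2, which is consistent with the withdrawal shown on the slide but not spelled out there. The function names and message format are hypothetical.

```python
def may_readvertise(learned_via, send_via):
    """iBGP constraint: a route learned over iBGP is not passed to iBGP peers."""
    return not (learned_via == "ibgp" and send_via == "ibgp")


def replay(initial_routes, messages):
    """Apply (sender, as_path_or_None) messages to one router's table.

    Returns a list of booleans: whether the router has any route
    initially and after each message. None as the path means a withdrawal.
    """
    routes = dict(initial_routes)  # sender -> AS path
    reachable = [bool(routes)]
    for sender, path in messages:
        if path is None:
            routes.pop(sender, None)  # withdrawal removes the sender's route
        else:
            routes[sender] = path     # announcement installs or replaces it
        reachable.append(bool(routes))
    return reachable
```

Calling `replay({"R2": (2, 0)}, [("R2", None), ("R3", (0,))])` models steps 3 and 4 as seen by R1. R2 cannot hand R1 the path it learned from R3 because `may_readvertise("ibgp", "ibgp")` is false, so R1 is without a route between R2's withdrawal and R3's announcement.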
Summary • During failover and recovery events • Routing changes impact packet loss significantly. • Multiple loss bursts are observed in 60% of events. • Routing changes can lead to long packet round-trip delays and reordering. • Loss bursts explained by routing failures last longer than unidentified ones. • Loss bursts caused by forwarding loops last longer than those caused by loop-free routing failures.
Conclusions • During failover and recovery events • Routing failures contribute significantly to end-to-end packet loss. • Routing policies, iBGP configuration, and MRAI timer values play a major role in causing packet loss during routing events. • Degraded end-to-end performance can be experienced by a diverse set of hosts when there is a routing change. • Accommodating routing redundancy may eliminate the majority of identified path failures.
The End Thanks!
Location of Loss Bursts (Failover events) • Location of the first loss bursts caused by routing failures. • From ISP 2's BGP updates: • Routing failures do occur and are not visible from ICMP messages due to their short duration. • From another AS's BGP updates, and Oregon RouteViews: • Routing failures cascade to other ASes.
Location of Loss Bursts (Recovery events) • Location of the first loss bursts caused by routing failures. • BGP updates from ISP 2 • 12 withdrawals over 724 recovery events
Representativeness • Connectivity of Destination Prefixes • SS: Single-homed prefixes via a single upstream link • SM: Single-homed prefixes via multiple upstream links • MS: Multi-homed prefixes via a single upstream link • MM: Multi-homed prefixes via multiple upstream links • Routing tables from one tier-1 ISP on January 15, 2006
Representativeness (contd.) • Multi-homed destination prefixes [Figure: a destination attached by customer links, with ISP 1, ISP 2, and ISP 3 and a peer link]
Representativeness (contd.) • Multi-homed destination prefixes with multiple upstream links [Figure: two example topologies with ISP 1 and ISP 2]
Loss Burst Length • Loss bursts can be as long as 480 packets for failover events and 180 packets for recovery events. [Figure: loss burst length for failover events and recovery events]
Multiple Loss Bursts • Multiple loss bursts occur after the injection of a withdrawal message or an announcement. [Figure: panels Failover and Recovery]
Methodology Evaluation • Our measurement is not significantly biased by ICMP blocking. • ICMP messages are rare in the absence of routing changes (0.6%). • ICMP messages come from 68 ASes, and 53% of them belong to 10 tier-1 ASes. • 52% of ISP 1's routers and 95% of ISP 2's routers generate ICMP messages.