Moving beyond end-to-end path information to optimize CDN performance

Moving beyond end-to-end path information to optimize CDN performance. Krishnan, R., et al. In Proceedings of IMC '09. 2009. New York, NY, USA: ACM. Reported by: Eraser Huang Email: eraser.osiris@gmail.com 2011-03-23 @SYSU

Agenda • Problem • Abstract • Overview • Path Latency Analysis • Diagnose Cases of Inefficient Routing • Some Examples • Limitations

Problem Client-Server Applications

Problem Content Distribution Network（CDN）

Problem Content Distribution Networks（CDN）

Abstract • The main result of this paper: • Redirecting every client to the server with the least latency does not suffice to optimize client latencies • The authors find that queuing delays often override the benefits of a client interacting with a nearby server The dataset analyzed in this paper is available at: http://research.google.com/pubs/pub35590.html

Overview • Google’s CDN Architecture • Aims to redirect each client to the node to which it has the least latency • The RTT measured to a client is taken to be representative of the client’s prefix • This redirection, however, is based on the prefix corresponding to the IP address of the DNS server that resolves the URL of the content on the client’s behalf, not on the client’s own prefix
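The redirection rule on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the RTT table, the node names, and the `pick_node` helper are hypothetical and are not taken from the paper or from Google's system.

```python
from ipaddress import ip_address, ip_network

# Hypothetical RTT table: prefix -> {node name: RTT in ms taken as
# representative of the whole prefix}. Values are made up for illustration.
RTT_MS = {
    ip_network("192.0.2.0/24"): {"node-a": 40.0, "node-b": 15.0},
    ip_network("198.51.100.0/24"): {"node-a": 22.0, "node-b": 90.0},
}


def pick_node(resolver_ip: str):
    """Return the lowest-RTT node for the prefix containing the DNS resolver's IP.

    The lookup uses the resolver's address, not the client's, mirroring the
    slide: redirection is keyed on the prefix of the DNS server.
    """
    addr = ip_address(resolver_ip)
    for prefix, rtts in RTT_MS.items():
        if addr in prefix:
            return min(rtts, key=rtts.get)
    return None  # no measurement available for this prefix


print(pick_node("192.0.2.53"))    # resolver inside 192.0.2.0/24
print(pick_node("198.51.100.1"))  # resolver inside 198.51.100.0/24
```

A client whose resolver sits in a different prefix than the client itself would be redirected according to the resolver's measurements, which is one reason the chosen node is not always the best one for the client.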

Overview • Goals • Understand the efficacy of latency-based redirection in enabling a CDN to deliver the best RTTs possible to its clients • Identify the broad categories of causes for poor RTTs experienced by clients • Implement a system to detect instances of poor RTTs and diagnose the root causes underlying them

Overview • The authors have used WhyHigh to diagnose several instances of inflated latencies, drawing on: • BGP tables from routers • Mapping of routers to geographic locations • RTT logs for connections from clients • Traffic volume information • Active probes such as traceroutes and pings when necessary • Approximately 170K prefixes spread across the world

Overview • Data Set - RTT Measurement The RTT will be measured

Overview • Data Set - Data Pre-Processing

Path Latency Analysis • Distribution of RTTs Figure 2

Path Latency Analysis • Three Main Components of TCP Layer RTT • Transmission delay (time to put a packet on to the wire) • Control packets are typically about 50 bytes, so the transmission delay is less than 10 ms even on a dial-up link • Propagation delay (time spent travelling from one end of the wire to the other) • Large when a client is far away from the node to which it has the lowest latency • Queuing delay (time spent by a packet waiting to be forwarded)
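A quick back-of-the-envelope check of the transmission delay component: delay is packet size divided by link bandwidth. The 50-byte packet size comes from the slide; the 56 kbit/s dial-up rate and the 10 Mbit/s broadband rate are assumptions chosen only for illustration.

```python
def transmission_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to put a packet on the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


# 50-byte control packet (from the slide) on two assumed link speeds.
dialup = transmission_delay_ms(50, 56_000)         # assumed 56 kbit/s dial-up
broadband = transmission_delay_ms(50, 10_000_000)  # assumed 10 Mbit/s link

print(f"dial-up:   {dialup:.2f} ms")
print(f"broadband: {broadband:.3f} ms")
```

Under these assumptions the control packet costs a few milliseconds at most, so transmission delay cannot account for RTTs of hundreds of milliseconds; propagation and queuing delay remain as the candidates.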

Path Latency Analysis • Effectiveness of Client Redirection Figure 3

Path Latency Analysis • Characterizing Latency Inflation More than 20% Figure 4

Path Latency Analysis • Data Set Partition 1) Prefixes closest to the node geographically 2) All other prefixes Figure 3

Path Latency Analysis • Characterizing Latency Inflation (after data set partition) More than 20% Figure 5

Path Latency Analysis • Characterizing Delays More than 40% Figure 4

Path Latency Analysis • Change of Route (Inefficient Routing)

Path Latency Analysis • Characterizing Queuing Delays Figure 7

Path Latency Analysis • Summary • Redirection based on end-to-end RTTs results in most clients being served from a geographically nearby node • A significant fraction of prefixes have inefficient routes to their nearby nodes • Clients in most prefixes incur significant latency overheads due to queuing of packets

Diagnose Cases of Inefficient Routing • Identifying Inflated Prefixes • Compare the minimum RTT measured at the node across all connections to the prefix with the minimum RTT measured at the same node across all connections to clients within the prefix’s region • Declare a prefix to be inflated if that difference is greater than 50 ms
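The inflation test above can be sketched as follows. This is a minimal illustration, not WhyHigh's implementation: the function name, the input layout (RTT samples per prefix as seen from one node, plus a prefix-to-region map), and the sample values are all assumptions. Only the 50 ms threshold and the min-versus-regional-min comparison come from the slide.

```python
INFLATION_THRESHOLD_MS = 50.0  # threshold stated on the slide


def inflated_prefixes(rtts_by_prefix, region_of):
    """Return {prefix: inflation in ms} for prefixes deemed inflated at one node.

    rtts_by_prefix: prefix -> list of RTT samples (ms) measured at the node.
    region_of:      prefix -> region label (hypothetical geographic grouping).
    """
    # Minimum RTT per prefix; prefixes without samples are skipped.
    prefix_min = {p: min(r) for p, r in rtts_by_prefix.items() if r}

    # Minimum RTT across all prefixes in the same region.
    region_min = {}
    for prefix, rtt in prefix_min.items():
        region = region_of[prefix]
        region_min[region] = min(rtt, region_min.get(region, rtt))

    result = {}
    for prefix, rtt in prefix_min.items():
        inflation = rtt - region_min[region_of[prefix]]
        if inflation > INFLATION_THRESHOLD_MS:
            result[prefix] = inflation
    return result


# Made-up sample data for illustration.
samples = {
    "192.0.2.0/24": [20.0, 35.0],
    "198.51.100.0/24": [90.0, 120.0],
    "203.0.113.0/24": [60.0, 200.0],
    "100.64.0.0/24": [150.0, 160.0],
}
regions = {
    "192.0.2.0/24": "r1",
    "198.51.100.0/24": "r1",
    "203.0.113.0/24": "r1",
    "100.64.0.0/24": "r2",
}
print(inflated_prefixes(samples, regions))
```

Using the minimum RTT on both sides of the comparison filters out transient queuing, so a large difference points at the route itself rather than at congestion.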

Diagnose Cases of Inefficient Routing • Identifying Causes of Latency Inflation • Snapshots of the BGP routing table provide information on the AS path being used to route packets to all prefixes • A log of all the BGP updates tells us the other alternative paths available to each prefix • A traceroute from the node to a destination in the prefix, and pings to intermediate routers seen on the traceroute, give visibility into the reverse path back from prefixes

Diagnose Cases of Inefficient Routing • Identifying Causes of Latency Inflation • Circuitousness along the forward path • Sequence of locations traversed along the traceroute to the prefix • Circuitousness along the reverse path • Significant RTT increase on a single hop of the traceroute • Return TTL on the response from a probed interface • Flow records gathered at border routers in Google’s network
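One of the reverse-path signals listed above, a significant RTT increase on a single traceroute hop, can be sketched like this. The function name, the input format, and the 50 ms jump threshold are assumptions made for illustration; the slide does not say what counts as "significant".

```python
def largest_hop_jump(hops, min_jump_ms=50.0):
    """Find the largest RTT increase between consecutive answering hops.

    hops: list of (hop name, RTT in ms or None for a timeout), in traceroute order.
    Returns (previous hop, hop, jump in ms), or None if no jump reaches
    min_jump_ms (an assumed threshold).
    """
    answered = [(name, rtt) for name, rtt in hops if rtt is not None]
    best = None
    for (prev_name, prev_rtt), (name, rtt) in zip(answered, answered[1:]):
        jump = rtt - prev_rtt
        if jump >= min_jump_ms and (best is None or jump > best[2]):
            best = (prev_name, name, jump)
    return best


# Made-up traceroute extract: hop r4 answers 122 ms later than r3.
trace = [("r1", 2.0), ("r2", 5.0), ("r3", 8.0), ("r4", 130.0), ("r5", 133.0)]
print(largest_hop_jump(trace))
```

A jump like this does not by itself prove that the forward link between the two hops is long: the later hop's reply may simply return over a circuitous reverse path, which is why the slide pairs it with the return TTL and flow records.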

Diagnose Cases of Inefficient Routing • Helping Administrators Troubleshoot • Identifying Path Inflation Granularity • (i) Prefixes sharing the same PoP-level path measured by traceroute • (ii) Prefixes sharing the same AS path and the same exit and entry PoPs out of and into Google’s network • (iii) Prefixes sharing the same AS path • (iv) Prefixes belonging to the same AS • Ranking CDN Nodes • The fraction of nearby prefixes that have inflated latencies • The fraction of nearby prefixes that are served elsewhere
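Grouping inflated prefixes at the coarser granularities, (iii) same AS path and (iv) same AS, can be sketched as below so that an administrator looks at one group instead of many individual prefixes. The data layout and the `group_by` helper are hypothetical, the AS numbers are taken from the range reserved for documentation, and treating the last AS on the path as the prefix's own AS is an assumption.

```python
from collections import defaultdict


def group_by(prefix_info, key):
    """Group prefixes by key(info); returns {group key: [prefixes]}."""
    groups = defaultdict(list)
    for prefix, info in prefix_info.items():
        groups[key(info)].append(prefix)
    return dict(groups)


# Made-up inflated prefixes with the AS path used to reach them.
INFLATED = {
    "192.0.2.0/24": {"as_path": (64500, 64510)},
    "198.51.100.0/24": {"as_path": (64500, 64510)},
    "203.0.113.0/24": {"as_path": (64501, 64510)},
}

# (iii) prefixes sharing the same AS path
by_as_path = group_by(INFLATED, lambda info: info["as_path"])
# (iv) prefixes belonging to the same AS (assumed: last AS on the path)
by_origin_as = group_by(INFLATED, lambda info: info["as_path"][-1])

print(by_as_path)
print(by_origin_as)
```

If every prefix in a group is inflated, the cause most likely lies at that granularity, for example with the shared AS path rather than with any single prefix.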

Diagnose Cases of Inefficient Routing • Ranking of 13 CDN Nodes
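The two ranking metrics for CDN nodes, the fraction of nearby prefixes with inflated latencies and the fraction served elsewhere, can be computed as in this sketch. The counts are made up, the field names are hypothetical, and sorting by the inflated fraction first is an assumption about how the ranking might be ordered.

```python
def rank_nodes(stats):
    """Rank nodes, worst first, by their share of problematic nearby prefixes.

    stats: node -> {"nearby": total nearby prefixes,
                    "inflated": nearby prefixes with inflated latency,
                    "served_elsewhere": nearby prefixes served by another node}
    Returns a list of (node, inflated fraction, served-elsewhere fraction).
    """
    rows = []
    for node, s in stats.items():
        if s["nearby"] == 0:
            continue  # no nearby prefixes, nothing to rank
        rows.append((node,
                     s["inflated"] / s["nearby"],
                     s["served_elsewhere"] / s["nearby"]))
    # Assumed ordering: inflated fraction, then served-elsewhere fraction.
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


# Made-up per-node counts for illustration.
NODE_STATS = {
    "node-a": {"nearby": 100, "inflated": 30, "served_elsewhere": 10},
    "node-b": {"nearby": 200, "inflated": 20, "served_elsewhere": 50},
}
ranking = rank_nodes(NODE_STATS)
for node, inflated_frac, elsewhere_frac in ranking:
    print(f"{node}: inflated={inflated_frac:.2f} served_elsewhere={elsewhere_frac:.2f}")
```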

System Architecture of WhyHigh • Steps Involved in the WhyHigh Pipeline

Diagnose Cases of Inefficient Routing • Identifying Root Causes of Inflation • Lack of peering • Limited bandwidth capacity • Routing misconfiguration • Traffic engineering

Some Examples • Illustrative Cases • Case 2: No peering, and shorter path on less specific prefix A node in India measured RTTs above 400ms to prefixes in IndSP1 Data used in troubleshooting Case 2: (a) Extract of traceroute, and (b) AS paths received by Google. - Traffic engineering

Some Examples • Illustrative Cases • Case 4: Peering, but inflated reverse path A node in Japan measured RTTs above 100ms to prefixes in IndSP1 Data used in troubleshooting Case 4: (a) Extract of traceroute, and (b) Pings to routers at peering link.

Some Examples • Summarizing use of WhyHigh WhyHigh’s classification of inflated paths

Limitations • Traceroutes yield path information only at the IP routing layer • However, path inflation could occur below layer 3, e.g., in MPLS tunnels, in which case it may not be explainable by the geographic locations of traceroute hops • WhyHigh only has access to RTT data • TCP transfer times of medium to large objects could be inflated by other factors such as loss rate and bandwidth

Thank You!
