An Analysis of Internet Content Delivery Systems. 19th November, 2007 Youngsub Kwon @ CSE, SNU. Contents. Introduction Overview of Content Delivery Systems Methodology High-Level Data Characteristics Detailed Content Delivery Characteristics
An Analysis of Internet Content Delivery Systems 19th November, 2007 Youngsub Kwon @ CSE, SNU
Contents • Introduction • Overview of Content Delivery Systems • Methodology • High-Level Data Characteristics • Detailed Content Delivery Characteristics • The Potential Role of Caching in CDNs and P2P • Conclusion
Introduction • This paper examines content delivery from the point of view of four content delivery systems • HTTP web traffic • Akamai content delivery network • Kazaa and Gnutella P2P file sharing traffic • Results • Quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks • Characterize the behavior of these systems from the perspectives of clients, objects, and servers • Derive implications for caching in these systems
Overview of Content Delivery Systems • WWW • Uses the HTTP protocol (consistency management) • Simple architecture (server/client) • Most web objects are small (5-10 KB) • Objects are accessed with a Zipf popularity distribution • The number of web objects is enormous and rapidly growing
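As a rough illustration of what a Zipf popularity distribution implies, the sketch below assigns the object at popularity rank r a request probability proportional to 1/r^alpha. The function names and the default alpha = 1.0 are assumptions made for illustration; the slides do not give an exponent.

```python
def zipf_probabilities(n_objects, alpha=1.0):
    """Return request probabilities for ranks 1..n_objects under a Zipf law.

    alpha is a hypothetical skew parameter; the slides do not state one.
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_share(probs, k):
    """Fraction of all requests that go to the k most popular objects."""
    return sum(probs[:k])


if __name__ == "__main__":
    probs = zipf_probabilities(10_000)
    print(f"Top 1% of objects receive {top_k_share(probs, 100):.1%} of requests")
```

Under such a distribution a small set of popular objects attracts a large share of requests, which is why popularity skew matters for caching.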
Overview of Content Delivery Systems • Content Delivery Networks (CDNs) • Collections of servers located strategically across the wide-area Internet • Content is replicated across the wide area for high availability • CDNs have servers in ISP points of presence • Clients can access topologically nearby replicas with low latency • CDNs reduce average download response times, but DNS redirection causes overhead • Peer-to-Peer Systems (P2P) • Peers collaborate to form a distributed system for the purpose of exchanging content • Most content-serving hosts are run by end users • Low availability, low-capacity network connections
Methodology • Use passive network monitoring to collect traces of traffic • Network Composition • UW (University of Washington) connects to its ISPs via two border routers - inbound, outbound traffic • The two routers are fully connected to four switches • Each switch has a monitoring port that is used to copy packets to the monitoring host • Tracing Infrastructure • Software - 26,000 lines of code • Hardware - dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs, running FreeBSD 4.5
Methodology • Distinguishing Traffic Types • Two types of traffic - HTTP traffic, non-HTTP traffic • HTTP Traffic - WWW, Akamai, Kazaa, Gnutella • Non-HTTP Traffic - Kazaa, Gnutella search traffic • Akamai – Port 80, 8080, or 443 traffic that is served by an Akamai server • WWW - Port 80, 8080, or 443 traffic that is not served by an Akamai server • Gnutella – Ports 6346 or 6347 – includes file transfers, but excludes search and control traffic • Kazaa – Port 1214 – includes file transfers, but excludes search and control traffic
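The port-based rules on this slide can be sketched as a small classifier. This is a minimal sketch, not the tracing software described in the talk: the function name, the `akamai_servers` set, and the example addresses are hypothetical, and the input is assumed to be an HTTP transfer (search and control traffic already filtered out).

```python
WEB_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORTS = {1214}


def classify_flow(server_port, server_ip, akamai_servers):
    """Label an HTTP transfer with one of the four content delivery systems.

    akamai_servers is a hypothetical set of known Akamai server addresses;
    how such a list is obtained is not described on the slide.
    """
    if server_port in KAZAA_PORTS:
        return "Kazaa"
    if server_port in GNUTELLA_PORTS:
        return "Gnutella"
    if server_port in WEB_PORTS:
        return "Akamai" if server_ip in akamai_servers else "WWW"
    return "other"


if __name__ == "__main__":
    akamai = {"192.0.2.10"}  # documentation-range address, purely illustrative
    print(classify_flow(80, "192.0.2.10", akamai))
    print(classify_flow(1214, "198.51.100.7", akamai))
```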
High-Level Data Characteristics • TCP Bandwidth • All systems show a typical diurnal cycle • Share of TCP bytes: Akamai - 0.2% • Gnutella - 6.04% • WWW - 14.3% • Kazaa - 36.99%
High-Level Data Characteristics • UW Client and server TCP bandwidth • Figure (a) – Inbound Data BWs • WWW peaks in the middle of the day • Kazaa peaks late at night • Figure (b) – Outbound Data BWs • Peak Kazaa BW dominates WWW by a factor of 3
High-Level Data Characteristics • Content types downloaded by UW clients • GIF & JPEG images account for 42% of downloads, but for only 16.3% of the bytes transferred • Compared with measurements from a 1999 study • HTML traffic: -43%, GIF & JPG traffic: -59% • AVI & MPG traffic: +400%, MP3 traffic: +300%
High-Level Data Characteristics • Summary • The balance of HTTP traffic has changed dramatically over the last several years • P2P traffic is overtaking WWW traffic as the largest contributor to HTTP bytes transferred • Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data • The mixture of object types downloaded by UW clients has changed
Detailed Content Delivery Characteristics • Objects • Object size: P2P > WWW & Akamai • Top bandwidth consuming Objects • For Gnutella, we see that a relatively large number of objects account for a large portion of the transferred bytes
Detailed Content Delivery Characteristics • Objects – Top 10 bandwidth consuming objects • WWW – The top 10 objects are a mix of extremely small objects • Akamai – 8 of the top 10 objects are large and unpopular • Kazaa – Exported objects are larger than imported objects
Detailed Content Delivery Characteristics • Objects – Downloaded bytes by object type
Detailed Content Delivery Characteristics • Clients - Top UW bandwidth consuming clients • Figure (a) – Top Bandwidth Consuming UW Clients • WWW - Top 200 clients (0.5%) account for 13% of WWW traffic • Kazaa - Top 200 clients (4%) account for 50% of Kazaa traffic • Figure (b) – Top Bandwidth Consuming UW Servers • Kazaa: 200 clients account for 20% of the total HTTP bytes downloaded (worst offender)
Detailed Content Delivery Characteristics • Clients - Request rates over time
Detailed Content Delivery Characteristics • Servers - Top UW-internal bandwidth producing servers • Figure (a) – Top Bandwidth Consuming UW Servers • Gnutella: all of the bytes come from the first 10 servers • WWW: steep curve • Kazaa: 80% of the bytes come from the top 334 servers • Figure (b) • WWW: 20 servers produce 20% of all HTTP bytes output • Kazaa: 170 servers produce 50% of all HTTP bytes output
Detailed Content Delivery Characteristics • Servers - Top UW-external bandwidth producing servers • Figure (a) • WWW: 938 external servers account for 50% of the bytes • Kazaa: 600 external servers account for 26% of the bytes • Figure (b) • Kazaa: the top 500 external Kazaa peers account for 10% of the bytes • WWW: the top 500 servers account for 22% of the bytes
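Concentration figures of the form "the top N hosts account for X% of the bytes", as quoted on the client and server slides, can be derived from per-host byte totals. The following is a minimal sketch under the assumption that bytes have already been summed per client or server; the function name and the sample numbers are illustrative and are not taken from the trace.

```python
def top_n_byte_share(bytes_by_host, n):
    """Fraction of all bytes attributable to the n heaviest hosts.

    bytes_by_host maps a host identifier (client or server) to its byte total.
    """
    total = sum(bytes_by_host.values())
    if total == 0:
        return 0.0
    heaviest = sorted(bytes_by_host.values(), reverse=True)[:n]
    return sum(heaviest) / total


if __name__ == "__main__":
    sample = {"host-a": 50, "host-b": 30, "host-c": 20}  # made-up byte counts
    print(top_n_byte_share(sample, 1))
```

Evaluating this for increasing n yields the kind of cumulative curve the figures on these slides plot.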
Detailed Content Delivery Characteristics • Servers • The response codes returned by external servers in each content delivery system • Figure (a) • Akamai and the WWW: 70% success, P2P: Less than 20% success • Figure (b) shows that nearly all HTTP bytes are for useful content. • Overhead of rejected requests is small compared to the amount of useful data transferred.
Detailed Content Delivery Characteristics • Scalability of P2P Systems • Can P2P systems like Kazaa scale in environments such as a university? • Every peer in a P2P system consumes bandwidth in both directions • Each new P2P client added becomes a server for the entire P2P structure • Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth • The bandwidth cost of each P2P peer is 90 times that of a web client • It seems questionable whether any organization can support a service with these characteristics
Detailed Content Delivery Characteristics • Summary • Peer-to-peer now accounts for over three quarters of HTTP traffic • A small number of P2P users consume a disproportionately high fraction of bandwidth • While the P2P request rate is quite low, the transfers last a long time • While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers take on the majority of the burden
The Potential Role of Caching in CDNs • Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%) • Our analysis shows that Akamai requests are more skewed towards the most popular documents than WWW requests are • We know that most bytes fetched from Akamai are from images and videos • This implies that much of Akamai's content is in fact static and could be cached • We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network
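The slide does not define "ideal" and "practical" hit rates, so the sketch below rests on an assumption: "ideal" means an infinite cache in which every repeated request is a hit, and "practical" additionally requires the request to be marked cacheable. The function name and the `(url, cacheable)` trace format are hypothetical.

```python
def hit_rates(requests):
    """Return (ideal, practical) request hit rates for an infinite cache.

    requests is an iterable of (url, cacheable) pairs in arrival order.
    Assumption: a repeat of an already-seen URL is an ideal hit, and it is
    also a practical hit only if that request is flagged cacheable.
    """
    seen = set()
    ideal_hits = practical_hits = total = 0
    for url, cacheable in requests:
        total += 1
        if url in seen:
            ideal_hits += 1
            if cacheable:
                practical_hits += 1
        seen.add(url)
    if total == 0:
        return 0.0, 0.0
    return ideal_hits / total, practical_hits / total


if __name__ == "__main__":
    trace = [("a", True), ("b", False), ("a", True), ("b", False)]
    print(hit_rates(trace))
```

The gap between the two numbers reflects how much of the repeated traffic is uncacheable, which is one way to read the 88% versus 50% figures for Akamai.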
The Potential Role of Caching in P2P • The potential impact of caching in P2P systems may exceed the benefits seen in the web • Inbound cache byte hit rate = 35%, outbound cache byte hit rate = 85% • Hit rate increases with client population size for outbound traffic (1,000 clients - 40%, 500,000 clients - 85%) • A reverse P2P cache saves the most bandwidth
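A byte hit rate weights each hit by the size of the object, which matters for P2P because objects are large. The sketch below assumes an infinite cache and a hypothetical trace of `(client, object_id, size_bytes)` records; restricting the trace to a subset of clients shows why a larger client population can raise the hit rate.

```python
def byte_hit_rate(transfers, clients=None):
    """Byte hit rate of an infinite cache over a transfer trace.

    transfers is an iterable of (client, object_id, size_bytes) in arrival
    order. If clients is given, only transfers by those clients are replayed.
    """
    seen = set()
    hit_bytes = total_bytes = 0
    for client, object_id, size in transfers:
        if clients is not None and client not in clients:
            continue
        total_bytes += size
        if object_id in seen:
            hit_bytes += size  # object already cached: these bytes are saved
        else:
            seen.add(object_id)
    return hit_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    # Made-up trace: two clients fetch the same two objects.
    trace = [("c1", "x", 100), ("c2", "x", 100), ("c1", "y", 50), ("c2", "y", 50)]
    print(byte_hit_rate(trace))            # both clients share one cache
    print(byte_hit_rate(trace, {"c1"}))    # a single client gains nothing here
```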
Conclusion • P2P traffic now accounts for the majority of HTTP bytes transferred • P2P documents are three orders of magnitude larger than web objects • A small number of extremely large objects account for an enormous fraction of observed P2P traffic • A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems • Each P2P client creates a significant bandwidth load in both directions