
An Analysis of Internet Content Delivery Systems

An Analysis of Internet Content Delivery Systems. 19th November, 2007. Youngsub Kwon @ CSE, SNU. Contents: Introduction, Overview of Content Delivery Systems, Methodology, High-Level Data Characteristics, Detailed Content Delivery Characteristics.


Presentation Transcript


  1. An Analysis of Internet Content Delivery Systems • 19th November, 2007 • Youngsub Kwon @ CSE, SNU

  2. Contents • Introduction • Overview of Content Delivery Systems • Methodology • High-Level Data Characteristics • Detailed Content Delivery Characteristics • The Potential Role of Caching in CDNs and P2P • Conclusion

  3. Introduction • This paper examines content delivery from the point of view of four content delivery systems • HTTP web traffic • Akamai content delivery network • Kazaa and Gnutella P2P file sharing traffic • Results • Quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks • Characterize the behavior of these systems from the perspectives of clients, objects, and servers • Derive implications for caching in these systems

  4. Overview of Content Delivery Systems • WWW • Uses the HTTP protocol (consistency management) • Simple architecture (server/client) • Most web objects are small (5-10 KB) • Objects are accessed with a Zipf popularity distribution • The number of web objects is enormous and rapidly growing
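The Zipf popularity claim on this slide can be illustrated with a small sketch. It assumes a hypothetical exponent of 1.0 and a catalogue of 1,000 objects; neither value comes from the presentation, and the function names are illustrative only.

```python
def zipf_weights(num_objects, alpha=1.0):
    """Return normalized Zipf probabilities for popularity ranks 1..num_objects.

    alpha is an assumed exponent, not a value reported in the slides.
    """
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def share_of_top(weights, k):
    """Fraction of all requests that go to the k most popular objects."""
    return sum(weights[:k])


# Hypothetical catalogue of 1,000 objects.
weights = zipf_weights(1000)
top10_share = share_of_top(weights, 10)
print(f"Top 10 of 1000 objects receive {top10_share:.1%} of requests")
```

Under these assumptions a handful of objects draws a large share of requests, which is the property that makes web caching effective.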

  5. Overview of Content Delivery Systems • Content Delivery Networks (CDNs) • Collections of servers located strategically across the wide-area Internet • Content is replicated across the wide area, giving high availability • CDNs have servers in ISP points of presence • Clients can access topologically nearby replicas with low latency • CDNs reduce average download response times, but DNS redirection causes overhead • Peer-to-Peer Systems (P2P) • Peers collaborate to form a distributed system for the purpose of exchanging content • Most content-serving hosts are run by end users • Low availability, low-capacity network connections

  6. Methodology • Use passive network monitoring to collect traces of traffic • Network Composition • UW (University of Washington) connects to its ISPs via two border routers, one for inbound and one for outbound traffic • The two routers are fully connected to four switches • Each switch has a monitoring port that is used to copy packets to the monitoring host • Tracing Infrastructure • Software - 26,000 lines of code • Hardware - dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs, running FreeBSD 4.5

  7. Methodology • Distinguishing Traffic Types • Two types of traffic - HTTP traffic, non-HTTP traffic • HTTP Traffic - WWW, Akamai, Kazaa, Gnutella • Non-HTTP Traffic - Kazaa, Gnutella search traffic • Akamai – Ports 80, 8080, 443, served by an Akamai server • WWW – Ports 80, 8080, 443, not served by an Akamai server • Gnutella – Ports 6346 or 6347 – includes file transfers, but excludes search and control traffic • Kazaa – Port 1214 – includes file transfers, but excludes search and control traffic
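The port-based rules on this slide can be sketched as a small classifier. This is a minimal illustration, not the study's tracing software: the function name and the `served_by_akamai` flag are hypothetical, and the sketch does not model how search and control traffic was excluded.

```python
# Port sets taken from the slide.
WEB_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORTS = {1214}


def classify_flow(server_port, served_by_akamai=False):
    """Label a flow by the slide's port rules.

    served_by_akamai is a hypothetical flag; the slide does not say how
    Akamai servers were identified.
    """
    if server_port in WEB_PORTS:
        return "Akamai" if served_by_akamai else "WWW"
    if server_port in GNUTELLA_PORTS:
        return "Gnutella"
    if server_port in KAZAA_PORTS:
        return "Kazaa"
    return "other"


print(classify_flow(80))                         # WWW
print(classify_flow(443, served_by_akamai=True)) # Akamai
print(classify_flow(1214))                       # Kazaa
```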

  8. High-Level Data Characteristics • TCP Bandwidth • All systems show a typical diurnal cycle • Akamai - 0.2% of TCP bytes • Gnutella - 6.04% of TCP bytes • WWW - 14.3% of TCP bytes • Kazaa - 36.99% of TCP bytes

  9. High-Level Data Characteristics • UW client and server TCP bandwidth • Figure (a) – Inbound data bandwidth • WWW peaks in the middle of the day • Kazaa peaks late at night • Figure (b) – Outbound data bandwidth • Peak Kazaa bandwidth dominates WWW by a factor of 3

  10. High-Level Data Characteristics • Content types downloaded by UW clients • GIF & JPEG images account for 42% of downloads, but only 16.3% of the bytes transferred • Compared with measurements from a 1999 study • HTML traffic: -43%, GIF & JPG traffic: -59% • AVI & MPG traffic: +400%, MP3 traffic: +300%

  11. High-Level Data Characteristics • Summary • The balance of HTTP traffic has changed dramatically over the last several years • P2P traffic has overtaken WWW traffic as the largest contributor to HTTP bytes transferred • Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data • The mixture of object types downloaded by UW clients has changed

  12. Detailed Content Delivery Characteristics • Objects • Object size: P2P > WWW & Akamai • Top bandwidth consuming Objects • For Gnutella, we see that a relatively large number of objects account for a large portion of the transferred bytes

  13. Detailed Content Delivery Characteristics • Objects – Top 10 bandwidth consuming objects • WWW – The top 10 objects are a mix of extremely popular small objects and relatively unpopular large objects • Akamai – 8 out of the top 10 objects are large and unpopular • Kazaa – Exported objects are larger than imported objects

  14. Detailed Content Delivery Characteristics • Objects – Downloaded bytes by object type

  15. Detailed Content Delivery Characteristics • Clients - Top UW bandwidth consuming clients • Figure (a) – Top Bandwidth Consuming UW Clients • WWW - Top 200 clients (0.5%) → 13% of WWW traffic • Kazaa - Top 200 clients (4%) → 50% of Kazaa traffic • Figure (b) – Top Bandwidth Consuming UW Servers • Kazaa: 200 clients → 20% of the total HTTP bytes downloaded (worst offender)
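Figures of the form "top N clients account for X% of traffic" come from ranking hosts by bytes and taking a cumulative share. The sketch below shows that calculation on made-up numbers; the function name and the sample data are hypothetical and are not taken from the trace.

```python
def top_n_share(bytes_per_host, n):
    """Fraction of total bytes attributable to the n heaviest hosts."""
    ranked = sorted(bytes_per_host.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:n]) / total


# Invented per-client byte counts, for illustration only.
sample = {"client-a": 700, "client-b": 200, "client-c": 50, "client-d": 50}
print(f"Top 1 client: {top_n_share(sample, 1):.0%} of bytes")
print(f"Top 2 clients: {top_n_share(sample, 2):.0%} of bytes")
```

The same function applies to servers or objects by changing what the dictionary keys represent.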

  16. Detailed Content Delivery Characteristics • Clients - Request rates over time

  17. Detailed Content Delivery Characteristics • Servers - Top UW-internal bandwidth producing servers • Figure (a) – Top Bandwidth Consuming UW Servers • Gnutella: all of the bytes → first 10 servers • WWW: steep curve • Kazaa: 80% of the bytes → top 334 servers • Figure (b) • WWW: 20 servers → 20% of all HTTP bytes output • Kazaa: 170 servers → 50% of all HTTP bytes output

  18. Detailed Content Delivery Characteristics • Servers - The UW-external bandwidth producing servers • Figure (a) • WWW: 938 external servers → 50% of the bytes • Kazaa: 600 external servers → 26% of the bytes • Figure (b) • Kazaa: Top 500 external Kazaa peers → 10% of the bytes • WWW: Top 500 servers → 22% of the bytes

  19. Detailed Content Delivery Characteristics • Servers • The response codes returned by external servers in each content delivery system • Figure (a) • Akamai and the WWW: 70% success, P2P: Less than 20% success • Figure (b) shows that nearly all HTTP bytes are for useful content. • Overhead of rejected requests is small compared to the amount of useful data transferred.

  20. Detailed Content Delivery Characteristics • Scalability of P2P Systems • Can P2P systems like Kazaa scale in environments such as the university? • Every peer in a P2P system consumes bandwidth in both directions • Each new P2P client added becomes a server for the entire P2P structure • Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth • The bandwidth cost of each P2P peer is 90 times that of a web client! • It seems questionable whether any organization can support a service with these characteristics

  21. Detailed Content Delivery Characteristics • Summary • Peer-to-peer traffic now accounts for over three quarters of HTTP traffic • A small number of P2P users are consuming a disproportionately high fraction of bandwidth • While the P2P request rate is quite low, the transfers last a long time • While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers are taking the majority of the burden

  22. The Potential Role of Caching in CDNs • Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%) • Our analysis shows that Akamai requests are more skewed towards the most popular documents than are WWW requests • We know that most bytes fetched from Akamai are from images and videos • This implies that much of Akamai's content is in fact static and could be cached • We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network

  23. The Potential Role of Caching in P2P • The potential impact of caching in P2P systems may exceed the benefits seen in the web • Inbound cache byte hit rate = 35%, outbound cache byte hit rate = 85% • Hit rate increases with client population size for outbound traffic (1,000 clients - 40%, 500,000 clients - 85%) • A reverse P2P cache saves the most bandwidth
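The request hit rates and byte hit rates quoted on the caching slides are two different measures. The sketch below computes both for a trace replayed through an idealized cache. It assumes that "ideal" means unlimited capacity with every object cacheable and never expiring; that reading, the function name, and the sample trace are assumptions, not details from the presentation.

```python
def ideal_hit_rates(requests):
    """Replay (object_id, size_in_bytes) requests through an infinite cache.

    Returns (request_hit_rate, byte_hit_rate). The first request for an
    object is always a miss; every later request for it is a hit.
    """
    seen = set()
    hits = 0
    hit_bytes = 0
    total_bytes = 0
    for object_id, size in requests:
        total_bytes += size
        if object_id in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(object_id)
    count = len(requests)
    request_hit_rate = hits / count if count else 0.0
    byte_hit_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return request_hit_rate, byte_hit_rate


# Invented trace: two repeated objects and one large object fetched once.
trace = [("a", 100), ("b", 900), ("a", 100), ("b", 900), ("c", 1000)]
request_rate, byte_rate = ideal_hit_rates(trace)
print(f"request hit rate {request_rate:.0%}, byte hit rate {byte_rate:.0%}")
```

Because large objects dominate byte counts, the two rates can diverge sharply, which is why byte hit rate is the more relevant measure for bandwidth savings in P2P traffic.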

  24. Conclusion • P2P traffic now accounts for the majority of HTTP bytes transferred • P2P documents are three orders of magnitude larger than web objects • A small number of extremely large objects account for an enormous fraction of observed P2P traffic • A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems • Each P2P client creates a significant bandwidth load in both directions
