A Measurement Study of Peer-to-Peer File Sharing Systems

A Measurement Study of Peer-to-Peer File Sharing Systems. Presented by Cristina Abad. Motivation: in a P2P file sharing system, peers are usually at the “edge” of the network. Does this affect or limit the quality of the infrastructure?

Presentation Transcript


  1. A Measurement Study of Peer-to-Peer File Sharing Systems Presented by Cristina Abad

  2. Motivation • In a P2P file sharing system, peers are usually at the “edge” of the network • Does this affect or limit the quality of the infrastructure? • What are the characteristics of hosts that choose to participate? • Solution: Measure Gnutella and Napster traffic to help understand these issues

  3. Napster

  4. Gnutella

  5. Methodology • Crawler periodically takes “snapshot” of Napster/Gnutella • capture basic info (peers, files shared, …) • For peers discovered • measure bottleneck bandwidth • measure latency • track content and degree of sharing • Measure lifetime • track availability of peers (at P2P and IP level)

  6. Crawling Napster • Peers can only be discovered by querying index • Crawler issues queries with names of popular song artists • Query responses contain • IP, reported bandwidth, files shared (number, names and sizes) • Results: • Captured 40-60% of Napster hosts (contributing to 80-95% of total files) • Could not capture peers that do not share files

  7. Crawling Gnutella • Crawler uses ping/pong to discover peers • Each crawl captured approx. 10,000 peers

  8. Measuring bandwidth • Reported bandwidth may not be accurate (ignorance or lies) • Use bottleneck bandwidth as an approximation to available bandwidth • capacity of the slowest hop along the path between two hosts • Used SProbe to actively measure both upstream and downstream bottleneck bandwidth • Similar to the “packet pair” technique

  9. Packet Pair Technique • Two packets queued next to each other at the bottleneck link exit the link t seconds apart: t = s2 / bbnl • Then, bbnl = s2 / t • s2: size of second packet • bbnl: bottleneck bandwidth • Source: Kevin Lai and Mary Baker. “Measuring Bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.
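
The packet pair relation (bottleneck bandwidth equals the size of the second packet divided by the time dispersion between the two packets) can be sketched as follows. This is a minimal illustration, not SProbe's implementation; the function name and the sample values (a 1500-byte packet and a 1.2 ms dispersion) are assumptions chosen for the example.

```python
def bottleneck_bandwidth(second_packet_bytes: int, dispersion_seconds: float) -> float:
    """Estimate bottleneck bandwidth in bits per second from one packet pair.

    second_packet_bytes: size of the second packet (s2 on the slide).
    dispersion_seconds: time gap between the two packets after the
    bottleneck link (t on the slide).
    """
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    # Convert bytes to bits, then divide by the observed gap.
    return second_packet_bytes * 8 / dispersion_seconds


# Hypothetical measurement: 1500-byte second packet, 1.2 ms dispersion.
estimate = bottleneck_bandwidth(1500, 0.0012)
print(f"{estimate / 1e6:.1f} Mbps")
```

A single pair is sensitive to cross traffic, so a real tool would take many samples and filter them; that filtering is outside this sketch.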

  10. How many peers are server-like? • 8% have upstream bottleneck bandwidth ≥ 10 Mbps • High-bandwidth, low latency, high availability

  11. Availability – Host uptimes

  12. Availability – Session duration

  13. Free-riders

  14. Is Gnutella robust?

  15. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Presented by Cristina Abad

  16. Three-tiered approach • Analyze 200-day trace of Kazaa traffic • Considered only traffic going from U. Washington to the outside • Develop a model of multimedia workloads • Analyze and confirm hypothesis • Explore potential impact of locality-awareness in Kazaa

  17. Contributions • Obtained some useful characterizations of Kazaa’s traffic • Showed that Kazaa’s workload is not Zipf • Showed that other workloads (multimedia) may not be Zipf either • Presented a model of P2P file-sharing workloads based on their trace results • Validated the model through simulations that yielded results very similar to those from traces • Proved the usefulness of exploiting locality-aware request routing

  18. Measurement results • Users are patient • Users slow down as they age • Kazaa is not one workload • Kazaa clients fetch objects at-most-once • Popularity of objects is often short-lived • Kazaa is not Zipf

  19. User characteristics (1) • Users are patient

  20. User characteristics (2) • Users slow down as they age • clients “die” • older clients ask for less each time they use system

  21. User characteristics (3) • Client activity • Tracing used could only detect users when their clients transfer data • Thus, they only report statistics on client activity, which is a lower bound on availability • Avg session lengths are typically small (median: 2.4 mins) • Many transactions fail • Periods of inactivity may occur during a request if client cannot find an available server with the object

  22. Object characteristics (1) • Kazaa is not one workload

  23. Object characteristics (2) • Kazaa object dynamics • Kazaa clients fetch objects at most once • Popularity of objects is often short-lived • Most popular objects tend to be recently born objects • Most requests are for old objects

  24. Object characteristics (3) • Kazaa is not Zipf • Web access patterns are Zipf: a small number of objects are extremely popular, but there is a long tail of unpopular requests. • Zipf’s law: popularity of the ith-most popular object is proportional to i^(-α) (α: Zipf coefficient) • A Zipf distribution looks linear on a log-log scale
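
As a small illustration of Zipf's law, the sketch below builds normalized popularities proportional to i^(-α) and checks that the log-log slope between two ranks equals -α. The function name, the object count of 1000, and α = 1.0 are assumptions made for the example, not values from the study.

```python
import math


def zipf_probabilities(n_objects: int, alpha: float) -> list[float]:
    """Return request probabilities for ranks 1..n_objects under Zipf(alpha)."""
    weights = [rank ** -alpha for rank in range(1, n_objects + 1)]
    total = sum(weights)
    # Normalize so the probabilities sum to 1.
    return [w / total for w in weights]


probs = zipf_probabilities(1000, 1.0)

# Slope of log(probability) against log(rank), measured between rank 1 and rank 100.
slope = (math.log(probs[99]) - math.log(probs[0])) / (math.log(100) - math.log(1))
print(f"log-log slope: {slope:.3f}")
```

With α = 1, the most popular object is requested about twice as often as the second and about a hundred times as often as the hundredth.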

  25. Model of P2P file-sharing workloads • On average, a client requests 2 objects/day • P(x): probability that a user requests an object of popularity rank x ~ Zipf(1) • Adjusted so that objects are requested at most once • A(x): probability that a newly arrived object is inserted at popularity rank x ~ Zipf(1) • All objects are assumed to have the same size • Use caching to observe performance changes (effectiveness = hit rate)
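
The "requested at most once" adjustment can be sketched with rejection sampling: each client draws ranks from Zipf(1) and discards any object it has already fetched. This is a simplified illustration under stated assumptions, not the authors' simulator; it omits new object arrivals, client aging, and caching, and the function name and parameter values are hypothetical.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_fetch_at_most_once(n_clients, n_objects, requests_per_client, seed=0):
    """Count requests per object when each client fetches an object at most once."""
    if requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Cumulative Zipf(1) weights: object index 0 holds popularity rank 1.
    cumulative = list(accumulate(1.0 / rank for rank in range(1, n_objects + 1)))
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()
        while len(fetched) < requests_per_client:
            obj = rng.choices(range(n_objects), cum_weights=cumulative)[0]
            if obj not in fetched:  # reject repeats: fetch-at-most-once
                fetched.add(obj)
                counts[obj] += 1
    return counts


counts = simulate_fetch_at_most_once(n_clients=50, n_objects=200, requests_per_client=20)
print("requests for the five most popular objects:", [counts[i] for i in range(5)])
```

Because no client can request the same object twice, an object's request count can never exceed the number of clients, which flattens the head of the popularity curve relative to a pure Zipf workload.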

  26. Model – Simulation results • File-sharing effectiveness diminishes with client age • System evolves towards one with no locality and objects chosen at random from large space • New object arrivals improve performance • Arrivals replenish supply of popular objects • New clients cannot stabilize performance • Can’t compensate for increasing number of old clients • Overall bandwidth increases in proportion to population size

  27. Model validation • By tweaking the arrival rate of new objects, the authors were able to match trace results (with 5475 new arrivals per year)

  28. Exploring locality-awareness • Currently organizations shape or filter P2P traffic • Alternative strategy: exploit locality in file-sharing workload • Caching; or, • Use content available within organization to substantially decrease external bandwidth usage • Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy
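
The kind of byte savings an organizational proxy could provide can be estimated by replaying a request trace against an idealized cache. The sketch below assumes an unbounded cache that never evicts and a hypothetical trace of (object id, size in bytes) pairs; it does not reproduce the 86% figure, which comes from the authors' own trace.

```python
def bytes_saved_fraction(requests):
    """Fraction of requested bytes served from an ideal, unbounded proxy cache.

    requests: iterable of (object_id, size_in_bytes) in arrival order.
    """
    cached = set()
    total_bytes = 0
    saved_bytes = 0
    for object_id, size in requests:
        total_bytes += size
        if object_id in cached:
            saved_bytes += size      # hit: no external download needed
        else:
            cached.add(object_id)    # miss: fetch externally, then keep a copy
    return saved_bytes / total_bytes if total_bytes else 0.0


# Hypothetical trace: object "a" is requested three times by different clients.
trace = [("a", 100), ("b", 50), ("a", 100), ("c", 50), ("a", 100)]
fraction = bytes_saved_fraction(trace)
print(f"bytes avoided: {fraction:.0%}")
```

Only repeat requests across different clients can hit, since each Kazaa client fetches an object at most once; the savings therefore depend on how much the clients' interests overlap.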

  29. Questions?

  30. Analysis • How can results obtained be used when evaluating P2P schemes? • Are any of the measurements obtained biased? • Peers are heterogeneous • Incentives • Enforcement (e.g. super-peers in Kazaa)

  31. SProbe • Works in uncooperative environments • Works on asymmetric network paths • Exploits properties of the TCP protocol • Sends SYN packets with large payloads, then measures the time dispersion of the received RST packets

  32. Zipf • Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n • "Zipf's law" is also sometimes used to refer to the corresponding probability distribution • Is an instance of a power law • Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.
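
The log-log check described above can be sketched as a least-squares line fit over log(rank) and log(frequency). This is an illustrative test using only the standard library, with made-up frequency data; the function name is hypothetical, and a straight-line fit is a rough diagnostic rather than a rigorous statistical test.

```python
import math


def fit_zipf_exponent(frequencies):
    """Fit log(frequency) = c - alpha * log(rank); return (alpha, r_squared).

    frequencies: positive request counts, at least two distinct values.
    """
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Exact Zipf(1) frequencies versus the same data with a flattened head.
zipf_like = [1000 / rank for rank in range(1, 101)]
flattened = [min(freq, 100) for freq in zipf_like]

alpha_zipf, r2_zipf = fit_zipf_exponent(zipf_like)
alpha_flat, r2_flat = fit_zipf_exponent(flattened)
print(f"Zipf-like: alpha={alpha_zipf:.2f}, R^2={r2_zipf:.3f}")
print(f"Flattened: alpha={alpha_flat:.2f}, R^2={r2_flat:.3f}")
```

A flattened head, like the one produced by fetch-at-most-once behavior, shows up as a worse straight-line fit than the pure Zipf data.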
