Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

Presented for CS294-4, Fall 2003, by Jon Hess.

Presentation Transcript


  1. Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload. Presented for CS294-4, Fall 2003, by Jon Hess.

  2. Goal - Overview
     • Determine whether the KaZaA search space is queried in such a way that a group of 25,000 clients can satisfy most of their own requests.

  3. Goals - Details
     • Capture an extensive trace
     • Use that trace to understand file-sharing traffic flows
     • Model user and object activity
     • Determine inefficiencies in the distribution model
     • Propose solutions to those inefficiencies

  4. Motivations
     • Around 1999-2000, file-sharing traffic began to exceed HTTP traffic in aggregate bandwidth consumed
     • File-sharing traffic is much less understood than HTTP traffic, even though it accounts for such a large share of bandwidth usage
     • Bandwidth is expensive
     [Figure: HTTP traffic vs. P2P traffic, 2000 and 2002]

  5. The Trace
     • 2 machines
     • 203 days, 5 hours, and 6 minutes
     • 22.7 TB of KaZaA file-transfer traffic
     • Captured seasonal variations
        • End of spring
        • Summer
        • Fall semester

  6. Trace Conclusions
     • Users are patient
        • 30 minutes to retrieve a small object
        • Up to 1 week to retrieve a large object
     • Users consume less as they age

  7. Trace Conclusions
     • Users' machines are not very active
     • A session is an unbroken length of time during which a client has one or more file transfers in progress
        • The average session lasts only 2 minutes
        • The 90th-percentile session lasts 28 minutes
     • Over its lifetime, a client is active only 5.54% of the time, or 0.20% of the trace period
     • 90th-percentile clients are active for most of their lifetime, and for 4.15% of the trace
     • Without control-traffic analysis, is this meaningful?

  8. Session Lengths
     [Figure: timeline of Transfers A, B, D, and E grouped into sessions of 3 minutes and 2 minutes]
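The session definition on the previous slide amounts to merging overlapping transfer intervals. The following is a minimal sketch of that idea, not the authors' tooling; the function name `merge_into_sessions` and the transfer timings (in minutes) are hypothetical, chosen so that the result matches the 3-minute and 2-minute sessions in the figure.

```python
def merge_into_sessions(transfers):
    """Merge (start, end) transfer intervals into sessions.

    A session is a maximal stretch of time during which at least one
    transfer is in progress, so overlapping intervals are merged.
    """
    sessions = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # Overlaps the current session: extend it if needed.
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            # Gap with no transfer in progress: a new session begins.
            sessions.append([start, end])
    return [(s, e) for s, e in sessions]


# Hypothetical timings in minutes (not taken from the trace).
transfers = [(0, 2), (1, 3), (10, 12), (11, 12)]
sessions = merge_into_sessions(transfers)
lengths = [end - start for start, end in sessions]
print(sessions, lengths)
```

Because session boundaries depend only on file-transfer activity, a client that is online but idle contributes nothing here, which is the concern behind the control-traffic question on slide 7.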

  9. Trace Conclusions: Objects
     • Most requests are for small objects (91%)
     • Most bytes transferred are part of large objects (65%)
     • There are many small objects
     • There are few large objects
     • Small objects' popularity is subject to heavy churn
        • No small object was in the top 10 for all 6 months
        • Only 1 large object stayed in the top 10 for 6 months
        • 44 large files remained in the top 100 for 6 months
     • The most popular small objects are new objects
     • Most requests are for old objects

  10. Fetch-at-most-once
     • Once a KaZaA user obtains an object, they will not need to retrieve another copy
        • 94% of objects are fetched at most once per user
        • 99% are fetched at most twice per user
     • This stems from the fact that media files are immutable and never 'stale'
        • You may refresh 'slashdot.org' three times a day, but there is no point in downloading 'thriller.mpeg' seventeen times.
     • This keeps the KaZaA workload from following a Zipf curve even though object popularity does.

  11. Workload Modeling
     • Create a set of objects and assign them popularity based on a Zipf distribution
     • Create a set of clients that request objects in proportion to their popularity
     • Have each client 'fetch-at-most-once'
     • Measure the distribution of transfers
        • Does it follow a Zipf curve?
        • How many big-object requests can a population of size N satisfy?
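The modeling steps above can be sketched as a small simulation. This is an illustrative sketch only, not the authors' simulator: the function name `simulate_requests`, the Zipf exponent `alpha=1.0`, and the population sizes are all assumptions, and it covers only the "does it follow a Zipf curve?" question, not the big-object question.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_requests(n_clients, n_objects, requests_per_client,
                      fetch_at_most_once, alpha=1.0, seed=0):
    """Count requests per object rank (0 = most popular)."""
    if fetch_at_most_once and requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Zipf popularity: weight of the object at rank r is 1 / r**alpha.
    cum_weights = list(accumulate(1.0 / rank ** alpha
                                  for rank in range(1, n_objects + 1)))
    population = range(n_objects)
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()  # objects this client already holds
        for _ in range(requests_per_client):
            obj = rng.choices(population, cum_weights=cum_weights)[0]
            # Fetch-at-most-once: redraw until the object is new to this client.
            while fetch_at_most_once and obj in fetched:
                obj = rng.choices(population, cum_weights=cum_weights)[0]
            fetched.add(obj)
            counts[obj] += 1
    return counts


# Assumed sizes: 200 clients, 500 objects, 50 requests per client.
zipf_only = simulate_requests(200, 500, 50, fetch_at_most_once=False)
at_most_once = simulate_requests(200, 500, 50, fetch_at_most_once=True)
print("requests for the most popular object:",
      zipf_only[0], "(Zipf) vs", at_most_once[0], "(fetch-at-most-once)")
```

Under fetch-at-most-once, no object can be requested more often than there are clients, so the head of the measured distribution is flattened relative to the underlying Zipf popularity.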

  12. [Figure] Popular objects are not requested as often as the Zipf curve would predict.

  13. Would a proxy cache help?
     • At first the proxy will cache the popular objects and succeed.
     • But as 'fetch-at-most-once' draws clients away from the Zipf curve, the proxy begins to fail.
     • What happens if we increase the density of popularity?
        • The curve starts higher and falls faster
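The decline described above can be illustrated with a toy experiment. This sketch assumes an idealized proxy that permanently holds the `cache_size` most popular objects (a simplification, not a real replacement policy such as LRU), a fixed object population with no new arrivals, and hypothetical sizes throughout; `hit_rate_per_round` is an illustrative name.

```python
import random
from itertools import accumulate


def hit_rate_per_round(n_clients=200, n_objects=1000, cache_size=50,
                       rounds=100, seed=1):
    """Per-round hit rate of a proxy holding the top `cache_size` objects.

    Each round, every client requests one object it has not fetched before,
    drawn in proportion to Zipf popularity (weight 1/rank).
    """
    rng = random.Random(seed)
    cum_weights = list(accumulate(1.0 / rank
                                  for rank in range(1, n_objects + 1)))
    population = range(n_objects)
    fetched = [set() for _ in range(n_clients)]
    rates = []
    for _ in range(rounds):
        hits = 0
        for seen in fetched:
            obj = rng.choices(population, cum_weights=cum_weights)[0]
            while obj in seen:  # fetch-at-most-once: redraw repeats
                obj = rng.choices(population, cum_weights=cum_weights)[0]
            seen.add(obj)
            hits += obj < cache_size  # ranks below cache_size are cached
        rates.append(hits / n_clients)
    return rates


rates = hit_rate_per_round()
early = sum(rates[:10]) / 10
late = sum(rates[-10:]) / 10
print(f"mean hit rate, first 10 rounds: {early:.2f}; last 10 rounds: {late:.2f}")
```

As clients exhaust the popular objects, their later requests fall on objects the proxy does not hold, so the hit rate drops over time.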

  14. The previous model did not insert new objects.
     • New popular objects tend to 'correct' the workload by providing locality
     • New clients, however, do not help: they keep old objects popular and destroy locality

  15. Validating the Model
     • Capture the parameters that are inputs to the model from the trace
        • Number of clients
        • Number of objects
        • User request rate
        • Probability that a user requests a given file (guess)
        • Probability of popularity of new objects (guess)
        • Object arrival rates (guess)
     • Run the simulation with the harvested parameters
     • See whether the simulation predicts what actually happened

  16. [Figure] The simulation seems to successfully predict reality. But with three free variables used to tune the results, is this fair?

  17. What inefficiencies can we eliminate?
     • Analysis against the trace shows that 86% of object transfers came from external sources when an internal source possessed the object.
     • A traditional proxy, given the resources, could cut bandwidth utilization by 86%
        • It would have to host pirated data
     • Could use a proxy redirector instead
        • It must know the availability of the objects
        • Control traffic is obfuscated
     • Build locality into the protocol
        • Does this sacrifice anonymity?
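One way to read the 86% figure is as the share of externally sourced transfers for which some internal client already held the object. The sketch below computes that share over a toy trace. The record layout `(client, object, from_external)`, the function name, and the sample records are hypothetical, and a client is assumed to hold an object from the moment it requests it.

```python
def avoidable_external_fraction(trace):
    """Fraction of external transfers that an internal holder could have served.

    `trace` is a time-ordered list of (client, object, from_external) records
    for internal clients; the layout is an assumption for illustration.
    """
    holders = {}  # object -> set of internal clients that already have it
    external = 0
    avoidable = 0
    for client, obj, from_external in trace:
        if from_external:
            external += 1
            # Avoidable if another internal client already holds the object.
            if holders.get(obj, set()) - {client}:
                avoidable += 1
        holders.setdefault(obj, set()).add(client)
    return avoidable / external if external else 0.0


# Hypothetical records, not data from the trace.
trace = [
    ("a", "x", True),   # first copy of x: must come from outside
    ("b", "x", True),   # avoidable: a already holds x
    ("c", "y", True),   # first copy of y: must come from outside
    ("b", "y", False),  # served internally
    ("a", "y", True),   # avoidable: b and c hold y
]
fraction = avoidable_external_fraction(trace)
print(fraction)
```

This ignores whether the internal holder is online at request time, which is the question the locality-aware protocol analysis has to address.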

  18. How successful would a locality-aware protocol be?
     • Assume that a client is available only during the periods the trace shows it as active
        • That is, during a file transfer, which is extremely conservative
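The availability assumption above can be expressed as a simple check: a request can be served locally only if another internal client holds the object and is active at that moment. This is a sketch under that assumption; the function name, data layout, and sample values are hypothetical.

```python
def can_serve_locally(request_time, requester, obj, holders, active_periods):
    """Return True if an active internal peer holds `obj` at `request_time`.

    holders:        object -> set of internal clients holding it
    active_periods: client -> list of (start, end) intervals during which the
                    trace shows the client transferring files
    """
    for peer in holders.get(obj, ()):
        if peer == requester:
            continue  # a client cannot serve its own request
        if any(start <= request_time <= end
               for start, end in active_periods.get(peer, ())):
            return True
    return False


# Hypothetical state, not data from the trace (times in minutes).
holders = {"x": {"a", "b"}}
active_periods = {"a": [(0, 5)], "b": [(20, 30)]}
print(can_serve_locally(3, "c", "x", holders, active_periods))   # a is active
print(can_serve_locally(10, "c", "x", holders, active_periods))  # nobody active
```

Because clients are counted as available only while transferring, this check underestimates how often a local copy would really be reachable.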

  19. Questions?
     • Will increasing efficiency decrease load as the authors would like? Or simply increase work achieved per dollar?
     • Do clients have insatiable appetites?
     • Are you worried that a large number of queries might have already been locally satisfied?
