190 likes | 351 Views
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload. Presented For Cs294-4 Fall 2003 By Jon Hess. Measurement, Modeling, and Analysis. of a Peer-2-Peer File-Sharing Workload. Goal - Overview
E N D
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload Presented For Cs294-4 Fall 2003 By Jon Hess
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Goal - Overview • Determine if the KaZaA search space is queried in such a way that a group of 25,000 clients can satisfy most of their own requests.
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Goals - Details • Capture an extensive trace • Utilize that trace to understand file-sharing traffic flows • Model user and object activity • Determine inefficiencies in the distribution model • Propose solutions to inefficiencies
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Motivations • Beginning in 1999-2000 file-sharing traffic began to exceed HTTP traffic in terms of aggregate bandwidth consumed • File-sharing traffic is much less understood than HTTP traffic even though it represents such a large segment of bandwidth usage • Bandwidth is expensive 2000 2002 HTTP Traffic P2P Traffic
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • The Trace • 2 Machines • 203 days 5 hours and 6 minutes • 22.7TB of KaZaA file transfer traffic • Captured seasonal variations • End of spring • Summer • Fall semester
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Trace Conclusions • Users are patient • 30 minutes to retrieve a small object • Up to 1 week to retrieve a large object • Users consume less as they age
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Trace Conclusions • Users machines are not very active • A session is an unbroken length of time where a client has one or more file transfers in progress. • Average sessions are only 2 minutes • 90th percentile 28 minutes • Over the life of a client, it is only active 5.54% of the time or 0.20% of the trace period • 90th percentile clients are active most of their life, and 4.15% of the trace • Without control traffic analysis, is this meaningful?
Transfer A Transfer B Transfer D Transfer E 3 Minutes 2 Minutes Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload Session Lengths
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Trace Conclusions – Objects • Most requests are for small objects – 91% • Most bytes transferred are part of large objects – 65% • There are many small objects • There are few large objects • Small Objects’ popularity is subject to heavy churn • No small object was in the top 10 for all 6 months • Only 1 large object lived in the top 10 for 6 months • 44 large files remained in the top 100 for 6 months • The most popular small objects are new objects • Most requests are for old objects
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Fetch-at-most-once • Once a KaZaA user obtains an object, they will not need to retrieve another copy • 94% of Objects are fetched once per user • 99% are fetched less than twice per user • Stems from the fact that media files are immutable and never ‘stale’ • You may refresh ‘slashdot.org’ three times a day, but there is no point download ‘thriller.mpeg’ seventeen times. • This keeps KaZaA workload from following a Zipf curve even though object popularity does.
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Workload Modeling • Create a set of objects and give them popularity based on a zipf distribution • Create a set of clients that requests objects in proportion to there popularity • Have each client ‘fetch-at-most-once’ • Measure the distribution of transfers • Does it follow a zipf curve • How many big-object requests can a population of size N satisfy
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload Popular objects are not requested as curve would predict
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Would a proxy cache help? • At first the proxy will cache the popular objects and succeed. • But as ‘fetch-at-most-once’ draws clients away from the Zipf curve and the proxy begins to fail. • What happens if we increase density of popularity? • Curve starts higher and falls faster
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Previous model did not insert new objects. • New popular objects tend to ‘correct’ the work load. • Through providing locality • New clients however do not help, they contribute to keeping old object’s popular and destroy locality
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Validating The Model • Capture parameters that are inputs to the model from the trace • Number of clients • Number of objects • User request rate • Probability user requests given file - Guess • Probability of popularity of new objects - Guess • Object arrival rates – Guess • Run simulation with harvested parameters • See if simulation predicts what actually happened
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload Simulation seems to successfully predict reality. But with three free variables used to tune results, is this fair?
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • What inefficiencies can we eliminate? • Analysis against the trace shows • 86% of object transfers were from external sources when an internal source possessed the object. • A traditional proxy, given the resources, could cut bandwidth utilization by 86% • Would have to host pirated data • Could use a proxy redirector instead. Must know the availability of the objects • Control traffic is obfuscated • Build locality into the protocol • Does this sacrifice anonymity?
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • How successful would a locality aware protocol be? • Assume that a client is available for periods the trace shows it as active • During a file transfer - extremely conservative
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload • Questions? Will increasing efficiency decrease load as the authors would like? Or simply increase work achieved per dollar? Do clients have insatiable appetites? Are you worried that a large number of queries might have already been locally satisfied?