Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload. K.P. Gummadi, R. J. Dunn, et al. ACM SOSP’03. Presented by Min Choi (mchoi@camars.kaist.ac.kr)
Outline • Trace methodology and analysis • User characteristics • Client activities • Object dynamics • Analyze why the Kazaa workload is not Zipf • A model of P2P file-sharing workloads • A study of bandwidth-saving techniques • Conclusion
Trace Methodology • Passively collect Kazaa traffic at the border between the campus network and the Internet • Query traffic was not captured because it is encrypted. File transfers are HTTP transfers with Kazaa-specific headers • Summary statistics of the trace:
Kazaa Users Are Patient • Transfer time: the difference between the start time and the end time of a request • Small objects: <10MB (mostly audio files) • Large objects: >100MB (typically video files)
Users Slow Down As They Age • Do people become hungrier for content as they gain experience with Kazaa? • Older clients requested fewer bytes because of: • Attrition: the population declines as clients age • Slowing down: older clients ask for less
Client Activity • It’s difficult to quantify the availability of clients in a P2P system • Client activity includes: • Activity fraction: time spent in transfers / duration of lifetime. This is a lower bound on availability • Average session length: the typical duration of a client session
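The activity fraction defined above can be sketched in a few lines. This is only an illustration, not the authors' tooling: the function name, the `(start, end)` tuple layout, and the choice to merge overlapping transfers so that concurrent downloads are not counted twice are all assumptions.

```python
def activity_fraction(transfers, first_seen, last_seen):
    """Fraction of a client's lifetime spent in at least one transfer.

    transfers: list of (start, end) timestamps in seconds (hypothetical layout).
    first_seen, last_seen: bounds of the client's observed lifetime.
    Overlapping transfers are merged (an assumption) to avoid double counting.
    """
    lifetime = last_seen - first_seen
    if lifetime <= 0:
        return 0.0

    busy = 0.0
    cur_start = cur_end = None
    for start, end in sorted(transfers):
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged interval and open a new one.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start

    return busy / lifetime
```

Because a client may be online without transferring anything, the value returned here can only underestimate true availability, which is why the slide calls it a lower bound.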
Object Characteristics • Kazaa is not one workload • Kazaa is a blend of workloads with different properties • 3 ranges of objects: small (<10MB), medium (10MB to 100MB), and large (>100MB) • The majority of requests are for small objects • Most bytes transferred are due to large objects
Kazaa Object Dynamics • Multimedia objects are immutable, which shapes object dynamics • Kazaa clients fetch objects at most once • A Kazaa client requests an object at most once: 94% of the time • A Kazaa client requests an object at most twice: 99% of the time • Most requests are for old (repeated) objects • An object is old if at least one month has passed since the first request for the object • 72% of requests for large objects are for old objects • 52% of requests for small objects are for old objects
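The "old object" definition above can be applied to a request log roughly as follows. This is a sketch under stated assumptions: the `(timestamp, object_id)` layout and the function name are hypothetical, and one month is approximated as 30 days.

```python
MONTH_SECONDS = 30 * 24 * 3600  # assumption: one month approximated as 30 days


def old_request_fraction(requests, age_threshold=MONTH_SECONDS):
    """Fraction of requests that target an 'old' object.

    requests: iterable of (timestamp_seconds, object_id) pairs (hypothetical layout).
    A request counts as old when at least age_threshold seconds have passed
    since the first request for the same object.
    """
    first_seen = {}
    old = total = 0
    for ts, obj in sorted(requests):
        first = first_seen.setdefault(obj, ts)  # records the first request time
        if ts - first >= age_threshold:
            old += 1
        total += 1
    return old / total if total else 0.0
```

Running this separately over small and large objects would give per-class figures comparable in spirit to the 52% and 72% quoted above, although the exact numbers depend on how the trace was filtered.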
Kazaa Object Dynamics • The popularity of Kazaa objects is often short-lived • The most popular pages remain stable for the Web • Popularity is fleeting in Kazaa • Audio files lose popularity faster than popular video files • The most popular Kazaa objects tend to be recently born objects • Newly born objects: objects that did not receive any requests during the first month of the trace
Kazaa Is Not Zipf • Zipf’s law: • The popularity of the i-th most popular object is proportional to i^(-α), where α is the Zipf coefficient • Kazaa is not Zipf • The most popular objects are less popular than Zipf would predict
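For reference, Zipf's law as stated above can be turned into a normalized probability distribution like this. The function name and the normalization over a finite population of `n` objects are illustrative choices, not something taken from the slides.

```python
def zipf_popularity(n, alpha=1.0):
    """Return request probabilities for ranks 1..n under Zipf's law.

    The i-th most popular object gets weight i^(-alpha); weights are
    normalized so that they sum to 1.
    """
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

With `alpha = 1`, the top-ranked object is requested twice as often as the second and four times as often as the fourth. On a log-log plot of popularity against rank this is a straight line, and the Kazaa trace falls below that line at the most popular ranks.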
Why Kazaa Is Not Zipf • Fetch-repeatedly vs. fetch-at-most-once • Simulate the two cases based on the same Zipf distribution • The result of fetch-at-most-once is similar to Kazaa • Non-Zipf workloads are also observed in web proxy caches and VoD servers
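The comparison above can be reproduced in miniature. The sketch below is not the authors' simulator: the function name, the parameters, and the use of rejection sampling (redrawing until the client picks an object it has not fetched yet) are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate(n_objects, n_clients, requests_per_client,
             at_most_once, alpha=1.0, seed=0):
    """Count requests per object rank when every client draws from Zipf(alpha).

    at_most_once=False: clients may request the same object repeatedly.
    at_most_once=True: a client redraws until it picks an unfetched object,
    which is equivalent to Zipf renormalized over its remaining objects.
    """
    if at_most_once and requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")

    rng = random.Random(seed)
    ranks = list(range(n_objects))  # index 0 is the most popular object
    cum = list(accumulate((r + 1) ** -alpha for r in ranks))

    counts = Counter()
    for _ in range(n_clients):
        fetched = set()
        for _ in range(requests_per_client):
            obj = rng.choices(ranks, cum_weights=cum)[0]
            if at_most_once:
                while obj in fetched:
                    obj = rng.choices(ranks, cum_weights=cum)[0]
                fetched.add(obj)
            counts[obj] += 1
    return counts
```

Under fetch-at-most-once no object can receive more requests than there are clients, so the head of the popularity curve is capped and flattens relative to the Zipf line, which is the shape observed in the Kazaa trace.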
A Model of P2P File-Sharing Workloads • Hypothesis: the underlying popularity of objects in a fetch-at-most-once system is driven by Zipf’s law • A client requests 2 objects per day and chooses which object to fetch from Zipf(1) • An object is born with rate λo, and its popularity rank is selected from Zipf(1) • The total object population cannot be observed from the trace. Use back-inference: given that 18,000 distinct objects are requested in the trace, what is the total number of objects? Answer: 40,000
Model Structure and Notation • Parameter values are chosen to reflect the measured data from the trace
File-Sharing Effectiveness • How should an organization expect bandwidth demand to change over time, given a shared proxy server? • The hit rate of the proxy cache decreases in the fetch-at-most-once case • Fetch-at-most-once clients consume the most popular objects early
New Object Arrivals Improve Hit Rate • Object updates in the Web lower the hit rate • New object arrivals are beneficial in a P2P system • Arrivals of popular objects increase the hit rate • If there are no arrivals, clients are forced to choose from the remaining unpopular objects
New Clients Cannot Stabilize Performance • The infusion of new clients at a constant rate cannot compensate for the increasing number of old clients • To keep the hit rate constant, we need an exponentially increasing client arrival rate
Model Validation • Underlying Zipf assumption cannot be validated directly. • Use the proposed model to replicate the object popularity distribution in the trace • Estimate various parameters • Arrival rate of new objects is chosen to fit the measured data. λo = 5,475 objects per year
Exploring Locality-aware Request Routing • A significant fraction of Internet bandwidth is consumed by Kazaa • How would exploitation of locality help to save bandwidth? • Different ways to exploit locality: • A centralized proxy cache placed at organization border • Request redirection: favor organization-internal peers • Centralized request redirection • Decentralized request redirection
An Ideal Proxy Cache • Assume an ideal proxy: infinite capacity and bandwidth • 86% of external bandwidth would be saved • However, some may not want to store P2P file-sharing content in a proxy server due to legal issues
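The bandwidth saving of an ideal proxy is simply the byte hit rate of an infinite cache replayed over the trace. A minimal sketch follows, assuming a hypothetical trace layout of `(object_id, size_bytes)` pairs in arrival order and assuming that every request transfers the whole object.

```python
def ideal_cache_byte_hit_rate(requests):
    """Byte hit rate of an infinite-capacity cache.

    requests: iterable of (object_id, size_bytes) in arrival order
    (hypothetical layout). The first request for an object is a miss
    that populates the cache; every later request is a hit.
    """
    cached = set()
    hit_bytes = total_bytes = 0
    for obj, size in requests:
        total_bytes += size
        if obj in cached:
            hit_bytes += size
        else:
            cached.add(obj)
    return hit_bytes / total_bytes if total_bytes else 0.0
```

For example, the sequence `a` (100 bytes), `b` (50 bytes), `a`, `a` serves 200 of 350 bytes from the cache, a byte hit rate of about 57%.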
Benefits of Locality-Awareness • Trace-based simulation • Infinite storage capacity • At most 12 concurrent downloads • Upload bandwidth 500 Kb/s • External bandwidth 100 Kb/s • Clients are available only when they’re transferring (a very conservative assumption) • Cold misses: objects cannot be found in peers • Busy misses: objects found but the peer is unavailable due to concurrent transfers
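The distinction between cold and busy misses can be illustrated with a small classifier. This is a sketch, not the simulator used in the study: the peer record layout, the function name, and the reading of the 12-transfer limit as a per-peer cap on concurrent uploads are all assumptions.

```python
def classify_request(obj, peers, max_transfers=12):
    """Classify a request as 'hit', 'cold_miss', or 'busy_miss'.

    peers: dict mapping peer_id -> {"objects": set of object ids,
                                    "available": bool,
                                    "transfers": int}  (hypothetical layout)
    max_transfers: assumed per-peer cap on concurrent transfers.
    """
    holders = [p for p in peers.values() if obj in p["objects"]]
    if not holders:
        return "cold_miss"  # no internal peer has the object
    for p in holders:
        if p["available"] and p["transfers"] < max_transfers:
            return "hit"  # some internal peer can serve it now
    return "busy_miss"  # object exists internally but no holder can serve it
```

Only cold misses are unavoidable; busy misses could be turned into hits by making the peers that hold the object more available.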
Benefits of Locality-Awareness • Locality awareness obtained a 68% byte hit rate for large objects and a 37% byte hit rate for small objects • A substantial number of miss bytes (62% for large objects, 43% for small objects) are due to unavailable clients
Benefits of Increased Availability • Most bytes served and consumed come from highly available peers • Adding availability to the most available hosts earns a higher hit rate than adding it to the least available hosts
Conclusion • P2P file-sharing workloads are different from Web workloads • Users are patient • Aged clients demand less • Fetch-at-most-once • The proposed model suggests that client births and object births are the fundamental forces driving P2P workloads • There’s significant locality in the Kazaa workload • Locality-aware peers would save 63% of external transfers even under conservative assumptions