Characteristic Studies of Distributed Systems Maryam Rahmaniheris & Daniel Uhlig
Distributed System Implementations • Characteristics of systems in the real world • How users and systems behave • P2P • Track clients and traffic on Kazaa • Explore and model usage • Cloud Computing • Experiment with Amazon Web Services • Measure performance metrics
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Peer-to-Peer Characteristics Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, and John Zahorjan Presented by Daniel Uhlig
Modeling of P2P Systems • Observe client and user behavior in the real world • Used a large P2P system: Kazaa • Closed P2P protocol • FastTrack network (still exists?) • Peaked at 2.3 million users in 2003 (more than Napster's peak) • Now a subscription DRM music store
Observing P2P • Recorded all Kazaa traffic at the U. of Washington • 60,000 faculty, staff, and students • 203-day trace (from late spring semester until past the end of fall semester) • Protected privacy by anonymizing the data • Analyzed the data and developed a model • Compared to the web: is there a better comparison?
Data Log • Recorded all Kazaa traffic • Incoming and outgoing data transfers and searches • HTTP traffic with the username in the header • KazaaLite showed up 60 days into the trace • Its username is hardcoded, so IP addresses were used to differentiate users • 20 terabytes of incoming data • 1.6 million requests • 25,000 users • 8.85 TB of unique objects • The paper used requests from university peers to external peers • Will this bias the results?
Kazaa Objects • Fetch at most once • 94% of Kazaa files vs. 57% of web objects • 99% of Kazaa files fetched at most twice • Clients download objects just once • What is an object? • Authors assume an immutable, unique object • Same song • Still fetch at most once? • Different filename? • Different bitrate? • Different length? • Encoded by a different user?
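One way to make the "what is an object?" question concrete is content hashing: two encodings of the same song hash to different identifiers, so fetch-at-most-once bookkeeping would count them as distinct objects. The sketch below is purely illustrative (Kazaa used its own hashing scheme, not SHA-1, and the filenames are made up).

```python
import hashlib

def object_id(path, chunk_size=1 << 20):
    """Content hash of a file, used here as a stand-in for an object identifier."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Two rips of the same song at different bitrates yield different IDs,
# so "fetch at most once" would treat them as two separate objects.
# object_id("song_128kbps.mp3") != object_id("song_192kbps.mp3")
```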
Users are Patient • Web users want instant access; P2P users will wait • P2P users are patient • Small objects • 30% take over 1 hour • 10% take nearly a day • Large objects • 50% take over 1 day • 20% wait a week • Is this accurate, given that the client automatically restarts requests?
Users Slow Down • Took 30-day traces of 'new' users • Bytes requested decrease with client age • Possible reasons? • Loss of interest • Switching to a new P2P app • Creating a new ID
Users Slow Down Users leave the system forever Users request less data as they age Core clients maintain a constant activity level but request less data
Client Activity • How to measure activity: logged in vs. downloading? • Average session length: 2.4 minutes • Sessions could be split by short breaks in a transfer • Activity over a client's lifetime = 5.5% • Average transfer takes 17 minutes, yet the average session is 2.4 minutes • Many transactions fail while looking for a new host peer
Workload Large (>100 MB) vs Small (<10 MB) files Request Volume vs Transfer volume Audio clips vs Video clips
Object Dynamics Clients fetch objects at-most-once Popular objects quickly cycle New objects are most popular Most requests are for old objects How does the idea of distinct objects affect this?
Zipf Distribution A distribution in which the most popular objects are fetched most often; the classical result for web page popularity (a standard formulation is given below)
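For reference, a standard statement of Zipf-like popularity (the exponent α is a fit parameter and is not taken from the paper): the i-th most popular of N objects receives the request share

```latex
% Zipf-like popularity: request share of the i-th most popular object
P(i) \;=\; \frac{i^{-\alpha}}{\sum_{j=1}^{N} j^{-\alpha}}, \qquad i = 1, \dots, N,
\quad \alpha \approx 1 \text{ for classic Zipf.}
```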
Non-Zipf • The authors propose that Kazaa traffic is NOT well modeled by a Zipf distribution • P2P differences from the web • 'Fetch-at-most-once' • Immutable objects (cnn.com changes regularly; a multimedia file does not) • Simulated a model of these behaviors and compared it to the observed workload (see the toy sketch below)
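The authors' actual simulator is not reproduced here; the toy sketch below only illustrates the mechanism they describe: clients draw requests from a shared Zipf popularity but never re-fetch an object, which flattens the head of the resulting request distribution. All parameters (client count, object count, α) are arbitrary.

```python
import random
from collections import Counter

def zipf_weights(n, alpha=1.0):
    """Normalized Zipf weights for n objects."""
    w = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    s = sum(w)
    return [x / s for x in w]

def simulate(clients=1000, objects=5000, per_client=50, alpha=1.0):
    """Toy fetch-at-most-once workload: clients share one Zipf popularity
    but never request the same object twice."""
    weights = zipf_weights(objects, alpha)
    counts = Counter()
    for _ in range(clients):
        seen = set()
        while len(seen) < per_client:
            obj = random.choices(range(objects), weights=weights)[0]
            if obj not in seen:            # fetch-at-most-once constraint
                seen.add(obj)
                counts[obj] += 1
    return counts

# counts = simulate()
# counts.most_common(10) has a flatter head than a pure Zipf sample would.
```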
Non-Zipf Model • Zipf is used as a model in many places • Video on demand, video rentals, movie tickets • A non-Zipf model might explain some of these better • Common characteristics • Birth of new objects • Fetch at most once • Immutable objects • Characteristics sometimes seen • Expensive to obtain an object • The object remains after it is fetched • Does their non-Zipf model explain everything?
Really Non-Zipf? • Multiple copies of the same object? • Does fetch-at-most-once still hold? • Requests for established files handled by internal users? ('cached') • Are objects immutable? • Changing names • Is a new album/song a new object or an update from the artist? • Non-Zipf in other multimedia? • YouTube, video rentals, DVD purchases, movie tickets?
Locality Awareness • Conserve the university's P2P bandwidth • 86% of requested objects were already present at U of W • Cache the data (legal issues) • Use a redirector so requests stay internal when possible • A few key nodes can save significant bandwidth
Discussion Points • What is a unique item? • Does this affect the distribution of popular objects? • Are objects immutable? • Apply the ideas to other multimedia: • YouTube video popularity • Still fetch at most once? • Non-Zipf for DVD rentals or purchases? • How to define a unique object • Should P2P handle large and small objects differently? • Caching or other forced locality vs. P2P built-in locality
An Evaluation of Amazon's Grid Computing Services: EC2, S3, and SQS Simson L. Garfinkel Presented by: Maryam Rahmaniheris University of Illinois at Urbana-Champaign CS525, Spring 2009
Cluster Computing • Building your own cluster • Costly • Space • Cooling system • Staff… • Underutilization • Overutilization • Cloud computing • Computing as a utility • Time-multiplexing of resources • No need for planning ahead • Elasticity • Amazon's AWS • You only need a working credit card
Amazon AWS • EC2 (Elastic Compute Cloud): Linux virtual machines for 10 cents per CPU-hour • S3 (Simple Storage Service): data storage for 15 cents per gigabyte per month • SQS (Simple Queue Service): messaging for 10 cents per thousand messages
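A back-of-the-envelope cost calculation at the prices quoted above (2007-era list prices; the workload figures below are made-up examples):

```python
# 2007-era list prices from the slide above; workload numbers are hypothetical.
EC2_PER_CPU_HOUR = 0.10   # dollars per CPU-hour
S3_PER_GB_MONTH = 0.15    # dollars per GB-month
SQS_PER_1K_MSGS = 0.10    # dollars per 1000 messages

def monthly_cost(cpu_hours, gb_stored, messages):
    return (cpu_hours * EC2_PER_CPU_HOUR
            + gb_stored * S3_PER_GB_MONTH
            + messages / 1000 * SQS_PER_1K_MSGS)

# One instance running all month, 100 GB stored, 1M queue messages:
print(monthly_cost(cpu_hours=720, gb_stored=100, messages=1_000_000))  # ~ $187
```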
AWS Interface • Create an Amazon AWS account • Sign up for individual services • Use the REST API to start virtual machines on EC2 and to access data on S3 • HTTP commands • GET • PUT • DELETE
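The paper drove S3 through the raw REST interface with signed HTTP requests; the sketch below shows the same GET/PUT/DELETE operations through boto3, a later SDK that did not exist when the paper was written. Bucket and key names are placeholders.

```python
import boto3

s3 = boto3.client("s3")  # credentials are taken from the environment / AWS config

# PUT: upload an object
s3.put_object(Bucket="example-bucket", Key="probe/1mb.bin", Body=b"\0" * 2**20)

# GET: read it back
data = s3.get_object(Bucket="example-bucket", Key="probe/1mb.bin")["Body"].read()

# DELETE: remove it
s3.delete_object(Bucket="example-bucket", Key="probe/1mb.bin")
```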
AWS Security • Access to account information • A stolen account password exposes everything • Lost passwords are reset through e-mail, so a stolen e-mail password also exposes the account • Eavesdropping • No snapshots or backups are provided; possible mitigations: • Multiple EC2 machines • Multiple S3 buckets • Multiple AWS accounts • No guarantee of the privacy of data on S3; possible mitigations: • Encryption • Digital signatures
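Since S3 itself does not guarantee privacy, the encryption mitigation means encrypting client-side before upload. A minimal sketch using the cryptography package (my choice of library, not the paper's; the filename is a placeholder):

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # keep the key outside AWS
f = Fernet(key)

plaintext = open("report.pdf", "rb").read()
ciphertext = f.encrypt(plaintext)    # upload ciphertext to S3, never plaintext

# after later downloading the object from S3:
assert f.decrypt(ciphertext) == plaintext
```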
S3 Evaluation: Test Scenarios • Throughput and TPS • A bucket containing 15 objects, three each of size: • 1 byte • 1 KByte • 1 MByte • 16 MByte • 100 MByte • Different objects are used to minimize the effect of caching • Measuring end-to-end performance: • A series of successive probes • The gaps between probes follow a Poisson arrival process • Surge experiment • No delay between queries • Between 1 and 6 threads executing at any moment • A total of 137,099 probes in 2007 • Single EC2 instance to S3 (32,269) • Distributed test bed (39,269) • Surge experiment (74,808)
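A sketch of the kind of probe loop described above, not the author's actual harness: object sizes match the slide, probes are spaced with exponentially distributed gaps (i.e., Poisson arrivals), and each probe times one write and one read. The bucket name and mean gap are placeholders.

```python
import random
import time
import boto3

s3 = boto3.client("s3")
SIZES = [1, 1_000, 1_000_000, 16_000_000, 100_000_000]  # bytes, as on the slide

def probe_once(bucket, key, size):
    """Time one PUT and one GET of `size` bytes; returns (write_s, read_s)."""
    data = b"\0" * size
    t0 = time.monotonic(); s3.put_object(Bucket=bucket, Key=key, Body=data)
    t1 = time.monotonic(); s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    t2 = time.monotonic()
    return t1 - t0, t2 - t1

def probe_loop(bucket, mean_gap_s=60.0, n=100):
    for i in range(n):
        size = random.choice(SIZES)
        print(size, probe_once(bucket, f"probe/{size}-{i}", size))
        time.sleep(random.expovariate(1.0 / mean_gap_s))  # Poisson arrivals
```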
Average Daily Read Throughput • A minor change in the network topology made by Amazon • Introduced additional delay between EC2 and S3 • Small TCP window size
Results • S3 performs better with larger transaction sizes • High per-transaction overhead • 40% of 1-byte writes are slower than 10 TPS, as opposed to 5% of 1-byte reads • Writes must be committed to at least 2 different clusters • Amazon's reliability guarantee • The median 1-MByte write bandwidth is roughly 5 times higher than the corresponding read bandwidth • Write transactions are acknowledged as soon as the data is written to cache • Once the transaction size grows to the cache size, the difference disappears
Query Variance (figures): high-correlation vs. low-correlation query pairs
Results • Lower correlation for 1-byte transactions • Sending a second request rather than waiting can pay off • Issue two simultaneous requests for time-critical reads (see the sketch below) • Higher correlation for 100-MByte transactions • Simultaneous requests, once they start delivering data, are likely to take a similar amount of time
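The low correlation between back-to-back small transactions suggests hedging latency by racing two identical requests and keeping the first response. A minimal sketch under that assumption (the helper names are mine):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
import boto3

s3 = boto3.client("s3")

def fetch(bucket, key):
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

def hedged_get(bucket, key):
    """Issue two identical GETs and return whichever finishes first."""
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(fetch, bucket, key) for _ in range(2)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)        # don't block on the slower duplicate
    return next(iter(done)).result()
```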
Concurrent Performance: improving S3 performance by issuing concurrent requests to the same bucket Performance of 100 MB GETs from S3 for one thread and for all threads combined Surge experiment Two VMs on two EC2 clusters Executing 1 thread for 10 minutes, then 2 threads for 10 minutes, and so on The experiment ran for 11 hours The 6 threads together get three times the aggregate bandwidth of 1 thread The bandwidth received by each individual thread is cut in half
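A rough sketch of measuring aggregate bandwidth with a varying number of concurrent readers, in the spirit of the surge experiment (bucket, object keys, and thread counts are placeholders):

```python
import time
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

def timed_get(bucket, key):
    """GET one object and return its observed bandwidth in bytes/second."""
    t0 = time.monotonic()
    n = len(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    return n / (time.monotonic() - t0)

def aggregate_bandwidth(bucket, keys, threads):
    """Fetch all keys with `threads` workers; return per-thread rates and the total."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        rates = list(pool.map(timed_get, [bucket] * len(keys), keys))
    return rates, sum(rates)

# Compare aggregate_bandwidth(bucket, keys, 1) with aggregate_bandwidth(bucket, keys, 6).
```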
Other Experiments • Availability • From 107,556 non-surge tests consisting of multiple read and write probes: • 6 write retries and 3 write errors • 4 read retries • 100% availability with a proper retry mechanism • Throughput • From the analysis of 19,630 transactions of 1 MByte or larger: • No write probes with a throughput below 10 KB/s • 6 write probes with a throughput below 100 KB/s
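The "100% availability with a proper retry mechanism" point boils down to retrying failed probes with backoff; a generic sketch (attempt count and delays are arbitrary):

```python
import time

def with_retries(op, attempts=5, base_delay=1.0):
    """Run op(), retrying on failure with exponential backoff."""
    for i in range(attempts):
        try:
            return op()
        except Exception:                 # the paper retried on S3 errors/timeouts
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# usage: with_retries(lambda: s3.get_object(Bucket="example-bucket", Key="k"))
```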
Experience with EC2 and SQS • EC2 instances • Fast • Responsive • Reliable • 1 unscheduled reboot, no data lost • 1 instance freeze, data lost • SQS • Simple API • Insert • One message at a time: 4 messages per second • Remove • In batches of 256: 5 messages per second • One message at a time: 2.5 messages per second
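The SQS insert/remove operations above map onto today's send/receive/delete calls; a boto3 sketch (the SDK postdates the paper, the queue URL is a placeholder, and the modern API caps a receive batch at 10 messages rather than 256):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# Insert: one message at a time
sqs.send_message(QueueUrl=queue_url, MessageBody="work item 1")

# Remove: receive a batch, then delete each message explicitly
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
for msg in resp.get("Messages", []):
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```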
Conclusion • EC2 provides ready-to-go VMs at a reasonable cost • S3 delivers high performance only for transactions of 16 MByte or larger • High per-transaction overhead • S3 delivers much higher performance to EC2 than to other locations on the Internet • Limited SQS throughput • 4-5 transactions per second per thread • High availability • Security risks
Discussion Points • High correlation for large-object transactions • Load balancer • Larger number of replicas • Scheduling policy • More noticeable variance at smaller scales • Why is SQS performance insufficient for scheduling tasks faster than seconds or slower than several hours? • Google App Engine or Amazon EC2: what are the best candidate applications for each? • What are the advantages of Amazon AWS over shared resources in Grids? • What are its disadvantages compared to dedicated clusters? • How will the funders of research projects feel about paying Amazon bills instead of funding dedicated clusters?