(Re)Design Considerations for Scalable Large-File Content Distribution
Brian Biskeborn, Michael Golightly, KyoungSoo Park, and Vivek Pai
Systems Lunch
Design meets realities
• Challenges in deploying distributed systems
• Real issues that feed back into a better design
• Not about a novel idea
• Performance debugging with CoBlitz
  • Peering strategy
  • Reducing load to the origin
  • Latency bottlenecks
CoBlitz background
• Scalable large-file service
  • HTTP on top of a conventional CDN
  • Cache by chunk rather than by whole file
  • Transparent split/merge of chunks
  • http://coblitz.codeen.org:3125/your_url
• Deployed on PlanetLab
  • 10 months of North American deployment
  • 10 months of world-wide deployment
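The chunk-based design can be sketched as follows. This is an illustrative sketch, not CoBlitz's code — the 60KB chunk size and the function name are assumptions:

```python
def chunk_requests(url, file_size, chunk_size=60_000):
    """Split one large-file request into per-chunk HTTP Range requests.
    Sketch only: the chunk size is an assumption, not CoBlitz's actual
    parameter."""
    requests = []
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1
        requests.append((url, {"Range": f"bytes={start}-{end}"}))
    return requests
```

Each chunk request can then be routed to a different reverse proxy, which is what lets the CDN cache chunks instead of whole files.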
How it works
[Diagram: client agents request chunks (file0-1, file1-2, …, file4-5) from CDN nodes; CDN = Redirector + Reverse Proxy]
• Only the reverse proxy (CDN) caches the chunks!
Smart agent
• Preserves HTTP semantics
  • Splits a large request into chunk requests
  • Merges chunk responses into one on the fly
  • In-order delivery
• Parallel chunk requests
  • Keeps a sliding window of chunk requests
  • Retries slow chunks
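The agent's window-and-merge behavior can be sketched as below — a minimal, hypothetical illustration (`fetch` is a stand-in for the real chunk download, and the retry-slow-chunks logic is omitted):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_in_order(chunks, fetch, window=4):
    """Keep up to `window` chunk fetches in flight (the executor queues
    the rest), but deliver results strictly in order so the merged
    response preserves HTTP semantics. `fetch` is a hypothetical
    callable that downloads one chunk."""
    with ThreadPoolExecutor(max_workers=window) as pool:
        futures = [pool.submit(fetch, c) for c in chunks]
        for f in futures:  # head-of-line wait gives in-order delivery
            yield f.result()
```

This also shows why one slow node hurts a synchronized workload: the whole window stalls behind the head-of-line chunk.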
Highest Random Weight (HRW)
• Proxy runs HRW to pick a reverse proxy
• Consistent hashing
  • Input: peer nodes + URL
  • Output: list of nodes in a deterministic order
  • Action: pick the node with the highest ranking
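HRW (rendezvous hashing) can be sketched as follows; SHA-1 here is an arbitrary stand-in for whatever hash the real system uses:

```python
import hashlib

def hrw_rank(nodes, url):
    """Rank peers for a URL by hash(node + URL): every node that runs
    this over the same peer list computes the same deterministic order,
    with no coordination."""
    def weight(node):
        digest = hashlib.sha1((node + url).encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=weight, reverse=True)

def hrw_pick(nodes, url):
    """Pick the highest-ranked node for this URL."""
    return hrw_rank(nodes, url)[0]
```

Because the output is a full ordered list, the next node in the ranking is a natural fallback when the first choice is unavailable.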
Peering
• Each node independently chooses peers
• Before:
  • UDP ping, RTT averaged over the last four samples
  • Hysteresis
• Problem:
  • Overlap of peer lists < 50%
  • Non-network delays introduced
• After:
  • Use MinRTT, increase the number of RTT samples
  • Overlap of peer lists > 90%
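The before/after difference amounts to ranking candidates by the minimum of their recent samples instead of the mean. A sketch, with hypothetical names:

```python
def pick_peers(rtt_samples, k):
    """Choose the k closest peers by MinRTT. `rtt_samples` maps each
    candidate node to its recent RTT measurements (ms). Taking the
    minimum filters out one-off non-network delays (e.g. scheduling on
    busy PlanetLab nodes) that an average would absorb."""
    ranked = sorted(rtt_samples, key=lambda node: min(rtt_samples[node]))
    return ranked[:k]
```

Since every node sees roughly the same MinRTT values, independently chosen peer lists converge, which is what pushes the overlap above 90%.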
Reducing origin load
[Diagram: origin server, with nodes that peer to both groups and nodes that peer to only one]
• Load to the origin caused by peer set differences
• Solution
  • Allow more peers
  • Multi-hop routing
Latency bottlenecks
• Slow nodes are bad for a synchronized workload
  • The agent's window progress gets stuck
  • Temporary congestion
• Original design
  • Retry after a timeout
• Redesign
  • Have multiple connections compete
  • Avoid nodes entirely if they are too slow
Fractional HRW?
• Introduce a per-node weight in [0..1] to HRW
  • Slower nodes get lower weights
• Choose a node only if Last_10_bits(HRW hash)/1024 < weight
  • Gives slower nodes a smaller share of URLs
• Experimental results
  • Overall, it works as expected
  • Not great for a synchronized workload
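Assuming slower nodes are assigned lower weights, fractional HRW can be sketched as below; the hash choice and names are assumptions:

```python
import hashlib

def fractional_hrw(nodes, url, weights):
    """HRW ranking where a node is eligible for a URL only if the low
    10 bits of its hash, scaled to [0, 1), fall below its weight — so a
    node with weight w serves roughly a fraction w of the URL space.
    Sketch only, not the deployed implementation."""
    eligible = []
    for node in nodes:
        digest = hashlib.sha1((node + url).encode()).digest()
        score = int.from_bytes(digest[:8], "big")
        if (score & 0x3FF) / 1024.0 < weights[node]:
            eligible.append((score, node))
    eligible.sort(reverse=True)
    return [node for _, node in eligible]
```

Because the eligibility test reuses the HRW hash, the filtering is still deterministic: every node computes the same eligible set for a given URL.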
Potential bottlenecks: bandwidth [figure]
Worst vs. best sites [figure, two panels: worst five sites, best five sites]
Downloading experiment
• Fetch a 50MB file from a Princeton server
• Use 115 PlanetLab nodes at the same time
• Uncached workload
• Evaluate our redesign step by step:
  • Original
  • NoSlow
  • MinRTT
  • 120Peers
  • RepFactor
  • MultiHop
  • NewAgent
Step-by-step improvement
[CDF: fraction of nodes ≤ x vs. bandwidth (Kbps, 0–8000); one curve per configuration: Original, NoSlow, MinRTT, 120Peers, RepFactor, MultiHop, NewAgent, plus BitTorrent]
Reduction of load at origin
[Bar chart: requests to the origin (115 clients total) — Original: 19, 120Peers: 11.5, MultiHop: 3.8; 3.8/19 = 1/5]
Conclusion
• The initial design may not reflect deployment realities
• Redesign dramatically improves the system
  • MinRTT
  • MultiHop
  • Aggressive retries
• Result
  • 300% faster for the synchronized workload
  • 80% reduction in origin load
Who's using CoBlitz
• CiteSeer (http://citeseer.ist.psu.edu/)
  • PS/PDF links go through CoBlitz
• PlanetLab projects
  • Arizona Stork
  • Harvard SBON
• Fedora Core mirror
  • http://coblitz.planet-lab.org/pub/fedora/linux/core/
Thanks!
• http://codeen.cs.princeton.edu/coblitz/
• Demo?
Comparisons with other systems
• BitTorrent, Shark, BulletPrime
Measuring bandwidths
• Have the nearest 10 nodes issue TCP connections
• Average the aggregate bandwidth over 30 seconds
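The aggregation step can be sketched as a simple conversion — the function name and units are assumptions:

```python
def aggregate_bandwidth_kbps(bytes_per_conn, seconds=30):
    """Sum the bytes each probing TCP connection transferred during the
    measurement window and convert to one aggregate Kbps figure."""
    total_bits = sum(bytes_per_conn) * 8
    return total_bits / seconds / 1000.0
```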