(Re)Design Considerations for Scalable Large-File Content Distribution
Brian Biskeborn, Michael Golightly, KyoungSoo Park, and Vivek Pai
Systems Lunch
Design meets realities
• Challenges in deploying distributed systems
• Real issues that feed back into a better design
• Not about a novel idea
• Performance debugging with CoBlitz
  • Peering strategy
  • Reducing load to the origin
  • Latency bottlenecks
CoBlitz background
• Scalable large-file service
  • HTTP on top of a conventional CDN
  • Cache by chunk rather than by whole file
  • Transparent split/merge of chunks
  • http://coblitz.codeen.org:3125/your_url
• Deployed on PlanetLab
  • 10 months of North American deployment
  • 10 months of world-wide deployment
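The chunk-based design can be sketched as follows. This is an illustrative sketch, not CoBlitz's code — the 60KB chunk size and the function name are assumptions:

```python
def chunk_requests(url, file_size, chunk_size=60_000):
    """Split one large-file request into per-chunk HTTP Range requests.
    Sketch only: the chunk size is an assumption, not CoBlitz's actual
    parameter."""
    requests = []
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1
        requests.append((url, {"Range": f"bytes={start}-{end}"}))
    return requests
```

Each chunk request can then be routed to a different reverse proxy, which is what lets the CDN cache chunks instead of whole files.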
How it works
[Diagram: client agents request chunks (file0-1, file1-2, …, file4-5) from CDN nodes; CDN = Redirector + Reverse Proxy]
• Only the reverse proxy (CDN) caches the chunks!
Smart agent
• Preserves HTTP semantics
  • Splits a large request into chunk requests
  • Merges chunk responses into one on the fly
  • In-order delivery
• Parallel chunk requests
  • Keeps a sliding window of chunk requests
  • Retries slow chunks
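The agent's window-and-merge behavior can be sketched as below — a minimal, hypothetical illustration (`fetch` is a stand-in for the real chunk download, and the retry-slow-chunks logic is omitted):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_in_order(chunks, fetch, window=4):
    """Keep up to `window` chunk fetches in flight (the executor queues
    the rest), but deliver results strictly in order so the merged
    response preserves HTTP semantics. `fetch` is a hypothetical
    callable that downloads one chunk."""
    with ThreadPoolExecutor(max_workers=window) as pool:
        futures = [pool.submit(fetch, c) for c in chunks]
        for f in futures:  # head-of-line wait gives in-order delivery
            yield f.result()
```

This also shows why one slow node hurts a synchronized workload: the whole window stalls behind the head-of-line chunk.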
Highest Random Weight (HRW)
• Proxy runs HRW to pick a reverse proxy
• Consistent hashing
  • Input: peer nodes + URL
  • Output: list of nodes in a deterministic order
  • Action: pick the node with the highest ranking
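HRW (rendezvous hashing) can be sketched as follows; SHA-1 here is an arbitrary stand-in for whatever hash the real system uses:

```python
import hashlib

def hrw_rank(nodes, url):
    """Rank peers for a URL by hash(node + URL): every node that runs
    this over the same peer list computes the same deterministic order,
    with no coordination."""
    def weight(node):
        digest = hashlib.sha1((node + url).encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=weight, reverse=True)

def hrw_pick(nodes, url):
    """Pick the highest-ranked node for this URL."""
    return hrw_rank(nodes, url)[0]
```

Because the output is a full ordered list, the next node in the ranking is a natural fallback when the first choice is unavailable.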
Peering
• Each node independently chooses peers
• Before:
  • UDP ping, RTT averaged over the last four samples
  • Hysteresis
• Problem:
  • Overlap of peer lists < 50%
  • Non-network delays introduced
• After:
  • Use MinRTT, increase the number of RTT samples
  • Overlap of peer lists > 90%
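The before/after difference amounts to ranking candidates by the minimum of their recent samples instead of the mean. A sketch, with hypothetical names:

```python
def pick_peers(rtt_samples, k):
    """Choose the k closest peers by MinRTT. `rtt_samples` maps each
    candidate node to its recent RTT measurements (ms). Taking the
    minimum filters out one-off non-network delays (e.g. scheduling on
    busy PlanetLab nodes) that an average would absorb."""
    ranked = sorted(rtt_samples, key=lambda node: min(rtt_samples[node]))
    return ranked[:k]
```

Since every node sees roughly the same MinRTT values, independently chosen peer lists converge, which is what pushes the overlap above 90%.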
Reducing origin load
[Diagram: origin server, with nodes that peer to both groups and nodes that peer to only one]
• Load to the origin caused by peer set differences
• Solution
  • Allow more peers
  • Multi-hop routing
Latency bottlenecks
• Slow nodes are bad for a synchronized workload
  • The agent's window progress gets stuck
  • Temporary congestion
• Original design
  • Retry after a timeout
• Redesign
  • Have multiple connections compete
  • Avoid nodes entirely if they are too slow
Fractional HRW?
• Introduce a per-node weight in [0..1] to HRW
  • Slower nodes get lower weights
• Choose a node only if Last_10_bits(HRW hash)/1024 < weight
  • Gives slower nodes a smaller share of URLs
• Experimental results
  • Overall, it works as expected
  • Not great for a synchronized workload
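Assuming slower nodes are assigned lower weights, fractional HRW can be sketched as below; the hash choice and names are assumptions:

```python
import hashlib

def fractional_hrw(nodes, url, weights):
    """HRW ranking where a node is eligible for a URL only if the low
    10 bits of its hash, scaled to [0, 1), fall below its weight — so a
    node with weight w serves roughly a fraction w of the URL space.
    Sketch only, not the deployed implementation."""
    eligible = []
    for node in nodes:
        digest = hashlib.sha1((node + url).encode()).digest()
        score = int.from_bytes(digest[:8], "big")
        if (score & 0x3FF) / 1024.0 < weights[node]:
            eligible.append((score, node))
    eligible.sort(reverse=True)
    return [node for _, node in eligible]
```

Because the eligibility test reuses the HRW hash, the filtering is still deterministic: every node computes the same eligible set for a given URL.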
Potential bottlenecks: bandwidth [figure]
Worst vs. best sites [figure, two panels: worst five sites, best five sites]
Downloading experiment
• Fetch a 50MB file from a Princeton server
• Use 115 PlanetLab nodes at the same time
• Uncached workload
• Evaluate our redesign step by step:
  • Original
  • NoSlow
  • MinRTT
  • 120Peers
  • RepFactor
  • MultiHop
  • NewAgent
Step-by-step improvement
[CDF: fraction of nodes ≤ x vs. bandwidth (Kbps, 0–8000); one curve per configuration: Original, NoSlow, MinRTT, 120Peers, RepFactor, MultiHop, NewAgent, plus BitTorrent]
Reduction of load at origin
[Bar chart: requests to the origin (115 clients total) — Original: 19, 120Peers: 11.5, MultiHop: 3.8; 3.8/19 = 1/5]
Conclusion
• The initial design may not reflect deployment realities
• Redesign dramatically improves the system
  • MinRTT
  • MultiHop
  • Aggressive retries
• Result
  • 300% faster for the synchronized workload
  • 80% reduction in origin load
Who's using CoBlitz
• CiteSeer (http://citeseer.ist.psu.edu/)
  • PS/PDF links go through CoBlitz
• PlanetLab projects
  • Arizona Stork
  • Harvard SBON
• Fedora Core mirror
  • http://coblitz.planet-lab.org/pub/fedora/linux/core/
Thanks!
• http://codeen.cs.princeton.edu/coblitz/
• Demo?
Comparisons with other systems
• BitTorrent, Shark, BulletPrime
Measuring bandwidths
• Have the nearest 10 nodes issue TCP connections
• Average the aggregate bandwidth over 30 seconds
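The aggregation step can be sketched as a simple conversion — the function name and units are assumptions:

```python
def aggregate_bandwidth_kbps(bytes_per_conn, seconds=30):
    """Sum the bytes each probing TCP connection transferred during the
    measurement window and convert to one aggregate Kbps figure."""
    total_bits = sum(bytes_per_conn) * 8
    return total_bits / seconds / 1000.0
```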