CoDeeN, Large Files, & CoDeploy
KyoungSoo Park, Vivek Pai, Larry Peterson
Princeton University
What Is CoDeeN?
• Content Distribution Networks
• Web pages load faster if
  • You’re contacting a nearby server
  • That server isn’t overloaded
  • The page is already in memory
  • You use long-lived TCP connections right
CoDeeN By The Numbers
• In operation ~10 months
• 150 nodes (~120 live)
• 6.5 million reqs/day
  • 5 million “good” reqs/day
• About 300GB/day (estimate)
• 7K-20K unique IPs per 24 hours
• Over 600,000 unique IPs served
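The daily figures above can be cross-checked with simple arithmetic. This sketch derives the average bytes per request and the load per live node. It assumes decimal units (1GB = 10^9 bytes) and an even split of traffic across the ~120 live nodes; neither assumption is stated on the slide, so treat the results as rough orders of magnitude.

```python
# Back-of-the-envelope figures from the "CoDeeN By The Numbers" slide.
# Assumptions (not on the slide): decimal units, and load spread evenly over live nodes.
GB = 10**9
SECONDS_PER_DAY = 86_400

bytes_per_day = 300 * GB        # about 300GB/day (estimate)
reqs_per_day = 6_500_000        # 6.5 million reqs/day
good_reqs_per_day = 5_000_000   # 5 million "good" reqs/day
live_nodes = 120                # ~120 live nodes

avg_bytes_per_req = bytes_per_day / reqs_per_day            # ~46KB
avg_bytes_per_good_req = bytes_per_day / good_reqs_per_day  # 60KB, if "bad" requests move few bytes
good_fraction = good_reqs_per_day / reqs_per_day            # ~0.77
gb_per_node_per_day = bytes_per_day / live_nodes / GB       # 2.5
reqs_per_node_per_sec = reqs_per_day / live_nodes / SECONDS_PER_DAY  # ~0.63

print(f"{avg_bytes_per_req / 1000:.1f} KB/request, {gb_per_node_per_day:.1f} GB/node/day, "
      f"{reqs_per_node_per_sec:.2f} req/s per node")
```

The ~46KB average per request is already several times the 10KB object size that conventional CDNs are tuned for.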
Our “Strategy”
• Stay operational
• Build some credibility
• Exploit that + activity to branch out
  • Involves doing sales pitches
• Tap into new consumers
  • In particular, nonprofits, non-commercial
How Big?
• 200 terabytes of data total
• Interviews: about 3.5GB each
• Files: average of 700MB each
Problem: “Nobody” Handles 700MB
• CDNs designed for avg size 10KB
  • 1MB = 100 files
  • 700MB = 70,000 files
• Commercial disks ~100GB
• Our storage ~3GB
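The size comparisons on the last two slides reduce to a few divisions. This sketch reproduces the slide's "1MB = 100 files" and "700MB = 70,000 files" figures and shows how few whole 700MB files fit in the ~3GB per-node storage. Decimal units are assumed, since that is what makes 1MB equal to exactly 100 objects of 10KB.

```python
# Size arithmetic behind the "How Big?" and "Nobody Handles 700MB" slides.
# Assumption: decimal units, matching the slide's "1MB = 100 files" at 10KB each.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

typical_object = 10 * KB    # CDNs designed for an average size of 10KB
large_file = 700 * MB       # average file in the collection
node_storage = 3 * GB       # "Our storage ~3GB"
commercial_disk = 100 * GB  # "Commercial disks ~100GB"
collection = 200 * TB       # "200 terabytes of data total"

objects_per_mb = MB // typical_object                   # 100
objects_per_large_file = large_file // typical_object   # 70,000
large_files_per_node = node_storage // large_file       # whole 700MB files in 3GB
large_files_per_disk = commercial_disk // large_file    # whole 700MB files in 100GB
files_in_collection = collection // large_file          # rough file count at 700MB avg

print(objects_per_mb, objects_per_large_file, large_files_per_node,
      large_files_per_disk, files_in_collection)
```

A node's cache holds only four such files, which is why whole-file replication at every node does not work for this workload.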
New Problems
• Why not replicate less?
  • You’re farther away
• Why not merge requests?
[Figure labels: slow client; client readahead]
Our Approach
[Figure: Client, Agent, several CDN nodes, and Server; the file travels as pieces file0-1, file1-2, file2-3, file3-4, file4-5, spread across the CDN nodes]
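A minimal sketch of the splitting step in this figure: cut the object into fixed-size byte ranges and give each range to one CDN node. The slides do not say how pieces map to nodes, so this sketch assumes highest-random-weight (rendezvous) hashing on the piece name; the piece size, node names, and helper names are all hypothetical.

```python
import hashlib

def split_ranges(length: int, piece_size: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte ranges covering an object of `length` bytes."""
    if length <= 0 or piece_size <= 0:
        raise ValueError("length and piece_size must be positive")
    return [(start, min(start + piece_size, length) - 1)
            for start in range(0, length, piece_size)]

def pick_node(key: str, nodes: list[str]) -> str:
    """Rendezvous (HRW) hashing, an assumed placement scheme: the highest hash wins."""
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}|{key}".encode()).digest())

def plan_pieces(name: str, length: int, piece_size: int,
                nodes: list[str]) -> list[tuple[int, int, str]]:
    """Map each piece "name/start-end" to the node that should fetch and cache it."""
    return [(start, end, pick_node(f"{name}/{start}-{end}", nodes))
            for start, end in split_ranges(length, piece_size)]

# Hypothetical usage: a 5MB file in 1MB pieces over three made-up nodes.
nodes = ["cdn-a.example.org", "cdn-b.example.org", "cdn-c.example.org"]
for start, end, node in plan_pieces("/pub/file.iso", 5 * 2**20, 2**20, nodes):
    print(f"{start}-{end} -> {node}")
```

Rendezvous hashing is used here because every agent computes the same placement without coordination, and removing a node only moves the pieces that node owned.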
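Low-Level HTTP Stuff
• Request for one piece, as it arrives at the CDN (ingress):
  GET name/ranges
  Header: blah
  Header: blah
• Request the egress CDN node sends to the origin server:
  GET name
  Range: bytes=ranges
  Header: blah
• Origin server’s reply:
  HTTP/1.0 206 Partial Content
  Content-Range: bytes start-end/length
  Header: blah
• Reply the egress node passes back, cacheable as an ordinary object:
  HTTP/1.0 200 OK
  Content-length: piece length
  New-header: obj length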
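The rewriting in this exchange can be sketched as two pure functions: one turns a piece URL of the form name/start-end into an upstream request carrying a standard Range header, and one turns the origin's 206 reply into a 200 reply whose Content-Length is the piece length. The /start-end URL suffix and the X-Object-Length header name (standing in for the slide's "New-header") are hypothetical choices for illustration.

```python
import re

PIECE_RE = re.compile(r"^(?P<name>.+)/(?P<start>\d+)-(?P<end>\d+)$")  # hypothetical URL form
CONTENT_RANGE_RE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")

def to_upstream_request(piece_path: str) -> tuple[str, dict[str, str]]:
    """Rewrite GET name/start-end into GET name plus a Range header for the origin."""
    m = PIECE_RE.match(piece_path)
    if m is None:
        raise ValueError(f"not a piece URL: {piece_path}")
    start, end = int(m["start"]), int(m["end"])
    if end < start:
        raise ValueError("range end precedes start")
    return m["name"], {"Range": f"bytes={start}-{end}"}

def to_cacheable_reply(status: int, headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Turn a 206 Partial Content reply into a 200 OK describing only this piece."""
    if status != 206:
        raise ValueError(f"expected 206, got {status}")
    content_range = next((v for k, v in headers.items()
                          if k.lower() == "content-range"), None)
    m = CONTENT_RANGE_RE.match(content_range or "")
    if m is None:
        raise ValueError("missing or malformed Content-Range")
    start, end, total = m.groups()
    # Drop the range headers; the piece is now a complete object in its own right.
    out = {k: v for k, v in headers.items()
           if k.lower() not in ("content-range", "content-length")}
    out["Content-Length"] = str(int(end) - int(start) + 1)
    out["X-Object-Length"] = total  # hypothetical name for the slide's "New-header"
    return 200, out

print(to_upstream_request("/pub/file.iso/0-1048575"))
print(to_cacheable_reply(206, {"Content-Range": "bytes 0-1048575/734003200"}))
```

Presenting each piece as a plain 200 response is what lets unmodified caches along the path store and serve it like any other object.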
Benefits
• Transparent to client (no software)
• Server only needs byte-range support
  • Every real server has it
  • Will generate more log entries
• Can use/augment HTTP infrastructure
  • Caching, redirection, etc.
  • Adding security controls
• Low incremental overhead
  • Agent is about 300 semicolons
  • CDN mods about 20 semicolons
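Because the approach depends only on byte-range support at the origin, it helps to check that support directly. A minimal sketch, assuming the check asks for the first byte only: a server that honors ranges answers 206 with a bytes Content-Range, while one that ignores the Range header answers 200 with the whole body. The function names are hypothetical; only standard-library calls are used.

```python
import urllib.request

def supports_byte_ranges(status: int, headers: dict[str, str]) -> bool:
    """True if a reply to a 'Range: bytes=0-0' probe shows real byte-range support."""
    norm = {k.lower(): v for k, v in headers.items()}
    # 206 plus a bytes Content-Range is the unambiguous signal; an Accept-Ranges
    # header alone is only an advertisement, not proof the Range was honored.
    return status == 206 and norm.get("content-range", "").startswith("bytes ")

def probe(url: str, timeout: float = 10.0) -> bool:
    """Request only the first byte of `url` and inspect the reply (needs network).
    Error statuses such as 416 surface as urllib.error.HTTPError."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return supports_byte_ranges(resp.status, dict(resp.headers.items()))
```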
Dual-Use Technology
• Other one-to-many problems
  • Node/experiment installs
  • Software updates
• Push model instead of pull
• Solution?
  • Build “master” script
  • Push to nodes
  • Nodes pull as needed
CoDeploy
• Now in beta
• Small set of tools at source
• No (new) installation at target
  • Needed tools at CoDeeN-hosting nodes
• Fun components
  • Peer-review system of CoDeeN nodes
  • Nearest CoDeeN finder
  • Parallel ssh, scp
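Two of the listed components can be sketched in a few lines. The nearest-node finder here assumes the criterion is the lowest median round-trip time over some measured samples, and the parallel runner fans one command out through the standard ssh client on a thread pool. Both the selection rule and the function names are hypothetical illustrations, not CoDeploy's actual tools.

```python
import statistics
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def nearest_node(rtts_ms: dict[str, list[float]]) -> str:
    """Pick the node with the lowest median RTT (an assumed selection criterion)."""
    medians = {node: statistics.median(s) for node, s in rtts_ms.items() if s}
    if not medians:
        raise ValueError("no RTT samples")
    return min(medians, key=medians.get)

def ssh_run(host: str, command: str, timeout: float = 60.0) -> int:
    """Run one command on one host via the ssh CLI and return its exit status.
    BatchMode=yes makes ssh fail instead of prompting for a password."""
    result = subprocess.run(["ssh", "-o", "BatchMode=yes", host, command],
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode

def parallel_run(hosts: Iterable[str], command: str,
                 runner: Callable[[str, str], int] = ssh_run,
                 max_workers: int = 32) -> dict[str, int]:
    """Send one command to many hosts concurrently; returns {host: exit status}."""
    hosts = list(hosts)  # iterated twice below, so materialize it
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, statuses))
```

The `runner` parameter keeps the fan-out logic separate from ssh itself, so an scp copy (or a stub for testing) can be plugged in the same way.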
What To Expect Next
• Will redeploy auto-rewriting service
  • Akamai-like URL mangling
  • Was in testing before December upgrade
• Tie rewriter into “hosting” service
  • Make it simpler for provider to use CoDeeN
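"Akamai-like URL mangling" means rewriting the links in a page so that browsers fetch embedded objects through the CDN instead of the origin. A minimal sketch, assuming a scheme that appends a CDN domain to each http host; the domain cdn.example.net is a placeholder, not CoDeeN's real naming, and a regex over quoted src/href attributes stands in for a real HTML parser.

```python
import re
from urllib.parse import urlsplit, urlunsplit

CDN_SUFFIX = "cdn.example.net"  # hypothetical CDN domain, not CoDeeN's actual one

def mangle_url(url: str, suffix: str = CDN_SUFFIX) -> str:
    """Point an http URL at the CDN by appending `suffix` to its host (assumed scheme)."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        return url  # leave https, relative, and odd URLs untouched
    host = f"{parts.hostname}.{suffix}"
    if parts.port:
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

# Only double-quoted src/href attributes holding absolute http URLs are rewritten.
ATTR_RE = re.compile(r'(\b(?:src|href)\s*=\s*")(http://[^"]+)(")', re.IGNORECASE)

def rewrite_html(html: str, suffix: str = CDN_SUFFIX) -> str:
    """Mangle every matching link in a page of HTML."""
    return ATTR_RE.sub(lambda m: m.group(1) + mangle_url(m.group(2), suffix) + m.group(3),
                       html)

print(rewrite_html('<img src="http://www.example.com/logo.png">'))
```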
More Info
http://codeen.cs.princeton.edu/codeploy
KyoungSoo Park: kyoungso@cs.princeton.edu
Vivek Pai: vivek@cs.princeton.edu