280 likes | 423 Views
EuroSpeech-2003: Where have we been and where are we going? Kenneth.Church@jhu.edu. Consistent progress over decades Moore’s Law, Speech Coding, Error Rate History repeats itself Empiricism: 1950s Rationalism: 1970s Empiricism: 1990s Rationalism: 2010s (?)
E N D
EuroSpeech-2003: Where have we been and where are we going?Kenneth.Church@jhu.edu • Consistent progress over decades • Moore’s Law, Speech Coding, Error Rate • History repeats itself • Empiricism: 1950s • Rationalism: 1970s • Empiricism: 1990s • Rationalism: 2010s (?) • Discontinuities: Fundamental changes that invalidate fundamental assumptions • Petabytes: $2,000,000 $2,000 (in a decade) • Can demand keep up with supply? • If not Tech meltdown • New priorities: • Supply-side economics Demand-side economics • Search >> Compression & Dictation
Old Priority: Speech Compression New Priority: Speech Consumption Excellent Good 2000 Speech Quality Fair 1990 ITU Recommendations Cellular Standards 1980 Secure Telephony Poor 1980 Profile 1990 Profile 2000 Profile Bad Bit Rate (kb/s) Borrowed Slide Rich Cox
How much is a Petabyte?(1015 bytes) • Question from execs: • How do I explain to a lay audience • How much is a petabyte • And why everyone will buy lots of them • Wrong answer: • 106 is a million (a floppy disk/email msg) • 109 is a billion (a billion here, a billion there…) • 1012 is a trillion (the US debt) • 1015 is a zillion (= , an unimaginably large #)
How much is a Petabyte?(1015 bytes) • Question from execs: • How do I explain to a lay audience • How much is a petabyte • And why everyone will buy lots of them • Wrong answer: • 106 is a million (a floppy disk/email msg) • 109 is a billion (a billion here, a billion there…) • 1012 is a trillion (the US debt) • 1015 is a zillion (=∞, an unimaginably large #)
Text won’t consume PB/person; Speech won’t either (but it’s closer) Digital Immortality:Gordon Bell & Jim Gray (2000)
Our Problem: Create Demand for DataScope (5PBs) • Text won’t do it • Internet Archive: 2PBs + 20TBs/month • www.archive.org/about/faqs.php • Speech won’t either (but it is closer) • Worldwide Telephone Traffic: 1PB/day • 1PB/day = 3B hours of speech * 450 KBs/hour 6B people use the phone for an hour/day (half of which is speech and half is silence)
Take-Aways • Economics of Computing: • New Bottlenecks: Supply Demand • Old Challenge: How do we get more computing? Disks? Cycles? • New Challenge: How do we consume more? • New Challenge is Challenging • Text won’t consume DataScope (5PBs) • Speech won’t either (but it’s closer) • Text & Speech Sizes • Internet Archive: 2PBs + 20TBs/month • Worldwide Telephone Traffic: 1PB/day • New Bottlenecks: Storage Transport • Transporting Internet Archive is more challenging than storing it • WANs Sneakernet (Shipping Swappable Disks)
DataScope • Exciting (Expensive) Performance Servers • With lots of oomph • Plus “Boring” (Cheap) Storage Servers • With lots of slots for swappable disks • (and not much else) • (as little as we can get away with)
The Mobility Gap SneakernetEverything is getting better with Moore’s Law, but some things are getting better faster
Sneakernet • Gray’s Question: What is the best way to move a terabyte from place to place? • Updated Version: What is the best way to move a petabyte from place to place?
DataScope Storage Servers • Moveable & non-moveable media • Wires: better latency & convenience (low vol) • Disks: better throughput & cost (high vol) • Hybrid Combination: • Incremental Backup Wires • Large Restores Sneakernet • Import / Export Sneakernet • Users show up with 50 lbs of disks (carryon luggage) • And use them productively within a day • Metrics: Space & Time Power & Pounds
Moore’s Law & Sneakernet • Moore’s Law: • Sneakernetmore attractive over time • Moving companies compete with WANs • Man with Shipping Container • like Man with Van • Commute between Work & Home • Car moves 10s of TBs per day • Flash on phone moves 10s of GBs per day • Comparable to bandwidth of broadband to home
Sneakernet: Better Throughput & Cost (at vol), but Worse Latency & Convenience
Conclusion • Challenge: • Increase Demand • To Keep up with Supply • If demand fails to keep up with supply • Tech Meltdown • Sneakernet: Best hope • Wires can’t keep up • Wires cannot fill up today’s DataScope, • Let alone tomorrow’s DataScope • Roach Motels considered Harmful! • Data can check in, but it can’t check out...
Affordable Board On Delivering Embarrassingly Distributed Cloud ServicesHotnets-2008 $1B $2M Ken Church Albert Greenberg James Hamilton {church, albert, jamesrh}@microsoft.com
Containers:Disruptive Technology • Implications for Shipping • New Ships, Ports, Unions • Implications for Hotnets • New Data Center Designs • Power/Networking Trade-offs • Cost Models: Expense vs. Capital • Apps: Embarrassingly Distributed • Restriction on Embarrassingly Parallel • Machine Models • Distributed Parallel Cluster Parallel Cluster
Data Center (DC) Cost Models: Power vs. Networking • Big ticket items: Contents, Power, Networking • Currently, cost of DC (excluding contents) ≈ cost of contents • Moore’s Law: more appropriate for contents than DC • Eventually, cost of DC >> cost of contents • Power: costs more than networking • Opportunity: Power • Self-Serving Recommendation: • Save big $$ on power • By investing in what we do • Networking • Redesigning Apps (for geo-diversity)
Cost Models: Expense vs. Capital • Wide Agreement: Cost dominated by power • Less Agreement: Expense or Capital? • Can we save $$ by putting clouds to sleep? • Yes, but only a little expense • Big Opportunity: Capital • Capital: Batteries, Generators, Power Distribution • Worst case forecast (capital) >> Actual usage (expense) • Bottleneck: Sunk Costs (Independent of Usage) • Rights to consume power: like the last seat on an airplane • Putting clouds to sleep: Like stripping weight on last seat • Saves a little expense (fuel), • But better to sell last seat (even at a deep discount) Use it or lose it (more applicable for sunk capital than expense)
Risk Management Micro • Risky and expensive • To put all our eggs in one basket (Mark Twain) • Mega DC Redundancy Mechanisms • Power: Batteries, Generators, Diesel Fuel • Over 20% of DC costs is in power redundancy • Networking: Protection (SONET) • Micro Alternative: Geo-Redundancy • N+1 Redundancy: More attractive for large N • Geo-Redundancy: Not appropriate for all apps • But many apps are Embarrassingly Distributed
Clouds Condos Containers Whenever we see a crazy idea Within 2x of current practice, Something is wrong Let’s go pick some low hanging fruit Though maybe not as crazy as we thought… Changes are coming
Container Abstraction • Modular Data Center (≈2k Servers/Container) • Cheap Units: millions, not billions • Affordable, even by universities • Just-in-Time: • Easy to build, provision, move, buy/sell, operate • Sealed Boxcar: No Serviceable Parts • No Humans Inside • No room for people, too hot, noisy, unsafe (no fire exits) • Not OSHA Compliant • Self-contained unit with everything but • Power: 480V • External Network: 1 Gbps • Cooling: Chilled Water
Containers are a Disruptive Technology:Implication for CS Theory/Algorithms • Abstract Machine Model • Distributed Parallel Cluster Parallel Cluster • Distributed Container Farm • Parallel Cluster (with Boundaries) • Better communication within containers • Than across containers (wide area network) • Challenge: • Find appropriate apps (not for everything) • Apps that fit within boundaries • Or distribute nicely across them (Embarrassingly Distributed) • Embarrassingly Distributed: • Stronger Condition than Embarrassingly Parallel • Map Reduce, Sort, Scatter Gather
Embarrassingly Distributed Apps • Currently distributed: • voice mail, telephony (Skype), P2P file sharing (Napster), multicast, eBay, online games (Xbox Live), grid computing • Obvious candidates: • spam filtering & email (Hotmail), backup, grep (simple but common forms of searching through a large corpus) • Less obvious candidates: • map reduce computations (in the most general case), sort (in the most general case), social networking (Facebook)
Board $1B ConclusionsCurrent Status: Long on Mega • Industry is investing billions in mega • Lots of new mega Data Centers • Long-Term Assets/Liabilities • Depreciating over 15 years • Bottom line: • The industry would do well to consider • The micro alternative • Geo-Diversity >> Batteries & Generators • Fragmentation Tax (sublinear) << Redundancy Tax (linear) • Just-in-Time Options: • Easy to build, provision, move, buy/sell, operate • Risk (hedge the long position on mega) $2M Affordable is aggressively adopting