1 / 28

EuroSpeech-2003: Where have we been and where are we going? Kenneth.Church@jhu

EuroSpeech-2003: Where have we been and where are we going? Kenneth.Church@jhu.edu. Consistent progress over decades Moore’s Law, Speech Coding, Error Rate History repeats itself Empiricism: 1950s Rationalism: 1970s Empiricism: 1990s Rationalism: 2010s (?)

amena
Download Presentation

EuroSpeech-2003: Where have we been and where are we going? Kenneth.Church@jhu

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EuroSpeech-2003: Where have we been and where are we going?Kenneth.Church@jhu.edu • Consistent progress over decades • Moore’s Law, Speech Coding, Error Rate • History repeats itself • Empiricism: 1950s • Rationalism: 1970s • Empiricism: 1990s • Rationalism: 2010s (?) • Discontinuities: Fundamental changes that invalidate fundamental assumptions • Petabytes: $2,000,000  $2,000 (in a decade) • Can demand keep up with supply? • If not  Tech meltdown • New priorities: • Supply-side economics  Demand-side economics • Search >> Compression & Dictation

  2. Old Priority: Speech Compression New Priority: Speech Consumption Excellent Good 2000 Speech Quality Fair 1990 ITU Recommendations Cellular Standards 1980 Secure Telephony Poor 1980 Profile 1990 Profile 2000 Profile Bad Bit Rate (kb/s) Borrowed Slide Rich Cox

  3. How much is a Petabyte?(1015 bytes) • Question from execs: • How do I explain to a lay audience • How much is a petabyte • And why everyone will buy lots of them • Wrong answer: • 106 is a million (a floppy disk/email msg) • 109 is a billion (a billion here, a billion there…) • 1012 is a trillion (the US debt) • 1015 is a zillion (= , an unimaginably large #)

  4. How much is a Petabyte?(1015 bytes) • Question from execs: • How do I explain to a lay audience • How much is a petabyte • And why everyone will buy lots of them • Wrong answer: • 106 is a million (a floppy disk/email msg) • 109 is a billion (a billion here, a billion there…) • 1012 is a trillion (the US debt) • 1015 is a zillion (=∞, an unimaginably large #)

  5. Text won’t consume PB/person; Speech won’t either (but it’s closer) Digital Immortality:Gordon Bell & Jim Gray (2000)

  6. Our Problem: Create Demand for DataScope (5PBs) • Text won’t do it • Internet Archive: 2PBs + 20TBs/month • www.archive.org/about/faqs.php • Speech won’t either (but it is closer) • Worldwide Telephone Traffic: 1PB/day • 1PB/day = 3B hours of speech * 450 KBs/hour 6B people use the phone for an hour/day (half of which is speech and half is silence)

  7. Take-Aways • Economics of Computing: • New Bottlenecks: Supply  Demand • Old Challenge: How do we get more computing? Disks? Cycles? • New Challenge: How do we consume more? • New Challenge is Challenging • Text won’t consume DataScope (5PBs) • Speech won’t either (but it’s closer) • Text & Speech Sizes • Internet Archive: 2PBs + 20TBs/month • Worldwide Telephone Traffic: 1PB/day • New Bottlenecks: Storage  Transport • Transporting Internet Archive is more challenging than storing it • WANs  Sneakernet (Shipping Swappable Disks)

  8. DataScope • Exciting (Expensive) Performance Servers • With lots of oomph • Plus “Boring” (Cheap) Storage Servers • With lots of slots for swappable disks • (and not much else) • (as little as we can get away with)

  9. The Mobility Gap SneakernetEverything is getting better with Moore’s Law, but some things are getting better faster

  10. Sneakernet • Gray’s Question: What is the best way to move a terabyte from place to place? • Updated Version: What is the best way to move a petabyte from place to place?

  11. DataScope Storage Servers • Moveable & non-moveable media • Wires: better latency & convenience (low vol) • Disks: better throughput & cost (high vol) • Hybrid Combination: • Incremental Backup  Wires • Large Restores Sneakernet • Import / Export Sneakernet • Users show up with 50 lbs of disks (carryon luggage) • And use them productively within a day • Metrics: Space & Time  Power & Pounds

  12. Amazon’s Import / Export(Scaled Up…)

  13. Moore’s Law & Sneakernet • Moore’s Law: • Sneakernetmore attractive over time • Moving companies compete with WANs • Man with Shipping Container • like Man with Van • Commute between Work & Home • Car moves 10s of TBs per day • Flash on phone moves 10s of GBs per day • Comparable to bandwidth of broadband to home

  14. Sneakernet: Better Throughput & Cost (at vol), but Worse Latency & Convenience

  15. Conclusion • Challenge: • Increase Demand • To Keep up with Supply • If demand fails to keep up with supply  • Tech Meltdown • Sneakernet: Best hope • Wires can’t keep up • Wires cannot fill up today’s DataScope, • Let alone tomorrow’s DataScope • Roach Motels considered Harmful! • Data can check in, but it can’t check out...

  16. BACKUP

  17. Affordable Board On Delivering Embarrassingly Distributed Cloud ServicesHotnets-2008 $1B $2M Ken Church Albert Greenberg James Hamilton {church, albert, jamesrh}@microsoft.com

  18. Containers:Disruptive Technology • Implications for Shipping • New Ships, Ports, Unions • Implications for Hotnets • New Data Center Designs • Power/Networking Trade-offs • Cost Models: Expense vs. Capital • Apps: Embarrassingly Distributed • Restriction on Embarrassingly Parallel • Machine Models • Distributed Parallel Cluster  Parallel Cluster

  19. Mega vs. Micro Data Centers (DCs)

  20. Data Center (DC) Cost Models: Power vs. Networking • Big ticket items: Contents, Power, Networking • Currently, cost of DC (excluding contents) ≈ cost of contents • Moore’s Law: more appropriate for contents than DC • Eventually, cost of DC >> cost of contents • Power: costs more than networking • Opportunity: Power • Self-Serving Recommendation: • Save big $$ on power • By investing in what we do • Networking • Redesigning Apps (for geo-diversity)

  21. Cost Models: Expense vs. Capital • Wide Agreement: Cost dominated by power • Less Agreement: Expense or Capital? • Can we save $$ by putting clouds to sleep? • Yes, but only a little expense • Big Opportunity: Capital • Capital: Batteries, Generators, Power Distribution • Worst case forecast (capital) >> Actual usage (expense) • Bottleneck: Sunk Costs (Independent of Usage) • Rights to consume power: like the last seat on an airplane • Putting clouds to sleep: Like stripping weight on last seat • Saves a little expense (fuel), • But better to sell last seat (even at a deep discount) Use it or lose it (more applicable for sunk capital than expense)

  22. Risk Management  Micro • Risky and expensive • To put all our eggs in one basket (Mark Twain) • Mega DC  Redundancy Mechanisms • Power: Batteries, Generators, Diesel Fuel • Over 20% of DC costs is in power redundancy • Networking: Protection (SONET) • Micro Alternative: Geo-Redundancy • N+1 Redundancy: More attractive for large N • Geo-Redundancy: Not appropriate for all apps • But many apps are Embarrassingly Distributed

  23. Micro is Better: Both Capital & Expense

  24. Clouds  Condos  Containers Whenever we see a crazy idea Within 2x of current practice, Something is wrong Let’s go pick some low hanging fruit Though maybe not as crazy as we thought… Changes are coming

  25. Container Abstraction • Modular Data Center (≈2k Servers/Container) • Cheap Units: millions, not billions • Affordable, even by universities • Just-in-Time: • Easy to build, provision, move, buy/sell, operate • Sealed Boxcar: No Serviceable Parts • No Humans Inside • No room for people, too hot, noisy, unsafe (no fire exits) • Not OSHA Compliant • Self-contained unit with everything but • Power: 480V • External Network: 1 Gbps • Cooling: Chilled Water

  26. Containers are a Disruptive Technology:Implication for CS Theory/Algorithms • Abstract Machine Model • Distributed Parallel Cluster  Parallel Cluster • Distributed Container Farm • Parallel Cluster (with Boundaries) • Better communication within containers • Than across containers (wide area network) • Challenge: • Find appropriate apps (not for everything) • Apps that fit within boundaries • Or distribute nicely across them (Embarrassingly Distributed) • Embarrassingly Distributed: • Stronger Condition than Embarrassingly Parallel • Map Reduce, Sort, Scatter Gather

  27. Embarrassingly Distributed Apps • Currently distributed: • voice mail, telephony (Skype), P2P file sharing (Napster), multicast, eBay, online games (Xbox Live), grid computing • Obvious candidates: • spam filtering & email (Hotmail), backup, grep (simple but common forms of searching through a large corpus) • Less obvious candidates: • map reduce computations (in the most general case), sort (in the most general case), social networking (Facebook)

  28. Board $1B ConclusionsCurrent Status: Long on Mega • Industry is investing billions in mega • Lots of new mega Data Centers • Long-Term Assets/Liabilities • Depreciating over 15 years • Bottom line: • The industry would do well to consider • The micro alternative • Geo-Diversity >> Batteries & Generators • Fragmentation Tax (sublinear) << Redundancy Tax (linear) • Just-in-Time Options: • Easy to build, provision, move, buy/sell, operate • Risk (hedge the long position on mega) $2M Affordable is aggressively adopting

More Related