310 likes | 480 Views
Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31 May 2010. http://abovetheclouds.cs.berkeley.edu/. *Director, Intel Labs Berkeley. Image: John Curley http://www.flickr.com/photos/jay_que/1834540/.
E N D
Beyond the Hype:A Berkeley View of Cloud Computing Anthony D. Joseph*, UC BerkeleyReliable Adaptive Distributed Systems Lab TNC 2010 31 May 2010 http://abovetheclouds.cs.berkeley.edu/ *Director, Intel Labs Berkeley Image: John Curley http://www.flickr.com/photos/jay_que/1834540/
Outline • What is it? • Why now? • Cloud economics and opportunities • Challenges
Datacenter is new “server” “Program” == Web search, email, map/GIS, … “Computer” == 1000’s computers, storage, network Warehouse-sized facilities and workloads New datacenter ideas (2007-2008): truck container (Sun), floating (Google), datacenter-in-a-tent (Microsoft) How to enable innovation in new services without first building & capitalizing a large company? 3 photos: Sun Microsystems & datacenterknowledge.com
RAD Lab 5-year Mission Enable 1 person to develop, deploy, operate next -generation Internet application • Key enabling technology: Statistical machine learning • debugging, power management, performance prediction, ... • Highly interdisciplinary faculty & students • PI’s: Patterson/Fox/Katz (systems/networks), Jordan (machine learning), Joseph (systems/networks/security), Stoica (networks/P2P), Franklin (databases) • 2 postdocs, ~30 PhD students, ~5 undergrads
Examples • Predict performance of complex software system when demand is scaled up • Automatically add/drop servers to fit demand, without violating SLO • Distill millions of lines of log messages into an operator-friendly “decision tree” that pinpoints “unusual” incidents/conditions • Recurring themes: • Cutting-edge SML methods work where simpler methods have failed • Demonstrate applicability on at least 1000’s of machines!
Utility Computing Arrives Northern VA cluster • Amazon Elastic Compute Cloud (EC2) • “Compute unit” rental: $0.10-0.80 0.085-0.68/hour • 1 CU ≈ 1.0-1.2 GHz 2007 AMD Opteron/Intel Xeon core • No up-front cost, no contract, no minimum • Billing rounded to nearest hour (also regional, spot pricing) • New paradigm(!) for deploying services?, HPC?
Cloud as Major Enabler • Major enabler for SW as a Service (SaaS) startups • Animoto traffic doubled every 12 hours for 3 days when released as Facebook plug-in in April 2008 • Peak EC2 instances: • Mon 50, Tues 400, Wed 900, Friday 3400
But... What is cloud computing, exactly?
“It’s nothing new” “...we’ve redefined Cloud Computing to include everything that we already do... I don’t understand what we would do differently ... other than change the wording of some of our ads.” Larry Ellison, CEO, Oracle (Wall Street Journal, Sept. 26, 2008)
“It’s a trap” “It’s worse than stupidity: it’s marketing hype. Somebody is saying this is inevitable—and whenever you hear that, it’s very likely to be a set of businesses campaigning to make it true.” Richard Stallman, Founder, Free Software Foundation (The Guardian, Sept. 29, 2008)
Above the Clouds:A Berkeley View of Cloud Computing abovetheclouds.cs.berkeley.edu • White paper by RAD Lab PI’s and students • Clarify terminology around Cloud Computing • Quantify comparison with conventional computing • Identify Cloud Computing challenges & opportunities • Why can we offer new perspective? • Strong engagement with industry • Users of cloud computing in our own research and teaching in last 3 years • Goal: stimulate discussion on what’s really new • without resorting to weather analogies ad nauseam
Cloud Computing: True Utility Cloud Computing: App and Infrastructure over Internet Software as a Service: Applications over the Internet Utility Computing:“Pay-as-You-Go” Datacenter Hardware and Software Three New Aspects to Cloud Computing The Illusion of Infinite Computing Resources Available on Demand The Elimination of an Upfront Commitment by Cloud Users The Ability to Pay for Use of Computing Resources on a Short-Term Basis as Needed
Classifying Clouds App Model for Utility Computing Amazon EC2 Windows Azure Google AppEngine SomethingNew Close to Physical Hardware .NET and CLR… ASP.NET Support App Specific Traditional Web App Model Constraints on App Model Offer Tradeoffs… Lots of Ongoing Innovation… ??? ??? User Controls Most of Stack More Constraints on User Stack Constrained Stateless/Stateful Tiers • Instruction Set VM (Amazon EC2, 3Tera) • Managed runtime VM (Microsoft Azure) • Framework VM (Google AppEngine, Force.com) Lower-level, Less managed “flexibility/portability” Higher-level, More managed “more built-in functionality” ??? Hard to Auto Scale and Failover Auto Provisioning of Stateless App Auto Scaling and Auto High-Availability
Outline • What is it? • Why now? • Cloud economics and opportunities • Challenges
Why Now (not then)? Economies of Scale for Humongous Datacenters (1,000’s to 10,000’s of commodity computers) Electricity Network Operations Hardware Put Datacenters at Cheap Power Put Datacenters on Main Trunks Standardize and Automate Ops Containerized Low-Cost Servers James Hamilton, Internet Scale Service Efficiency, Large-Scale Distributed Systems and Middleware (LADIS) Workshop Sept‘08 Huge DCs 5-7X as Cost Effective as Medium-Scale DCs
Why Now (not then)? • Common HW & SW platform • x86 as universal ISA, plus fast virtualization • Standard software stack, largely open source (LAMP) • Bet: Can statistically multiplex multiple instances onto a single box without interference between instances • Novel economic model: fine grain billing • Earlier examples: Sun, Intel Computing Services—longer commitment, more $$$/hour • Infrastructure software: eg Google FileSystem • Operational expertise: failover, DDoS, firewalls... • More pervasive broadband Internet
Cloud Economics 101 • Static provisioning for peak: wasteful, but necessary for SLO Risk of underutilization if peak predictions are too optimistic – Wasted CapEx Capacity Resources Resources Capacity Demand Demand “Statically provisioned” data center “Virtual” data center in the cloud Time Time Unused resources
Risks of underprovisioning Resources Resources Resources Capacity Capacity Capacity Lost revenue Demand Demand Demand 2 2 2 3 3 3 1 1 1 Time (days) Time (days) Time (days) Lost users
New Scenarios Enabled by “Risk Transfer” • Not (just) CapEx vs. OpEx! • “Cost associativity”: 1,000 computers for 1 hour same price as 1 computer for 1,000 hours • Washington Post converted Hillary Clinton’s travel documents to post on WWW <1 day after released • RAD Lab graduate students demonstrate improved Hadoop (batch job) scheduler—on 1,000 servers
Outline • What is it? • Why now? • Cloud economics and opportunities • Challenges
Obstacles and Opportunities Open source reimplementations of Google AppEngine (AppScale), EC2 API (Eucalyptus), BigTable (HyperTable) 2/11/09: IBM WebSphere™ and other service-delivery software available on AWS with pay-as-you-go pricing Freedom OSS partnership with Amazon to allow FedEx-ing disks into their datacenters, Amazon hosting free public datasets to “attract” cycles
Cloud or Earthbound: “Should I Move to the Cloud?” • Some app types made more compelling • Surge computing: overflow into the cloud • Extend desktop apps into cloud: Matlab, Mathematica; soon productivity apps? • Batch processing to exploit cost associativity, e.g. for business analytics • Other apps more challenged • Bulk data movement expensive, slow • Jitter-sensitive apps (long-haul latency & transient performance distortion due to virtualization) • What about HPC/Grid apps?
2008 Observed EC2 Topology Caveat: only shows routers, not switches
EC2 Large Instance Network Performance • Measurement: • Amount of data transferred between 2 instances in 10 seconds • Network quirks • 96% RTT < 1ms • Occasional route changes lead to weird paths and increased latency
Effects on MPI Performance • Many research opportunities to address issues • Collocated rack-level allocation schemes • Gang scheduling for clouds/virtual clouds • New network topologies/implementations [Walker, :login 2008-10]
Summary • Many Cloud Computing Benefits: • Shift CapEx to OpEx , Scale OpEx to demand • Startups and prototyping, One-off tasks (Wash. Post) • Cost associativity • Research at scale • Many Cloud Computing Challenges: • Availability • Data in cloud may be a “gravity well” ($$$ to move) • Not ready for HPC applications yet • Opportunities to address performance issues • More: http://abovetheclouds.cs.berkeley.edu/
Thank you! adj@eecs.berkeley.edu http://abovetheclouds.cs.berkeley.edu/
Performance for Money Spent on EC2 • LINPACK cost effectiveness on EC2 [Napper and Bientinesi]
Heterogeneity in Virtualized Environments • VM technology isolates CPU and memory, but disk and network are shared • Full bandwidth when no contention • Equal shares when there is contention • 2.5x performance difference EC2 small instances
Power and Cooling Is Expensive! The Infrastructure for Power and Cooling Costs a LOT Infrastructure PLUS Energy > Server Cost Since 2001 Infrastructure Alone> Server Cost Since 2004 Energy Alone> Server Cost Since 2008 Cost Effective to Discard Inefficient Servers Willing to pay more $/server for more power efficient servers Belady, C., “In the Data Center, Power and Cooling Costs More than IT Equipment it Supports”, Electronics Cooling Magazine (Feb 2007) Power Savings Infrastructure Savings! Like Airlines Retiring Fuel-Guzzling Airplanes