eScience Open Mic: Cloud Computing
Bill Howe, PhD, eScience Institute, UW
eScience is about data
• Old model: “Query the world” (data acquisition coupled to a specific hypothesis)
• New model: “Download the world” (data acquisition supports many hypotheses)
• Astronomy: high-resolution, high-frequency sky surveys (SDSS, LSST, Pan-STARRS)
• Biology: lab automation, high-throughput sequencing
• Oceanography: high-resolution models, cheap sensors, satellites
[Figure annotations: 40 TB / 2 nights; ~1 TB / day; 100s of devices]
eScience is married to the Cloud: Scalable computing and storage for everyone
Generator [slide source: Werner Vogels]
“... computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry.”
John McCarthy, 1961 (Emeritus at Stanford, inventor of LISP)
Economies of Scale [src: Armbrust et al., “Above the Clouds: A Berkeley View of Cloud Computing,” 2009]
Economies of Scale [src: James Hamilton, Amazon.com]
Elasticity: provisioning for peak load [src: Armbrust et al., Above the Clouds, 2009]
Elasticity: underprovisioning [src: Armbrust et al., Above the Clouds, 2009]
Elasticity: underprovisioning, more realistic [src: Armbrust et al., Above the Clouds, 2009]
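The provisioning trade-off in the Elasticity slides can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 24-hour demand curve (the numbers are invented for illustration, not taken from Armbrust et al.) and compares a fixed fleet sized for peak load, a fixed fleet sized below peak, and an idealized elastic fleet that tracks demand hour by hour. The names `DEMAND`, `fixed_fleet`, and `elastic_fleet` are hypothetical.

```python
# Hypothetical hourly demand (servers needed) over one day; illustrative only.
DEMAND = [2, 2, 2, 2, 3, 4, 6, 8, 10, 10, 9, 8, 8, 9, 10, 10, 9, 7, 5, 4, 3, 3, 2, 2]


def fixed_fleet(capacity, demand):
    """Cost profile of a fixed fleet of `capacity` servers.

    Returns (server_hours_paid, idle_server_hours, unserved_server_hours).
    """
    paid = capacity * len(demand)
    idle = sum(max(capacity - d, 0) for d in demand)
    unserved = sum(max(d - capacity, 0) for d in demand)
    return paid, idle, unserved


def elastic_fleet(demand):
    """Idealized elastic fleet: exactly d servers each hour, so nothing idle or unserved."""
    return sum(demand), 0, 0


if __name__ == "__main__":
    scenarios = [
        ("fixed, sized for peak", fixed_fleet(max(DEMAND), DEMAND)),
        ("fixed, 6 servers", fixed_fleet(6, DEMAND)),
        ("elastic", elastic_fleet(DEMAND)),
    ]
    for label, (paid, idle, unserved) in scenarios:
        print(f"{label:22s} paid={paid:3d} idle={idle:3d} unserved={unserved:3d}")
```

With these made-up numbers, the peak-sized fleet pays for 240 server-hours, 102 of them idle; the 6-server fleet pays for only 144 but leaves 32 server-hours of demand unserved; the idealized elastic fleet pays for exactly the 138 server-hours used. Real elasticity is coarser (instances take time to start and are billed in increments), so the elastic figure is a best case.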
Animoto [Werner Vogels, Amazon.com]
Periodic [Deepak Singh, Amazon.com]
Growth
Amazon [Werner Vogels, Amazon.com]
History
Timeline, 2000 to 2009 [figure; recoverable label: “Application Service Providers”]
Exemplars
• Software as a Service
• Platform as a Service
• Infrastructure as a Service
Grid Computing [Foster 2002]
• Grid vs. cloud:
  • WAN vs. centralized
  • Heterogeneous vs. data center
  • Physical vs. virtualized
  • Fewer, larger, dedicated allocations vs. more, smaller, shared allocations
Cloud Services [figure: a spectrum of Infrastructure-aaS, Platform-aaS, and Software-aaS, with axis labels “Constrained” and “Automation”]
• Services shown: Windows Azure, Google App Engine, Google Docs, EC2, Force.com, SQL Azure, SalesForce.com, S3, Elastic MapReduce
Azure: Highly-available Fabric Controller (FC) [figure; recoverable labels: “FC owns this hardware”, “History”] [Roger Barga, Microsoft]
At minimum:
• CPU: 1.5-1.7 GHz x64
• Memory: 1.7 GB
• Network: 100+ Mbps
• Local storage: 500 GB
Up to:
• CPU: 8 cores
• Memory: 14.2 GB
• Local storage: 2+ TB
[Roger Barga, Microsoft]
[Figure: Azure compute roles. Labels: Web Role (ASP.NET, WCF, etc. on IIS, receiving HTTP through a Load Balancer); Worker Role (main() { … }); an Agent in each role’s VM; Fabric] [Roger Barga, Microsoft]
[Figure: Azure platform layers. Labels: Application; Compute; Storage (Blobs, Drives, Tables, Queues, accessed over HTTP); Fabric …] [Roger Barga, Microsoft]
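The Web Role / Worker Role split, with Queues in the storage layer between them, is a producer/consumer pattern: front-end instances accept HTTP requests and enqueue work items, and back-end instances dequeue and process them, so each tier can scale independently. The sketch below imitates that pattern in a single Python process, assuming the standard library’s `queue.Queue` as a stand-in for a cloud queue and threads as stand-ins for role instances. The functions `web_role`, `worker_role`, and `run` are hypothetical; this is not the Azure SDK.

```python
import queue
import threading


def web_role(requests, work_queue):
    """Front end (hypothetical): accept each incoming request and enqueue it."""
    for req in requests:
        work_queue.put(req)


def worker_role(work_queue, results, lock):
    """Back end (hypothetical): dequeue and process items until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: no more work for this worker
            break
        processed = item.upper()  # placeholder for real work (e.g. running a model)
        with lock:
            results.append(processed)


def run(requests, num_workers=3):
    """Connect one front end to num_workers back ends through a shared queue."""
    work_queue = queue.Queue()  # in-process stand-in for a cloud storage queue
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker_role, args=(work_queue, results, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    web_role(requests, work_queue)
    for _ in workers:  # one shutdown sentinel per worker, queued after all real work
        work_queue.put(None)
    for w in workers:
        w.join()
    return sorted(results)  # completion order across workers is nondeterministic


if __name__ == "__main__":
    print(run(["render tile 1", "render tile 2", "render tile 3"]))
```

Because the queue decouples the tiers, adding worker threads here (or worker instances in a real deployment) raises throughput without touching the front end. A real cloud queue differs from this in-process one: messages typically must be deleted explicitly after processing and can be redelivered, so worker logic should be idempotent.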
AzureScope
• http://azurescope.cloudapp.net/
• Performance measurements
[Roger Barga, Microsoft]
My 2 Favorite Use Cases
Use Case 1: “Google Docs for developers”
• The cloud is the ultimate collaborative development environment
• A shared environment outside the jurisdiction of over-protective (or otherwise non-responsive) sysadmins
• No bugs closed as “can’t replicate”
• Example: new software for serving oceanographic model results, requiring collaboration between UW, OPeNDAP.org, and OOI
  • Waited two weeks for credentials to be established
  • Gave up, spun up an EC2 instance, and were up and running within an hour
• Similarly, Seattle’s Institute for Systems Biology uses EC2/S3 to share computational pipelines
Use Case 2: Reproducible Research
• Protocols, assays, experiments, and workflows are increasingly computational
• Paradoxically, these activities are often harder to reproduce than “manual” protocols
• Why?
Software dependencies [figure: a web of dependencies including Python 2.5, MATLAB, Proj4, PostGIS, Java 1.5, EJB, PostgreSQL, SAX, SOAP libs, XML-RPC libs, Tomcat, Apache, mod_python, VTK, OpenGL, Mesa, 3D drivers, S3/EC2, SQL Server Data Services, and Google App Engine, plus configuration, security, and account management]
Division of Responsibility
Q: Where should we draw the line of responsibility between developers and users?
We need to consider users’ skill sets:
• Can they install packages?
• Can they compile code?
• Can they write DDL statements?
• Can they configure a web server?
• Can they troubleshoot network problems?
• Can they troubleshoot permissions problems?
Frequently the answer is “No.”
Plus: tech support is hard. It is usually easier to “fix it yourself.”
Division of Responsibility
Is there anything busy users are willing to do?
Example in the Classroom
• Dr. Randy LeVeque, AMATH 574, Winter 2009
• Virtual machines with Clawpack software pre-installed, along with data, models, and analysis tools
• See the how-to at http://escience.washington.edu/ (search for “virtual machine”), or go directly to http://bit.ly/eMOcle
Use Case 3: Data Sharing
• The days of FTP are over
• It takes days to transfer 1 TB over the Internet, and the transfer is unlikely to succeed
• We need to push the computation to the data, rather than push the data to the computation
• The cloud is perfect for this:
  • Globally shared storage
  • Equipped with arbitrary, on-demand computation, available to anyone
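The claim that moving 1 TB takes days follows from simple arithmetic: 1 TB is 8 × 10^12 bits, so even a sustained 100 Mbps needs about 80,000 seconds (roughly 22 hours), and wide-area transfers rarely sustain their nominal rate. The sketch below computes transfer times; the function name `transfer_time_hours`, the link speeds, and the 50% efficiency factor in the usage example are assumptions chosen for illustration, not measurements.

```python
TB = 10**12  # decimal terabyte, in bytes


def transfer_time_hours(size_bytes, bandwidth_bps, efficiency=1.0):
    """Hours needed to move size_bytes over a link with nominal bandwidth_bps.

    efficiency is the assumed fraction of nominal bandwidth actually sustained.
    """
    if bandwidth_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need bandwidth_bps > 0 and 0 < efficiency <= 1")
    seconds = size_bytes * 8 / (bandwidth_bps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative links and a 50% sustained-throughput assumption, not measurements.
    for label, bps in [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
        hours = transfer_time_hours(1 * TB, bps, efficiency=0.5)
        print(f"1 TB over {label}: {hours:7.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions, 1 TB takes about 44 hours at 100 Mbps and about 18.5 days at 10 Mbps, before counting retries after a failed transfer. That is the motivation for pushing the computation to where the data already lives.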
Case Studies
FoldIt
• Database, file server, multiple web servers
• Under $30k for a 3-year term
• Database replicated across multiple zones
• Web servers scale automatically with usage
• Includes 1 TB of storage
Many More
• Computational fluid dynamics
• Astronomy
• GPGPUs
• HIPAA-protected applications
• National security applications
• It’s mainstream!