Cloud Computing BOF
OGF22 Birds of a Feather Session
Hyatt Regency Cambridge, February 27, 2008
Geoffrey Fox, Indiana University, gcf@indiana.edu
Cloud Agenda
• Geoffrey Fox (Indiana U.): Remarks on Cloud Computing
• Martin Swany (Internet2): Clouds and Dynamic Networking
• Steven Newhouse (Microsoft): Personal View on Clouds
• Kate Keahey (Argonne, Chicago): First Steps in the Clouds
• Next Steps
What are Clouds?
• Clouds are “Virtual Clusters” (“Virtual Grids”) of possibly “Virtual Machines”
  • They may cross administrative domains or may “just be a single cluster”; the user cannot and does not want to know
• Clouds support access to (lease of) computer instances
  • Instances accept data and job descriptions (code) and return results that are data and status flags
• Each Cloud is a “Narrow” (perhaps internally proprietary) Grid
• When does the Cloud concept work?
  • Parameter searches, LHC-style data analysis ...
  • Common case (most likely success case for clouds) versus corner case?
• Clouds can be built from Grids
• Grids can be built from Clouds
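The lease model on this slide (an instance accepts data plus a job description and returns data plus a status flag) can be sketched as below. This is only an illustration under stated assumptions: the names `Cloud`, `Instance`, `JobResult`, `lease`, `release` and `run` are hypothetical and do not correspond to any real cloud API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class JobResult:
    """What an instance hands back: result data plus a status flag."""
    data: Any
    status: str  # "ok" or "failed" (hypothetical flag values)


class Instance:
    """Hypothetical leased compute instance."""

    def __init__(self, instance_id: int) -> None:
        self.instance_id = instance_id

    def run(self, code: Callable[[Any], Any], data: Any) -> JobResult:
        # The "job description" is modelled as a plain Python callable.
        try:
            return JobResult(data=code(data), status="ok")
        except Exception as exc:
            return JobResult(data=str(exc), status="failed")


class Cloud:
    """Hypothetical cloud that hides where its instances actually live."""

    def __init__(self, capacity: int) -> None:
        self._free = list(range(capacity))

    def lease(self) -> Instance:
        if not self._free:
            raise RuntimeError("no instances available")
        return Instance(self._free.pop())

    def release(self, instance: Instance) -> None:
        self._free.append(instance.instance_id)


if __name__ == "__main__":
    cloud = Cloud(capacity=2)
    inst = cloud.lease()
    print(inst.run(sum, [1, 2, 3]))
    cloud.release(inst)
```

The caller never learns whether the instance is a virtual machine, a single cluster node, or something spanning administrative domains, which is the point made in the first bullet.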
Cloud References
• http://en.wikipedia.org/wiki/Cloud_computing
  • Includes references to Amazon, Apple, Dell, Enomalism, Globus, Google, IBM, KnowledgeTreeLive, Nature, New York Times, Zimdesk
  • Others like Microsoft Windows Live SkyDrive are important
• http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud
• http://uc.princeton.edu/main/index.php?option=com_content&task=view&id=2589&Itemid=1 (Policy Issues)
• http://www.cra.org/ccc/home.article.bigdata.html
  • Hadoop (MapReduce) and “Data Intensive Computing”
  • See the Data Intensive Computing minitrack at HICSS-42, January 2009
• http://ianfoster.typepad.com/blog/2008/01/theres-grid-in.html
• OGF Thought Leadership blog
• OGF22 talks by Charlie Catlett and Irving Wladawsky-Berger
Big-Data Computing Study Group
• CCC Role Versus OGF?
• Hadoop and MapReduce are “just” workflow?
Google MapReduce: Simplified Data Processing on Clusters/Clouds
• http://labs.google.com/papers/mapreduce.html
• This is a dataflow model between services, where services can do useful document-oriented data-parallel applications including reductions
• The decomposition of services onto cluster engines (clouds) is automated
• The large I/O requirements of datasets change the efficiency analysis in favor of dataflow
• Services (count words in the example) can obviously be extended to general parallel applications
• There are many alternative languages for expressing dataflow, parallel operations, and/or workflow
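The word-count service mentioned on this slide can be sketched in a few lines. This is a single-process illustration of the map, group, and reduce steps only; it is not Google's implementation, it does not distribute work across a cluster, and the function names are chosen here for the example.

```python
from collections import defaultdict
from itertools import chain


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a runtime would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map over every document, group by word, then reduce each group."""
    mapped = chain.from_iterable(map_words(doc) for doc in documents)
    return dict(reduce_counts(w, c) for w, c in shuffle(mapped).items())


if __name__ == "__main__":
    docs = ["clouds can be built from grids", "grids can be built from clouds"]
    print(word_count(docs))
```

Because each `map_words` call touches only its own document and each `reduce_counts` call touches only one key, a runtime is free to place these calls on different machines, which is the automated decomposition the slide refers to.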
Technical Questions about Clouds I
• What is the performance overhead?
  • On an individual CPU
  • On the system, including data and program transfer
• What is the cost gain?
  • From size efficiency; “green” location (rumor that Google has purchased the Niagara Falls including Canada!)
• Is Cloud Security adequate: can clouds be trusted?
• Can one do parallel computing on clouds?
  • Looking at “capacity” not “capability”, i.e. lots of modest-sized jobs
  • Marine Corps will use Petaflop machines – they just need ssh and a.out
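One way to frame the overhead question is to separate the per-CPU slowdown from the cost of moving data and programs. The sketch below is a back-of-the-envelope model, not a measurement: it assumes the two costs simply add, and the function name and all parameters are hypothetical.

```python
def cloud_overhead(local_seconds, cpu_slowdown, transfer_bytes, bandwidth_bytes_per_s):
    """Fractional extra wall-clock time of a cloud run relative to a local run.

    local_seconds          run time on a local, non-virtualized CPU
    cpu_slowdown           fractional CPU overhead in the cloud (0.05 means 5%)
    transfer_bytes         data plus program bytes shipped to and from the cloud
    bandwidth_bytes_per_s  sustained network bandwidth
    Assumes compute and transfer do not overlap.
    """
    compute = local_seconds * (1.0 + cpu_slowdown)
    transfer = transfer_bytes / bandwidth_bytes_per_s
    return (compute + transfer - local_seconds) / local_seconds


if __name__ == "__main__":
    # Hypothetical job: 1 hour locally, 5% CPU overhead, 10 GB moved at 100 MB/s.
    print(cloud_overhead(3600, 0.05, 10e9, 100e6))
```

In this model the transfer term dominates for short jobs on large datasets, while the CPU term dominates for long, compute-bound jobs.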
Technical Questions about Clouds II
• How is data-compute affinity tackled in clouds?
  • Co-locate data and compute clouds?
  • Lots of optical fiber, i.e. “just” move the data?
• What happens in clouds when demand for resources exceeds capacity: is there a multi-day job input queue?
  • Are there novel cloud scheduling issues?
• Do we want to link clouds (or ensembles as atomic clouds)? If so, how and with what protocols?
• Is there an intranet cloud, e.g. “cloud in a box” software to manage a personal (cores on my future 128-core laptop), department, or enterprise cloud?
Standards for Compute and Storage Clouds
• We no longer need interoperability of services and messages (SOAP) but rather interoperability of clouds
  • Maybe each cloud is so big that interoperability between clouds is not so critical
• Interoperability certainly for application-specific data and perhaps also for job specifications
  • WFS, GML for Geo-data; IVOA standards; DST LHC experiment formats
  • JSDL, BES etc.
• Each Cloud will be proprietary, but they might want raw infrastructure standards so they can easily swap in and out different vendors’ disk drives
• Clouds very, very loosely coupled; services loosely coupled
MSI Challenge Problem
• There are > 330 MSIs (Minority Serving Institutions)
• 2 examples
  • ECSU is a small state university in North Carolina
    • HBCU with 4000 students
    • Working on PolarGrid (sensors in Arctic/Antarctic linked to “TeraGrid”)
  • Navajo Tech in Crownpoint, NM is a community college with technology leadership for the Navajo Nation
    • “Internet to the Hogan and Dine Grid” links Navajo communities by wireless
    • Wish to integrate TeraGrid science into the Navajo Nation education curriculum
• Current Grid technology is too complicated if you are not an R1 institution
  • Hard to deploy campus grids broadly into MSIs
• Clouds provide virtual campus resources?
Next Steps at OGF
• Clouds are just starting and build on/are related to Grids
• Clear need for best practice in use and technology
• Likely to be a need for new standards and novel use of existing/projected standards
• New Cloud Community Group?
  • Chairs, participants?
• Workshop?
• OGF23 activity?
• Identify key players not currently involved with OGF?