Directions in eScience Interoperability and Science Clouds
Geoffrey Fox, gcf@indiana.edu
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington
June 19 2012
Interoperability in Action – Standards Implementation in VENUS-C & the context of the SIENA Roadmap
OGF35 at HPDC 2012, Delft
Successes in eScience I
• Basic supercomputer architecture, now being extended to Exascale
  • Grand Challenge activity 1990-2000 produced consensus
• Basic OGF standards such as JSDL, BES, SAGA, GridFTP
• Software as a Service
• Use of Services
• Use of Workflow
• Use of Portals
• We say "use of" because the details are not agreed
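To make the JSDL item concrete, here is a minimal sketch that builds a JSDL 1.0 job description for a POSIX executable with Python's standard XML library. The namespaces and element names (JobDefinition, JobDescription, JobIdentification, Application, POSIXApplication, Executable, Argument, Output) follow the JSDL 1.0 specification; the helper `make_jsdl` is hypothetical, and the output has not been validated against the official schema.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 core and POSIX-application namespaces.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def make_jsdl(job_name, executable, args, output=None):
    """Hypothetical helper: build a minimal JSDL JobDefinition as an XML string."""
    root = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(root, f"{{{JSDL_NS}}}JobDescription")
    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in args:  # one Argument element per command-line argument
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg
    if output is not None:  # file that receives standard output
        ET.SubElement(posix, f"{{{POSIX_NS}}}Output").text = output
    return ET.tostring(root, encoding="unicode")


print(make_jsdl("hello", "/bin/echo", ["hello", "world"], "out.txt"))
```

A document like this is what a BES-style service would accept as a job submission; the point of the standard is that the same description can be handed to different middleware.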
More on Successes
• Appliances/Roles in Clouds (see VENUS-C later)
  • Images defined explicitly (by construction) or implicitly by content
• Value-added platforms such as MPI, parallel domain-specific libraries, (Iterative) MapReduce, Queues, Tables and other NOSQL data models, Object Stores, HDFS/GFS-style file systems
  • PaaS delivered by tools/libraries/roles?
• Other important general standards in security, OVF, accounting, networking
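Iterative MapReduce differs from classic MapReduce in that the same map and reduce steps run repeatedly, with large static data kept in place and only a small loop-variant state passed between iterations. The sketch below shows this with k-means clustering in plain Python; the function names are illustrative, and a real runtime would distribute the map calls and cache the points on the workers.

```python
import math
from collections import defaultdict


def map_assign(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1))."""
    best = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return best, (point, 1)


def reduce_mean(values):
    """Reduce: combine (point, count) pairs for one key into a new centroid."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, c in values:
        for d in range(dims):
            total[d] += point[d] * c
        count += c
    return tuple(t / count for t in total)


def iterative_kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Driver: repeat map -> shuffle -> reduce until the centroids stop moving.

    `points` is the static data reused every iteration; only the small
    `centroids` list changes from one iteration to the next.
    """
    for _ in range(max_iters):
        groups = defaultdict(list)  # shuffle: group map outputs by key
        for p in points:
            key, value = map_assign(p, centroids)
            groups[key].append(value)
        new = [reduce_mean(groups[i]) if groups[i] else centroids[i]
               for i in range(len(centroids))]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:  # converged
            break
    return centroids
```

For example, `iterative_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` converges after two iterations to centroids at (0, 0.5) and (10, 10.5).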
What Platforms to Use in Clouds
• HDFS-style file system to collocate data and computing
  • Or Object Stores as basic scalable storage
• Queues to manage multiple tasks
• Tables to track job information
• MapReduce and Iterative MapReduce for parallelism
• Services for everything
• Portals as user interface
• Appliances and Roles as customized images
• Software environments/tools like Google App Engine, memcached
• Workflow to link multiple services (functions)
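The queue-plus-table pattern in this list can be sketched in a few lines: tasks go onto a queue, worker roles pull them off, and a table records each task's state and result. In this sketch an in-process `queue.Queue` stands in for a cloud queue service and a dictionary stands in for a cloud table; `run_workers` and the record fields are hypothetical names, not any provider's API.

```python
import queue
import threading


def run_workers(tasks, work_fn, num_workers=4):
    """Process (task_id, payload) pairs with worker threads.

    A queue.Queue stands in for a cloud queue; a dict stands in for a
    cloud table that tracks each task's state and result.
    """
    task_queue = queue.Queue()
    status_table = {}
    lock = threading.Lock()

    for task_id, payload in tasks:
        task_queue.put((task_id, payload))
        status_table[task_id] = {"state": "queued", "result": None}

    def worker(name):
        while True:
            try:
                task_id, payload = task_queue.get_nowait()
            except queue.Empty:
                return  # no work left: this worker role exits
            with lock:
                status_table[task_id].update(state="running", worker=name)
            result = work_fn(payload)
            with lock:
                status_table[task_id].update(state="done", result=result)

    threads = [threading.Thread(target=worker, args=(f"w{i}",))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status_table
```

Because the table is separate from the queue, a portal or workflow engine can inspect job progress without touching the workers.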
What to Use in Grids and Supercomputers?
• Portals, Services and Workflow as in clouds
• MPI and GPU/multicore threaded parallelism
• Wonderful libraries supporting parallel linear algebra, particle evolution, partial differential equation solution
• Parallel I/O for high performance in an application
• Wide-area file system (e.g. Lustre) supporting file sharing
• This is a rather different style of PaaS from clouds; should we unify?
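The MPI style of parallelism rests on block decomposition: each process ("rank") owns a contiguous slice of the data, computes a local result, and a collective reduction combines the partial results. The sketch below imitates that pattern for a dot product with a thread pool. It is an illustration of the decomposition only, not MPI itself (a real code would use an MPI library and a sum reduction such as MPI_Allreduce), and in pure Python the threads give no real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n, num_ranks):
    """Split indices 0..n-1 into num_ranks contiguous blocks (block distribution)."""
    base, extra = divmod(n, num_ranks)
    ranges, start = [], 0
    for r in range(num_ranks):
        size = base + (1 if r < extra else 0)  # first `extra` ranks get one more
        ranges.append((start, start + size))
        start += size
    return ranges


def local_dot(x, y, lo, hi):
    """Work done by one 'rank' on its own block of the vectors."""
    return sum(x[i] * y[i] for i in range(lo, hi))


def parallel_dot(x, y, num_ranks=4):
    """Compute partial dot products per block, then combine them (a sum reduction)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    blocks = block_ranges(len(x), num_ranks)
    with ThreadPoolExecutor(max_workers=num_ranks) as pool:
        partials = list(pool.map(lambda r: local_dot(x, y, *r), blocks))
    return sum(partials)
```

For instance, `block_ranges(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`, so the first rank takes the one leftover element.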
Comments
• No agreement on the problem to solve, e.g. what is the architecture for data-intensive problems, and the role of clouds(!)
• Certainly no agreement on even the style of workflow
• Services can be WSDL or REST
• Confusion as to which architecture level is being standardized
  • User or developer?
  • e.g. clouds may be built on federated infrastructure; that must be hidden from the user
Some Standards Futures
• In general, look for a few key SIMPLE concepts
• From the past, SQL and MPI standardization was very successful, suggesting that Cloud PaaS standards should be looked at
  • MapReduce
  • NOSQL data models
• Needs to be done at the right time
  • De facto standard "Hadoop" versus a "real" standard
• What "roles" are important: Worker, Web, Grid, Worker + I/O, MPI, MapReduce, GPU? Does this need a study?
• Roles v. Libraries v. Standard Interfaces
• GPU-related standards: OpenACC, which builds on the OpenMP directive approach
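Part of why MapReduce is a candidate for a "few key SIMPLE concepts" standard is that the user-visible contract is tiny: a map function that emits key/value pairs and a reduce function that folds all values for one key. The sketch below runs that contract sequentially for word count; `run_mapreduce`, `wc_map` and `wc_reduce` are hypothetical names, and a real runtime such as Hadoop would add partitioning, distribution and fault tolerance behind the same two functions.

```python
from collections import defaultdict


def run_mapreduce(inputs, mapper, reducer):
    """Sequential reference runtime: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # shuffle: group values by key
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def wc_map(line):
    """Word-count mapper: emit (word, 1) for each word in a line."""
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word, counts):
    """Word-count reducer: total occurrences of one word."""
    return sum(counts)


print(run_mapreduce(["The cloud", "the grid"], wc_map, wc_reduce))
```

Only `wc_map` and `wc_reduce` are application code; everything in `run_mapreduce` is what a standard would leave to the platform.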
Using Science Clouds in a Nutshell
• High Throughput Computing; pleasingly parallel; grid applications
• Multiple users (long tail of science) and usages (parameter searches)
• Internet of Things (sensor nets), as in cloud support of smart phones
• (Iterative) MapReduce, including "most" data analysis
• Exploiting elasticity and platforms (HDFS, Object Stores, Queues ...)
• Use worker roles, services, portals (gateways) and workflow
• Good strategies:
  • Build the application as a service;
  • Build on existing cloud deployments/roles such as Hadoop;
  • Use PaaS if possible (this is not clearly the eScience strategy, which uses IaaS?);
  • Design for failure (not much work on what this means; are there tools?);
  • Use "as a Service" offerings (e.g. SQLaaS) where possible (WHAT should be provided?);
  • Address the challenge of moving data (need a production large-scale science cloud)
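One modest reading of "design for failure" for a pleasingly parallel parameter search is: make each task idempotent, retry failed attempts a bounded number of times, and report what still failed rather than aborting the whole sweep. The sketch below assumes transient failures surface as `RuntimeError`; `run_sweep` and `max_attempts` are hypothetical names, and a cloud deployment would typically get retries from queue redelivery rather than a local loop.

```python
def run_sweep(params, task_fn, max_attempts=3):
    """Run one independent task per parameter value, retrying failures.

    Assumes tasks are idempotent, so re-running a failed attempt is safe,
    and that transient failures raise RuntimeError.
    Returns (results, failed): results maps parameter -> value, and failed
    lists parameters that exhausted their retries.
    """
    results, failed = {}, []
    for p in params:
        for attempt in range(1, max_attempts + 1):
            try:
                results[p] = task_fn(p)
                break  # success: move on to the next parameter
            except RuntimeError:
                if attempt == max_attempts:
                    failed.append(p)  # give up on this parameter, keep the sweep going
    return results, failed
```

The key design choice is that one bad parameter or one lost worker costs a retry, not the whole parameter search; the `failed` list can then be resubmitted or inspected.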
Cosmic Comments
• Does Cloud + MPI Engine cover the future?
• Will current high throughput computing and cloud concepts merge?
• Need data analytics libraries for HPC and clouds
• Does a "modest-size private science cloud" make sense?
  • Too small to be elastic
• Should governments fund use of commercial clouds (or build their own)?
  • Most science doesn't have privacy issues motivating some private clouds
• Most interest in clouds comes from "new" applications such as life sciences
• Recent cloud infrastructure (Eucalyptus 3, OpenStack Essex) is much improved
• More employment opportunities in clouds than in HPC and Grids, so cloud-related activities are popular with students