Elastic-R: A cloud platform for web computing, real-time collaboration, rapid applications development and reproducible modelling
Karim Chine, Cloud Era Ltd, karim.chine@cloudera.co.uk
BD, 04 February 2011
Scientific Computing Environments
www.scipy.org, http://root.cern.ch, www.sagemath.org, www.scilab.org, www.wolfram.com, office.microsoft.com, www.mathworks.com, www.sas.com, www.spss.com
R:
• Open-source (GPL) software environment for statistical computing and graphics.
• Lingua franca of data analysis.
• Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemometrics, etc. are growing at an exponential rate.
• R Website: http://www.r-project.org/
• CRAN Task View: http://cran.r-project.org/web/views/
• CRAN packages: http://cran.cnr.berkeley.edu/
• Bioconductor: http://www.bioconductor.org/
• Rmetrics: https://www.rmetrics.org/
R's Success Story
From: John Fox, Aspects of the Social Organization and Trajectory of the R Project, R Journal, Feb 2009
Scientific/Statistical Computing Software, HPC and Usability
"Give me a place to stand, and I shall move the earth with a lever"
Extract from the NetSolve/GridSolve Description Document
The emergence of Grid computing as the prototype of a next generation cyberinfrastructure for science has excited high expectations for its potential as an accelerator of discovery, but it has also raised questions about whether and how the broad population of research professionals, who must be the foundation of such productivity, can be motivated to adopt this new and more complex way of working. The rise of the new era of scientific modeling and simulation has, after all, been precipitous, and many science and engineering professionals have only recently become comfortable with the relatively simple world of the uniprocessor workstations and desktop scientific computing tools. In that world, software packages such as Matlab and Mathematica represent general-purpose scientific computing environments (SCEs) that enable users, totaling more than a million worldwide, to solve a wide variety of problems through flexible user interfaces that can model in a natural way the mathematical aspects of many different problem domains. Moreover, the ongoing, exponential increase in the computing resources supplied by the typical workstation makes these SCEs more and more powerful, and thereby tends to reduce the need for the kind of resource sharing that represents a major strength of Grid computing [1]. Certainly there are various forces now urging collaboration across disciplines and distances, and the burgeoning Grid community, which aims to facilitate such collaboration, has made significant progress in mitigating the well-known complexities of building, operating, and using distributed computing environments. But it is unrealistic to expect the transition of research professionals to the Grid to be anything but halting and slow if it means abandoning the SCEs that they rightfully view as a major source of their productivity.
We therefore believe that Grid computing's prospects for success will tend to rise and fall according to its ability to interface smoothly with the general purpose SCEs that are likely to continue to dominate the toolbox of its targeted user base.
Arnold, D., Agrawal, S., Blackford, S., Dongarra, J., Miller, M., Seymour, K., Sagi, K., Shi, Z., and Vadhiyar, S.
Elastic-R is a ubiquitous plug-and-play platform for scientific and statistical computing
• Computational Components: R packages (CRAN, Bioconductor), wrapped C, C++ and Fortran code, Scilab modules, Matlab toolkits, etc. Open source or commercial.
• Computational User Interfaces: workbench within the browser; built-in views / plugins / spreadsheets; collaborative views. Open source or commercial.
• Computational Resources: hardware- and OS-agnostic computing engine (R, Scilab, ...); clusters, grids, private or public clouds. Free (academic grids) or pay-per-use (EC2, Azure).
• Computational Data Storage: local, NFS, FTP, Amazon S3, Amazon EBS. Free or commercial.
• Computational Scripts: R / Python / Groovy. On client side: interactivity, ... On server side: data transfer, ...
• Computational Application Programming Interfaces: Java / SOAP / REST, stateless and stateful.
• Generated Computational Web Services: stateful or stateless, automatic mapping of R data objects and functions.
Elastic-R portal: single facade to public and private clouds
[Figure: the portal in front of Public Clouds and a Private Cloud]
Elastic-R is a collaborative Virtual Research Environment. Users can share their machine instances, stateful remote engines, data, etc.
Reproducible research: A scientist can snapshot her computational environment and her data. She can archive the snapshot or share it with others.
Elastic-R Amazon Machine Images:
• Elastic-R AMI 1: R 2.10 + BioC 2.5
• Elastic-R AMI 2: R 2.9 + BioC 2.3
• Elastic-R AMI 3: R 2.8 + BioC 2.0
Amazon Elastic Block Stores:
• Elastic-R EBS 1: Data Set XXX
• Elastic-R EBS 2: Data Set YYY
• Elastic-R EBS 3: Data Set ZZZ
• Elastic-R EBS 4: Data Set VVV
[Figure: on Elastic-R.org, Elastic-R AMI 2 (R 2.9 + BioC 2.3) is shown together with Elastic-R EBS 4 (Data Set VVV)]
Anatomy of an Elastic-R machine instance on Amazon EC2
[Figure; connection labels: RESTful WS over SSL, SOAP over SSL, Heartbeat, SSH, HTTPS]
The scientist can control any number of stateful R engines from within an R session on the cloud or on his machine. He can use them for parallel computing.
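The idea of driving several stateful engines in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `StubEngine`, `assign`, `evaluate` and `parallel_sum_of_squares` are hypothetical names invented here, not the Elastic-R API, and each "engine" is an in-process object standing in for a remote R engine.

```python
from concurrent.futures import ThreadPoolExecutor


class StubEngine:
    """Hypothetical stand-in for a stateful remote R engine (not the Elastic-R API)."""

    def __init__(self, name):
        self.name = name
        self.workspace = {}  # state that survives between calls

    def assign(self, var, value):
        self.workspace[var] = value

    def evaluate(self, var, func):
        # A real engine would evaluate an R expression; here we apply a Python function.
        return func(self.workspace[var])


def parallel_sum_of_squares(engines, data):
    """Split data across engines, compute partial results concurrently, combine them."""
    n = len(engines)
    chunks = [data[i::n] for i in range(n)]
    for engine, chunk in zip(engines, chunks):
        engine.assign("x", chunk)  # the chunk stays on the engine as state

    def work(engine):
        return engine.evaluate("x", lambda xs: sum(v * v for v in xs))

    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(work, engines))
    return sum(partials)


engines = [StubEngine(f"engine-{i}") for i in range(3)]
total = parallel_sum_of_squares(engines, list(range(10)))
print(total)  # 285
```

Because each engine keeps its chunk in its own workspace, later calls could reuse that state without shipping the data again, which is the main point of engines being stateful.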
Software + Services = Applications convergence + ubiquitous collaboration. The server-side toolkit: R + spreadsheet models + virtual GUI widgets.
The Elastic-R portal itself is an EC2 machine instance. Any number of portals can be run on EC2 for decentralized and private collaboration.
[Figure: Amazon Virtual Private Cloud with Subnet 1, Subnet 2, Subnet 3]
Stateful generated Web Services: Elastic-R for workflow workbenches
[Figure labels: LogOn (Login, Pwd, Options); SessionID associated with a reserved Elastic-R Engine; getData; T1, T2, T3; ES; ESon1, ESon2, ESon3; Retrieve Data; f(ES); logOff]
• T1, T2, T3: generated stateful Web Services for R functions T1, T2 & T3
• LogOn, getData: R-SOAP methods
• ES: ExpressionSet
• ESon1, ESon2, ESon3: ExpressionSet object names
• f = T3 o T2 o T1
logOff:
• remove ESonx
• "Clean" Elastic-R Engine
• Put Elastic-R Engine back in the pool
• kill Elastic-R Engine
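The workflow on this slide (log on, load an object, chain T1, T2 and T3 by object name, retrieve the result, log off) can be modelled as follows. This is a minimal in-memory sketch, not the generated web services themselves: `StubEnginePool` and its method names are hypothetical, the three transformations are arbitrary stand-ins for R functions, and a plain list stands in for an ExpressionSet.

```python
import uuid


class StubEnginePool:
    """Hypothetical in-memory model of LogOn / getData / T1..T3 / logOff."""

    def __init__(self, size):
        self.free = [dict() for _ in range(size)]  # each engine is a workspace of named objects
        self.sessions = {}                         # session id -> reserved engine

    def log_on(self, login, pwd):
        engine = self.free.pop()                   # reserve an engine (IndexError if none is free)
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = engine
        return session_id

    def get_data(self, session_id, values):
        self.sessions[session_id]["ES"] = list(values)
        return "ES"                                # only the object name travels back

    def call(self, session_id, func, name):
        """Stateful service call: takes an object name, stores the result, returns its name."""
        engine = self.sessions[session_id]
        new_name = f"ESon{len(engine)}"
        engine[new_name] = func(engine[name])
        return new_name

    def retrieve(self, session_id, name):
        return self.sessions[session_id][name]

    def log_off(self, session_id):
        engine = self.sessions.pop(session_id)
        engine.clear()                             # remove ESonx, "clean" the engine
        self.free.append(engine)                   # put it back in the pool


def t1(es):
    return [v - min(es) for v in es]


def t2(es):
    return [2 * v for v in es]


def t3(es):
    return sorted(es)


pool = StubEnginePool(size=1)
sid = pool.log_on("login", "pwd")
name = pool.get_data(sid, [3, 1, 2])
for step in (t1, t2, t3):                          # f = T3 o T2 o T1
    name = pool.call(sid, step, name)
result = pool.retrieve(sid, name)
pool.log_off(sid)
print(name, result)  # ESon3 [0, 2, 4]
```

Passing object names rather than objects between steps is what keeps intermediate data on the engine; only the final result crosses the wire.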
One Amazon account and many users: Elastic-R signed tokens
[Figure labels: AWS Credentials + Private Key; token XXYYZZ; Generate token; Deliver token; Use token; Activate token; Launch machine instance; Register machine instance; Use R console; Call R Engine]
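The slide does not say how the tokens are signed. As an illustration of the general idea (the account owner issues a signed, time-limited token so that users never see the AWS credentials), here is a sketch that uses HMAC-SHA256 from the Python standard library. The choice of HMAC, the token layout, and the names `SECRET`, `generate_token` and `verify_token` are all assumptions made for this example; the actual Elastic-R scheme may well use an asymmetric private-key signature instead.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret-held-by-the-account-owner"  # hypothetical; never given to users


def generate_token(user, action, ttl_seconds, now=None):
    """Return 'payload.signature' granting `user` the right to perform `action`."""
    issued = int(time.time() if now is None else now)
    claims = {"user": user, "action": action, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def verify_token(token, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    current = time.time() if now is None else now
    return claims if current < claims["exp"] else None


token = generate_token("alice", "launch-machine-instance", ttl_seconds=60, now=1000)
print(verify_token(token, now=1010))   # claims for alice
print(verify_token(token, now=2000))   # None: expired
```

The signature is checked before the payload is decoded, so a tampered or malformed token is rejected without being parsed.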
Links
• Elastic-R Portal: www.elastic-r.org
• Articles about the project:
  • Chine, K. (2010). Open Science in the Cloud: Towards a Universal Platform for Scientific and Statistical Computing. In Handbook of Cloud Computing (Chapter 19). Springer US.
  • Chine, K. (2010). Learning Math and Statistics on the Cloud, Towards an EC2-Based Google Docs-like Portal for Teaching / Learning Collaboratively with R and Scilab. In 10th IEEE International Conference on Advanced Learning Technologies (ICALT), pp. 752-753.
  • Chine, K. (2009). Scientific Computing Environments in the age of virtualization, toward a universal platform for the Cloud. In IEEE International Workshop on Open Source Software for Scientific Computation (OSSC), pp. 44-48.
  • Chine, K. (2008). Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform. In Fourth IEEE International Conference on eScience, pp. 321-322.
• LinkedIn Group: http://www.linkedin.com/groups?home=&gid=2345405
Elastic-R SOA platform
• Front-end host: Remote Objects Registry, R-HTTP, R-SOAP
• Supervisor
• Pools: Pool A, Pool B, Pool C
• Nodes: Node 1 (Windows XP), Node 2 (Mac OS), Node 3 (64-bit Server / Linux), Node 4 (EC2 virtual machine 1), Node 5 (EC2 virtual machine 2); cloudbursting via Amazon Web Services
• Parallel Computing Applications: Borrow Rs, Use Rs, Release Rs
• .NET application: logOn, Use R, logOff
• Perl scripts: logOn, Use R, logOff
• Web Application: Borrow R, Generate Graphics/Data, Release R
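The borrow / use / release cycle that the client applications follow can be sketched with a thread-safe queue and a context manager, which guarantees the release step even when the "use" step fails. `EnginePool`, `borrow` and `idle_count` are hypothetical names for this illustration, and the engines are represented by plain strings rather than real R engines.

```python
import queue
from contextlib import contextmanager


class EnginePool:
    """Hypothetical pool of engine handles shared by several client applications."""

    def __init__(self, names):
        self._idle = queue.Queue()
        for name in names:
            self._idle.put(name)

    @contextmanager
    def borrow(self, timeout=1.0):
        # Blocks until an engine is free; raises queue.Empty if none frees up in time.
        engine = self._idle.get(timeout=timeout)
        try:
            yield engine                 # "Use R"
        finally:
            self._idle.put(engine)       # "Release R", even if the caller raised

    def idle_count(self):
        return self._idle.qsize()


pool = EnginePool(["node1-R", "node2-R"])
with pool.borrow() as engine:
    in_use_idle = pool.idle_count()
    print("using", engine, "idle engines:", in_use_idle)
print("idle engines after release:", pool.idle_count())
```

A cloudbursting front end could add handles for newly launched EC2 nodes to the same queue when `borrow` starts timing out; that policy is outside this sketch.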