Is the Cloud the Panacea for Process Efficiency? The Elastic-R Case Study
Karim Chine
karim.chine@cloudera.co.uk
Efficiency Killers, a Selective Catalog (Scientific Computing Perspective)
Problem I : Scientific Computing Environments Fragmentation
• www.sagemath.org
• www.wolfram.com
• http://www.r-project.org
• http://root.cern.ch
• www.bioconductor.org
• office.microsoft.com
• www.mathworks.com
• www.scilab.org
• www.minitab.com
• www.jmp.com
• www.scipy.org
• www.sas.com
• accelrys.com
• www.taverna.org.uk
• www.spss.com
• www.perl.org
Problem II : Hardware, OS and Applications Fragmentation
Versions shown on the slide: 2.5.0, 2.9.1, 2.6, 2.11.0, 2.10.0, 2.4.0, 2.1, 2.6.0
Problem III : Data Fragmentation / Inconsistency / Lack of Traceability
Problem IV : Poor IT / Software Usability
"Give me a place to stand, and I shall move the earth with a lever"
Cloud Computing and the Building Blocks of Convergence
Technological Convergence
• Virtualization Technologies
• Java
• Web Services (REST/SOAP)
• Infrastructure-as-a-Service WS APIs
• HTML5
R, lingua franca of data analysis
From: John Fox, "Aspects of the Social Organization and Trajectory of the R Project", R Journal, Feb 2009
Elastic-R is a ubiquitous plug-and-play platform for scientific and statistical computing
• Computational Components: R packages (CRAN, Bioconductor); wrapped C, C++, Fortran code; Scilab modules, Matlab toolkits, etc. Open source or commercial.
• Computational User Interfaces: workbench within the browser; built-in views / plugins / spreadsheets; collaborative views. Open source or commercial.
• Computational Resources: hardware- and OS-agnostic computing engine (R, Scilab, ...); clusters, grids, private or public clouds. Free (academic grids) or pay-per-use (EC2, Azure).
• Computational Data Storage: local, NFS, FTP, Amazon S3, Amazon EBS. Free or commercial.
• Computational Scripts: R / Python / Groovy. On the client side: interactivity, ... On the server side: data transfer, ...
• Computational Application Programming Interfaces: Java / SOAP / REST, stateless and stateful.
• Generated Computational Web Services: stateful or stateless, automatic mapping of R data objects and functions.
Elastic-R portal: Access as-a-Service to Scientific Computing Environments running on centralized and standardized virtual appliances
Diagram labels: Public Clouds, Private Cloud
Anatomy of an Elastic-R machine instance on Amazon EC2
Diagram connection labels: RESTful WS over SSL, SOAP over SSL, Heartbeat, SSH, HTTPS
Software + Services = Applications convergence
The server-side toolkit: R + spreadsheet models + virtual GUI widgets.
Cloud Computing and the Building Blocks of Ubiquitous Collaboration
Elastic-R is a collaborative Virtual Research Environment. Users can share their machine instances, stateful remote engines, data, etc.
The Elastic-R portal itself is an EC2 machine instance. Any number of portals can be run on EC2 for decentralized and private collaboration.
Diagram labels: Amazon Virtual Private Cloud, Subnet 1, Subnet 2, Subnet 3
Cloud Computing and the Building Blocks of Reproducible Research
A scientist can snapshot her computational environment and her data. She can archive the snapshot or share it with others.
Elastic-R Amazon Machine Images
• Elastic-R AMI 1: R 2.10 + BioC 2.5
• Elastic-R AMI 2: R 2.9 + BioC 2.3
• Elastic-R AMI 3: R 2.8 + BioC 2.0
Amazon Elastic Block Stores
• Elastic-R EBS 1: Data Set XXX
• Elastic-R EBS 2: Data Set YYY
• Elastic-R EBS 3: Data Set ZZZ
• Elastic-R EBS 4: Data Set VVV
Elastic-R.org (combination highlighted in the diagram)
• Elastic-R AMI 2: R 2.9 + BioC 2.3
• Elastic-R EBS 4: Data Set VVV
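The pairing shown on this slide, one frozen software image plus one frozen data volume, can be sketched as a small data model. This is only an illustration under stated assumptions: `MachineImage`, `DataVolume`, `Snapshot` and `describe` are hypothetical names, not part of Elastic-R or of any Amazon API, and no cloud call is made. The values are the ones that appear on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineImage:
    """Hypothetical record of a frozen software environment (an AMI on the slide)."""
    name: str
    r_version: str
    bioc_version: str


@dataclass(frozen=True)
class DataVolume:
    """Hypothetical record of a frozen data set (an EBS volume on the slide)."""
    name: str
    data_set: str


@dataclass(frozen=True)
class Snapshot:
    """Environment and data archived or shared together as one unit."""
    image: MachineImage
    volume: DataVolume

    def describe(self) -> str:
        return (
            f"{self.image.name} (R {self.image.r_version} + "
            f"BioC {self.image.bioc_version}) + "
            f"{self.volume.name} ({self.volume.data_set})"
        )


# The combination highlighted on the slide: AMI 2 together with EBS 4.
ami2 = MachineImage("Elastic-R AMI 2", "2.9", "2.3")
ebs4 = DataVolume("Elastic-R EBS 4", "Data Set VVV")
shared = Snapshot(ami2, ebs4)
print(shared.describe())
```

Making the records immutable (`frozen=True`) mirrors the idea that a snapshot, once taken, identifies one exact environment and one exact data set.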
Stateful generated Web Services delivered by snapshottable/archivable virtual appliances
Diagram labels: LogOn (Login, Pwd, Options); Session ID associated with a reserved Elastic-R Engine; getData; ES; T1, T2, T3; ESon1, ESon2, ESon3; f(ES); Retrieve Data
Legend:
• T1, T2, T3 : generated stateful Web Services for R functions T1, T2 & T3
• LogOn, getData : R-SOAP methods
• ES : ExpressionSet
• ESon1, ESon2, ESon3 : ExpressionSet object names
• f = T3 o T2 o T1
logOff
• remove ESonx
• « Clean » Elastic-R Engine
• Put Elastic-R Engine back in the Pool
• kill Elastic-R Engine
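The session flow on this slide can be sketched in plain Python. This is a minimal in-memory sketch under stated assumptions, not Elastic-R's actual API: `Engine`, `EnginePool`, `log_on`, `get_data`, `apply`, `retrieve` and `log_off` are hypothetical stand-ins for the R-SOAP methods and generated services, the toy functions `t1`, `t2`, `t3` stand in for the R functions, and a list of numbers stands in for an ExpressionSet. The point it illustrates is that data stays on the reserved engine between calls and only object names travel, so that f = T3 o T2 o T1 is computed server-side.

```python
import uuid
from typing import Callable, Dict, List

Data = List[float]  # stand-in for an ExpressionSet (assumption)


class Engine:
    """Hypothetical stateful engine: keeps named objects between calls."""

    def __init__(self) -> None:
        self.objects: Dict[str, Data] = {}


class EnginePool:
    """Hypothetical pool that reserves one engine per session ID."""

    def __init__(self, size: int) -> None:
        self.free: List[Engine] = [Engine() for _ in range(size)]
        self.sessions: Dict[str, Engine] = {}

    def log_on(self, login: str, pwd: str) -> str:
        # Credentials are not checked in this sketch.
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = self.free.pop()
        return session_id

    def get_data(self, session_id: str, name: str, values: Data) -> str:
        # Store the data on the engine and hand back only its object name.
        self.sessions[session_id].objects[name] = list(values)
        return name

    def apply(self, session_id: str, fn: Callable[[Data], Data],
              src: str, dst: str) -> str:
        # Run one service on a stored object; the result stays on the engine.
        objects = self.sessions[session_id].objects
        objects[dst] = fn(objects[src])
        return dst

    def retrieve(self, session_id: str, name: str) -> Data:
        return list(self.sessions[session_id].objects[name])

    def log_off(self, session_id: str) -> None:
        # Remove the stored objects and put the cleaned engine back in the pool.
        engine = self.sessions.pop(session_id)
        engine.objects.clear()
        self.free.append(engine)


def t1(es: Data) -> Data:
    return [x - min(es) for x in es]  # toy step: shift so the minimum is 0


def t2(es: Data) -> Data:
    return [2 * x for x in es]        # toy step: rescale


def t3(es: Data) -> Data:
    return sorted(es)                 # toy step: order the values


pool = EnginePool(size=1)
sid = pool.log_on("login", "pwd")
pool.get_data(sid, "ES", [3.0, 1.0, 2.0])
pool.apply(sid, t1, "ES", "ESon1")
pool.apply(sid, t2, "ESon1", "ESon2")
pool.apply(sid, t3, "ESon2", "ESon3")
result = pool.retrieve(sid, "ESon3")  # f(ES) with f = T3 o T2 o T1
pool.log_off(sid)
```

Only the final `retrieve` call moves data back to the client; each intermediate result is addressed by name, which is what makes the services stateful.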
Cloud Computing and the Simplification/Standardization of the Scientific Applications’ Life Cycle
Users can easily create Java GUIs that use the full capabilities of a stateful, remote R engine and share them as URLs.
Diagram labels: Elastic-R AJAX Workbench; Visual Graphic User Interface Builder; Upload plugin; Standalone Application Accessible From a URL
Elastic-R Java Workbench
• Plugins Repository
• myPlugin
• myDashboard
Links
• Elastic-R Portal: www.elastic-r.org
• Articles about the project:
• Karim Chine, "Open Science in the Cloud: Towards a Universal Platform for Scientific and Statistical Computing," in Handbook of Cloud Computing, Chapter 19, Springer US, 2010
• Karim Chine, "Learning Math and Statistics on the Cloud, Towards an EC2-Based Google Docs-like Portal for Teaching / Learning Collaboratively with R and Scilab," pp. 752-753, 2010 10th IEEE International Conference on Advanced Learning Technologies (ICALT), 2010
• Karim Chine, "Scientific Computing Environments in the age of virtualization, toward a universal platform for the Cloud," pp. 44-48, 2009 IEEE International Workshop on Open Source Software for Scientific Computation (OSSC), 2009
• Karim Chine, "Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform," pp. 321-322, 2008 Fourth IEEE International Conference on eScience, 2008
• LinkedIn Group: http://www.linkedin.com/groups?home=&gid=2345405
Acknowledgments
ACS: Madi Nassiri
Amazon: Simone Brunozzi, Deepak Singh
AT&T Research Labs: Simon Urbanek
ATUGE: Imen Essafi, Béchir Tourki, Ilyes Gouja, Hatem Hachicha, Amine Elleuch
Auckland Centre for eResearch: Nick Jones
Banca d'Italia: Giuseppe Bruno
Bio-IT World: Kevin Davies
BNP Paribas: Ousseynou Nakoulima
Cambridge Healthtech Institute: Cindy Crowninshield
City University of New York: Mario Morales, Makram Talih
Columbia University: Omar Besbes
Dassault Systèmes: Omri Ben Ayoun, Patrick Johnson
Dataspora: Michael E. Driscoll
EDF: Alejandro Ribes
EBI: Alvis Brazma, Wolfgang Huber, Kimmo Kallio, Misha Kapushesky, Michael Kleen, Alberto Labarga, Philippe Rocca-Serra, Ugis Sarkans, Kirsten Williams, Eamonn Maguire
EPFL: Darlene Goldstein
ESPRIT: Farouk Kammoun, Tahar Benlakhdar
e-Taalim: Nadhir Douma
ETH Zürich: Yohan Chalabi, Diethelm Würtz, Martin Mächler
European Commission: Konstantinos Glinos, Enric Mitjana, Monika Kacik, Ioannis Sagias
FHCRC: Martin Morgan, Nianhua Li, Seth Falcon
Google: Olivier Bosquet
FVG LLC: Lisa Wood
Harvard University: Tim Clark, Sudeshna Das, Douglas Burke, Paolo Ciccarese
IBM: Jean-Louis Bernaudin, Pascal Sempe, Loic Simon, Lea A Deleris, Alex Fleischer, Alain Chabrier
Imperial College London: Asif Akram, Vasa Curcin, John Darlington, Brian Fuchs
Indiana University: Michael Grobe
INRIA: David Monteau, Christian Saguez, Claude Gomez, Sylvestre Ledru
JISC: John Wood, David Flanders
Johnson & Johnson - Janssen Pharmaceutica: Patrick Marichal
KXEN: Eric Marcade
Lancaster University: Robert Crouchley, Daniel Grose
Leibniz Universität Hannover: Kornelius Rohmeier
LIAMA: Baogang Hue, Kang Cai
Limagrain: Zivan Karaman
Mekentosj: Alexander Griekspoor, Matt Wood
Microsoft: Eric Le Marois, Tony Hey
Mubadala: Ghazi Ben Amor
Nature Publishing Group: Ian Mulvany, Steve Scott
NCeSS: Peter Halfpenny, Rob Procter, Marzieh Asgari-Targhi, Alex Voss, YuWei Lin, Mercedes Argüello Casteleiro, Wei Jie, Meik Poschen, Katy Middlebrough, Pascal Ekin, June Finch, Farzana Latif, Elisa Pieri, Frank O'Donnell
New York Java User Group: Frank D Greco
OeRC: Dimitrina Spencer, Matteo Turilli, David Wallom, Steven Young
OMII-UK: Neil Chue Hong, Steve Brewer
OpenAnalytics: Tobias Verbeke
Oracle: Dominique van Deth, Andrew Bond
OSS Watch: Ross Gardler
Platform Computing: Christopher Smith
Royal Society: James Wilsdon
San Diego Supercomputer Center: Nancy R. Wilkins-Diehr
Sanger Institute: Lars Jorgensen, Phil Butcher
Shell: Wayne W. Jones, Nigel Smith
Société Générale: Anis Maktouf
Stanford University: John Chambers, Balasubramanian Narasimhan, Gunter Walther
SYSTEM@TIC: Karim Azoum
Technische Universität Dortmund: Uwe Ligges, Bernd Bischl
Technoforge: Pierre-Antoine Durgeat
Tekiano: Samy Ben Naceur
Télécom-ParisTech: Isabelle Demeure, Georges Hebrail, Nesrine Gabsi
The Generations Network: Jim Porzak
Total: Yannick Perigois
Tunisian Ministry of Communication Technologies: Naceur Ammar, Lamia Chaffai-Sghaier, Mohamed Saïd Ouerghi, Syrine Tlili
Tunisian Ecole Polytechnique: Riadh Robbana
UC Berkeley: Noureddine El Karoui, Terry Speed
UC Davis: Rudy Beran, Debashis Paul, Duncan Temple Lang
UCL: Daniel Jeffares
UCLA: Ivo Dinov, Jeroen Ooms
UC San Diego: Anthony Gamst
UCSF: Tena Sakai
Université Catholique de Louvain: Christian Ritter
University of Cambridge: Ian Roberts, Robert MacInnis, Peter Murray-Rust, Jim Downing
University of Manchester: Carole Goble, Len Gill, Simon Peters, Richard D Pearson, Iain Buchan, John Ainsworth
University of Plymouth: Paul Hewson
University of Split: Ivica Puljak
UTK: Ajay Ohri
World Bank Group-IFC: Oualid Ammar
Yahoo: Laurent Mirguet, Rob Weltman
Independent: Charles Dallas, Romain François