Using Amazon EC2 for ocean-atmosphere modeling and teaching
Chris Hill, MIT, Sept 17 2009 + Constantinos Evangelinos, Glenn Flierl
Chris Hill, cnh@mit.edu
Background
• Interests in cloud
• Engaging audiences that do not normally use geoscience modeling and simulation tools, especially:
  • Educational/classroom
  • Non-computational scientists and sporadic users of HPC models and HPC model output, who are comfortable with laptops, desktops and handheld devices but more “ambivalent” toward traditional HPC simulation
Attraction of cloud (from my perspective)
• The big attraction is the “model” of (i) fire up a notebook or desktop, (ii) authenticate, (iii) select an image and begin running a simulation, all with low barriers to entry.
• My first interest in cloud originated in experience with Amazon EC2 roughly two years ago.
• For the application developer, the Amazon approach to the cloud (Xen-based virtual machines) provides a powerful system for “packaging” environments. Does it do away with “installation” guides?
• For the user, it has the potential to evolve into an on-demand, responsive, “feels like it's on my desktop” capability.
• For both users and developers, somebody else deals with the capital funding and with managing the infrastructure (space, power, equipment). Users and developers just rent, which is good for occasional (but significant) use, although not economical as a full 24x7 cluster replacement.
[Figure: installation guide pages titled “Installing OpenAD”, “Installing ESMF”, “Installing MITgcm”]
Today's talk
• Examples of using the Amazon EC2 “cloud” system for atmosphere/ocean applications.
• Efforts to develop and demonstrate applications/middleware for
  • deploying models rooted in state-of-the-art HPC geoscience on the cloud
  • employing models on the cloud in the classroom
• Describe tools and approaches, and some performance results (which are not great, but also not catastrophic!).
• Feel free to ask questions (of course).
EC2 for atmosphere/ocean
• First experiments looked at a system for running a coarse-resolution atmosphere/ocean experiment.
• EC2 is interesting for ensembles of experiments (generating distributions, PDFs).
• Testbed for exploring an ocean-atmosphere (OA) cloud app that uses MPI (6-12 processes), with some user interaction from the desktop, access to output, etc.
• Testbed for exploring packaging as an image.
EC2 for atmosphere/ocean
• Numerical experiment description
  • 3d atmosphere and ocean
  • Idealized geometry
  • Vehicle for testing theories in isolation
  • ~3.5 degree resolution (cube-sphere grid), which is very coarse
  • Experiments often involve a long spin-up (thousands of years of simulation) plus perturbations around the equilibrated state
• Computational
  • Separate parallel atmosphere and ocean executables plus a coordinating coupler
  • Our tests used 13 processes, 6+6+1, MPMD style
  • Same core code engine (http://mitgcm.org) as many other apps
[Figure credit: Ferreira]
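A minimal sketch of how a 6+6+1 MPMD layout like the one above could be expressed as a launch command, using the colon-separated `mpirun` syntax. The executable names (`./atmosphere`, `./ocean`, `./coupler`) and the helper `mpmd_command` are hypothetical and are not taken from the talk; the actual launch procedure used in these tests is not described on the slide.

```python
from typing import List, Tuple


def mpmd_command(components: List[Tuple[int, str]],
                 launcher: str = "mpirun") -> List[str]:
    """Build an MPMD launch command.

    Each component is (process_count, executable). Components are joined
    with ':' so that one launcher starts all of them in a single MPI job.
    """
    if not components:
        raise ValueError("need at least one component")
    argv = [launcher]
    for index, (nprocs, exe) in enumerate(components):
        if nprocs < 1:
            raise ValueError(f"process count must be positive, got {nprocs}")
        if index > 0:
            argv.append(":")  # separator between MPMD components
        argv += ["-np", str(nprocs), exe]
    return argv


# Hypothetical executable names; only the 6+6+1 split comes from the slide.
layout = [(6, "./atmosphere"), (6, "./ocean"), (1, "./coupler")]
cmd = mpmd_command(layout)
total = sum(nprocs for nprocs, _ in layout)

print(" ".join(cmd))
print("total processes:", total)
```

The command is built as an argument list rather than a single string so that it could be handed to `subprocess.run` without shell quoting issues.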
EC2 for atmosphere/ocean
• Ingredients for the EC2 test (there are multiple options for many of these):
  • A way to bring up an EC2 “cluster” suitable for MPI and a fluid app
  • A shared file system. EC2 nodes don't know about each other by default.
  • Various libraries: MPI, netCDF, etc.
  • A way to launch runs and get feedback
• Comparing ways to do each of these: http://www.cca08.org
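Because the nodes do not know about each other by default, something has to tell the MPI launcher which hosts exist. A minimal sketch, assuming an Open MPI style hostfile (`host slots=N`) and assuming the private addresses of the running instances have already been collected by some other means. The addresses below are made-up placeholders and `hostfile_text` is a hypothetical helper, not a tool from the talk.

```python
from typing import Iterable


def hostfile_text(private_ips: Iterable[str], slots_per_node: int) -> str:
    """Return hostfile contents with one 'address slots=N' line per node."""
    if slots_per_node < 1:
        raise ValueError("slots_per_node must be positive")
    lines = [f"{ip} slots={slots_per_node}" for ip in private_ips]
    return "\n".join(lines) + "\n"


# Placeholder addresses standing in for whatever the instances report.
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
text = hostfile_text(nodes, slots_per_node=1)
print(text, end="")
```

The resulting text could be written to a file on the shared file system and passed to the launcher with its hostfile option.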
EC2 for atmosphere/ocean
• Compute performance
  • Instance types tested: standard (m1.small, m1.large, m1.xlarge) and high-CPU (c1.medium, c1.xlarge)
  • Memory bandwidth (STREAM benchmark) compared across instance types and against a physical machine (figure, right)
  • m1 instances are a better deal than c1 for memory-bound codes
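For readers unfamiliar with STREAM, the sketch below shows how a triad bandwidth figure is derived: the kernel `a[i] = b[i] + q*c[i]` moves three 8-byte words per element, and the rate is reported in MB/s with 1 MB = 10^6 bytes. This is only an illustration of the arithmetic. A pure-Python loop mostly measures interpreter overhead, so the number it prints is not comparable to results from the real C/Fortran benchmark, and nothing here reproduces the measurements in the talk.

```python
import time


def triad(a, b, c, q):
    """STREAM-style triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_mb_per_s(n, seconds, bytes_per_word=8):
    """Bandwidth for one triad pass: two reads and one write per element."""
    return 3 * bytes_per_word * n / seconds / 1e6


n = 100_000  # illustrative size only; real STREAM runs use much larger arrays
a = [0.0] * n
b = [1.0] * n
c = [2.0] * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(f"triad rate: {triad_mb_per_s(n, elapsed):.1f} MB/s (interpreter-bound)")
```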
EC2 for atmosphere/ocean
• MPI performance
  • Examining different MPI implementations on EC2
  • Open MPI is a bit of an outlier (due to spin-waiting rather than yielding?)
EC2 for atmosphere/ocean
• File systems
  • Examining different instances
  • Fortran I/O over NFS (NAS NPB 3.3 BT-IO) benefits from a more powerful node, e.g. c1.medium > m1.small
EC2 for atmosphere/ocean
• User interaction
  • For some of the target audience the command line is not engaging.
  • sshfs and an XML-driven desktop client are used to
    • expose the EC2 cluster file system on the user's desktop
    • configure and launch experiments
[Figure: a desktop user connects over SSHFS to an image instantiated in the cloud (EC2, Eucalyptus, Nimbus); the GUI is driven by an XML description included in the image]
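A minimal sketch of the sshfs step, assuming the standard `sshfs [user@]host:[dir] mountpoint` form with ssh options passed through `-o`. The user name, host name, paths and key file are hypothetical placeholders, and `sshfs_mount_command` is not part of the desktop client described in the talk.

```python
from typing import List, Optional


def sshfs_mount_command(user: str, host: str, remote_dir: str,
                        mount_point: str,
                        identity_file: Optional[str] = None) -> List[str]:
    """Build the argument list that mounts remote_dir at mount_point."""
    argv = ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]
    if identity_file is not None:
        # Passed through to ssh so the instance key pair can be used.
        argv += ["-o", f"IdentityFile={identity_file}"]
    return argv


# All values below are placeholders for illustration.
mount_cmd = sshfs_mount_command(
    user="student",
    host="head-node.example.com",
    remote_dir="/shared/runs",
    mount_point="/home/student/cloud-runs",
    identity_file="/home/student/.ssh/ec2-key.pem",
)
print(" ".join(mount_cmd))
```

Once mounted, model output on the cluster appears as ordinary local files, which is what gives the “on my desktop” feel.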
EC2 for atmosphere/ocean
• Summary of this part
  • Cost effectiveness will be a function of getting the right combination of instance, library and cluster configuration.
  • The sweet spot for this app involves a mix of m1.small (compute) and c1.medium (head node and file server).
  • The instance attributes are all virtual, so they could change unexpectedly.
  • Absolute performance is below that of a gigabit Ethernet cluster, but not by an order of magnitude.
  • The added convenience of sshfs-based interaction provides a nice “on my desktop” quality (similar to other “engaging supercomputing” projects: LLGrid, Interactive Supercomputing).
EC2 for atmosphere/ocean
Cost analysis as of 2008
EC2 for atmosphere/ocean – next steps
• Application to an ensemble ocean state estimation problem (with Constantinos Evangelinos and Pierre Lermusiaux).
• An interesting example of an on-demand, “sporadic” ensemble facility requirement.
• Involves large (600+ member) ensembles of ocean model runs, running in real time, but for this project only for a few days, once per year (to coincide with field deployments).
• Preliminary analysis suggests EC2 can provide the needed compute power and is arguably cost-effective.
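The kind of arithmetic behind such a preliminary analysis can be sketched as follows. Only the ensemble size (600 members) comes from the slide. The instance count per member, run length and hourly price are placeholder inputs, not figures from the talk, and the sketch assumes usage is billed per started instance-hour.

```python
import math


def ensemble_cost(members, instances_per_member, wall_hours_per_run,
                  price_per_instance_hour, runs_per_member=1):
    """Return (instance_hours, cost) for a sporadic ensemble campaign.

    Assumes each run is billed in whole instance-hours, rounded up.
    """
    billed_hours = math.ceil(wall_hours_per_run)
    instance_hours = (members * instances_per_member
                      * billed_hours * runs_per_member)
    return instance_hours, instance_hours * price_per_instance_hour


# Placeholder inputs except for the 600-member ensemble size.
hours, cost = ensemble_cost(
    members=600,
    instances_per_member=1,
    wall_hours_per_run=2.5,        # placeholder run length
    price_per_instance_hour=0.10,  # placeholder price, not a quoted rate
)
print(f"{hours} instance-hours, estimated cost {cost:.2f}")
```

Comparing this figure with the amortized cost of owning hardware that sits idle for the rest of the year is what makes the rental model attractive for a few-days-per-year workload.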
Toward a more general application – targeted for classroom
• MIT OCW and others have put static course content online. This scales to >100 million users and is truly global.
• How to include state-of-the-art research models in courses?
• The LEAD portal has demonstrated a vision of putting dynamic modeling online.
• How to scale dynamic, online modeling to millions of users?
• An NSF STCI project is exploring how cloud computing might play a role in addressing these goals.
Toward a more general application – targeted for classroom
• We have a starting point, but some parts are not general (the setup is specific to MITgcm) and viz support is not provided.
• Next steps: target some basic and high-end simulation in MIT courses.
• 12.804: a mix of Octave, Fortran and C code modules is used. Only 1d and 2d problems can be done (due to compute resources), while the real-world and lab cases in the course are 3d (obviously).
• Goal: use EC2/cloud to support the course and extend it to 3d. Use it as a test platform for a more general toolkit.
Toward a more general application – targeted for classroom
First experiments
1 – Extending the communication channel to include VNC.
2 – Exploring the use of HTTP and JavaScript for the front end.
3 – Formalizing the boundary between the educationalist-provided simulation module and the support layer.
4 – Extending to support different modes of use:
  • Level 0: all cloud. Minimal software on the client.
  • Level 1: cloud compute, viz on the client.
  • Level 2: cloud serves, but the client can compute.
[Figure: browser interface + VNC]
Toward a more general application – targeted for classroom
In theory the Level 0 design supports
1 – Any scriptable viz (IDV, Octave + gnuplot, etc.). Interactive viz can be done, but the response will be painfully slow.
2 – Any modern browser (needs Java + SSH). Once these are in place, the user just points at a URL in the cloud.
3 – Many/most researcher simulation codes (assuming any legal issues are resolved).
Level 1 design
1 – May allow better support of more interactive viz.
Tests so far are good. It is working with a few test students. Aesthetics need some work! The potential to exploit large systems in the classroom is promising.
[Figure: browser interface + VNC]
Questions (and answers)?
• TeraGrid should add node(s) for cloud-based “engaging” supercomputing of the sort envisaged in this talk, in projects like LEAD, and in other talks at this meeting. It is still cheaper to build than to buy.
• Geosciences is a great discipline to be a partner in this.
• Symbiotic developments in (i) high-speed networks, (ii) virtualization and (iii) cloud technologies mean that this is an opportune moment to do this.
• And, if anyone is interested, MIT has an economically worthwhile, ARRA-appropriate, shovel-ready, energy- and carbon-efficient brown-field site plan that we (I) think is perfect for some part of such an activity!