Developing HPC Scientific and Engineering Applications: From the Laptop to the Grid

Gabrielle Allen, Tom Goodale, Thomas Radke, Ed Seidel (Max Planck Institute for Gravitational Physics, Germany) • John Shalf (Lawrence Berkeley Laboratory, USA) • These slides: http://www.cactuscode.org/Tutorials.html

Outline for the Day • Introduction (Ed Seidel, 30 min) • Issues for HPC (John Shalf, 60 min) • Cactus Code (Gabrielle Allen, 90 min) • Demo: Cactus, IO and Viz (John, 15 min) • LUNCH • Introduction to Grid Computing (Ed, 15 min) • Grid Scenarios for Applications (Ed, 60 min) • Demo: Grid Tools (John, 15 min) • Developing Grid Applications Today (Tom Goodale, 60 min) • Conclusions (Ed, 5 min) www.cactuscode.org

Introduction

Outline • Review of application domains requiring HPC • Access and availability of computing resources • Requirements from end users • Requirements from application developers • The future of HPC

What Do We Want to Achieve? • Overview of HPC Applications and Techniques • Strategies for developing HPC applications to be: • Portable: from Laptop to Grid • Future proof • Grid ready • Introduce Frameworks for HPC Application development • Introduce the Grid: What is/isn’t it? What will be? • Grid Toolkits: How to prepare/develop apps for Grid, today & tomorrow • What are we NOT doing? • Application specific algorithms • Parallel programming • Optimizing Fortran, etc

Who uses HPC? • Scientists and Engineers • Simulating Nature: Black Hole Collisions, Hurricanes, Ground water flow • Modeling processes: space shuttle entering atmosphere • Analyzing data: lots of it! • Financial Markets • Modeling currencies • Industry • Airlines, insurance companies • Transaction, data, etc • All face similar problems • Computational need not met • Remote facilities • Heterogeneous and changing systems • Look now at three types: High-Capacity, Throughput, Data Computing

High Capacity Computing: Want to Compute What Happens in Nature! [Figure labels: Teraflop Computation, AMR, Elliptic-Hyperbolic, ???; Perturbative Numerical Relativity]

Computation Needs: 3D Numerical Relativity [Figure labels: t=0, t=100] • Get physicists + CS people together • Find Resource (TByte, TFlop crucial) • Initial Data: 4 coupled nonlinear elliptic equations • Choose Gauge (elliptic/hyperbolic…) • Evolution • “hyperbolic” evolution • coupled with elliptic equations • Find Resource …. • Analysis: Interpret, find apparent horizons (AH), etc

How to Achieve This? Any Such Computation Requires Incredible Mix of Varied Technologies and Expertise! • Many Scientific/Engineering Components • Physics, astrophysics, CFD, engineering,... • Many Numerical Algorithm Components • Finite difference methods? Finite elements? • Elliptic equations: multigrid, Krylov subspace, preconditioners,... • Mesh Refinement? • Many Different Computational Components • Parallelism (HPF, MPI, PVM, ???) • Architecture Efficiency (MPP, DSM, Vector, PC Clusters, ???) • I/O Bottlenecks (generate gigabytes per simulation, checkpointing…) • Visualization of all that comes out! • Scientist/engineer wants to focus on the top of this list, but all of it is required for results... • Such work cuts across many disciplines, areas of CS… • And now do it on a Grid??!!

High Throughput Computing: Task farming • Running hundreds to millions (or more) of jobs as quickly as possible • Collecting statistics, doing ensemble calculations, surveying large parameter space, etc • Typical Characteristics • Many small, independent jobs: must be managed! • Usually not much data transfer • Sometimes jobs can be moved from site to site • Example Problems: climatemodeling.com, NUG30 • Example Solutions: Condor, SC02 demos, etc • Later: examples that combine “capacity” and “throughput”
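The task-farming pattern described on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how Condor or any other system named above works: `run_job` and `farm` are hypothetical names, the "simulation" is a stand-in loop, and a local thread pool plays the role of the farm manager. A real farm would dispatch jobs to separate processes or machines through a batch system or scheduler.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(damping):
    """Hypothetical stand-in for one small, independent simulation."""
    x = 1.0
    for _ in range(100):
        x *= damping
    return x


def farm(parameters, max_workers=4):
    """Run one job per parameter value and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each job is independent: no data is exchanged between jobs.
        futures = {pool.submit(run_job, p): p for p in parameters}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Survey a small parameter space; order of completion does not matter.
    survey = farm([0.90, 0.95, 0.99, 1.00])
    for p in sorted(survey):
        print(p, survey[p])
```

The point of the sketch is the shape of the problem: the manager only hands out parameters and gathers results, so the jobs can run anywhere, in any order.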

Large Data Computing • Data: more and more the “killer app” for the Grid • Data mining: • Looking for patterns in huge databases distributed over the world • E.g. Genome analysis • Data analysis: • Large astronomical observatories • Particle physics experiments • Huge amounts of data from different locations to be correlated, studied • Data generation • As resources grow, huge simulations will each generate TB-PB to be studied • Visualization • How to visualize such large data: here, at a distance, distributed • Soon: Dynamic combinations of all types of computing and data, on grids • Our Goal is to give strategies for dealing with all types of computing

Grand Challenge Collaborations. Going Large Scale: Needs Dwarf Capabilities • NASA Neutron Star Grand Challenge • 5 US Institutions • Solve problem of colliding neutron stars (try…) • NSF Black Hole Grand Challenge • 8 US Institutions, 5 years • Solve problem of colliding BH (try…) • EU Network Astrophysics • 10 EU Institutions, 3 years, €1.5M • Continue these problems • Entire Community becoming Grid enabled • Examples of Future of Science & Engineering • Require Large Scale Simulations, beyond reach of any machine • Require Large Geo-distributed Cross-Disciplinary Collaborations • Require Grid Technologies, but not yet using them! • Both Apps and Grids Dynamic…

Growth of Computing Resources (from Dongarra)

Not just Growth, Proliferation • Systems getting larger by 2-3-4x per year! • Moore’s law (processor performance doubles about every 18 months) • Increasing parallelism: add more and more processors • More systems • Many more organizations recognizing need for HPC • Universities • Labs • Industry • Business • New kind of parallelism: Grid • Harness these machines, which themselves are growing • Machines all different! Be prepared for next thing…
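A quick piece of arithmetic shows why the slide lists increasing parallelism next to Moore's law. Doubling every 18 months is only about 1.6x per year, so a system that grows 2-4x per year must make up the difference with more processors. The sketch below assumes that system growth is simply the product of per-processor gain and processor-count growth; the function names are illustrative only.

```python
def annual_factor(doubling_months):
    """Growth over 12 months for a quantity that doubles every doubling_months."""
    return 2.0 ** (12.0 / doubling_months)


def implied_parallelism_growth(system_growth_per_year, doubling_months=18.0):
    """Yearly growth in processor count needed on top of per-processor gains.

    Assumes system growth = (per-processor gain) * (processor-count growth).
    """
    return system_growth_per_year / annual_factor(doubling_months)


if __name__ == "__main__":
    print("per-processor gain per year:", round(annual_factor(18), 2))
    for growth in (2, 3, 4):
        print(growth, "x system growth needs",
              round(implied_parallelism_growth(growth), 2),
              "x more processors per year")
```

Under this assumption, a system growing 4x per year has to increase its processor count by roughly 2.5x per year.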

Today’s Computational Resources • PDAs • Laptops • PCs • SMPs • Shared memory up to now • Clusters • Distributed memory, must use message passing or task farming • “Traditional” supercomputers • SMPs of up to ~64+ processors • Clustering above this • Vectors • Clusters of large systems: metacomputing • The Grid • Everyone: uses PDAs to PCs • Industry: prefers traditional machines • Academia: clusters for price/perf • We show how to minimize effort to go between systems, prepare for Grid

The Same Application … [Figure: an Application layer over a Middleware layer, repeated for Laptop, Super Computer, and The Grid; annotations: “No network!”, “Biggest machines!”]

What is Difficult About HPC? • Many different architectures and operating systems • Things change very rapidly • Must worry about many things at the same time • Single processor performance, caches, etc • Different languages (but now, at least everything is (nearly) Unix!) • Parallelism • I/O • Visualization • Batch systems • Portability: compilers, datatypes and associated tools

Requirements of End Users • We have problems that need to be solved • Want to work at conceptual level • Build on top of other things that have been solved for us • Use libraries, modules, etc. • We don’t want to waste time with… • Learning a new parallel layer • Writing high performance I/O • Learning a new batch system, etc… • We have collaborators distributed all over the world • We want answers fast, on whatever machines are available • Basically, want to write simple Fortran or C code and have it work…

Requirements of Application Developers • We must have access to latest technologies • These should be available through simple interfaces and APIs • They should be interchangeable with each other when same functionality is available from different packages • Code we develop must be as portable and as future proof as possible • Run on all these architectures we have today • Easily adapted to those of tomorrow • If possible, top level user app code should not change, only layers underneath • We’ll give strategies for doing this, on today’s machines, and on the Grid of tomorrow
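One way to picture "interchangeable layers underneath unchanged application code" is a small interface with two swappable implementations. The sketch below is a generic illustration in Python, not the Cactus API: `OutputLayer`, `TextOutput`, `JsonOutput`, and `evolve_and_save` are hypothetical names, and the "physics" is a placeholder loop.

```python
from abc import ABC, abstractmethod
import io
import json


class OutputLayer(ABC):
    """Hypothetical simple interface that the application codes against."""

    @abstractmethod
    def write(self, name, values):
        """Store one named array of values."""


class TextOutput(OutputLayer):
    """One package providing the functionality: plain text lines."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, name, values):
        self.stream.write(name + " " + " ".join(str(v) for v in values) + "\n")


class JsonOutput(OutputLayer):
    """A different package providing the same functionality: JSON lines."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, name, values):
        self.stream.write(json.dumps({name: list(values)}) + "\n")


def evolve_and_save(output, steps=3):
    """Top-level application code: identical whichever layer is plugged in."""
    field = [1.0, 2.0, 4.0]
    for _ in range(steps):
        field = [0.5 * v for v in field]  # placeholder for the real physics
    output.write("field", field)
    return field


if __name__ == "__main__":
    for layer_class in (TextOutput, JsonOutput):
        buffer = io.StringIO()
        evolve_and_save(layer_class(buffer))
        print(layer_class.__name__, "->", buffer.getvalue().strip())
```

Swapping `TextOutput` for `JsonOutput` changes only the layer underneath; `evolve_and_save` is untouched, which is the property the slide asks for.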

Where is This All Going? • Dangerous to predict, but: • Resources will continue to grow for some time • Machines will get larger at this rate: TF now, PF tomorrow • Collecting resources into Grids is happening now, will be routine tomorrow • Very heterogeneous environments • Data explosion will be exponential • Mixture of realtime simulation and data analysis will become routine • Bandwidth from point to point will be allocatable on demand! • Applications will become very sophisticated, able to adapt to their changing needs, and to changing environment (on time scales of minutes to years) • We are trying today to help you prepare for this!
