Computational Science & Engineering Simulation at Large Scales
Forum ORAP, Paris, 27 March 2003
David E. Keyes
Center for Computational Science, Old Dominion University
& Institute for Scientific Computing Research, Lawrence Livermore National Laboratory
Remarks
• This talk is:
  • personal perspective, not an official statement of the U.S. Department of Energy
  • a project panorama more than a technical presentation
• For related technical presentations, see:
  • personal homepage on the web (www.math.odu.edu/~keyes)
  • project homepage on the web (www.tops-scidac.org)
• Presumed common (France/USA) interests in computational science:
  • common machines, common scientific challenges, common workforce development challenges
  • future collaboration in the ITER fusion experiment and other areas
Remarks, continued
• Recently, use of software developed in the US DOE has grown in France (and in other countries with large-scale parallel machines):
  • Portable Extensible Toolkit for Scientific Computing (PETSc)
  • High-performance Preconditioner library (Hypre)
• Historically, French/USA collaboration on parallel implicit solver research has been rich:
  • Bernardi, Garbey, Giraud, Glowinski, Lantéri, LeTallec, Maday, Meurant, Nataf, Périaux, Pironneau, Roux, Tidriri, Tromeur-Dervout, … and many others have influenced the development of codes we use today in the DOE
• Transfer of scientific ideas is more effective with people than with papers and software alone
Terascale simulation has been “sold”
[Figure: application areas arranged around “Scientific Simulation”: Applied Physics (radiation transport, supernovae), Lasers & Energy (combustion, ICF), Biology (drug design, genomics), Environment (global climate, contaminant transport), and Engineering (crash testing, aerodynamics), annotated with the reasons experiments alone do not suffice: prohibited or impossible, dangerous, difficult to instrument, controversial, or expensive (e.g., ITER: $20B).]
In these, and many other areas, simulation is an important complement to experiment.
However, simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.
• “Enabling technologies” groups to develop reusable software and partner with application groups
• Since start-up in 2001, 51 projects share $57M per year:
  • approximately one-third for applications
  • a third for “integrated software infrastructure centers”
  • a third for grid infrastructure and collaboratories
• Plus, two new 5 Tflop/s IBM SP machines available for SciDAC researchers
SciDAC project characteristics
• Affirmation of importance of simulation
  • for new scientific discovery, not just for “fitting” experiments
• Recognition that leading-edge simulation is interdisciplinary
  • no support for physicists and chemists to write their own software infrastructure; must collaborate with math & CS experts
• Commitment to distributed hierarchical memory computers
  • new code must target this architecture type
• Requirement of lab-university collaborations
  • complementary strengths in simulation
  • 13 laboratories and 50 universities in first round of projects
Major DOE labs
[Map: DOE Science Labs (Pacific Northwest, Brookhaven, Argonne, Lawrence Berkeley, Oak Ridge) and DOE Defense Labs (Lawrence Livermore, Sandia Livermore, Los Alamos, Sandia), with Old Dominion University and Columbia University also marked.]
Large platforms provided for ASCI
• ASCI roadmap is to go to 100 Teraflop/s by 2006
• Use variety of vendors: Compaq, Cray, Intel, IBM, SGI
• Rely on commodity processor/memory units, with tightly coupled network
• Massive software project to rewrite physics codes for distributed shared memory
…and now for SciDAC
Berkeley: IBM Power3+ SMP
• 16 procs per node
• 208 nodes
• 24 Gflop/s per node
• 5 Tflop/s (upgraded to 10, Feb 2003)
Oak Ridge: IBM Power4 Regatta
• 32 procs per node
• 24 nodes
• 166 Gflop/s per node
• 4 Tflop/s (10 in 2003)
New architecture on horizon: QCDOC
• System-on-a-chip architecture
• Designed for Columbia University and Brookhaven National Lab by IBM using Power technology
• Special-purpose machine for Lattice Gauge Theory Quantum Chromodynamics
  • “very fast conjugate gradient machine with small local memory”
• 10 Tflop/s total; copies ordered for UK and Japan QCD research groups
To be delivered August 2003
New architecture on horizon: Blue Gene/L
• 180 Tflop/s configuration (65,536 dual-processor chips)
• Closely related to QCDOC prototype (IBM system-on-a-chip)
• Ordered for LLNL institutional computing (not ASCI)
To be delivered 2004
New architecture just arrived: Cray X1
• Massively parallel-vector machine highly desired by global climate simulation community
• 32-processor prototype ordered for evaluation
• Scale-up to 100 Tflop/s peak planned, if prototype proves successful
Delivered to ORNL 18 March 2003
“Boundary conditions” from architecture
Algorithms must run on physically distributed memory units connected by a message-passing network, each serving one or more processors with multiple levels of cache.
• “horizontal” aspects: network latency, BW, diameter
• “vertical” aspects: memory latency, BW; load/store (cache/register) BW
Following the platforms …
• … algorithms must be
  • highly concurrent and straightforward to load balance
  • not communication bound
  • cache friendly (temporal and spatial locality of reference)
  • highly scalable (in the sense of convergence)
• Goal for algorithmic scalability: fill up memory of arbitrarily large machines while preserving nearly constant* running times with respect to a proportionally smaller problem on one processor (see the formalization below)
*logarithmically growing
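One way to state the footnoted goal symbolically (a minimal formalization; the symbols T, N₀, and C are introduced here for illustration, not taken from the talk): if N₀ is the problem size that fills the memory of one processor and T(N, P) is the running time for a problem of size N on P processors, then weak scalability asks that

```latex
% Weak-scaling goal: the P-times-larger problem on P processors should run
% in nearly the time of the base problem on one processor, up to the
% logarithmic allowance of the footnote.
T(P \cdot N_0,\; P) \;\le\; C\,(1 + \log P)\; T(N_0,\; 1)
```

with C a modest constant independent of P.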
Official SciDAC goals
• “Create a new generation of scientific simulation codes that take full advantage of the extraordinary computing capabilities of terascale computers.”
• “Create the mathematical and systems software to enable the scientific simulation codes to effectively and efficiently use terascale computers.”
• “Create a collaboratory software environment to enable geographically separated scientists to effectively work together as a team and to facilitate remote access to both facilities and data.”
Four science programs involved …
“14 projects will advance the science of climate simulation and prediction. These projects involve novel methods and computationally efficient approaches for simulating components of the climate system and work on an integrated climate model.”
“10 projects will address quantum chemistry and fluid dynamics, for modeling energy-related chemical transformations such as combustion, catalysis, and photochemical energy conversion. The goal of these projects is efficient computational algorithms to predict complex molecular structures and reaction rates with unprecedented accuracy.”
Four science programs involved …
“4 projects in high energy and nuclear physics will explore the fundamental processes of nature. The projects include the search for the explosion mechanism of core-collapse supernovae, development of a new generation of accelerator simulation codes, and simulations of quantum chromodynamics.”
“5 projects are focused on developing and improving the physics models needed for integrated simulations of plasma systems to advance fusion energy science. These projects will focus on such fundamental phenomena as electromagnetic wave-plasma interactions, plasma turbulence, and macroscopic stability of magnetically confined plasmas.”
SciDAC per year portfolio: $57M for Math, Information and Computer Sciences
Data grids and collaboratories
• National data grids
  • Particle physics grid
  • Earth system grid
  • Plasma physics for magnetic fusion
  • DOE Science Grid
• Middleware
  • Security and policy for group collaboration
  • Middleware technology for science portals
• Network research
  • Bandwidth estimation, measurement methodologies and application
  • Optimizing performance of distributed applications
  • Edge-based traffic processing
  • Enabling technology for wide-area data-intensive applications
Computer Science ISICs
• Scalable Systems Software. Provide software tools for management and utilization of terascale resources.
• High-end Computer System Performance: Science and Engineering. Develop a science of performance prediction based on concepts of program signatures, machine signatures, detailed profiling, and performance simulation, and apply it to complex DOE applications. Develop tools that assist users to engineer better performance.
• Scientific Data Management. Provide a framework for efficient management and data mining of large, heterogeneous, distributed data sets.
• Component Technology for Terascale Software. Develop software component technology for high-performance parallel scientific codes, promoting reuse and interoperability of complex software, and assist application groups to incorporate component technology into their high-value codes.
Applied Math ISICs
• Terascale Simulation Tools and Technologies. Develop a framework for use of multiple mesh and discretization strategies within a single PDE simulation. Focus on high-quality hybrid mesh generation for representing complex and evolving domains, high-order discretization techniques, and adaptive strategies for automatically optimizing a mesh to follow moving fronts or to capture important solution features.
• Algorithmic and Software Framework for Partial Differential Equations. Develop a framework for PDE simulation based on locally structured grid methods, including adaptive meshes for problems with multiple length scales; embedded boundary and overset grid methods for complex geometries; and efficient and accurate methods for particle and hybrid particle/mesh simulations.
• Terascale Optimal PDE Simulations. Develop an integrated toolkit of near-optimal-complexity solvers for nonlinear PDE simulations. Focus on multilevel methods for nonlinear PDEs, PDE-based eigenanalysis, and optimization of PDE-constrained systems. Packages sharing the same distributed data structures include: adaptive time integrators for stiff systems, nonlinear implicit solvers, optimization, linear solvers, and eigenanalysis.
Exciting time for enabling technologies
SciDAC application groups have been chartered to build new and improved COMMUNITY CODES. Such codes (NWCHEM, for example) consume hundreds of person-years of development, run at hundreds of installations, are given large fractions of community compute resources for decades, and acquire an “authority” that can enable or limit what is done and accepted as science in their respective communities. Except at the beginning, it is difficult to promote major algorithmic ideas in such codes, since change is expensive and sometimes resisted.
ISIC groups have a chance, due to the interdependence built into the SciDAC program structure, to simultaneously influence many of these codes, by delivering software incorporating optimal algorithms that may be reused across many applications. Improvements driven by one application will be available to all.
While they are building community codes, this is our chance to build a CODE COMMUNITY!
SciDAC themes
• Chance to do community codes “right”
• Meant to set a “new paradigm” for other DOE programs
  • new 2003 nanoscience modeling initiative
  • possible new 2004 fusion simulation initiative
• Cultural barriers to interdisciplinary research acknowledged up front
• Accountabilities constructed in order to force the mixing of “scientific cultures” (physicists/biologists/chemists/engineers with mathematicians/computer scientists)
What’s new in SciDAC library software?
• Philosophy of library usage
  • large codes interacting as peer applications, with complex calling patterns (e.g., physics code calls implicit solver code, which calls a subroutine automatically generated from the original physics code to supply the Jacobian of the physics code residual)
  • extensibility
  • polyalgorithmic adaptivity
• Resources for development, long-term maintenance, and support
  • not just for “dissertation scope” ideas
• Experience on terascale computers
Introducing “Terascale Optimal PDE Simulations” (TOPS) ISIC Nine institutions, $17M, five years, 24 co-PIs
[Diagram: SciDAC program structure. 34 application groups (BER, BES, FES, HENP) are supported by 7 ISIC groups (4 CS, 3 Math) supplying adaptive gridding and discretization, solvers, systems software, component architecture, performance engineering, and data management, and by 10 grid and data collaboratory groups, connected through software integration and performance optimization.]
Who we are… … the PETSc and TAO people … the Hypre and Sundials people … the SuperLU and PARPACK people … as well as the builders of other widely used packages …
Plus some university collaborators: Demmel et al., Manteuffel et al., Dongarra et al., Widlund et al., Ghattas et al., Keyes et al.
Our DOE lab collaborations predate SciDAC by many years.
You may know the on-line “Templates” guides …
• www.netlib.org/templates (124 pp.)
• www.netlib.org/etemplates (410 pp.)
… these are good starts, but not adequate for SciDAC scales! SciDAC puts some of the authors (and many others) “on-line” for physics groups.
Scope for TOPS
[Diagram: solver components and their dependences: optimizer, sensitivity analyzer, time integrator, nonlinear solver, eigensolver, linear solver.]
• Design and implementation of “solvers”
  • Time integrators (w/ sens. anal.)
  • Nonlinear solvers (w/ sens. anal.)
  • Optimizers
  • Linear solvers
  • Eigensolvers
• Software integration
• Performance optimization
The power of optimal algorithms
• Advances in algorithmic efficiency rival advances in hardware architecture
• Consider Poisson’s equation, ∇²u = f, on a cube of size N = n³ (e.g., a 64 × 64 × 64 grid)
• If n = 64, this implies an overall reduction in flops of ~16 million*
*On a 16 Mflop/s machine, six months is reduced to 1 s
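A quick check of the arithmetic behind that factor, assuming the comparison usually drawn for this example, namely banded Gaussian elimination at O(n⁷) flops versus full multigrid at O(n³) on an n × n × n grid:

```latex
% Assumed comparison: banded Gaussian elimination, O(n^7) flops, vs.
% full multigrid, O(n^3) flops, on an n x n x n grid with n = 64.
\frac{n^{7}}{n^{3}} = n^{4} = 64^{4} = 16{,}777{,}216 \approx 16.8 \text{ million}
```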
Algorithms and Moore’s Law
[Plot: relative speedup vs. year.]
• This advance took place over a span of about 36 years, or 24 doubling times for Moore’s Law
• 2²⁴ ≈ 16 million: the same as the factor from algorithms alone!
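The arithmetic behind that equivalence, assuming the customary 18-month doubling time for Moore’s Law:

```latex
% 36 years of hardware progress at one doubling every 1.5 years:
\frac{36\ \text{years}}{1.5\ \text{years/doubling}} = 24\ \text{doublings},
\qquad 2^{24} = 16{,}777{,}216 \approx 16\ \text{million}.
```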
The power of optimal algorithms
• Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency, but one must extend optimality “outward”, to more general problems
• Hence, for instance, algebraic multigrid (AMG), obtaining O(N) in anisotropic, inhomogeneous problems
[AMG framework: error damped by pointwise relaxation leaves algebraically smooth error; choose coarse grids, transfer operators, etc. to eliminate it, based on numerical weights and heuristics.]
SciDAC application: Center for Extended Magnetohydrodynamic Modeling (CEMM)
Simulate plasmas in tokamaks, leading to understanding of plasma instability and (ultimately) new energy sources.
Joint work between ODU, Argonne, LLNL, and PPPL.
Optimal solvers
• Convergence rate nearly independent of discretization parameters
• Multilevel schemes for linear and nonlinear problems
• Newton-like schemes for quadratic convergence of nonlinear problems (see the note below)
[Plots: iteration count vs. number of processors, and time to solution vs. problem size (increasing with the number of processors), contrasting scalable and unscalable behavior.]
AMG shows perfect iteration scaling (above), in contrast to ASM, but still needs performance work to achieve temporal scaling (below) on the CEMM fusion code M3D, though time is halved (or better) for large runs (all runs: 4K dofs per processor).
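For reference, the “quadratic convergence” of the Newton-like schemes means that, once the iterate is close enough to the solution, the error norm is roughly squared at each outer iteration (e_k denotes the error at iteration k; the notation is introduced here for illustration, not taken from the talk):

```latex
% Quadratic convergence: each outer iteration roughly doubles the number
% of correct digits once the iterate is near the solution.
\|e_{k+1}\| \;\le\; C\,\|e_k\|^{2}
```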
Solver interoperability accomplishments
• Hypre in PETSc
  • codes with a PETSc interface (like CEMM’s M3D) can invoke Hypre routines as solvers or preconditioners with a command-line switch (see the sketch below)
• SuperLU_DIST in PETSc
  • as above, with SuperLU_DIST
• Hypre in the AMR Chombo code
  • so far, Hypre is a level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also, FAC is being developed for AMR uses, like Chombo
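To make the command-line-switch idea concrete, here is a minimal sketch (not taken from the talk) of how a PETSc-based linear solve becomes runtime-switchable to Hypre’s BoomerAMG or to SuperLU_DIST. Call and option names follow recent PETSc releases and may differ from the 2003-era interface (for example, the SLES layer mentioned later was subsequently folded into KSP); the function name solve_with_runtime_solver is illustrative only.

```c
/* Minimal sketch: runtime-selectable linear solvers through PETSc.
 * Call and option names follow recent PETSc releases (assumed), not the
 * 2003-era API described in the talk. */
#include <petscksp.h>

PetscErrorCode solve_with_runtime_solver(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);

  /* All solver choices come from the command line, e.g.
   *   -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg
   * for Hypre's algebraic multigrid as the preconditioner, or
   *   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist
   * for a parallel sparse direct solve with SuperLU_DIST. */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);

  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  return 0;
}
```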
Background of the PETSc Library
• Developed at Argonne to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by four additional staff under SciDAC
• Distributed data structures as fundamental objects: index sets, vectors/gridfunctions, and matrices/arrays (see the sketch below)
• Iterative linear and nonlinear solvers, combinable modularly, recursively, and extensibly
• Portable, and callable from C, C++, Fortran
• Uniform high-level API, with multi-layered entry
• Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter)
See http://www.mcs.anl.gov/petsc
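A minimal sketch (again not from the talk) of the “distributed data structures as fundamental objects” point: assembling a parallel vector and sparse matrix whose row ownership is managed by PETSc. The helper name assemble_identity and its trivial identity-matrix content are illustrative only; call names follow recent PETSc releases.

```c
/* Sketch: distributed Vec and Mat as first-class objects.  Each process
 * fills only the rows it owns; PETSc handles the data distribution. */
#include <petscmat.h>

PetscErrorCode assemble_identity(PetscInt n, Mat *A, Vec *x)
{
  PetscErrorCode ierr;
  PetscInt       i, Istart, Iend;
  PetscScalar    one = 1.0;

  /* Parallel vector of global length n; PETSc chooses the local sizes. */
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, x); CHKERRQ(ierr);
  ierr = VecSet(*x, one); CHKERRQ(ierr);

  /* Parallel sparse (AIJ) matrix with the same row distribution. */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
                      1, NULL, 0, NULL, A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(*A, &Istart, &Iend); CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {           /* locally owned rows only */
    ierr = MatSetValues(*A, 1, &i, 1, &i, &one, INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  return 0;
}
```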
User Code/PETSc Library Interactions
[Diagram: the user’s main routine calls PETSc timestepping solvers (TS), nonlinear solvers (SNES), and linear solvers (SLES, comprising KSP and PC); user code supplies application initialization, function evaluation, Jacobian evaluation, and post-processing. In a second view, the Jacobian evaluation is marked “to be AD code”, i.e., to be generated by automatic differentiation.]
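The division of labor in the diagram, where user code supplies function and Jacobian evaluations while PETSc drives the nonlinear solve, looks roughly like the following sketch (not from the talk). The names FormFunction, FormJacobian, and AppCtx are illustrative, and the call signatures follow recent PETSc releases rather than the 2003-era API.

```c
/* Sketch of the user-code / PETSc split in the diagram: the application
 * registers residual and Jacobian callbacks with SNES.  The Jacobian
 * routine is exactly the piece that could instead be generated by
 * automatic differentiation ("to be AD code"). */
#include <petscsnes.h>

typedef struct { PetscReal param; } AppCtx;   /* application-specific data */

/* Residual (function) evaluation: user-supplied physics. */
extern PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx);
/* Jacobian evaluation: hand-coded, or AD-generated from FormFunction. */
extern PetscErrorCode FormJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx);

PetscErrorCode solve_nonlinear(Mat J, Vec x, Vec r, AppCtx *user)
{
  SNES           snes;
  PetscErrorCode ierr;

  ierr = SNESCreate(PETSC_COMM_WORLD, &snes); CHKERRQ(ierr);
  ierr = SNESSetFunction(snes, r, FormFunction, user); CHKERRQ(ierr);
  ierr = SNESSetJacobian(snes, J, J, FormJacobian, user); CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes); CHKERRQ(ierr);  /* KSP/PC options apply here too */
  ierr = SNESSolve(snes, NULL, x); CHKERRQ(ierr);
  ierr = SNESDestroy(&snes); CHKERRQ(ierr);
  return 0;
}
```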
Background of the Hypre Library (to be combined with PETSc under SciDAC)
• Developed at Livermore to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by seven additional staff under ASCI and SciDAC
• Object-oriented design similar to PETSc
• Concentrates on linear problems only
• Richer in preconditioners than PETSc, with focus on algebraic multigrid
• Also includes other preconditioners, such as sparse approximate inverse (ParaSails) and parallel ILU (Euclid)
See http://www.llnl.gov/CASC/hypre/