Scientific Discovery through Advanced Computing (SciDAC). The Pennsylvania State University 28 April 2003 David E. Keyes Center for Computational Science Old Dominion University & Institute for Scientific Computing Research Lawrence Livermore National Laboratory. Happy Gödel’s Birthday!
Scientific Discovery through Advanced Computing (SciDAC) The Pennsylvania State University 28 April 2003 David E. Keyes Center for Computational Science Old Dominion University & Institute for Scientific Computing Research Lawrence Livermore National Laboratory
Happy Gödel’s Birthday! • Born: 28 April 1906, Brünn, Austria-Hungary • Published “Incompleteness Theorem”, 1931 • Fellow, Royal Society, 1968 • National Medal of Science, 1974 • Died: 14 January 1978, Princeton, NJ • “Gave a formal demonstration of the inadequacy of formal demonstrations” - anon. • “A consistency proof for any system … can be carried out only by modes of inference that are not formalized in the system … itself.” – Kurt Gödel
Remarks • This talk is: • a personal perspective, not an official statement of the U.S. Department of Energy • a project panorama more than a technical presentation • For related technical presentations: • Tuesday 2:30pm, 116 McAllister Building • personal homepage on the web (www.math.odu.edu/~keyes) • SciDAC project homepage on the web (www.tops-scidac.org)
Computational Science & Engineering • A “multidiscipline” on the verge of full bloom • Envisioned by von Neumann and others in the 1940s • Undergirded by theory (numerical analysis) for the past fifty years • Empowered by spectacular advances in computer architecture over the last twenty years • Enabled by powerful programming paradigms in the last decade • Adopted in industrial and government applications • Boeing 777’s computational design a renowned milestone • DOE NNSA’s “ASCI” (motivated by CTBT) • DOE SC’s “SciDAC” (motivated by Kyoto, etc.)
Niche for computational science • Has theoretical aspects (modeling) • Has experimental aspects (simulation) • Unifies theory and experiment by providing common immersive environment for interacting with multiple data sets of different sources • Provides “universal” tools, both hardware and software Telescopes are for astronomers, microarray analyzers are for biologists, spectrometers are for chemists, and accelerators are for physicists, but computers are for everyone! • Costs going down, capabilities going up every year
Terascale simulation has been “sold” • Scientific Simulation, by area: • Engineering: crash testing, aerodynamics • Lasers & Energy: combustion, ICF • Biology: drug design, genomics • Applied Physics: radiation transport, supernovae • Environment: global climate, contaminant transport • Experiments may be: controversial; dangerous; prohibited or impossible; difficult to instrument; expensive (ITER: $20B) • In these, and many other areas, simulation is an important complement to experiment.
Terascale simulation has been “sold” • Scientific Simulation in Engineering (crash testing, aerodynamics), Lasers & Energy (combustion, ICF), Biology (drug design, genomics), Applied Physics (radiation transport, supernovae), Environment (global climate, contaminant transport) • Experiments controversial, dangerous, prohibited or impossible, difficult to instrument, expensive • However, simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.
“Enabling technologies” groups to develop reusable software and partner with application groups • Since start-up in 2001, 51 projects share $57M per year • Approximately one-third for applications • A third for “integrated software infrastructure centers” • A third for grid infrastructure and collaboratories • Plus, two new ~10 Tflop/s IBM SP machines available for SciDAC researchers
SciDAC project characteristics • Affirmation of importance of simulation • for new scientific discovery, not just for “fitting” experiments • Recognition that leading-edge simulation is interdisciplinary • no independent support for physicists and chemists to write their own software infrastructure; must collaborate with math & CS experts • Commitment to distributed hierarchical memory computers • new code must target this architecture type • Requirement of lab-university collaborations • complementary strengths in simulation • 13 laboratories and 50 universities in first round of projects
Major DOE labs • [Map; legend: DOE Science Lab, DOE Defense Lab] • Labeled sites: Pacific Northwest, Brookhaven, Argonne, Lawrence Berkeley, Lawrence Livermore, Sandia Livermore, Los Alamos, Oak Ridge, Sandia • Also marked: Old Dominion University
Large platforms provided for ASCI • ASCI roadmap is to go to 100 Teraflop/s by 2006 • Use variety of vendors • Compaq • Cray • Intel • IBM • SGI • Rely on commodity processor/memory units, with tightly coupled network • Massive software project to rewrite physics codes for distributed shared memory
…and now for SciDAC • Berkeley: IBM Power3+ SMP • 16 procs per node • 208 nodes • 24 Gflop/s per node • 5 Tflop/s (upgraded to 10, Feb 2003) • Oak Ridge: IBM Power4 Regatta • 32 procs per node • 24 nodes • 166 Gflop/s per node • 4 Tflop/s (10 in 2003)
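The aggregate figures on this slide follow from nodes × per-node rate. A quick check, as a sketch only: the dictionary layout and the function name `peak_tflops` are illustrative and not from the talk, and each spec list is paired with the site name that follows it on the slide.

```python
# Peak rate = nodes * (Gflop/s per node); all figures are the ones quoted on the slide.
machines = {
    "Berkeley (IBM Power3+ SMP)": {"nodes": 208, "gflops_per_node": 24},
    "Oak Ridge (IBM Power4 Regatta)": {"nodes": 24, "gflops_per_node": 166},
}


def peak_tflops(nodes, gflops_per_node):
    """Aggregate peak in Tflop/s (1 Tflop/s = 1000 Gflop/s)."""
    return nodes * gflops_per_node / 1000.0


for name, spec in machines.items():
    print(f"{name}: {peak_tflops(**spec):.2f} Tflop/s peak")
```

Both products land just under the rounded 5 and 4 Tflop/s quoted for the initial configurations.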
New architecture on horizon: QCDOC • System-on-a-chip architecture • Designed for Columbia University and Brookhaven National Lab by IBM using Power technology • Special purpose machine for Lattice Gauge Theory Quantum Chromodynamics • “very fast conjugate gradient machine with small local memory” • 10 Tflop/s total, copies ordered for UK, Japan QCD research groups • To be delivered August 2003
New architecture on horizon: Blue Gene/L • 180 Tflop/s configuration (65536 dual processor chips) • Closely related to QCDOC prototype (IBM system-on-a-chip) • Ordered for LLNL institutional computing (not ASCI) • To be delivered 2004
New architecture just arrived: Cray X1 • Massively parallel-vector machine highly desired by global climate simulation community • 32-processor prototype ordered for evaluation • Scale-up to 100 Tflop/s peak planned, if prototype proves successful • Delivered to ORNL 18 March 2003
“Boundary conditions” from architecture • Algorithms must run on physically distributed memory units connected by message-passing network, each serving one or more processors with multiple levels of cache • “horizontal” aspects: network latency, BW, diameter • “vertical” aspects: memory latency, BW; L/S (cache/reg) BW
Following the platforms … • … Algorithms must be • highly concurrent and straightforward to load balance • not communication bound • cache friendly (temporal and spatial locality of reference) • highly scalable (in the sense of convergence) • Goal for algorithmic scalability: fill up memory of arbitrarily large machines while preserving nearly constant* running times with respect to proportionally smaller problem on one processor *logarithmically growing
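One way to read the scalability goal: hold the work per processor fixed, grow the machine, and compare run time with the one-processor run of the proportionally smaller problem. The sketch below assumes a purely illustrative cost model, T(p) = t1 · (1 + c · log2 p); the constants `t1` and `c` and both function names are made up for the example and are not measurements from any SciDAC code.

```python
import math


def weak_scaling_time(p, t1=1.0, c=0.05):
    """Hypothetical run time on p processors with fixed work per processor.

    Models the 'logarithmically growing' footnote: T(p) = t1 * (1 + c*log2(p)).
    t1 and c are illustrative constants, not measured values.
    """
    return t1 * (1.0 + c * math.log2(p))


def weak_scaling_efficiency(p, t1=1.0, c=0.05):
    """Run time on one processor divided by run time on p processors."""
    return weak_scaling_time(1, t1, c) / weak_scaling_time(p, t1, c)


for p in (1, 16, 256, 4096):
    print(p, round(weak_scaling_time(p), 3), round(weak_scaling_efficiency(p), 3))
```

Under this model the run time grows by a fixed increment each time the machine doubles, which is the sense in which "nearly constant" is meant.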
Official SciDAC goals • “Create a new generation of scientific simulation codes that take full advantage of the extraordinary computing capabilities of terascale computers.” • “Create the mathematical and systems software to enable the scientific simulation codes to effectively and efficiently use terascale computers.” • “Create a collaboratory software environment to enable geographically separated scientists to effectively work together as a team and to facilitate remote access to both facilities and data.”
Four science programs involved … “14 projects will advance the science of climate simulation and prediction. These projects involve novel methods and computationally efficient approaches for simulating components of the climate system and work on an integrated climate model.” “10 projects will address quantum chemistry and fluid dynamics, for modeling energy-related chemical transformations such as combustion, catalysis, and photochemical energy conversion. The goal of these projects is efficient computational algorithms to predict complex molecular structures and reaction rates with unprecedented accuracy.”
Four science programs involved … “4 projects in high energy and nuclear physics will explore the fundamental processes of nature. The projects include the search for the explosion mechanism of core-collapse supernovae, development of a new generation of accelerator simulation codes, and simulations of quantum chromodynamics.” “5 projects are focused on developing and improving the physics models needed for integrated simulations of plasma systems to advance fusion energy science. These projects will focus on such fundamental phenomena as electromagnetic wave-plasma interactions, plasma turbulence, and macroscopic stability of magnetically confined plasmas.”
SciDAC per year portfolio: $57M for Math, Information and Computer Sciences
Data grids and collaboratories • National data grids • Particle physics grid • Earth system grid • Plasma physics for magnetic fusion • DOE Science Grid • Middleware • Security and policy for group collaboration • Middleware technology for science portals • Network research • Bandwidth estimation, measurement methodologies and application • Optimizing performance of distributed applications • Edge-based traffic processing • Enabling technology for wide-area data intensive applications
Computer Science ISICs • Scalable Systems Software Provide software tools for management and utilization of terascale resources. • High-end Computer System Performance: Science and Engineering Develop a science of performance prediction based on concepts of program signatures, machine signatures, detailed profiling, and performance simulation and apply to complex DOE applications. Develop tools that assist users to engineer better performance. • Scientific Data Management Provide a framework for efficient management and data mining of large, heterogeneous, distributed data sets. • Component Technology for Terascale Software Develop software component technology for high-performance parallel scientific codes, promoting reuse and interoperability of complex software, and assist application groups to incorporate component technology into their high-value codes.
Applied Math ISICs • Terascale Simulation Tools and Technologies Develop framework for use of multiple mesh and discretization strategies within a single PDE simulation. Focus on high-quality hybrid mesh generation for representing complex and evolving domains, high-order discretization techniques, and adaptive strategies for automatically optimizing a mesh to follow moving fronts or to capture important solution features. • Algorithmic and Software Framework for Partial Differential Equations Develop framework for PDE simulation based on locally structured grid methods, including adaptive meshes for problems with multiple length scales; embedded boundary and overset grid methods for complex geometries; efficient and accurate methods for particle and hybrid particle/mesh simulations. • Terascale Optimal PDE Simulations Develop an integrated toolkit of near optimal complexity solvers for nonlinear PDE simulations. Focus on multilevel methods for nonlinear PDEs, PDE-based eigenanalysis, and optimization of PDE-constrained systems. Packages sharing same distributed data structures include: adaptive time integrators for stiff systems, nonlinear implicit solvers, optimization, linear solvers, and eigenanalysis.
Exciting time for enabling technologies • SciDAC application groups have been chartered to build new and improved COMMUNITY CODES. Codes of this kind, such as NWCHEM, consume hundreds of person-years of development, run at hundreds of installations, are given large fractions of community compute resources for decades, and acquire an “authority” that can enable or limit what is done and accepted as science in their respective communities. Except at the beginning, it is difficult to promote major algorithmic ideas in such codes, since change is expensive and sometimes resisted. • ISIC groups have a chance, due to the interdependence built into the SciDAC program structure, to simultaneously influence many of these codes, by delivering software incorporating optimal algorithms that may be reused across many applications. Improvements driven by one application will be available to all. • While they are building community codes, this is our chance to build a CODE COMMUNITY!
SciDAC themes • Chance to do community codes “right” • Meant to set “new paradigm” for other DOE programs • new 2003 nanoscience modeling initiative • possible new 2004 fusion simulation initiative • Cultural barriers to interdisciplinary research acknowledged up front • Accountabilities constructed in order to force the mixing of “scientific cultures” (physicists/biologists/chemists/engineers with mathematicians/computer scientists)
Opportunity: nanoscience modeling • Jul 2002 report to DOE • Proposes $5M/year theory and modeling initiative to accompany the existing $50M/year experimental initiative in nanoscience • Report lays out research in numerical algorithms and optimization methods on the critical path to progress in nanotechnology
Opportunity: integrated fusion modeling • Dec 2002 report to DOE • Currently DOE supports 52 codes in Fusion Energy Sciences • US contribution to ITER will “major” in simulation • Initiative proposes to use advanced computer science techniques and numerical algorithms to improve the US code base in magnetic fusion energy and allow codes to interoperate
What’s new in SciDAC library software? • Philosophy of library usage • large codes interacting as peer applications, with complex calling patterns (e.g., physics code calls implicit solver code calls subroutine automatically generated from original physics code to supply Jacobian of physics code residual) • extensibility • polyalgorithmic adaptivity • Resources for development, long-term maintenance, and support • not just for “dissertation scope” ideas • Experience on terascale computers
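The calling pattern in the first bullet (physics code calls solver, solver calls back for the residual and its Jacobian) can be sketched in a few lines. Everything here is hypothetical: `residual` stands in for a physics code, `newton` for an implicit solver, and `fd_jacobian` uses forward differences as a stand-in for the automatically generated Jacobian routine the slide describes; it is not any TOPS package's API.

```python
def residual(u):
    """Hypothetical 'physics code': F(u) for a small 2-unknown system."""
    x, y = u
    return [x * x + y * y - 4.0, x - y]


def fd_jacobian(f, u, h=1e-7):
    """Stand-in for an auto-generated Jacobian routine (forward differences).

    A production code would instead call a routine derived from `residual`
    itself, e.g. by automatic differentiation.
    """
    f0 = f(u)
    n = len(u)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(u)
        up[j] += h
        fj = f(up)
        for i in range(n):
            jac[i][j] = (fj[i] - f0[i]) / h
    return jac


def solve2x2(a, b):
    """Direct solve of a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def newton(f, jacobian, u0, tol=1e-10, max_it=20):
    """'Implicit solver code': calls back into the physics code for F and J."""
    u = list(u0)
    for _ in range(max_it):
        r = f(u)
        if max(abs(v) for v in r) < tol:
            break
        du = solve2x2(jacobian(f, u), [-v for v in r])
        u = [ui + di for ui, di in zip(u, du)]
    return u


# The 'application' drives the solver and hands it both callbacks.
solution = newton(residual, fd_jacobian, [1.0, 2.0])
print(solution)
```

The solver never sees the physics directly; it only receives the two callbacks, which is what lets the same solver library serve many application codes.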
Introducing “Terascale Optimal PDE Simulations” (TOPS) ISIC Nine institutions, $17M, five years, 24 co-PIs
[Diagram: SciDAC program structure] • 34 apps groups (BER, BES, FES, HENP) • 7 ISIC groups (4 CS, 3 Math): adaptive gridding, discretization; solvers; systems software, component architecture, performance engineering, data management • 10 grid, data collaboratory groups • Links labeled: software integration, performance optimization
Who we are… … the PETSc and TAO people … the Hypre and Sundials people … the SuperLU and PARPACK people … as well as the builders of other widely used packages …
Plus some university collaborators Demmel et al. Manteuffel et al. Dongarra et al. Widlund et al. Ghattas et al. Keyes et al. Our DOE lab collaborations predate SciDAC by many years.
You may know the on-line “Templates” guides … www.netlib.org/templates (124 pp.) www.netlib.org/etemplates (410 pp.) … these are good starts, but not adequate for SciDAC scales!
You may know the on-line “Templates” guides … www.netlib.org/templates (124 pp.) www.netlib.org/etemplates (410 pp.) … SciDAC puts some of the authors (and many others) “on-line” for physics groups
Scope for TOPS • Design and implementation of “solvers” • Time integrators • Nonlinear solvers • Optimizers • Linear solvers • Eigensolvers • Software integration • Performance optimization • [Diagram: Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, Linear solver; arrows indicate dependence; two items are marked “(w/ sens. anal.)”]
The power of optimal algorithms • Advances in algorithmic efficiency rival advances in hardware architecture • Consider Poisson’s equation, ∇²u = f, on a 64 × 64 × 64 cube, with N = n³ unknowns • If n = 64, this implies an overall reduction in flops of ~16 million* • *On a 16 Mflop/s machine, six months is reduced to 1 s
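The “~16 million” figure can be checked by hand. The sketch below assumes the usual textbook endpoints for this comparison, roughly n⁷ flops for banded Gaussian elimination and n³ for full multigrid; those scalings are an assumption here, since the slide's table of methods does not appear in this text.

```python
n = 64

# Assumed flop scalings (not reproduced from the slide):
# banded Gaussian elimination ~ n**7, full multigrid ~ n**3.
reduction = n**7 // n**3  # equals n**4
print(reduction)          # 16777216, i.e. ~16 million

# Footnote check: six months of run time divided by that factor.
six_months_s = 0.5 * 365 * 24 * 3600
print(six_months_s / reduction)  # on the order of one second
```

Only the ratio matters for the footnote: whatever the machine speed, a run that took six months shrinks to about a second.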
Algorithms and Moore’s Law • [Plot: relative speedup vs. year] • This advance took place over a span of about 36 years, or 24 doubling times for Moore’s Law • 2^24 ≈ 16 million, the same as the factor from algorithms alone!
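The hardware side of the comparison is a one-line calculation. In this sketch the 1.5-year doubling time is not stated on the slide; it is inferred from “36 years, or 24 doubling times”, and the function name `hardware_speedup` is illustrative.

```python
def hardware_speedup(years, doubling_time_years=1.5):
    """Speedup from repeated doubling over the given span (Moore's Law model)."""
    return 2 ** (years / doubling_time_years)


moore_factor = hardware_speedup(36)
print(moore_factor)            # 16777216.0
print(moore_factor == 64**4)   # True: matches the n**4 algorithmic factor for n = 64
```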
The power of optimal algorithms • Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency, but one must extend optimality “outward”, to more general problems • Hence, for instance, algebraic multigrid (AMG), obtaining O(N) in indefinite, anisotropic, inhomogeneous problems • [Diagram: AMG Framework: error damped by pointwise relaxation; algebraically smooth error remains; choose coarse grids, transfer operators, etc. to eliminate it, based on numerical weights, heuristics]
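Why relaxation alone is not enough can be seen on a much simpler problem than AMG targets. The sketch below is a geometric, 1D Poisson illustration rather than AMG itself: it applies one weighted-Jacobi sweep (weight 2/3, a common choice) to individual sine modes of the error and reports how much each is damped. The function names and the grid size n = 64 are choices made for the example.

```python
import math


def jacobi_sweep(e, omega=2.0 / 3.0):
    """One weighted-Jacobi sweep on the error of the 1D Poisson problem
    (zero right-hand side; end values are left untouched)."""
    new = e[:]
    for i in range(1, len(e) - 1):
        new[i] = (1.0 - omega) * e[i] + omega * 0.5 * (e[i - 1] + e[i + 1])
    return new


def damping_factor(k, n=64):
    """Ratio of error norms after/before one sweep for sine mode k on n intervals."""
    e = [math.sin(k * math.pi * i / n) for i in range(n + 1)]

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    return norm(jacobi_sweep(e)) / norm(e)


# Low k = smooth error (barely damped); high k = oscillatory error (damped fast).
for k in (1, 4, 32, 63):
    print(k, round(damping_factor(k), 4))
```

Oscillatory modes shrink to about a third per sweep while the smoothest mode is almost untouched; that leftover smooth error is what the coarse grids and transfer operators are chosen to eliminate.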
Gordon Bell Prize outpaces Moore’s Law • Four orders of magnitude in 13 years • [Photo captions: Gordon Moore, <<Demi Moore>>, Gordon Bell] • CONCURRENCY!!!
SciDAC application: Center for Extended Magnetohydrodynamic Modeling • Simulate plasmas in tokamaks, leading to understanding of plasma instability and (ultimately) new energy sources • Joint work between ODU, Argonne, LLNL, and PPPL
Optimal solvers • Convergence rate nearly independent of discretization parameters • Multilevel schemes for linear and nonlinear problems • Newton-like schemes for quadratic convergence of nonlinear problems • [Plot: time to solution vs. problem size (increasing with number of processors, 1 to 1000), contrasting “scalable” and “unscalable” curves] • [Plots: iterations (above) and time (below) vs. number of processors] • AMG shows perfect iteration scaling, above, in contrast to ASM, but still needs performance work to achieve temporal scaling, below, on CEMM fusion code, M3D, though time is halved (or better) for large runs (all runs: 4K dofs per processor)