Overview of the U. S. DOE Scientific Discovery through Advanced Computing (SciDAC) Project David E. Keyes Center for Computational Science Old Dominion University Institute for Computer Applications in Science & Engineering NASA Langley Research Center Institute for Scientific Computing Research Lawrence Livermore National Laboratory
Announcement • University Relations Program Annual Summer Picnic • Wednesday, 26 June 2002 • Noon to 1pm • LLESA Picnic Area • $5 per person • Advance sign-up required • RSVP: Joanna Allen, 2-0620 • No parking available
Terascale simulation has been “sold” • Scientific Simulation • Engineering: crash testing, aerodynamics • Lasers & Energy: combustion, ICF • Biology: drug design, genomics • Applied Physics: radiation transport, supernovae • Environment: global climate, contaminant transport • Experiments may be prohibited or impossible, dangerous, difficult to instrument, controversial, or expensive • In these, and many other areas, simulation is an important complement to experiment.
Terascale simulation has been “sold” • Scientific Simulation • Engineering: crash testing, aerodynamics • Lasers & Energy: combustion, ICF • Biology: drug design, genomics • Applied Physics: radiation transport, supernovae • Environment: global climate, contaminant transport • Experiments may be prohibited or impossible, dangerous, difficult to instrument, controversial, or expensive • However, simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.
Enabling technologies groups to develop reusable software and partner with application groups • For 2001 start-up, 51 projects share $57M/year • Approximately one-third for applications • A third for “integrated software infrastructure centers” • A third for grid infrastructure and collaboratories • Plus, two new 5 Tflop/s IBM SP machines available for SciDAC researchers
SciDAC project characteristics • Affirmation of importance of simulation • for new scientific discovery, not just for “fitting” experiments • Recognition that leading-edge simulation is interdisciplinary • no support for physicists and chemists to write their own software infrastructure; must collaborate with math & CS experts • Commitment to distributed hierarchical memory computers • new code must target this architecture type • Requirement of lab-university collaborations • complementary strengths in simulation • 13 laboratories and 50 universities in first round of projects
Major DOE labs • Pacific Northwest • Brookhaven • Argonne • Lawrence Berkeley • Lawrence Livermore • Sandia Livermore • Los Alamos • Oak Ridge • Sandia • Legend: DOE Science Lab, DOE Defense Lab • Also marked: Old Dominion University
Large platforms provided for ASCI • ASCI program of the U.S. DOE has roadmap to go to 100 Tflop/s by 2006 • www.llnl.gov/asci/platforms • [Figure: capability versus time (CY ‘97 to ‘06), with Plan, Develop, and Use phases for each platform at Sandia, Los Alamos, and Livermore. Steps: Red 1+ Tflop / 0.5 TB; Blue 3+ Tflop / 1.5 TB; White 10+ Tflop / 4 TB (7.2 Tflop/s LINPACK); 30+ Tflop / 10 TB; 50+ Tflop / 25 TB; 100+ Tflop / 30 TB]
…and now for SciDAC • IBM Power3+ SMP • 16 procs per node • 208 nodes • 24 Gflop/s per node • 5 Tflop/s Berkeley • IBM Power4 Regatta • 32 procs per node • 24 nodes • 166 Gflop/s per node • 4Tflop/s (10 in 2003) Oak Ridge
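The aggregate rates quoted for the two SciDAC machines follow from the node counts and per-node rates on the slide. A minimal sketch of that arithmetic, using only the slide's figures (the function and dictionary names are illustrative, not from the talk):

```python
# Peak-rate check for the two SciDAC machines, using only the slide's numbers.
machines = {
    "Berkeley IBM Power3+ SMP": {"nodes": 208, "procs_per_node": 16, "gflops_per_node": 24},
    "Oak Ridge IBM Power4 Regatta": {"nodes": 24, "procs_per_node": 32, "gflops_per_node": 166},
}


def peak_tflops(nodes, gflops_per_node):
    """Aggregate peak in Tflop/s: nodes times per-node Gflop/s, divided by 1000."""
    return nodes * gflops_per_node / 1000.0


def gflops_per_proc(gflops_per_node, procs_per_node):
    """Per-processor peak in Gflop/s implied by the per-node figure."""
    return gflops_per_node / procs_per_node


for name, m in machines.items():
    total = peak_tflops(m["nodes"], m["gflops_per_node"])
    per_proc = gflops_per_proc(m["gflops_per_node"], m["procs_per_node"])
    print(f"{name}: {total:.3f} Tflop/s peak, {per_proc:.2f} Gflop/s per processor")
```

The products come to roughly 5 and 4 Tflop/s, consistent with the rounded totals on the slide.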
NSF’s 13.6 Tflop/s TeraGrid coming on line • TeraGrid: NCSA, SDSC, Caltech, Argonne • www.teragrid.org • NCSA/PACI: 8 TF, 240 TB • SDSC: 4.1 TF, 225 TB • [Figure: the four sites, each with site resources, archival storage (HPSS or UniTree), and external networks, joined by 40 Gb/s (OC-192) links] • c/o I. Foster
Idealized Kiviat diagram for architecture • [Figure: Kiviat diagram with axes Computing Speed (FLOPS), Memory (Terabytes), Archival Storage (Petabytes), Network Speed (Gigabits/sec), Parallel I/O (Gigabytes/sec), and Application Performance; contours for ‘96, ‘97, ‘00, and 2003; labels Programs and Platforms]
Japan’s Earth Simulator • 35.6 Tflop/s LINPACK • [Figure: bird’s-eye view of the Earth Simulator system, dimensions 65 m × 50 m, showing Processor Node (PN) cabinets, Interconnection Network (IN) cabinets, disks, cartridge tape library system, air conditioning system, power supply system, and double floor for IN cables]
Cross-section of Earth Simulator Building • [Figure labels: lightning protection system; air-conditioning return duct; double floor for IN cables and air-conditioning; power supply system; air-conditioning system; seismic isolation system]
Earth Simulator Bird’s Eye • [Figure labels: power plant; computer system; operations and research]
Boundary conditions from architecture • Algorithms must run on physically distributed memory units connected by message-passing network, each serving one or more processors with multiple levels of cache • “horizontal” aspects: network latency, BW, diameter • “vertical” aspects: memory latency, BW; L/S (cache/reg) BW
Following the platforms … • … Algorithms must be • highly concurrent and straightforward to load balance • not communication bound • cache friendly (temporal and spatial locality of reference) • highly scalable (in the sense of convergence) • Goal for algorithmic scalability: fill up memory of arbitrarily large machines while preserving nearly constant* running times with respect to proportionally smaller problem on one processor *logarithmically growing
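The scalability goal above (problem size grows with the machine while running time grows only logarithmically) can be sketched as a toy weak-scaling model. The functional form and the coefficient `c` are illustrative assumptions, not measurements from the talk:

```python
import math


def weak_scaling_time(p, t1=1.0, c=0.05):
    """Modeled run time on p processors when the problem size grows with p.

    t1 is the one-processor time for the proportionally smaller problem;
    c (hypothetical) sets how fast the time grows with log2(p).
    """
    return t1 * (1.0 + c * math.log2(p))


def scaled_efficiency(p, t1=1.0, c=0.05):
    """One-processor time divided by p-processor time at fixed work per processor."""
    return weak_scaling_time(1, t1, c) / weak_scaling_time(p, t1, c)


for p in (1, 16, 256, 1024, 65536):
    print(f"P={p:6d}  time={weak_scaling_time(p):.2f}  efficiency={scaled_efficiency(p):.2f}")
```

Under this assumed model, going from 1 to 1024 processors with a 1024-times larger problem raises the run time by only 50 percent.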
Gordon Bell Prize performance
Gordon Bell Prize outpaces Moore’s Law • Four orders of magnitude in 13 years • Gordon Bell • Gordon Moore <<Demi Moore>> • CONCURRENCY!!!
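To put "four orders of magnitude in 13 years" next to Moore's Law, the sketch below converts both to doubling times. The 18-month doubling period used for Moore's Law is an assumption (the common popular statement), not a figure from the slide:

```python
import math


def doubling_time_years(growth_factor, years):
    """Doubling time implied by an overall growth factor achieved over `years`."""
    return years / math.log2(growth_factor)


def growth_over(years, doubling_time):
    """Overall growth factor after `years` at a fixed doubling time."""
    return 2.0 ** (years / doubling_time)


bell_doubling = doubling_time_years(1e4, 13)   # from the slide: 10^4 in 13 years
moore_growth = growth_over(13, 1.5)            # assumed 18-month doubling

print(f"Gordon Bell Prize doubling time: {bell_doubling:.2f} years")
print(f"Growth in 13 years at 18-month doubling: about {moore_growth:.0f}x")
```

Under that assumption, device-level improvement accounts for a few hundredfold of the ten-thousandfold gain; the slide attributes the rest to concurrency.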
Official SciDAC Goals • “Create a new generation of scientific simulation codes that take full advantage of the extraordinary computing capabilities of terascale computers.” • “Create the mathematical and systems software to enable the scientific simulation codes to effectively and efficiently use terascale computers.” • “Create a collaboratory software environment to enable geographically separated scientists to effectively work together as a team and to facilitate remote access to both facilities and data.”
Four science programs involved … “14 projects will advance the science of climate simulation and prediction. These projects involve novel methods and computationally efficient approaches for simulating components of the climate system and work on an integrated climate model.” “10 projects will address quantum chemistry and fluid dynamics, for modeling energy-related chemical transformations such as combustion, catalysis, and photochemical energy conversion. The goal of these projects is efficient computational algorithms to predict complex molecular structures and reaction rates with unprecedented accuracy.”
Four science programs involved … “4 projects in high energy and nuclear physics will explore the fundamental processes of nature. The projects include the search for the explosion mechanism of core-collapse supernovae, development of a new generation of accelerator simulation codes, and simulations of quantum chromodynamics.” “5 projects are focused on developing and improving the physics models needed for integrated simulations of plasma systems to advance fusion energy science. These projects will focus on such fundamental phenomena as electromagnetic wave-plasma interactions, plasma turbulence, and macroscopic stability of magnetically confined plasmas.”
SciDAC 1st Year Portfolio: $57M for Math, Information and Computer Sciences
Data Grids and Collaboratories • National data grids • Particle physics grid • Earth system grid • Plasma physics for magnetic fusion • DOE Science Grid • Middleware • Security and policy for group collaboration • Middleware technology for science portals • Network research • Bandwidth estimation, measurement methodologies and application • Optimizing performance of distributed applications • Edge-based traffic processing • Enabling technology for wide-area data intensive applications
What is the/a “Grid”? • The Grid refers to an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and scientific instruments owned and managed by multiple organizations. • Grid applications often involve large amounts of data and/or computing and often require secure resource sharing across organizational boundaries, and are thus not easily handled by today’s Internet and Web infrastructures.
Grid example: smart instruments • DOE X-ray grand challenge • Advanced Photon Source • real-time collection • tomographic reconstruction • archival storage • wide-area dissemination • desktop & VR clients with shared controls
First Grid textbook • “The Grid: Blueprint for a New Computing Infrastructure” • Edited by Ian Foster & Carl Kesselman • July 1998, 701 pages • “This is a source book for the history of the future.” Vint Cerf, Senior Vice President, Internet Architecture and Engineering, MCI
Computer Science ISICs • Scalable Systems Software: Provide software tools for management and utilization of terascale resources. • High-end Computer System Performance: Science and Engineering: Develop a science of performance prediction based on concepts of program signatures, machine signatures, detailed profiling, and performance simulation and apply to complex DOE applications. Develop tools that assist users to engineer better performance. • Scientific Data Management: Provide a framework for efficient management and data mining of large, heterogeneous, distributed data sets. • Component Technology for Terascale Software: Develop software component technology for high-performance parallel scientific codes, promoting reuse and interoperability of complex software, and assist application groups to incorporate component technology into their high-value codes.
Applied Math ISICs • Terascale Simulation Tools and Technologies: Develop framework for use of multiple mesh and discretization strategies within a single PDE simulation. Focus on high-quality hybrid mesh generation for representing complex and evolving domains, high-order discretization techniques, and adaptive strategies for automatically optimizing a mesh to follow moving fronts or to capture important solution features. • Algorithmic and Software Framework for Partial Differential Equations: Develop framework for PDE simulation based on locally structured grid methods, including adaptive meshes for problems with multiple length scales; embedded boundary and overset grid methods for complex geometries; efficient and accurate methods for particle and hybrid particle/mesh simulations. • Terascale Optimal PDE Simulations: Develop an integrated toolkit of near optimal complexity solvers for nonlinear PDE simulations. Focus on multilevel methods for nonlinear PDEs, PDE-based eigenanalysis, and optimization of PDE-constrained systems. Packages sharing same distributed data structures include: adaptive time integrators for stiff systems, nonlinear implicit solvers, optimization, linear solvers, and eigenanalysis.
Exciting time for enabling technologies • SciDAC application groups have been chartered to build new and improved COMMUNITY CODES. Codes of this kind, NWCHEM for example, consume hundreds of person-years of development, run at hundreds of installations, are given large fractions of community compute resources for decades, and acquire an “authority” that can enable or limit what is done and accepted as science in their respective communities. Except at the beginning, it is difficult to promote major algorithmic ideas in such codes, since change is expensive and sometimes resisted. • Because interdependence is built into the SciDAC program structure, ISIC groups have a chance to influence many of these codes simultaneously, by delivering software incorporating optimal algorithms that may be reused across many applications. Improvements driven by one application will be available to all. • While they are building community codes, this is our chance to build a CODE COMMUNITY!
SciDAC themes • Chance to do community codes “right” • Meant to set “new paradigm” for other DOE programs • Cultural barriers to interdisciplinary research acknowledged up front • Accountabilities constructed in order to force the “scientific culture” issue
Sample Application/ISIC Interactions Slide c/o C. Romine, DOE HQ
What’s new in SciDAC library software? • Philosophy of library usage • complex algorithms with lots of callbacks to user code (e.g., to physics routines by implicit solvers) • extensibility • polyalgorithmic adaptivity • Resources for development, maintenance, and support • not just for “dissertation scope” ideas • Experience on terascale computers
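The callback style of library usage mentioned above can be illustrated with a toy scalar Newton solver that calls back into user-supplied "physics" code. This is a minimal sketch under stated assumptions: `newton_solve`, `physics_residual`, and the test equation are hypothetical and do not represent the interface of any SciDAC library. The finite-difference fallback is a very small stand-in for choosing among algorithms depending on what the user provides.

```python
def newton_solve(residual, u0, jacobian=None, tol=1e-10, max_iter=50):
    """Solve residual(u) = 0 for a scalar u by Newton's method.

    `residual` (and optionally `jacobian`) are callbacks into user code.
    If no Jacobian callback is given, a finite-difference estimate is used.
    """
    u = u0
    for _ in range(max_iter):
        r = residual(u)                      # callback to the user's physics routine
        if abs(r) < tol:
            return u
        if jacobian is not None:
            d = jacobian(u)                  # user-supplied derivative
        else:
            h = 1e-7 * max(1.0, abs(u))
            d = (residual(u + h) - r) / h    # finite-difference fallback
        if d == 0.0:
            raise ZeroDivisionError("zero derivative in Newton step")
        u -= r / d
    raise RuntimeError("Newton iteration did not converge")


def physics_residual(u):
    """Hypothetical user routine: u^3 + u - 2, which has the root u = 1."""
    return u**3 + u - 2.0


root = newton_solve(physics_residual, u0=2.0)
print(f"root = {root:.10f}")
```

The solver never needs to know what the residual represents; the user never needs to know how the step is computed.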
Traditional approach to software interoperability • Direct interfacing between different packages/libraries/apps • Public interfaces are unique • Many-to-Many couplings require Many² interfaces • Often a heroic effort to understand the details of both codes • Not a scalable solution • [Figure: data/mesh software (Overture, GRACE, SUMAA3d, DAs) wired directly to linear solvers (Hypre, Trilinos, ISIS++, PETSc)] • Slide c/o L. McInnes, ANL
CCA approach: common interface specification • Reduces the Many-to-Many problem to a Many-to-One problem • Allows interchangeability and experimentation • Difficulties • Interface agreement • Functionality limitations • Maintaining performance • [Figure: Overture, GRACE, SUMAA3d, DAs, Hypre, Trilinos, ISIS++, and PETSc coupled through common “Data” and “ESI” interfaces] • Slide c/o L. McInnes, ANL
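The interface-count argument can be made concrete with a few lines of arithmetic. The grouping of the eight packages into four data/mesh packages and four linear solver packages is read off the slide figure and should be treated as an assumption:

```python
# Package groupings as they appear to be laid out in the slide figure (assumed).
data_mesh = ["Overture", "GRACE", "SUMAA3d", "DAs"]
solvers = ["Hypre", "Trilinos", "ISIS++", "PETSc"]


def direct_interfaces(group_a, group_b):
    """Pairwise couplings: every package in one group talks to every one in the other."""
    return len(group_a) * len(group_b)


def common_spec_adapters(group_a, group_b):
    """With a common interface specification, each package implements it once."""
    return len(group_a) + len(group_b)


print("direct couplings:", direct_interfaces(data_mesh, solvers))
print("adapters to a common specification:", common_spec_adapters(data_mesh, solvers))
```

The gap widens as packages are added: the first count grows as a product, the second as a sum.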
CCA concept: SCMD (SPMD) components • MPI application using CCA for interaction between components A and B within the same address space • Adaptive mesh component (A) written by user1 • Solver component (B) written by user2 • Direct connection supplied by framework at compile/runtime • [Figure: processes Proc1, Proc2, Proc3, etc., each holding one A and one B; the A instances communicate by MPI, as do the B instances] • Slide c/o L. McInnes, ANL
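A minimal sketch of the "direct connection" idea within one process: a framework object hands the solver component a reference to the mesh component, after which calls between them are ordinary in-process method calls. All class, method, and port names here are hypothetical illustrations, not the CCA specification, and the MPI communication among instances on different processes is not modeled:

```python
class MeshComponent:
    """Component A (hypothetical): provides a 'mesh' port."""

    def __init__(self, n_cells):
        self.n_cells = n_cells

    def cell_sizes(self):
        # Uniform cells on the unit interval.
        return [1.0 / self.n_cells] * self.n_cells


class SolverComponent:
    """Component B (hypothetical): uses a 'mesh' port supplied by the framework."""

    def __init__(self):
        self.ports = {}

    def set_port(self, name, provider):
        self.ports[name] = provider

    def domain_length(self):
        # A plain method call on the provider, in the same address space.
        return sum(self.ports["mesh"].cell_sizes())


class Framework:
    """Connects a user's port to a provider by passing an object reference."""

    def connect(self, user, port_name, provider):
        user.set_port(port_name, provider)


mesh = MeshComponent(n_cells=4)
solver = SolverComponent()
Framework().connect(solver, "mesh", mesh)
print("domain length seen by solver:", solver.domain_length())
```

In the SCMD picture, each MPI process would hold its own pair of such instances.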
Interacting with ISICs • [Figure: Applications, APDEC, TSTT, SDM, TOPS, PERC, CCA, SS; an arrow indicates “dependence on”]
Introducing “Terascale Optimal PDE Simulations” (TOPS) ISIC Nine institutions, $17M, five years, 24 co-PIs
34 apps groups (BER, BES, FES, HENP) • 7 ISIC groups (4 CS, 3 Math) • 10 grid, data collaboratory groups • [Figure labels: adaptive gridding, discretization; solvers; systems software, component architecture, performance engineering, data management; software integration; performance optimization]
Who we are… … the PETSc and TAO people … the hypre and PVODE people … the SuperLU and PARPACK people … as well as the builders of other widely used packages …
Plus some university collaborators Demmel et al. Manteuffel et al. Dongarra et al. Widlund et al. Ghattas et al. Keyes et al. Our DOE lab collaborations predate SciDAC by many years.