Multiresolution Adaptive Numerical Scientific Simulation • Ariana Beste¹, George I. Fann¹, Robert J. Harrison¹,², Rebecca Hartman-Baker¹, Shinichiro Sugiki¹ • ¹Oak Ridge National Laboratory • ²University of Tennessee, Knoxville • In collaboration with Gregory Beylkin⁴, Fernando Perez⁴, Lucas Monzon⁴, Martin Mohlenkamp⁵ and others • ⁴University of Colorado • ⁵Ohio University • harrisonrj@ornl.gov
The DOE funding • This work is funded by the U.S. Department of Energy, the division of Basic Energy Sciences, Office of Science, under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory. • This research was performed in part using resources of the National Energy Scientific Computing Center, which is supported by the Office of Energy Research of the U.S. Department of Energy under contract DE-AC03-76SF0098, and the Center for Computational Sciences at Oak Ridge National Laboratory under contract DE-AC05-00OR22725.
Outline • Multiresolution basics • Parallel decomposition and tools • Underlying representation • Application characteristics • Current storage strategy
Molecular Science Software Project (EMSL / PNNL) • PNNL: Yuri Alexeev, Eric Bylaska, Bert deJong, Mahin Hackler, Karol Kowalski, Lisa Pollack, Tjerk Straatsma, Marat Valiev • ORNL: Edo Apra, Robert Harrison, Vincent Meunier • Ames: Ricky Kendall, TL Windus • Gary Black, Brett Didier, Todd Elsenthagen, Sue Havre, Carina Lansing, Bruce Palmer, Karen Schuchardt, Lisong Sun, Erich Vorpagel • Manoj Krishnan, Jarek Nieplocha, Bruce Palmer, Vinod Tipparaju • http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
Computational Chemistry Endstation • International collaboration spanning 8 universities and 5 national labs • Led out of UT/ORNL • Focus • Actinides, Aerosols, Catalysis • ORNL Cray XT3, ANL BG/L • Capabilities: • Chemically accurate thermochemistry • Many-body methods required • Mixed QM/QM/MM dynamics • Accurate free-energy integration • Simulation of extended interfaces • Families of relativistic methods • NWChem: Largest CCSD(T) calculation • Pollack, EMSL, 2005 • 1960-processor Itanium2 cluster • 1468 basis functions (aug-cc-pVQZ) • Perturbative triples (T) • 23 hours on 1400 processors • 75% of peak = 6.3 TFlops • [Figure: scaling of MADNESS on 64-4096 CPUs on the XT3]
Multiresolution chemistry objectives • Complete elimination of the basis error • One-electron models (e.g., HF, DFT) • Pair models (e.g., MP2, CCSD, …) • Correct scaling of cost with system size • General approach • Readily accessible by students and researchers • Higher level of composition • Direct computation of chemical energy differences • New computational approaches • Fast algorithms with guaranteed precision
How to “think” multiresolution • Consider a ladder of function spaces • E.g., increasing quality atomic basis sets, or finer resolution grids, … • Telescoping series • Instead of using the most accurate representation, use the difference between successive approximations • Representation on V0 small/dense; differences sparse • Computationally efficient; many possible insights
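The telescoping idea can be written as V_n = V_0 + (V_1 - V_0) + ... + (V_n - V_(n-1)). The sketch below illustrates it with the simplest possible ladder, pairwise averages of samples (a Haar-style transform), which is an assumption made for brevity and not the basis MADNESS uses. The function names and the test function are hypothetical. For a sharply peaked function, most stored differences fall below a small threshold, which is the sparsity the slide refers to.

```python
import math

def haar_decompose(samples):
    """Split fine-level values into one coarse value plus per-level differences."""
    levels = []                      # difference coefficients, finest level first
    current = list(samples)
    while len(current) > 1:
        half = len(current) // 2
        coarse = [(current[2 * i] + current[2 * i + 1]) / 2 for i in range(half)]
        detail = [(current[2 * i] - current[2 * i + 1]) / 2 for i in range(half)]
        levels.append(detail)
        current = coarse
    return current[0], levels[::-1]  # V0 value, then differences coarsest first

def haar_reconstruct(v0, levels):
    """Sum the telescoping series: start from V0 and add back each difference."""
    current = [v0]
    for detail in levels:
        finer = []
        for avg, diff in zip(current, detail):
            finer.extend((avg + diff, avg - diff))
        current = finer
    return current

def count_significant(levels, tol):
    """Number of difference coefficients larger than tol in magnitude."""
    return sum(1 for detail in levels for d in detail if abs(d) > tol)

N = 2 ** 10
# Hypothetical test function: a narrow Gaussian centered in [0, 1].
samples = [math.exp(-500.0 * ((i + 0.5) / N - 0.5) ** 2) for i in range(N)]
v0, levels = haar_decompose(samples)
kept = count_significant(levels, 1e-6)
rebuilt = haar_reconstruct(v0, levels)
max_err = max(abs(a - b) for a, b in zip(samples, rebuilt))
print(f"{kept} of {N - 1} differences exceed 1e-6; reconstruction error {max_err:.1e}")
```

Dropping the differences below the threshold would give a compressed representation whose error is controlled by that threshold.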
High-level composition using functions and operators • Conventional quantum chemistry uses explicitly indexed sparse arrays of matrix elements • Complex, tedious and error prone • Python classes for Function and Operator • in 1, 2, 3, 6 and general dimensions • wide range of operations • Hpsi = -0.5*Delsq*psi + V*psi • J = Coulomb.apply(rho) • All with guaranteed speed and precision
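To show how an expression such as Hpsi = -0.5*Delsq*psi + V*psi can be made to work through operator overloading, here is a toy sketch. The Function and Laplacian classes below are hypothetical stand-ins, not the MADNESS Python API: they use a uniform 1-D grid and a second-order finite difference instead of an adaptive multiresolution basis, so none of the speed or precision guarantees apply.

```python
import math

class Function:
    """Toy 1-D function sampled on a uniform grid over [0, 1] (hypothetical)."""
    def __init__(self, values, h):
        self.values = list(values)
        self.h = h

    @classmethod
    def from_callable(cls, f, npts):
        h = 1.0 / (npts - 1)
        return cls([f(i * h) for i in range(npts)], h)

    def __add__(self, other):
        return Function([a + b for a, b in zip(self.values, other.values)], self.h)

    def __mul__(self, other):
        if isinstance(other, Function):      # pointwise product, e.g. V*psi
            return Function([a * b for a, b in zip(self.values, other.values)], self.h)
        return Function([a * other for a in self.values], self.h)  # scalar product

    __rmul__ = __mul__                       # allows scalar * Function

class Laplacian:
    """Toy second-derivative operator; boundary values are set to zero (assumption)."""
    def __init__(self, scale=1.0):
        self.scale = scale

    def __rmul__(self, scalar):              # scalar * operator gives a scaled operator
        return Laplacian(self.scale * scalar)

    def __mul__(self, f):                    # operator * Function applies the stencil
        v, h = f.values, f.h
        out = [0.0] * len(v)
        for i in range(1, len(v) - 1):
            out[i] = self.scale * (v[i - 1] - 2.0 * v[i] + v[i + 1]) / (h * h)
        return Function(out, h)

Delsq = Laplacian()
psi = Function.from_callable(lambda x: math.sin(math.pi * x), 201)
V = Function.from_callable(lambda x: 0.0, 201)   # free particle in a box

Hpsi = -0.5 * Delsq * psi + V * psi
# For psi = sin(pi x) and V = 0, H psi should be close to (pi^2 / 2) * psi.
print(Hpsi.values[100], 0.5 * math.pi ** 2)
```

The point of the design is that the user-facing line reads like the mathematics, while the indexing lives inside the classes.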
New MADNESS solver • Total rewrite in C++ • Three levels of parallelism targeting massively parallel computers with multi-processor nodes • In anticipation of highly-threaded processors • Ideally targets low-latency AM+MPI+threads • Portable implementation: polling+MPI+threads • Core math functionality is now running • 3D functions, real and complex (1-6D functions will be added this FY) • Scaling demonstrated up to 4096 processors – designed for 100+K.
1-D example: sub-tree parallelism • [Figure: binary tree with nodes numbered 0-6] • Both sub-trees can be done in parallel • In 3-D, nodes split into 8 children … in 6-D there are 64 children
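The child counts follow from each box being halved along every dimension, giving 2^d children in d dimensions. A minimal sketch, assuming boxes are identified by a hypothetical key (level, index tuple):

```python
from itertools import product

def children(level, index):
    """Return the keys of the 2**d children of box (level, index) in d dimensions."""
    return [
        (level + 1, tuple(2 * i + b for i, b in zip(index, bits)))
        for bits in product((0, 1), repeat=len(index))
    ]

print(len(children(0, (0,))))                 # 1-D: 2 children
print(len(children(0, (0, 0, 0))))            # 3-D: 8 children
print(len(children(0, (0, 0, 0, 0, 0, 0))))   # 6-D: 64 children
```

Because the children of different boxes never overlap, each sub-tree can be handed to a different worker.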
Distributed-memory Cilk-like model
• Parameter: MPI rank, probe(), set(), get()
• Task: input parameters, output parameters, probe(), run()
Compress(tree, result):
    Parameter left, right
    if (tree.left) Compress(tree.left, left)
    if (tree.right) Compress(tree.right, right)
    AddTask(Op, left, right, result)
    WaitTasks()
• Benefits: • Most receives pre-posted, greatly increasing scalability • Communication latency and transfer time largely hidden • Much simpler composition than explicit message passing • Positions code to use “intelligent” runtimes with work stealing • Positions code for efficient use of multi-core chips
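The Parameter/Task pattern above can be mimicked in a few lines of single-process Python. This is only a sketch under stated assumptions: there is no MPI or threading, the operation Op is replaced by a simple sum, a missing child contributes 0, and wait_tasks() is called once at the top level. All names are hypothetical and chosen to mirror the slide.

```python
class Parameter:
    """Single-assignment value; the real model would also record an owning MPI rank."""
    def __init__(self):
        self._ready = False
        self._value = None

    def probe(self):
        return self._ready

    def set(self, value):
        self._value = value
        self._ready = True

    def get(self):
        if not self._ready:
            raise RuntimeError("parameter not yet assigned")
        return self._value

class Task:
    """Runs op(inputs...) into output once every input has been assigned."""
    def __init__(self, op, inputs, output):
        self.op, self.inputs, self.output = op, inputs, output

    def probe(self):
        return all(p.probe() for p in self.inputs)

    def run(self):
        self.output.set(self.op(*(p.get() for p in self.inputs)))

class Tree:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

tasks = []

def add_task(op, inputs, output):
    tasks.append(Task(op, inputs, output))

def compress(tree, result):
    """Queue one task per node; each task depends on its children's results."""
    left, right = Parameter(), Parameter()
    if tree.left:
        compress(tree.left, left)
    else:
        left.set(0)                  # assumption: missing child contributes 0
    if tree.right:
        compress(tree.right, right)
    else:
        right.set(0)
    add_task(lambda l, r, v=tree.value: l + r + v, [left, right], result)

def wait_tasks():
    """Run tasks whose inputs are ready until none remain."""
    while tasks:
        ready = [t for t in tasks if t.probe()]
        if not ready:
            raise RuntimeError("no runnable task: dependency cycle or unset input")
        for t in ready:
            t.run()
            tasks.remove(t)

# Seven-node tree numbered 0-6, as in the 1-D example.
root = Tree(0, Tree(1, Tree(3), Tree(4)), Tree(2, Tree(5), Tree(6)))
result = Parameter()
compress(root, result)
wait_tasks()
print(result.get())
```

Tasks for the two sub-trees become runnable independently of each other, which is where a real runtime would overlap communication with computation.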
Essential techniques for fast computation • Multiresolution • Low-separation rank • Low-operator rank
Separated representations • Key to computing in higher dimensions • Analogs of SVD exploit low operator rank • Generalized form exploits other operator properties • E.g., these all have full operator rank but low-separation rank constructions exist • Identity operator • Green’s functions of many PDEs (Poisson, Helmholtz) • All-electron Schrödinger Hamiltonian
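For reference, a separated representation in the sense commonly used in this literature writes a d-dimensional function or operator as a short sum of products of one-dimensional factors. The notation below (s_l, f_i^{(l)}, T_i^{(l)}, w_l, t_l) is illustrative and not taken from the slides; the number of terms r is the separation rank and ε the target precision.

```latex
f(x_1,\dots,x_d) = \sum_{l=1}^{r} s_l \prod_{i=1}^{d} f_i^{(l)}(x_i) + O(\epsilon),
\qquad
T = \sum_{l=1}^{r} s_l \, T_1^{(l)} \otimes \cdots \otimes T_d^{(l)} + O(\epsilon)

% Example: over a finite range of |x - y|, the Poisson kernel can be
% approximated by a sum of Gaussians, each of which factors over x, y, z:
\frac{1}{|x-y|} \approx \sum_{l=1}^{r} w_l \, e^{-t_l |x-y|^2},
\qquad
e^{-t_l |x-y|^2} = \prod_{i=1}^{3} e^{-t_l (x_i - y_i)^2}
```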
[Figure: boxes in the (x, y) plane labeled x-y, y-x, and |x-y|; r = separation rank] • In 3D, ideally must be one box removed from the diagonal • Diagonal box has full rank • Boxes touching the diagonal (face, edge, or corner) have increasingly low rank • Away from the diagonal, r = O(-log ε)
Molecular electronic Schrödinger equation • A 3N-dimensional, non-separable, second-order differential equation
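For reference, the standard form of this equation for N electrons in the field of fixed nuclei (atomic units, constant nuclear repulsion omitted) is given below. The symbols r_i, R_A, and Z_A for electron positions, nuclear positions, and nuclear charges are the usual textbook notation and are not taken from the slides.

```latex
\left[ -\frac{1}{2} \sum_{i=1}^{N} \nabla_i^2
       - \sum_{i=1}^{N} \sum_{A} \frac{Z_A}{|\mathbf{r}_i - \mathbf{R}_A|}
       + \sum_{i<j} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} \right]
\Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = E \, \Psi(\mathbf{r}_1,\dots,\mathbf{r}_N)
```

The electron-electron term couples the coordinates of different electrons, which is why the equation does not separate into N independent 3-D problems.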
Dynamics of fundamental few-electron systems (Krstic and Harrison) • Electron + atom/molecule scattering • Molecules in intense radiation fields • Challenges • Scattering – highly oscillatory states • Dissociation – continuum states • Quantum treatment of light nuclei • Rydberg states – very large volumes • In principle, adaptive multiresolution techniques are ideal • Single basis treats bound and continuum states on equal footing • Long time steps possible via integral operator for time evolution • Separated representations provide path to higher dimensions • Waiting for the new production code before the free-particle propagator can be applied efficiently in an implicit scheme (integral kernel is exp(-ix²/2t)) • Need a more strongly band-limited basis? • Want to do this in at least 5-9D, 12D being considered
“Independent” particle models • Atomic and molecular orbitals • Each electron feels the mean field of all other electrons (self-consistent field, Hartree-Fock) • Replaces the linear 3N-D Schrödinger equation with a nonlinear 3-D eigenproblem • Provides the structure of the periodic table and the chemical bond • Linear combination of atomic orbitals (LCAO) • E.g., molecular orbitals for water, H2O • [Figure: water molecular orbitals, labeled -0.53, -1.31, -0.67, -20.44, -0.48]
Density functional theory (DFT) • Hohenberg-Kohn theorem • The energy is a functional of the density (3D) • Kohn-Sham • Practical approach to DFT, parameterizing the density with orbitals (easier treatment of kinetic energy) • Very similar computationally to Hartree-Fock, but potentially exact
Reduced scaling method • Eigenfunctions (canonical orbitals) can be delocalized • Limits to O(VN) data and O(VN²) compute • Solve instead for localized orbitals that span the same space • Limits to O(N ln V) data and compute • Multiresolution representation makes this easy • Remaining linear algebra has small pre-factor and is sparse
Current I/O Strategy • Looked seriously at HDF and Phil’s API • Substantial effort for adoption; HDF performance questions • Substantial benefits from interoperability • Short-term driver is checkpoint/restart • Tunable subset of nodes doing I/O • Currently nodes at a level in tree (in 3D 1, 8, 64, …) • Collect data from other nodes • Serialize to disk in either binary or text (XML) • Already want interfaces to viz. tools • Starting to consider interface to external solvers • Sundance, PETSc, …
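One way to read "nodes at a level in tree" is that the boxes at a chosen level n act as collectors, so there are (2^d)^n of them: 1, 8, 64, … in 3D. The sketch below shows that grouping under this interpretation. The function names and the (level, index) key layout are hypothetical, and mapping shallower boxes onto their first descendant at the I/O level is an assumption, not something the slides specify.

```python
def num_io_boxes(io_level, ndim=3):
    """Number of collector boxes when all boxes at io_level perform I/O."""
    return (2 ** ndim) ** io_level

def io_owner(level, index, io_level):
    """Key of the io_level box that collects the data of box (level, index)."""
    if level >= io_level:
        shift = level - io_level
        # Deeper boxes are collected by their ancestor at io_level.
        return (io_level, tuple(i >> shift for i in index))
    shift = io_level - level
    # Assumption: shallower boxes are assigned to their first descendant.
    return (io_level, tuple(i << shift for i in index))

print([num_io_boxes(n) for n in range(3)])   # 3D: 1, 8, 64
print(io_owner(3, (5, 2, 7), 1))             # collector for a level-3 box
```

Raising io_level increases the number of writers, which is the tunable knob the slide mentions.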
Summary of MADNESS data • Discontinuous spectral element • Legendre polynomials, or • Approximate prolate spheroidal functions • Structured, deeply-refined, adaptive mesh • In higher dimensions • Separated representations in most elements • Mix of data types • Float, double, float-complex, double-complex • 100s to 10Ks of distinct functions in 3D • 10s of GB to 10s of TB of data • Few functions in 6+D • 100s of GB to 10s of TB