Introduction to the very basic computational aspects of the modern Quantum Chemistry for Software Engineers
Alexander A. Granovsky
The PC GAMESS/Firefly Project
July 23, 2009, MSU, Moscow, Russia
Outline
• Quantum Chemistry: purpose and methods
• Typical tasks, their parameters and computational complexity
• Conventional, direct, and semi-direct methods
• Standard and “fast” methods
• Typical parallel algorithms: key features and open problems
• Canonical example: four-index integral transformation step
Quantum Chemistry: purpose and methods
• Quantum Chemistry (QC) is the science of applying the “first principles” of Quantum Mechanics to the modeling of chemical systems and processes.
• All chemical systems are treated as sets of electrons and nuclei described by the molecular Hamiltonian operator. Solutions of the molecular Schrödinger Equation contain information on all molecular properties.
• The molecular Schrödinger Equation has to be solved approximately to obtain information on the properties of the molecular system of interest.
Quantum Chemistry – standard model
• Non-relativistic or “weakly relativistic” theory, mainly based on standard Quantum Mechanics
  • Most widely used approach
  • Note: the spins of electrons are still very important variables!
  • More or less quasi-relativistic and purely relativistic approaches are primarily used to describe systems with heavy nuclei
• Adiabatic or Born-Oppenheimer approximations
  • Nuclei are “fixed” or moving slowly
  • The molecular Hamiltonian now acts on electronic variables and depends parametrically on nuclear variables
• Algebraic approach
  • Use of finite basis sets to solve the eigenvalue/eigenvector problem
  • Modern QC is a highly algebraic science!
Quantum Chemistry – algebraic approach
• The Hamiltonian is a two-particle operator acting on functions of $3n$ variables (electronic degrees of freedom)
• One needs a suitable basis to work with
• Electrons are fermions
  • Basis functions are thus antisymmetrized direct products (Slater determinants) of (orthogonal) single-electron basis functions (Molecular Orbitals, or MOs)
• The set of single-electron basis functions can be obtained, e.g., from mean-field SCF calculations
• Finally, single-electron basis functions are expressed as linear combinations (MO LCAO) of properly chosen, nuclei-centered (non-orthogonal) atomic basis set functions (Atomic Orbitals, or AOs)
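The antisymmetry requirement on this slide can be illustrated with a minimal sketch. The two one-dimensional functions `phi_a` and `phi_b` are hypothetical toy "orbitals" (not real MOs, and spin is ignored); they only serve to show that a two-electron Slater determinant changes sign when the electrons are exchanged and vanishes when both electrons sit at the same point.

```python
import math

# Hypothetical one-dimensional toy "orbitals" (assumption for illustration);
# real MOs are 3-D functions expanded in an atomic basis set.
def phi_a(x):
    return math.exp(-x * x)

def phi_b(x):
    return x * math.exp(-x * x)

def slater_2e(x1, x2):
    """Unnormalized 2x2 Slater determinant built from phi_a and phi_b."""
    return phi_a(x1) * phi_b(x2) - phi_b(x1) * phi_a(x2)

psi_12 = slater_2e(0.3, 1.1)    # electron 1 at 0.3, electron 2 at 1.1
psi_21 = slater_2e(1.1, 0.3)    # the two electrons exchanged
psi_same = slater_2e(0.7, 0.7)  # both electrons at the same point

print("psi(1,2) =", psi_12)
print("psi(2,1) =", psi_21)
print("psi(x,x) =", psi_same)
```

Exchanging the arguments swaps the two rows of the determinant, which is what flips the sign.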
Some important facts
• One needs rules to compute matrix elements of the Hamiltonian and other operators
  • These are the so-called Slater rules
• Most important consequences of the two-body nature of the electronic Hamiltonian
  • Matrix elements can be expressed as combinations of four-index quantities $(ij|kl)$, the so-called “two-electron integrals”
  • Called “atomic integrals” in the original AO basis set
    • $(\mu\nu|\lambda\sigma)$
  • Called “molecular integrals” once transformed to the MO basis
    • $(ij|kl)$
• Simple consequence: the use of four-index quantities (tensors) is more or less unavoidable in QC!
Some important collisions
• Let $N$ be the number of atomic basis functions (AOs), the main parameter controlling complexity
• The native size of dense matrices typical of QC methods is about $N \times N$, e.g. 1000×1000
  • Relatively small matrices
  • Nothing in common with HPL
• The native size of sparse matrices typical of QC methods varies but is usually very large (e.g. up to ca. $N!$)
  • Usually no regular structure…
• The native size of intermediate quantities to be computed and reused can be up to $N^4$ (two-electron integrals in the MO basis) and more
  • $1000^4$ double-precision numbers would require 8 TBytes of RAM or storage
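The storage figures on this slide follow from simple arithmetic, sketched below. The helper names are hypothetical; the sketch assumes 8-byte double-precision numbers, decimal units (1 TB = 10^12 bytes), and no use of symmetry or sparsity to reduce storage.

```python
BYTES_PER_DOUBLE = 8  # IEEE 754 double precision

def dense_matrix_bytes(n):
    """Storage for a dense N x N matrix of doubles."""
    return BYTES_PER_DOUBLE * n ** 2

def four_index_bytes(n):
    """Storage for a full N^4 four-index tensor of doubles (no symmetry used)."""
    return BYTES_PER_DOUBLE * n ** 4

n = 1000
matrix_mb = dense_matrix_bytes(n) / 10 ** 6   # megabytes
tensor_tb = four_index_bytes(n) / 10 ** 12    # terabytes

print(f"N = {n}: dense matrix {matrix_mb} MB, four-index tensor {tensor_tb} TB")
```

The dense matrix fits comfortably in memory, while the full four-index array does not, which is why the choice between storing and recomputing intermediates matters so much.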
Typical tasks, their parameters and computational complexity
• QC: myriads of theoretical approximations
• To name just a few:
  • Hartree-Fock (Self-Consistent Field) and Density Functional Theory
    • Simplest Mean Field Theories
  • Perturbative approaches
    • Single-reference RS-type perturbation theories
      • MP2, MP3, MP4 etc…
    • Various Multi-Reference and/or Quasi-Degenerate perturbation theories
  • Configuration Interaction (CI)
    • Linear variational principle
    • Lots of different types of CI
  • Coupled Clusters
    • Truncated exponential Ansatz
    • Lots of different approximations/variants
  • Lots of multi-reference methods…
  • Green functions, propagators and similar approaches…
  • Time-dependent approaches…
Quantum Chemistry – computational complexity
• Hartree-Fock (Self-Consistent Field) and Density Functional Theory
  • From $N^2$ to $N^4$
• Perturbative approaches
  • $N^5$ at the second order, $N^6$ at the third, $N^7$ at the fourth order of PT…
• Configuration Interaction
  • Lots of different CI types
  • E.g., $N^6$ for CISD
  • Up to $N!$ for Full CI
• Coupled Clusters
  • Lots of different approximations/variants
  • Most widely used approaches: $N^6$ and worse
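To make these exponents concrete, the sketch below estimates how much the cost grows when the basis set size N is doubled. The table `SCALING` simply restates the formal exponents from this slide (with MP2, MP3 and MP4 standing for second-, third- and fourth-order perturbation theory); prefactors and any "fast" methods are ignored, so the numbers are only a rough guide.

```python
# Formal scaling exponents taken from the slide; prefactors are ignored.
SCALING = {
    "HF/DFT (upper bound)": 4,
    "MP2": 5,
    "MP3": 6,
    "CISD": 6,
    "MP4": 7,
}

def cost_growth(exponent, size_factor=2):
    """Factor by which an N^exponent method slows down when N grows by size_factor."""
    return size_factor ** exponent

growth = {method: cost_growth(power) for method, power in SCALING.items()}

for method, factor in growth.items():
    print(f"{method:22s} doubling N costs about {factor}x more")
```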
Conventional, direct and semidirect methods
• Basically, the question is whether to store intermediates on disk or to recompute them as needed
• Conventional
  • Store almost everything, never recompute
  • More advanced variants use real-time data compression and may store some metadata instead of raw intermediates
• Direct
  • Recompute as much as computationally feasible, store a minimal amount of data
• Semidirect
  • Reasonable compromise between the fully Conventional and fully Direct limits
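The store-versus-recompute trade-off can be sketched with a toy model. Everything here is hypothetical: `CountingIntegral` stands in for an expensive integral routine, a nested list stands in for the disk file, and the "iterations" mimic an iterative procedure that needs every integral on each pass.

```python
class CountingIntegral:
    """Hypothetical stand-in for an expensive integral routine; counts its calls."""
    def __init__(self):
        self.calls = 0

    def __call__(self, i, j):
        self.calls += 1
        return 1.0 / (1.0 + i + j)

def run_conventional(n, iterations, integral):
    # Compute every value once and keep it (the list plays the role of a disk file).
    stored = [[integral(i, j) for j in range(n)] for i in range(n)]
    results = []
    for _ in range(iterations):
        total = 0.0
        for i in range(n):
            for j in range(n):
                total += stored[i][j]
        results.append(total)
    return results, n * n          # second value: number of stored items

def run_direct(n, iterations, integral):
    # Store nothing; recompute every value each time it is needed.
    results = []
    for _ in range(iterations):
        total = 0.0
        for i in range(n):
            for j in range(n):
                total += integral(i, j)
        results.append(total)
    return results, 0

conv_integral, direct_integral = CountingIntegral(), CountingIntegral()
conv_results, conv_stored = run_conventional(50, 10, conv_integral)
direct_results, direct_stored = run_direct(50, 10, direct_integral)

print("conventional:", conv_integral.calls, "evaluations,", conv_stored, "stored values")
print("direct:      ", direct_integral.calls, "evaluations,", direct_stored, "stored values")
```

A semidirect scheme would sit between the two, for example by storing only the values that are most expensive to recompute.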
Standard (canonical) and “fast” methods
• “Fast” methods
  • An attempt to improve algorithmic complexity for large problems
  • Some examples:
    • Use of the Quantum Fast Multipole Method (QFMM)
      • Based on FMM ideas but much more involved
    • Use of the Laplace transform or other tricks to avoid the so-called energy denominators (e.g. Laplace transform MP2)
    • Use of spatially localized intermediate basis functions
    • (Density) fitting and related approximations
• Two classes of methods
  • Those that give the exact answer within a given theoretical model
  • Those that give only approximate answers
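The Laplace transform trick mentioned on this slide rests on the identity 1/x = ∫₀^∞ exp(−x·t) dt for x > 0: a denominator that couples several orbital energies becomes an integral over a product of independent exponentials. The sketch below checks this numerically. The energy gaps `gap_1` and `gap_2` are hypothetical values, and the brute-force midpoint rule is an assumption made for clarity; practical implementations typically use only a handful of optimized quadrature points.

```python
import math

def inverse_by_laplace(factors, t_max=40.0, n_points=40000):
    """Approximate 1/sum(factors) as the integral of prod(exp(-f*t)) over [0, t_max].

    Midpoint rule; valid only when every entry of factors is positive and
    t_max is large enough for the integrand to have decayed.
    """
    h = t_max / n_points
    total = 0.0
    for k in range(n_points):
        t = (k + 0.5) * h
        term = 1.0
        for f in factors:
            term *= math.exp(-f * t)   # each factor depends on one gap only
        total += term
    return h * total

# Hypothetical positive energy gaps whose sum forms the denominator.
gap_1, gap_2 = 0.8, 1.2
exact = 1.0 / (gap_1 + gap_2)
approx = inverse_by_laplace([gap_1, gap_2])

print("exact  1/(gap_1 + gap_2) =", exact)
print("Laplace quadrature value =", approx)
```

Because each exponential depends on a single gap, the sums over different indices can be carried out separately at every quadrature point.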
Typical large-scale QC calculation requirements
• Petaflops of operations
• Terabytes of data
• Gigabytes of memory
Efficient, highly scalable parallel algorithms are mandatory
Typical parallel algorithms: key features and open problems
• Efficient I/O is very important
  • Direct use of advanced OS I/O features
  • “On the fly” data compression/decompression
• Efficient memory management is very important
• Efficient multithreading is very important
  • Typically, OpenMP is just not flexible enough to be used
  • Direct use of OS-level APIs
• Efficient communications are very important
  • In particular, MPI-1 and MPI-2 are just not flexible enough to use in all situations
  • Use of proprietary communication interfaces
• Main problem: myriads of very different theoretical, and hence computational, methods
  • Each has a set of different combinations of controlling parameters, each with its own optimal computational strategy
  • For optimal efficiency, each theoretical model has to be coded multiple times as a set of several separate, very complex algorithms
  • Unfortunately, the degree of code reuse is not very high
Canonical problem: Integral transformation step
• $(pq|rs) = \sum_{\mu\nu\lambda\sigma} C_{\mu p} C_{\nu q} C_{\lambda r} C_{\sigma s} \, (\mu\nu|\lambda\sigma)$
  • Formally an $N^8$ step
• Usually performed as a sequence of four quarter-transformations:
  • $(p\nu|\lambda\sigma) = \sum_{\mu} C_{\mu p} \, (\mu\nu|\lambda\sigma)$
  • $(pq|\lambda\sigma) = \sum_{\nu} C_{\nu q} \, (p\nu|\lambda\sigma)$
  • etc…
• Computational complexity: $N^5$ or below!
• Lots of different strategies
  • Complete integral transformation vs. partial transformation specific to a particular approximation
  • Different requirements for the size of RAM and of the intermediate files used
  • Different parallelization strategies
  • Different requirements for how computed quantities are distributed across nodes
  • Etc…
• Hundreds of publications so far…
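The difference between the formal $N^8$ transformation and the four quarter-transformations can be checked on a tiny example. This is a minimal pure-Python sketch, not the algorithm used in any particular program: the AO "integrals" and coefficients are random numbers with no symmetry, the helper names are hypothetical, and the coefficient matrix is assumed to be indexed as `c[ao][mo]`.

```python
import random
from itertools import product

def zeros4(n):
    """Dense n x n x n x n tensor of zeros as nested lists."""
    return [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]

def transform_naive(ao, c):
    """Direct evaluation of the four-fold sum: n^8 inner iterations."""
    n = len(c)
    mo = zeros4(n)
    for p, q, r, s in product(range(n), repeat=4):
        total = 0.0
        for m, v, l, g in product(range(n), repeat=4):
            total += c[m][p] * c[v][q] * c[l][r] * c[g][s] * ao[m][v][l][g]
        mo[p][q][r][s] = total
    return mo

def quarter_transform(t, c):
    """Transform the first index of t, then move it to the last position (n^5 work)."""
    n = len(c)
    out = zeros4(n)
    for p, i, j, k in product(range(n), repeat=4):
        out[i][j][k][p] = sum(c[m][p] * t[m][i][j][k] for m in range(n))
    return out

def transform_quarters(ao, c):
    """Four successive quarter-transformations: 4 * n^5 inner iterations."""
    t = ao
    for _ in range(4):
        t = quarter_transform(t, c)
    return t  # after four rotations the index order is (p, q, r, s) again

n = 4
rng = random.Random(0)
c = [[rng.random() for _ in range(n)] for _ in range(n)]
ao = zeros4(n)
for idx in product(range(n), repeat=4):
    ao[idx[0]][idx[1]][idx[2]][idx[3]] = rng.random()

mo_naive = transform_naive(ao, c)
mo_fast = transform_quarters(ao, c)
max_diff = max(
    abs(mo_naive[p][q][r][s] - mo_fast[p][q][r][s])
    for p, q, r, s in product(range(n), repeat=4)
)
print("largest difference:", max_diff)
print("inner iterations: naive", n ** 8, "vs quarter-transformations", 4 * n ** 5)
```

Rotating the transformed index to the last position lets one routine serve all four steps; real codes instead choose index orderings and blocking to suit memory, disk and node layout, which is where the many strategies listed above come from.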
MP2 calculation (PC GAMESS, Spring 2004) for Fullerene dimer
Pentium 4C 2.4 GHz / 1024 MB / 120 GB / Gigabit Ethernet