SciDAC Progress Report: Algorithms and Parallel Methods for Reactive Atomistic Simulations 05.07.2009
Project Accomplishments • Novel algorithms (solvers, data structures) for reactive simulations • Comprehensive validation • Parallel formulation, implementation, performance characterization, and optimization • Public-domain software release • Incorporation of solvers into LAMMPS
Project Accomplishments: Algorithms and Data Structures • Optimal dynamic data structures for 2-, 3-, and 4-body interactions • Krylov subspace solvers for charge equilibration • Effective preconditioners (Block Jacobi) • Reuse of Krylov subspaces and selective orthogonalization • Effective initialization strategies
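Because bonds form and break during a reactive run, the 2-, 3-, and 4-body lists have to be rebuilt as the geometry changes. The sketch below illustrates the idea under assumed inputs: a list of coordinates and a single hypothetical bond cutoff `r_bond`. It is not the project's data structure; for clarity it uses an O(n²) pair search where a production code would use cell or Verlet lists.

```python
import math
from itertools import combinations


def build_interaction_lists(pos, r_bond):
    """Rebuild 2-, 3- and 4-body lists from coordinates.

    pos: list of (x, y, z) tuples; r_bond: hypothetical bond cutoff.
    The O(n^2) pair loop is for illustration only.
    """
    n = len(pos)
    nbrs = [[] for _ in range(n)]  # bonded neighbors of each atom
    bonds = []                     # 2-body terms: (i, j) with i < j
    for i, j in combinations(range(n), 2):
        if math.dist(pos[i], pos[j]) < r_bond:
            bonds.append((i, j))
            nbrs[i].append(j)
            nbrs[j].append(i)

    # 3-body terms: angles j-i-k centered on atom i, each pair once
    angles = [(j, i, k) for i in range(n)
              for j, k in combinations(nbrs[i], 2)]

    # 4-body terms: torsions i-j-k-l built around each central bond (j, k)
    torsions = []
    for j, k in bonds:
        for i in nbrs[j]:
            if i == k:
                continue
            for l in nbrs[k]:
                if l != j and l != i:
                    torsions.append((i, j, k, l))
    return bonds, angles, torsions
```

For a four-atom zigzag chain, `build_interaction_lists([(0, 0, 0), (1, 0, 0), (1.5, 0.8, 0), (2.5, 0.8, 0)], 1.2)` yields three bonds, two angles, and one torsion. Deriving the higher-order lists from the bond list keeps them consistent with each other whenever bonds change.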
Project Accomplishments: Comprehensive Validation • In-house validation on: • Bulk water • Silica • Hydrocarbons (hexane, cyclohexane) • Collaborative validation on a number of other systems (see Software Release)
Project Accomplishments: Parallel Implementation • Highly optimized parallel formulation validated on bgl (BG/L), Jaguar (XT4), and Ranger (Sun), among others • Optimizations for other platforms are under way • Parallel code in limited release (to Purdue, MIT, and NIST)
Project Accomplishments: Software Release • Code release (limited public release) • Purdue (Strachan et al., Si/Ge/Si nanorods) • Caltech (Goddard et al., force field development) • MIT (Buehler et al., silica/water) • PSU (van Duin et al., force field development) • USF (Pandit et al., silica/water interface) • UIUC (Aluru et al.) • Sandia (Thompson, LAMMPS development) • Norwegian Institute for Science and Technology (IBM/AIX optimization)
Project Accomplishments: LAMMPS Development • Charge equilibration implemented as a LAMMPS fix • Fully validated for accuracy and performance • Preliminary implementation of ReaxFF in LAMMPS • A student at Sandia will complete the implementation over the summer
Project Accomplishments: Details • The dominant computational cost is associated with the following force field computations: • Bonded potential • Non-bonded potential • Neighbor potential • Charge equilibration (qEq)
Project Accomplishments: Details • Bonded, non-bonded, and neighbor potentials require efficient (dynamic) data structures; their evaluation is also typically accelerated with lookup tables. • Charge equilibration minimizes the electrostatic energy to compute partial charges on atoms. Because this energy is quadratic in the charges, the optimality conditions form a linear system that is solved at each timestep with iterative solvers such as CG and GMRES.
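The lookups mentioned in the first bullet usually mean tabulating an expensive function of distance once and interpolating it during the run. A minimal sketch, assuming linear interpolation on a uniform grid and an arbitrary stand-in function `exp(-r)`; the project's actual tables and interpolation order are not specified on the slide.

```python
import math


class PairTable:
    """Tabulate f(r) on a uniform grid over [r_min, r_max] and evaluate it
    by linear interpolation (real MD codes often use cubic splines).
    Assumes r_min <= r <= r_max at lookup time."""

    def __init__(self, f, r_min, r_max, n_points):
        self.r_min = r_min
        self.dr = (r_max - r_min) / (n_points - 1)
        self.values = [f(r_min + k * self.dr) for k in range(n_points)]

    def __call__(self, r):
        x = (r - self.r_min) / self.dr
        k = min(int(x), len(self.values) - 2)  # clamp to the last interval
        frac = x - k
        return (1.0 - frac) * self.values[k] + frac * self.values[k + 1]


# Build the table once, then reuse it for every pair at every timestep.
table = PairTable(lambda r: math.exp(-r), 0.5, 10.0, 2001)
print(table(3.0))
```

The trade-off is memory for arithmetic: each lookup costs one multiply-add pair instead of a transcendental function call, and the grid spacing controls the interpolation error.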
Accurate Charge Equilibration is Essential to Modeling Fidelity
Computational Cost of Charge Equilibration • Charge equilibration dominates overall cost at the required (low) error tolerances and for larger systems. • Efficient solvers for charge equilibration are therefore critical.
Algorithms for Charge Equilibration • At the required tolerances and for larger systems (10⁶ atoms and beyond), charge equilibration can take over 75% of total simulation time. • Efficient algorithms for solving the linear system are essential. • We implement a number of techniques to accelerate the solve: • Effective preconditioners (nested, Block Jacobi) • Reuse of Krylov subspaces (solution spaces are not likely to change significantly across timesteps) • Selective reorthogonalization (orthogonalization is the major bottleneck for scalability) • Initial estimates through higher-order extrapolation
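Because charges change smoothly between timesteps, polynomial extrapolation from earlier solutions gives a good starting point for the iterative solver. The extrapolation order used in the project is not stated; this sketch assumes up to quadratic (three-point) extrapolation, with `history` a hypothetical list of previously converged solution vectors.

```python
def extrapolate_guess(history):
    """Initial guess for the next solve from previous solutions.

    history: previous solution vectors, oldest first.
    Uses the highest order available, up to quadratic:
      1 point : x0 = x_n
      2 points: x0 = 2 x_n - x_{n-1}
      3 points: x0 = 3 x_n - 3 x_{n-1} + x_{n-2}
    """
    if not history:
        raise ValueError("need at least one previous solution")
    if len(history) == 1:
        return list(history[-1])
    if len(history) == 2:
        a, b = history[-1], history[-2]
        return [2 * x - y for x, y in zip(a, b)]
    a, b, c = history[-1], history[-2], history[-3]
    return [3 * x - 3 * y + z for x, y, z in zip(a, b, c)]
```

The three-point formula is exact for any solution component that varies quadratically in time (for samples 0, 1, 4 it predicts 9), so the solver starts close to the answer and needs fewer iterations.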
Algorithms for Charge Equilibration • Accelerating GMRES/CG for charge equilibration • The matrix kernel is shielded electrostatics • The electrostatics is cut off, typically at 7 to 10 Å • An implicit Block Jacobi preconditioner can be constructed from the near-field block (say, a 4 Å neighborhood) • The inverse block can be explicitly computed and reused • Alternatively, an inner-outer scheme successively increases the cutoff and uses the shorter-cutoff system to precondition the outer, longer-cutoff system • Both schemes are implemented in parallel and show excellent scaling; their relative performance is system dependent.
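A minimal sketch of conjugate gradients with a block Jacobi preconditioner, where the inverse of each diagonal block is computed explicitly once and reused on every iteration. Assumptions: the blocks are given as hypothetical lists of atom indices that partition all atoms (the slide forms them from a roughly 4 Å spatial neighborhood, which is not reproduced here), and the matrix is small, dense, and symmetric positive definite. Pure Python is used for readability, not speed.

```python
def invert(a):
    """Gauss-Jordan inverse of a small dense matrix with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def block_jacobi(h, blocks):
    """Invert each diagonal block H[b, b] once; return r -> M^{-1} r.
    The blocks must partition the indices 0..n-1."""
    inverses = [(b, invert([[h[i][j] for j in b] for i in b])) for b in blocks]

    def apply(r):
        z = [0.0] * len(r)
        for b, inv in inverses:
            for a, i in enumerate(b):
                z[i] = sum(inv[a][c] * r[j] for c, j in enumerate(b))
        return z

    return apply


def pcg(h, rhs, precond, x0=None, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for a dense SPD matrix h."""
    def matvec(v):
        return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = list(x0) if x0 is not None else [0.0] * len(rhs)
    r = [bi - ai for bi, ai in zip(rhs, matvec(x))]
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        hp = matvec(p)
        alpha = rz / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Usage: `pcg(h, rhs, block_jacobi(h, [[0, 1], [2, 3]]))` for a 4×4 system. All block inverses are formed once inside `block_jacobi`, so each preconditioner application is only small dense matrix-vector products. The optional `x0` argument is where an extrapolated initial guess would be passed.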
Single Processor Performance Profiling • Memory usage and runtimes (NVE water; 648, 6,540, 13,080, and 26,160 atoms) • Relative cost of various phases
Single Processor Performance Profiling • Our code is highly optimized: compared with a traditional (non-reactive) MD code (GROMACS), it was only 3x slower (tested on water and hexane). • Our code has a very low memory footprint. This is essential because it lets us fit larger problem instances in memory, which in turn enables scaling to large machine configurations.
Parallel Performance • A number of optimizations have been implemented • Trading off redundant computations for messages • Efficient use of shadow domains and the midpoint method for minimizing redundant computations • Reducing number of orthogonalizations in charge equilibriation • Platform-specific optimizations
Parallel Performance • Performance was characterized primarily on two platforms • The code achieved 81% efficiency on 1024 cores of Ranger at approximately 6100 atoms/core (1.9 s/timestep for a 6.2M-atom system) • The code achieved 77% efficiency on 8K cores of a BG/L at approximately 600 atoms/core (1.1 s/timestep for a 4.8M-atom system)
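The per-core loads and aggregate throughputs follow from the figures above by division. "8K cores" is taken to mean 8192 cores, which is an assumption; the efficiency baseline is not stated on the slide, so efficiency is not recomputed here.

```latex
\[ \text{Ranger: } \frac{6.2\times10^{6}\ \text{atoms}}{1024\ \text{cores}} \approx 6.05\times10^{3}\ \text{atoms/core}, \qquad \frac{6.2\times10^{6}\ \text{atoms}}{1.9\ \text{s/step}} \approx 3.3\times10^{6}\ \text{atom-steps/s} \]
\[ \text{BG/L: } \frac{4.8\times10^{6}\ \text{atoms}}{8192\ \text{cores}} \approx 586\ \text{atoms/core}, \qquad \frac{4.8\times10^{6}\ \text{atoms}}{1.1\ \text{s/step}} \approx 4.4\times10^{6}\ \text{atom-steps/s} \]
```

The BG/L run carries roughly a tenth of Ranger's per-core load, which is consistent with the slide's "approximately 600" versus "approximately 6100" atoms/core.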
Ongoing Work • Near Term (12 months) • Integrating our reactive atomistic framework into LAMMPS (graduate student Metin Aktulga is spending the summer with Aidan Thompson and Steve Plimpton at Sandia) • Parallelizing the GMRES qEq fix to LAMMPS • Sampling techniques for force-field optimization
Ongoing Work • Medium to Long Term (24-36 months) • Advanced accelerators for qEq (multipole-type hierarchical preconditioners) • Platform-specific optimizations (Tesla/GPU, RoadRunner) • Supporting hybrid force-fields (reactive and non-reactive force fields) • Novel solvers, in particular, SPIKE-based techniques
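Charge Equilibration (QEq) Method • Expand the electrostatic energy of each atom as a Taylor series in its charge around the neutral state. • Identify the term linear in charge with the electronegativity of the atom, and the quadratic terms with the self energy and the electrostatic interaction between atoms. • Using these, require the partial derivative of the total electrostatic energy with respect to each charge to be equal for all atoms (electronegativity equalization) and solve for the partial charges.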
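Written out, the expansion takes the following form in conventional QEq notation. The symbols $E_i^0$, $\chi_i$, $J_i$, and $H_{ij}$ are labels chosen here; $H_{ij}$ for $i \neq j$ stands for the shielded Coulomb interaction between atoms $i$ and $j$.

```latex
\[ E_i(q_i) \approx E_i^{0} + \chi_i q_i + \tfrac{1}{2} J_i q_i^{2}, \qquad
   \chi_i = \left.\frac{\partial E_i}{\partial q_i}\right|_{q_i = 0}, \qquad
   J_i = \left.\frac{\partial^{2} E_i}{\partial q_i^{2}}\right|_{q_i = 0} \]
\[ E(q) = \sum_i \left( E_i^{0} + \chi_i q_i + \tfrac{1}{2} J_i q_i^{2} \right) + \sum_{i<j} H_{ij}\, q_i q_j \]
```

Dropping the constant $E_i^0$ terms and defining $H_{ii} = J_i$ turns $E(q)$ into a single quadratic form $\sum_i \chi_i q_i + \tfrac{1}{2}\sum_{i,j} H_{ij} q_i q_j$, which is what the minimization works with.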
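Qeq Method We need to minimize: $E(q) = \sum_i \chi_i q_i + \frac{1}{2} \sum_i \sum_j H_{ij} q_i q_j$ subject to: $\sum_i q_i = 0$ where $H_{ii} = J_i$ and, for $i \neq j$, $H_{ij}$ is the shielded Coulomb interaction between atoms $i$ and $j$.
Qeq Method Enforcing charge neutrality with a Lagrange multiplier $\mu$ (so that $\partial E / \partial q_i = \mu$ for every atom), we get: $\chi_i + \sum_j H_{ij} q_j = \mu$ for all $i$.
Qeq Method Let $q_i = s_i - \mu t_i$, where $\sum_j H_{ij} s_j = -\chi_i$ and $\sum_j H_{ij} t_j = -1$, or, in matrix form, $H s = -\chi$ and $H t = -\mathbf{1}$.
Qeq Method Substituting back into $\sum_i q_i = 0$, we get: $\mu = \sum_i s_i \big/ \sum_i t_i$. We need to solve 2n equations with kernel H for $s_i$ and $t_i$.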
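A minimal sketch of the two-solve procedure: solve $Hs = -\chi$ and $Ht = -\mathbf{1}$, set $\mu = \sum_i s_i / \sum_i t_i$, and form $q_i = s_i - \mu t_i$. It assumes a small, dense, symmetric positive definite H (so plain CG applies) and uses hypothetical function names. It is not the project's solver, which adds the preconditioning, subspace reuse, and extrapolation listed earlier.

```python
def cg(h, b, tol=1e-12, max_iter=1000):
    """Plain conjugate gradients for a dense SPD matrix h, starting from 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - Hx with x = 0
    p = list(r)
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        hp = [sum(hij * pj for hij, pj in zip(row, p)) for row in h]
        alpha = rr / sum(pi * hpi for pi, hpi in zip(p, hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def qeq_charges(h, chi):
    """Charges q and equalized electronegativity mu for a neutral system."""
    s = cg(h, [-c for c in chi])    # H s = -chi
    t = cg(h, [-1.0] * len(chi))    # H t = -1
    mu = sum(s) / sum(t)            # from sum_i q_i = 0
    q = [si - mu * ti for si, ti in zip(s, t)]
    return q, mu
```

The resulting charges satisfy neutrality and $\chi_i + \sum_j H_{ij} q_j = \mu$ for every atom. Because both solves share the same matrix, they can be run side by side with one preconditioner, which is why the problem is described as 2n equations with a single kernel.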
Qeq Method • Observations: • H is dense. • The diagonal term is $J_i$. • The shielding term is short-range. • The long-range behavior of the kernel is 1/r.
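A kernel with these properties can be written as a shielded Coulomb term. The form $1/(r^3 + \gamma_{ij}^{-3})^{1/3}$ used below is an assumption (it is the shape commonly used in ReaxFF-style charge equilibration). `h_entry` and its parameters are hypothetical, the Coulomb constant is set to 1, and the smooth taper that production codes apply at the cutoff is replaced by a hard cutoff.

```python
def h_entry(i, j, r, J, gamma, cutoff):
    """Hypothetical QEq kernel entry H_ij (Coulomb constant = 1).

    i == j      : diagonal self term J_i
    r < cutoff  : shielded Coulomb 1 / (r^3 + gamma_ij^-3)^(1/3)
    otherwise   : 0 (hard cutoff; real codes taper smoothly)
    """
    if i == j:
        return J[i]
    if r >= cutoff:
        return 0.0
    g = (gamma[i] * gamma[j]) ** 0.5  # combined shielding parameter
    return 1.0 / (r ** 3 + g ** -3) ** (1.0 / 3.0)
```

The shielding keeps the interaction finite as r approaches 0 (it tends to $\gamma_{ij}$), while for r much larger than $1/\gamma_{ij}$ the $r^3$ term dominates and the entry approaches 1/r, matching the observations above.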
Hexane (at 200 K) and cyclohexane (at 300 K), liquid phase • ~10,000 atoms randomly placed around lattice points in a cube • NVT (200 K for hexane, 300 K for cyclohexane); the cube is shrunk by 1 Å on each side every 7,500 steps, which provides another way to measure density.
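A minimal sketch of this kind of setup, under stated assumptions: sites are placed on a simple cubic lattice and jittered uniformly, and "shrunk by 1 Å on each side" is read as the cube edge length decreasing by 1 Å per stage (it could also mean each face moving inward by 1 Å). The lattice size, spacing, jitter, and total mass are hypothetical inputs; the density conversion uses 1 amu/Å³ ≈ 1.66054 g/cm³.

```python
import random

AMU_PER_A3_TO_G_PER_CM3 = 1.66053906660  # 1 amu/Å^3 expressed in g/cm^3


def jittered_lattice(n_per_side, spacing, jitter, seed=0):
    """n_per_side**3 sites on a simple cubic lattice, each displaced by a
    uniform random offset in [-jitter, jitter] along every axis."""
    rng = random.Random(seed)
    return [(ix * spacing + rng.uniform(-jitter, jitter),
             iy * spacing + rng.uniform(-jitter, jitter),
             iz * spacing + rng.uniform(-jitter, jitter))
            for ix in range(n_per_side)
            for iy in range(n_per_side)
            for iz in range(n_per_side)]


def density_schedule(total_mass_amu, edge0, shrink, steps_per_stage, n_stages):
    """(step, density in g/cm^3) at each compression stage, ASSUMING the cube
    edge length drops by `shrink` angstroms every `steps_per_stage` steps."""
    out = []
    for k in range(n_stages):
        edge = edge0 - k * shrink
        out.append((k * steps_per_stage,
                    total_mass_amu / edge ** 3 * AMU_PER_A3_TO_G_PER_CM3))
    return out
```

Each stage maps to a known density, so the properties measured at each stage can be compared across a range of densities within a single run.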