Automatic Construction of Ab Initio Potential Energy Surfaces

Automatic Construction of Ab Initio Potential Energy Surfaces Interpolative Moving Least Squares (IMLS) Fitting of Ab Initio Data for Constructing Global Potential Energy Surfaces for Spectroscopy and Dynamics Donald L. Thompson University of Missouri – Columbia Richard Dawes, Al Wagner, & Michael Minkoff IMA Workshop: Chemical Dynamics: Challenges and Approaches January 12-16, 2009 University of Minnesota

Potential Energy Surfaces  Basis for quantum and classical dynamics, spectroscopy Electronic structure calculations can provide accurate energies (even gradients and Hessians) – but at a high cost (Highly accurate energy calculations for a single geometry can take hours or days) • We want to: • Generate accurate global PESs fit to a minimum number (100’s – 1000’s) of ab initio points • Make ab initio dynamics feasible for the highest levels of quantum chemistry methods (for which gradients may not be directly available) As “blackbox” as possible

Requirements of the fitting methods: Minimize number of ab initio points Minimal human effort and cost of fitting Low-cost accurate evaluations Our approach: Interpolating Moving Least Squares (IMLS)  Much cheaper than high-level quantum chemistry  Doesn’t need gradients, but can use gradients and Hessians  Can use high-degree polynomials • How to make efficient and practical: • Optimally place minimum number of points •  Weight functions • Reuse fitting coefficients (store local expansions) • Use zeroth-order PES and fit difference  Other techniques

Least-Squares Fitting Usual applications are for data with statistical errors, but trends that follow known functional forms. • Ab initio energies do not have random errors • A PES does not have a precisely known functional form  • the energy points lie on a surface • of unknown shape • Thus, fit with a general basis set (e.g., polynomials) • Basis functions that ~ the “true” function provides a more compact representation Fitting ab initio energies

Weighted Least Squares Equations W=1 gives standard least squares We use standard routines BTW(z) B a(z) = BTW(z)V

Weighted vs. standard least squares Standard, first degree fit to the 5 points First Degree IMLS, first degree IMLS fits perfectly at each point Second Degree Standard, second degree IMLS, second degree

Optimum Point Placement  We want to do the fewest number of ab initio calculations  A non-uniform distribution of points is best  We can use the fact that IMLS fits perfectly at each point to determine where to place points for the most accurate fit using the fewest possible points • Use fits of different degree IMLS fits for automatic point placement Illustrate for 1-D Morse potential 5 “seed” points

Automatic Point Placement: 1-D Illustration Consider starting with 5 uniformly placed points Fit with 2nd & 3rd degree IMLS Add new point where they differ the most Squared difference indicates where new points are needed

5 initial points 1 new point added Automatic Point Placement 2 new points added 3 new points added

Density adaptive weight function Automatic point placement will generate a nonuniform density of points. Thus, we use a flexible, density-dependent weight function

High Dimensional Model Representation(HDMR) basis set • Can represent high dimensional function through an expansion of lower order terms • Can also use full dimensional expansion but restrict the order of terms differently • Evaluation scales as NM2. HDMR greatly reduces M. • HDMR reduces the number of points required.

Accurate PESs from Low-Density Data Initial testing for 3-D: HCN-HNC We used the global PES fit to ab initio points by van Mourik et al.* as a source for (cheap) points.  Saves time obtaining points  Allows extensive error analyses We fit using (12,9,7) HDMR basis: 1-coordinate term truncated at 12th degree 2-coordinate term truncated at 9th degree 3-coordinate term truncated at 7th degree 180 basis functions * T. van Mourik, G. J. Harris, O. L. Polyansky, J. Tennyson, A. G. Császár, and P. J. Knowles, J. Chem. Phys. 115, 3706 (2001).

Error as function of automatically selected data points Seed points: Start with 4, 6, & 8 for r, R & cosθ Energy cutoff: 100 kcal/mol Data Points: van Mourik et al. PES 3-D HCN:HNC Automatic surface generation Using (12,9,7) & (11,8,6) bases Successive Order: Solid True Error: Open The difference in successive orders follows closely the true error. Thus, adding points based on difference criteria results in converged true error RMS Mean

Convergence rate dependence on basis set: HCN Accuracy follows Farwig’s* formula for power-law convergence  Linear on log-log plot with slope ~(n+1)/D, where n = degree of basis Obeys power law over 3 orders of magnitude RMS Error (kcal/mol) 8th degree & HDMR (12,9,7) both have ~ 180 functions, but HDMR converges faster Number of Points * R. Farwig, J. Comput. Appl. Math. 16, 79 (1986); Math. Comput. 46, 577 (1986).

Cutting cost: Local IMLS • Cost of evaluation scales as NM2 for standard IMLS (N=# ab initio points, M=# basis functions) • High-degree standard IMLS is too costly to use directly, thus we use local-IMLS: Local approximants (polynomials) of the potential near data points are calculated using IMLS (expensive) & the interpolated value is taken to be a weighted sum of them • In standard IMLS they are recomputed at each evaluation point (very accurate, but too costly) • The coefficients are generally slowly varying • In the L-IMLS approach coefficients are computed & stored at a relatively small number of points • Evaluations are low cost weighted interpolations between stored points

Overcoming scaling problem for automatic point selection • We get high accuracy & low cost with high-degree L-IMLS but must find optimum place to add each ab initio point • Trivial in 1-D as shown earlier • With L-IMLS the functions whose maxima we seek are continuously globally defined as are their gradients • So, define negative of the squared-difference surface • We can use efficient minimization schemes such as conjugate gradient to find local minima • Difference between successive orders of IMLS • Can also use variance of weighted contributions to interpolated value with local IMLS • Grid or random search scales very poorly with dimension

Automated PES fitting in 3-D: HCN-HNC Used 30 random starting points for minimizations Basis set not well supported Spectroscopic accuracy  To less than 1 cm-1 within 792 pts with Hessians or 1000 pts with gradients 223 828 318 For 0.1 kcal/mol ~cm-1 HDMR (12,9,7) But we can do even better Discussed below The PES is fit up to 100 kcal/mol

Dynamic Basis Procedure Avoids including points in the seed data that are not optimally located Start with very small initial grid of points & use automatic surface generation with a small basis, successively increasing the basis as points are added to support the larger basis.

Automated Dynamic Basis: 6-D (HOOH) Fit to analylic H2O2 PES* Dynamic basis Fit up to 100 kcal/mol A min. of 591 pts. would be needed if we started with the (10,7,5,4) basis. We started with 108. Convergence also much faster 164 114 754 RMS error based on randomly selected test points * B. Kuhn et al. J. Chem. Phys. 111, 2565 (1999).

Spectroscopic Accuracy: 9-D (CH4) Test Case: Schwenke & Partridge PES: a least squares fit to ~8000 CCSD(T)/cc-pVTZ ab initio data over the range 0-26,000 cm-1  We fit the range 0-20,000 cm-1 (57.2 kcal/mol). Energies & gradients only (Hessians data not cost effective as shown earlier) Bond distances Exploited permutation symmetry Dynamic basis procedure D. W. Schwenke & H. Partridge, Spectrochim Acta Part A 57, 887 (2001)

Automated PES fitting in 9-D (CH4) With 1552 pts. the E only RMS error is 0.41 kcal/mol & including gradients brings it down to 0.32 kcal/mol. The RMS error for the Schwenke-Partridge PES (based on 8000 pts) is ~0.35 kcal/mol The IMLS fitting is essentially automatic, little human effort, and no prior knowledge of the topology 9,6,4,4 9,6,4,4

CASSCFPES for the CH2Testing Fitting Accuracy Generated a spectroscopically accurate PES for CH2 for energies up to 20,000 cm-1 CASSCF/aug-cc-pVDZ calculations in valence coordinates. 216 vibrational levels were computed using a discrete variable representation (DVR) method. For a benchmark we performed a “direct” DVR calculation using ab initio calculations at all 22,400 DVR points.

CH2: fit to energies and gradients True and estimated errors are in near perfect agreement Black: estimated errors 2.0 cm-1 Red: true errors 0.5 cm-1 0.33 cm-1 CASSCF calculation in valence coordinates. Energy range of 20000 cm-1. Estimated error vs. true error (sets of 500 random ab initio calcs). True error (RMS and mean) are sub-wavenumber using 355 points.

CH2: Comparison of fits with valence and bond distance coordinates to energies and gradients Valence coordinates Bond Distances Stoppedat 500 pts. Mean error ~2.8 cm-1

Singlet Methylene Vibrational Levels: Discrete Variable Representation (DVR) Calculation  Absolute errors for 216 vibrational levels (below 20,000 cm-1).  Variational vibrational calculations were performed using DVR and a PES fitted with a mean estimated error of 2.0 cm-1  Exact levels were benchmarked by a DVR calculation using ab initio calculations at all 22,400 DVR points.  CASSCF (full valence) with aug-cc-pVDZ

Singlet Methylene Vibrational Levels: Discrete Variable Representation (DVR) Calculation Plot of absolute errors for 216 vibrational levels (below 20,000 cm-1). Variational vibrational calculations were performed using a DVR and fitted PESs with mean estimated errors of 0.5 cm-1 Exact levels were benchmarked by a DVR calculation using ab initio calculations at all 22,400 DVR points.

Singlet Methylene Vibrational Levels: Comparisons 2.0 cm-1 mean estimated error 0.5 cm-1 mean estimated error

Singlet Methylene Vibrational Levels: Discrete Variable Representation (DVR) Calculation Absolute errors for 216 vibrational levels (below 20,000 cm-1). Variational vibrational calculations were performed using a DVR and PES fitted with mean estimated errors of 0.33 cm-1 Exact levels were benchmarked by a DVR calculation using ab initio calculations at all 22,400 DVR points. Mean and maximum errors for levels computed with this PES are 0.10 and 0.41 cm-1.

Singlet Methylene Vibrational Levels: Comparisons 2.0 cm-1 mean estimated error 0.33 cm-1 mean estimated error

A New PES Highly Accurate PES for the CH2 • A series of sub-wavenumber PESs fit using automatic surface generator • Separate surfaces for • MRCI/avdz, MRCI/avtz, MRCI/avqz, MRCI+Q/avdz, MRCI+Q/avtz, • MRCI+Q/avqz,CCSD(T)(AE)/acvtz, CCSD(T)(FC)/acvtz • Used DVR to determine best PES based on comparisons with experiment. • generated a spectroscopically accurate PES for CH2 for energies up to • 20,000 cm-1 • MRCI/CBS+C-V+DW(+Q) • [Multi-reference configuration interaction (MRCI) with complete basis • extrapolation (CBS) and CCSD(T) based Core-Valence (CV) correction. • Davidson correction to MRCI added with dynamic weighting based on • small basis Full CI calculations]

CH2 Vibrational Frequencies Errors relative to experimental values Expt: Gu et al. J. Mol. Str. 2000, 517, 247.

Slices though IMLS-based PESs for CH2 Scan of one C-H bond distance with the other bond distance and angle held fixed at equilibrium values.

HCN-HCN: Highly Accurate PESs • We have generated a spectroscopically accurate PES for HCN-HNC for energies up to 35,000 cm-1 • Vibrational levels were computed using a discrete variable representation (DVR) method. • Details coming soon!

Concluding Comments  IMLS allows automated generation of PESs for various applications • Spectroscopy • Dynamics  Flexible fits to energies, energies and gradients, or higher derivatives…  Interfaced to electronic structure codes • Gaussian, Molpro, Aces II  Robust, efficient, practical methods that assures fidelity to the ab initio data  General PES fitter for 3-atom systems  Interfaced to general classical trajectory code: GenDyn

A General 3-Atom PES Fitter • Input file • Accuracy target • Energy range • Basis set • Number of seed points and coordinate ranges • Type of coordinates, Jacobi, valence, bond distances • Generates input files for Gaussian, MolPro, and Aces II • Energies only or energies & gradients

Automatic Construction of Ab Initio Potential Energy Surfaces