400 likes | 641 Views
Multi-Timescale Modeling and Quantum Chemistry using Machine-Learning Methods via Genetic Programs MCC Internal Review University of Illinois 7 November 2005. Faculty: Duane D. Johnson (MSE) , Pascal Bellon (MSE), David Goldberg (GE), Todd Martinez (Chemistry) Students:
E N D
Multi-Timescale Modeling and Quantum Chemistry using Machine-Learning Methods via Genetic ProgramsMCC Internal Review University of Illinois7 November 2005 Faculty: Duane D. Johnson (MSE), Pascal Bellon (MSE), David Goldberg (GE), Todd Martinez (Chemistry) Students: Kumara Sastry (MSE/GE), Alexis Thompson (Chemistry), Jia Ye (MSE) See also Poster
Background on Multi-Time-Scale Modeling • Growing interest in multi-timescale modeling • Restrictive and do not yield required speed-up • Hyperdynamics, Parallel replica (Voter, 1997,1998) • Focus on infrequent events • Use transition state theory • MD + KMC (Jacobsen, Cooper, & Sethna, 1998) • Elemental metals, tabulated activation barriers • Hybridize MD & KMC using genetic programming • Calculate some activation barriers using MD • Predict others using GP
Objective for Multi-Timescale Kinetic Modeling • Can we simulate experimental time scales for dynamics (up to seconds) for designing nanostuctured functional materials? –Time of realistic processes requires atomic-scale information, need frequent events (pico-secs) torare events (secs). –Infeasible to compute, e.g., barriers a priori or “on-the-fly”. – Possible configurations become potentially innumerable. – relative barrier heights control access and diffusion. • Propose anovel, effective & practicalmethod based onGenetic Programming (GP) for the intelligent machine learningof vast number of barrier values. • Offer Proof-of-Concept for long-time atomic diffusion in alloy – a hybrid of MD: nanoseconds 10–9 secs and (Kinetic MC): seconds. –UseMD to getsome diffusion barriers. –Use KMC to span 15 orders of time!, but need all barriers. –Use GP to regress all barriers from some barrier info. –Savings compared to Table Look-up is 4-8+ orders of magnitude.
RESULT: Time Multiscaling to Seconds using Genetic Programming n.n. jumps: 1st 2nd database K. Sastry, DD Johnson, DE Golderg, P. Bellon, Phys. Rev. B 72, 085438 (2005). Chosen by AIP editors as frontier research for the Virtual J. of Nanoscience (Aug, 2005) Time evolution of realistic processes requires atomic-scale data, but scales inaccessible via atomic simulations (only nano-seconds!). Ex: Vacancy-assisted Migration at Cu50Co50 (001) Surface using Kinetic Monte Carlo • Simulate seconds from KMC but need all barriers!(2nd n.n.:8192 barriers!) • Machine-learn barriers as regressedin-line functionE(c0,x) from few barriers . • GP needs < 3% of the barriers for < 0.1% error! • CPU savings (compared to Database look-up) is4-8+ orders of magnitude.
Objective for Using Multi-Objective Genetic Algorithms in Quantum Chemistry • Can we utilize concepts from multi-objective optimization theory (Pareto fronts) to create ab initio accurate semiemprical quantum chemistry potentials to dramatic speed-up searches over reaction pathways? • Offer Proof-of-Concept for Benzene and Ethylene. • Future: Propose usingGenetic Programming (GP) to machine-learn semiempirical potential form to improve reliability and speed.
RESULT:S1/S2 Conical Intersectionfrom Machine-Learned MNDO vs. ab initio CASSCF Benzene FC S2/S1 Intersection CASPT2 Dashed lines = target CASPT2 results only values/gradients at x=0 included in GA fitting GA-MNDO-PM3 Minimal Energy Intersections – Expected to play a prominent role in excited-state chemistry (nonadiabatic transitions). Red and Blue are g/h vectors – displacements which lift electronic degeneracy.
Genetic Programming Optimization forMulti-Timescale ModelingKinetic Diffusion.Cu-Co Vacancy-Assisted Surface Migration
David Goldberg (General Engineering) IlliGAL at University of Illinois Urbana-Champaign http://www-illigal.ge.uiuc.edu/ Studying nature's search algorithm of choice, genetics and evolution, as a practical approach to solving difficult problems on a computer. The mechanics of a genetic algorithm (GA) are conceptually simple: maintain a population of solutions coded as chromosomes, select the better solutions for recombination (crossover) of mating chromosomes. perform mutation and other variation operators on the chromosomes, and use these offspring to replace poorer solutions or to create a new generation. Theory and empirical results demonstrate that GAs lead to improved solutions in many problem domains, andwell-designed GAs can be guaranteed to solve a broad class of provably hard problems, quickly, reliably, and accurately.
Generally, using GAs depends on Problem • If gradient is numerically precise and fast, use gradient algorithm. • If there is an exact ground state, do notuse genetic algorithm. • GA Advantages • no need for knowledge or gradient info about object or energy surface. • discontinuities present on surface have little effect on optimization. • resistant to becoming trapped in local minima. • work well on large-scale optimization problems. • can be used on a wide variety of problems. • GA Disadvantages • trouble finding exact global minimum. • require a large number of cost function evaluations. • Starting/setting up configurations is not straightforward. • GAs require more evaluations to move uphill.
Recall Concept of Genetic Algorithms • Search based on principles of natural selection and genetics • Geneencodesolution: e.g., binary xn =0 or 1 for a variable where possible solutions are “gene” sequence {x1,…xN}. • Fitness (Objective) function: Quality measure of sequence. • Need known function to minimize (or maximize) to evaluate quality of genesequence, e.g., min. f(x) = | “cost” + “constraints”|, or max. f–1(x). • Population: A set of candidate gene sequences (solutions). • Genetic operators: • Selection: “Survival of the fittest” • Recombination: Combine parental traits to create offspring • Mutation: Modify an offspring slightly (local gene sequence change)
Example Use of Gas for Regression Ian Walmsley and Herschel Rabitz, “Quantum Physics Under Control,” Physics Today, August 2003. Example: Controlling laser shape pulses to increase yields in molecular reactions. R.J. Levis and H. Rabitz, J. Phys. Chem. A 106, 6427 (2002). • Controller is based on a GA to shape (phase and amplitude) pulses. • GA evolves trying to maximize mass spectrometer signal of desired molecular species. • thousands of pulse updated per second.
Same Concept for Genetic Programming • Genetic Algorithms that evolve computer programs (Koza, 1992) • Representation:Programs are represented by trees • Functions: Internal nodes (eg., {+, –, *, sin, cosh, log}) • Terminals: Leaf nodes (eg., {x, y, 2.3, R}) • R=random ephemeral constant. • Fitness function: Quality measure of the program • Population: Candidate programs (individuals) • Genetic operators: • Selection: “Survival of the fittest” • Recombination: Combine parental traits to create offspring • Mutation: Modify an offspring slightly
Example GP Function and Tree • Define Functions: { x, y, z, sin, +, *, ^, ADF1, …} • ADF “automatically defined functions” can be learned. • Function Example:Add Double f_add (arg1, arg2) { return arg1 + arg2; } • Example TREE: where x,y,z U=[a,b] Internal node=fct Leaf node
Approach for Surface Diffusion Total alloy configuration Coefficients • Efficient Coupling of MD and KMC • Multi-timescale modeling of alloys • Predict entire potential energy (PE) surface using a few exact PE calculations • PE surface = • Don’t know the functional from of f ! e.g., not simple basis, could be product of plane-wave & polynomial • Use symbolic functional regression via GP • Search for the regression function, f • Optimize the values of coefficients, co • Form of intelligent machine learning (like Bayesian, neural nets, etc.) http://gold.cchem.berkeley.edu:8080/research_path.html
GP for Predicting PE Surface Energy barrier predicted by GP Weights (More importance to lower energy barriers) • Encode: Alloy configuration unique number • Decision variables: • Function set: {+, -, *, /, exp, sin, ^} • Terminal set: • Binary A-B alloy: xi= {0,1}: xi is (is not) A for neighbors (e.g., 1st and 2nd) occupation for vacancy/atom pair costing E. • Atomistic:ComputeE for some subset of configurations • Object Function: Minimize absolute error in prediction
Quick Overview of the MD, MC, KMC Methods • Molecular Dynamics: F = ma • Real dynamics but for short times (~10–9 secs). • Many realistic processes are inaccessible. • Must run long. But can calculate anything. • Monte Carlo: Sampling, “dumb and blind” method. • Acceptance probability pi,f = min[1, exp(–Ei,f/kBT)] • Need to calculation each Ei,f = barrier height to change states. • Time-evolution: not real time (Monte Carlo step, MCS), unless MD or experiment has provide relation of MCS to real time for all events. • Kinetic Monte Carlo: assumes Poisson process, know all Ei,f. • Hence frequency of events (rate) relative to most frequent (known) event with smallest time span tshortest, e.g. MD or experiment. • KMC (~secs) but need alljump frequenciesa priori. • KMC steps inREAL TIME and ALL EVENTS accepted!
Generic Time Enhancements of Table KMC – • GP-predicted PES facilitates use of kinetic Monte Carlo • Real time in KMC (Fichthorn & Weinberg, 1991) • Speed-up over MD: • ~109 at 300 K • ~105 at 550 K • ~103 at 900 K • Less CPU time over MD
Potential Energy Surface: Vacancy-assisted Migration for fcc Elemental and Binary Alloy Jumps Active n.n. jumps: 1st 2nd y z x x } Boisvert & Lewis, Phys. Rev. B.56 (1997) (Present work) Fixed layers n.n. configs.: 1st 2nd Co Cu Vacancy • Computational Method • Potentials: – Empirical: Morse – Quantum: Tight-binding • System size: 5 layers, >>100 atoms/layer • Consider 1st and 2ndnearest-neighbor (n.n.) jumps • Local (active) configs.: 1st and 2nd n.n. environments • Consider rigid (fully relaxed) atoms to calculate DE • Energy Test: pure Cu, n.n. jumps only • Morse potential: DE = 0.39 eV (present work) • ab initio : DE=0.420.08 eV • EAM : DE=0.470.05 eV • TB : DE=0.450.05 eV • Complex case: Segregating fcc CuxCo1-x
GP Optimized Regression for PE Surface: (001) Vacancy-assisted Migration The Machine-Learned In-Line Barrier Fct.
GP Optimized Regression for PE Surface: (001) Vacancy-assisted Migration While Non-Linear Function is complicated, you do not care – give accurate barriers, otherwise it has no meaning!
PES Predictions: (001) Vacancy-assisted Migration • 1st n.n. active configuration (128 total) for simplicity • Atoms are either rigid or fully relaxed. • Simple regression fails: Quadratic (Cubic) Polynomial Regression needs 27% (78%) of the configurations • GP needs PE calculation for only 20 configurations (or 16%)
PES Predictions: (001) Vacancy-assisted Migration • Total 2nd n.n. active configurations: 8192 • GP needs (E calculated): < 3% (256) configurations • Low energy migrations: < 0.1% prediction error • Overall events: < 1% error
Symbolically-Regressed KMC (sr-KMC) • From a few (as needed) exact calculations use symbolic functional regression via GP to predict the entire PE surface from an in-line function f(c0,x). • Search for the regression function, f , optimize the values of coefficients, co • Form of intelligent machine learning (like Bayesian, neural nets, etc.)
Time Enhancements from sr-KMC over standard Table KMC • GP needs (DE calculated): – < 3% (256) configurations (33 times fewer barrier). – Using Cluster-expansion techniques 0.3% (330 times fewer)! • Low energy migrations: < 0.1% prediction error. • GP yields in-line barrier function: ~100 x faster than table look-up. • Compared to “on-the-fly” calculations, sr-KMC is 104 -107 faster! – in-line function call ~10–3 secs per barrier. – Empirical potential ~10 secs per barrier. – Tight-binding potential ~1800 secs per barrier. – first-principles potential even greater. • How does gain scale with complexity? – For present problem, the number of barriers required decreases with complexity of configuration space. PROMISING!
Multi-Objective GA Optimization of Semi-Empirical Quantum Chemistry Potentialswith ab initio accuracy.BENZENE
Benzene Reparameterization • Target data:Ab Initio values via CASSCF(6/6)(Toniolo et al 2004) • ground (S0) and excited-state energies (S1, optical dark, and S2, allowed) (planar benzene, Dewar Benzene, benzvalene, prefluvene). • excited-state gradients at these points.(Franck-Condon region) • Semi-empirical potential:MNDO-PM3 (Stewart,1989) • 11 Carbon parameters to optimize (fix hydrogen or core-core repulsion) Uss,Upp (Coulombic); Gss, Gsp, Gpp, Gpp` (repulsion); Hsp (exchange); s, p (resonance); s, p (Slater orbital exponents) -Modified Neglect of Differential Overlap • Objectives: #1: Error inexcited-stateenergies and geometries #2: Error inexcited-state gradients
177.7º [136.0º] (136.3º) 1.529 [1.498] (1.526) 1.606 [1.601] (1.587) 1.481 [1.523] (1.530) 1.533 [1.524] (1.546) 1.430 [1.481] (1.457) 1.371 [1.401] (1.404) 1.335 [1.348] (1.347) 1.458 [1.496] (1.501) 1.469 [1.509] (1.512) 116.1º [116.8º] (116.5º) 1.340 [1.335] (1.350) Prefulvene (c) Dewar (b) Benzvalene (a) 2.395 [2.454] (2.456) 1.857 [1.942] (1.945) 170.2º [131.4º] [129.7º] 1.384 [1.461] (1.458) (1.476) (1.387) 1.367 [1.395] (1.397) 1.407 [1.475] (1.441) 1.422 [1.466] (1.462) 1.359 [1.372] (1.408) S1-S0 CI (d) S2-S1 CI (e) Figure 1. Ground state optimized geometries and important minimal energy conical intersections for benzene. Bond lengths (Å) and angles (degrees) from RPM3, CASSCF(6/6)/6-31* (in parentheses), and CASSCF(6/6)*PT2/6-31* optimizations (in brackets) are shown.
Benzene Reparameterization • Semi-empirical potential:MNDO-PM3 (Stewart,1989) • 11 Carbon parameters to optimize • Objectives: #1:Error in excited-stateenergies and geometries • #2:Error in excited-state gradients • Not obvious how to weight accuracy in energy compared to accuracy in gradients. • Multi-objective GA with bias solves the problem! • Use non-dominated sorting GA II (NSGA-II)(Deb et al., 1999) • Competence and efficiency can be further enhanced by data mining important problem substructures (building blocks)
Reparameterization of Semi-Empirical Potentials: Multiobjective Optimization Approach O O * O • Reparameterization of SE-forms involves multiple objectives fit of limited set of ab initio energies, geometries, and energy gradients. • Previous Approach: Sequential weighted local optimization • Yields unphysical potentials, Results in local optima, Depends entirely on the weights on different objectives Multiobjective optimization • Simultaneously obtain set of non-dominated (Pareto optimal) solutions in parallel. • Avoid potentially irrelevant and unphysical pathways, arising from SE-forms.
Genetic Algorithm Multiobjective Optimization • Unlike single-objective problems, multi-objective problems involve a set of Pareto-optimal solutions. • Notion of Non-Dominating Solutions • A dominates C. • A and B are non-dominant. • Solution X dominates Y if: • X is no worse than Y in all objectives • X is strictly better than Y in at least one objective
Why use Multiobjective Genetic Algorithms? • Robust search algorithms that yield good quality solutions quickly, reliably, and accurately • Rapidly converge to the Pareto optimal front • Maintain as diverse a distribution of solutions as possible • Population approach suits well to find multiple solutions • Niche-preservation methods can be exploited to find diverse solutions • Implicit parallelism helps provide a parallel search • Multiple applications of classical methods do not constitute a parallel search
Multiobjective GA Results: Unbiased vs. Biased Tonilo et al (2004) • Bias = Weight error in energy 2x more than in energy gradient. • (Un)Biased solutions are consistently better than the published result. • Pareto-optimal solutions are physical! • 37% lower gradient error • 33% lower energy error • Biasing: - convergence 2-3 times faster - improves solution quality - finds physical solutions.
S1/S2 Conical Intersection FC S2/S1 Intersection CASPT2 Dashed lines = target CASPT2 results (only values and gradients at x=0 included in GA fitting) GA Biased Minimal Energy Intersections – Expected to play a prominent role in excited state chemistry (nonadiabatic transitions). Red and Blue are g/h vectors – displacements which lift electronic degeneracy.
S1/S2 Branching Plane CASPT2 GA Biased
Potentially creates “transferable” potentials Benzene parameters compares to MO-GA for Ethelyne C2H4
Mathematical Analysis of GP • Population size • Very important parameter for GP performance • Currently no guidance to choose population size • Building-Block Supply Analysis • What population-size should be used?
Summary • Symbolic regression via GP holds promise for application to numerous areas of science and engineering. • GP mathematical analysis required to determine adequate population sizes, etc. • Case: Surface-vacancy assisted migration in CuxCo1-x • Dramatic scaling in time over Table-Look-Up KMC • Requires small subset of PE surface information. • Case: Constitutive Behavior of Aluminum AA7055 • AA 7055 found strain-rate dependence without ‘a priori’ knowledge. • Case: Reparamaterization of Semi-Empirical Potentials • Multiobjective GA yields accurate potentials. • Can GP’s help with better forms of potential? • This is POTENTIALLY the most exciting applications area.
Future Work • Algorithm Development: • Competent operators to handle complex interactions • Mathematical analysis of GP: • Population-sizing and convergence-time models • Engineering & Scientific Application: • More complex systems: Adatoms, line and planar defects • Application in excitation chemistry (Forms of potentials?) • Algorithm Efficiency Enhancement: • Parallelization of GP • Hybridize GP with cluster-expansion methods • Reduce the configurations that need PE calculation