JavaGenes: Evolving Molecules and Molecular Force Fields
Al Globus, Deepak Srivastava, Sandy Johan
A Work In Progress
Graph Crossover Problem
• Any edge may be a member of one or more cycles.
• Graph fragments produced by division may have more than one crossover point ("broken edges").
• When two fragments are combined, they may have different numbers of broken edges to be merged.
• Our crossover operator:
  • Operates on any connected graph.
  • Divides graphs at randomly generated cut sets.
  • Can evolve arbitrary cyclic structures, given at least some cycles in the initial population.
  • Always produces connected undirected graphs.
  • Almost always produces connected directed graphs.
Crossover (figure comparing strings, trees, and graphs). String example: parents abcd and wxyz produce children abyz and wxcd.
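The string case (parents abcd and wxyz, children abyz and wxcd) is ordinary one-point crossover. A minimal Python sketch, assuming equal-length string parents and an explicitly chosen cut position; `one_point_crossover` is a hypothetical name, not part of JavaGenes:

```python
def one_point_crossover(p1: str, p2: str, cut: int) -> tuple[str, str]:
    """Swap the tails of two equal-length parent strings after index `cut`."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    child1 = p1[:cut] + p2[cut:]  # head of parent 1, tail of parent 2
    child2 = p2[:cut] + p1[cut:]  # head of parent 2, tail of parent 1
    return child1, child2


if __name__ == "__main__":
    print(one_point_crossover("abcd", "wxyz", 2))  # ('abyz', 'wxcd')
```

A single cut works for strings and trees because each fragment ends up with exactly one loose end; graphs with cycles do not have that property, which is the problem the graph crossover operator addresses.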
Graph Crossover (figure): rip two parents apart, then combine the fragments into a child.
Molecule Division
• Choose an initial random bond
• Repeat
  • Find the shortest path between the initial bond's atoms.
  • Remove and remember a random bond from this path. These bonds are called "broken edges."
• Until a cut set is found, i.e., no path exists between the initial bond's atoms.
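The division loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the JavaGenes code: a molecule is an adjacency dict mapping integer atom ids to sets of bonded neighbors (bond orders ignored), the shortest path is found with breadth-first search, and the two fragments are taken to be the atoms still connected to each end of the initial bond. The names `shortest_path`, `reachable`, and `divide` are hypothetical.

```python
import random
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; return the atom list from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            path = []
            while atom is not None:  # walk back to the start
                path.append(atom)
                atom = previous[atom]
            return path[::-1]
        for neighbor in graph[atom]:
            if neighbor not in previous:
                previous[neighbor] = atom
                queue.append(neighbor)
    return None


def reachable(graph, start):
    """Return the set of atoms still connected to `start`."""
    seen, stack = {start}, [start]
    while stack:
        for neighbor in graph[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def divide(graph, rng):
    """Cut a molecule graph in two; return (fragment_a, fragment_b, broken_edges)."""
    graph = {atom: set(nbrs) for atom, nbrs in graph.items()}  # work on a copy
    bonds = sorted({tuple(sorted((u, v))) for u in graph for v in graph[u]})
    a, b = rng.choice(bonds)  # the initial random bond
    broken = []
    while (path := shortest_path(graph, a, b)) is not None:
        i = rng.randrange(len(path) - 1)  # pick a random bond on the path
        u, v = path[i], path[i + 1]
        graph[u].discard(v)
        graph[v].discard(u)
        broken.append((u, v))  # remember it as a broken edge
    return reachable(graph, a), reachable(graph, b), broken


if __name__ == "__main__":
    ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}  # six-membered ring
    frag_a, frag_b, broken = divide(ring, random.Random(1))
    print(frag_a, frag_b, broken)  # a simple ring is always cut at exactly two bonds
```

The first shortest path is always the initial bond itself, so it is the first broken edge; in a ring, one more bond on the path around the ring then completes the cut set.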
Fragment Recombination
• Repeat
  • Select a random unprocessed broken edge and determine which fragment it belongs to.
  • If at least one broken edge exists in the other fragment:
    • choose one at random
    • merge the two broken edges into one bond, respecting valence by reducing the bond order if necessary
  • Else flip a coin:
    • heads: attach the broken edge to a random atom in the other fragment (respecting valence)
    • tails: discard the broken edge
• Until each broken edge has been processed exactly once
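A Python sketch of the recombination loop, under several assumptions not stated on the slide: each broken edge is a hypothetical `(fragment_index, atom, bond_order)` tuple, atom ids are unique across both fragments, `free` maps each atom to its unused valence, a merged bond takes the smaller of the two orders, and an attached bond's order is capped at the target atom's spare valence. `recombine` is a hypothetical name, not the JavaGenes API.

```python
import random


def recombine(fragments, stubs, free, rng):
    """Join two fragments by processing every broken edge exactly once.

    fragments: (atoms_of_fragment_0, atoms_of_fragment_1), disjoint sets of atom ids
    stubs: list of broken edges as (fragment_index, atom, bond_order)
    free: dict atom -> unused valence (mutated when atoms gain bonds)
    Returns the new bonds as (atom1, atom2, bond_order) tuples.
    """
    pending = list(stubs)
    new_bonds = []
    while pending:
        # Select a random unprocessed broken edge and note its fragment.
        fragment, atom, order = pending.pop(rng.randrange(len(pending)))
        other = 1 - fragment
        partners = [s for s in pending if s[0] == other]
        if partners:
            # Merge with a random broken edge from the other fragment; the
            # smaller order is used so neither atom exceeds its valence.
            partner = rng.choice(partners)
            pending.remove(partner)
            new_bonds.append((atom, partner[1], min(order, partner[2])))
        elif rng.random() < 0.5:
            # Heads: attach to a random atom of the other fragment with spare valence.
            targets = sorted(a for a in fragments[other] if free[a] > 0)
            if targets:
                target = rng.choice(targets)
                bond_order = min(order, free[target])
                free[target] -= bond_order
                new_bonds.append((atom, target, bond_order))
        # Tails (or no atom with spare valence): the broken edge is discarded.
    return new_bonds


if __name__ == "__main__":
    fragments = ({0, 1}, {2, 3})
    stubs = [(0, 1, 2), (1, 2, 1)]  # one broken edge per fragment
    free = {0: 0, 1: 0, 2: 0, 3: 0}
    print(recombine(fragments, stubs, free, random.Random(0)))  # one order-1 bond joining atoms 1 and 2
```

Because a merge removes the partner from the pending list, both broken edges count as processed, so the loop handles each broken edge exactly once as the slide requires.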
Molecule Fitness Function: All-Pairs-Shortest-Path Distance
• Assign an extended type to each atom
  • Extended type = (element, |single bonds|, |double bonds|, |triple bonds|)
• Find the shortest bond path between each pair of atoms
• Create a bag with one item per atom pair
  • item = (type1, type2, path length)
  • bag = multiset (a set that allows repeated items)
• Distance between the bags of two molecules = 1 - |intersection| / |union|
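A Python sketch of this distance, assuming a molecule is given as a list of element symbols plus bonds `(i, j, order)` with order 1 to 3. Each unordered atom pair is counted once, pairs with no connecting path are skipped, and the bag is a `collections.Counter`, whose `&` and `|` operators give multiset intersection (minimum counts) and union (maximum counts). The function names are hypothetical.

```python
from collections import Counter, deque


def extended_types(elements, bonds):
    """Extended type = (element, #single, #double, #triple bonds) for each atom."""
    counts = [[0, 0, 0] for _ in elements]
    for i, j, order in bonds:
        counts[i][order - 1] += 1
        counts[j][order - 1] += 1
    return [(element, *c) for element, c in zip(elements, counts)]


def path_bag(elements, bonds):
    """Multiset of (type1, type2, shortest bond-path length), one item per atom pair."""
    adjacency = [[] for _ in elements]
    for i, j, _ in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    types = extended_types(elements, bonds)
    bag = Counter()
    for source in range(len(elements)):
        distance = {source: 0}  # breadth-first search from this atom
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        for target, length in distance.items():
            if target > source:  # count each unordered pair once
                t1, t2 = sorted((types[source], types[target]))
                bag[(t1, t2, length)] += 1
    return bag


def tanimoto_distance(bag1, bag2):
    """1 - |intersection| / |union| over multisets; 0 means identical bags."""
    union = sum((bag1 | bag2).values())
    if union == 0:
        return 0.0
    return 1 - sum((bag1 & bag2).values()) / union


if __name__ == "__main__":
    co2 = (["O", "C", "O"], [(0, 1, 2), (1, 2, 2)])
    co = (["C", "O"], [(0, 1, 3)])
    print(tanimoto_distance(path_bag(*co2), path_bag(*co2)))  # 0.0
    print(tanimoto_distance(path_bag(*co2), path_bag(*co)))  # 1.0
```

Sorting the two types inside each item makes the pair unordered, so (C, O, 1) and (O, C, 1) count as the same item.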
JavaGenes in Action (figure): finding with the all-pairs-shortest-path and Tanimoto index fitness function (0 is perfect)
Molecular Dynamics and Mechanics
• Newton's laws of motion in a potential field
• Discover common conformations during dynamics
• Discover minimum-energy conformations (e.g., the protein folding problem)
• Began in the 1960s with two-body potentials for inert-gas modeling
• Extended in the 1980s to metals and bonded systems (upper-right corner of the periodic table)
• Our studies focus on evolving potentials for reactive systems (bonds break and form)
Molecular Potentials
• Energy = Σ 2-body terms + Σ 3-body terms + …
• Stillinger-Weber SiF potential function (each exponential factor acts as a cutoff)
  • 2-body term: $f_2(r) = A\left(B r^{-p} - r^{-q}\right)\exp\left(\frac{C}{r - a}\right)$ for $r < a$, and $0$ otherwise
  • 3-body term: $f_3(r_{ij}, r_{jk}, \theta) = \left(\alpha + \lambda\left(\cos\theta - \cos\theta_0\right)^2\right)\exp\left[\gamma\left(\frac{1}{r_{ij} - a_1} + \frac{1}{r_{jk} - a_1}\right)\right]$
  • FFF additional term: $\delta\,(r_{ij} r_{jk})^{-m}\exp\left[\beta\left(\frac{1}{r_{ij} - a_2} + \frac{1}{r_{jk} - a_2}\right)\right]$
• Discovering parameters can require months or years
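The three terms can be transcribed directly into Python. This is a sketch, not the JavaGenes force-field code: the slide states the zero-beyond-cutoff rule only for the 2-body term, so applying the same rule to the 3-body and FFF terms (zero once either distance reaches $a_1$ or $a_2$) is an assumption. In the usage lines, the Si values are the published values listed in parentheses on the results slide, while $C = 1.0$ and $\cos\theta_0 = -1/3$ (the tetrahedral angle, as in the original Stillinger-Weber silicon potential) are assumptions. Function names are hypothetical.

```python
import math


def two_body(r, A, B, p, q, a, C):
    """A (B r^-p - r^-q) exp(C / (r - a)) for r < a, else 0."""
    if r >= a:
        return 0.0
    return A * (B * r ** -p - r ** -q) * math.exp(C / (r - a))


def three_body(rij, rjk, theta, alpha, lam, gamma, a1, cos_theta0):
    """(alpha + lambda (cos theta - cos theta0)^2) exp(gamma (1/(rij-a1) + 1/(rjk-a1)))."""
    if rij >= a1 or rjk >= a1:  # assumed cutoff, mirroring the 2-body term
        return 0.0
    angular = alpha + lam * (math.cos(theta) - cos_theta0) ** 2
    return angular * math.exp(gamma * (1 / (rij - a1) + 1 / (rjk - a1)))


def fff_term(rij, rjk, delta, m, beta, a2):
    """delta (rij rjk)^-m exp(beta (1/(rij-a2) + 1/(rjk-a2))), the extra FFF term."""
    if rij >= a2 or rjk >= a2:  # assumed cutoff, mirroring the 2-body term
        return 0.0
    return delta * (rij * rjk) ** -m * math.exp(beta * (1 / (rij - a2) + 1 / (rjk - a2)))


if __name__ == "__main__":
    si = dict(A=7.049556277, B=0.6022245584, p=4, q=0, a=1.8, C=1.0)
    print(two_body(1.2, **si))
    print(three_body(1.2, 1.2, math.radians(90), 0.0, 21.0, 1.2, 1.8, -1 / 3))
```

With $\alpha = 0$, the 3-body term vanishes at the ideal angle $\theta_0$ and penalizes deviations from it, which is how the Stillinger-Weber form favors tetrahedral silicon.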
Evolving Molecular Force Fields
• Chromosome
  • 2D ragged array of floating-point numbers
  • Interaction terms: SiSi, SiF, FF, SiSiSi, SiSiF, SiFSi, FSiF, FFSi, FFF
  • 5 to 63 parameters
• Transmission operators
  • Interval crossover
  • Mutation
• Fitness function
  • RMS difference between the energies an individual predicts and the "correct" energies for n molecules
• "Correct" energies
  • Currently: energies generated by the force field with published parameters
  • Next step: energies generated by higher-quality quantum codes
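A Python sketch of the chromosome and fitness function, assuming one row per interaction term in the ragged array and a caller-supplied `energy_fn(chromosome, molecule)` that evaluates the force field; that function, the row layout, and the names `rms_error` and `fitness` are hypothetical. The example rows hold the published Si values from the results slide (SiSi: A, B, p, q, a; SiSiSi: alpha, lambda, gamma, a1) only to show rows of different lengths.

```python
import math

# Hypothetical ragged chromosome: one row of floats per interaction term,
# and rows may differ in length.
example_chromosome = [
    [7.049556277, 0.6022245584, 4.0, 0.0, 1.8],  # SiSi: A, B, p, q, a
    [0.0, 21.0, 1.2, 1.8],                       # SiSiSi: alpha, lambda, gamma, a1
]


def rms_error(predicted, correct):
    """Root-mean-square difference between two equal-length energy lists."""
    if len(predicted) != len(correct) or not predicted:
        raise ValueError("need the same, nonzero number of energies")
    squared = sum((p - c) ** 2 for p, c in zip(predicted, correct))
    return math.sqrt(squared / len(predicted))


def fitness(chromosome, molecules, correct_energies, energy_fn):
    """RMS error of the energies predicted with `chromosome` (lower is better)."""
    predicted = [energy_fn(chromosome, molecule) for molecule in molecules]
    return rms_error(predicted, correct_energies)
```

Swapping the "correct" energies from the published-parameter force field to quantum-code energies changes only the `correct_energies` argument, not the fitness function itself.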
Interval Crossover
• For each allele:
  1. Construct an interval from the two parental values, e.g., lower parental value 1.1 and higher parental value 2.1.
  2. Enlarge the interval by 100%, adding half the original width at each end: 0.6 to 2.6.
  3. Choose a random number in the enlarged interval, e.g., 1.3.
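A Python sketch of interval crossover, assuming the child value is drawn uniformly from the enlarged interval and the extra width is split evenly between both ends (which reproduces the 1.1/2.1 to 0.6/2.6 example). `interval_allele` and `interval_crossover` are hypothetical names; `growth=1.0` means "100% larger".

```python
import random


def interval_allele(x, y, rng, growth=1.0):
    """Sample from the parents' interval enlarged by `growth` (1.0 = 100% larger)."""
    low, high = min(x, y), max(x, y)
    pad = (high - low) * growth / 2  # split the extra width evenly between both ends
    return rng.uniform(low - pad, high + pad)


def interval_crossover(mother, father, rng, growth=1.0):
    """Apply interval crossover allele by allele to two ragged chromosomes of the same shape."""
    return [
        [interval_allele(x, y, rng, growth) for x, y in zip(row_m, row_f)]
        for row_m, row_f in zip(mother, father)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    print(interval_allele(1.1, 2.1, rng))  # some value between 0.6 and 2.6
```

Because the enlarged interval extends beyond both parents, a child can receive a value outside the parental range, so the search is not confined to values already present in the population; when both parents agree, the child inherits that value unchanged.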
Si potential results
• Population = 1000
• Generations = 3000
• Fitness function: 100 random 5-body Si tetrahedra
• 31 runs. Best-run results (published parameter values in parentheses):
  • A = 7.151346144801161 (7.049556277)
  • B = 0.6007865398735448 (0.6022245584)
  • p = 3.9825158463763977 (4)
  • q = 0.014970062068368135 (0)
  • a = 1.797123919332413 (1.8)
  • alpha = 0.1442970771852687 (0)
  • lambda = 27.783092740584205 (21)
  • gamma = 1.328091763076223 (1.2)
  • a1 = 1.8173559091012945 (1.8)
Future Plans
• Hill climbing
• Use experimental data for new fitness functions
• Feed results from easy to hard evolutions (figure; number of parameters in parentheses): SiF (6), SiSi (5), FF (6), SiFSi (10), FSiF (10), SiSiF (10), FFSi (10), SiSiSi (9), FFF (14), Full SiF (63)
Condor
• Cycle-scavenging batch system for single-workstation jobs
  • Uses idle desktop machines: nights, weekends, etc.
• Developed at the University of Wisconsin
• In production since 1986
• Unix workstations
  • 250 SGI and 50 Sun workstations at code IN
• Good for:
  • parameter studies
  • stochastic algorithms (e.g., GA)
• One JavaGenes job per Condor job