1.11k likes | 1.25k Views
An improved hybrid Monte Carlo method for conformational sampling of proteins. Jesús A. Izaguirre and Scott Hampton Department of Computer Science and Engineering University of Notre Dame March 5, 2003
E N D
An improved hybrid Monte Carlo method for conformational sampling of proteins Jesús A. Izaguirre and Scott Hampton Department of Computer Science and EngineeringUniversity of Notre Dame March 5, 2003 This work is partially supported by two NSF grants (CAREER and BIOCOMPLEXITY) and two grants from University of Notre Dame
Overview 2. Methods for sampling (MD, HMC) 1. Motivation: sampling conformational space of proteins 3. Evaluation of new Shadow HMC 4. Future applications
Protein: The Machinery of Life NH2-Val-His-Leu-Thr-Pro-Glu-Glu- Lys-Ser-Ala-Val-Thr-Ala-Leu-Trp- Gly-Lys-Val-Asn-Val-Asp-Glu-Val- Gly-Gly-Glu-…..
Why protein folding? • Huge gap: sequence data and 3D structure data • EMBL/GENBANK, DNA (nucleotide) sequences 15 million sequence, 15,000 million base pairs • SWISSPROT, protein sequences120,000 entries • PDB, 3D protein structures20,000 entries • Bridging the gap through prediction • Aim of structural genomics: • “Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001) • Thermodynamics hypothesis: Native state is at the global free energy minimumAnfinsen (1973)
Questions related to folding I • Long time kinetics: dynamics of folding • only statistical correctness possible • ensemble dynamics • e.g., folding@home • Short time kinetics • strong correctness possible • e.g., transport properties, diffusion coefficients
Questions related to folding II • Sampling • Compute equilibrium averages by visiting all (most) of “important” conformations • Examples: • Equilibrium distribution of solvent molecules in vacancies • Free energies • Characteristic conformations (misfolded and folded states)
Overview 2. Methods for sampling (MD, HMC) 1. Motivation: sampling conformational space of proteins 3. Evaluation of new Shadow HMC 4. Future applications
Classical molecular dynamics • Newton’s equations of motion: • Atoms • Molecules • CHARMM force field(Chemistry at Harvard MolecularMechanics) Bonds, angles and torsions
What is a Forcefield? In molecular dynamics a molecule is described as a series of charged points (atoms) linked by springs (bonds). To describe the time evolution of bond lengths, bond angles and torsions, also the nonbond van der Waals and elecrostatic interactions between atoms, one uses a forcefield. The forcefield is a collection of equations and associated constants designed to reproduce molecular geometry and selected properties of tested structures.
Energy Terms Described in the CHARMm forcefield Bond Angle Dihedral Improper
Energy Functions Ubond = oscillations about the equilibrium bond length Uangle = oscillations of 3 atoms about an equilibrium angle Udihedral = torsional rotation of 4 atoms about a central bond Unonbond = non-bonded energy terms (electrostatics and Lennard-Jones)
Molecular Dynamics –what does it mean? MD = change in conformation over time using a forcefield Energy Energy supplied to the minimized system at the start of the simulation Conformation impossible to access through MD Conformational change
MD, MC, and HMC in sampling • Molecular Dynamics takes long steps in phase space, but it may get trapped • Monte Carlo makes a random walk (short steps), it may escape minima due to randomness • Can we combine these two methods? MD MC HMC
Hybrid Monte Carlo • We can sample from a distribution with density p(x) by simulating a Markov chain with the following transitions: • From the current state, x, a candidate state x’ is drawn from a proposal distribution S(x,x’). The proposed state is accepted with prob.min[1,(p(x’) S(x’,x)) / (p(x) S(x,x’))] • If the proposal distribution is symmetric, S(x’,x)) = S(x,x’)), then the acceptance prob. only depends on p(x’) / p(x)
Hybrid Monte Carlo II • Proposal functions must be reversible: if x’ = s(x), then x = s(x’) • Proposal functions must preserve volume Jacobian must have absolute value one • Valid proposal: x’ = -x • Invalid proposals: • x’ = 1 / x (Jacobian not 1) • x’ = x + 5 (not reversible)
Hybrid Monte Carlo III • Hamiltonian dynamics preserve volume in phase space • Hamiltonian dynamics conserve the Hamiltonian H(q,p) • Reversible symplectic integrators for Hamiltonian systems preserve volume in phase space • Conservation of the Hamiltonian depends on the accuracy of the integrator • Hybrid Monte Carlo: Use reversible symplectic integrator for MD to generate the next proposal in MC
HMC Algorithm Perform the following steps: 1. Draw random values for the momenta p from normal distribution; use given positions q 2. Perform cyclelength steps of MD, using a symplectic reversible integrator with timestep t, generating (q’,p’) 3. Compute change in total energy • H = H(q’,p’) - H(q,p) 4. Accept new state based on exp(- H )
Hybrid Monte Carlo IV Advantages of HMC: • HMC can propose and accept distant points in phase space, provided the accuracy of the MD integrator is high enough • HMC can move in a biased way, rather than in a random walk (distance k vs sqrt(k)) • HMC can quickly change the probability density
Hybrid Monte Carlo V • As the number of atoms increases, the total error in the H(q,p) increases. The error is related to the time step used in MD • Analysis of N replicas of multivariate Gaussian distributions shows that HMC takes O(N5/4)withtime step t = O(N-1/4) Kennedy & Pendleton, 91
Hybrid Monte Carlo VI • The key problem in scaling is the accuracy of the MD integrator • More accurate methods could help scaling • Creutz and Gocksch89 proposed higher order symplectic methods for HMC • In MD, however, these methods are more expensive than the scaling gain. They need more force evaluations per step
Overview 2. Methods for sampling (MD, HMC) 1. Motivation: sampling conformational space of proteins 3. Evaluation of new Shadow HMC 4. Future applications
Improved HMC • Symplectic integrators conserve exactly (within roundoff error) a modified Hamiltonian that for short MD simulations (such as in HMC) stays close to the true Hamiltonian Sanz-Serna & Calvo 94 • Our idea is to use highly accurate approximations to the modified Hamiltonian in order to improve the scaling of HMC
Shadow Hamiltonian Work by Skeel and Hardy, 2001, shows how to compute an arbitrarily accurate approximation to the modified Hamiltonian, called the Shadow Hamiltonian • Hamiltonian: H=1/2pTM-1p + U(q) • Modified Hamiltonian: HM = H + O(t p) • Shadow Hamiltonian: SH2p = HM + O(t 2p) • Arbitrary accuracy • Easy to compute • Stable energy graph • Example, SH4 = H – f( qn-1, qn-2, pn-1, pn-2 ,βn-1 ,βn-2)
Shadow HMC • Replace total energy H with shadow energy • SH2m = SH2m (q’,p’) – SH2m (q,p) • Nearly linear scalability of sampling rate Computational cost SHMC, N(1+1/2m),where m is accuracy order of integrator • Extra storage (m copies of q and p) • Moderate overhead (25% for small proteins)
Front-end libfrontend Middle layer libintegrators back-end libbase, libtopology libparallel, libforces ProtoMol: a framework for MD Matthey, et al, ACM Tran. Math. Software (TOMS), submitted Modular design of ProtoMol (Prototyping Molecular dynamics). Available at http://www.cse.nd.edu/~lcls/protomol
SHMC implementation • Shadow Hamiltonian requires propagation of β • Can work for any integrator
Sampling Metric 1 • Generate a plot of dihedral angle vs. energy for each angle • Find local maxima • Label ‘bins’ between maxima • For each dihedral angle, print the label of the energy bin that it is currently in
Sampling Metric 2 • Round each dihedral angle to the nearest degree • Print label according to degree
Sampling rate comparison • Cost per conformation is total simulation time divided by number of new conformations discovered (2mlt, dt = 0.5 fs) • HMC 122 s/conformation • SHMC 16 s/conformation • HMC discovered 270 conformations in 33000 seconds • SHMC discovered 2340 conformations in 38000 seconds
Conclusions • SHMC has a much higher acceptance rate, particularly as system size and timestep increase • SHMC discovers new conformations more quickly • SHMC requires extra storage and moderate overhead. • SHMC works best at relatively large timesteps
Future work • Multiscale problems for rugged energy surface • Multiple time stepping algorithms plus constraining • Temperature tempering and multicanonical ensemble • Potential smoothing • System size • Parallel Multigrid O(N) electrostatics • Applications • Free energy estimation for drug design • Folding and metastable conformations • Average estimation
Acknowledgments • Dr. Thierry Matthey, lead developer of ProtoMol, University of Bergen, Norway • Graduate students: Qun Ma, Alice Ko, Yao Wang, Trevor Cickovski • Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr. Christoph Schutte for valuable discussions
Multiple time stepping • Fast/slow force splitting • Bonded: “fast” • Small periods • Long range nonbonded: “slow” • Large characteristic time • Evaluate slow forces less frequently • Fast forces cheap • Slow force evaluation expensive
The Impulse integrator Grubmüller,Heller, Windemuth and Schulten, 1991 Tuckerman, Berne and Martyna, 1992 • The impulse “train” Fast impulses, t Time, t Slow impulses, t How far apart can we stretch the impulse train?
Stretching slow impulses • t ~ 100 fs if accuracy does not degenerate • 1/10 of the characteristic time MaIz, SIAM J. Multiscale Modeling and Simulation, 2003 (submitted) • Resonances (let be the shortest period) • Natural: t = n , n = 1, 2, 3, … • Numerical: • Linear: t = /2 • Nonlinear: t = /3 MaIS_a, SIAM J. on Sci. Comp. (SISC), 2002 (in press) MaIS_b, 2003 ACM Symp. App. Comp. (SAC’03), 2002 (in press) Impulse far from being multiscale!
3rd order nonlinear resonance of Impulse MaIS_a, SISC, 2002;MaIS_b, ACM SAC’03, 2002 Fig. 1: Left: flexible water system. Right: Energy drift from 500ps MD simulation of flexible water at room temperature revealing 3:1 and 4:1 nonlinear resonance (3.3363 and 2.4 fs)
Overview • Introduction • Molecular dynamics (MD) in action • Classical MD • Protein folding • Nonlinear instabilities of Impulse integrator • Approximate MD integrators • Targeted MOLLY • MUSICO • Applications • Summary • Future work • Acknowledgements
Objective statement • Design multiscale integrators that are not limited by nonlinear and linear instabilities • Allowing larger time steps • Better sequential performance • Better scaling
Overview • Introduction • Molecular dynamics (MD) in action • Classical MD • Protein folding • Nonlinear instabilities of Impulse integrator • Approximate MD integrators • Targeted MOLLY • MUSICO • Applications • Summary • Future work • Acknowledgements
Targeted MOLLY (TM) MaIz, 2003 ACM Symp. App. Comp. (SAC’03), 2002 (in press) MaIz, SIAM J. Multiscale Modeling and Simulation, 2003 (submitted) • TM = MOLLY + targeted Langevin coupling • Mollified Impulse (MOLLY) to overcome linear instabilities Izaguirre, Reich and Skeel, 1999 • Stochasticity to stabilize MOLLY Izaguirre, Catarello, et al, 2001
Mollified Impulse (MOLLY) • MOLLY (mollified Impulse) • Slow potential at time averaged positions, A(x) • Averaging using only fastest forces • Mollified slow force = Ax(x) F(A(x)) • Equilibrium and B-spline • B-spline MOLLY • Averaging over certain time interval • Needs analytical Hessians • Step sizes up to 6 fs (50~100% speedup)
Vi Vj FRi, FDi FRj = - FRiFDj = - FDi Introducing stochasticity • Langevin stabilization of MOLLY (LM) Izaguirre, Catarello, et al, 2001 • 12 fs for flexible waters with correct dynamics • Dissipative particle dynamics (DPD): Pagonabarraga, Hagen and Frenkel, 1998 • Pair-wiseLangevin force on “particles” • Time reversible if self-consistent