
Exploring Algorithm Space: Variations on the Exchange Theme

Daniel M. Zuckerman, Department of Computational Biology, School of Medicine, University of Pittsburgh. Goal: more efficient atomistic sampling, consistent with statistical mechanics. Take care with the meaning of “efficiency”.

Presentation Transcript


  1. Exploring Algorithm Space: Variations on the Exchange Theme • Daniel M. Zuckerman, Department of Computational Biology, School of Medicine, University of Pittsburgh

  2. Goal • More efficient atomistic sampling, consistent with statistical mechanics • Take care with the meaning of “efficiency”

  3. Outline • Protein fluctuations in biology • Replica exchange simulation -- a second look • Resolution exchange simulation • Initial results • How to approach larger systems? • Exchange Variants • Assessing Sampling

  4. Transport Proteins Fluctuate - I

  5. Transport Proteins Fluctuate - II

  6. Motor Proteins Fluctuate

  7. Signalling Proteins Fluctuate

  8. Conformational Change Requires Fluctuation • Either the ligand leaves a free-like bound structure, or the ligand binds a bound-like free structure (or nearly so) [Figure labels: free, bound, ligand]

  9. Biology Take-Home Message • Fluctuations are ubiquitous and essential • They are not a sideshow; they are the show! • Experimental structures are only snapshots -- just the beginning of the story

  10. Key for medicinal chemists especially • Drug design via “docking” is a key practical use of molecular modeling • Typically, drug candidate molecules are fitted into static protein structures • Common lament: need to know protein fluctuations • Necessary for free energy calculations • e.g., binding affinity

  11. Questioning low RMSD in MD • Is 1.3 Å right? What is nature’s average RMSD? [Figure: RMSD vs. time, 1 - 1.5 Å]

  12. A Physical View of Fluctuations • Rough, high-dimensional energy landscape [Figure axes: U, x]

  13. Simplest Physical Picture: Bistable system • Most phenomena can be understood from a toy picture [Figure axis labels: U, x, t, p]

  14. Defining the Problem • We want a good sample of p(x) • “Equilibrium distribution” • “Complete canonical ensemble” • Probability density function • x is a vector in configuration space -- i.e., the vector of all coordinates: (x1,y1,z1, x2,y2,z2, …) • In English: We want a set of structures distributed according to their probability of occurrence at the specified temperature • Hard because we access p(x) only indirectly • Blind person feeling elephant

  15. It’s NOT optimization/search/minimization! • However, undiscovered sampling algorithms may be similar to search algorithms!

  16. The Problem with the Problem • It’s too hard!! • Present methods, implemented on standard computers, are inadequate by orders of magnitude -- think timescales • Simulations access nsec - msec timescales • Proteins fluctuate on nsec - sec timescales • 3-9 orders of magnitude short! • Today: taking steps toward the solution

  17. Theoretical/Computational Basics • Boltzmann factor • “Forcefield” (potential energy function) • Maps a configuration vector to a real number • Terms not shown: sterics, electrostatics, four-body (e.g., dihedral) [Figure labels: U; bond lengths l1, l2, l3; angles θ1, θ2]

  18. Exchange Schemes • Original idea: use higher temperature to facilitate barrier crossing [Swendsen, 1986] • Barriers are the real problem • Arrhenius law: • rate ~ Boltzmann factor of the forward barrier, exp(-ΔU_fwd/kT) [Figure axes: U, x]

  19. Exchange Ladder • High-temperature hops percolate down via configuration swaps (equivalently, temperature swaps) • Independent simulations with occasional exchange attempts [Figure labels: hot, 300K, T, t, exchange attempts]

  20. How does replica exchange work? • It’s just Monte Carlo • Physics view of Metropolis • Accept trial move x_old → x_try with probability min[1, exp(-ΔU/kT)] • ΔU = U(x_try) - U(x_old) • Probability view: • Accept with probability min[1, prob(try)/prob(old)]
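
The Metropolis rule on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names, the one-dimensional double-well potential U(x) = (x^2 - 1)^2, the uniform trial move, and the step size are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def metropolis_accept(u_old, u_try, kT, rng=random):
    """Accept a trial move with probability min[1, exp(-dU/kT)]."""
    dU = u_try - u_old
    if dU <= 0.0:
        return True  # downhill (or flat) moves are always accepted
    return rng.random() < math.exp(-dU / kT)


def sample_double_well(n_steps, kT, step=0.5, seed=0):
    """Metropolis sampling of a toy bistable potential (an assumed example)."""
    rng = random.Random(seed)

    def U(x):
        return (x * x - 1.0) ** 2  # minima at x = -1 and x = +1

    x = -1.0
    traj = []
    for _ in range(n_steps):
        x_try = x + rng.uniform(-step, step)  # symmetric trial move
        if metropolis_accept(U(x), U(x_try), kT, rng):
            x = x_try
        traj.append(x)  # a rejected move repeats the old configuration
    return traj
```

Note that a rejected move still contributes the old configuration to the trajectory; dropping rejected steps would bias the sample.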

  21. Exchange as simple Monte Carlo • Exchanges are only attempted in pairs • Two independent simulations • Probability for combined system is simple product: p = p1*p2 • Metropolis criterion: min[1, p_try / p_old] [Figure labels: hot, 300K, T1, T2, time]
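
Writing out the product form for two temperature levels gives a compact acceptance rule. With x1 at temperature T1 and x2 at T2, p_old = exp[-U(x1)/kT1] exp[-U(x2)/kT2] and p_try = exp[-U(x2)/kT1] exp[-U(x1)/kT2], so p_try/p_old = exp[(1/kT1 - 1/kT2)(U(x1) - U(x2))]. A minimal sketch, with a hypothetical function name and energies passed in as plain numbers:

```python
import math


def exchange_acceptance(u1, u2, kT1, kT2):
    """Acceptance probability min[1, p_try/p_old] for swapping configurations.

    u1 is the energy of the configuration currently at kT1,
    u2 is the energy of the configuration currently at kT2.
    """
    log_ratio = (1.0 / kT1 - 1.0 / kT2) * (u1 - u2)
    if log_ratio >= 0.0:
        return 1.0
    return math.exp(log_ratio)
```

The rule always accepts a swap that moves the lower-energy configuration to the colder level, and it reduces to certain acceptance when the two temperatures are equal.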

  22. Does replica exchange really help? • For a given investment of CPU time, is better fixed-T sampling achieved? • Compared to equal-time direct simulation -- e.g., for a 20-level ladder, a simulation 20 times as long • To my knowledge, no convincing evidence yet • Key: Sampling limited by top level • Worry 1: High T does not help with entropic barriers • Hard-to-find low energy pathways • Worry 2: High T not so helpful for low barriers • Simulations and experiments suggest barriers are low • Even for 600K simulation, only moderate speedup • 2kT barrier → 2.7x speedup • 4kT barrier → 7.4x speedup • 6kT barrier → 20.1x speedup
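
The speedup figures on this slide follow from the Arrhenius law if the barrier heights are quoted in units of kT at 300 K and the prefactor is taken to be temperature independent. Both are assumptions here, since the slide does not state them, but they reproduce the quoted numbers. The function name is hypothetical.

```python
import math


def arrhenius_speedup(barrier_in_kT_ref, T_ref=300.0, T_hot=600.0):
    """Ratio of barrier-crossing rates at T_hot and T_ref.

    barrier_in_kT_ref is the barrier height in units of k*T_ref.
    rate(T) ~ exp(-dU / kT), so the ratio is exp[dU/kT_ref * (1 - T_ref/T_hot)].
    """
    return math.exp(barrier_in_kT_ref * (1.0 - T_ref / T_hot))


for barrier in (2, 4, 6):
    print(barrier, round(arrhenius_speedup(barrier), 1))
# prints:
# 2 2.7
# 4 7.4
# 6 20.1
```

Doubling the temperature halves the barrier measured in units of kT, so the gain is exp(barrier/2): substantial only when the barrier is many kT high.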

  23. Summary of Concerns re Replica Exchange • Efficiency limited by top level (highest T) • Highest T may not be fast enough for biomolecules • High T does not affect entropic barriers • Energy barriers may be low • Should work for sufficiently high energy barriers

  24. Can replica exchange be fixed? • Yes • Two improvements today • Plus a sketch of other variants

  25. Improvement (1): Pseudo-exchanges • Key: Need complete sampling of the top level (highest T) • Work from the top down ... if we can “pseudo exchange” • Top level can be generated with multiple simulations [Figure labels: hot, 300K, U, x, time]

  26. Anatomy of a Pseudo-Exchange • Point 1: Normal exchanges need not be performed at identical intervals • Not required in derivation of Metropolis criterion • Imagine one fast CPU & one slow CPU • Point 2: Imagine top-level CPU is extremely fast • Long intervals → no correlations → equilibrium distribution • Alternatively, view top level as “perfect” Monte Carlo → equilibrium distribution • Conclusion: no need to continue top-level simulation from exchanged configuration → can pull randomly each time from top level [Figure labels: fast, slow]
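
A two-level pseudo-exchange can be sketched as follows, under stated assumptions: a toy one-dimensional double well stands in for the molecule, plain Metropolis Monte Carlo stands in for the dynamics, and the stored top-level trajectory is treated as a good equilibrium sample. All names and parameters are hypothetical. At each exchange attempt a configuration is pulled at random from the stored top level and tested with the usual two-level exchange criterion; the top level is never continued from the exchanged configuration.

```python
import math
import random


def U(x):
    """Toy bistable potential with barrier height 4 (an assumed example)."""
    return 4.0 * (x * x - 1.0) ** 2


def metropolis_run(x0, kT, n_steps, rng, step=0.5):
    """Ordinary Metropolis trajectory at fixed kT."""
    x, traj = x0, []
    for _ in range(n_steps):
        x_try = x + rng.uniform(-step, step)
        dU = U(x_try) - U(x)
        if dU <= 0.0 or rng.random() < math.exp(-dU / kT):
            x = x_try
        traj.append(x)
    return traj


def pseudo_exchange_run(top_samples, kT_low, kT_top, n_blocks, block_len, rng):
    """Low-T run with pseudo-exchanges drawn from a stored top-level sample."""
    x, out, n_accepted = -1.0, [], 0
    for _ in range(n_blocks):
        segment = metropolis_run(x, kT_low, block_len, rng)
        out.extend(segment)
        x = segment[-1]
        x_top = rng.choice(top_samples)  # random pull from the top level
        log_ratio = (1.0 / kT_low - 1.0 / kT_top) * (U(x) - U(x_top))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_top  # low level continues from the pulled configuration
            n_accepted += 1
    return out, n_accepted
```

In this form each exchange attempt is an independence-type Metropolis move for the low level, with the top-level distribution as the proposal. For example, a top level generated with `metropolis_run(-1.0, 2.0, 50000, rng)` can feed `pseudo_exchange_run(top, 0.5, 2.0, 2000, 20, rng)`; the low-T run then visits both wells even though its barrier is 8 kT.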

  27. Two Ways to Use Pseudo Exchange • Same ladder • More widely spaced ladder • Lower acceptance OK since trials are cheap (serial) • No need for frequent attempts in parallel since few high T hops • Essentially guaranteed to be more efficient than standard parallel replica exchange. [Figure labels: hot, hotter!, 300K, time]

  28. Top-down test: Di-leucine Peptide • Two-amino-acid peptide with two main conformations • 50 atoms (144 degrees of freedom) • Langevin dynamics; GBSA continuum solvent model • ALL SIMULATIONS

  29. Example: Di-leucine via two-level ladder • Di-leucine, a 50-atom peptide: two levels only • T=298K using pseudo-exchanges with shuffled 500K trajectory [Figure panels: T=500K; T=500K, shuffled; state labels a, b]

  30. Not really efficient • Boost to 500K only modestly increases hop rate • In 300 nsec: 488 hops at 500K vs. 300 at 298K • Barriers are too low • Ordinary trajectories shown (no exchange) • Still should be better than parallel exchange simulation [Figure panels: T=500K, T=298K]

  31. Improvement (2): Resolution Exchange • Canonical sampling in detailed model [Figure labels: Coarse, Detailed]

  32. Dreams of multi-scale modeling • (At least) since Levitt and Warshel, Nature (1975) • Warshel -- free energy for detailed model based on coarse-grained reference (1999) • Brandt and collaborators -- complex multi-level formulation • Vendruscolo and coworkers -- ad hoc addition of atomic detail onto coarse structures • Resolution exchange is concrete, simple and general

  33. Improvement (2): Resolution Exchange • Qualitative picture [Figure labels: COARSE, detailed, time, exchange attempts]

  34. Implementing Resolution Exchange • Need • Formulate as exchange process • Derive acceptance criterion • Coarse model will use subset • Detailed (regular) model x = (l1, l2, l3, …, θ1, θ2, …, φ1, φ2, …) • Coarse model is subset, e.g., φ = (φ1, φ2, …) • Arbitrary potential U_coarse(φ) -- i.e., p_crs(φ) = exp[-U_coarse(φ) / kT] • Simply exchange common coordinates [Figure labels: l1, l2, l3, θ1, θ2, φ1, φ2]

  35. Key Point: Subsets are natural for coarse models • Examples • Dihedrals only (fixed angles, lengths) • Backbone coordinates only • Side-chains by beta carbons • Proteins are branched chains [Figure labels: l1, l2, l3, θ1, θ2, φ1, φ2, b]

  36. Res-Ex Metropolis Criterion • The trial exchange • From: (l_a, θ_a, φ_a) and φ_b [“old”] • To: (l_a, θ_a, φ_b) and φ_a [“try”] • Metropolis: min[1, p_tot(try) / p_tot(old)] • Final criterion • min[1, R] • CANONICAL SAMPLING FOR ALL COORDS, ALL LEVELS!!! [Figure labels: coarse, detailed, time]
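
The slide's expression for R did not survive in this transcript. Applying the product form used for ordinary exchange (slide 21) to the “old” and “try” states listed above suggests R = [p_det(l_a, θ_a, φ_b) p_crs(φ_a)] / [p_det(l_a, θ_a, φ_a) p_crs(φ_b)]; this is a reconstruction, not the author's formula as shown. A minimal sketch, assuming both levels run at the same kT and that `u_det` and `u_coarse` are user-supplied energy functions (hypothetical names):

```python
import math


def resex_log_ratio(u_det, u_coarse, rest_a, phi_a, phi_b, kT):
    """log R for swapping the shared coordinates phi between two levels.

    rest_a: detailed-only coordinates (e.g., lengths and angles), unchanged.
    phi_a:  shared coordinates currently in the detailed model.
    phi_b:  shared coordinates currently in the coarse model.
    """
    log_p_old = -(u_det(rest_a, phi_a) + u_coarse(phi_b)) / kT
    log_p_try = -(u_det(rest_a, phi_b) + u_coarse(phi_a)) / kT
    return log_p_try - log_p_old


def resex_acceptance(u_det, u_coarse, rest_a, phi_a, phi_b, kT):
    """Acceptance probability min[1, R]."""
    log_r = resex_log_ratio(u_det, u_coarse, rest_a, phi_a, phi_b, kT)
    return 1.0 if log_r >= 0.0 else math.exp(log_r)
```

If the coarse potential happens to equal the detailed energy as a function of φ alone, the ratio is exactly 1 and every exchange is accepted; the more the two models disagree about φ, the lower the acceptance.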

  37. Downside of Res-ex: more work! • The ladder needs to be engineered • Analogy to replica exchange: limit on difference between models • Simple solution (later) • Implicit solvent: still a hard and important problem [Figure labels: COARSE, detailed, time, exchange attempts]

  38. You can recycle! • Top-down approach (pseudo-exchanges) permits old trajectories to be exchanged into new • New temperature • New forcefield • Same or different numbers of coordinates • Minimal CPU cost, if original trajectory already crossed barriers

  39. Initial Results • Still early stages • Verifying the algorithm • Efficiency in a 50-atom di-peptide • [A penta-peptide] • Reduced models of proteins are reasonable

  40. Algorithm Check: Butane • Butane is C4H10 [Figure: central dihedral; line is from direct simulation]

  41. Real Molecular Test: Di-leucine Peptide • Two-amino-acid peptide with two main conformations • Exchange all-atom to united-atom (GBSA “solvent”) • Eliminate non-polar H • 50 atoms to 24 “united atoms” [Figure label: united atom]

  42. Initial Results: Res-ex really works • CPU Savings: Factor of 15 (including united-atom cost)

  43. Leucine free energy difference via Res-Ex • ΔG_ab measures whether the correct fraction of time is spent in each state • Increased precision indicates speedup (first report??) • Cost of united-atom simulation included in graph [Figure: reference value from long brute-force simulation]
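
One common way to estimate such a free energy difference from a trajectory is through state populations, ΔG_ab = G_a - G_b = -kT ln(N_a / N_b). The slide does not give the estimator or the sign convention, so both are assumptions here, as are the function name and the state labels "a" and "b".

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def delta_g_ab(state_labels, temperature=298.0):
    """Population-based estimate of G_a - G_b in kcal/mol.

    state_labels: sequence of "a" / "b" assignments, one per saved frame.
    """
    n_a = sum(1 for s in state_labels if s == "a")
    n_b = sum(1 for s in state_labels if s == "b")
    if n_a == 0 or n_b == 0:
        raise ValueError("both states must be visited to estimate the difference")
    return -K_B * temperature * math.log(n_a / n_b)
```

Repeating the estimate over independent runs, or over blocks of one run, gives the spread that the slide uses as its measure of precision.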

  44. Comments • Results obtained from a two-level ladder • Faster sampling should be possible with more levels • Requires forcefield engineering • Can use higher temperature also • AND/OR softer parameters

  45. Spin Systems Too • Absolute spins • … or block spins as coarse variables () • Relative spins as detailed coordinates (+–)

  46. How do we progress from here? • Need an exchangeable ladder • But we have design criteria • Top level needs to explore important fluctuations

  47. A Possible Ladder • Backbone only (Go interactions) • Backbone + beta-carbon “side-chains” • United groups (quasi rigid) • United atom • All atom • Each level omits specific internal coordinates • Other levels may be needed

  48. Key Point: Resolution Difference is Tunable • Can (de)coarsen part of a molecule at a time • e.g., groups of 3 residues • Initial results: Met-enkephalin • Less overall CPU time for de-coarsening one residue at a time vs. whole molecule (for a fixed number of “hops”) • Order of magnitude more efficient than single-step decoarsening • Poster by Ed Lyman [Figure labels: all coarse, all detailed]

  49. Resolution Exchange Variants • Switching • Coarse simulation as MC trial • Decorating • Sample coarse and detailed coordinates separately • Re-weight by true Boltzmann factor • “Algorithm Space” has not been fully sampled! [Figure labels: coarse, detailed, t]

  50. Annealing-based approach: replica exchange variant • Can be re-weighted for canonical sampling at low T [Neal, 2001] [Figure labels: hot, cold]
