Exploring Algorithm Space Variations on the Exchange Theme

Daniel M. Zuckerman, Department of Computational Biology, School of Medicine, University of Pittsburgh

Goal • More efficient atomistic sampling, consistent with statistical mechanics • Take care with the meaning of “efficiency”

Outline • Protein fluctuations in biology • Replica exchange simulation -- a second look • Resolution exchange simulation • Initial results • How to approach larger systems? • Exchange Variants • Assessing Sampling

Transport Proteins Fluctuate - I

Transport Proteins Fluctuate - II

Motor Proteins Fluctuate

Signalling Proteins Fluctuate

Conformational Change Requires Fluctuation • Either the ligand leaves a free-like bound structure, or the ligand binds a bound-like free structure (or nearly so) • [Figure labels: free, bound, ligand]

Biology Take-Home Message • Fluctuations are ubiquitous and essential • They are not a sideshow; they are the show! • Experimental structures are only snapshots -- just the beginning of the story

Key for medicinal chemists especially • Drug design via “docking” is a key practical use of molecular modeling • Typically, drug candidate molecules are fitted into static protein structures • Common lament: need to know protein fluctuations • Necessary for free energy calculations • e.g., binding affinity

Questioning low RMSD in MD • Is 1.3 Å right? What is nature’s avg RMSD? • [Figure: RMSD vs. time, 1 - 1.5 Å]

A Physical View of Fluctuations • Rough, high-dimensional energy landscape • [Figure: U vs. x]

Simplest Physical Picture: Bistable system • Most phenomena can be understood from a toy picture • [Figure axes: U, x, t, p]

Defining the Problem • We want a good sample of p(x) • “Equilibrium distribution” • “Complete canonical ensemble” • Probability density function • x is a vector in configuration space -- i.e., vector of all coordinates: (x1,y1,z1, x2,y2,z2, …) • In English: We want a set of structures distributed according to their probability of occurrence at the specified temperature • Hard because we access p(x) only indirectly • Blind person feeling elephant

It’s NOT optimization/search/minimization! • However, undiscovered sampling algorithms may be similar to search algorithms!

The Problem with the Problem • It’s too hard!! • Present methods, implemented on standard computers, are inadequate by orders of magnitude -- think timescales • Simulations access nsec - μsec timescales • Proteins fluctuate on nsec - sec timescales • 3-9 orders of magnitude short! • Today: taking steps toward the solution

Theoretical/Computational Basics • Boltzmann factor: p(x) ~ exp[-U(x) / kT] • “Forcefield” (potential energy function) U • Configuration vector to real number • Terms not shown: sterics, electrostatics, four-body (e.g., dihedral) • [Figure: chain with bond lengths l1, l2, l3, … and bond angles θ1, θ2]

Exchange Schemes • Original idea: use higher temperature to facilitate barrier crossing [Swendsen, 1986] • Barriers are the real problem • Arrhenius law: rate ~ Boltzmann factor of the forward barrier, exp(-ΔU_fwd/kT) • [Figure: U vs. x, two panels, with barrier ΔU_fwd]

Exchange Ladder • High-temperature hops percolate down via configuration swaps (equivalent to temperature swaps) • Independent sim’s with occasional exchange attempts • [Figure: ladder of temperatures T from 300K to hot, vs. time t, with exchange attempts]

How does replica exchange work? • It’s just Monte Carlo • Physics view of Metropolis • Accept trial move x_old → x_try with probability min[1, exp(-ΔU/kT)] • ΔU = U(x_try) - U(x_old) • Probability view: • Accept with probability min[1, prob(try)/prob(old)]
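
A minimal sketch of the Metropolis rule on this slide, assuming a hypothetical one-dimensional double-well potential and uniform trial displacements (neither is taken from the talk). The names `double_well`, `metropolis_accept` and `run_metropolis` are illustrative only.

```python
import math
import random


def metropolis_accept(delta_u, kT, rng=random):
    """Return True with probability min[1, exp(-delta_u / kT)]."""
    if delta_u <= 0.0:
        return True  # downhill moves are always accepted
    return rng.random() < math.exp(-delta_u / kT)


def double_well(x):
    # Hypothetical toy potential: minima at x = -1 and x = +1, barrier of height 1 at x = 0.
    return (x * x - 1.0) ** 2


def run_metropolis(n_steps, kT, step=0.5, x0=-1.0, seed=0):
    """Sample the toy potential; returns the list of visited x values."""
    rng = random.Random(seed)
    x, u = x0, double_well(x0)
    traj = []
    for _ in range(n_steps):
        x_try = x + rng.uniform(-step, step)      # symmetric trial move
        u_try = double_well(x_try)
        if metropolis_accept(u_try - u, kT, rng):  # delta_u = U(x_try) - U(x_old)
            x, u = x_try, u_try
        traj.append(x)                             # rejected moves recount the old x
    return traj
```

Recording the old configuration again after a rejection is part of the rule; dropping rejected steps would bias the sample.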

Exchange as simple Monte Carlo • Exchanges are only attempted in pairs • Two independent simulations • Probability for combined system is simple product: p = p1*p2 • Metropolis criterion: min[1, p_try / p_old] • [Figure: two-level ladder, T1 and T2, vs. time]
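
A sketch of how the product rule p = p1*p2 leads to the usual swap test, assuming each level samples a Boltzmann distribution exp(-U/kT) with the same potential U at its own temperature. Swapping configurations x1 and x2 gives p_try / p_old = exp[(1/kT1 - 1/kT2) (U(x1) - U(x2))]. The function names are illustrative, not from the talk.

```python
import math
import random


def exchange_log_ratio(u1, u2, kT1, kT2):
    """ln(p_try / p_old) for swapping the configurations of levels 1 and 2.

    u1, u2 are the potential energies of the configurations currently
    held at temperatures kT1 and kT2, respectively.
    """
    return (1.0 / kT1 - 1.0 / kT2) * (u1 - u2)


def attempt_exchange(x1, x2, energy, kT1, kT2, rng=random):
    """Metropolis test on the pair; returns (new_x1, new_x2, accepted)."""
    log_r = exchange_log_ratio(energy(x1), energy(x2), kT1, kT2)
    # Working with the logarithm avoids overflow when the ratio is huge.
    if log_r >= 0.0 or rng.random() < math.exp(log_r):
        return x2, x1, True
    return x1, x2, False
```

A swap that moves the lower-energy configuration to the colder level is always accepted; the reverse is accepted only with probability exp(log_r).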

Does replica exchange really help? • For a given investment of CPU time, is better fixed-T sampling achieved? • Compared to equal-time direct simulation -- e.g., for a 20-level ladder, a simulation 20 times as long • To my knowledge, no convincing evidence yet • Key: Sampling limited by top level • Worry 1: High T does not help with entropic barriers • Hard-to-find low energy pathways • Worry 2: High T not so helpful for low barriers • Simulations and experiments suggest barriers are low • Even for 600K simulation, only moderate speedup • 2kT → 2.7 speedup • 4kT → 7.4 speedup • 6kT → 20.1 speedup
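
The speedup figures on this slide can be checked with a short calculation, assuming the Arrhenius form rate ~ exp(-ΔU/kT), barrier heights quoted in units of kT at 300K, and a 600K top level. The function name `arrhenius_speedup` is illustrative.

```python
import math


def arrhenius_speedup(barrier_in_kT_low, T_low=300.0, T_high=600.0):
    """Ratio of Arrhenius rates, rate(T_high) / rate(T_low).

    barrier_in_kT_low is the barrier height divided by k*T_low, so
    rate(T_high) / rate(T_low) = exp[barrier * (1 - T_low / T_high)].
    """
    return math.exp(barrier_in_kT_low * (1.0 - T_low / T_high))


for n in (2, 4, 6):
    print(f"{n} kT barrier: {arrhenius_speedup(n):.1f}x speedup at 600K")
```

Under these assumptions the speedup is exp(n/2) for a barrier of n kT, which gives the 2.7, 7.4 and 20.1 quoted above.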

Summary of Concerns re Replica Exchange • Efficiency limited by top level (highest T) • Highest T may not be fast enough for biomolecules • High T does not affect entropic barriers • Energy barriers may be low • Should work for sufficiently high energy barriers

Can replica exchange be fixed? • Yes • Two improvements today • Plus a sketch of other variants

Improvement (1): Pseudo-exchanges • Key: Need complete sampling of the top level (highest T) • Work from top down … if we can “pseudo exchange” • Top level can be generated with multiple simulations • [Figure: U vs. x; ladders from 300K to hot vs. time]

Anatomy of a Pseudo-Exchange • Point 1: Normal exchanges need not be performed at identical intervals • Not required in derivation of Metropolis criterion • Imagine one fast CPU & one slow CPU • Point 2: Imagine top-level CPU is extremely fast • Long intervals → no correlations → equil. dist. • Alternatively, view top level as “perfect” Monte Carlo → equil. dist. • Conclusion: no need to continue top-level sim. from exchanged configuration → can pull randomly each time from top level • [Figure: fast and slow simulations]
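
One way to read the conclusion above as code, assuming the top level has already been run and stored as a list of configurations (`top_samples`) that is a good equilibrium sample at `kT_top`. The acceptance test is the standard temperature-exchange criterion; the function name and arguments are illustrative, not from the talk.

```python
import math
import random


def pseudo_exchange(x_low, top_samples, energy, kT_low, kT_top, rng=random):
    """Try to replace the lower-level configuration with a random top-level one.

    Returns (configuration for the lower level, accepted flag). The stored
    top-level sample is left untouched: nothing is swapped back into it.
    """
    x_top = rng.choice(top_samples)  # pull randomly from the finished top level
    log_r = (1.0 / kT_low - 1.0 / kT_top) * (energy(x_low) - energy(x_top))
    if log_r >= 0.0 or rng.random() < math.exp(log_r):
        return x_top, True
    return x_low, False
```

Because each attempt only reads from a stored trajectory, attempts are cheap and can be run serially, one level at a time from the top down.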

Two Ways to Use Pseudo Exchange • Same ladder • More widely spaced ladder • Lower acceptance OK since trials are cheap (serial) • No need for frequent attempts in parallel since few high T hops • Essentially guaranteed to be more efficient than standard parallel replica exchange • [Figure: ladders from 300K to hot and from 300K to hotter, vs. time]

Top-down test: Di-leucine Peptide • Two amino-acid peptide with two main conformations • 50 atoms (144 degrees of freedom) • Langevin dynamics; GBSA continuum solvent model • ALL SIMULATIONS

Example: Di-leucine via two-level ladder • Di-leucine, a 50-atom peptide: two levels only • T=298K using pseudo-exchanges with shuffled 500K trajectory • [Figure labels: a, b; T=500K; T=500K, shuffled]

Not really efficient • Boost to 500K only modestly increases hop rate • In 300 nsec: 488 hops at 500K vs. 300 at 298K • Barriers are too low • Ordinary trajectories shown (no exchange) • Still should be better than parallel exchange sim. • [Figure: trajectories at T=500K and T=298K]

Improvement (2): Resolution Exchange • Canonical sampling in detailed model • [Figure: coarse and detailed models]

Dreams of multi-scale modeling • (At least) since Levitt and Warshel, Nature (1975) • Warshel -- free energy for detailed model based on coarse-grained reference (1999) • Brandt and collaborators -- complex multi-level formulation • Vendruscolo and coworkers -- ad hoc addition of atomic detail onto coarse structures • Resolution exchange is concrete, simple and general

Improvement (2): Resolution Exchange • Qualitative picture • [Figure: coarse and detailed levels vs. time, with exchange attempts]

Implementing Resolution Exchange • Need • Formulate as exchange process • Derive acceptance criterion • Coarse model will use subset • Detailed (regular) model x = (l1, l2, l3, …, θ1, θ2, …, φ1, φ2, …) • Coarse model is subset, e.g., φ = (φ1, φ2, …) • Arbitrary potential U_coarse(φ) -- i.e., p_crs(φ) = exp[-U_coarse(φ) / kT] • Simply exchange common coords. • [Figure: chain with bond lengths l1, l2, l3, angles θ1, θ2 and dihedrals φ1, φ2]

Key Point: Subsets are natural for coarse models • Examples • Dihedrals only (fixed angles, lengths) • Backbone coordinates only • Side-chains by beta carbons • Proteins are branched chains • [Figure: chain with coordinates l, θ, φ and beta carbons β]

Res-Ex Metropolis Criterion • The trial exchange • From: (l_a, θ_a, φ_a) and φ_b [“old”] • To: (l_a, θ_a, φ_b) and φ_a [“try”] • Metropolis: min[1, p_tot(try) / p_tot(old)] • Final criterion • min[1, R], with R = p_tot(try) / p_tot(old) • CANONICAL SAMPLING FOR ALL COORDS, ALL LEVELS!!! • [Figure: coarse and detailed levels vs. time]
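
A sketch of the ratio in the Metropolis step, assuming (as in the temperature-exchange case) that the combined probability is the product of a detailed-level Boltzmann factor exp[-U_det/kT] and a coarse-level factor exp[-U_coarse/kT], both at the same kT. Writing `rest_a` for the detailed-only coordinates, `phi_a` for the detailed level's common coordinates and `phi_b` for the coarse level's, this gives R = [p_det(rest_a, phi_b) p_crs(phi_a)] / [p_det(rest_a, phi_a) p_crs(phi_b)]. The function names and the callables `u_det` and `u_crs` are hypothetical.

```python
import math
import random


def resex_log_ratio(rest_a, phi_a, phi_b, u_det, u_crs, kT):
    """ln R for exchanging the common coordinates phi between the two levels."""
    d_det = u_det(rest_a, phi_b) - u_det(rest_a, phi_a)  # detailed level takes phi_b
    d_crs = u_crs(phi_a) - u_crs(phi_b)                  # coarse level takes phi_a
    return -(d_det + d_crs) / kT


def attempt_resex(rest_a, phi_a, phi_b, u_det, u_crs, kT, rng=random):
    """Metropolis test; returns (detailed phi, coarse phi, accepted)."""
    log_r = resex_log_ratio(rest_a, phi_a, phi_b, u_det, u_crs, kT)
    if log_r >= 0.0 or rng.random() < math.exp(log_r):
        return phi_b, phi_a, True
    return phi_a, phi_b, False
```

If the detailed potential were exactly the coarse potential plus a term depending only on `rest_a`, ln R would vanish and every exchange would be accepted; acceptance drops as the two models disagree more about the common coordinates.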

Downside of Res-ex: more work! • The ladder needs to be engineered • Analogy to replica exchange: limit on difference between models • Simple solution (later) • Implicit solvent: still a hard and important problem • [Figure: coarse and detailed levels vs. time, with exchange attempts]

You can recycle! • Top-down approach (pseudo-exchanges) permits old trajectories to be exchanged into new • New temperature • New forcefield • Same or different numbers of coordinates • Minimal CPU cost, if original trajectory already crossed barriers

Initial Results • Still early stages • Verifying the algorithm • Efficiency in a 50-atom di-peptide • [A penta-peptide] • Reduced models of proteins are reasonable

Algorithm Check: Butane • Butane is C4H10 • [Figure: results for the central dihedral; line is from direct sim.]

Real Molecular Test: Di-leucine Peptide • Two amino-acid peptide with two main conformations • Exchange all-atom to united-atom (GBSA “solvent”) • Eliminate non-polar H • 50 atoms to 24 “united atoms” • [Figure: united-atom model]

Initial Results: Res-ex really works • CPU Savings: Factor of 15 (including united-atom cost)

Leucine free energy difference via Res-Ex • ΔG_ab measures if correct time spent in each state • Increased precision indicates speedup (first report??) • Cost of united-atom simulation included in graph • [Figure: ΔG_ab estimates; reference value from long brute-force sim.]
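
A sketch of how a free energy difference between two conformational states can be estimated from the fraction of time spent in each, assuming every trajectory frame has already been assigned a label "a" or "b" and taking the sign convention ΔG_ab = G_b - G_a = -kT ln(N_b / N_a). The labelling scheme and the function name are assumptions, not the talk's analysis code.

```python
import math


def delta_g_ab(labels, kT):
    """Estimate G_b - G_a from state labels of equally weighted frames."""
    n_a = labels.count("a")
    n_b = labels.count("b")
    if n_a == 0 or n_b == 0:
        raise ValueError("both states must be visited to estimate the difference")
    return -kT * math.log(n_b / n_a)
```

Repeating the estimate over independent runs and comparing the spread for equal CPU cost is one way to turn "increased precision" into a speedup figure.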

Comments • Results obtained from a two-level ladder • Faster sampling should be possible with more levels • Requires forcefield engineering • Can use higher temperature also • AND/OR softer parameters

Spin Systems Too • Absolute spins • … or block spins as coarse variables • Relative spins as detailed coordinates (+–)

How do we progress from here? • Need an exchangeable ladder • But we have design criteria • Top level needs to explore important fluctuations

A Possible Ladder • Backbone only (Go interactions) • Backbone + beta-carbon “side-chains” • United groups (quasi rigid) • United atom • All atom • Each level omits specific internal coordinates • Other levels may be needed

Key Point: Resolution Difference is Tunable • Can (de)coarsen part of a molecule at a time • e.g., groups of 3 residues • Initial results: Met-enkephalin • Less overall CPU time for de-coarsening one residue at a time vs. whole molecule (for a fixed number of “hops”) • Order of magnitude more efficient than single-step de-coarsening • Poster by Ed Lyman • [Figure: ladder from all coarse to all detailed]

Resolution Exchange Variants • Switching • Coarse sim. as MC trial • Decorating • Sample coarse and detailed coordinates separately • Re-weight by true Boltzmann factor • “Algorithm Space” has not been fully sampled! • [Figure: coarse and detailed levels vs. time t]

Annealing-based approach: replica exchange variant • Can be re-weighted for canonical sampling at low T [Neal, 2001] • [Figure: annealing from hot to cold]
