Develop efficient computational representations and algorithms to study molecular pathways for protein folding and ligand-protein binding.
Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions • Serkan Apaydin, Doug Brutlag (1), Carlos Guestrin, David Hsu (2), Jean-Claude Latombe, Chris Varma • Computer Science Department, Stanford University • (1) Department of Biochemistry, Stanford University • (2) Computer Science Department, University of North Carolina
Goal of our Research • Develop efficient computational representations and algorithms to study molecular pathways for protein folding and ligand-protein binding • Protein folding: RECOMB ’02 • Ligand-protein binding: ECCB ’02
Acknowledgements • People: Leo Guibas Michael Levitt, Structural Biology Itay Lotan Vijay Pande, Chemistry Fabian Schwarzer Amit Singh Rohit Singh • Funding: NSF-ITR ACI-0086013 Stanford’s Bio-X and Graduate Fellowship programs
Probabilistic Roadmaps • Approximate the free space of the configuration space by random sampling
Probabilistic Roadmap [Kavraki, Svestka, Latombe, Overmars, 95] (figure: roadmap of milestones and edges inside the free space)
Probabilistic Completeness • The probability that a roadmap fails to correctly capture the connectivity of the free space goes to 0 exponentially in the number of milestones (~ running time) • Random sampling is a convenient incremental scheme for approximating the free space
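The roadmap construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 2-D unit-square workspace, the single disk obstacle, and the names `is_free`, `segment_is_free`, `build_roadmap`, and `connect_radius` are all assumptions made for the example.

```python
import numpy as np

def is_free(p, center=np.array([0.5, 0.5]), radius=0.2):
    """Hypothetical free-space test: unit square minus one disk obstacle."""
    return bool(np.linalg.norm(p - center) > radius)

def segment_is_free(p, q, steps=20):
    """Approximate check of a straight segment by testing points along it."""
    return all(is_free(p + t * (q - p)) for t in np.linspace(0.0, 1.0, steps))

def build_roadmap(n_milestones=200, connect_radius=0.15, seed=0):
    rng = np.random.default_rng(seed)
    milestones = []
    while len(milestones) < n_milestones:
        p = rng.random(2)              # uniform sample in the unit square
        if is_free(p):                 # keep only samples in the free space
            milestones.append(p)
    milestones = np.array(milestones)
    edges = set()
    for i in range(n_milestones):
        for j in range(i + 1, n_milestones):
            dist = np.linalg.norm(milestones[i] - milestones[j])
            # connect nearby milestones whose straight segment stays free
            if dist <= connect_radius and segment_is_free(milestones[i], milestones[j]):
                edges.add((i, j))
    return milestones, edges

milestones, edges = build_roadmap()
print(len(milestones), "milestones,", len(edges), "edges")
```

Adding milestones only ever extends the roadmap, which is what makes random sampling an incremental approximation scheme.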
Biology Robotics • Energy field, instead of joint control • Continuous energy field, instead of binary free and in-collision spaces • Multiple pathways, instead of single collision-free path • Potentially many more degrees of freedom • Relation to real world is more complex
Initial Work [Singh, Latombe, Brutlag, 99] • Study of ligand-protein binding • Probabilistic roadmaps with edges weighted by energetic plausibility • Search of most plausible paths • Study of energy profiles along such paths • Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01]
New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges (figure: edge from node v_i to node v_j labeled with transition probability P_ij)
Why is this a good idea? • We can approximate Monte Carlo simulation as closely as we wish • Unlike with MC simulation, we avoid the local-minima problem • We can consider all pathways in the roadmap at once to compute ensemble properties
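Edge probabilities (figure: node v_i with self-transition probability P_ii and an edge to v_j with probability P_ij) • P_ij follows the Metropolis criterion • P_ii is the self-transition probability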
Stochastic Roadmap Simulation • Stochastic simulation on the roadmap and Monte Carlo simulation converge to the same Boltzmann distribution
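A small numerical sketch can make the edge probabilities and the convergence claim concrete. It assumes a Metropolis acceptance rule with a uniform proposal rate `1 / d_max` (where `d_max` is the largest node degree), which is one common choice that makes the Boltzmann distribution exactly stationary; the energies, the neighbor lists, and the function name `metropolis_matrix` are hypothetical.

```python
import numpy as np

def metropolis_matrix(energies, neighbors, kT=1.0):
    """Transition matrix of a roadmap treated as a Markov chain."""
    n = len(energies)
    d_max = max(len(nb) for nb in neighbors)   # assumption: common proposal rate
    P = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            dE = energies[j] - energies[i]
            # Metropolis rule: always accept downhill, uphill with exp(-dE/kT)
            accept = 1.0 if dE <= 0 else np.exp(-dE / kT)
            P[i, j] = accept / d_max
        P[i, i] = 1.0 - P[i].sum()             # self-transition: leftover mass
    return P

# Hypothetical 4-node roadmap with symmetric adjacency
energies = np.array([0.0, 1.0, 0.5, 2.0])
neighbors = [[1, 2], [0, 2, 3], [0, 1], [1]]
P = metropolis_matrix(energies, neighbors)

boltzmann = np.exp(-energies)                  # kT = 1
boltzmann /= boltzmann.sum()
print(np.allclose(boltzmann @ P, boltzmann))   # stationarity check
```

Each row of `P` sums to 1, and the Boltzmann weights are left unchanged by one step of the chain, so a long random walk on this roadmap samples nodes in Boltzmann proportion.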
Problems with Monte Carlo Simulation • Much time is wasted in local minima • Each run generates a single pathway
Solution • Treat the roadmap as a Markov chain and use the First-Step Analysis tool
Example #1: Probability of Folding (pfold) [Du et al. ’98] • From a given conformation (figure: HIV integrase), the protein reaches the folded set first with probability pfold and the unfolded set first with probability 1 - pfold • “We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998)
First-Step Analysis • One linear equation per node • Solution gives pfold for all nodes • No explicit simulation run • All pathways are taken into account • Sparse linear system • Let f_i = pfold(i), with f = 1 for nodes in F (folded set) and f = 0 for nodes in U (unfolded set) • After one step from node i with neighbors j, k, l, m: f_i = P_ii f_i + P_ij f_j + P_ik f_k + P_il f_l + P_im f_m
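The first-step equations can be solved directly as a linear system. The sketch below is illustrative only: the 5-node chain, its transition probabilities, and the function name `pfold` are assumptions, and a dense solver is used for brevity where a sparse solver would suit a large roadmap.

```python
import numpy as np

def pfold(P, folded, unfolded):
    """Solve f_i = sum_j P_ij f_j with f = 1 on folded and f = 0 on unfolded."""
    n = P.shape[0]
    A = np.eye(n) - P          # interior rows encode (I - P) f = 0
    b = np.zeros(n)
    for i in folded:           # boundary condition f_i = 1
        A[i] = 0.0
        A[i, i] = 1.0
        b[i] = 1.0
    for i in unfolded:         # boundary condition f_i = 0
        A[i] = 0.0
        A[i, i] = 1.0
        b[i] = 0.0
    return np.linalg.solve(A, b)

# Hypothetical chain 0 - 1 - 2 - 3 - 4: stay with 0.5, step left or right with 0.25
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i], P[i, i - 1], P[i, i + 1] = 0.5, 0.25, 0.25
P[0, 0] = P[4, 4] = 1.0        # node 0 is the unfolded set, node 4 the folded set

f = pfold(P, folded=[4], unfolded=[0])
print(f)                       # pfold for every node from a single solve
```

One solve yields pfold for all nodes at once, with every pathway in the roadmap contributing through the transition matrix.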
In Contrast … Computing pfold with MC simulation requires, for every conformation of interest: • Performing many MC simulation runs • Counting the number of times F is attained first
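For comparison, the simulation-based estimate for a single starting node can be sketched as repeated random walks. The chain, the number of runs, and the function name `pfold_monte_carlo` are assumptions for illustration; the walk here runs on a small discrete chain rather than in a continuous conformation space.

```python
import numpy as np

def pfold_monte_carlo(P, start, folded, unfolded, runs=2000, seed=0):
    """Estimate pfold(start) as the fraction of walks that reach F before U."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    hits = 0
    for _ in range(runs):
        state = start
        while state not in folded and state not in unfolded:
            state = int(rng.choice(n, p=P[state]))   # one stochastic step
        hits += state in folded
    return hits / runs

# Hypothetical chain 0 - 1 - 2 - 3 - 4: stay with 0.5, step left or right with 0.25
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i], P[i, i - 1], P[i, i + 1] = 0.5, 0.25, 0.25
P[0, 0] = P[4, 4] = 1.0

estimate = pfold_monte_carlo(P, start=2, folded={4}, unfolded={0})
print(estimate)   # noisy estimate for one node; the exact value here is 0.5
```

The whole batch of runs has to be repeated for each conformation of interest, and the statistical error shrinks only as the inverse square root of the number of runs.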
Computational Tests • 1HDD (Engrailed homeodomain): 3 α helices, 12 DOF • 1ROP (repressor of primer): 2 α helices, 6 DOF • H-P energy model with steric clash exclusion [Sun et al., 95]
Computation Times (1ROP) • Monte Carlo: 49 conformations; over 10^6 energy computations; over 11 days of computer time • Roadmap: 5000 conformations; ~15,000 energy computations; 1 to 1.5 hours of computer time • ~4 orders of magnitude speedup!
Example #2: Ligand-Protein Interaction • Computation of escape time from funnels of attraction around potential binding sites (funnel = ball of 10 Å RMSD)
Computing Escape Time with Roadmap • For node i inside the funnel of attraction, with neighbors j, k, l, m: t_i = 1 + P_ii t_i + P_ij t_j + P_ik t_k + P_il t_l + P_im t_m • t = 0 for nodes outside the funnel • Escape time is measured as the number of steps of stochastic simulation
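The escape-time equations form a linear system over the funnel nodes, since every node outside the funnel has escape time 0. The sketch below assumes a hypothetical 5-node chain and the function name `escape_times`; it is an illustration of the technique, not the authors' code.

```python
import numpy as np

def escape_times(P, funnel):
    """Solve t_i = 1 + sum_j P_ij t_j for i in the funnel, t = 0 outside it."""
    idx = sorted(funnel)
    P_ff = P[np.ix_(idx, idx)]            # transitions that stay inside the funnel
    t = np.linalg.solve(np.eye(len(idx)) - P_ff, np.ones(len(idx)))
    return dict(zip(idx, t))

# Hypothetical chain 0 - 1 - 2 - 3 - 4: stay with 0.5, step left or right with 0.25
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i], P[i, i - 1], P[i, i + 1] = 0.5, 0.25, 0.25
P[0, 0] = P[4, 4] = 1.0

t = escape_times(P, funnel={1, 2, 3})     # nodes 0 and 4 lie outside the funnel
print(t)                                  # expected steps to escape from nodes 1, 2, 3
```

For this chain the solve gives 6, 8, and 6 steps for nodes 1, 2, and 3, which can be checked by substituting back into the equation for each node.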
Similar Computation Through Simulation [Sept, Elcock and McCammon ’99] • 10K to 30K independent simulations
Applications • Distinguishing catalytic site: Given several potential binding sites, which one is the catalytic site?
Distinction Based on Energy (kcal/mol)
Distinction Based on Escape Time (# steps)
Applications • Distinguishing catalytic site • Computational mutagenesis: some amino acids are deleted entirely, replaced by other amino acids, or have their side chains altered (figure: chemical environment of the LDH-NADH-substrate (pyruvate) complex, which catalyzes the conversion of pyruvate to lactate in the presence of NADH; labels: loop, GLN-101, ARG-106 (+), ASP-166, ARG-169 (+), HIS-193 (+), ASP-195, NADH)
Binding of Pyruvate to LDH (figure: pyruvate and NADH in the LDH binding site; labels: loop, GLN-101, ARG-106 (+), ASP-166, ARG-169 (+), HIS-193 (+), ASP-195, THR-245)
Results (figure: binding site with labels loop, GLN-101, ARG-106 (+), ASP-166, ARG-169 (+), HIS-193 (+), ASP-195, THR-245, NADH)
Results (figure: binding site with ALA-106 and ALA-193 shown at positions 106 and 193; other labels: loop, GLN-101, ASP-166, ARG-169 (+), ASP-195, NADH)
Results (figure: binding site with GLY-245 shown at position 245; other labels: loop, GLN-101, ARG-106 (+), ASP-166, ARG-169 (+), HIS-193 (+), ASP-195, NADH)
Conclusion • Probabilistic roadmaps are a promising computational tool for studying ensemble properties of molecular pathways • Current and future work: • Better kinetic/energetic models • Experimentally verifiable tests • Non-uniform sampling strategies • Encoding MD simulation
Stochastic Roadmap Simulation • Stochastic simulation on a roadmap and MC simulation converge to the same distribution π (Boltzmann) • For any set S and any ε > 0, δ > 0, γ > 0, there exists N such that a roadmap with N milestones has error bounded in terms of ε and δ, with probability at least 1 - γ
Ligand-Protein Modeling • DOF = 10: 3 coordinates (x, y, z) to position the root atom; 2 angles to specify the first bond; angles for all remaining non-terminal atoms • Bond angles are assumed constant • Protein assumed rigid [Singh, Latombe and Brutlag ’99]
Energy of Interaction • Energy = van der Waals interaction (E_v) + electrostatic interaction (E_c) • E_v = 0.2 [(R_0/R_ij)^12 - 2 (R_0/R_ij)^6] • E_c = 332 Q_i Q_j / (ε R_ij) • (figure: plots of E_c and E_v against R_ij)
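The two energy terms translate directly into a pairwise sum over ligand and protein atoms. The sketch below assumes a single equilibrium distance `r0` for every atom pair, a constant dielectric `eps`, distances in Å, and charges in units of the electron charge; these parameter values and the function name `interaction_energy` are hypothetical.

```python
import numpy as np

def interaction_energy(lig_xyz, lig_q, prot_xyz, prot_q, r0=3.5, eps=1.0):
    """Sum of van der Waals and electrostatic terms over all ligand-protein pairs."""
    diff = lig_xyz[:, None, :] - prot_xyz[None, :, :]
    r = np.linalg.norm(diff, axis=-1)                  # pairwise distances R_ij
    ratio6 = (r0 / r) ** 6
    e_vdw = 0.2 * (ratio6 ** 2 - 2.0 * ratio6)         # E_v = 0.2[(R0/R)^12 - 2(R0/R)^6]
    e_coul = 332.0 * lig_q[:, None] * prot_q[None, :] / (eps * r)   # E_c
    return float(e_vdw.sum() + e_coul.sum())

# One uncharged ligand atom placed exactly r0 away from one uncharged protein atom
lig = np.array([[0.0, 0.0, 0.0]])
prot = np.array([[3.5, 0.0, 0.0]])
zero = np.array([0.0])
print(interaction_energy(lig, zero, prot, zero))       # van der Waals minimum: -0.2
```

At `R_ij = R_0` the van der Waals term sits at its minimum of -0.2, and with nonzero charges the electrostatic term is added on top of it.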
Solvent Effects • E_c = 332 Q_i Q_j / (ε R_ij) is only valid for an infinite medium of uniform dielectric • Dielectric discontinuities result in induced surface charges • Solution: Poisson-Boltzmann equation: ∇·[ε(r) ∇φ(r)] - ε(r) κ(r)^2 sinh[φ(r)] + 4π ρ^f(r)/kT = 0 • Use DelPhi [Rocchia et al. ’01] • The finite-difference solution is based on discretizing the workspace into a uniform grid