This research focuses on determining, classifying, and predicting 3D protein structures and modeling molecular energy through simulation. It also explores techniques for matching and scoring structural motifs, and uses adaptive bounding volume hierarchies and chain trees for efficient computation.
Structure and Motion. Jean-Claude Latombe, Computer Science Department, Stanford University. NSF-ITR Meeting on November 14, 2002
Stanford’s Participants • PIs: L. Guibas, J.C. Latombe, M. Levitt • Research Associate: P. Koehl • Postdocs: F. Schwarzer, A. Zomorodian • Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS) • Undergraduate students: J. Greenberg (CS), E. Berger (CS) • Collaborating faculty: A. Brunger (Molecular & Cellular Physiology), D. Brutlag (Biochemistry), D. Donoho (Statistics), J. Milgram (Math), V. Pande (Chemistry)
Problem Domains: Biological functions derive from the structures (shapes) achieved by molecules through motions • Determination, classification, and prediction of 3D protein structures • Modeling of molecular energy and simulation of folding and binding motion
What’s New/Interesting for Computer Science? • Massive amount of experimental data • Importance of similarities • Multiple representations of structure • Continuous energy functions • Many objects forming deformable chains • Many degrees of freedom • Ensemble properties of pathways
Importance of similarities: Segmentation/matching/scoring techniques. E.g.: Libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)] (Figure labels: data set, clustered data, small library)
Approximations of the real protein 1tim:
• Complexity 2.26 (50 fragments of length 7): 2.7805 Å cRMS
• Complexity 10 (100 fragments of length 5): 0.9146 Å cRMS
Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
Problem: Determine if two structures share common motifs:
• 2 (labelled) structures in R^3: A = {a_1, a_2, …, a_n}, B = {b_1, b_2, …, b_m}
• Find subsequences s_a and s_b such that the substructures {a_{s_a(1)}, a_{s_a(2)}, …, a_{s_a(l)}} and {b_{s_b(1)}, b_{s_b(2)}, …, b_{s_b(l)}} are similar
• Twofold problem: alignment and correspondence
• Score, approximation, complexity
Iterative Closest Point (Besl-McKay) for alignment. Score: RMSD. [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.]
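The two ingredients named on this slide can be illustrated with a minimal sketch: the closest-point correspondence step of ICP, followed by the RMSD score over the matched pairs. The function names and toy coordinates are hypothetical, and the rigid-motion (rotation and translation) update that a full ICP iteration performs between these two steps is deliberately omitted, so this is not the cited authors' implementation.

```python
import math

def closest_point_pairs(a, b):
    """ICP correspondence step: for each point of a, the index of its nearest point in b."""
    return [min(range(len(b)), key=lambda j: math.dist(p, b[j])) for p in a]

def rmsd(a, b, pairs):
    """Root-mean-square deviation between each a[i] and its matched point b[pairs[i]]."""
    total = sum(math.dist(p, b[j]) ** 2 for p, j in zip(a, pairs))
    return math.sqrt(total / len(a))

# Toy data (assumption): b is a reordered copy of a, shifted by 0.1 along y.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(2.0, 0.1, 0.0), (0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]

pairs = closest_point_pairs(a, b)   # correspondence
score = rmsd(a, b, pairs)           # score for the current alignment
print(pairs, round(score, 3))
```

In a full ICP loop, the correspondence and a best-fit rigid motion would be recomputed alternately until the score stops improving.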
Example: trypsin and the trypsin active site [Singh and Saha]
Trypsin active site against 42 trypsin-like proteins [Singh and Saha]
Multiple representations of structure: ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]
Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian] • Decoys generated using “physical” potentials • Select best decoys using distance information
Continuous energy function • Many objects in deformable chains
Many pairs of objects, but relatively few are close enough to interact
Data structures that capture proximity, but undergo small or rare changes
During motion simulation:
• detect steric clashes (self-collisions)
• find pairs of atoms closer than cutoff
• find which energy terms can be reused
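As a point of reference for "find pairs of atoms closer than cutoff", here is a minimal sketch of a uniform-grid search, the kind of baseline the talk later compares against. It is not one of the adaptive hierarchies discussed in this talk; the function name and the toy coordinates are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def pairs_within_cutoff(atoms, cutoff):
    """Return sorted index pairs (i, j), i < j, whose centers are closer than cutoff."""
    # Bin atoms into cubic cells of side `cutoff`; interacting atoms must then
    # lie in the same cell or in one of the 26 neighboring cells.
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)

    found = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() does not insert missing cells, so the dict is not modified here.
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(atoms[i], atoms[j]) < cutoff:
                        found.add((i, j))
    return sorted(found)

# Toy data (assumption): two well-separated pairs of nearby atoms.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
close_pairs = pairs_within_cutoff(atoms, cutoff=1.5)
print(close_pairs)
```

The grid is rebuilt from scratch whenever atoms move, which is the cost that proximity structures with "small or rare changes" aim to avoid.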
Other application domains: • Modular reconfigurable robots • Reconstructive surgery
Fixed Bounding-Volume hierarchies don’t work • Instead, exploit what doesn’t change: chain topology. Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang] (SoCG 2002) • WBSH undergoes small number of changes • Self-collision: O(n log n) in R^2, O(n^(2-2/d)) in R^d, d ≥ 3
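The idea of a sphere hierarchy that follows chain topology can be sketched as follows: a balanced binary tree over consecutive atoms, with self-collision found by descending only into pairs of nodes whose spheres overlap. This is a simplified illustration under stated assumptions: each parent sphere here encloses its two child spheres, which is not necessarily the wrapped construction of the cited paper, and all names, the `min_sep` rule for skipping bonded neighbors, and the toy chain are hypothetical.

```python
import math

class Node:
    def __init__(self, center, radius, lo, hi, left=None, right=None):
        self.center, self.radius = center, radius
        self.lo, self.hi = lo, hi            # covers atom indices lo .. hi-1
        self.left, self.right = left, right

def enclose(c1, r1, c2, r2):
    """Smallest sphere containing two given spheres."""
    d = math.dist(c1, c2)
    if d + r2 <= r1:
        return c1, r1
    if d + r1 <= r2:
        return c2, r2
    r = (d + r1 + r2) / 2
    t = (r - r1) / d                         # d > 0 here, since neither sphere contains the other
    return tuple(a + t * (b - a) for a, b in zip(c1, c2)), r

def build(atoms, radius, lo=0, hi=None):
    """Balanced tree over the chain: leaves are atoms, siblings are adjacent sub-chains."""
    hi = len(atoms) if hi is None else hi
    if hi - lo == 1:
        return Node(atoms[lo], radius, lo, hi)
    mid = (lo + hi) // 2
    left, right = build(atoms, radius, lo, mid), build(atoms, radius, mid, hi)
    c, r = enclose(left.center, left.radius, right.center, right.radius)
    return Node(c, r, lo, hi, left, right)

def cross(a, b, min_sep, out):
    """Collect clashing atom pairs with one atom under a and one under b (a precedes b)."""
    if (b.hi - 1) - a.lo < min_sep:          # every pair is too close along the chain
        return
    if math.dist(a.center, b.center) >= a.radius + b.radius:
        return                               # bounding spheres are disjoint: prune
    if a.left is None and b.left is None:
        out.append((a.lo, b.lo))
    elif b.left is None or (a.left is not None and a.radius >= b.radius):
        cross(a.left, b, min_sep, out)
        cross(a.right, b, min_sep, out)
    else:
        cross(a, b.left, min_sep, out)
        cross(a, b.right, min_sep, out)

def self_clashes(node, min_sep=2, out=None):
    """Atom pairs (i, j) with j - i >= min_sep whose spheres overlap."""
    out = [] if out is None else out
    if node.left is not None:
        self_clashes(node.left, min_sep, out)
        self_clashes(node.right, min_sep, out)
        cross(node.left, node.right, min_sep, out)
    return out

# Toy chain (assumption): six atoms of radius 0.4 whose last atom folds back onto the first.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
         (2.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.3, 0.5, 0.0)]
found = self_clashes(build(chain, radius=0.4))
print(found)
```

This sketch rebuilds the tree from scratch; the point of the adaptive hierarchies on these slides is that only a small part of the tree needs updating when the chain deforms.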
ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
• Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation)
• Updating:
• Finding interacting pairs: (in practice, sublinear)
ChainTrees: Application to MC simulation (comparison to grid method) (Figure: panels m = 1 and m = 5; series labeled (68), (144), (374), (755))
Many degrees of freedom: Tools to explore high-dimensional conformational (structure) spaces: - Structure sampling [Kolodny, Levitt] - Finding nearest neighbors [Lotan, Schwarzer]
Sampling structures by combining fragments [Kolodny, Levitt]: Library of protein fragments, discrete set of candidate structures (Figure: fragments a, b, c, d combined into strings such as cab and bbc)
Nearest neighbors in high-dimensional space [Lotan, Schwarzer]: Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS). Idea: Cut backbone into m equal subsequences (Figure: backbone points labeled a0, a1, …, am)
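A minimal sketch of how cutting the backbone into m pieces can shrink the search problem: each piece is replaced by the average of its atom positions (this averaging step is an assumption, since the slide states only the cut into m equal subsequences), and neighbors are then ranked by dRMS on the reduced point sets. The search is brute force for brevity, and all names and toy conformations are hypothetical.

```python
import math

def compress(conf, m):
    """Replace each of m equal-length backbone pieces by the mean of its atom positions."""
    size = len(conf) // m                    # assumes len(conf) is a multiple of m
    return [tuple(sum(p[k] for p in conf[s:s + size]) / size for k in range(3))
            for s in range(0, size * m, size)]

def drms(a, b):
    """Distance RMS: root-mean-square difference of corresponding intra-structure distances."""
    n = len(a)
    sq = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
          for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(sq) / len(sq))

def k_nearest(query, confs, k, m):
    """Indices of the k conformations closest to query under dRMS on compressed backbones."""
    q = compress(query, m)
    order = sorted(range(len(confs)), key=lambda i: drms(q, compress(confs[i], m)))
    return order[:k]

# Toy conformations (assumption): 8-atom chains.
straight = [(float(i), 0.0, 0.0) for i in range(8)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0),
        (3.0, 1.0, 0.0), (3.0, 2.0, 0.0), (3.0, 3.0, 0.0), (3.0, 4.0, 0.0)]
stretched = [(1.1 * i, 0.0, 0.0) for i in range(8)]
shifted = [(i + 5.0, 5.0, 5.0) for i in range(8)]   # same shape as `straight`, translated

neighbors = k_nearest(straight, [bent, stretched, shifted], k=2, m=4)
print(neighbors)
```

dRMS depends only on internal distances, so the translated copy ranks first without any superposition step; a spatial index such as a kd-tree would replace the sort when n is large.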
Nearest neighbors in high-dimensional space [Lotan and Schwarzer] • 100,000 decoys of 1CTF (Park-Levitt set) • Computation of 100 NN of each conformation • ~80% of computed NNs are true NNs • kd-tree software from ANN library (U. Maryland)
Ensemble properties of pathways • Stochastic nature of molecular motion requires characterizing average properties of many pathways • Probabilistic conformational roadmaps • Applications to protein folding and ligand-protein binding [Apaydin, Brutlag, Guestrin, Hsu, Latombe]
Example: Probability of Folding pfold [Du et al. ’98] (Figure: HIV integrase; a conformation reaches the folded set with probability pfold and the unfolded set with probability 1 - pfold) “We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).
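To make the cost remark in the quotation concrete, here is a minimal sketch of the direct way to estimate pfold: start many stochastic trajectories from one state and count the fraction that reach the folded set before the unfolded set. The one-dimensional random walk standing in for molecular motion, and every name in the code, are assumptions for illustration only.

```python
import random

def estimate_pfold(step, start, is_folded, is_unfolded, runs, rng):
    """Fraction of simulated trajectories from `start` that hit the folded set first."""
    hits = 0
    for _ in range(runs):
        state = start
        while not (is_folded(state) or is_unfolded(state)):
            state = step(state, rng)
        hits += is_folded(state)
    return hits / runs

def toy_step(x, rng):
    """Hypothetical dynamics: unbiased walk on the integers 0..4."""
    return x + rng.choice((-1, 1))

rng = random.Random(0)                       # fixed seed so the run is repeatable
estimate = estimate_pfold(toy_step, 2,
                          is_folded=lambda x: x == 4,
                          is_unfolded=lambda x: x == 0,
                          runs=20_000, rng=rng)
print(round(estimate, 2))
```

Every estimate needs many complete trajectories, and the whole procedure has to be repeated for each starting conformation of interest.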
Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02) Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges (Figure: nodes v_i and v_j joined by an edge with probability P_ij)
Probabilistic Roadmap
Let f_i = pfold(i). After one step: f_i = P_ii f_i + P_ij f_j + P_ik f_k + P_il f_l + P_im f_m
• One linear equation per node
• Solution gives pfold for all nodes
• No explicit simulation run
• All pathways are taken into account
• Sparse linear system
(Figure: node i with self-transition P_ii and edges to neighbors j, k, l, m with probabilities P_ij, P_ik, P_il, P_im; U: unfolded set and F: folded set, each labeled “=1”)
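The one-step equation on this slide gives one linear equation per roadmap node. A minimal sketch of solving that system follows, using Gauss-Seidel sweeps over a sparse dict-of-dicts transition table. The boundary conventions (f = 1 on folded nodes, f = 0 on unfolded nodes), the choice of solver, and the five-node toy roadmap are assumptions for illustration, not the authors' code.

```python
def pfold(P, folded, unfolded, sweeps=10_000, tol=1e-12):
    """Solve f_i = sum_j P[i][j] * f_j with f = 1 on folded and f = 0 on unfolded nodes."""
    f = {i: 0.0 for i in P}
    for i in folded:
        f[i] = 1.0
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(sweeps):
        change = 0.0
        for i in free:
            stay = P[i].get(i, 0.0)          # self-transition P_ii, assumed < 1
            # Move P_ii f_i to the left-hand side: (1 - P_ii) f_i = sum over j != i of P_ij f_j
            new = sum(p * f[j] for j, p in P[i].items() if j != i) / (1.0 - stay)
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f

# Toy roadmap (assumption): a chain 0-1-2-3-4 with node 0 unfolded and node 4 folded.
P = {
    0: {0: 1.0},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
    4: {4: 1.0},
}
f = pfold(P, folded={4}, unfolded={0})
print({i: round(v, 3) for i, v in f.items()})
```

A single solve yields pfold for every node at once, and each row of the system only involves a node's roadmap neighbors, which is why the system is sparse.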
Probabilistic Roadmap: Correlation with MC Approach • 1ROP (repressor of primer) • 2 α helices • 6 DOF
Probabilistic Roadmap: Computation Times (1ROP)
• Monte Carlo: over 10^6 energy computations; over 11 days of computer time; 49 conformations
• Roadmap: ~15,000 energy computations; 1 to 1.5 hours of computer time; 5000 conformations
• ~4 orders of magnitude speedup!
Summary
• Results: Interpretation of electron density maps; Statistical potential; Library of protein fragments; Self-collision and energy maintenance; Structure alignment; ProShape software; Tools for high-dimensional spaces; Probabilistic roadmaps
• Diagram labels: Biology; Structure determination; Modeling; Shape representation; Hierarchies; Algorithms; Deformation; Motion planning; Shape organization; Software; Alpha shapes
Future Work
• Perform more substantial experiments. E.g., more realistic potentials in ChainTree and probabilistic roadmaps
• Extend tools to solve more relevant problems. E.g., encode Molecular Dynamics into probabilistic roadmaps
• Combine results. E.g., use library of fragments to sample probabilistic roadmaps
• Develop new algorithms/data structures. E.g., sparse spanners to capture proximity information
Our Future: The BioX – Clark Center June 2003