
Structure and Motion

Structure and Motion. Jean-Claude Latombe, Computer Science Department, Stanford University. NSF-ITR Meeting on November 14, 2002. Stanford’s Participants. PIs: L. Guibas, J.C. Latombe, M. Levitt. Research Associate: P. Koehl. Postdocs: F. Schwarzer, A. Zomorodian


Presentation Transcript


  1. Structure and Motion. Jean-Claude Latombe, Computer Science Department, Stanford University. NSF-ITR Meeting on November 14, 2002

  2. Stanford’s Participants • PIs: L. Guibas, J.C. Latombe, M. Levitt • Research Associate: P. Koehl • Postdocs: F. Schwarzer, A. Zomorodian • Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS) • Undergraduate students: J. Greenberg (CS), E. Berger (CS) • Collaborating faculty: A. Brunger (Molecular & Cellular Physiology), D. Brutlag (Biochemistry), D. Donoho (Statistics), J. Milgram (Math), V. Pande (Chemistry)

  3. Problems Addressed: Biological functions derive from the structures (shapes) achieved by molecules through motions. • Determination, classification, and prediction of 3D protein structures • Modeling of molecular energy and simulation of folding and binding motion

  4. What’s New for Computer Science? • Massive amount of experimental data • Importance of similarities • Multiple representations of structure • Continuous energy functions • Many objects forming deformable chains • Many degrees of freedom • Ensemble properties of pathways

  5. Massive amount of experimental data → Abstract/simplify data sets into compact data structures. E.g.: Electron density map → Medial axis

  6. Importance of similarities • Segmentation/matching/scoring techniques • E.g.: Libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)] [Figure labels: data set, clustered data, small library]

  7. Approximations of a real protein (1tim) • Complexity 2.26 (50 fragments of length 7): 2.7805 Å cRMS • Complexity 10 (100 fragments of length 5): 0.9146 Å cRMS

  8. Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial] Problem: • Determine if two structures share common motifs: • 2 (labelled) structures in $\mathbb{R}^3$: $A=\{a_1,a_2,\ldots,a_n\}$, $B=\{b_1,b_2,\ldots,b_m\}$ • Find subsequences $s_a$ and $s_b$ such that the substructures $\{a_{s_a(1)},a_{s_a(2)},\ldots,a_{s_a(l)}\}$ and $\{b_{s_b(1)},b_{s_b(2)},\ldots,b_{s_b(l)}\}$ are similar • Twofold problem: alignment and correspondence • Score • Approximation • Complexity

  9. [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.] Iterative Closest Point (Besl-McKay) for alignment • Score: RMSD distance

  10. [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.] [Figure labels: trypsin; trypsin active site]

  11. [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.] Trypsin active site against 42 trypsin-like proteins

  12. Multiple representations of structure • ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]

  13. Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian] • Decoys generated using “physical” potentials • Select best decoys using distance information

  14. Continuous energy functions • Many objects in deformable chains • Many pairs of objects, but relatively few are close enough to interact → Data structures that capture proximity, but undergo small or rare changes • During motion simulation: detect steric clashes (self-collisions); find pairs of atoms closer than cutoff

  15. Other application domains: • Modular reconfigurable robots • Reconstructive surgery

  16. Fixed Bounding-Volume hierarchies don’t work

  17. Instead, exploit what doesn’t change: chain topology • Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)

  18. Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang] (SoCG 2002) • WBSH undergoes small number of changes • Self-collision: $O(n \log n)$ in $\mathbb{R}^2$, $O(n^{2-2/d})$ in $\mathbb{R}^d$, $d \ge 3$

  19. ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02) Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation) • Find all pairs of atoms closer than a given cutoff • Find which energy terms can be reused
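The proximity query named on this slide (all pairs of atoms closer than a cutoff) can be illustrated with a uniform grid, the kind of baseline the ChainTree is compared against later in the talk. The following is a minimal Python sketch, not the ChainTree algorithm itself; the function name `pairs_within_cutoff` and the toy coordinates are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def pairs_within_cutoff(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is below cutoff.

    Illustrative grid method (an assumption, not the ChainTree): atoms are
    binned into cubic cells of side `cutoff`, so any interacting pair lies
    in the same cell or in two adjacent cells.
    """
    cells = defaultdict(list)
    for idx, point in enumerate(coords):
        key = tuple(floor(c / cutoff) for c in point)
        cells[key].append(idx)

    offsets = list(product((-1, 0, 1), repeat=3))
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get() avoids creating empty cells while iterating.
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    if i < j and dist(coords[i], coords[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)


# Hypothetical toy "chain": three atoms 1 unit apart plus one far away.
toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
close_pairs = pairs_within_cutoff(toy_atoms, 1.5)
```

The grid is rebuilt from scratch at every step, which costs time linear in the number of atoms even when only a few degrees of freedom change. That is the cost the slide's assumption is meant to avoid.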

  20. ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02) Updating: Finding interacting pairs: (in practice, sublinear)

  21. ChainTrees: Application to MC simulation (comparison to grid method) [Plots: panels m = 1 and m = 5; curves labeled (68), (144), (374), (755)]

  22. Future work: ChainTrees • Run new series of experiments with more complex energy field: EEF1 [Lazaridis & Karplus] (with Pande) • Use library of fragments (with Koehl) • Open problem: How to find good moves to make when the conformation is compact and random moves are rejected with high probability?

  23. Future work: Spanner for deformable chain [Agarwal, Gao (Duke); Nguyen, Zhang (Stanford)] • Capture proximity information with a sparse spanner [Figure label: 3HVT]

  24. Many degrees of freedom • Tools to explore high-dimensional conformation space: sampling strategies; nearest neighbors

  25. Sampling structures by combining fragments [Kolodny, Levitt] • Library of protein fragments → Discrete set of candidate structures [Figure labels: fragments a, b, c, d; combinations cab, bbc]

  26. Nearest neighbors in high-dimensional space [Lotan and Schwarzer] • Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS) • Idea: Cut backbone into m equal subsequences [Figure labels: a0, a1, a2, a3, a4, a5, a6, …, am]
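One way to read the idea on this slide: cut the backbone into m equal subsequences, replace each by a single representative point, and compare the reduced conformations with dRMS. The Python sketch below assumes the representative is the centroid of the subsequence; the slide only states that the backbone is cut, so the averaging step and the helper names `average_segments` and `drms` are assumptions made here for illustration.

```python
from itertools import combinations
from math import dist, sqrt


def average_segments(coords, m):
    """Cut a chain of 3D points into m equal runs and return their centroids.

    Assumption: each subsequence is summarized by its centroid.
    """
    n = len(coords)
    if m <= 0 or n % m != 0:
        raise ValueError("chain length must be a positive multiple of m")
    size = n // m
    reduced = []
    for s in range(m):
        segment = coords[s * size:(s + 1) * size]
        reduced.append(tuple(sum(p[axis] for p in segment) / size
                             for axis in range(3)))
    return reduced


def drms(conf_a, conf_b):
    """Distance RMS: root mean square difference of intra-chain distances."""
    if len(conf_a) != len(conf_b) or len(conf_a) < 2:
        raise ValueError("conformations need equal length >= 2")
    pairs = list(combinations(range(len(conf_a)), 2))
    total = sum((dist(conf_a[i], conf_a[j]) - dist(conf_b[i], conf_b[j])) ** 2
                for i, j in pairs)
    return sqrt(total / len(pairs))


# Hypothetical 8-atom straight chain, reduced to m = 4 points.
chain = [(float(i), 0.0, 0.0) for i in range(8)]
reduced_chain = average_segments(chain, 4)
```

Because dRMS depends only on internal distances, it needs no superposition step. Working on m points instead of all atoms shrinks the number of distance pairs from n(n-1)/2 to m(m-1)/2 per comparison.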

  27. Nearest neighbors in high-dimensional space [Lotan and Schwarzer] • 100,000 decoys of 1CTF (Park-Levitt set) • Computation of 100 NN of each conformation • ~80% of computed NNs are true NNs • kd-tree software from ANN library (U. Maryland)

  28. Ensemble properties of pathways → Stochastic nature of molecular motion requires characterizing average properties of many pathways

  29. Example #1: Probability of Folding (pfold) • HIV integrase [Du et al. ’98] • “We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998). [Figure labels: unfolded set, folded set, pfold, 1 - pfold]

  30. Example #2: Ligand-Protein Interaction [Sept, Elcock and McCammon ’99] • 10K to 30K independent simulations

  31. Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02) Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges [Figure labels: nodes $v_i$, $v_j$; edge probability $P_{ij}$]

  32. Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02) • Let $f_i = \mathrm{pfold}(i)$. After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$ • One linear equation per node • Solution gives pfold for all nodes • No explicit simulation run • All pathways are taken into account • Sparse linear system [Figure: node i with self-transition $P_{ii}$ and edges $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$ to neighbors j, k, l, m; two terms annotated “= 1”; U: unfolded set, F: folded set]
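The one-equation-per-node formulation on this slide can be sketched on a toy roadmap. The Python example below assumes the usual boundary conditions for pfold (value 0 on nodes of the unfolded set U, value 1 on nodes of the folded set F) and solves the resulting system with dense Gauss-Jordan elimination. The function name `solve_pfold`, the dictionary layout, and the five-node chain are hypothetical; a real roadmap would use a sparse solver, as the slide notes.

```python
def solve_pfold(P, unfolded, folded):
    """Solve f_i = sum_j P[i][j] * f_j with f = 0 on U and f = 1 on F.

    P maps each interior node to {neighbor: transition probability}.
    Illustrative dense solver (an assumption, fine only for tiny systems).
    """
    interior = [v for v in P if v not in unfolded and v not in folded]
    index = {v: r for r, v in enumerate(interior)}
    n = len(interior)

    # Augmented matrix of (I - P_interior) f = (probability of stepping into F).
    A = [[0.0] * (n + 1) for _ in range(n)]
    for v in interior:
        r = index[v]
        A[r][r] += 1.0
        for w, p in P[v].items():
            if w in folded:
                A[r][n] += p          # f_w = 1
            elif w in index:
                A[r][index[w]] -= p   # unknown f_w
            # w in unfolded contributes p * 0, so nothing to add

    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(n):
            if r != c:
                factor = A[r][c] / A[c][c]
                for k in range(c, n + 1):
                    A[r][k] -= factor * A[c][k]

    f = {v: A[index[v]][n] / A[index[v]][index[v]] for v in interior}
    f.update({v: 0.0 for v in unfolded})
    f.update({v: 1.0 for v in folded})
    return f


# Hypothetical 5-node chain: node 0 is in U, node 4 is in F.
# Each interior node stays put with probability 0.5 (the P_ii term)
# and moves to either neighbor with probability 0.25.
toy_P = {
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
}
toy_pfold = solve_pfold(toy_P, unfolded={0}, folded={4})
```

A single solve yields pfold for every node at once, with no simulation runs. For this symmetric toy chain the values increase linearly from U to F.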

  33. Probabilistic Roadmap: Correlation with MC Approach • 1ROP (repressor of primer) • 2 α helices • 6 DOF

  34. Probabilistic Roadmap: Computation Times (1ROP) • Monte Carlo: 49 conformations; over $10^6$ energy computations; over 11 days of computer time • Roadmap: 5000 conformations; ~15,000 energy computations; 1 to 1.5 hours of computer time • ~4 orders of magnitude speedup!

  35. Future work: Probabilistic Roadmap • Non-uniform sampling strategies • Encoding molecular dynamics into probabilistic roadmaps (with V. Pande) • Quantitative experiments with ligand-protein binding (with V. Pande)

  36. Bio-X – Clark Center

  37. The following slides relate to non-research issues. I do not plan to present them. Jack and Leo may want to use the contents of some of them for their own presentations.

  38. Education • Tutorial on Delaunay, Alpha-Shape and Pockets (Koehl) • A Biocomputing Notebook (Koehl) • Biocomputation lectures in pre-existing classes: • CS326 – motion planning: molecular motion, probabilistic roadmaps, self-collision detection (Latombe) • CS468 – intro to computational topology: finding pockets and tunnels in molecules, computing surface areas, volumes, and their derivatives (Zomorodian) • New class on Algorithmic Biology (Batzoglou, Guibas, Latombe) • Graduate Curriculum Committee, Bio-Engineering Dept., Stanford (Latombe)

  39. Trained Students (1/2) • PhD students • Serkan Apaydin, EE • An Nguyen, Scientific Computing • Carlos Guestrin, CS (Daphne Koller’s group) • Itay Lotan, CS • Rachel Kolodny, CS • Daniel Russel, CS • Samuel Ieong, CS • Most graduate students have a principal advisor in CS and a secondary one in a bio-related department (Levitt, Brutlag, Pande)

  40. Trained Students (2/2) • Graduated Master’s students • Rohit Singh, finding motifs in proteins, best Stanford CS master’s thesis, June ’02 [current position: bioinformatics company in San Diego] • Chris Varma, study of ligand-protein interaction with probabilistic roadmaps, June ’02 [current position: PhD student, Harvard/MIT Biomedical program] • Current Master’s student • Ben Wong, modeling T cell activity • Undergraduate • Eric Berger, CS, Stanford, summer internship • Julie Greenberg, CS, Harvard, summer internship

  41. Visitors • Prof. Alberto Munoz, Math Dept., University of Yucatan, Mexico; 3 months, Summer ’02; haptic interaction and probabilistic roadmaps • Prof. Ileana Streinu, Smith College; 6 months, from Sept. ’02; protein folding

  42. Interactions Within Stanford - Guibas and Levitt, with J. Milgram (Math): topology of configuration spaces of chains - Guibas, with V. Pande (Chemistry) and D. Donoho (Statistics): non-linear multi-resolution analysis of molecular motions - Latombe and Apaydin, with D. Brutlag (Biochemistry) and V. Pande: probabilistic roadmaps - Latombe and Lotan, with V. Pande: efficient MC simulation

  43. Interactions Outside Stanford - Collision Detection for Deforming Necklaces, P. Agarwal, L. Guibas, A. Nguyen, D. Russel, and L. Zhang. Invited to special issue of Comp. Geom., Theory and Applications, following presentation at SoCG’02. - Kinetic Medians and kd-Trees, P. Agarwal, J. Gao, and L. Guibas. Proc. 10th European Symp. Algorithms, LNCS 2461, Springer-Verlag, pp. 5-16, 2002. - Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Proc. RECOMB’02, Washington D.C., pp. 12-21, 2002. - Efficient Maintenance and Self-Collision Testing for Kinematic Chains, I. Lotan, F. Schwarzer, D. Halperin, and J.C. Latombe. SoCG’02, pp. 43-52, June 2002. - Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Workshop on Algorithmic Foundations of Robotics (WAFR), Nice, Dec. 2002.

  44. Attendance at Conferences - BCATS ’01 and ’02 [Bio-Computation At Stanford] - RECOMB ’02 [Int. Conf. on Research in Computational Molecular Biology] - ISMB ’02 [Int. Conf. on Intelligent Syst. for Molecular Biology] - ECCB 2002 [European Conf. on Computational Biology] - Biophysical Society Symp. on Molecular Simulations in Structural Biology, 2002 - SoCG 2002 [ACM Symp. on Computational Geometry]

  45. Outreach - Latombe and Levitt serve as members of the Scientific Leadership Council of Stanford’s Bio-X program - Presentations: Stanford’s Bio-X Symposium (3/02), Stanford’s Computer Forum (3/02), Berkeley’s Broad Area Seminar (4/02) - Conference committees: Guibas, program committee, WAFR’02 and SoCG’03; Latombe, program committee, 1st IEEE Bioinformatics Conf. ’03; Apaydin, organization committee of BCATS’02

  46. The following slides are extra slides that I removed from my presentation for lack of time

  47. General Goals • Larger proteins considered → computational efficiency • Diversity of molecules and interactions → computational abstractions • Extension of in-silico experiments → computational correctness • Enable biological studies that were not possible before, more systematically

  48. Approach • Select hard problems • Close interaction between computer scientists (Guibas, Koehl, Latombe) and biologists (Koehl, Levitt, Brutlag, Pande, Brunger) • Most graduate students are CS students with secondary advisor in biology • Perform extensive tests

  49. Electron density map → Medial axis [Guibas, Brunger, Russel] • Medial axis of iso-surfaces to estimate backbone • Cleaning and simplification of axis to filter noise out • Persistence of features across multiple iso-surfaces

  50. Continuous energy function Essential for protein structure prediction and molecular motion simulation: - Statistical potentials based on alpha complex - Maintenance of energy values during simulation
