
Robotics Algorithms for the Study of Protein Structure and Motion

Jean-Claude Latombe, Computer Science Department, Stanford University. Based on Itay Lotan’s PhD.


Presentation Transcript


  1. Robotics Algorithms for the Study of Protein Structure and Motion. Jean-Claude Latombe, Computer Science Department, Stanford University. Based on Itay Lotan’s PhD.

  2. Many pathways: unfolded (denatured) state → folded (native) state

  3. Folded State: loops connect α helices and β strands

  4. Protein Sequence, Structure. Figure labels: peptide bonds; amino acid (residue)

  5. φ-ψ Kinematic Linkage Model → Conformational space

  6. Molecule ↔ Robot

  7. Why Study Proteins?
     • They perform many vital functions, e.g.:
        • catalysis of reactions
        • storage of energy
        • transmission of signals
        • building blocks of muscles
     • They are linked to key biological problems that raise major computational challenges, mostly due to their large size (100s to several 1000s of atoms), their many kinematic degrees of freedom, and their huge number (millions)

  8. Two Problems
     • Structure determination from electron density maps
        • Inverse kinematics techniques [Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)]
     • Energy maintenance during Monte Carlo simulation
        • Distance computation techniques [Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]

  9. Structure Determination: X-Ray Crystallography

  10. Software
     • Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID
        • 1.0Å < d < 2.3Å: ~90% completeness
        • 2.3Å ≤ d < 3.0Å: ~67% completeness (varies widely) [1]
     • JCSG: 43% of data sets  2.3Å (figure: resolution scale from 1.0Å to 3.0Å)
     • Manually completing a model:
        • Labor intensive, time consuming
        • Existing tools are highly interactive
     → Model completion is a high-throughput bottleneck
     [1] Badger (2003) Acta Cryst. D59

  11. The Completion Problem (Inverse Kinematics)
     • Input:
        • Electron-density map
        • Partial structure
        • Two anchor residues
        • Amino-acid sequence of missing fragment (typically 4 to 15 residues long)
     • Output:
        • Ranked conformations Q of the fragment that:
           • Respect the closure constraint
           • Maximize target function T(Q) measuring fit with electron-density map
           • Have no atomic clashes
     Figure label: partial structure (folded)

  12. Two-Stage IK Method
     • Candidate generation: closed fragments
     • Candidate refinement: optimize fit with the electron-density map (EDM)

  13. Stage 1: Candidate Generation
     • Generate a random conformation of the fragment (only one end attached to an anchor)
     • Close the fragment (i.e., bring the other end to the second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)

  14. Closure Distance (figure labels: moving end, fixed end)
     Closure Distance:
     Compute + bias toward avoiding steric clashes
     Reference: A.A. Canutescu and R.L. Dunbrack Jr. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963–972, 2003.
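The CCD closure step on slides 13 and 14 can be illustrated with a small sketch. This is not the authors' implementation: it assumes a planar chain of revolute joints instead of a protein backbone with dihedral angles, uses the plain end-to-target distance as the closure distance, and omits the steric-clash bias. The names `forward` and `ccd_close` are hypothetical.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain whose first joint sits at the origin.
    Each angle is relative to the previous link (assumed 2D stand-in for dihedrals)."""
    pts = [(0.0, 0.0)]
    heading = 0.0
    for a, l in zip(angles, lengths):
        heading += a
        x, y = pts[-1]
        pts.append((x + l * math.cos(heading), y + l * math.sin(heading)))
    return pts

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=500):
    """Cyclic coordinate descent: adjust one joint at a time so that the moving
    end gets as close as possible to the target (the second anchor)."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in range(len(angles) - 1, -1, -1):
            pts = forward(angles, lengths)
            px, py = pts[i]          # pivot of joint i
            ex, ey = pts[-1]         # current moving end
            a_end = math.atan2(ey - py, ex - px)
            a_tgt = math.atan2(target[1] - py, target[0] - px)
            # Rotating joint i by this amount puts the end on the pivot-target ray,
            # which minimizes the closure distance over this single DOF.
            angles[i] += a_tgt - a_end
        if math.dist(forward(angles, lengths)[-1], target) < tol:
            break
    return angles, math.dist(forward(angles, lengths)[-1], target)
```

For example, `ccd_close([0.3, -0.5, 0.8, 0.2], [1.0] * 4, (2.0, 1.0))` returns the closed joint angles together with the residual closure distance. Each single-joint update never increases that distance, which is why the sweeps can simply be repeated until the tolerance is met.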

  15. Exact Inverse Kinematics
     Repeat for each conformation of a closed fragment:
     • Pick 3 amino acids at random (3 pairs of φ-ψ angles)
     • Apply exact IK solver to generate all IK solutions [Coutsias et al., 2004]

  16. TM0813 GLU-83 GLY-96

  17. Stage 2: Candidate Refinement
     • Target function T(Q) measuring quality of the fit with the EDM
     • Minimize T while retaining closure
     • Closed conformations lie on a self-motion manifold of lower dimension
     Figure labels: (q1,q2,q3); dq1, dq2, dq3; null space; 1-D manifold

  18. Closure and Null Space
     • dX = J dQ, where J is the 6×n Jacobian matrix (n > 6)
     • Null space {dQ | J dQ = 0} has dim = n − 6
     • N: orthonormal basis of null space
     • dQ = N Nᵀ ∇T(Q)

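  19. Computation of N
     • SVD of J: dX = U (6×6) Σ (6×6) Vᵀ (6×n) dQ, with Σ = diag(σ1, σ2, …, σ6)
     • Gram-Schmidt orthogonalization extends the factorization with a zero block and Nᵀ, which gives the (n−6) basis N of the null space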

  20. Refinement Procedure
     Repeat until a minimum of T is reached:
     • Compute J and N at current Q
     • Compute ∇T at current Q (analytical expression of ∇T + linear-time recursive computation [Abe et al., Comput. Chem., 1984])
     • Move by a small increment along dQ = N Nᵀ ∇T (+ Monte Carlo / simulated annealing protocol to deal with local minima)
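The null-space projection used in slides 18 to 20 can be sketched numerically. This is a minimal illustration, assuming NumPy is available and using a random 6×n matrix as a stand-in for the real Jacobian of the fragment. The function names are hypothetical, and the null-space basis is read directly off the full SVD rather than built by the Gram-Schmidt step mentioned on slide 19.

```python
import numpy as np

def null_space_basis(J, tol=1e-10):
    """Orthonormal basis N (n x (n - rank)) of {dQ | J dQ = 0}, from the full SVD of J."""
    _, s, Vt = np.linalg.svd(J)        # full SVD: Vt is n x n
    rank = int(np.sum(s > tol))
    return Vt[rank:].T                 # remaining right singular vectors span the null space

def projected_gradient(J, grad_T):
    """dQ = N N^T grad_T: the component of the gradient that keeps the loop closed."""
    N = null_space_basis(J)
    return N @ (N.T @ grad_T)

# Stand-in data (assumption): a random 6 x 10 Jacobian and a random gradient.
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 10))
grad_T = rng.standard_normal(10)
dQ = projected_gradient(J, grad_T)
closure_drift = J @ dQ                 # first-order motion of the closing end, close to zero
```

Because every column of N satisfies J N = 0, any step of the form N Nᵀ ∇T leaves the end frame unchanged to first order. The sign and size of the step along this direction are left to the caller.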

  21. TM0813 GLU-83 GLY-96

  22. Tests #1: Artificial Gaps
     • TM1621 (234 residues) and TM0423 (376 residues), SCOP classification α/β
     • Complete structures (gold standard) resolved with EDM at 1.6Å resolution
     • Compute EDM at 2, 2.5, and 2.8Å resolution
     • Remove fragments and rebuild

  23. TM1621: 103 fragments from TM1621 at 2.5Å
     • Short fragments: 100% < 1.0Å aaRMSD
     • Long fragments:
        • Length 12: 96% < 1.0Å aaRMSD
        • Length 15: 88% < 1.0Å aaRMSD
     Produced by H. van den Bedem

  24. Example: TM0423 (PDB: 1KQ3, 376 residues), 2.0Å resolution, 12-residue gap. Best: 0.3Å aaRMSD

  25. Tests #2: True Gaps
     • Structure computed by RESOLVE
     • Gaps completed independently (gold standard)
     • Example: TM1742 (271 residues)
        • 2.4Å resolution; 5 gaps left by RESOLVE
     Produced by H. van den Bedem

  26. TM1621
     • Green: manually completed conformation
     • Cyan: conformation computed by stage 1
     • Magenta: conformation computed by stage 2
     • The aaRMSD improved by 2.4Å to 0.31Å

  27. Current/Future Work (figure labels: A, B)
     • Software actively being used at the JCSG
     • What about multi-modal loops?

  28. TM0755 (figure labels: A323 Hist, A316 Ser)
     • TM0755: data at 1.8Å
     • 8-residue fragment crystallized in 2 conformations
     • Overlapping density: difficult to interpret manually
     Algorithm successfully identified and built both conformations

  29. Current/Future Work (figure labels: A, B)
     • Software actively being used at the JCSG
     • What about multi-modal loops?
     • Fuzziness in EDM can then be exploited
     • Use EDM to infer probability measure over the conformation space of the loop

  30. Amylosucrase J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. J. Comp. Chemistry, 25:956-967, 2004

  31. Energy maintenance during Monte Carlo simulation. Joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin [1]. [1] Computer Science Department, Tel Aviv University

  32. Monte Carlo Simulation (MCS)
     • Random walk through conformation space
     • At each attempted step:
        • Perturb current conformation at random
        • Accept step with probability:
     • The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in V ~
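The acceptance probability on slide 32 did not survive in the transcript. The sketch below assumes the standard Metropolis criterion, min(1, exp(−ΔE/kT)), which is the usual choice for producing Boltzmann-distributed samples. The one-dimensional "conformation", the harmonic energy in the usage note, and the names `metropolis_accept` and `run_mcs` are illustrative assumptions, not taken from the presentation.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Assumed Metropolis criterion: accept with probability min(1, exp(-delta_e / kT))."""
    if delta_e <= 0.0:
        return True                      # downhill moves are always accepted
    return rng.random() < math.exp(-delta_e / kT)

def run_mcs(energy, x0, steps, kT=1.0, max_move=1.0, seed=0):
    """Random walk over a 1-D toy conformation; returns samples and acceptance rate."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples, accepted = [x], 0
    for _ in range(steps):
        x_new = x + rng.uniform(-max_move, max_move)   # perturb at random
        e_new = energy(x_new)                          # energy is re-evaluated at every step
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)                # a rejected step repeats the current state
    return samples, accepted / steps
```

A call such as `run_mcs(lambda x: 0.5 * x * x, 0.0, 20000)` samples a harmonic well. The energy function is evaluated once per attempted step, which is the cost that the rest of the talk sets out to reduce.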

  33. Monte Carlo Simulation (MCS)
     • Used to:
        • sample meaningful distributions of conformations
        • generate energetically plausible motion pathways
     • A simulation run may consist of millions of steps → energy must be evaluated a large number of times
     Problem: How to maintain energy efficiently?

  34. Energy Function
     • E = Σ bonded terms + Σ non-bonded terms + Σ solvation terms
     • Bonded terms: O(n)
     • Non-bonded terms:
        • E.g., van der Waals and electrostatic
        • Depend on distances between pairs of atoms
        • O(n²) → expensive to compute
     • Solvation terms: may require computing the molecular surface

  35. Non-Bonded Terms
     • Energy terms go to 0 when distance increases → cutoff distance (6 - 12Å)
     • vdW forces prevent atoms from bunching up → only O(n) interacting pairs [Halperin & Overmars, 98]
     Problem: How to find interacting pairs without enumerating all atom pairs?

  36. Grid Method (figure label: dcutoff)
     • Subdivide 3-space into cubic cells
     • Compute cell that contains each atom center
     • Represent grid as hashtable

  37. Grid Method (figure label: dcutoff)
     • Θ(n) time to build grid
     • O(1) time to find interacting pairs for each atom
     • Θ(n) to find all interacting pairs of atoms [Halperin & Overmars, 98]
     • Asymptotically optimal in the worst case
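The grid method of slides 36 and 37 can be sketched as follows. This is a minimal version under stated assumptions: the cell side is set equal to the cutoff distance, the hash table is a Python dictionary keyed by integer cell coordinates, and the function name `interacting_pairs` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def interacting_pairs(centers, cutoff):
    """Return all index pairs (i, j), i < j, whose atom centers lie within `cutoff`.
    Cells are cubes of side `cutoff`, so only the 27 neighboring cells need checking."""
    grid = defaultdict(list)                       # hash table: cell -> atom indices
    for idx, (x, y, z) in enumerate(centers):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)

    pairs = set()
    for idx, p in enumerate(centers):
        cx, cy, cz = (math.floor(c / cutoff) for c in p)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j > idx and math.dist(p, centers[j]) <= cutoff:
                    pairs.add((idx, j))
    return pairs
```

Building the table is one pass over the atoms. Each query then inspects a fixed number of cells, and since excluded volume bounds how many atom centers fit in a cell, the work per atom stays constant. The sketch still rebuilds everything from scratch at each call, which is the cost the ChainTree is meant to avoid when only a few DOFs change.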

  38. Can we do better on average?
     • Few DOFs are changed at each MC step
     Figure: number k of DOF changes (axis ticks 0 to 30), from a simulation of 100,000 attempted steps

  39. Can we do better on average?
     • Few DOFs are changed at each MC step
     • Proteins are long kinematic chains
     → Long sub-chains stay rigid at each step
     → Many interacting pairs of atoms are unchanged
     → Many partial energy sums remain constant
     Problem: How to find new interacting pairs and retrieve unchanged partial sums?

  40. Two New Data Structures
     • ChainTree: fast detection of interacting atom pairs
     • EnergyTree: retrieval of unchanged partial energy sums

  41. ChainTree (Twofold Hierarchy: BVs + Transforms): links

  42. ChainTree (Twofold Hierarchy: BVs + Transforms): joints (figure labels: TNO, TJK, TAB)

  43. Updating the ChainTree
     Update path to root:
     • Recompute transforms that “shortcut” the DOF change
     • Recompute BVs that contain the DOF change
     • O(k log2(2n/k)) work for k changes
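The "update path to root" idea on slide 43 can be illustrated with a much simplified sketch. It keeps only the transform half of the hierarchy: a balanced binary tree over a planar chain in which each internal node stores the composed rigid transform of its sub-chain. Bounding volumes are omitted, and the 2D setting and the class name `TransformTree` are assumptions made for illustration, not the ChainTree itself.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 0.0)          # rigid transform as (cos, sin, tx, ty)

def compose(a, b):
    """Return the transform a∘b (apply b first, then a)."""
    ca, sa, ax, ay = a
    cb, sb, bx, by = b
    return (ca * cb - sa * sb, sa * cb + ca * sb,
            ca * bx - sa * by + ax, sa * bx + ca * by + ay)

def joint_transform(theta, length):
    """Frame of the next joint relative to this one: rotate by theta, move along the link."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s, c * length, s * length)

class TransformTree:
    def __init__(self, angles, lengths):
        self.lengths = list(lengths)
        self.size = 1
        while self.size < len(angles):
            self.size *= 2                # pad to a power of two with identity leaves
        self.nodes = [IDENTITY] * (2 * self.size)
        for i, (a, l) in enumerate(zip(angles, lengths)):
            self.nodes[self.size + i] = joint_transform(a, l)
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = compose(self.nodes[2 * i], self.nodes[2 * i + 1])

    def set_angle(self, i, theta):
        """Change one DOF and recompute only the transforms on the path to the root.
        Returns the number of internal nodes that were recomputed."""
        pos = self.size + i
        self.nodes[pos] = joint_transform(theta, self.lengths[i])
        touched = 0
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = compose(self.nodes[2 * pos], self.nodes[2 * pos + 1])
            touched += 1
            pos //= 2
        return touched

    def end_position(self):
        """Position of the chain's free end, read from the root transform."""
        return self.nodes[1][2], self.nodes[1][3]
```

With 8 links, a call to `set_angle` recomputes 3 internal nodes instead of re-running forward kinematics over the whole chain. Sub-trees that contain no changed DOF keep their stored transforms, which mirrors the rigid sub-chains mentioned on slide 39.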

  44. Finding Interacting Pairs 

  45. Finding Interacting Pairs

  46. Finding Interacting Pairs
     • Do not search inside rigid sub-chains (unmarked nodes)

  47. Finding Interacting Pairs
     • Do not search inside rigid sub-chains (unmarked nodes)
     • Do not test two nodes with no marked node between them
     → New interacting pairs

  48. EnergyTree (figure labels: E(N,N), E(K,L), E(M,M), E(L,L), E(J,L))

  49. EnergyTree (figure labels: E(N,N), E(K,L), E(M,M), E(L,L), E(J,L))

  50. Complexity
     • n: total number of DOFs
     • k: number of DOF changes at each MCS step
     • k << n
     • Complexity of:
        • updating ChainTree: O(k log2(2n/k))
        • finding interacting pairs: O(n^(4/3)), but performs much better in practice!
