Chain Growing Using Statistical Energy Functions


Presentation Transcript


  1. Chain Growing Using Statistical Energy Functions • David A. O'Brien, Balasubramanian Krishnamoorthy, Jack Snoeyink, Alex Tropsha, Andrew Leaver-Fey, Shuquan Zong

  2. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  3. Chain Growing - Introduction • Lattice chain growing goals: • Test measures of proteins. • Build protein chains that maximize a given measure. • If these chains appear native-like, that confirms the measure is valid. • Predict protein structures ab initio, from sequence information alone. • Develop an algorithm that builds 3D folded protein decoys from the sequence that are similar to the native structure. • Evaluate these decoys and determine which are native-like. In short, be able to pick the most native-like structure from the large set of generated decoys.

  4. Lattice Chain Growth Algo. • Cubic lattice (311) with 24 possible moves {(3,1,1), (3,1,-1), …, (-3,1,1)}. • Generate a chain configuration by sequential addition of links until the full length of the chain is reached. • New links cannot be placed in the zone of exclusion of other links and must satisfy angle constraints.
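The 24 moves on the slide above are the signed permutations of (3,1,1). A minimal sketch that enumerates them; the function name `lattice_moves_311` is an illustrative assumption, not something from the presentation:

```python
from itertools import permutations, product

def lattice_moves_311():
    """Return every distinct signed permutation of (3, 1, 1)."""
    moves = set()
    # Three distinct orderings: the 3 can sit on the x, y, or z axis.
    for perm in set(permutations((3, 1, 1))):
        # Each coordinate can independently be positive or negative.
        for signs in product((1, -1), repeat=3):
            moves.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(moves)

moves = lattice_moves_311()
```

Three axis placements times eight sign patterns gives the 24 moves, and every move has the same squared length, 3² + 1² + 1² = 11.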

  5. Lattice Chain Growth Algo.: Adding a new link • Generate a set of possible open lattice nodes. • For each, calculate a temperature-dependent transition probability. • Choose one of these open lattice nodes with a Monte Carlo step. • Variations include looking two steps ahead or building from the middle of the chain.

  6. Temperature-Dependent Transition Probability • Probability at step i of picking configuration x′ from the candidates x1 … xC: • T = temperature • kB = Boltzmann constant • E = energy (lower is better)
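The slide's formula is not preserved in this transcript. The sketch below assumes the standard Boltzmann form, in which each candidate is weighted by exp(-E / (kB·T)) and the weights are normalized over the C candidates. The function names and the default `k_b=1.0` (energies expressed in units of kB) are assumptions for illustration:

```python
import math
import random

def transition_probabilities(energies, temperature, k_b=1.0):
    """Boltzmann-weighted probabilities over candidate energies (assumed form)."""
    beta = 1.0 / (k_b * temperature)
    # Subtracting the minimum energy avoids overflow and leaves the ratios unchanged.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def pick_candidate(energies, temperature, rng=random):
    """Monte Carlo step: draw one candidate index according to its probability."""
    probs = transition_probabilities(energies, temperature)
    return rng.choices(range(len(energies)), weights=probs, k=1)[0]

probs = transition_probabilities([0.0, 1.0, 2.0], temperature=1.0)
```

Under this form, lower-energy candidates are more likely, and raising the temperature flattens the distribution toward uniform.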

  7. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  8. Statistical Energy Functions • Statistical energy functions assume that “contact” energies between amino acid residues in native proteins are related to their observed frequency in a representative structural database. • If a potential configuration (decoy) has a certain set of nearby residues that is common in nature, give this a good score. • Score for entire protein is sum of all contact energies. • We use three statistical energy functions: • 2-body Miyazawa-Jernigan • 4-body Potential • Local Shape Potential

  9. Statistical Energy Functions Overview • Global vs. local • Global: measures the entire protein (or a partial fragment) • Local: measures just a small sequence of consecutive residues • 2-body Miyazawa-Jernigan • Easy to calculate • Can be global or local • 4-body Potential • Expensive to calculate • Works better as a global measure • Good for determining native-like folded structures • Local Shape Potential • Easy to calculate • Defined as a local measure • Global measure?

  10. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  11. Two-body Statistical Energy Function • For two-body potentials: • Actual contact-energy values for each residue pair (i, j) are taken from the Miyazawa-Jernigan matrix as reevaluated in 1996: Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 1996;256:623-644.
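As a rough sketch of how a two-body score can be summed over contacts: the energy table below holds made-up placeholder values, not the Miyazawa-Jernigan matrix, and the distance cutoff and minimum sequence separation are assumptions chosen only to make the example concrete.

```python
import math

# Hypothetical contact energies for illustration only (NOT Miyazawa-Jernigan values).
TOY_ENERGY = {("A", "A"): -1.0, ("A", "L"): -2.0, ("L", "L"): -3.0}

def pair_energy(a, b):
    """Look up the symmetric contact energy for residue types a and b."""
    return TOY_ENERGY[tuple(sorted((a, b)))]

def two_body_energy(sequence, coords, cutoff=6.5, min_separation=2):
    """Sum pair energies over residues that are in contact.

    Two residues count as a contact when they are at least `min_separation`
    apart along the chain and within `cutoff` distance (both assumed here).
    """
    total = 0.0
    n = len(sequence)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += pair_energy(sequence[i], sequence[j])
    return total

energy = two_body_energy("ALA", [(0, 0, 0), (3, 1, 1), (0, 0, 2)])
```

In this toy chain only residues 0 and 2 are far enough apart in sequence to be checked, so the total is the single A-A contact energy.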

  12. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  13. Four-Body Statistical Energy Function • Calculates the energy based on sets of 4 nearby residues (quads). • Quads are calculated from the Delaunay tessellation. • The 4 vertices of each tetrahedron define a quad. • Each quad is given a statistical score. [Figure: each tetrahedron corresponds to a cluster of four residues; convex hull formed by the tetrahedral edges]

  14. Four-Body Statistical Energy Function - Overview • The four-body potential assigns a score to each quad type. • A training set of 1166 proteins was tessellated. • The frequency of each quad type is counted. • Each quad is typed in two ways: • by the combination of the four residue types {i,j,k,l} • by the number of consecutively appearing residues (five types) [Figure: the five types, labeled 25.5%, 35.6%, 11.4%, 22.1%, and 5.4%]
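The two typings can be sketched as follows. The slide does not spell out how the five consecutive-residue types are numbered; the mapping below (0 = no consecutive residues, up to 4 = all four consecutive) is an assumption, as are the function names.

```python
def connectivity_type(indices):
    """Classify a quad by how its four sequence positions group into consecutive runs.

    Assumed numbering: 0 = no two consecutive, 1 = one consecutive pair,
    2 = two separate pairs, 3 = three consecutive plus one, 4 = four consecutive.
    """
    idx = sorted(indices)
    runs, run = [], 1
    for prev, cur in zip(idx, idx[1:]):
        if cur == prev + 1:
            run += 1          # still inside a run of consecutive residues
        else:
            runs.append(run)  # run ended; start a new one
            run = 1
    runs.append(run)
    pattern = tuple(sorted(runs, reverse=True))
    return {(1, 1, 1, 1): 0, (2, 1, 1): 1, (2, 2): 2, (3, 1): 3, (4,): 4}[pattern]

def composition(residues):
    """Order-independent residue combination, e.g. 'TLKM' -> 'KLMT'."""
    return "".join(sorted(residues))

quad_type = (composition("TLKM"), connectivity_type([3, 4, 20, 21]))
```

A quad type is then the pair (composition, connectivity type), which is what gets counted over the training set.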

  15. Four-Body Statistical Energy Function - Classifying quadruplets • Denote each quad by {i,j,k,l} • i, j, k, and l can be any of the 20 amino acids (L20) • e.g. AALV, TLKM, TTLK, YYYY, etc. • 8855 possible combinations • Or the 20 amino acids can be grouped into just 6 types (L6) • Groups defined by chemical properties of amino acids • 126 possible combinations • c = {cysteine} • f = {phenylalanine, tyrosine, tryptophan} • h = {histidine, arginine, lysine} • n = {asparagine, aspartic acid, glutamine, glutamic acid} • s = {serine, threonine, proline, alanine, glycine} • v = {methionine, isoleucine, leucine, valine}

  16. Four-Body Statistical Energy Function - Classifying quadruplets • L20 case: • 5 consecutive-residue types x 8855 combinations ==> 44,275 quad types • Not all quad types are observed in the training set • The potential of unobserved types is set to some fraction of the lowest score for a represented quad type. • L6 case: • 5 consecutive-residue types x 126 combinations ==> 630 quad types • All but a few quad types are observed in the training set
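The counts on the two slides above follow from choosing 4 residue types with repetition allowed and order ignored, which is the multiset coefficient C(n + 3, 4). A short check, with `multiset_count` as an illustrative helper name:

```python
from math import comb

def multiset_count(n_types, k=4):
    """Number of unordered selections of k items from n_types, repetition allowed."""
    return comb(n_types + k - 1, k)

l20_combinations = multiset_count(20)   # C(23, 4)
l6_combinations = multiset_count(6)     # C(9, 4)
l20_quad_types = 5 * l20_combinations   # five consecutive-residue types
l6_quad_types = 5 * l6_combinations
```

This reproduces 8855 and 126 combinations, and 44,275 and 630 quad types, which is why the L6 alphabet is so much easier to cover with a training set of about a thousand proteins.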

  17. Four-Body Statistical Energy Function - Formulation • Formulation is an extension of the previous 2-body formula: where,

  18. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  19. Local Shape Statistical Energy Function • Motivation: • Fragment libraries model protein structures accurately. • Use the frequency of common fragments to construct a statistical function that supplements the 2-body and 4-body energy functions to grow better decoys. • Good fragment libraries exist, but for lattice-chain building we need fragments that fit in the 311 lattice. • Main idea: • For each possible consecutive sequence of four residues i, j, k, and l, calculate in which shape these residues most often occur. • If Shape A is found more often in nature than Shape B, try to build the chain accordingly. [Figure: Shape A, Shape B]

  20. Local Shape Statistical Energy Function • Create a set of canonical lattice shapes of length 4 (and 5). • Calculate the ways to embed a chain of length 4 (or 5) in the 311 lattice. • 155 canonical shapes for length 4 (2789 for length 5). • For L6, there are 6^4 = 1,296 sequences, so 155 x 1,296 = 200,880 combinations. • Parse a representative set of 971 proteins into segments. • For each length-4 segment, calculate the RMSD against each canonical shape. [Figure: a sample protein compared against Shape 1, Shape 2, …, Shape 155]

  21. Local Shape Statistical Energy Function • Turning RMSD values into frequencies: • If only the canonical shape with the best RMSD is counted, not all 200,880 combinations are found in the training set. • If two canonical shapes both have low RMSD, give each some credit. • For each RMSD value, indexed by residue types i, j, k, l and by shape: • Normalize the 155 RMSD values
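The slide's exact weighting formula is not preserved here. One simple way to give partial credit to several low-RMSD shapes, shown purely as an assumption, is to weight each shape by its inverse RMSD and normalize so the credits for one segment sum to 1. The name `shape_credits` and the small `eps` guard are illustrative:

```python
def shape_credits(rmsds, eps=1e-6):
    """Convert per-shape RMSD values into normalized credits (assumed inverse-RMSD rule)."""
    # eps keeps the weight finite if a segment matches a shape exactly (RMSD of 0).
    inverse = [1.0 / (r + eps) for r in rmsds]
    total = sum(inverse)
    return [w / total for w in inverse]

# Three shapes stand in for the 155 canonical shapes of length 4.
credits = shape_credits([0.5, 1.0, 2.0])
n_sequences = 6 ** 4
n_combinations = 155 * n_sequences
```

Accumulating these credits over all training segments, keyed by the segment's four residue types and the shape, fills far more of the 200,880 combinations than counting only the single best shape.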

  22. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  23. Results - Building Decoys • Decoys produced by chain growing are still not good enough. • Relatively good correlation between RMSD and four-body energy. [Figure for 2mhu: panels "Built with MJ Potential" and "Local Shape Pot."; axis "Four-body Energy per residue"; native state marked]

  24. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  25. Identifying Good Decoys • L20 or L6 Non-bonded • Sum only the contribution of type-0 tetrahedra.

  26. Discriminating Native & Non-Native • Non-bonded L20 scoring function applied to a set of folded and unfolded decoys.

  27. Overview • Lattice Chain Growth Algorithm • Statistical Energy Functions • 2-body Miyazawa-Jernigan Potential • 4-body Potential • Local Shape Potential • Results • Chains • Identifying Good Decoys • Current Work • New Scoring Functions • Incremental Tetrahedralization • Future work

  28. Adjustments to Scoring Functions • L20 or L6 Non-bonded • Sum only the contribution of type-0 tetrahedra. • L20 or L6 5T • Sum the contribution of all tetrahedra. • L20 Ratio All • As above, but define:

  29. Incremental Tetrahedralization • Maintain a constant tetrahedralization and only add and remove single vertices. • When evaluating a new candidate, update the total energy by tagging new quadruplets as well as any that have been removed. • Add the effect of the new quadruplets and subtract the effect of those removed. • Sequence: add candidate and evaluate; remove candidate and reset state; add next candidate and reevaluate.
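The bookkeeping behind this update can be sketched without any tessellation code: given the set of quads before and after a candidate vertex is added, only the difference needs rescoring. The function name, the representation of quads as frozensets of residue indices, and the toy scoring function are all assumptions for illustration.

```python
def incremental_energy(total, old_quads, new_quads, score):
    """Update a running energy from the quads that changed between two tessellations."""
    added = new_quads - old_quads      # quads created by inserting the candidate
    removed = old_quads - new_quads    # quads destroyed by inserting the candidate
    return total + sum(score(q) for q in added) - sum(score(q) for q in removed)

# Toy score (hypothetical): sum of the residue indices in the quad.
score = lambda quad: float(sum(quad))

old = {frozenset({0, 1, 2, 3}), frozenset({1, 2, 3, 4})}
new = {frozenset({0, 1, 2, 3}), frozenset({1, 2, 3, 5}), frozenset({2, 3, 4, 5})}

old_total = sum(score(q) for q in old)
new_total = incremental_energy(old_total, old, new, score)
```

The incremental result matches a full recomputation over `new`, while touching only the quads that changed; resetting the state after a rejected candidate is the same operation with the two sets swapped.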

  30. References • Gan HH, Tropsha A, Schlick T. Generating folded protein structures with a lattice chain-growth algorithm. J Chem Phys 2000;113:5511-5524. • Gan HH, Tropsha A, Schlick T. Lattice protein folding with two and four-body statistical potentials. Proteins: Structure, Function, and Genetics 2001;43:161-174. • Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 1996;256:623-644. • Tropsha A, Singh RK, Vaisman LI. Delaunay tessellation of proteins: Four body nearest neighbor propensities of amino acid residues. J Comput Biol 1996;3(2):213-222. • Kolodny R, Koehl P, Guibas L, Levitt M. Small libraries of protein fragments model native protein structures accurately. J Mol Biol 2002;323:297-307.
