Protein structure prediction

Protein structure prediction. Alexander Churbanov University of Nebraska at Omaha CSCI 8980 February 14, 2002. Structure of the presentation. Introduction Protein native structure Computational methods of finding a native structure Common methods and principles Specific methods



Presentation Transcript


  1. Protein structure prediction Alexander Churbanov University of Nebraska at Omaha CSCI 8980 February 14, 2002

  2. Structure of the presentation • Introduction • Protein native structure • Computational methods of finding a native structure • Common methods and principles • Specific methods • Homology finding • Threading • Modeling on lattice

  3. Introduction • In Greek mythology, Sisyphus is condemned to an eternity of hard labor; his labor is frustrating and fruitless, for just as he is about to achieve his goal, his work is undone and he must start again from the beginning • Those who work in protein structure prediction seem to share the same fate

  4. Problem of protein structure prediction • Proteins are key molecules in all life processes • The function of a protein is directly related to its three-dimensional structure • Knowing and understanding the structure of proteins will have a tremendous impact on the understanding of biological processes, medical discoveries, and biotechnological inventions

  5. Problem of protein structure prediction • For over 30 years, there has been an ardent search for methods to predict the three-dimensional (3D) structure from the sequence • Many methods were found that initially looked very promising, but the hope has always been dashed

  6. Problem of protein structure prediction • Given a sequence of amino acids, predict the unique 3D folding of the molecule that minimizes its free energy • [Figure labels: Primary structure (Lys, Gly, Leu); 1, 2, 3; Physical methods of prediction; Computational methods of prediction; Practical use of the 3D structural knowledge]

  7. General structure of an amino acid • Each amino acid consists of: • A common main chain part, containing the heavy atoms N, Cα, C, and O, which form the amide plane • A chain residue (side chain) of 0 – 10 additional atoms • [Figure labels: Common part; Chain residue]

  8. Peptide bond • A peptide bond connects the carboxyl group of the first amino acid with the amino group of the second amino acid • Peptide bonds are planar and rigid

  9. Sequence of amino acids • A sequence of amino acids, connected by peptide bonds, forms a protein • There is no flexibility for rotation around the peptide bond • There is more flexibility for the protein to rotate around the N–Cα bond (called the φ-angle) and around the Cα–C bond (the ψ-angle) • These angles are restricted to small regions in natural proteins
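The two backbone rotation angles on this slide are dihedral (torsion) angles, each defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for the first angle and N(i), Cα(i), C(i), N(i+1) for the second. The following is a minimal Python sketch of how such an angle could be computed from 3D coordinates. The function name `dihedral_deg` and the helper names are illustrative assumptions and do not come from the presentation.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (hypothetical helper)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Remove the components parallel to the central bond, leaving the
    # projections of the outer bonds onto the plane perpendicular to it.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Called with the coordinates of C(i-1), N(i), Cα(i), C(i), the function would give the angle about the N–Cα bond; called with N(i), Cα(i), C(i), N(i+1), the angle about the Cα–C bond.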

  10. Part of Protein (…|Phe|Asp|Ala|…)

  11. Protein folding • Using the freedom of rotations, the protein can fold into a specific and unique three-dimensional structure (called a conformation), forming its native structure

  12. Computational methods to find a protein structure • The unique 3D arrangement of a protein corresponds to the lowest free energy conformation • Most computational approaches to the protein folding problem look for the lowest free energy conformation • Two principal methods are currently in use for computing the lowest energy conformation: • Molecular dynamics • Monte Carlo

  13. Molecular dynamics • Forces acting on each atom at a particular state of the system are calculated using an empirical force field • Atoms are allowed to move with the accelerations resulting from these forces, changing the conformation • Once the atoms have moved significantly, the acting forces are recalculated (every 10^-15 sec) • Even supercomputers can simulate only 10^-9 sec of folding time, which is insufficient
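The force, move, recalculate loop described on this slide can be sketched as a toy integrator. The following Python sketch makes several assumptions that are not in the presentation: it uses the velocity Verlet scheme (the slide does not name an integrator), a single particle in one dimension, and a harmonic spring in place of an empirical force field. All names and parameter values are illustrative.

```python
def velocity_verlet(pos, vel, force_fn, mass, dt, n_steps):
    """Advance one 1D particle for n_steps time steps (toy sketch)."""
    force = force_fn(pos)
    for _ in range(n_steps):
        # Half-step velocity update from the current force.
        vel_half = vel + 0.5 * dt * force / mass
        # Move the particle, then recalculate the force at the new position.
        pos = pos + dt * vel_half
        force = force_fn(pos)
        # Finish the velocity update with the new force.
        vel = vel_half + 0.5 * dt * force / mass
    return pos, vel

def spring_force(x, k=1.0):
    """Assumed stand-in for a force field: harmonic spring, F = -k x."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    return 0.5 * mass * v * v + 0.5 * k * x * x

# Usage: start at x = 1 with zero velocity and integrate 1000 steps.
x_end, v_end = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
```

The time-scale gap on the slide is visible in the loop count: at one force evaluation per 10^-15 sec step, reaching 10^-9 sec takes about a million iterations, each over every atom in the system.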

  14. Monte Carlo method • Used with a simplified model of the protein (does not consider the structure of every amino acid) • The procedure makes a random move from the current conformation and evaluates the resulting energy change • If the new conformation is better, it replaces the old one, and the process repeats • The method is not powerful enough to find an optimal conformation even for simple cases
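The procedure on this slide can be sketched as a generic search loop. In the Python sketch below, `temperature=0` reproduces the slide's rule (accept a move only if it does not raise the energy). A positive temperature enables the Metropolis acceptance criterion, a common variant that is an addition here and is not stated in the presentation. The function names, the `propose` callback signature, and the toy energy are all assumptions for illustration.

```python
import math
import random

def monte_carlo_minimize(energy, propose, state, n_steps, temperature=0.0, seed=0):
    """Random-move energy minimization; returns (best_state, best_energy)."""
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for _ in range(n_steps):
        candidate = propose(current, rng)      # random move from current state
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Always accept moves that do not raise the energy; with a positive
        # temperature, sometimes accept uphill moves (Metropolis variant).
        accept = delta <= 0 or (
            temperature > 0 and rng.random() < math.exp(-delta / temperature)
        )
        if accept:
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

# Toy usage: an integer "conformation" with a single energy minimum at 3.
def toy_energy(x):
    return (x - 3) ** 2

def toy_move(x, rng):
    return x + rng.choice((-1, 1))

best_state, best_energy = monte_carlo_minimize(toy_energy, toy_move, 20, 2000)
```

The toy energy has one minimum, so the accept-if-better rule reaches it. Real conformational energies have many local minima, which is where that rule gets trapped and where the limitation noted on the slide comes from.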

  15. Knowledge-based structure prediction methods • The most successful structure prediction tools are knowledge-based, using a combination of statistical theory and empirical rules • The most successful theoretical approach is homology modelling

  16. Homology modeling • Given a sequence of unknown fold (denoted U), if U has significant sequence similarity to a protein of known structure (T) (i.e., if the pairwise sequence identity is >25%), it is possible to construct an approximate 3D model which has a correct fold but inaccurate loop regions
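The >25% criterion on this slide refers to pairwise sequence identity over an alignment of U and T. A minimal Python sketch of one way to compute it follows. Definitions of percent identity differ in the choice of denominator; this sketch assumes identical columns divided by columns where both sequences have a residue. The function names and the gap character are illustrative assumptions, and the alignment itself is taken as given.

```python
def percent_identity(aligned_u, aligned_t, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_u) != len(aligned_t):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    identical = 0
    for a, b in zip(aligned_u, aligned_t):
        if a == gap or b == gap:
            continue                      # skip insertions and deletions
        aligned_columns += 1
        if a == b:
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns

def above_homology_threshold(aligned_u, aligned_t, threshold=25.0):
    """Apply the >25% identity criterion quoted on the slide."""
    return percent_identity(aligned_u, aligned_t) > threshold

# Usage with a made-up alignment: 4 identical residues in 5 aligned columns.
identity = percent_identity("ACDE-F", "ACDQWF")
```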

  17. Homology modeling • The basic assumption of homology modelling is that U and the homologous template protein of known structure (T) have nearly identical backbone structure in the aligned regions • One new generation of alignment methods is based on hidden Markov models, and another on genetic algorithms

  18. Homology modeling • Down to about 30% sequence identity, U and T will still have the same fold, but the number of loops inserted grows and the divergence between U and T becomes considerable • Modelling of loop regions is still a difficult problem; even the best methods only rarely achieve atomic accuracy, and the predicted loops are often completely different from the correct structure

  19. Homology modeling • A pessimistic view is that the accuracy of resulting 3D predictions is typically at the level of ribbon plots, i.e. the mutual orientation of elements such as helices and sheets can be identified • The optimistic version is that even down to levels of 30% sequence identity homology modelling occasionally yields correct predictions at atomic resolution

  20. Three difficult problems of homology modeling • Remote homology modelling (<25%) has three obstacles to overcome: • the remote homology between U and T has to be detected • U and T have to be aligned correctly • the homology modelling procedure has to be tailored to the harder problem of extremely low sequence identity

  21. Solution to the first problem • In the early 1990s, there was a great deal of optimism that the first obstacle, the detection of similar folds, would be solved by threading methods • The basic idea is to thread the sequence of U into the backbone 3D structure of T, at each step evaluating the 'fitness of sequence for structure' using environment-based or knowledge-based mean-force-potentials

  22. Protein threading • Many proteins in nature are homologous • They have different primary structures • They form similar conformations to carry out the same function in living matter • There are groups of proteins having the same evolutionary origin

  23. Protein threading • Most proteins share the same secondary structure motifs: • Helices • Extended strands forming sheets • Specific turns • Random coils

  24. Protein threading • Threading means mapping a given sequence to a given structure • To assign a structure to a sequence, one would need to thread the sequence through all known conformations, evaluate compatibility, and assign the most compatible structure to the sequence • When a structure completely different from any known one is discovered, it is entered into the database of structures

  25. Protein threading • Structure is presented by the black trace • Sequence (at the top) is threaded through the structure, encoding an alignment (at the bottom) • Zero means structure deletion, values greater than one mean sequence deletion, while one is a fit

  26. Protein threading • The size of the search space for threading a sequence of length k into a structure of size n can be counted as a selection with repetition • The search space is huge and the problem appears to be NP-complete [Unger,R., Moult,J. (1993)]

  27. Protein threading • In order to reduce the complexity of the search task, (m – 1) core and m non-core regions are introduced • Usually α-helices and β-sheets are core regions, connected by loops • The total number of amino acids in core regions is c • [Figure labels: m – 1 core regions; m loops (non-core)]

  28. Protein threading • Although it suffers from some inherent limitations (such as predicting the right structure with a completely wrong threading), the method became a significant tool in protein structure prediction • Any threading procedure must contain two major components: • An alignment algorithm to position a sequence on a structure • A score function to evaluate the “energy” of the sequence in a given conformation
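The two components named on this slide can be illustrated with a deliberately simplified Python sketch. Everything specific in it is an assumption for illustration and does not come from the presentation: structures are reduced to strings of buried (`B`) and exposed (`E`) positions, the alignment step is gapless sliding rather than a real threading alignment, the score simply rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the hydrophobic residue set is a rough choice.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed, rough hydrophobic residue set

def position_score(residue, environment):
    """Toy score function: +1 if the residue suits the environment, else -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1

def best_gapless_threading(sequence, template):
    """Toy alignment step: slide the sequence along the template without gaps.

    Returns (best_score, offset), or None if the template is too short.
    """
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(residue, template[offset + i])
            for i, residue in enumerate(sequence)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def assign_fold(sequence, library):
    """Thread the sequence through every template and keep the best hit."""
    hits = {}
    for name, template in library.items():
        hit = best_gapless_threading(sequence, template)
        if hit is not None:
            hits[name] = hit
    name = max(hits, key=lambda n: hits[n][0])
    return name, hits[name]

# Usage with a made-up two-template library.
library = {"foldA": "EEBBBEE", "foldB": "BEBEBEB"}
result = assign_fold("LVIKD", library)
```

A practical threading method would allow gaps in loop regions and use an environment-based or knowledge-based potential as the score; the sketch only shows how the alignment and scoring components fit together.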

  29. Protein threading: possible implementations • Protein threading could be implemented using: • Enumeration for small problems • Dynamic programming to find core regions to “freeze” • Monte Carlo variants with Gibbs sampling • Branch and bound search • Genetic programming with constraints seems to be a decent alternative in comparison with other methods

  30. Protein structure prediction on lattice • Another way to model protein folding in 3D space is to assume certain simplifications • Modeling on a lattice is a way to fight the complexity of the prediction problem • Though the problem on a lattice is still NP-complete, we can significantly expand the size of the protein modeled

  31. Protein simplification for lattice model • Monomers (or residues) are represented using a unified size • Bond length is unified • The positions of the monomers are restricted to positions in a lattice • Simplified energy function

  32. HP-model • The 20-letter alphabet of amino acids is reduced to a two-letter alphabet, namely H and P • H represents a non-polar or hydrophobic amino acid • P represents a polar or hydrophilic amino acid
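The alphabet reduction on this slide amounts to a per-residue lookup. The Python sketch below shows one possible mapping from one-letter amino acid codes to H and P. The presentation does not say which residues count as hydrophobic, and published HP mappings differ, so the set used here is an assumption, as are the names `HYDROPHOBIC_RESIDUES` and `to_hp`.

```python
# Assumed hydrophobic set; other classifications also place G, P, or Y here.
HYDROPHOBIC_RESIDUES = frozenset("ACFILMVW")
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")

def to_hp(sequence):
    """Translate a one-letter amino acid sequence into an HP string."""
    hp = []
    for residue in sequence.upper():
        if residue not in STANDARD_RESIDUES:
            raise ValueError(f"unknown amino acid code: {residue!r}")
        hp.append("H" if residue in HYDROPHOBIC_RESIDUES else "P")
    return "".join(hp)

# Usage: Met, Lys, Val, Leu, Ala.
hp_string = to_hp("MKVLA")
```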

  33. The energy function • The energy function for the HP-model is given by a matrix over the two monomer types H and P [matrix not reproduced in the transcript] • The energy contribution of a contact between two monomers is –1 if both are H-monomers, and 0 otherwise

  34. Contact energy • Two monomers form a contact in some specific conformation if they are not connected via a bond, but occupy neighboring positions in the conformation • A conformation with minimal energy is just a conformation with the maximal number of contacts between H-monomers

  35. Sample conformation • A sample conformation for the sequence PHPPHHPH in the two-dimensional lattice with energy –2 [figure not reproduced in the transcript]
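The contact energy of the HP-model can be checked on this sample sequence with a short Python sketch. Since the slide's figure is not available, the conformation used below (the move string `URULULD` starting at (0, -1)) is an assumed example that reaches energy –2 and is not necessarily the one drawn on the slide. The brute-force search is practical only for very short chains, and all function names are illustrative.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def walk(moves, start=(0, 0)):
    """Turn a string of lattice moves into a list of monomer positions."""
    coords = [start]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def hp_energy(sequence, coords):
    """-1 for every pair of H monomers that are lattice neighbors but not bonded."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighboring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

def min_energy(sequence):
    """Exhaustively search all self-avoiding walks on the square lattice."""
    best_energy, best_moves = None, None
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = walk(moves)
        if len(set(coords)) != len(coords):
            continue                      # chain overlaps itself; not valid
        energy = hp_energy(sequence, coords)
        if best_energy is None or energy < best_energy:
            best_energy, best_moves = energy, "".join(moves)
    return best_energy, best_moves

sample = walk("URULULD", start=(0, -1))
sample_energy = hp_energy("PHPPHHPH", sample)
lowest_energy, lowest_moves = min_energy("PHPPHHPH")
```

For this sequence the exhaustive search visits at most 4^7 = 16384 move strings; the count grows exponentially with chain length, which is why larger instances call for the search methods discussed earlier in the presentation.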

  36. Cubic lattice • Lattice 3D space

  37. Native conformation

  38. Vertical and horizontal contribution to the surface of a conformation in Z^2 • [Figure labels: Vertical contribution to the surface; Horizontal contribution to the surface]

  39. Conclusions • Native 3D structures of proteins are encoded by a linear sequence of amino acid residues • Predicting 3D structure from sequence is a task challenging enough to have occupied a generation of researchers • Have they finally succeeded in their goal? The bad news is: no, we still cannot predict structure for any sequence • The good news is: we have come closer, and growing databases facilitate the task.
