Protein Structure Prediction


Presentation Transcript


  1. Protein Structure Prediction

  2. Historical Perspective • Protein Folding: From the Levinthal Paradox to Structure Prediction, Barry Honig, 1999 • A personal perspective on advances and developments in protein folding over the last 40 years

  3. Levinthal Paradox • Cyrus Levinthal, Columbia University, 1968 • Observed that there is insufficient time to randomly search the entire conformational space of a protein • Resolution: Proteins have to fold through some directed process • Goal is to understand the dynamics of this process
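The size of the search problem on the Levinthal slide can be made concrete with a back-of-the-envelope estimate. The sketch below is only illustrative: the three inputs (3 conformations per residue, a 100-residue chain, 10^13 conformations sampled per second) are assumed round numbers, not figures from the slides.

```python
# Back-of-the-envelope Levinthal estimate.
# All three inputs are illustrative assumptions, not figures from the talk.
STATES_PER_RESIDUE = 3        # assumed conformations available to each residue
N_RESIDUES = 100              # assumed chain length
SAMPLES_PER_SECOND = 1e13     # assumed rate of conformational sampling

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(states, n_residues, rate):
    """Years needed to visit every conformation once at the given rate."""
    n_conformations = states ** n_residues
    return n_conformations / rate / SECONDS_PER_YEAR


years = exhaustive_search_years(STATES_PER_RESIDUE, N_RESIDUES, SAMPLES_PER_SECOND)
print(f"conformations: {STATES_PER_RESIDUE ** N_RESIDUES:.2e}")
print(f"exhaustive search time: {years:.2e} years")
```

Under these assumptions the exhaustive search would take on the order of 10^27 years, which is why a random search cannot be how proteins fold.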

  4. Old vs. New Views • Old: • Hierarchical view of protein folding • Secondary structures form, then interact to form tertiary structures • General order of events • New: • Statistical ensembles of states • Potential energy landscape • Folding “Funnel” • The two views are not all that different; the most important ideas were theorized many years ago

  5. Secondary Structures • Consensus view is that secondary structure formation is the earliest part of the folding process • Numerous studies indicate that local sequence codes for local structures • Helical sequences in a folded protein tend to be helical in isolation • Secondary structure element (SSE) prediction algorithms were about 70% correct as of 1993 • The remaining failures suggest that tertiary interactions play some role in stabilizing SSEs

  6. However… • Not clear what sequence elements code for overall topology • One factor is the existence of hydrophobic faces on the surface of SSEs • Still challenges in predicting topology of SSEs, even when protein class is known

  7. Atomic level calculations • Molecular calculations have had a great impact on our understanding of protein folding • Harold Scheraga, 1968 • Shneior Lifson, 1969 • Martin Karplus’s laboratory, ~1979 • Early calculations had trouble dealing with solvent effects

  8. Secondary Structure • Many of the essential elements of protein energetics can be derived from looking at SSE formation • Early experimental work: Ingwall et al., 1968 • Baldwin et al., 1989, worked on stabilizing shorter helices • Dyson and Wright, 1991, demonstrated that even short peptides in solution can be partially structured

  9. Results • Yang and Honig, 1995 • Alpha-helices stabilized by hydrophobic interactions and close packing; hydrogen bonding has little effect • Beta-sheets stabilized by non-polar interactions between residues on adjacent strands • Work supports the idea that SSEs are coded for locally in the sequence

  10. Folding Pathways • SSEs can change conformation in the presence of a relatively small number of tertiary interactions • Free-energy difference between alpha-helix, beta-sheet, and coil is not great • Individual helices can be changed into beta-sheets by changing just a few amino acids • This suggests that proteins have a “structural plasticity” which allows for changes in conformation
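One way to read the point about small free-energy differences is through Boltzmann populations: when the gap between states is comparable to RT, all of them are appreciably populated. The following is a minimal sketch assuming a simple equilibrium picture; the free-energy values are hypothetical and chosen only for illustration, not taken from the slides.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)


def boltzmann_populations(free_energies_kcal, temperature_k=298.0):
    """Fractional populations of states from their free energies (kcal/mol)."""
    rt = R_KCAL * temperature_k
    weights = {s: math.exp(-g / rt) for s, g in free_energies_kcal.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Hypothetical free energies relative to the helix state (not from the slides).
states = {"helix": 0.0, "sheet": 0.5, "coil": 1.0}
populations = boltzmann_populations(states)
for name, frac in populations.items():
    print(f"{name}: {frac:.2f}")
```

With gaps this small, no single state dominates completely, so a few added tertiary contacts or point mutations could plausibly shift the balance.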

  11. Folding Pathways • Early in the folding process, many different combinations of SSEs have very similar stabilities • In the end, it is the tertiary interactions which drive towards the native topology • Early in folding, SSEs “flicker”; they are eventually stabilized by tertiary interactions and converge to the native state • Suggests that multiple folding pathways exist, which can all lead to the same end result once stabilized

  12. Structure Prediction • Recently, a split has been seen • Protein prediction problem • Trying to predict the end result of folding, using a large amount of comparison between known and unknown structures • Protein folding problem • Trying to understand the folding path which leads to the end result of folding, typically by MD simulations or energy calculations • Author’s contention is that both areas will need to be used together to fully understand protein folding

  13. PrISM • Yang and Honig, 1999 • Software suite which integrates prediction based on simulations and known information about structures • Sequence analysis • Structure-based sequence alignment • Fast structure-structure superposition using a structural domain database • Multiple structure alignment • Fold recognition and homology model building • Used to make predictions for all 43 targets of the CASP3 conference (more on CASP later)

  14. Conclusions • Much of the current understanding of protein folding was theorized long ago • Vague and speculative ideas have been replaced by carefully defined theoretical concepts and rigorous experimental observations

  15. Conclusions • Polypeptide backbone is the most important determinant of structure • SSEs are “meta-stable”; statement that sequence determines structure not wholly accurate • More accurate statement is that sequence chooses from a limited set of available SSEs and determines how they are ordered in space

  16. Conclusions • Free-energy differences between alternate conformations are not large; this may provide a basis for rapid evolutionary change

  17. CASP • A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, John Moult • CASP = Critical Assessment of Structure Prediction • First held in 1994, every 2 years afterwards • Teams make structure predictions from sequences alone

  18. CASP • Two categories of predictors • Automated • Automatic Servers, must complete analysis within 48 hours • Shows what is possible through computer analysis alone • Non-automated • Groups spend considerable time and effort on each target • Utilize computer techniques and human analysis techniques

  19. CASP • CASP6, 2004 • 200 prediction teams from 24 countries • Over 30,000 predictions for 64 protein targets collected and evaluated • Conference held afterwards to discuss results, with many teams presenting individual results and methodologies • Helps to steer future work

  20. Modeling classes • Comparative modeling based on a clear sequence relationship • Modeling based on more distant evolutionary relationships • Modeling based on non-homologous fold relationships • Template free modeling

  21. Comparative modeling based on a clear sequence relationship • Easily detectable sequence relationship between the target protein and one or more known protein structures, typically through BLAST • Copy from template, however: • Must align target and template sequences • In general, reliably building regions not present in the template is still a challenge • Sidechain accuracy is poor • Refinement remains a challenge
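The "must align target and template sequences" step can be sketched with a textbook global alignment (Needleman-Wunsch). This is a simplified illustration, not the method of any particular CASP group: the scoring values (match +1, mismatch -1, linear gap -2) are assumptions, and real pipelines would typically use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(target, template, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (aligned_target, aligned_template, score)."""
    n, m = len(target), len(template)
    # score[i][j] = best score aligning target[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if target[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align the two residues
                              score[i - 1][j] + gap,       # gap in template
                              score[i][j - 1] + gap)       # gap in target
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if target[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(target[i - 1])
                b.append(template[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(target[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(template[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), score[n][m]


# Toy sequences, invented for illustration.
aligned_target, aligned_template, total = needleman_wunsch("ACDE", "ADE")
print(aligned_target)
print(aligned_template)
print(total)
```

Positions where the template row shows a gap correspond to target residues with no template equivalent, which are the regions the slide flags as hard to build reliably.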

  22. Comparative modeling based on a clear sequence relationship • Progress in MD needed for refinement • Models useful for identifying which members of a protein family have similar functionalities, and which are different

  23. Modeling based on more distant evolutionary relationships • Makes use of PSI-BLAST and hidden Markov models • Compile a profile for the sequence, compare this profile to other known profiles • Allows for prediction of structures, even when sequence is not close • Use of metaservers to find consensus structures between CASP4 and CASP5 has led to improved accuracy
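The idea of compiling a profile can be illustrated with a very small position-specific scoring sketch. It is far simpler than what PSI-BLAST or hidden Markov models actually do: it assumes an ungapped alignment of equal-length sequences, a uniform background frequency, and a fixed pseudocount, and it scores a single sequence against a profile rather than comparing two profiles. The function names and the toy family are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-column log-odds scores from an ungapped multiple alignment."""
    length = len(aligned_seqs[0])
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background frequency
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def score_against_profile(seq, profile):
    """Sum of per-position log-odds scores for a sequence of profile length."""
    return sum(column[aa] for aa, column in zip(seq, profile))


# Toy family, invented for illustration.
family = ["ACDE", "ACDE", "AKDE", "ACNE"]
profile = build_profile(family)
print(score_against_profile("ACDE", profile))   # family-like sequence
print(score_against_profile("WWWW", profile))   # unrelated sequence
```

A sequence that follows the family's conserved positions scores higher than an unrelated one even if it matches no single family member exactly, which is the property that lets profile methods reach more distant relationships.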

  24. Modeling based on more distant evolutionary relationships • Limitations: • Correct template may not be identified • Alignment of target sequence to template is not trivial • Significant fraction of residues will have no structural equivalent in the template; modeling of these regions is hit or miss • Although regions are similar, they are not identical, and the greater the difference, the higher the error • Details are thus not accurate, but overall structure can be useful • For improvements, must work together with template-free methodologies

  25. Modeling based on more distant evolutionary relationships

  26. Modeling based on non-homologous fold relationships • Protein “threading” • In recent CASP experiments, these methods have not been competitive with template-free modeling

  27. Template-free Modeling • For sequences where no template is available • Historically physics based approaches were used • Newer methods focus on substructures • While we have not seen all folds, we have probably seen nearly all substructures • Make use of substructure relationships • From a few residues through SSEs to super-secondary structures

  28. Template-free Modeling • A range of possible conformations is considered • Most successful package has been ROSETTA • For proteins of fewer than ~100 residues, it produces one or several approximately correct structures (4-6 Å RMSD for C-alpha atoms) • Selecting the most accurate structures from all candidates is still an unsolved problem; clustering is typically used at present • Development of atomic models is crucial to further progress
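The RMSD measure and the clustering step mentioned on this slide can be sketched as follows. This is a generic illustration, not ROSETTA's actual procedure: it assumes the candidate models are already superimposed (no optimal-rotation step is performed), and it picks the model with the most neighbours within an assumed RMSD cutoff. The coordinates are toy values.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def largest_cluster_center(models, cutoff):
    """Index of the model with the most neighbours within `cutoff` RMSD."""
    best_index, best_count = 0, -1
    for i, model in enumerate(models):
        count = sum(1 for j, other in enumerate(models)
                    if j != i and ca_rmsd(model, other) <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index


# Toy three-residue candidate models (coordinates in angstroms, invented).
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1), (2.0, 0.0, 0.1)],
    [(0.0, 0.0, 0.2), (1.0, 0.0, 0.2), (2.0, 0.0, 0.2)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0), (2.0, 5.0, 0.0)],
]
center = largest_cluster_center(models, cutoff=0.15)
print(center)
```

The underlying heuristic is that conformations sampled repeatedly are more likely to be near the native structure than isolated outliers, which is why the centre of the largest cluster is a common choice when no better selection criterion is available.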

  29. Template-free Modeling

  30. CASP Progress
