Protein Structure Prediction

Historical Perspective • Protein Folding: From the Levinthal Paradox to Structure Prediction, Barry Honig, 1999 • A personal perspective on advances and developments in protein folding over the last 40 years

Levinthal Paradox • Cyrus Levinthal, Columbia University, 1968 • Observed that there is insufficient time to randomly search the entire conformational space of a protein • Resolution: Proteins have to fold through some directed process • Goal is to understand the dynamics of this process
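
Levinthal's argument is essentially arithmetic. The sketch below reproduces it under illustrative assumptions that are not taken from these slides: each residue independently adopts 3 backbone conformations, and one conformation is sampled every 10^-13 s. The function name is hypothetical.

```python
def levinthal_search_time_years(n_residues: int,
                                states_per_residue: int = 3,
                                samples_per_second: float = 1e13) -> float:
    """Years needed to visit every backbone conformation exactly once.

    Illustrative assumptions: residues are independent, each has
    `states_per_residue` conformations, and one conformation is tried
    every 1 / samples_per_second seconds.
    """
    n_conformations = states_per_residue ** n_residues  # exact integer count
    seconds = n_conformations / samples_per_second
    seconds_per_year = 365.25 * 24 * 3600
    return seconds / seconds_per_year

if __name__ == "__main__":
    years = levinthal_search_time_years(100)
    print(f"{years:.2e} years for a 100-residue chain")
```

For a 100-residue chain this gives on the order of 10^27 years, vastly longer than the age of the universe (about 1.4 × 10^10 years), which is the point of the paradox.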

Old vs. New Views • Old: • Hierarchical view of protein folding • Secondary structures form first, then interact to form tertiary structures • A general order of events • New: • Statistical ensembles of states • Potential energy landscape • Folding “Funnel” • The two views are not all that different; most of the important ideas were proposed many years ago

Secondary Structures • The consensus view is that secondary structure formation is the earliest part of the folding process • Numerous studies indicate that local sequence encodes local structure • Helical sequences in a folded protein tend to be helical in isolation • SSE prediction algorithms were about 70% correct as of 1993; their failures suggest that tertiary interactions help stabilize some SSEs
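
Secondary structure predictions are usually scored per residue over three states (helix H, strand E, coil C); the fraction of residues assigned correctly is called Q3. Assuming the "about 70% correct" figure refers to this kind of per-residue measure, a minimal sketch with hypothetical strings:

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

if __name__ == "__main__":
    observed = "CCHHHHHHCCEEEEECCC"   # hypothetical assignment from a structure
    predicted = "CCHHHHCCCCEEEECCCC"  # hypothetical prediction
    print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")
```

Here 15 of 18 residues agree, so Q3 is about 0.83; helix and strand ends are the typical places where predictions go wrong.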

However… • It is not clear which sequence elements encode the overall topology • One factor is the existence of hydrophobic faces on the surface of SSEs • Predicting the topology of SSEs remains challenging, even when the protein class is known
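
A hydrophobic face on a helix can be quantified with the hydrophobic moment of Eisenberg and co-workers: each residue's hydrophobicity is treated as a vector rotated 100° further around the helix axis than the previous residue, and a large resultant indicates an amphipathic helix. This is a minimal sketch, not a method from the slides; it substitutes the Kyte-Doolittle hydropathy scale for Eisenberg's consensus scale, and the test sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (a stand-in for the Eisenberg scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment per residue for an ideal helix.

    Residue i contributes a vector of length h_i pointing at angle
    angle_deg * i around the helix axis (100 degrees for an alpha-helix).
    """
    x = y = 0.0
    for i, aa in enumerate(seq.upper()):
        h = KYTE_DOOLITTLE[aa]
        theta = math.radians(angle_deg * i)
        x += h * math.cos(theta)
        y += h * math.sin(theta)
    return math.hypot(x, y) / len(seq)

if __name__ == "__main__":
    amphipathic = "LLKKLLKKLKKLLKKLLK"  # Leu on one face, Lys on the other
    uniform = "L" * 18                   # no preferred face
    print(hydrophobic_moment(amphipathic), hydrophobic_moment(uniform))
```

The 18-residue homopolymer gives a moment of essentially zero because 18 × 100° is exactly five turns, so the vectors cancel; the Leu/Lys pattern puts all Leu residues in one half of the helical wheel and gives a large moment.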

Atomic level calculations • Molecular calculations have had a great impact on our understanding of protein folding • Harold Scheraga, 1968 • Shneior Lifson, 1969 • Martin Karplus’s laboratory, ~1979 • Early calculations had trouble dealing with solvent effects
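
Atomic-level calculations of this kind rest on empirical force fields. As a hedged illustration, not the specific energy functions used by the groups listed above, the sketch sums a 12-6 Lennard-Jones term and a Coulomb term over atom pairs. The atom parameters, the Lorentz-Berthelot combining rule, and the distance-dependent dielectric (ε = 4r) used as a crude stand-in for solvent are all assumptions chosen for illustration.

```python
import math

# Coulomb prefactor in kcal*Angstrom/(mol*e^2), the usual force-field value.
COULOMB_KCAL = 332.06

def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy; its minimum is -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r: float, q_i: float, q_j: float, distance_dependent: bool = False) -> float:
    """Coulomb energy, optionally screened by a dielectric eps(r) = 4r (crude solvent model)."""
    eps = 4.0 * r if distance_dependent else 1.0
    return COULOMB_KCAL * q_i * q_j / (eps * r)

def total_energy(coords, params, distance_dependent: bool = False) -> float:
    """Sum of nonbonded pair energies in kcal/mol.

    coords: list of (x, y, z) in Angstrom
    params: list of (epsilon, sigma, charge) per atom (hypothetical values)
    """
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            eps_i, sig_i, q_i = params[i]
            eps_j, sig_j, q_j = params[j]
            # Lorentz-Berthelot: geometric-mean epsilon, arithmetic-mean sigma.
            eps_ij = math.sqrt(eps_i * eps_j)
            sig_ij = 0.5 * (sig_i + sig_j)
            energy += lennard_jones(r, eps_ij, sig_ij)
            energy += coulomb(r, q_i, q_j, distance_dependent)
    return energy

if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.5)]
    params = [(0.15, 3.25, -0.5), (0.10, 3.40, 0.4)]
    print(total_energy(coords, params))
    print(total_energy(coords, params, distance_dependent=True))
```

Comparing the two printed values shows how strongly the treatment of the dielectric, and hence of solvent, changes the electrostatic contribution.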

Secondary Structure • Many of the essential elements of protein energetics can be derived from studying SSE formation • Early experimental work: Ingwall et al., 1968 • Baldwin et al., 1989, worked on stabilizing shorter helices • Dyson and Wright, 1991, demonstrated that even short peptides in solution can be partially structured
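
Observations like these on short helices are commonly rationalized with helix-coil theory. As a hedged illustration, the sketch uses the Zimm-Bragg model, a textbook model swapped in here rather than something the slides describe: a helical residue continuing a helix has weight s, one that starts a helix after a coil residue has weight σs, and coil residues have weight 1. Mean helicity is the derivative of ln Z with respect to ln s. The parameter values in the usage are illustrative only.

```python
import math

def zimm_bragg_log_z(n: int, s: float, sigma: float) -> float:
    """Natural log of the Zimm-Bragg partition function for an n-residue chain.

    s: weight for a helical residue that continues an existing helix
    sigma: nucleation penalty; a helical residue after a coil residue
           gets weight sigma * s instead of s
    The recursion tracks the summed weight of partial chains ending in
    helix (w_h) or coil (w_c), rescaling each step to avoid overflow.
    """
    w_h, w_c = 0.0, 1.0  # a virtual coil residue precedes the chain
    log_z = 0.0
    for _ in range(n):
        w_h, w_c = s * w_h + sigma * s * w_c, w_h + w_c
        total = w_h + w_c
        w_h, w_c = w_h / total, w_c / total
        log_z += math.log(total)
    return log_z

def helix_fraction(n: int, s: float, sigma: float, h: float = 1e-5) -> float:
    """Mean helical fraction, theta = (1/n) d ln Z / d ln s (central difference)."""
    up = zimm_bragg_log_z(n, s * math.exp(h), sigma)
    down = zimm_bragg_log_z(n, s * math.exp(-h), sigma)
    return (up - down) / (2.0 * h * n)

if __name__ == "__main__":
    for length in (10, 20, 50, 100):
        print(length, round(helix_fraction(length, s=1.3, sigma=3e-3), 3))
```

With a small σ, nucleation is costly, so short chains stay largely coil even when s > 1 while longer chains become mostly helical; this is one way to see why stabilizing short helices took deliberate effort.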

Results • Yang and Honig, 1995 • Alpha-helices are stabilized by hydrophobic interactions and close packing; hydrogen bonding has little effect • Beta-sheets are stabilized by non-polar interactions between residues on adjacent strands • This work supports the idea that SSEs are encoded locally in the sequence

Folding Pathways • SSEs can change conformation in the presence of a relatively small number of tertiary interactions • The free-energy differences among alpha-helix, beta-sheet, and coil are small • Individual helices can be changed into beta-sheets by changing just a few amino acids • This suggests that proteins have a “structural plasticity” that allows for changes in conformation
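
To see why small free-energy differences matter, note that the equilibrium population of each state follows a Boltzmann distribution, p_i = exp(-G_i/RT) / Σ_j exp(-G_j/RT). A minimal sketch, using hypothetical free energies rather than values from the slides:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_populations(free_energies: dict, temperature: float = 298.15) -> dict:
    """Equilibrium population of each state from its free energy in kcal/mol."""
    g_min = min(free_energies.values())  # shift energies for numerical stability
    weights = {state: math.exp(-(g - g_min) / (R_KCAL * temperature))
               for state, g in free_energies.items()}
    z = sum(weights.values())  # partition function over the listed states
    return {state: w / z for state, w in weights.items()}

if __name__ == "__main__":
    # Hypothetical relative free energies (kcal/mol).
    print(boltzmann_populations({"helix": 0.0, "sheet": 0.5, "coil": 1.0}))
```

At room temperature RT is about 0.59 kcal/mol, so these differences of up to 1 kcal/mol leave about 62%, 27% and 11% of molecules in the three states: none is negligible, and a few mutations that shift the balance can change which state dominates.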

Folding Pathways • Early in folding, many different combinations of SSEs have very similar stabilities • In the end, it is the tertiary interactions that drive the protein toward its native topology • Early in folding, SSEs “flicker” in and out; they are eventually stabilized by tertiary interactions and converge to the native state • This suggests that multiple folding pathways exist, all of which can lead to the same end result once stabilized
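
The idea of many pathways converging on one native state can be illustrated with a toy Metropolis Monte Carlo simulation. This is only an analogy, not a protein model: the one-dimensional "rugged funnel" energy function, the move set, and all parameters are hypothetical. Trajectories started far apart take different routes through shallow local traps, but at this temperature they all reach the global minimum at x = 0.

```python
import math
import random

def energy(x: int) -> float:
    """Hypothetical rugged funnel on the integers.

    The |x| term slopes every state toward the global minimum at x = 0;
    the extra -2 at multiples of 5 creates shallow local traps.
    """
    return abs(x) - (2.0 if x % 5 == 0 else 0.0)

def metropolis_run(start: int, steps: int, kT: float, rng: random.Random):
    """One Metropolis trajectory with +/-1 moves; returns (final x, lowest energy seen)."""
    x, e = start, energy(start)
    lowest = e
    for _ in range(steps):
        trial = x + rng.choice((-1, 1))
        e_trial = energy(trial)
        delta = e_trial - e
        # Always accept downhill moves; accept uphill ones with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, e = trial, e_trial
            lowest = min(lowest, e)
    return x, lowest

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for start in (-60, -37, -12, 8, 25, 51):
        final, lowest = metropolis_run(start, steps=5000, kT=0.5, rng=rng)
        print(f"start {start:4d} -> final {final:3d}, lowest energy {lowest}")
```

The local traps play the role of transiently stable intermediates: they slow individual trajectories without changing where they end up.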

Structure Prediction • Recently, the field has split in two • Protein prediction problem • Trying to predict the end result of folding, relying heavily on comparisons between known and unknown structures • Protein folding problem • Trying to understand the folding pathway that leads to the end result, typically through MD simulations or energy calculations • The author contends that both areas will need to be used together to fully understand protein folding

PrISM • Yang and Honig, 1999 • Software suite that integrates simulation-based prediction with known structural information • Sequence analysis • Structure-based sequence alignment • Fast structure-structure superposition using a structural domain database • Multiple structure alignment • Fold recognition and homology model building • Used to make predictions for all 43 targets of CASP3 (more on CASP later)
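
Structure-structure superposition is commonly done by finding the rigid rotation that minimizes the RMSD between matched atoms. The slides do not say which algorithm PrISM uses; the sketch below uses the standard Kabsch (SVD) method with NumPy as a stand-in, and assumes the two coordinate sets are already matched residue by residue. The coordinates in the usage are randomly generated, not a real protein.

```python
import numpy as np

def kabsch_rmsd(p, q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Both sets are centred on their centroids; the rotation that best maps p
    onto q comes from the SVD of their 3x3 covariance matrix (Kabsch
    algorithm), with a sign correction so a reflection is never used.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)            # H = U S V^T
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # R = V diag(1, 1, d) U^T
    diff = p_c @ rotation.T - q_c                    # rotate each row of p_c by R
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ca = rng.normal(size=(50, 3)) * 10.0             # hypothetical C-alpha trace
    t = np.radians(30.0)
    rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t), np.cos(t), 0.0],
                   [0.0, 0.0, 1.0]])
    moved = ca @ rz.T + np.array([5.0, -3.0, 2.0])   # rotated and translated copy
    print(kabsch_rmsd(ca, moved))                    # close to zero
```

Because the optimal rotation has a closed-form solution, each superposition costs only one 3x3 SVD, which is what makes scanning a large structural database feasible.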

Conclusions • Much of the current understanding of protein folding was theorized long ago • Vague and speculative ideas have been replaced by carefully defined theoretical concepts and rigorous experimental observations

Conclusions • The polypeptide backbone is the most important determinant of structure • SSEs are “meta-stable”; the statement that sequence determines structure is not wholly accurate • A more accurate statement is that sequence chooses from a limited set of available SSEs and determines how they are arranged in space

Conclusions • Free-energy differences between alternate conformations are not large, which may provide a basis for rapid evolutionary change

CASP • A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, John Moult • CASP = Critical Assessment of Structure Prediction • First held in 1994 and every 2 years since • Teams make structure predictions from sequences alone

CASP • Two categories of predictors • Automated • Automatic servers, which must complete their analysis within 48 hours • Shows what is possible through computer analysis alone • Non-automated • Groups spend considerable time and effort on each target • Combine computational techniques with human analysis

CASP • CASP6, 2004 • 200 prediction teams from 24 countries • Over 30,000 predictions for 64 protein targets collected and evaluated • A conference is held afterwards to discuss the results, with many teams presenting their individual results and methodologies • Helps to steer future work

Modeling classes • Comparative modeling based on a clear sequence relationship • Modeling based on more distant evolutionary relationships • Modeling based on non-homologous fold relationships • Template free modeling

Comparative modeling based on a clear sequence relationship • Easily detectable sequence relationship between the target protein and one or more known protein structures, typically found through BLAST • Copy from the template; however: • Target and template sequences must be aligned • In general, reliably building regions not present in the template is still a challenge • Sidechain accuracy is poor • Refinement remains a challenge
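
Aligning target and template sequences is a dynamic-programming problem. As a minimal sketch, assuming simple match/mismatch scores and a linear gap penalty (comparative-modeling pipelines normally use substitution matrices such as BLOSUM62 and affine gap penalties), here is Needleman-Wunsch global alignment on hypothetical sequences:

```python
def needleman_wunsch(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using '-' for gaps.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    score, target, template = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(score)
    print(target)
    print(template)
```

Residues paired with '-' in the template row are exactly the "regions not present in the template" that must be built without a structural guide.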

Comparative modeling based on a clear sequence relationship • Progress in MD needed for refinement • Models useful for identifying which members of a protein family have similar functionalities, and which are different

Modeling based on more distant evolutionary relationships • Makes use of PSI-BLAST and hidden Markov models • Compile a profile for the sequence and compare it to other known profiles • Allows structures to be predicted even when sequence similarity is low • Between CASP4 and CASP5, the use of metaservers to find consensus structures led to improved accuracy
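
A sequence profile records, for each aligned position, which residues are favored. The sketch below builds a simple log-odds position-specific scoring matrix from a hypothetical alignment and scores query sequences against it. It illustrates the profile idea only; PSI-BLAST and HMM tools add sequence weighting, substitution-matrix-based pseudocounts, gap modeling, and (for remote homology) profile-profile comparison.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount: float = 1.0):
    """Build a log-odds profile (one dict per column) from aligned sequences.

    Simplifying assumptions: uniform background frequency of 1/20, a flat
    pseudocount for every amino acid, no sequence weighting, and gap
    characters ('-') ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # log2(estimated column frequency / background frequency)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score_sequence(pssm, seq: str) -> float:
    """Ungapped score of a query against the profile (sum of column log-odds)."""
    if len(seq) != len(pssm):
        raise ValueError("query length must match profile length")
    return sum(column[aa] for column, aa in zip(pssm, seq))

if __name__ == "__main__":
    family = ["ACDLK", "ACELK", "SCDLR", "ACDIK"]  # hypothetical aligned family
    profile = build_pssm(family)
    print(score_sequence(profile, "ACDLR"), score_sequence(profile, "WWWWW"))
```

A query that matches none of the family members exactly can still score well if it uses residues the profile has seen at each position, which is why profiles detect more distant relationships than single-sequence comparison.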

Modeling based on more distant evolutionary relationships • Limitations: • The correct template may not be identified • Aligning the target sequence to the template is not trivial • A significant fraction of residues will have no structural equivalent in the template; modeling these regions is hit or miss • Although target and template structures are similar, they are not identical, and the greater the difference, the higher the error • Details are therefore not accurate, but the overall structure can be useful • Further improvement requires combining these methods with template-free methodologies

Modeling based on more distant evolutionary relationships

Modeling based on non-homologous fold relationships • Protein “threading” • In recent CASP experiments, these methods have not been competitive with template-free methods
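
Threading scores how well a sequence fits a known fold rather than a known sequence. A heavily simplified sketch, assuming a template described only by a per-position burial label (B = buried, E = exposed) and a toy +1/-1 score for hydrophobic residues in buried positions and polar residues in exposed ones; the sequence is slid along the template without gaps. The hydrophobic residue set, the scores, and the example strings are all hypothetical. Real threading methods use richer environment classes, statistical potentials, and gapped alignment.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set for this toy score

def environment_score(residue: str, environment: str) -> int:
    """+1 if the residue suits its burial environment, -1 otherwise."""
    return 1 if (residue in HYDROPHOBIC) == (environment == "B") else -1

def thread(sequence: str, template_env: str):
    """Slide the sequence along the template's burial pattern without gaps.

    Returns (best_score, best_offset); offset is the template position
    aligned with the first residue of the sequence.
    """
    if len(sequence) > len(template_env):
        raise ValueError("sequence longer than template")
    best = None
    for offset in range(len(template_env) - len(sequence) + 1):
        window = template_env[offset:offset + len(sequence)]
        score = sum(environment_score(r, e) for r, e in zip(sequence, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

if __name__ == "__main__":
    template = "EEEBBEBBEEBBEEE"  # hypothetical burial pattern of a known fold
    print(thread("LIKVLEKFA", template))
```

Here the hydrophobic/polar pattern of the query fits the template's buried/exposed pattern perfectly at one offset, even though no template sequence is involved at all.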

Template-free Modeling • For sequences where no template is available • Historically, physics-based approaches were used • Newer methods focus on substructures • While we have not seen all folds, we have probably seen nearly all substructures • Make use of substructure relationships • From a few residues through SSEs to super-secondary structures

Template-free Modeling • A range of possible conformations is considered • The most successful package has been ROSETTA • For proteins of fewer than ~100 residues, ROSETTA produces one or several approximately correct structures (4-6 Å RMSD over C-alpha atoms) • Selecting the most accurate structures from all candidates remains unsolved; current methods typically rely on clustering • Development of atomic-level models is crucial to further progress
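
Selecting a model by clustering often means picking the decoy with the most other decoys within some RMSD cutoff, on the assumption that near-native conformations are sampled more often than any single wrong one. The slides do not give ROSETTA's exact procedure; this is one simple variant, assuming a precomputed pairwise C-alpha RMSD matrix. The matrix and cutoff in the usage are hypothetical.

```python
import numpy as np

def largest_cluster_center(rmsd, cutoff: float):
    """Return (index, neighbour count) of the decoy with the most neighbours.

    rmsd: symmetric (M, M) array of pairwise C-alpha RMSDs between decoys
    cutoff: RMSD threshold in Angstrom for counting two decoys as neighbours
    Ties go to the lowest index, because np.argmax returns the first maximum.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    if rmsd.ndim != 2 or rmsd.shape[0] != rmsd.shape[1]:
        raise ValueError("rmsd must be a square matrix")
    # Subtract 1 so each decoy does not count itself (its diagonal entry is 0).
    neighbours = (rmsd <= cutoff).sum(axis=1) - 1
    best = int(np.argmax(neighbours))
    return best, int(neighbours[best])

if __name__ == "__main__":
    rmsd = [
        [0.0, 2.0, 4.0, 9.0, 10.0],
        [2.0, 0.0, 2.5, 8.0, 9.0],
        [4.0, 2.5, 0.0, 9.0, 11.0],
        [9.0, 8.0, 9.0, 0.0, 4.0],
        [10.0, 9.0, 11.0, 4.0, 0.0],
    ]
    print(largest_cluster_center(rmsd, cutoff=3.0))  # (1, 2)
```

Decoy 1 sits between decoys 0 and 2 and is chosen as the representative of the most populated region; the result depends on the cutoff, which is one reason model selection remains an open problem.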

Template-free Modeling

CASP Progress
