Chapter 15 Structure Prediction: Threading
Motivation • Given a protein sequence, can you predict its molecular structure? • Want to avoid repeated X-ray crystallography, but want accuracy • You could use sequence alignment against a protein of known structure, but what do you do with the gapped regions? • More complex methods are only justified if they can be shown to perform better than simpler methods • Simpler methods are only justified if they can perform better than basic sequence alignment
First Step • Some structure comparison methods use the predicted secondary structure of the new sequence • Predict the location of secondary structure elements along the protein’s backbone and the degree of residue burial • Supervised learning has been shown to perform well in this task
Artificial Neural Network Predicts Structure at this point
Danger • You may train the network on your training set, but it may not generalize to other data • Perhaps we should train several ANNs and then let them vote on the structure
PHD (Profile network from HeiDelberg) • A multiple alignment of the protein family is used as input instead of just the new sequence • On the first level, a window of length 13 around the residue is used • The window slides down the sequence, making a prediction for each residue • The input includes the frequency of amino acids occurring in each position in the multiple alignment (in the example, there are 5 sequences in the multiple alignment) • The second level takes these predictions from neural networks that are centered on neighboring residues • The third level does a jury selection
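The input encoding and the jury step described above can be sketched as follows. This is a minimal illustration, not the PHD implementation: the function names (`column_frequencies`, `windows`, `jury`), the zero-padding at the sequence ends, and the use of a simple majority vote are all assumptions made for the example, and the neural networks themselves are left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Per-column amino acid frequencies for equal-length aligned sequences.

    Gap characters are ignored, so a column containing gaps sums to less than 1.
    """
    n_seqs = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append([counts.get(aa, 0) / n_seqs for aa in AMINO_ACIDS])
    return profile


def windows(profile, width=13):
    """Yield one flattened input vector per residue (assumed zero-padded ends)."""
    half = width // 2
    pad = [0.0] * len(AMINO_ACIDS)
    padded = [pad] * half + profile + [pad] * half
    for i in range(len(profile)):
        yield [x for col in padded[i:i + width] for x in col]


def jury(predictions):
    """Majority vote per residue over several predictors' state strings."""
    return "".join(
        Counter(states).most_common(1)[0][0] for states in zip(*predictions)
    )


# Hypothetical alignment of 5 sequences, as in the slide's example.
alignment = ["AVDE", "AVDE", "GVDE", "AIDE", "AVEE"]
profile = column_frequencies(alignment)
inputs = list(windows(profile))
consensus = jury(["HHEC", "HHCC", "HEEC"])  # H = helix, E = strand, C = coil
```

Each window vector has 13 × 20 = 260 entries; a trained network would map it to a state for the central residue, and the jury combines the per-residue states from several such networks.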
PHD (figure: network diagram with outputs labeled Predicts 4, Predicts 5, Predicts 6)
Threading • Threading matches structure to sequence • True threading considers 3D spatial interactions
3D-1D Matching (Bowie et al.) • Convert 3D structure into a string • Include α-helix, β-sheet or neither • Include buried or solvent accessible (6 levels) • Total of 3 × 6 = 18 distinct states • Score = ln(P(a:j) / P(a)), with P(a:j) = probability of finding amino acid (a) in environment (j) and P(a) = probability of finding (a) anywhere
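A log-odds score built from the two probabilities defined above, ln(P(a:j) / P(a)), can be estimated from counts. The sketch below is an illustration under stated assumptions: `score_table` and `thread_score` are hypothetical names, the pseudocount smoothing is an addition for robustness against empty cells, and the two-letter environment labels stand in for the 18 states.

```python
import math
from collections import Counter


def score_table(observations, pseudocount=1.0):
    """Estimate ln(P(a:j) / P(a)) from (amino_acid, environment) observations.

    The pseudocount (an assumption of this sketch) avoids log(0) for pairs
    that never occur in the training data.
    """
    obs = list(observations)
    pair_counts = Counter(obs)
    aa_counts = Counter(a for a, _ in obs)
    env_counts = Counter(j for _, j in obs)
    total = len(obs)
    n_aa = len(aa_counts)
    table = {}
    for a in aa_counts:
        p_a = aa_counts[a] / total
        for j in env_counts:
            p_a_in_j = (pair_counts[(a, j)] + pseudocount) / (
                env_counts[j] + pseudocount * n_aa
            )
            table[(a, j)] = math.log(p_a_in_j / p_a)
    return table


def thread_score(sequence, environments, table):
    """Sum the per-position scores of a sequence placed on an environment string."""
    return sum(table[(a, j)] for a, j in zip(sequence, environments))


# Toy training data: V mostly buried (B), D mostly exposed (E).
training = (
    [("V", "B")] * 3 + [("D", "B")] + [("V", "E")] + [("D", "E")] * 3
)
table = score_table(training)
good = thread_score("VD", ["B", "E"], table)
bad = thread_score("DV", ["B", "E"], table)
```

A positive entry means the amino acid is found in that environment more often than its overall frequency would suggest, so `good` exceeds `bad` for this toy data.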
3D-1D • The information value score was calculated on a training set of multiple alignments and used as a profile for each column • When applied to the globin family, it clearly distinguished myoglobins from nonglobins but not from other globins
Methods using 3D interactions • Residues that have large separation in the sequence may end up next to each other when the protein is folded • Define a measure of contact between residues (two atoms within 5 Å) and count the frequency of contact between all pairs in the PDB • Use this measure in alignment to evaluate cost, or to select the best alignment
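Counting contacts under the 5 Å definition above might look like the following sketch. The input layout (a list of residues with atom coordinates), the name `contact_counts`, and the choice to skip sequence neighbors via `min_separation` are assumptions for the example; parsing real PDB files is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def contact_counts(residues, cutoff=5.0, min_separation=2):
    """Count residue-type pairs that are in contact in one structure.

    residues: list of (amino_acid, [(x, y, z), ...]) in chain order.
    Two residues are in contact if any pair of their atoms lies within
    `cutoff` angstroms. Pairs closer than `min_separation` in sequence are
    skipped (an assumption here), since chain neighbors always touch.
    """
    counts = Counter()
    indexed = list(enumerate(residues))
    for (i, (aa_i, atoms_i)), (j, (aa_j, atoms_j)) in combinations(indexed, 2):
        if j - i < min_separation:
            continue
        if any(math.dist(p, q) <= cutoff for p in atoms_i for q in atoms_j):
            counts[tuple(sorted((aa_i, aa_j)))] += 1
    return counts


# Toy chain with one atom per residue, coordinates in angstroms.
chain = [
    ("A", [(0.0, 0.0, 0.0)]),
    ("G", [(3.0, 0.0, 0.0)]),
    ("V", [(4.0, 0.0, 0.0)]),
    ("D", [(20.0, 0.0, 0.0)]),
]
counts = contact_counts(chain)
```

Summing such counters over many structures gives the pair frequencies that the alignment cost is built from.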
Potentials of mean force (POMF) • Since the notion of contact is somewhat arbitrary, a more general formulation can be tried • Derive an empirical function for the propensity of each of the 400 pairs of residues to be any given distance apart.
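One way to picture such an empirical function is a distance histogram per residue pair, normalized by the histogram over all pairs. The sketch below assumes that construction; the name `distance_propensities`, the 1 Å bins, the 15 Å limit, and the all-pairs reference are choices made for illustration, not details taken from the slides.

```python
from collections import defaultdict


def distance_propensities(pair_distances, bin_width=1.0, max_distance=15.0):
    """Ratio of each pair's distance distribution to the all-pairs distribution.

    pair_distances: iterable of ((aa1, aa2), distance_in_angstroms).
    Pairs are kept as given, so (A, V) and (V, A) are separate entries,
    matching the 20 x 20 = 400 pairs. Returns {pair: [ratio for each bin]}.
    """
    n_bins = int(max_distance / bin_width)
    pair_hist = defaultdict(lambda: [0] * n_bins)
    all_hist = [0] * n_bins
    for pair, distance in pair_distances:
        k = int(distance / bin_width)
        if k >= n_bins:
            continue  # beyond the range being tabulated
        pair_hist[pair][k] += 1
        all_hist[k] += 1
    total_all = sum(all_hist)
    result = {}
    for pair, hist in pair_hist.items():
        total = sum(hist)
        result[pair] = [
            (h / total) / (all_hist[k] / total_all) if all_hist[k] else 0.0
            for k, h in enumerate(hist)
        ]
    return result


# Toy observations: A-V pairs cluster near 5 angstroms, D-V pairs do not.
observations = [
    (("A", "V"), 5.2), (("A", "V"), 5.5), (("A", "V"), 5.8), (("A", "V"), 10.5),
    (("D", "V"), 5.1), (("D", "V"), 10.2), (("D", "V"), 10.6), (("D", "V"), 10.9),
]
propensity = distance_propensities(observations)
```

A ratio above 1 in a bin means the pair is found at that distance more often than residue pairs in general, and a ratio below 1 means less often.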
Multiple Sequence Threading • Multiple Sequence Alignment • Align the most similar to create a consensus sequence • Align consensus sequences to create overall alignment • Use the same strategy with structures • Assume that conserved hydrophobic positions should pack in the core • This appears to be work in progress (1997)
Example • Alanine (A) and valine (V) are two small hydrophobic residues, both of which favor packing in the core of the protein • The POMF would have a peak around 5 Å • Aspartate (D) and valine (V) do not often pack together • The POMF will have a dip around 5 Å • Figure: POMF(A,V) and POMF(D,V), each plotting probability against distance, with 5 Å marked
Sequence-Structure Alignment • For all known structures • Align the unknown sequence to that structure • Find the best alignment • Return the structure with the best global alignment • Unfortunately, we can't use dynamic programming: with pairwise residue interactions, finding the optimal alignment is NP-complete • Heuristics must be used to explore the space.
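The loop over known structures can be sketched with one very simple heuristic: restrict the search to ungapped placements, which makes the pairwise score cheap to evaluate. This is an assumption chosen to keep the example short, not a method named in the slides; `ungapped_thread`, `recognize_fold`, the template layout, and the toy pair scores are all hypothetical.

```python
def ungapped_thread(sequence, template_length, contacts, pair_score):
    """Best ungapped placement of a template onto a sequence.

    contacts: list of (i, j) template positions that are in contact.
    pair_score: {(aa1, aa2): score}, higher is better; missing pairs score 0.
    Returns (best_score, offset), or None if the sequence is too short.
    """
    best = None
    for offset in range(len(sequence) - template_length + 1):
        score = sum(
            pair_score.get((sequence[offset + i], sequence[offset + j]), 0.0)
            for i, j in contacts
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def recognize_fold(sequence, templates, pair_score):
    """Thread the sequence onto every template and return the best hit.

    templates: {name: (length, contacts)}.
    Returns (name, score, offset), or None if no template fits.
    """
    hits = []
    for name, (length, contacts) in templates.items():
        result = ungapped_thread(sequence, length, contacts, pair_score)
        if result is not None:
            hits.append((name, result[0], result[1]))
    return max(hits, key=lambda hit: hit[1]) if hits else None


# Toy scores: A-V contacts are favorable, D-V contacts are not.
pair_score = {
    ("A", "V"): 1.0, ("V", "A"): 1.0,
    ("D", "V"): -1.0, ("V", "D"): -1.0,
}
templates = {
    "t1": (3, [(0, 2)]),
    "t2": (4, [(0, 3)]),
}
hit = recognize_fold("DAGV", templates, pair_score)
```

Raw scores from templates of different sizes are not directly comparable, so a practical method would normalize them before ranking, and allowing gaps brings back the hard search problem that the heuristics are meant to explore.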
Evaluating Methods • Is the complexity worth it? • This is difficult without a benchmark • Few comparative studies have been performed • When they have been performed, authors of competing methods have complained that wrong parameters were used … • Critical Assessment of Structure Prediction (CASP, 1994) releases the sequences of proteins whose structures have not yet been published • All methods submit their predictions • Predictions are analyzed based on fold recognition, modeling accuracy and alignment accuracy • No one method or approach is obviously superior