
Protein Secondary Structures

Assignment and prediction. Protein Secondary Structures. Pernille Andersen 23.04.2007. Outline. What is protein secondary structure How can it be used? Different prediction methods Alignment to homologues Propensity methods Neural networks Evaluation of prediction methods



  1. Protein Secondary Structures: Assignment and prediction. Pernille Andersen, 23.04.2007

  2. Outline • What is protein secondary structure • How can it be used? • Different prediction methods • Alignment to homologues • Propensity methods • Neural networks • Evaluation of prediction methods • Links to prediction servers

  3. Secondary Structure Elements (figure labels: helix, β-strand, turn, bend)

  4. Use of secondary structure • Classification of protein structures • Definition of loops (active sites) • Use in fold recognition methods • Improvements of alignments • Definition of domain boundaries • Input for a number of alternative bioinformatics tools

  5. Classification of secondary structure
     • Defining features
       • Dihedral angles
       • Hydrogen bonds
       • Geometry
     • Assigned manually by crystallographers, or
     • Automatically
       • DSSP (Kabsch & Sander, 1983)
       • STRIDE (Frishman & Argos, 1995)
       • DSSPcont (Andersen et al., 2002)

  6. Dihedral Angles
     • phi - dihedral angle of the N-Calpha bond
     • psi - dihedral angle of the Calpha-C bond
     • omega - dihedral angle of the C-N (peptide) bond
     From http://www.imb-jena.de
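The angle definitions on this slide can be made concrete with a short calculation. The following is a minimal sketch, not taken from the presentation, of how a signed dihedral angle can be computed from four atom positions. The function name and the tuple-based coordinates are illustrative assumptions. For phi the four atoms are C(i-1), N, Calpha, C; for psi they are N, Calpha, C, N(i+1).

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 bond.

    Hypothetical helper: each point is an (x, y, z) tuple.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement of the four points gives 0 degrees and a trans arrangement gives 180 degrees, which matches the omega = 180 deg value quoted for the peptide bond on the following slides.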

  7. Helices

                    phi(deg)   psi(deg)   H-bond pattern
     ----------------------------------------------------
     alpha-helix     -57.8      -47.0      i+4
     pi-helix        -57.1      -69.7      i+5
     3-10 helix      -74.0       -4.0      i+3

     (omega = 180 deg)
     From http://www.imb-jena.de

  8. Beta Strands (figure panels: antiparallel, parallel)

                    phi(deg)   psi(deg)   omega(deg)
     ------------------------------------------------
     beta strand     -120       120        180

     From http://broccoli.mfn.ki.se/pps_course_96/

  9. Secondary Structure Elements (figure labels: helix, β-strand, turn, bend)

  10. Secondary Structure Type Descriptions
      * H = alpha helix
      * G = 3-10 helix
      * I = 5 helix (pi helix)
      * E = extended strand, participates in beta ladder
      * B = residue in isolated beta-bridge
      * T = hydrogen bonded turn
      * S = bend
      * C = coil

  11. Automatic assignment programs
      • DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
      • STRIDE ( http://bioweb.pasteur.fr/seqanal/interfaces/stride.html )
      • DSSPcont ( http://cubic.bioc.columbia.edu/services/DSSPcont/ )
      • The Protein Data Bank visualizes DSSP assignments on structures in the database (go to the sequence details tab)

      Example DSSP output:

      # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
      1 4 A E 0 0 205 0, 0.0 2,-0.3 0, 0.0 0, 0.0 0.000 360.0 360.0 360.0 113.5 5.7 42.2 25.1
      2 5 A H - 0 0 127 2, 0.0 2,-0.4 21, 0.0 21, 0.0 -0.987 360.0-152.8-149.1 154.0 9.4 41.3 24.7
      3 6 A V - 0 0 66 -2,-0.3 21,-2.6 2, 0.0 2,-0.5 -0.995 4.6-170.2-134.3 126.3 11.5 38.4 23.5
      4 7 A I E -A 23 0A 106 -2,-0.4 2,-0.4 19,-0.2 19,-0.2 -0.976 13.9-170.8-114.8 126.6 15.0 37.6 24.5
      5 8 A I E -A 22 0A 74 17,-2.8 17,-2.8 -2,-0.5 2,-0.9 -0.972 20.8-158.4-125.4 129.1 16.6 34.9 22.4
      6 9 A Q E -A 21 0A 86 -2,-0.4 2,-0.4 15,-0.2 15,-0.2 -0.910 29.5-170.4 -98.9 106.4 19.9 33.0 23.0
      7 10 A A E +A 20 0A 18 13,-2.5 13,-2.5 -2,-0.9 2,-0.3 -0.852 11.5 172.8-108.1 141.7 20.7 31.8 19.5
      8 11 A E E +A 19 0A 63 -2,-0.4 2,-0.3 11,-0.2 11,-0.2 -0.933 4.4 175.4-139.1 156.9 23.4 29.4 18.4
      9 12 A F E -A 18 0A 31 9,-1.5 9,-1.8 -2,-0.3 2,-0.4 -0.967 13.3-160.9-160.6 151.3 24.4 27.6 15.3
      10 13 A Y E -A 17 0A 36 -2,-0.3 2,-0.4 7,-0.2 7,-0.2 -0.994 16.5-156.0-136.8 132.1 27.2 25.3 14.1
      11 14 A L E >> -A 16 0A 24 5,-3.2 4,-1.7 -2,-0.4 5,-1.3 -0.929 11.7-122.6-120.0 133.5 28.0 24.8 10.4
      12 15 A N T 45S+ 0 0 54 -2,-0.4 -2, 0.0 2,-0.2 0, 0.0 -0.884 84.3 9.0-113.8 150.9 29.7 22.0 8.6
      13 16 A P T 45S+ 0 0 114 0, 0.0 -1,-0.2 0, 0.0 -2, 0.0 -0.963 125.4 60.5 -86.5 8.5 32.0 21.6 6.8
      14 17 A D T 45S- 0 0 66 2,-0.1 -2,-0.2 1,-0.1 3,-0.1 0.752 89.3-146.2 -64.6 -23.0 33.0 25.2 7.6

  12. Secondary Structure Prediction
      • What to predict?
        • All 8 types, or pool types into groups (Q3: H, E, C)
      DSSP types:
      * H = alpha helix
      * G = 3-10 helix
      * I = 5 helix (pi helix)
      * E = extended strand
      * B = beta-bridge
      * T = hydrogen bonded turn
      * S = bend
      * C = coil

  13. Secondary Structure Prediction
      • What to predict?
        • All 8 types, or pool types into groups (Q3: H, E, C)
      Straight HEC:
      * H = alpha helix
      * E = extended strand
      * T = hydrogen bonded turn
      * S = bend
      * C = coil
      * G = 3-10 helix
      * I = 5 helix (pi helix)
      * B = beta-bridge
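Pooling the eight DSSP types into three states is a simple lookup. The sketch below assumes that "Straight HEC" means H stays H, E stays E, and every other type becomes C; the slide text does not spell out the grouping, so that reading is an assumption. A second, widely used grouping (G and I pooled with H, B pooled with E) is included for comparison. All names are illustrative.

```python
# Assumed reading of the slide's "Straight HEC": only H and E keep their label.
STRAIGHT_HEC = {"H": "H", "E": "E",
                "G": "C", "I": "C", "B": "C", "T": "C", "S": "C", "C": "C"}

# A commonly used alternative grouping (not taken from the slides).
POOLED_HEC = {"H": "H", "G": "H", "I": "H",
              "E": "E", "B": "E",
              "T": "C", "S": "C", "C": "C"}


def reduce_to_three_states(dssp_string, mapping=STRAIGHT_HEC):
    """Map an 8-state DSSP string to H/E/C.

    Symbols missing from the mapping (for example the blank that DSSP
    prints for unassigned residues) are treated as coil.
    """
    return "".join(mapping.get(symbol, "C") for symbol in dssp_string)
```

The choice of grouping changes the reported accuracy of a predictor, so the same mapping should be used for the observed and the predicted strings.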

  14. Secondary Structure Prediction
      • Simple alignments
        • Align to a close homolog for which the structure has been experimentally solved.
      • Heuristic methods (e.g., Chou-Fasman, 1974)
        • Apply scores for each amino acid and sum up over a window.
      • Neural networks
        • Raw sequence (late 80's)
        • Blosum matrix (e.g., PHD, early 90's)
        • Position-specific alignment profiles (e.g., PSIPRED, late 90's)
        • Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000).

  15. Improvement of accuracy

      Year   Authors              Accuracy
      ---------------------------------------
      1974   Chou & Fasman        ~50-53%
      1978   Garnier              63%
      1987   Zvelebil             66%
      1988   Qian & Sejnowski     64.3%
      1993   Rost & Sander        70.8-72.0%
      1997   Frishman & Argos     <75%
      1999   Cuff & Barton        72.9%
      1999   Jones                76.5%
      2000   Petersen et al.      77.9%
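The percentages on this slide are presumably three-state per-residue accuracies (the Q3 measure named on slides 12 and 13), i.e. the fraction of residues whose predicted H/E/C state matches the observed one. A minimal sketch of that calculation, with an illustrative function name:

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state.

    Both arguments are equal-length strings over the alphabet H, E, C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHECC", "HHHCCC")` counts five matches out of six residues.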

  16. Simple Alignments • A solved structure of a homolog of the query is needed • Homologous proteins have ~88% identical (3 state) secondary structure • If no close homologue can be identified, alignments will give almost random results

  17. Propensities: Amino acid preferences in α-Helix

  18. Propensities: Amino acid preferences in β-Strand

  19. Propensities: Amino acid preferences in coil

  20. Chou-Fasman propensities

      Name   P(a)   P(b)   P(turn)   f(i)    f(i+1)   f(i+2)   f(i+3)
      ----------------------------------------------------------------
      Ala    142     83     66       0.06    0.076    0.035    0.058
      Arg     98     93     95       0.070   0.106    0.099    0.085
      Asp    101     54    146       0.147   0.110    0.179    0.081
      Asn     67     89    156       0.161   0.083    0.191    0.091
      Cys     70    119    119       0.149   0.050    0.117    0.128
      Glu    151     37     74       0.056   0.060    0.077    0.064
      Gln    111    110     98       0.074   0.098    0.037    0.098
      Gly     57     75    156       0.102   0.085    0.190    0.152
      His    100     87     95       0.140   0.047    0.093    0.054
      Ile    108    160     47       0.043   0.034    0.013    0.056
      Leu    121    130     59       0.061   0.025    0.036    0.070
      Lys    114     74    101       0.055   0.115    0.072    0.095
      Met    145    105     60       0.068   0.082    0.014    0.055
      Phe    113    138     60       0.059   0.041    0.065    0.065
      Pro     57     55    152       0.102   0.301    0.034    0.068
      Ser     77     75    143       0.120   0.139    0.125    0.106
      Thr     83    119     96       0.086   0.108    0.065    0.079
      Trp    108    137     96       0.077   0.013    0.064    0.167
      Tyr     69    147    114       0.082   0.065    0.114    0.125
      Val    106    170     50       0.062   0.048    0.028    0.053
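The idea of applying a score per amino acid and summing over a window can be sketched with the P(a) and P(b) columns of the table above. This is a deliberately simplified illustration and not the full Chou-Fasman procedure, which uses nucleation and extension rules. The window size, the threshold of 100 (assumed here to mark a neutral propensity) and all function names are assumptions.

```python
# P(a) and P(b) values copied from the table on this slide, keyed by one-letter code.
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
           "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
           "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
          "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
          "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def window_average(values, center, half_width):
    """Mean of the values in a window around `center`, truncated at the ends."""
    lo = max(0, center - half_width)
    hi = min(len(values), center + half_width + 1)
    return sum(values[lo:hi]) / (hi - lo)


def propensity_predict(sequence, half_width=2, threshold=100):
    """Assign H, E or C per residue from window-averaged propensities.

    Simplified rule (an assumption, not the published algorithm):
    the larger of the two averages wins if it exceeds the threshold.
    """
    pa = [P_ALPHA[aa] for aa in sequence]
    pb = [P_BETA[aa] for aa in sequence]
    states = []
    for i in range(len(sequence)):
        a = window_average(pa, i, half_width)
        b = window_average(pb, i, half_width)
        if a > threshold and a >= b:
            states.append("H")
        elif b > threshold and b > a:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)
```

Because each residue is scored almost in isolation, a rule like this ignores most of the structural context, which is the limitation the next slide points out.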

  21. Chou-Fasman • Generally applicable • Works for sequences with no solved homologs • But the accuracy is low! • The problem is that the method does not use enough information about the structural context of a residue

  22. Neural Networks
      • Benefits
        • Generally applicable
        • Can capture higher order correlations
        • Inputs other than sequence information
      • Drawbacks
        • Needs a large amount of data (different solved structures). However, today nearly 7000 structures with low sequence identity/high resolution are solved
        • Complex method with several pitfalls

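  23. Architecture (figure: a window of residues, I K E E H V I I Q A E, taken from the sequence IKEEHVIIQAEFYLNPDQSGEF….. feeds the input layer, which is connected through weights to a hidden layer and an output layer with units H, E and C)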
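The layered architecture in this figure can be illustrated with a tiny feed-forward pass. The sketch below is an assumption-laden illustration: the weights are random and untrained, the sigmoid activation and the bias handling are choices made here, and the layer sizes (11 window positions times 20 inputs, 50 hidden units) are only examples. It shows the data flow, not a working predictor.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Random weight matrix: one row per output unit, last entry is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Weighted sum plus bias for every unit, passed through a sigmoid."""
    extended = list(inputs) + [1.0]   # constant input for the bias weight
    return [sigmoid(sum(w * x for w, x in zip(row, extended)))
            for row in weights]


def predict_state(inputs, hidden_weights, output_weights, states="HEC"):
    """Run input -> hidden -> output and return the state with the highest activity."""
    hidden = layer_forward(hidden_weights, inputs)
    outputs = layer_forward(output_weights, hidden)
    best = max(range(len(states)), key=lambda k: outputs[k])
    return states[best], outputs
```

In a real method the weights would be fitted on proteins with known structure, and the output activities for H, E and C would be kept for later steps rather than reduced to a single letter immediately.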

  24. Sparse encoding

      Inp Neuron   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
      AAcid
      A            1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      R            0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      N            0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      D            0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      C            0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      Q            0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      E            0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0

  25. Input Layer (figure: sparse-encoded input values, 0 or 1, for the window I K E E H V I I Q A E)
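Sparse encoding of a whole window can be sketched as follows, using the residue order of the table on slide 24. The window half-width of 5 (11 residues, as in the figure) and the all-zero padding for positions beyond the sequence ends are assumptions; some implementations use an extra input unit for such positions instead.

```python
# Residue order as on slide 24 (and in the BLOSUM 62 table).
ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def sparse_encode_residue(aa):
    """20 inputs, a single 1 at the position of the amino acid."""
    vector = [0] * len(ALPHABET)
    index = ALPHABET.find(aa)
    if index >= 0:                 # unknown symbols stay all-zero
        vector[index] = 1
    return vector


def sparse_encode_window(sequence, center, half_width=5):
    """Concatenate the encodings of all residues in the window around `center`."""
    inputs = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            inputs.extend(sparse_encode_residue(sequence[pos]))
        else:
            inputs.extend([0] * len(ALPHABET))   # assumed padding off the ends
    return inputs
```

With an 11-residue window this yields 11 x 20 = 220 input values, of which at most 11 are 1.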

  26. BLOSUM 62

           A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   B   Z   X   *
      A    4  -1  -2  -2   0  -1  -1   0  -2  -1  -1  -1  -1  -2  -1   1   0  -3  -2   0  -2  -1   0  -4
      R   -1   5   0  -2  -3   1   0  -2   0  -3  -2   2  -1  -3  -2  -1  -1  -3  -2  -3  -1   0  -1  -4
      N   -2   0   6   1  -3   0   0   0   1  -3  -3   0  -2  -3  -2   1   0  -4  -2  -3   3   0  -1  -4
      D   -2  -2   1   6  -3   0   2  -1  -1  -3  -4  -1  -3  -3  -1   0  -1  -4  -3  -3   4   1  -1  -4
      C    0  -3  -3  -3   9  -3  -4  -3  -3  -1  -1  -3  -1  -2  -3  -1  -1  -2  -2  -1  -3  -3  -2  -4
      Q   -1   1   0   0  -3   5   2  -2   0  -3  -2   1   0  -3  -1   0  -1  -2  -1  -2   0   3  -1  -4
      E   -1   0   0   2  -4   2   5  -2   0  -3  -3   1  -2  -3  -1   0  -1  -3  -2  -2   1   4  -1  -4
      G    0  -2   0  -1  -3  -2  -2   6  -2  -4  -4  -2  -3  -3  -2   0  -2  -2  -3  -3  -1  -2  -1  -4
      H   -2   0   1  -1  -3   0   0  -2   8  -3  -3  -1  -2  -1  -2  -1  -2  -2   2  -3   0   0  -1  -4
      I   -1  -3  -3  -3  -1  -3  -3  -4  -3   4   2  -3   1   0  -3  -2  -1  -3  -1   3  -3  -3  -1  -4
      L   -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4  -2   2   0  -3  -2  -1  -2  -1   1  -4  -3  -1  -4
      K   -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5  -1  -3  -1   0  -1  -3  -2  -2   0   1  -1  -4
      M   -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5   0  -2  -1  -1  -1  -1   1  -3  -1  -1  -4
      F   -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6  -4  -2  -2   1   3  -1  -3  -3  -1  -4
      P   -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7  -1  -1  -4  -3  -2  -2  -1  -2  -4
      S    1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4   1  -3  -2  -2   0   0   0  -4
      T    0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5  -2  -2   0  -1  -1   0  -4
      W   -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11   2  -3  -4  -3  -2  -4
      Y   -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7  -1  -3  -2  -1  -4
      V    0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4  -3  -2  -1  -4

  27. Input Layer (figure: BLOSUM 62 scores used as input values for the window I K E E H V I I Q A E)
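Replacing the sparse 0/1 vector by a row of the substitution matrix can be sketched like this. Only three rows (I, K, E) are copied from the BLOSUM 62 table on slide 26 to keep the example short, restricted to the 20 standard amino acid columns. The function name and the decision to feed raw, unscaled scores are assumptions.

```python
ALPHABET = "ARNDCQEGHILKMFPSTWYV"

# Rows copied from the BLOSUM 62 slide (first 20 columns only).
BLOSUM62_ROWS = {
    "I": [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],
    "K": [-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3, -2, -2],
    "E": [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],
}


def blosum_encode(sequence, rows=BLOSUM62_ROWS):
    """Concatenate one 20-value matrix row per residue.

    Raises KeyError for residues whose row is not in `rows`.
    """
    inputs = []
    for aa in sequence:
        inputs.extend(rows[aa])
    return inputs
```

Compared with sparse encoding, similar residues now produce similar input vectors, so the network receives some information about amino acid similarity directly.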

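  28. Secondary networks (Structure-to-Structure) (figure: a window of predicted H, E and C states along the sequence IKEEHVIIQAEFYLNPDQSGEF….. feeds the input layer, which is connected through weights to a hidden layer and an output layer with units H, E and C)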

  29. PHD method (Rost and Sander) • Combine neural networks with sequence profiles • 6-8 percentage points increase in prediction accuracy over standard neural networks • Use second layer “Structure to structure” network to filter predictions • Jury of predictors • Set up as mail server

  30. PSI-Pred (Jones) • Use alignments from iterative sequence searches (PSI-Blast) as input to a neural network • Better predictions due to better sequence profiles • Available as a stand-alone program and via the web

  31. Position specific scoring matrices (PSI-BLAST profiles)

              A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
       1 I   -2  -4  -5  -5  -2  -4  -4  -5  -5   6   0  -4   0  -2  -4  -4  -2  -4  -3   4
       2 K   -1  -1  -2  -2  -3  -1   3  -3  -2  -2  -3   4  -2  -4  -3   1   1  -4  -3   2
       3 E    5  -3  -3  -3  -3   3   1  -2  -3  -3  -3  -2  -2  -4  -3  -1  -2  -4  -3   1
       4 E   -4  -3   2   5  -6   1   5  -4  -3  -6  -6  -2  -5  -6  -4  -2  -3  -6  -5  -5
       5 H   -4   2   1   1  -5   1  -2  -4   9  -5  -2  -3  -4  -4  -5  -3  -4  -5   1  -5
       6 V   -3   0  -4  -5  -4  -4  -2  -3  -5   1  -2   1   0   1  -4  -3   3  -5  -3   5
       7 I    0  -2  -4   1  -4  -2  -4  -4  -5   1   0  -2   0   2  -5   1  -1  -5  -3   4
       8 I   -3   0  -5  -5  -4  -2  -5  -6   1   2   4  -4  -1   0  -5  -2   0  -3   5  -1
       9 Q   -2  -3  -2  -3  -5   4  -1   3   5  -5  -3  -3  -4  -2  -4   2  -1  -4   2  -2
      10 A    2  -4  -4  -3   2  -3  -1  -4  -2   1  -1  -4  -3  -4   1   2   3  -5  -1   1
      11 E   -1   3   1   1  -1   0   1  -4  -3  -1  -3   0   3  -5   4  -1  -3  -6  -3  -1
      12 F   -3  -5  -5  -5  -4  -4  -4  -1  -1   1   1  -5   2   5  -1  -4  -4  -3   5   2
      13 Y    3  -5  -5  -6   3  -4  -5  -2  -1   0  -4  -5  -3   3  -5  -2  -2  -2   7   1
      14 L   -1  -3  -4  -2   1   5   1  -1  -1  -1   1  -3  -3   1  -5  -1  -1  -2   3  -2
      15 N   -1  -4   4   1   5  -3  -4   2  -4  -4  -4  -3  -2  -4  -5   2   0  -5   0   0
      16 P   -2   4  -4  -4  -5   0  -3   3   2  -5  -4   0  -4  -3   0   1  -2  -1   5  -3
      17 D   -3  -2   1   5  -6  -2   2   2  -1  -2  -2  -3  -5  -4  -5  -1   2  -6  -3  -4

  32. Several different architectures
      • Sequence-to-structure
        • Window sizes 15, 17, 19 and 21
        • Hidden units 50 and 75
        • 10-fold cross validation => 80 predictions
      • Structure-to-structure
        • Window size 17
        • Hidden units 40
        • 10-fold cross validation => 800 predictions
      Example outputs: C C H H C C C and C C C C C C C

  33. The majority rules
      • Combining predictions from several networks improves the prediction
      • Combinations of 800 different networks were used in the method described by Petersen TN et al. 2000, Prediction of protein secondary structure at 80% accuracy. Proteins 41:17-20
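A per-residue majority vote over several network outputs can be sketched as below. This is a simple illustration of "the majority rules"; the combination scheme actually used by Petersen et al. may differ (for example by combining output activities rather than hard H/E/C calls), and the function name is illustrative.

```python
from collections import Counter


def majority_vote(predictions):
    """Consensus string: the most frequent state at every position.

    `predictions` is a list of equal-length H/E/C strings, one per network.
    Ties go to the state that appears first among the predictions at that
    position.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    consensus = []
    for column in zip(*predictions):
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)
```

With the two example outputs from slide 32 and one further hypothetical network, `majority_vote(["CCHHCCC", "CCCCCCC", "CCHHHCC"])` keeps the helix only where two of the three networks agree.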

  34. Activities to probabilities (figure: coil conversion table relating helix activities (output) and strand activities (output), in steps of 0.05 up to 1.0, to calculated coil probabilities)

  35. Benchmarking secondary structure predictions • EVA • Newly solved structures are sent to prediction servers every week. • http://cubic.bioc.columbia.edu/eva/sec/res_sec.html

  36. EVA results (Rost et al., 2001) • PROFphd 77.0% • PSIPRED 76.8% • SAM-T99sec 76.1% • SSpro 76.0% • Jpred2 75.5% • PHD 71.7% • Cubic.columbia.edu/eva

  37. Links to servers • Several links: http://cubic.bioc.columbia.edu/eva/doc/explain_methods.html#type_sec • ProfPHD http://www.predictprotein.org/ • PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/ • JPred http://www.compbio.dundee.ac.uk/~www-jpred/ • SAM T02 http://www.cse.ucsc.edu/research/compbio/HMM-apps/T02-query.html

  38. Practical Conclusions • If you need a secondary structure prediction, use one of the newer methods based on advanced machine learning, such as: • ProfPHD • PSIPRED • JPred • SAM T02 • And not one of the older ones, such as: • Chou-Fasman • Garnier
