
Secondary Structure Prediction

Protein Analysis Workshop 2006. Secondary Structure Prediction. Alain Schenkel, Chris Wilton. Bioinformatics Group, Institute of Biotechnology, University of Helsinki. Overview: review of protein structure; introduction to structure prediction and its different approaches.


Presentation Transcript


  1. Protein Analysis Workshop 2006: Secondary Structure Prediction. Alain Schenkel, Chris Wilton. Bioinformatics Group, Institute of Biotechnology, University of Helsinki

  2. Overview • Review of protein structure. • Introduction to structure prediction: • Different approaches. • Prediction of 1D strings of structural elements. • Server/software review: • COILS, MPEx, … • The PredictProtein metaserver.

  3. Proteins • Proteins play a crucial role in virtually all biological processes with a broad range of functions. • The activity of an enzyme or the function of a protein is governed by its three-dimensional structure. [Figures: H11_MOUSE, histocompatibility antigen; VE2_BPV1, bovine DNA-binding domain]

  4. 20 amino acids - the building blocks Clickable map at: http://www.russell.embl-heidelberg.de/aas/

  5. The Amino Acids - hydrophobic

  6. The Amino Acids - polar

  7. The Amino Acids - charged

  8. Secondary Structure: α-helix Alpha-helix: 4₁₃. Very seldom: 3₁₀, 5₁₆ (π-helix).

  9. Secondary Structure: α-helix • 3.6 residues per turn • Axial dipole moment • Hydrogen-bonded • Protein surfaces • Typically contains neither proline nor glycine (“helix breakers”)

  10. Secondary Structure: β-sheets

  11. Secondary Structure: β-sheets • Parallel or antiparallel • Alternating side-chains • Connecting loops often have polar amino acids

  12. Secondary Structure: β-sheets

  13. Terminology • Primary structure: The sequence of amino acid residues FTPAVHAFLDKFLAS …

  14. Terminology • Secondary structure: • A first level of structural organization. • Provides rigidity. • The structural form adopted by each amino-acid residue: • H: helix ( alpha ) • E: extended ( beta strand ) • T: turn ( often Proline ) • C: coil ( random, unstructured )

  15. Terminology • Secondary structure elements (SSE): • Stretches of residues in H conformation are helical SSEs. • Stretches of residues in E conformation are beta-strand SSEs. • Stretches of residues in C conformation are loops or coil. • Turns (T) are isolated residues, usually Proline or Glycine. • Other notation (in 3 states): L for all but H,E.

  16. Secondary Structure Elements • Example: one helix, one beta strand, three loops Primary: MSEGEDDFPRKRTPWCFDDEHMC Secondary: CCHHHHHHCCCCEEEEEECCCCC
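The example on slide 16 can be checked mechanically. The following is a minimal Python sketch, not part of the original slides, that splits a per-residue state string into its secondary structure elements. The function name `sse_segments` is invented for illustration.

```python
from itertools import groupby


def sse_segments(secondary):
    """Split a per-residue state string into (state, start, end) runs.

    Positions are 1-based and inclusive, as is usual for residue numbering.
    """
    segments = []
    position = 1
    for state, run in groupby(secondary):
        length = len(list(run))
        segments.append((state, position, position + length - 1))
        position += length
    return segments


# Strings taken from slide 16.
primary = "MSEGEDDFPRKRTPWCFDDEHMC"
secondary = "CCHHHHHHCCCCEEEEEECCCCC"

for state, start, end in sse_segments(secondary):
    # Convert the 1-based residue range to a 0-based Python slice.
    print(state, start, end, primary[start - 1:end])
```

Applied to the slide's strings, this yields five runs: three in state C, one in H and one in E, matching "one helix, one beta strand, three loops".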

  17. Terminology • Tertiary structure: • The full 3D structure of a single polypeptide chain. • Secondary structure elements pack together to form a structural core. • Called a protein “fold”.

  18. Terminology • Quaternary structure: • How several fully folded protein chains pack together to form a fully functional protein. • Example: 1jch (ribosome inhibitor). 1jch is a PDB identifier: the Protein Data Bank is the principal repository for solved structures.

  19. Example: 1jch has 4 chains The elongated 2-helix structures in the center are called coiled-coils.

  20. Structural classification of folds For example (CATH): • alpha • beta • alpha+beta • alpha/beta • irregular More on structural classification next week.

  21. Biochemical classification of folds • Globular proteins: • in aqueous environment, • compact fold, • hydrophobic core and polar surfaces. • Membrane proteins: • attached to or across the cell membrane, • hydrophobic surface within membrane. • Fibrous proteins: • structural role, • repeat of regular/atypical SSE or irregular structure.

  22. Globular (2 domains) Transmembrane Fibrous

  23. INTRODUCTION TO STRUCTURE PREDICTION

  24. Why is 3D Structure Important? • A pre-requisite for understanding function • processes of molecular recognition, • eg DNA recognition by 2bop. • Catalytic mechanisms of enzymes • often require key residues to be close together in 3D space. • Structure is often preserved under evolution when sequence is not. • Drug design.

  25. Structure Prediction GPSRYIVDL… ?

  26. Approaches to structure prediction • Ab initio: from physical principles only. • De novo: knowledge-based potentials from PDB. • Fold recognition: thread sequence through known structures for compatibility. • Homology modeling: use sequence alignment to infer possible template structure. More on homology modeling next week.

  27. Prediction in One-Dimension Simplification: project 3D structure onto strings of structural assignments. Eg: • coiled-coils • membrane helices • solvent accessibility: residue is buried or exposed …eeebbbbeebbbbee… • secondary structure elements: …HHHLLLEEEEEELLEEE… If accurate: can be used to improve predictions of 3D structures (eg, in fold recognition).

  28. A Flow Chart for Structure Prediction http://speedy.embl-heidelberg.de/gtsp/flowchart2.html

  29. Structure Prediction Why is structure prediction, and in particular ab initio prediction, a difficult problem? • Many degrees of freedom: atoms of all residues and solvent. • The number of possible conformations grows exponentially with the number of residues. • Remote noncovalent interactions complicate matters. • A delicate problem of stability. • Cannot exhaustively search all possible conformations. A folding protein does not try all conformations either (Levinthal paradox).
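To get a feel for the exponential growth behind the Levinthal paradox, here is a rough back-of-the-envelope sketch in Python. The numbers are illustrative assumptions, not values from the slides: 3 conformational states per residue, and a sampling rate of 10^13 conformations per second.

```python
def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all).

    Both default parameters are illustrative assumptions.
    """
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    years = seconds / (3600 * 24 * 365.25)
    return conformations, years


conformations, years = levinthal_estimate(100)
print(f"{conformations:.2e} conformations, roughly {years:.1e} years to enumerate")
```

Even with these generous assumptions, a 100-residue chain would need vastly longer than the age of the universe for an exhaustive search, which is why neither a folding protein nor a prediction program can proceed by enumeration.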

  30. Basic Principle of Folding (globular protein) Pack hydrophobic side chains into the interior of the molecule, away from solvent. So, • Hydrophobic residues predominantly within a central structural core. Tight packing (crystal-like). • Hydrophilic residues predominantly on the protein surface, exposed to solvent. But main chain is highly polar. This forces the formation of SSEs in the core. So, • Core residues tend to be in SSEs. • Loops are on the outside of the protein.

  31. Protein Structure and Evolution • Rate of evolution of genomic DNA sequence reflects degree of functional constraint. • Protein coding regions evolve much more slowly than non-coding regions: • need to maintain stable 3D protein structure, • need to maintain vital biological function.

  32. Rates of Protein Sequence Evolution • Sequences of highly constrained structures evolve very slowly (eg: histones). • Less constrained ones evolve more quickly (eg: immunoglobulins). • In general: response to mutation is structural change, but many mutations will not (or only slightly) change the structure => Structure is better conserved than sequence.

  33. Evolution of SSEs and Loops • Residues in the hydrophobic core (SSEs) are constrained by the need for tight packing: • changes rarely accepted - evolution is slow. • Residues on the surface (loops) are less constrained (simply need to be hydrophilic): • aa substitution less restricted – evolution is quicker.

  34. Evolution of Key Residues • Residues with key functional roles will be conserved. • Eg: active site residues involved in catalysis. • BUT: gene duplication can lead to change of function without changing structure. • Residues with key structural role also tend to be conserved. Eg: • GLY: high conformational flexibility => tight turns,… • PRO: side chain bonds back to the backbone => tight turns. • CYS: disulfide bridges.

  35. Structure Prediction by Homology Multiple sequence / structure alignments measure differences in evolutionary rates of residues, and thus • Contain more information than a single sequence for applications such as homology modeling and secondary structure prediction, • Give location of conserved regions and motifs, residues buried in the protein core or exposed to solvent, plus important secondary structures. More on homology modeling next week.
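As a rough illustration of how a multiple sequence alignment exposes conserved positions, the Python sketch below computes per-column residue frequencies and a simple conservation score. The toy alignment is invented for illustration. Real profile methods typically add sequence weighting and substitution-matrix information, which are omitted here.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies for equal-length aligned sequences.

    Gap characters ('-') are ignored when computing frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


def conservation(profile):
    """Frequency of the most common residue per column (0.0 if all gaps)."""
    return [max(column.values()) if column else 0.0 for column in profile]


# Toy alignment, invented for illustration only.
toy_alignment = ["MKLV-A", "MKIVGA", "MRLVGS", "MKLIGA"]
print(conservation(column_profile(toy_alignment)))
```

Columns with a score near 1 are candidates for conserved core or functional positions. Lower scores suggest less constrained positions such as surface loops.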

  36. Secondary Structure Prediction Three generations: • Single residue statistical analysis: • For each amino acid type, assign its ‘propensity’ to be in a helix, sheet, or coil. • Limited accuracy: ~55-60% on average. • Eg: Chou-Fasman (1974), not used any more.
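The idea of a per-residue propensity can be sketched as follows. This Python example uses the usual propensity ratio: the fraction of a residue type found in a given state, divided by the fraction of all residues in that state. It is an illustration of the concept, not a reimplementation of the Chou-Fasman procedure. It is trained here on the single example sequence from slide 16, which is far too little data for meaningful values.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each amino acid type for `state` (e.g. 'H').

    propensity = (fraction of this residue type in state)
                 / (fraction of all residues in state)
    Values above 1 mean the residue is over-represented in that state.
    """
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, sse in zip(sequences, structures):
        for aa, s in zip(seq, sse):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    n_total = sum(aa_total.values())
    n_state = sum(aa_in_state.values())
    if n_state == 0:
        return {}
    background = n_state / n_total
    return {aa: (aa_in_state[aa] / aa_total[aa]) / background for aa in aa_total}


# Toy training set: the single example from slide 16.
helix = propensities(["MSEGEDDFPRKRTPWCFDDEHMC"], ["CCHHHHHHCCCCEEEEEECCCCC"], "H")
for aa, value in sorted(helix.items(), key=lambda item: -item[1]):
    print(aa, round(value, 2))
```

A first-generation predictor would then average such values along the query and assign the state with the highest local propensity.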

  37. Secondary Structure Prediction • Segment-based statistics: • Look for correlations (within windows of 11-21 aa). • Many algorithms have been tried. • Best performing: neural networks: • Input: a number of protein sequences with their known secondary structure. • Output: a trained network that predicts secondary structure elements for given query sequences. • Accuracy < 70%. • Eg: GORII, COMBINE.
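Window-based methods present each residue to the predictor together with its sequence neighbours. The Python sketch below shows one plausible way to build such inputs. The window size of 13 (within the 11-21 range above), the padding character and the one-hot encoding are assumptions for illustration, not details taken from any particular program.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def windows(sequence, size=13, pad="-"):
    """Return one window per residue, centred on that residue.

    Chain ends are padded so that every residue gets a full-size window.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so there is a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def one_hot(window):
    """Encode a window as 20 binary values per position (padding is all zeros)."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


query = "FTPAVHAFLDKFLAS"  # fragment from slide 13
query_windows = windows(query)
print(query_windows[0], len(one_hot(query_windows[0])))
```

Each encoded window would be one input example, labelled with the known state (H, E or L) of its central residue during training.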

  38. Neural Networks [Figure: a query is passed to the trained network, which outputs a 3-state prediction for each residue.] (picture from B. Rost, 1999)

  39. Secondary Structure Prediction • Using information from evolution: • Compute a sequence profile from a multiple sequence alignment. • Use profile instead of query as input to Neural Network. • 6-8 percentage points increase in accuracy over Neural Network only. • Eg: • PHD/PROF: alignments by MaxHom (B. Rost, 1996/2000) • PSI-PRED: alignments from Psi-Blast (D.T. Jones, 1999) • Accuracy: 72% ± 11%. Accuracy measured as Q3 = (# of correctly predicted secondary structure states) / (total # of residues)
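The Q3 measure is simply the fraction of residues whose predicted state matches the observed one. A minimal Python sketch follows. The observed string is the example from slide 16, and the predicted string is a hypothetical prediction invented for illustration.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


observed = "CCHHHHHHCCCCEEEEEECCCCC"   # from slide 16
predicted = "CCCHHHHHCCCCCEEEEECCCCC"  # hypothetical prediction
print(f"Q3 = {q3(predicted, observed):.3f}")
```

Here the prediction misses the first residue of the helix and of the strand, so 21 of 23 residues are correct. Q3 counts residues rather than elements, so it does not say whether each helix or strand was found as a whole.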

  40. Accuracy Illustration Psi-Pred benchmark on set of 187 chains. (D.T. Jones, 1999) Your query could be here !! In particular, accuracy can be as low as 50% for a given query => Use many different methods and compare answers.

  41. Other Structural Features There are other structural features that one can try to predict: • coiled-coils, • membrane helices, • solvent accessibility, • globularity, • disulfide bridges, • conformational switches, • …

  42. POPULAR SERVERS FOR DEALING WITH SECONDARY STRUCTURES Coiled-coils Transmembrane helices Secondary structure Metaservers

  43. Prediction of coiled-coils Coiled-coils are generally solvent-exposed, multi-stranded helix structures (for example, two-stranded). Helix periodicity and solvent exposure impose a special pattern, the heptad repeat: … abcdefg … [Figure: helical diagram of 2 interacting helices, marking hydrophobic and hydrophilic residues.] (From Wikipedia Leucine zipper article)
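In the heptad repeat, positions a and d are the ones that are typically hydrophobic and form the interface between the helices. The Python sketch below only checks that pattern. It is not the scoring scheme used by COILS, and both the set of residues treated as hydrophobic and the toy sequence are assumptions for illustration.

```python
HEPTAD = "abcdefg"
HYDROPHOBIC = set("AILMFVWY")  # assumption: one common choice of hydrophobic residues


def heptad_positions(sequence, first="a"):
    """Label each residue with a heptad position, starting at `first`."""
    start = HEPTAD.index(first)
    return [HEPTAD[(start + i) % 7] for i in range(len(sequence))]


def core_hydrophobic_fraction(sequence, first="a"):
    """Fraction of a/d positions occupied by hydrophobic residues."""
    labels = heptad_positions(sequence, first)
    core = [aa for aa, pos in zip(sequence, labels) if pos in ("a", "d")]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(sequence):
    """Heptad position of the first residue that maximises the a/d score."""
    return max(HEPTAD, key=lambda first: core_hydrophobic_fraction(sequence, first))


toy = "LEELKKKLEELKKK"  # idealised two-heptad sequence, invented for illustration
print(best_register(toy), core_hydrophobic_fraction(toy, best_register(toy)))
```

A real predictor scores every position against statistics from known coiled-coils and considers several window lengths. This sketch only makes the heptad idea concrete.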

  44. The COILS server at EMBnet • Compares a sequence to a database of known, parallel two-stranded coiled-coils, and derives a similarity score. • By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation. • Options: • scoring matrices, • window size (score may vary), • weighting options.

  45. COILS Limitations • The program works well for parallel two-stranded structures that are solvent-exposed but runs progressively into problems with the addition of more helices, their antiparallel orientation and their decreasing length. • The program fails entirely on buried structures.

  46. COILS Demo Let us submit the sequence >1jch_A VAAPVAFGFPALSTPGAGGLAVSISAGALSAAIADIMAALKGPFKFGLWGVALYGVLPSQ IAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVP MSVPVVDAKPTERPGVFTASIPGAPVLNISVNNSTPAVQTLSPGVTNNTDKDVRPAFGTQ GGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNY ERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPM AGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSSAMESRKKKEDKKRSAE NNLNDEKNKPRKGFKDYGHDYHPAPKTENIKGLGDLKPGIPKTPKQNGGGKRKRWTGDKG RKIYEWDSQHGELEGYRASDGQHLGSFDPKTGNQLKGPDPKRNIKKYL to the COILS server at EMBnet: http://www.ch.embnet.org/software/COILS_form.html

  47. mtidk matrix, no weights, all window lengths

  48. Frame probabilities at each residue. Columns: window size of 14, 21, 28 aa. high probability heptads

  49. Transmembrane Region Prediction Transmembrane regions: • Usually contain residues with hydrophobic side chains (surface must be hydrophobic). • Usually ~20 residues long, can be up to 30 if not perpendicular through membrane. Methods: • Hydropathy plots (historical, better methods now available) • Threading (TMpred, MEMSAT), • Hidden Markov Model (TMHMM), • Neural Network (PHDhtm).

  50. Hydropathy Plots (Kyte-Doolittle) • compute an average hydropathy value for each position in the query sequence, • window length of 19 usually chosen for membrane-spanning region prediction. Peaks between scales 1-2?
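The sliding-window average behind a hydropathy plot can be sketched in a few lines of Python. The table holds the Kyte-Doolittle hydropathy values. The function name and the output format, pairs of (centre residue, mean hydropathy), are choices made for this illustration.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy over each full window.

    Returns (centre, mean) pairs, where centre is the 1-based position of the
    window's central residue. Non-standard residue codes raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


# Toy input: a hydrophobic stretch flanked by charged residues (invented).
toy = "KKDDE" + "LIVLAVLLIGAVLFALLVI" + "RRKDE"
centre, peak = max(hydropathy_profile(toy), key=lambda item: item[1])
print(centre, round(peak, 2))
```

Plotting the mean against the centre position gives the usual hydropathy plot. Sustained peaks over a 19-residue window point to candidate membrane-spanning segments.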
