Protein Analysis 5 (Proteiinianalyysi 5)

Presentation Transcript


  1. Protein Analysis 5 (Proteiinianalyysi 5) • Structure prediction • Function prediction • http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2005/proteiinianalyysi/

  2. From sequence to structure • comparative modelling • prediction of a 1-dimensional state (class) from sequence • recognition of a 3-dimensional structure from a given library (fold recognition) • ab initio prediction of 3-dimensional structure • ROSETTA

  3. The “Folding Problem” Two parts: (1) The “Search Problem” Is the true structure one of my 2 million guesses? Fragment assembly (2) The “Discrimination Problem” If it’s one of these 2 million, which one is it? Empirical pseudopotential

  4. Rosetta (1) A stone with three ancient languages on it. (2) A program (David Baker) that simulates the folding of a protein, using statistical energies and moves.

  5. Fold prediction: Rosetta method • Knowledge-based scoring function • Bayes' law: P(structure|sequence) = P(structure) * P(sequence|structure) / P(sequence) • P(sequence|structure) = f(residue contacts in native structures) • P(structure) = probability of a protein-like structure (no clashes, globular shape) • Figure labels: near-native structures; protein-like structures; sequence-consistent local structure • Simons et al. (1997)
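
The Bayes' law on this slide can be turned into an additive score. The lines below are a restatement under the slide's own definitions, not necessarily the exact form of the Rosetta scoring function: for a fixed sequence, P(sequence) is the same for every candidate structure, so ranking candidates needs only the two numerator terms.

```latex
\begin{align*}
P(\text{structure}\mid\text{sequence})
  &= \frac{P(\text{structure})\,P(\text{sequence}\mid\text{structure})}{P(\text{sequence})} \\
-\ln P(\text{structure}\mid\text{sequence})
  &= -\ln P(\text{structure}) \;-\; \ln P(\text{sequence}\mid\text{structure}) \;+\; \ln P(\text{sequence}) \\
\text{score}(\text{structure})
  &= -\ln P(\text{structure}) \;-\; \ln P(\text{sequence}\mid\text{structure})
  \qquad (\ln P(\text{sequence}) \text{ is constant for a given sequence})
\end{align*}
```

Minimizing this score is equivalent to maximizing P(structure|sequence).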

  6. Collection of putative backbone conformations • Inputs: protein sequence; library of small segments (sequences and structures) • For each window of 9 residues: look up the 25 closest (sequence) neighbours in the library • Simons et al. (1997)
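
The window lookup described on this slide could be sketched as follows. This is a minimal illustration, not the Rosetta implementation: the function names, the `(segment, structure_label)` library layout, and the plain identity count used as the similarity measure are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

WINDOW = 9         # window length given on the slide
N_NEIGHBOURS = 25  # neighbours kept per window, given on the slide


def identity_score(a: str, b: str) -> int:
    """Count positions where two equal-length segments carry the same residue.

    Assumption: a simple identity count stands in for whatever
    sequence similarity measure the real method uses.
    """
    return sum(x == y for x, y in zip(a, b))


def closest_fragments(
    query: str,
    library: List[Tuple[str, str]],
    window: int = WINDOW,
    k: int = N_NEIGHBOURS,
) -> Dict[int, List[Tuple[str, str]]]:
    """For every window start in `query`, return the k best-scoring library entries.

    `library` is a hypothetical list of (segment_sequence, structure_label)
    pairs whose segments all have length `window`.
    """
    result = {}
    for start in range(len(query) - window + 1):
        segment = query[start:start + window]
        ranked = sorted(
            library,
            key=lambda entry: identity_score(segment, entry[0]),
            reverse=True,
        )
        result[start] = ranked[:k]
    return result
```

Each window position ends up with its own list of candidate backbone conformations, which is what a fragment-insertion search can later draw from.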

  7. Intermediates are not observed, but Folding is 2-state Unfolded Folded

  8. Nucleation sites something happens first...

  9. Early folding events might be recorded in the database • Short, recurrent sequence patterns could be folding initiation sites • Example from non-homologous proteins (recurrent part: MQTIFF): HDFPIEGGDSPMQTIFFWSNANAKLSHGY CPYDNIWMQTIFFNQSAAVYSVLHLIFLT IDMNPQGSIEMQTIFFGYAESAELSPVVNFLEEMQTIFFISGFTQTANSD INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR PPPMQTIFFVIVNYNESKHALWCSVD PWMWNLMQTIFFISQQVIEIPSMQTIFFVFSHDEQMKLKGLKGA • Nature has selected for these patterns because they speed folding.

  10. I-sites motifs • Type-I hairpin • diverging type-2 turn • Serine hairpin • Frayed helix • alpha-alpha corner • glycine helix N-cap • Proline helix C-cap • Backbone angles: ψ = green, φ = red • Amino acids arranged from non-polar to polar

  11. Rosetta: fragment insertion Monte Carlo • Start from backbone torsion angles • Choose a fragment from the moveset • Change backbone angles • Convert angles to 3D coordinates • Evaluate the energy function • Accept or reject

  12. Rosetta: backbone angles are restrained in I-sites regions • Figure labels: regions of high-confidence I-sites prediction; backbone torsion angles; moveset • Fragments that deviate from the paradigm (>90° in φ or ψ) are removed from the moveset. • Generally, about one-third of the sequence has an I-sites prediction with confidence > 0.75, and is restrained.
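
The moveset filter on this slide (drop fragments that deviate by more than 90° in φ or ψ from the predicted paradigm) might look like the sketch below. The data layout, a list of `(phi, psi)` pairs in degrees per fragment, and the function names are assumptions made for illustration. The one detail that matters is that angle differences wrap around at ±180°.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (range 0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def filter_moveset(fragments, paradigm, max_dev=90.0):
    """Keep only fragments that stay within `max_dev` degrees of the paradigm.

    fragments: list of fragments, each a list of (phi, psi) tuples in degrees
    paradigm:  list of (phi, psi) tuples for the predicted motif (hypothetical input)
    A fragment is removed if any position deviates by more than `max_dev`
    in phi or in psi.
    """
    kept = []
    for frag in fragments:
        within = all(
            angular_difference(phi, p_phi) <= max_dev
            and angular_difference(psi, p_psi) <= max_dev
            for (phi, psi), (p_phi, p_psi) in zip(frag, paradigm)
        )
        if within:
            kept.append(frag)
    return kept
```

For example, 170° and -170° differ by only 20°, so a fragment with φ = 170° is not rejected against a paradigm of φ = -170°.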

  13. Rosetta Sequence dependent features

  14. Rosetta vector representation Sequence-independent features Probabilities from the database Current structure The energy score for a contact between secondary structures is summed using database statistics.

  15. MC-SA optimization • for each random position • pick a random neighbour • replace backbone conformation • calculate probability of new structure • MC: Monte-Carlo • accept up-hill moves with a certain probability that depends on temperature • SA: simulated annealing • Gradual cooling of temperature: first allow many changes, later fewer changes Simons et al. (1997)
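
The loop on this slide can be sketched generically as below. This is a toy version under stated assumptions: the energy is any function to be minimized (for instance a negative log-probability), `propose` is a caller-supplied move such as a fragment replacement, and the geometric cooling schedule and all parameter values are illustrative choices, not taken from Simons et al.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(state, energy, propose, t_start=2.0, t_end=0.01, steps=2000, seed=0):
    """Monte Carlo search with simulated annealing.

    state:   initial conformation (any object)
    energy:  function state -> float, lower is better
    propose: function (state, rng) -> new candidate state
    Returns the best state seen and its energy.
    """
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for i in range(steps):
        # Geometric cooling: many uphill moves are accepted early, few late.
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, t, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

In a fragment-based setting, `propose` would pick a random position, pick a random neighbour from that position's fragment list, and replace the backbone conformation there, which matches the first three bullets of the slide.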

  16. Results • Small molecules: ok • Proteins with mostly α-helices: ok • Proteins with mostly β-sheets: not so ok Simons et al. (1997)

  17. Rosetta: what needs to be fixed? • Turns: 8% of the residues in the targets have φ > 0; 44% of these are at glycine residues. 7% of the residues in the predictions have φ > 0, but only 16% of these are at glycines. • Contact order: true structure 0.252; predictions 0.119
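
The slide does not define contact order. A commonly used definition, assumed here, is the relative contact order: the average sequence separation of residue pairs in contact, divided by the chain length. The function below is a minimal sketch under that assumption. How contacts are detected (for example by a distance cutoff) is left to the caller.

```python
def relative_contact_order(contacts, length: int) -> float:
    """Relative contact order = sum(|i - j|) / (length * number_of_contacts).

    contacts: iterable of (i, j) residue-index pairs that are in contact
    length:   number of residues in the chain
    Returns 0.0 when there are no contacts.
    """
    contacts = list(contacts)
    if not contacts or length <= 0:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))
```

Under this definition, a lower value for the predictions than for the true structures means the predicted contacts are, on average, between residues that are closer together in sequence.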

  18. Prediction algorithms have underlying principles Darwin = protein evolution. Principle: Proteins that evolved from common ancestor have the same fold. Boltzmann = protein folding Principle: Proteins search conformational space, minimizing the free energy (empirical pseudo-potential)

  19. Determining gene function • phenotype • biochemical activity (in vitro) • expression • GO, Gene Ontology • molecular function • biological process • subcellular localization

  20. Homology ⇒ same function? • Paralogy: the result of gene duplication • Alternative splicing: one gene, many proteins • Pleiotropy: one gene, many functions • Redundancy: one function, many genes • Heteromery: formation of complexes • “Crosstalk”: signalling pathways affect one another

  21. Protein functional shifts are common • COG0044 • Dihydroorotase • CAD (fusion protein) • Dihydropyrimidinase • D-hydantoinase • Allantoinase • Rudimentary protein (involved in developmental programs)

  22. COG0044 functions Urease superfamily functions

  23. Fast evolution ~ functional shift rat lung isoform rat liver isoform, functional shift CYP2 family (cytochrome P450)

  24. “Druggable genome” • Property filters • Likelihood of functional shift • Degree and nature of paralogy • Factors reflecting pleiotropy • Size • Breadth of expression • Interaction potential • Evolutionary rates

  25. Function transfer • Nearest neighbour (closest homologue) • e.g. Blast search • Phylogenetic nearest neighbour • Post-genomic methods • independent of homology • Comparison of protein-protein interactions • Guilt By Association • Pattern recognition

  26. Function transfer • Hypothetical sequence → function? • Characterized homologue • Blast / PSI-Blast • Phylogeny! • evolutionary rate depends on the family • multiple sequence alignment • Erroneous function assignments propagate through databases! • Wrong function • associated with a domain that does not occur in the query sequence • Wrong homology inference • Overly detailed function description • change of function during evolution • biochemical vs. physiological function • e.g. eukaryote-specific functions cannot occur in a bacterium • Sequence alignment • conservation of functional amino acids • example: • atrazine chlorohydrolase vs. melamine deaminase: 4 mutations (98 % identity) • E.g. GO flags the source of a function assignment

  27. Guilt by association • Prediction of subcellular localization based on classification of neighbours
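
A minimal sketch of guilt by association as stated on this slide: an unannotated protein receives the localization that is most frequent among its annotated interaction partners. The dictionary layout and the simple majority vote are assumptions made for illustration. The slide does not specify which classifier is used.

```python
from collections import Counter


def predict_localization(protein, interactions, known):
    """Predict a localization by majority vote over annotated neighbours.

    protein:      identifier of the query protein
    interactions: dict mapping a protein to a list of its interaction partners
    known:        dict mapping annotated proteins to their localization
    Returns None when no neighbour carries an annotation.
    """
    neighbours = interactions.get(protein, ())
    votes = Counter(known[n] for n in neighbours if n in known)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

The prediction uses only network context, so it works even when the query has no detectable homologue with a known function.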

  28. Non-homology protein identification using network context • Figure labels: query pattern; interactome • Ref: Lappe M, Park J, Niggemann O, Holm L (2001) Bioinformatics Suppl 1, S149-S156

  29. Natural selection • Functional coupling leads to correlations • E.g. co-occurrence of sets of genes in species • Residues required for molecular function • Functional conservation above general sequence divergence of a family

  30. Pancreatic trypsin inhibitor (2ptc)

  31. Approaches • Evolutionary Trace • Lichtarge et al. 1996 • Sequence Space • Casari et al. 1995 • Ortholog / paralog discriminants • Mirny & Gelfand 2003

  32. Evolutionary Trace • The branchpoints separating subclades of a phylogenetic tree can specify molecular speciation events, and hence evolutionary selection of amino acids • Map trace residues to 3D structures

  33. Evaluation of Evolutionary Trace • Trace residues determined at many ranks • Trace residue sets are nested • Test of significance of trace residue at any rank • Overlap with otherwise defined functional sites • Bound ligands in 3D structures (~20 residues) • Annotated sites (~4 residues)

  34. ET assessment • Detects 3D clusters • Manual filtering and pruning of the data • Decide which subclades of the protein family to use in analysis • Exclude fragments • Original method was based on strict invariance within subclade • Automatic implementations • But manually optimized traces score higher
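
The strict-invariance idea behind the original method can be sketched as follows, assuming the subclades have already been chosen and the sequences are aligned. The function name and labels are hypothetical, gaps are treated like any other character, and ranks and significance testing are left out.

```python
def trace_columns(subclades):
    """Classify alignment columns by strict invariance within subclades.

    subclades: dict mapping a subclade name to a list of aligned sequences
               (all sequences have the same length)
    Returns a dict {column_index: label} for columns that are invariant
    inside every subclade:
      'conserved'      - same residue in all subclades
      'class-specific' - invariant within each subclade, different between them
    Columns that vary inside any subclade are omitted.
    """
    groups = list(subclades.values())
    length = len(groups[0][0])
    labels = {}
    for col in range(length):
        consensus = []
        for seqs in groups:
            residues = {s[col] for s in seqs}
            if len(residues) != 1:
                break  # column varies within this subclade: not a trace residue
            consensus.append(residues.pop())
        else:
            same_everywhere = len(set(consensus)) == 1
            labels[col] = "conserved" if same_everywhere else "class-specific"
    return labels
```

Cutting the tree at different depths changes the subclade partition and therefore the set of trace columns, which is what evaluating the trace at many ranks refers to.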

  35. Sequence Space • Aligned protein sequences represented as vectors in a high-dimensional space • Each amino acid type at each column of the MSA is a unique point in Sequence Space • Dimension reduction by Principal Components Analysis • Cluster proteins • Based on their sequence identity • Map residues in the same space • Direction points to association with protein group

  36. A 3D object

  37. PCA projection of the 3D object New axes are linear combination of original axes

  38. Coding of amino acids

  39. Sequence vector representation
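
One way to realize the coding and vector representation named on these two slides is a one-hot encoding of each alignment column. The sketch below is an assumption about the encoding, since the slide images are not part of the transcript. The 20-letter alphabet and the choice to map gaps to all zeros are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def encode(aligned_seq: str, alphabet: str = AMINO_ACIDS):
    """One-hot encode an aligned sequence.

    Each alignment column contributes len(alphabet) coordinates, one per
    amino acid type; gaps and unknown characters become all zeros.
    """
    vec = []
    for residue in aligned_seq:
        block = [0] * len(alphabet)
        idx = alphabet.find(residue)
        if idx >= 0:
            block[idx] = 1
        vec.extend(block)
    return vec


def dot(u, v) -> int:
    """Dot product of two encoded sequences."""
    return sum(a * b for a, b in zip(u, v))
```

With this encoding, the dot product of two sequence vectors equals the number of columns where they carry the same residue, so proteins that are close in this space are those with high sequence identity. Principal components analysis would then be applied to the matrix of such vectors.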

  40. Interpretation 1st axis represents the whole family 2nd, 3rd , …, 6th axes represent subclassifications Subfamily-specific residues are found at the tips of a polygon Common residues shared by several subfamilies are found along the edges of a polygon Many unspecific residues at origin

  41. Protein clustering

  42. Residue clustering

  43. Selection of residues & proteins

  44. Orthologues and paralogues • Use of model organisms: identical physiology?
