
Protein Analysis 8



Presentation Transcript


  1. Protein Analysis 8 • Function prediction • http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2006/proteiinianalyysi/

  2. Determining the function of a gene • phenotype • biochemical activity (in vitro) • expression • GO, Gene Ontology • molecular function • biological process • subcellular localization

  3. Homology → same function? • Paralogy: the result of gene duplication • Alternative splicing: one gene, many proteins • Pleiotropy: one gene, many functions • Redundancy: one function, many genes • Heteromery: formation of complexes • "Crosstalk": signalling pathways affect one another

  4. Protein functional shifts are common • COG0044 • Dihydroorotase • CAD (fusion protein) • Dihydropyrimidinase • D-hydantoinase • Allantoinase • Rudimentary protein (involved in developmental programs)

  5. COG0044 functions • Urease superfamily functions

  6. Fast evolution ~ functional shift • CYP2 family (cytochrome P450) • rat lung isoform • rat liver isoform, functional shift

  7. “Druggable genome” • Property filters • Likelihood of functional shift • Degree and nature of paralogy • Factors reflecting pleiotropy • Size • Breadth of expression • Interaction potential • Evolutionary rates

  8. Function transfer • Nearest neighbour (closest homolog) • e.g. a Blast search • Phylogenetic nearest neighbour • Post-genomic methods • independent of homology • Comparison of protein-protein interactions • Guilt By Association • Pattern recognition

  9. Function transfer • Hypothetical sequence → function? • Characterized homolog • Blast / PSI-Blast • Phylogeny! • evolutionary rate depends on the family • multiple sequence alignment • Erroneous function assignments propagate through databases! • Wrong function • associated with a domain that does not occur in the query sequence • Wrong homology inference • Overly detailed function description • change of function during evolution • biochemical vs. physiological function • e.g. eukaryote-specific functions cannot occur in a bacterium • Sequence alignment • conservation of functional amino acids • example: • atrazine chlorohydrolase vs. melamine deaminase: 4 mutations (98 % identity) • E.g. GO flags the source of a function assignment
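The point that high overall identity does not guarantee the same function can be illustrated with a small check. This is only a sketch: the two aligned sequences and the "functional" column indices are invented for illustration and are not the enzymes named on the slide.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def changed_functional_sites(a: str, b: str, sites):
    """Return the 0-based alignment columns in `sites` where the residues differ."""
    return [i for i in sites if a[i] != b[i]]


# Toy pairwise alignment (hypothetical sequences).
query = "MKTAYIAKQRDLSHGE"
homolog = "MKTAYIAKQRNLSHGE"
functional_sites = [10, 13]  # hypothetical catalytic columns

identity = percent_identity(query, homolog)
changed = changed_functional_sites(query, homolog, functional_sites)
print(f"{identity:.2f} % identity, changed functional sites: {changed}")
```

Here the single difference falls on one of the assumed functional columns, so transferring the homolog's function would deserve a second look despite the high identity.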

  10. Guilt by association • Prediction of subcellular localization based on classification of neighbours
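Guilt by association can be sketched as a majority vote over the annotated interaction partners of a protein. The protein names, interactions and localization labels below are hypothetical, and a plain majority vote is an assumption; the slide does not specify the classifier.

```python
from collections import Counter


def predict_by_neighbours(protein, interactions, annotations):
    """Majority vote over annotated interaction partners; None if no partner is annotated."""
    neighbours = set()
    for a, b in interactions:
        if a == protein:
            neighbours.add(b)
        elif b == protein:
            neighbours.add(a)
    # Sort so that the result does not depend on set iteration order.
    votes = Counter(annotations[n] for n in sorted(neighbours) if n in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy interaction network (hypothetical).
interactions = [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")]
annotations = {"A": "nucleus", "B": "nucleus", "C": "cytoplasm"}

prediction = predict_by_neighbours("X", interactions, annotations)
print(prediction)
```

Ties are not handled specially in this sketch; a real classifier would also want to weight neighbours by the reliability of each interaction.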

  11. Query pattern • Interactome • Non-homology protein identification using network context • Ref: Lappe M, Park J, Niggemann O, Holm L (2001) Bioinformatics Suppl 1, S149-S156

  12. Natural selection • Functional coupling leads to correlations • E.g. co-occurrence of sets of genes in species • Residues required for molecular function • Functional conservation above general sequence divergence of a family

  13. Pancreatic trypsin inhibitor (2ptc)

  14. Approaches • Evolutionary Trace • Lichtarge et al. 1996 • Sequence Space • Casari et al. 1995 • Ortholog / paralog discriminants • Mirny & Gelfand 2003

  15. Evolutionary Trace • The branchpoints separating subclades of a phylogenetic tree can specify molecular speciation events, and hence evolutionary selection of amino acids • Map trace residues to 3D structures

  16. Evaluation of Evolutionary Trace • Trace residues determined at many ranks • Trace residue sets are nested • Test of significance of trace residue at any rank • Overlap with otherwise defined functional sites • Bound ligands in 3D structures (~20 residues) • Annotated sites (~4 residues)

  17. ET assessment • Detects 3D clusters • Manual filtering and pruning of the data • Decide which subclades of the protein family to use in analysis • Exclude fragments • Original method was based on strict invariance within subclade • Automatic implementations • But manually optimized traces score higher
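The strict-invariance rule mentioned for the original Evolutionary Trace can be sketched as follows. The subclades are supplied by hand here, whereas the method derives them from the branch points of a phylogenetic tree; the toy alignment and the function name `trace_columns` are assumptions made for illustration.

```python
def trace_columns(alignment, groups):
    """Classify alignment columns given subclades (`groups`: lists of sequence names).

    Returns (conserved, class_specific): columns invariant over the whole family,
    and columns invariant within every subclade but differing between subclades.
    Gap characters are treated like any other symbol in this sketch.
    """
    length = len(next(iter(alignment.values())))
    conserved, class_specific = [], []
    for col in range(length):
        per_group = [{alignment[name][col] for name in group} for group in groups]
        if any(len(residues) != 1 for residues in per_group):
            continue  # varies inside some subclade: not a trace position
        distinct = set().union(*per_group)
        if len(distinct) == 1:
            conserved.append(col)
        else:
            class_specific.append(col)
    return conserved, class_specific


# Toy alignment with two hand-picked subclades (hypothetical).
alignment = {"a1": "GKDLA", "a2": "GKDIA", "b1": "GRDVA", "b2": "GRDLA"}
groups = [["a1", "a2"], ["b1", "b2"]]

conserved, class_specific = trace_columns(alignment, groups)
print(conserved, class_specific)
```

Repeating the call with finer and finer partitions of the tree gives the nested trace residue sets at successive ranks; mapping the returned columns onto a 3D structure is a separate step not shown here.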

  18. Sequence Space • Aligned protein sequences represented as vectors in a high-dimensional space • Each amino acid type at each column of the MSA is a unique point in Sequence Space • Dimension reduction by Principal Components Analysis • Cluster proteins • Based on their sequence identity • Map residues in the same space • Direction points to association with protein group

  19. A 3D object

  20. PCA projection of the 3D object • New axes are linear combinations of the original axes

  21. Coding of amino acids

  22. Sequence vector representation

  23. Interpretation • 1st axis represents the whole family • 2nd, 3rd, …, 6th axes represent subclassifications • Subfamily-specific residues are found at the tips of a polygon • Common residues shared by several subfamilies are found along the edges of a polygon • Many unspecific residues at origin
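A minimal sketch of the Sequence Space idea, assuming a one-hot coding of amino acids and a singular value decomposition without mean-centering. The slides do not state the exact coding or centering, so both are assumptions; leaving the matrix uncentred is chosen because it makes the first axis capture what all sequences share. The toy alignment is invented.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(alignment):
    """One-hot encode aligned sequences: one row per protein, 20 columns per position."""
    length = len(alignment[0])
    X = np.zeros((len(alignment), length * len(AMINO_ACIDS)))
    for i, seq in enumerate(alignment):
        for pos, aa in enumerate(seq):
            j = AMINO_ACIDS.find(aa)
            if j >= 0:  # gaps and unknown symbols stay all-zero
                X[i, pos * len(AMINO_ACIDS) + j] = 1.0
    return X


def sequence_space(alignment, n_axes=3):
    """Project proteins and (position, amino acid) pairs onto the same leading axes."""
    X = encode(alignment)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_axes, len(S))
    proteins = U[:, :k] * S[:k]  # one row of coordinates per protein
    residues = Vt[:k].T          # one row per (position, amino acid) pair
    return proteins, residues


# Toy alignment: two sequences with K and two with R at the second column.
alignment = ["GKDL", "GKDI", "GRDV", "GRDL"]
proteins, residues = sequence_space(alignment)
print(np.round(proteins, 2))
```

Because proteins and residues share the same axes, a residue whose coordinates point towards one cluster of proteins is a candidate subfamily-specific residue, which is the reading given in the interpretation slide.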

  24. Protein clustering

  25. Residue clustering

  26. Selection of residues & proteins

  27. Orthologs and paralogs • Use of model organisms: identical physiology?

  28. Summary • Functional groupings of proteins • Phylogenetic lineage • Orthologs / paralogs • Clustering by general sequence similarity • Residues associated with above groupings • Intra-group conservation • Inter-group variation • Neutral residues behave randomly

  29. Function = interactions • Protein-protein interactions • Co-evolution of interacting proteins • Comparative genomics

  30. Experimental methods • Y2H = yeast-two-hybrid • Ex vivo, binary interactions • Interaction must occur in the nucleus • Autoactivation (5-10 % of random ORFs) • Posttranslational modifications • AP/MS = affinity purification / Mass Spectrometry • Purified complexes • PChips = protein microarrays • In vitro • Covalent attachment to solid support • Screening with fluorescently labelled probes (e.g. proteins or lipids)

  31. Small part of an interaction network

  32. NewScientist, 13. April 2002, David Cohen about the work by Barabasi, Albert et al.

  33. Predicting interactions • co-evolution • comparison of genomes • gene order on the chromosome • phylogenetic profiles • gene fusion

  34. Co-evolution • multiple sequence alignment, look for correlated mutations • proteins with many interactions evolve more slowly • two phylogenetic trees, look for pairs
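One common way to look for correlated mutations is the mutual information between two alignment columns, taken over the same set of species. The slide does not name a statistic, so mutual information is an assumption here, and the columns are toy data.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


# Each string is one column; position i belongs to species i (hypothetical data).
column_in_protein_a = "AAVV"
covarying_column = "DDEE"    # changes together with the first column
independent_column = "DEDE"  # changes independently of the first column

mi_coupled = mutual_information(column_in_protein_a, covarying_column)
mi_independent = mutual_information(column_in_protein_a, independent_column)
print(mi_coupled, mi_independent)
```

With real alignments, the shared phylogeny inflates such scores, so raw values should be compared against a background from non-interacting pairs.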

  35. Comparative genomics • Correlated genomic context between orthologous genes reveals functional couplings • Conserved gene order (conserved synteny) • Coupled gene loss / preservation (phylogenetic profiles) • Gene fusion events

  36. Conserved synteny • Chromosomal rearrangements randomize gene order over the course of evolution • Groups of genes that have a similar biological function tend to remain localized in a group or cluster • Bacterial operons allow coordinated regulation of gene expression from a common promoter • Eukaryotic clusters observed, too
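Conserved gene order can be sketched by counting pairs of orthologous genes that are adjacent in several genomes. The genomes below are invented, genes are assumed to be already mapped to shared ortholog names, and strand and chromosome circularity are ignored.

```python
from collections import Counter


def conserved_neighbours(genomes, min_genomes=2):
    """Return gene pairs that are chromosomal neighbours in at least `min_genomes` genomes."""
    counts = Counter()
    for order in genomes.values():
        # A set per genome, so each pair is counted once per genome.
        pairs = {frozenset((a, b)) for a, b in zip(order, order[1:]) if a != b}
        counts.update(pairs)
    return {tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes}


# Gene order along one chromosome per genome (hypothetical ortholog names).
genomes = {
    "genome1": ["a", "b", "c", "d"],
    "genome2": ["c", "b", "a", "e"],
    "genome3": ["d", "a", "b", "f"],
}

conserved = conserved_neighbours(genomes)
print(sorted(conserved))
```

Pairs that survive in many distantly related genomes are the ones least likely to be adjacent by chance, which is why the threshold is exposed as a parameter.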

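  37. Phylogenetic profiling • Proteins encoded in each genome: yeast (ye): p1 p2 p3 p5 p6 p8 • H. influenzae (hi): p1 p4 p5 • E. coli (ec): p2 p3 p4 p5 p7
      Presence (1) / absence (0) profiles:
            ye hi ec
        P1   1  1  0
        P2   1  0  1
        P3   1  0  1
        P4   0  1  1
        P5   1  1  1
        P6   1  0  0
        P7   0  0  1
        P8   1  0  0
      The same profiles sorted so that identical patterns are adjacent:
            ye hi ec
        P7   0  0  1
        P4   0  1  1
        P6   1  0  0
        P8   1  0  0
        P2   1  0  1
        P3   1  0  1
        P1   1  1  0
        P5   1  1  1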
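The presence/absence profiles on this slide can be grouped programmatically: proteins with identical bit-vectors are candidates for functional coupling. The values are taken from the slide; the function names are illustrative.

```python
from collections import defaultdict

# Presence (1) / absence (0) in yeast, H. influenzae, E. coli, as on the slide.
profiles = {
    "P1": (1, 1, 0), "P2": (1, 0, 1), "P3": (1, 0, 1), "P4": (0, 1, 1),
    "P5": (1, 1, 1), "P6": (1, 0, 0), "P7": (0, 0, 1), "P8": (1, 0, 0),
}


def group_by_profile(profiles):
    """Map each distinct profile to the sorted list of proteins that share it."""
    groups = defaultdict(list)
    for protein, profile in sorted(profiles.items()):
        groups[profile].append(protein)
    return dict(groups)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


linked = {
    profile: members
    for profile, members in group_by_profile(profiles).items()
    if len(members) > 1
}
print(linked)
```

The `hamming` helper allows near-identical profiles to be linked as well, which matters when the presence/absence calls themselves are noisy.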

  38. Observations - phyloprofiles • Bit-vectors sensitive to noise in gene status assignment • Specific patterns generated mainly from bacterial gene loss / horizontal transfer • Eukaryotic species have larger genomes and large numbers of eukaryote-specific protein families

  39. Gene fusion Domain swapping

  40. Some details • 6,809 interactions predicted for E. coli based on gene fusions • 321 (~5 %) overlap with predictions by phylogenetic profile method • Eight times more than random • Promiscuous modules (SH2, SH3, etc.) • 5 % of domains made more than 25 links to other proteins • Fusions counted within remaining set of 95 %
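The filtering of promiscuous modules and the quoted overlap can be sketched as below. The threshold of 25 links and the counts 321 and 6,809 come from the slide; the link list is toy data with hypothetical domain names, and each link is assumed to be listed once.

```python
from collections import Counter


def drop_promiscuous(links, max_links=25):
    """Remove links touching any domain that has more than `max_links` links."""
    n_links = Counter()
    for a, b in links:
        n_links[a] += 1
        n_links[b] += 1
    promiscuous = {d for d, n in n_links.items() if n > max_links}
    kept = [(a, b) for a, b in links if a not in promiscuous and b not in promiscuous]
    return kept, promiscuous


# Toy data: one hub domain fused to 30 partners, plus one specific fusion link.
links = [("hub", f"dom{i}") for i in range(30)] + [("geneA", "geneB")]
kept, promiscuous = drop_promiscuous(links)

# Overlap with phylogenetic-profile predictions, using the figures on the slide.
overlap_fraction = 321 / 6809
print(kept, promiscuous, f"{overlap_fraction:.1%}")
```

The overlap works out to roughly 4.7 %, which the slide rounds to ~5 %.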

  41. Observations – gene fusion • Marcotte et al. (Science 285:751-753, 1999) predicted novel interactions for 50 % of yeast proteins using gene fusion information in any homologous proteins • Enright et al. (Nature 402:86-90, 1999) considered orthologs with higher signal-to-noise ratio but only 7 % coverage

  42. Integrated predictions • Predictions by conserved synteny, phylogenetic profiles and gene fusion are largely additive • small overlap • Combined score • Calibrated against same / different KEGG map • STRING server • Predictions for about 50 % of genes from complete genomes • http://www.bork.embl-heidelberg.de/STRING/
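One simple way to combine largely additive evidence is to treat each method's score as an independent probability that two genes are functionally linked. This is an assumption made for illustration and not necessarily the scoring used by the STRING server; the example scores are invented.

```python
from math import prod


def combined_score(scores):
    """Probability that at least one independent line of evidence is correct."""
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must be probabilities in [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical per-method scores for one gene pair:
# conserved synteny, phylogenetic profiles, gene fusion.
evidence = [0.6, 0.3, 0.2]
score = combined_score(evidence)
print(round(score, 3))
```

Under this rule a pair supported weakly by several methods can outrank a pair supported by one method alone, which fits the observation that the three predictors overlap little. The per-method scores would first need calibrating, for example against membership of the same KEGG map as the slide mentions.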
