Functional Linkages between Proteins
Introduction • Piles of Information, Flakes of Knowledge • E. coli, S. cerevisiae, Drosophila • [Figure: a long stretch of raw DNA sequence (AGCATCCGACTAGCATCAGCTAGC...)]
Data Analysis • Traditional Methods (Experiments & Sequence Homology) → the function of a protein • New Computational Methods → functional linkages between proteins
What does Functional Linkage mean? 1) A common structural complex 2) A common metabolic pathway 3) A common biological process 4) All answers are correct
New Computational Methods • Phylogenetic Profile Method • Rosetta Stone Method • Chromosomal Proximity Method • COG Database
Phylogenetic Profile Method [figure: phylogenetic profile bits, e.g. 1 1 1]
Phylogenetic Profile Method Why Should it Work? • Biologically: Similar profiles → likelihood of a common pathway or complex • Mathematically: N genomes → 2^N possible profiles → a unique characterization
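The profile idea can be sketched in a few lines of Python. This is only a minimal illustration, not the published method: the genome names, the presence/absence data, and the `max_mismatches` threshold are all hypothetical. Each protein gets one bit per genome (1 if a homolog is present), and proteins whose profiles match are proposed as functionally linked.

```python
from itertools import combinations

# Hypothetical data: protein -> set of genomes that contain a homolog.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D"]
PRESENCE = {
    "P1": {"genome_A", "genome_C"},
    "P2": {"genome_A", "genome_C"},
    "P3": {"genome_A", "genome_B", "genome_D"},
    "P4": {"genome_B", "genome_D"},
}

def profile(protein):
    """Return the phylogenetic profile as a tuple of 0/1 bits, one per genome."""
    return tuple(int(g in PRESENCE[protein]) for g in GENOMES)

def hamming(p, q):
    """Count the genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def linked_pairs(max_mismatches=0):
    """Pairs of proteins whose profiles differ in at most max_mismatches genomes."""
    profiles = {name: profile(name) for name in PRESENCE}
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_mismatches
    ]

print(profile("P1"))      # (1, 0, 1, 0)
print(linked_pairs())     # [('P1', 'P2')]
print(2 ** len(GENOMES))  # 16 possible profiles for N = 4 genomes
```

With only 4 genomes there are 16 possible profiles, so matches by chance are common; the argument on the slide relies on N being large enough that 2^N far exceeds the number of proteins.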
Rosetta Stone Method (= Domain Fusion Analysis) • Interacting proteins have homologs in another organism fused into a single protein chain
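A rough sketch of how domain fusion could be detected from homology search results follows. The hit table, the coordinates, and the `max_overlap` parameter are hypothetical; a real analysis would start from alignment output and apply statistical filters. Two different query proteins are reported as a candidate pair when they match non-overlapping regions of the same fused protein in another organism.

```python
from itertools import combinations

# Hypothetical homology hits: (query protein, fused protein, start, end),
# with coordinates given on the fused protein.
HITS = [
    ("A", "F1", 1, 200),
    ("B", "F1", 210, 450),
    ("C", "F1", 150, 300),
    ("D", "F2", 1, 100),
]

def rosetta_stone_pairs(hits, max_overlap=0):
    """Return (protein, protein, fused protein) triples for candidate linkages."""
    by_target = {}
    for query, target, start, end in hits:
        by_target.setdefault(target, []).append((query, start, end))

    pairs = set()
    for target, regions in by_target.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Number of residues shared by the two matched regions
            # (zero or negative when the regions do not overlap).
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                pairs.add((min(q1, q2), max(q1, q2), target))
    return sorted(pairs)

print(rosetta_stone_pairs(HITS))  # [('A', 'B', 'F1')]
```

Here A and B cover separate parts of F1, so F1 acts as the "Rosetta Stone" for the pair; C overlaps both and is more likely a homolog of the same domain than a fusion partner.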
Rosetta Stone Method Why Should it Work? • Experimentally: E. coli has ~4300 proteins; ~6800 pairs are similar to a single protein • Biologically:
Rosetta Stone Method Validation Tests (E. coli): • Annotation of proteins from the SWISS-PROT database (68% vs. 15%) • Database of Interacting Proteins (6.4%) • Phylogenetic Profile Method (5% vs. 0.6%)
Models’ Success & Failure [figure: found vs. predicted]
Rosetta Stone Method False Negatives 1) Interactions that have evolved through other mechanisms, i.e. there never was a fusion 2) The fused protein has disappeared during evolution
Rosetta Stone Method False Positives 1) Proteins have been fused to regulate co-expression 2) Can’t distinguish between binding and non-binding homologs 3) Functional interaction rather than a physical interaction
Rosetta Stone Method • Reducing Errors
Rosetta Stone Method • Reconstruction of metabolic pathways
Orthologs vs. Paralogs • Orthologs: genes in different species that evolved from a common ancestral gene by speciation • Paralogs: genes related by duplication within a genome
Chromosomal Proximity Proximate Genes • On the same strand • Within 300 bp, or • Respective paralogs within 300 bp • Inferred link: genes whose orthologs are close in at least three phylogenetic groups
Chromosomal Proximity • Direct Link: two proximate genes that are also proximate in at least two other phylogenetic groups • Indirect Link: genes whose orthologs are close in at least three other phylogenetic groups
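The proximity and direct-link definitions can be sketched as below. The `Gene` record, the way the gap is measured (from the end of the upstream gene to the start of the next one), and the example groups are assumptions made for illustration; only the 300 bp threshold, the same-strand requirement, and the "at least two other phylogenetic groups" rule come from the slides.

```python
from dataclasses import dataclass

MAX_GAP_BP = 300  # proximity threshold from the slides

@dataclass(frozen=True)
class Gene:
    name: str
    strand: str  # "+" or "-"
    start: int
    end: int

def proximate(g1, g2, max_gap=MAX_GAP_BP):
    """True if both genes are on the same strand and at most max_gap bp apart.

    Assumption: the gap runs from the end of the upstream gene to the start
    of the downstream gene, so overlapping genes count as proximate.
    """
    if g1.strand != g2.strand:
        return False
    first, second = sorted((g1, g2), key=lambda g: g.start)
    return second.start - first.end <= max_gap

def is_direct_link(pair, home_group, proximate_by_group, min_other_groups=2):
    """True if the pair is proximate at home and in enough other groups.

    proximate_by_group maps a phylogenetic group name to the set of
    ortholog-family pairs (as frozensets) that are proximate in that group.
    """
    pair = frozenset(pair)
    if pair not in proximate_by_group.get(home_group, set()):
        return False
    others = sum(
        1
        for group, pairs in proximate_by_group.items()
        if group != home_group and pair in pairs
    )
    return others >= min_other_groups

# Hypothetical example data.
a = Gene("geneA", "+", 1000, 1900)
b = Gene("geneB", "+", 2100, 3000)
c = Gene("geneC", "-", 3100, 3900)
groups = {
    "group_1": {frozenset({"famA", "famB"})},
    "group_2": {frozenset({"famA", "famB"})},
    "group_3": {frozenset({"famA", "famB"}), frozenset({"famB", "famC"})},
    "group_4": set(),
}
print(proximate(a, b))                                     # True (gap of 200 bp)
print(proximate(b, c))                                     # False (different strands)
print(is_direct_link(("famA", "famB"), "group_1", groups)) # True
print(is_direct_link(("famB", "famC"), "group_3", groups)) # False
```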
Chromosomal Proximity Why Should it Work? • Biologically: Conservation of proximity across multiple genomes → linked function • Logically: How likely is it that two genes are randomly proximate?
Chromosomal Proximity Method’s Reliability:
Chromosomal Proximity Validation: • 1586 links were detected between ortholog families • KEGG: 80% in the same biological pathway • COG: 67% in the same functional category
Chromosomal Proximity • Total validated links per genome: 380 direct, 352 inferred
The COG Database • Clusters of Orthologous Groups • COGs creation • Each COG contains proteins that have evolved from an ancestral protein
The COG Database Current Numbers (2004) • 43 complete genomes • 30 phylogenetic groups • 2223 phylogenetic patterns • 17 functional categories • 3307 COGs • 74059 proteins, 71% of total
The COG Database How can we use it? Direct Information • Annotation of Proteins (group and individual) • Phylogenetic Patterns • Multiple Alignment
The COG Database How can we use it? Detecting Missed Genes • Patterns that contain all but one • Mostly small proteins
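One possible reading of "patterns that contain all but one" is a COG that is represented in every genome of a set except a single one, which then becomes a candidate for a gene missed during annotation. The sketch below follows that reading; the genome names, COG identifiers, and patterns are hypothetical and are not taken from the COG database.

```python
# Hypothetical genome set and COG phylogenetic patterns
# (COG -> genomes in which the COG has a member).
GENOMES = ("g1", "g2", "g3", "g4")
COG_PATTERNS = {
    "COG_x": {"g1", "g2", "g3", "g4"},
    "COG_y": {"g1", "g2", "g4"},
    "COG_z": {"g1", "g2"},
}

def missed_gene_candidates(patterns, genomes):
    """Map each COG that is absent from exactly one genome to that genome."""
    all_genomes = set(genomes)
    candidates = {}
    for cog, present in patterns.items():
        missing = all_genomes - present
        if len(missing) == 1:
            candidates[cog] = next(iter(missing))
    return candidates

print(missed_gene_candidates(COG_PATTERNS, GENOMES))  # {'COG_y': 'g3'}
```

A flagged genome would then be searched again at the sequence level for an unannotated member of the COG; the slide notes that such missed genes are mostly small proteins.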
The COG Database • Growth in the number of groups • Are we approaching saturation?
Reliability of the Methods • Major validation: Experimentally known linkages • Validation by “keyword recovery” search
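A keyword-recovery check can be sketched as follows, assuming it means asking how many of a protein's annotation keywords reappear among the keywords of the proteins it is linked to. The scoring formula, the keyword sets, and the links below are illustrative assumptions, not the exact procedure used in the cited papers.

```python
def keyword_recovery(links, keywords):
    """Average fraction of each protein's keywords found among its linked partners.

    links: iterable of (protein, protein) pairs predicted to be linked.
    keywords: dict mapping protein -> set of annotation keywords.
    Proteins without keywords are skipped.
    """
    partners = {}
    for a, b in links:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    scores = []
    for protein, linked in partners.items():
        own = keywords.get(protein, set())
        if not own:
            continue
        partner_keywords = set().union(*(keywords.get(p, set()) for p in linked))
        scores.append(len(own & partner_keywords) / len(own))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical annotations and predicted links.
KEYWORDS = {
    "P1": {"transport", "membrane"},
    "P2": {"transport"},
    "P3": {"kinase"},
}
LINKS = [("P1", "P2"), ("P2", "P3")]
print(keyword_recovery(LINKS, KEYWORDS))  # 0.5
```

To judge a method, the same score would be compared against the score obtained for randomly chosen protein pairs, in the spirit of the "vs." figures quoted in the validation slides.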
References • Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. Protein function in the post-genomic era. Nature. 2000 405:823-826. Review. • Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999 285:751-753. • Yanai I, Mellor JC, DeLisi C. Identifying functional links between genes using conserved chromosomal proximity. Trends Genet. 2002 18:176-179. • Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001 29:22-28. • Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997 278:631-637. • http://www.ncbi.nlm.nih.gov/COG