
Nothing in (computational) biology makes sense except in the light of evolution

Comparative genomics, genome context and genome annotation. "Nothing in (computational) biology makes sense except in the light of evolution" (after Theodosius Dobzhansky, 1970). Genome context analysis and genome annotation. Using information other than homologous relationships


Presentation Transcript


  1. Comparative genomics, genome context and genome annotation
     "Nothing in (computational) biology makes sense except in the light of evolution" (after Theodosius Dobzhansky, 1970)

  2. Genome context analysis and genome annotation
     Using information other than homologous relationships between individual genes/proteins for functional prediction (guilt by association)
     Types of context analysis:
     • phyletic patterns
     • domain fusion ("Rosetta Stone" proteins)
     • gene order conservation
     • co-expression
     • …

  3. Goals:
     • Using gene sets from complete genomes, delineate families of orthologs and paralogs: Clusters of Orthologous Groups (of genes) (COGs)
     • Using COGs, develop an engine for functional annotation of new genomes
     • Apply COGs for analysis of phylogenetic patterns

  4. COG: a group of homologous proteins such that all proteins from different species are orthologs (all proteins from the same species in a COG are paralogs)

  5. CONSTRUCTION OF COGs FOR 8 COMPLETE GENOMES
     Input: complete set of proteins from the analyzed genomes
     Step 1: Full self-comparison (BLASTPGP, no cut-off)
     Step 2: Collapse obvious paralogs
     Step 3: Detect all interspecies Best Hits (BeTs) between individual proteins or groups of paralogs
     Step 4: Detect all triangles of consistent BeTs
     Step 5: Merge triangles with common edges
     Step 6: Detect groups with multidomain proteins and isolate domains
     Repeat steps 3-5
     Output: COGs
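The triangle steps of this procedure (detecting triangles of consistent BeTs and merging triangles that share an edge) can be sketched in Python. This is only an illustration under stated assumptions, not the actual COG pipeline: the `genome:gene` protein labels, the `best_hits` table, and all function names are hypothetical, and an edge is assumed to exist whenever one protein is the best hit of the other in either direction. Paralog collapsing and domain splitting are not modeled.

```python
from itertools import combinations

def genome_of(protein):
    """Proteins are labeled 'genome:gene' (a hypothetical convention for this sketch)."""
    return protein.split(":")[0]

def bet_edges(best_hits):
    """Turn directed best hits {(query, target_genome): hit} into undirected edges.
    Assumption: a best hit in either direction is enough to draw an edge."""
    return {frozenset((query, hit)) for (query, _genome), hit in best_hits.items()}

def find_triangles(edges):
    """Return all triples of proteins from three different genomes
    in which every pair is connected by a BeT edge."""
    proteins = sorted({p for edge in edges for p in edge})
    triangles = []
    for a, b, c in combinations(proteins, 3):
        if len({genome_of(a), genome_of(b), genome_of(c)}) < 3:
            continue
        if all(frozenset(pair) in edges for pair in ((a, b), (b, c), (a, c))):
            triangles.append(frozenset((a, b, c)))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share an edge (two common proteins) using union-find."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return list(groups.values())

# Toy data: two triangles sharing the edge B:x / C:x, plus an unrelated pair.
best_hits = {
    ("A:x", "B"): "B:x", ("B:x", "C"): "C:x", ("C:x", "A"): "A:x",
    ("D:x", "B"): "B:x", ("D:x", "C"): "C:x",
    ("A:y", "B"): "B:y",
}
cogs = merge_triangles(find_triangles(bet_edges(best_hits)))
print(cogs)
```

In the toy data the pair A:y / B:y never forms a triangle, so it ends up in no cluster, which mirrors the slide's point that a triangle of BeTs is the minimal unit.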

  6. A TRIANGLE OF BeTs IS A MINIMAL, ELEMENTARY COG

  7. A RELATIVELY SIMPLE COG PRODUCED BY MERGING ADJACENT TRIANGLES

  8. A COMPLEX COG WITH MULTIPLE PARALOGS

  9. Current status of the COGs
     Prokaryotes: 11 archaea + 1 unicellular eukaryote + 46 bacteria = 58 complete genomes
       149,321 proteins; 105,861 proteins in 4075 COGs (71%)
     Eukaryotes: 4 animals + 1 plant + 2 fungi + 1 microsporidium = 8 complete genomes
       142,498 proteins; 74,093 proteins in 4822 COGs (52%)
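As a quick arithmetic check, the coverage percentages on this slide follow directly from the protein counts. The helper name `coverage` is introduced here only for illustration.

```python
def coverage(proteins_in_cogs, total_proteins):
    """Percentage of a protein set that is assigned to COGs."""
    return 100.0 * proteins_in_cogs / total_proteins

# Counts taken from the slide
prokaryote_genomes = 11 + 1 + 46
eukaryote_genomes = 4 + 1 + 2 + 1
prokaryote_coverage = coverage(105_861, 149_321)
eukaryote_coverage = coverage(74_093, 142_498)

print(prokaryote_genomes, round(prokaryote_coverage))
print(eukaryote_genomes, round(eukaryote_coverage))
```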

  10. COGnitor...

  11. …IN ACTION

  12. The Universal COGs

  13. Search for genomic determinants of hyperthermophily

  14. Search for unique archaeo-eukaryotic genes

  15. A complementary pattern: search for unique bacterial genes
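The searches on the last few slides (universal COGs, unique archaeo-eukaryotic genes, unique bacterial genes) can all be read as queries over phyletic patterns, that is, over the set of genomes in which each COG is represented. The sketch below assumes a pattern is stored as a set of genome labels; the genome labels, COG names, and the `matches` helper are hypothetical and not part of the COG tools.

```python
def matches(pattern, required=frozenset(), forbidden=frozenset()):
    """True if the COG is present in every required genome
    and absent from every forbidden genome."""
    return required <= pattern and not (forbidden & pattern)

# Hypothetical genome labels grouped by domain
ARCHAEA = frozenset({"arch1", "arch2"})
BACTERIA = frozenset({"bact1", "bact2", "bact3"})
EUKARYA = frozenset({"euk1"})
ALL_GENOMES = ARCHAEA | BACTERIA | EUKARYA

# Hypothetical phyletic patterns: COG name -> genomes where it is represented
patterns = {
    "cog_a": ARCHAEA | EUKARYA,
    "cog_b": BACTERIA,
    "cog_c": ALL_GENOMES,
    "cog_d": frozenset({"arch1", "euk1"}),
}

def select(required=frozenset(), forbidden=frozenset()):
    """Names of all COGs whose pattern satisfies the query."""
    return sorted(name for name, p in patterns.items() if matches(p, required, forbidden))

universal = select(required=ALL_GENOMES)
archaeo_eukaryotic = select(required=ARCHAEA | EUKARYA, forbidden=BACTERIA)
bacterial_only = select(required=BACTERIA, forbidden=ARCHAEA | EUKARYA)

print(universal, archaeo_eukaryotic, bacterial_only)
```

A search for determinants of hyperthermophily would have the same shape, with the required set being the hyperthermophilic genomes and the forbidden set the remaining ones.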

  16. Essential function… but holes in the phyletic pattern: strict complementary pattern

  17. Relaxed complementary pattern

  18. Relaxed complementary pattern with extra restrictions
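One way to picture strict and relaxed complementary patterns is as a test on pairs of phyletic patterns. The slides do not define "relaxed" or the "extra restrictions", so the sketch below makes its own assumption: a pair is strictly complementary when every genome has exactly one of the two COGs, and relaxed when at most `max_violations` genomes have both or neither. All names and data are hypothetical.

```python
from itertools import combinations

def violations(p, q, genomes):
    """Number of genomes that carry both COGs or neither of them."""
    both = p & q
    neither = genomes - (p | q)
    return len(both) + len(neither)

def is_strict_complement(p, q, genomes):
    """Every genome has exactly one of the two COGs."""
    return violations(p, q, genomes) == 0

def is_relaxed_complement(p, q, genomes, max_violations=1):
    """Assumed relaxation: tolerate a few genomes with both or neither."""
    return violations(p, q, genomes) <= max_violations

def complementary_pairs(patterns, genomes, max_violations=0):
    """All COG pairs whose patterns are complementary within the tolerance."""
    return [
        (a, b)
        for a, b in combinations(sorted(patterns), 2)
        if violations(patterns[a], patterns[b], genomes) <= max_violations
    ]

genomes = {"g1", "g2", "g3", "g4", "g5"}
patterns = {
    "cog_x": {"g1", "g2", "g3"},
    "cog_y": {"g4", "g5"},
    "cog_z": {"g3", "g4", "g5"},
}

strict_pairs = complementary_pairs(patterns, genomes)
relaxed_pairs = complementary_pairs(patterns, genomes, max_violations=1)
print(strict_pairs, relaxed_pairs)
```

Complementary pairs of this kind are candidates for two unrelated genes filling the same essential role, which is how a hole in a phyletic pattern can be explained.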

  19. Conservation of gene order in bacterial species of the same genus: M. genitalium vs M. pneumoniae (axes ticked 1-401 and 1-601)

  20. Conservation of gene order in closely related bacterial genera: C. trachomatis vs C. pneumoniae (axes ticked 1-801 and 1-1001)

  21. Lack of gene order conservation - even in “closely related” bacteria of the same Proteobacterial subdivision P. aeruginosa vs E. coli

  22. Genome Alignments - Method
     • Protein sets from complete genomes
     • BLAST cross-comparison
     • Table of hits
     • Pairwise genome alignment: local alignment algorithm Lamarck (gap opening penalty, gap extension penalty); statistics with Monte Carlo simulations
     • Template-anchored genome alignment
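The pairwise step can be illustrated with a generic local alignment of two gene orders. The internals of Lamarck are not given on the slide, so the sketch below is not that program: it swaps in a standard Smith-Waterman-style local alignment with affine gaps (Gotoh recurrences), scores a pair of genes as a match when the pair appears in the table of hits, and estimates significance by shuffling one gene order. The scoring values, function names, and toy data are all assumptions.

```python
import random

def align_gene_orders(order1, order2, hits, match=2.0, mismatch=-1.0,
                      gap_open=2.0, gap_extend=0.5):
    """Best local alignment score between two gene orders.
    hits: set of (gene1, gene2) pairs taken from the table of hits.
    gap_open is the cost of the first gap position, gap_extend of each further one."""
    n, m = len(order1), len(order2)
    neg_inf = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ending with a gap in order1
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ending with a gap in order2
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if (order1[i - 1], order2[j - 1]) in hits else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

def shuffle_pvalue(order1, order2, hits, trials=200, seed=0):
    """Monte Carlo estimate: fraction of shuffled gene orders that score
    at least as well as the observed one (with a +1 pseudocount)."""
    rng = random.Random(seed)
    observed = align_gene_orders(order1, order2, hits)
    shuffled = list(order2)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if align_gene_orders(order1, shuffled, hits) >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)

# Toy genomes: four homologous pairs in the same order, one inserted gene (bX).
order1 = ["a1", "a2", "a3", "a4"]
order2 = ["b1", "b2", "bX", "b3", "b4"]
hits = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}

score = align_gene_orders(order1, order2, hits)
p_value = shuffle_pvalue(order1, order2, hits)
print(score, p_value)
```

With these toy scores the best alignment pairs all four homologs and pays one gap opening for the inserted gene, giving 4 × 2.0 − 2.0 = 6.0. The affine gap model is what lets a conserved gene string survive a single insertion without being split in two.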

  23. Genome Alignments - Statistics: Distribution of conserved gene string lengths (genome pairs: cpneu-ctra, mjan-mthe, bsub-ecoli, drad-aero; x-axis: string length, 2 to 20 and >20; y-axis: 0.0 to 0.5)

  24. Genome Alignments - Statistics

     Pairwise alignments    No. strings   No. genes   % in Gen1   % in Gen2
     all homologs
       ecoli-hinf           138           566         13%         33%
       ecoli-bsub           89            322         8%          8%
       ecoli-mjan           10            30          1%          2%
     probable orthologs
       ecoli-hinf           105           482         11%         28%
       ecoli-bsub           34            168         4%          4%
       ecoli-mjan           12            33          1%          2%

  25. Genome Alignments - Statistics: Breakdown of genes in the genome
     Categories: not in gene strings; in non-conserved gene strings (directons); in conserved gene strings
     Genomes: cjej, aful, cac, hinf, tpal, ctra, hpyl, pyro, rpxx, aero, bbur, drad, uure, tmar, ecoli, bsub, mjan, mthe, mtub, mgen, nmen, aquae, cpneu, mpneu, synecho (y-axis: 0 to 5000)

  26. Genome Alignments - Statistics: Fraction of the genome in conserved gene strings, from template-anchored alignments

     Minimum   Synechocystis sp.         5%
               Aquifex aeolicus          10%
               Archaeoglobus fulgidus    13%
               Escherichia coli          14%
               Treponema pallidum        17%
     Maximum   Thermotoga maritima       23%
               Mycoplasma genitalium     24%

  27. Context-Based Prediction of Protein Functions: A Novel Translation Factor (COG0536)
     Figure labels: L21; L27; GTPase?
     GTP-binding translation factor

  28. Context-Based Prediction of Protein Functions: A Novel Translation Factor (COG0012)
     Figure labels: TGS domain-containing GTPase?; Peptidyl-tRNA hydrolase
     GTP-binding translation factor
