
(bacterial) Genome Evolution: added value of genomes



Presentation Transcript


  1. Bioinformatics and Evolutionary Genomics: Genome Evolution (I) and Genomic Context for function prediction

  2. (bacterial) Genome Evolution: added value of genomes • How does gene content evolve? • How does gene order evolve? • How important are various evolutionary dynamics of genes on a genomic scale (e.g. gene fusion, gene loss, gene duplication): moving from anecdotes to trends

  3. Genomic context / in silico interaction prediction: functionally associated proteins leave evolutionary traces of their relation in genomes

  4. Gene order evolution: • Establish orthologous relations between pairs of genomes (e.g. with the Smith-Waterman (S-W) best bidirectional hit approach) • Put them in a dotplot and color each point by the relative direction of transcription (green for the same relative direction, red for the opposite direction)
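The two steps above can be sketched in a few lines. This is a minimal illustration, assuming the pairwise alignment scores have already been computed (for example with Smith-Waterman) and are available as plain dictionaries. The gene names, scores, positions and strands are hypothetical toy data, and the function names are made up for this sketch.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other genome."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Keep only pairs where each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def dotplot_points(pairs, pos_a, pos_b, strand_a, strand_b):
    """One (x, y, colour) point per ortholog pair: green if the two genes
    have the same relative direction of transcription, red otherwise."""
    points = []
    for a, b in pairs:
        colour = "green" if strand_a[a] == strand_b[b] else "red"
        points.append((pos_a[a], pos_b[b], colour))
    return points


# Hypothetical alignment scores between genome A and genome B.
a_vs_b = {("a1", "b1"): 900, ("a1", "b2"): 300, ("a2", "b2"): 750, ("a3", "b2"): 800}
b_vs_a = {("b1", "a1"): 880, ("b2", "a3"): 790, ("b2", "a2"): 740, ("b3", "a2"): 200}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
points = dotplot_points(
    pairs,
    pos_a={"a1": 1, "a2": 2, "a3": 3},
    pos_b={"b1": 1, "b2": 2, "b3": 3},
    strand_a={"a1": "+", "a2": "+", "a3": "-"},
    strand_b={"b1": "+", "b2": "+", "b3": "-"},
)
print(pairs)
print(points)
```

In this toy data a2 is dropped because its best hit b2 prefers a3, which is the situation the bidirectional criterion is meant to filter out.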

  5. Evolution of genome organization: • In prokaryotes, genome inversions centered around the origin/terminus of replication are a major source of genome rearrangements. • This suggests that both replication forks are in close contact -> comparative genome analysis provides support for a hypothesis about genome replication • “and a close proximity of the forks would increase the probability of reciprocal recombination or transposition between sequences at the two forks. That the forks are near each other is also consistent with the 'replication factory' model based on immunolocalization of components of the replication machinery in Bacillus subtilis” (Tillier and Collins, 2000, Nat. Genet.)

  6. Gene order evolves rapidly But …

  7. Gene order evolution: differential retention of divergent / convergent gene pairs suggests that conservation implies a functional association (operons)

  8. Conserved gene order • i.e. genes that are present over ‘sufficiently large’ evolutionary distances in the same gene cluster • Contributes many reliable predictions
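As a rough illustration of the idea, the sketch below counts in how many genomes two orthologous groups (OGs) sit next to each other on the same strand. It is a simplification under stated assumptions: genomes are given as ordered lists of (OG, strand) tuples, a "cluster" is reduced to direct adjacency, and the requirement of a 'sufficiently large' evolutionary distance is approximated by a minimum number of genomes. The data and function names are hypothetical.

```python
from collections import Counter


def adjacent_pairs(genome):
    """Unordered pairs of OGs that are direct neighbours on the same strand."""
    pairs = set()
    for (og1, strand1), (og2, strand2) in zip(genome, genome[1:]):
        if strand1 == strand2 and og1 != og2:
            pairs.add(frozenset((og1, og2)))
    return pairs


def conserved_pairs(genomes, min_genomes=2):
    """OG pairs found as same-strand neighbours in at least min_genomes genomes."""
    counts = Counter()
    for genome in genomes.values():
        counts.update(adjacent_pairs(genome))
    return {tuple(sorted(pair)): n for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders for three genomes.
genomes = {
    "spA": [("OG1", "+"), ("OG2", "+"), ("OG3", "-"), ("OG4", "-")],
    "spB": [("OG4", "-"), ("OG3", "-"), ("OG9", "+"), ("OG1", "+"), ("OG2", "+")],
    "spC": [("OG2", "-"), ("OG1", "-"), ("OG3", "+"), ("OG7", "+")],
}

conserved = conserved_pairs(genomes, min_genomes=2)
print(conserved)
```

A realistic implementation would weight each genome by its evolutionary distance to the others, so that conservation between close relatives counts for less.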

  9. Conserved gene order • NB1: predicting operons is not trivial; in fact, conserved gene order or functional association is a major clue • NB2: using ‘only’ operons, without requiring conservation, results in much less reliable function prediction

  10. Comparison to pathways: conservation implies a functional association

  11. Conserved gene order: an example from metabolism of propionyl-CoA [Figure: gene clusters with the “query” and “target” genes marked]

  12. Conserved gene order: an example from metabolism of propionyl-CoA Biochemical assays confirm the function of members of COG0346 as a DL-methylmalonyl-CoA racemase

  13. Gene fusion • “Rare” (especially in prokaryotes): ~3000 linked COGs in STRING v6 (~180 genomes) • But what about domain recombination?

  14. Gene fusion • i.e. the orthologs of two genes in another organism are fused into one polypeptide • A very reliable indicator for functional interaction, partly because it is a relatively infrequent evolutionary event
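A fusion of this kind can be detected from alignment coordinates. The sketch below assumes the hits are already available as (gene, target protein, start, end) tuples, and it links two genes when they align to non-overlapping regions of the same protein in another organism. All names and coordinates are hypothetical, and real pipelines add further checks (for example that the hits are orthologs rather than promiscuous domains).

```python
from collections import defaultdict
from itertools import combinations


def fusion_links(hits):
    """Link two distinct genes that align to non-overlapping regions of one target."""
    by_target = defaultdict(list)
    for gene, target, start, end in hits:
        by_target[target].append((gene, start, end))

    links = set()
    for target, regions in by_target.items():
        for (g1, s1, e1), (g2, s2, e2) in combinations(regions, 2):
            non_overlapping = e1 < s2 or e2 < s1
            if g1 != g2 and non_overlapping:
                links.add((min(g1, g2), max(g1, g2), target))
    return sorted(links)


# Hypothetical hits: geneA and geneB cover different halves of fusedAB,
# whereas geneX and geneZ overlap on protY and are therefore not linked.
hits = [
    ("geneA", "fusedAB", 1, 250),
    ("geneB", "fusedAB", 260, 450),
    ("geneX", "protY", 1, 300),
    ("geneZ", "protY", 50, 280),
]

links = fusion_links(hits)
print(links)
```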

  15. Gene fusion: an example

  16. Gene Content Evolution

  17. What about HGT? Genome trees based on gene content: shared genes [Figure: overlap between the gene sets of Haemophilus influenzae and Escherichia coli, with the shared genes in the middle and the species-specific genes on either side]

  18. Genome trees based on gene content

      Presence / absence matrix:

            OG1  OG2  OG3  OG4  …
      sp1    1    1    0    1   …
      sp2    0    1    0    0   …
      sp3    0    0    1    1   …
      …      …    …    …    …   …

      Distance between two species:

      $$\mathrm{dist}(spA, spB) = 1 - \frac{\#\,\text{shared OGs}(spA, spB)}{\text{weighted average genome size}(spA, spB)}$$

      Shared fraction s:

            sp1  sp2  sp3  sp4  …
      sp1    1   0.2  0.4  0.2  …
      sp2         1   0.9  0.1  …
      sp3              1   0.3  …
      sp4                   1   …

      Distance d = 1 − s:

      sp1    0
      sp2   0.8   0
      sp3   0.6  0.1   0
      sp4   0.8  0.9  0.7   0

      The distance matrix is the input to neighbor joining.
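The distance on this slide can be computed directly from the presence / absence matrix. The sketch below uses the three example profiles from the slide. The slide does not say how the average genome size is weighted, so the plain arithmetic mean used here is an assumption of this sketch, as are the function names.

```python
def gene_content_distance(profile_a, profile_b):
    """1 - (# shared OGs) / (average genome size), with sizes counted in OGs.

    Assumption: the 'weighted average' is replaced by a plain arithmetic mean.
    """
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    mean_size = (sum(profile_a) + sum(profile_b)) / 2
    if mean_size == 0:
        return 1.0  # two empty profiles share nothing
    return 1 - shared / mean_size


def distance_matrix(profiles):
    """All pairwise gene content distances as a nested dictionary."""
    return {
        a: {b: gene_content_distance(profiles[a], profiles[b]) for b in profiles}
        for a in profiles
    }


# Presence / absence of OG1..OG4, taken from the example matrix on the slide.
profiles = {
    "sp1": [1, 1, 0, 1],
    "sp2": [0, 1, 0, 0],
    "sp3": [0, 0, 1, 1],
}

dist = distance_matrix(profiles)
for species, row in dist.items():
    print(species, {other: round(d, 2) for other, d in row.items()})
```

The resulting matrix could then be passed to any neighbor joining implementation to obtain the genome tree.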

  19. Genome trees based on gene content are remarkably similar to the consensus on the Tree of Life (ToL) [Figure: gene content tree with bootstrap values and a scale bar of 0.1. Taxa: M. tuberculosis, M. pneumoniae, M. genitalium, B. subtilis, T. maritima, C. pneumoniae, C. trachomatis, T. pallidum, B. burgdorferi, A. aeolicus, Synechocystis sp., A. pernix, E. coli, H. influenzae, P. horikoshii, R. prowazekii, M. thermoautotrophicum, H. pylori J99, H. pylori 26695, M. jannaschii, A. fulgidus, S. cerevisiae, C. elegans. Labeled groups: Spirochaetales, Proteobacteria, Euryarchaeota, Eukarya]

  20. Reconstruction of Gene Content

  21. Ancestral genome reconstruction of LUCA: patchy gene distributions [Figure: a patchy distribution over the tree, explained by either deletion or gain]

  22. Parsimony • Attach a cost (g) to HGT / independent gain, expressed in units of loss; find the scenario with the lowest cost • At g = 1.5, 733 genes in LUCA • At g = 2, 956 genes in LUCA • Evolution is not parsimonious: is this a minimal estimate? • Why not use gene trees?
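The trade-off between losses and gains can be illustrated with a small weighted parsimony sketch for a single gene. It is not the method behind the numbers on the slide, only a toy version under explicit assumptions: the species tree is a nested tuple, a loss costs 1, a gain costs g, and the origin of a gene that is placed in the root is also charged as one gain. The tree, the presence pattern and the function names are hypothetical.

```python
def parsimony_costs(tree, present, gain_cost):
    """Return (cost if the gene is absent at this node, cost if it is present)."""
    if isinstance(tree, str):  # leaf: the observed state is fixed
        inf = float("inf")
        return (inf, 0.0) if present[tree] else (0.0, inf)

    cost_absent = cost_present = 0.0
    for child in tree:
        child_absent, child_present = parsimony_costs(child, present, gain_cost)
        # Absent parent: child stays absent, or gains the gene (cost g).
        cost_absent += min(child_absent, child_present + gain_cost)
        # Present parent: child keeps the gene, or loses it (cost 1).
        cost_present += min(child_present, child_absent + 1.0)
    return cost_absent, cost_present


def in_root(tree, present, gain_cost):
    """True if placing the gene in the root is strictly cheaper.

    Assumption of this sketch: the origin of the gene in the root counts as one gain.
    """
    cost_absent, cost_present = parsimony_costs(tree, present, gain_cost)
    return cost_present + gain_cost < cost_absent


# Hypothetical species tree and a patchy presence pattern for one gene.
tree = ((("A", "B"), "C"), ("D", "E"))
present = {"A": True, "B": False, "C": True, "D": True, "E": False}

for g in (1.0, 1.5):
    print(g, in_root(tree, present, g))
```

For this toy gene, raising the gain cost from 1.0 to 1.5 moves it into the root, which mirrors the slide's observation that a higher g assigns more genes to LUCA.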

  23. Nice results, e.g. nucleotide biosynthesis

  24. Another attempt to reconstruct the genome of LUCA • Over 1000 gene families, of which more than 90% are also functionally characterized • A fairly complex genome similar to those of free-living prokaryotes, with a variety of functional capabilities including metabolic transformation, information processing, membrane/transport proteins and complex regulation

  25. Presence / absence of genes: gene content → co-evolution (the easy case, few genomes) • Differences between gene content reflect differences in phenotypic potentialities • Genomes share genes for phenotypes they have in common

  26. Qualitative differential genome analysis: • Find “pathogen-specific” proteins that can serve as drug targets • Relate the differences between genomes to the differences in the phenotypes

  27. Three-way comparisons (Huynen et al., 1998, FEBS Lett)

  28. Convergence in functional classes of gene content in small intracellular bacterial parasites (Zomorodipour & Andersson, FEBS Letters 1999)

  29. Although we can qualitatively interpret the variations in shared gene content in terms of the phenotypes of the species, quantitatively they depend on the relative phylogenetic positions of the species. The more closely related two species are, the larger the fraction of their genes they share.

  30. Presence / absence of genes L. innocua (non-pathogen) L. monocytogenes (pathogen)

  31. Occurrence of genes: genes involved in pathogenicity, L. monocytogenes (pathogenic) vs. L. innocua (non-pathogenic)

  32. Generalization: phylogenetic profiles / co-occurrence

                species 1  species 2  species 3  species 4  species 5  …
      Gene 1:       1          0          1          1          0      1
      Gene 2:       1          1          0          0          1      0
      Gene 3:       0          1          0          0          1      0
      …

  33. Co-occurrence of genes across genomes • i.e. two genes have the same presence / absence pattern over multiple genomes • AKA phylogenetic profiles • NB: complete genomes are needed to establish absence • Correction for phylogenetic signal needed → count events
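A minimal way to compare presence / absence patterns is the Hamming distance between profiles, sketched below on the three example profiles from the preceding slide. This is only an illustration: it treats every genome as an independent observation, so it deliberately omits the correction for phylogenetic signal that the slide calls for. The function names are made up for this sketch.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Number of genomes in which the two genes differ in presence / absence."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def rank_profile_pairs(profiles):
    """All gene pairs, most similar presence / absence pattern first."""
    ranked = [
        (hamming(profiles[g1], profiles[g2]), g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    return sorted(ranked)


# Example profiles (one column per genome), as given in the table on slide 32.
profiles = {
    "Gene 1": [1, 0, 1, 1, 0, 1],
    "Gene 2": [1, 1, 0, 0, 1, 0],
    "Gene 3": [0, 1, 0, 0, 1, 0],
}

ranking = rank_profile_pairs(profiles)
for distance, gene_a, gene_b in ranking:
    print(gene_a, gene_b, distance)
```

Gene 2 and Gene 3 differ in a single genome and would be the candidate functional link here. Gene 1 and Gene 3 are perfectly complementary, which this simple distance treats as maximally dissimilar.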

  34. Predicting the function of frataxin, a disease-gene protein of unknown function, using co-occurrence of genes across genomes • Friedreich’s ataxia • No (homolog with) known function

  35. Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly [Figure: presence / absence of iron-sulfur cluster assembly genes across genomes, including H. pylori, E. coli and D. melanogaster; the remaining species labels are not recoverable from the transcript. Gene labels: hscB / Jac1, hscA / ssq1, iscS / Nfs1, iscU / Isu1-2, iscA / Isa1-2, fdx / Yah1, RnaM, IscR, Hyp, Atm1, Nfu1, Arh1, cyaY / Yfh1]

  36. Iron-Sulfur (2Fe-2S) cluster in the Rieske protein

  37. Prediction: ~Confirmation:

  38. Genomic context / in silico interaction prediction: functionally associated proteins leave evolutionary traces of their relation in genomes

  39. Evolutionary rate (Chen & Dokholyan, TiG 2006)

  40. Co-evolution: mirrortree (Pazos & Valencia, PEDS 2001)

  41. Co-evolution: mirrortree
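The core of the mirrortree idea is that interacting protein families tend to have similar trees, which can be measured as the correlation between their pairwise distance matrices. The sketch below shows only that correlation step, assuming both matrices are symmetric and list the same species in the same order. The matrices and function names are hypothetical toy values, not data from the cited paper.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b):
    """Correlation between the pairwise distances of two protein families."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distance matrices over the same four species.
family_a = [[0.0, 0.1, 0.4, 0.5], [0.1, 0.0, 0.3, 0.6], [0.4, 0.3, 0.0, 0.2], [0.5, 0.6, 0.2, 0.0]]
family_b = [[2 * d for d in row] for row in family_a]  # same tree shape, faster rate
family_c = [[0.0, 0.6, 0.2, 0.1], [0.6, 0.0, 0.5, 0.1], [0.2, 0.5, 0.0, 0.4], [0.1, 0.1, 0.4, 0.0]]

score_ab = mirrortree_score(family_a, family_b)
score_ac = mirrortree_score(family_a, family_c)
print(round(score_ab, 3), round(score_ac, 3))
```

Because both families share the underlying species tree, some correlation is expected even without interaction, so a practical score needs a correction for that background similarity.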

  42. Integrating genomic context scores into one single score (post-hoc) • Compare each individual method against an independent benchmark (KEGG), and find “equivalency” • Multiply the chances that two proteins are not interacting and subtract from 1; naive Bayesian, i.e. assuming independence [Figure: fraction of pairs in the same KEGG map (y-axis, 0 to 1) against score (x-axis, 0 to 1), with curves for Fusion, Gene Order and Co-occurrence]
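The combination rule in the second bullet can be written out in a few lines. The sketch assumes each method's raw score has already been calibrated against the benchmark, so that every input is a probability between 0 and 1. The example probabilities and the function name are hypothetical.

```python
from math import prod


def combine_scores(probabilities):
    """Naive Bayesian combination: 1 minus the product of the chances of no interaction.

    Assumes the individual lines of evidence are independent.
    """
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical calibrated probabilities for one protein pair.
evidence = {"fusion": 0.6, "gene_order": 0.5, "co_occurrence": 0.2}

combined = combine_scores(evidence.values())
print(round(combined, 2))  # 1 - 0.4 * 0.5 * 0.8 = 0.84
```

The combined score is never lower than the strongest single line of evidence. Where the methods are in fact correlated, the independence assumption will overstate the confidence.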
