
Trees and Forests: The Ascomycota yeast as a paradigm for comparative genomics

Aviv Regev, Bauer Center for Genomics Research, Harvard University


Presentation Transcript


  1. Trees and Forests: The Ascomycota yeast as a paradigm for comparative genomics • Aviv Regev, Bauer Center for Genomics Research, Harvard University

  2. Comparative Genomics • Gain better understanding of a model genome by comparing to others • Reconstruct and study the evolutionary process • Focus on conservation or divergence?

  3. Ascomycota fungi (species tree figure): cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe

  4. Saccharomyces sensu stricto • 5-20 million years • Sufficient conservation to align • Sufficient divergence to identify conserved functional elements (Figure: the species tree, annotated ~5M and ~20M.)

  5. Genome alignment • Goal: match genes (orthologs) unambiguously • Orthologs: two genes in two species that trace their ancestry to the same gene in the last common ancestor • Paralogs: two genes in the same species that trace their ancestry to the same gene duplication (Figure: species tree and gene tree.)

  6. Finding unambiguous orthologs • Step 1: Build a bipartite gene similarity graph between the two species, with edges weighted by % similarity and alignment length • Step 2: Eliminate all edges that are less than 80% of the maximum-weight edge (Figure: the S. cerevisiae and S. paradoxus graph before and after filtering.)
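The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which maximum an edge is compared against, so the sketch assumes the strongest edge at each of the edge's two genes, and it keeps an edge only if it reaches 80% of both. The function name and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def filter_edges(edges, keep_fraction=0.8):
    """Drop weak edges from a bipartite gene similarity graph.

    edges maps (gene_in_species_1, gene_in_species_2) -> similarity weight.
    Assumption: an edge survives only if its weight is at least
    keep_fraction of the best edge at BOTH of its endpoints.
    """
    best_1 = defaultdict(float)  # best weight seen for each species-1 gene
    best_2 = defaultdict(float)  # best weight seen for each species-2 gene
    for (g1, g2), w in edges.items():
        best_1[g1] = max(best_1[g1], w)
        best_2[g2] = max(best_2[g2], w)
    return {
        (g1, g2): w
        for (g1, g2), w in edges.items()
        if w >= keep_fraction * best_1[g1] and w >= keep_fraction * best_2[g2]
    }


# Hypothetical toy graph: cerA has a strong and a weak candidate match.
toy_edges = {("cerA", "parA"): 100.0, ("cerA", "parB"): 60.0, ("cerB", "parB"): 90.0}
filtered = filter_edges(toy_edges)
```

In the toy graph the weak cerA/parB edge is removed, leaving two one-to-one matches.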

  7. Finding unambiguous orthologs • Step 3: Use any unambiguous (one-to-one) matches to build blocks of conserved gene order (synteny blocks) • Use these blocks to resolve additional ambiguities by keeping matches within blocks (Figure: S. cerevisiae and S. paradoxus.)

  8. Finding unambiguous orthologs • Step 4: Find the Best Unambiguous Subset (BUS), a set of genes such that • all best matches of any gene within the set are contained within the set • no best match of a gene outside the set is contained within the set (Figure: S. cerevisiae and S. paradoxus.)
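One way to read the two BUS conditions is that the smallest sets satisfying them are the connected components of the graph that links every gene to its best match(es). The sketch below follows that reading; it is an interpretation of the slide, not the published algorithm, and the function names and gene identifiers are hypothetical. Gene names are assumed to be unique across the two species.

```python
from collections import defaultdict


def best_matches(edges):
    """For each gene, return the partner(s) tied for its highest edge weight."""
    by_gene = defaultdict(list)
    for (g1, g2), w in edges.items():
        by_gene[g1].append((w, g2))
        by_gene[g2].append((w, g1))
    result = {}
    for gene, hits in by_gene.items():
        top = max(w for w, _ in hits)
        result[gene] = {partner for w, partner in hits if w == top}
    return result


def best_unambiguous_subsets(edges):
    """Group genes into connected components of the best-match graph."""
    adjacency = defaultdict(set)
    for gene, partners in best_matches(edges).items():
        for partner in partners:
            adjacency[gene].add(partner)
            adjacency[partner].add(gene)
    seen, subsets = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over best-match links
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(adjacency[gene] - component)
        seen |= component
        subsets.append(component)
    return subsets


toy_edges = {("c1", "p1"): 100, ("c2", "p2"): 90, ("c2", "p3"): 85, ("c3", "p3"): 95}
subsets = best_unambiguous_subsets(toy_edges)
```

Here the c2/p3 edge is nobody's best match, so it does not join any two subsets, and each resulting subset is a one-to-one pair.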

  9. Large scale genome evolution • Most ORFs have a clear match • Dense anchoring landscape • Clear blocks of synteny

  10. Large scale genome evolution • Rapid structural evolution of telomeres • Non-telomeric inversions flanked by tRNA sequences • Reciprocal translocations occur between Ty elements

  11. Nucleotide level alignment • Variation in genic and intergenic regions

  12. Nucleotide level alignment • Can this signal be used for the identification of functional elements?

  13. Gene identification • Idea: True protein-coding ORFs (but not spurious ones) will be under strong selection to preserve the open reading frame (Figure label: rejected gene.)

  14. The RFC score • RFC (reading frame conservation): the average percentage of nucleotides that are in the same frame within overlapping windows of the alignment.
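A simplified version of this score can be written down directly. The sketch below is an assumption-laden illustration: it scores one whole pairwise alignment rather than averaging over overlapping windows, it takes the best of the three possible frame offsets, and the function name is hypothetical.

```python
def rfc_score(aligned_ref, aligned_other):
    """Percent of aligned nucleotides that share a frame (single window).

    Both arguments are rows of one pairwise alignment, with '-' for gaps.
    Simplification: no overlapping windows; the best of the three possible
    frame offsets is reported.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("alignment rows must have equal length")
    pos_ref = pos_other = 0
    counts = [0, 0, 0]  # aligned nucleotides seen at each frame offset
    for a, b in zip(aligned_ref, aligned_other):
        if a != "-" and b != "-":
            counts[(pos_ref - pos_other) % 3] += 1
        if a != "-":
            pos_ref += 1
        if b != "-":
            pos_other += 1
    total = sum(counts)
    return 100.0 * max(counts) / total if total else 0.0


in_frame = rfc_score("ATGAAATTT", "ATG---TTT")          # 3-nt gap keeps the frame
frameshift = rfc_score("ATGAAATTTGGG", "ATG-AATTTGGG")  # 1-nt gap shifts it
```

A gap whose length is a multiple of three leaves the score at 100, while a single-nucleotide gap lowers it, which is the behavior the slide relies on to separate real ORFs from spurious ones.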

  15. The RFC score • The bimodal distribution of scores allows a species-specific rejection threshold to be defined • Each species votes, and the votes are tallied for the decision • Tested with intergenic regions

  16. An improved gene catalog • Gene validation and rejection • Novel gene identification • Gene merger • Gene boundary re-definition • Intron identification

  17. Regulatory element identification: GAL4 • GAL4 site: CGGn(11)CCG • Intergenic GAL4 occurrences have a 5-fold higher conservation rate than equivalent random motifs • Relative to equivalent random motifs, GAL4 has an 11-fold higher conservation rate in intergenic compared with genic regions • Intergenic GAL4 shows a higher conservation rate in divergent compared with convergent intergenic regions
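The conservation rate of a motif such as CGGn(11)CCG can be illustrated with a regular expression. This is a minimal sketch, assuming the rows of a multiple alignment share coordinates and that an occurrence counts as conserved only when the motif matches at the same position in every other species; the function names are hypothetical. Because the motif equals its own reverse complement, scanning one strand is enough.

```python
import re

# CGG, any 11 nucleotides, CCG; the lookahead lets overlapping sites be found.
GAL4_SITE = re.compile(r"(?=CGG[ACGT]{11}CCG)")


def motif_positions(seq):
    """Start positions of every GAL4 site in one sequence."""
    return [m.start() for m in GAL4_SITE.finditer(seq.upper())]


def conservation_rate(ref_row, other_rows):
    """Fraction of reference occurrences that match in all other rows.

    Assumption: rows come from one alignment and share coordinates.
    """
    hits = motif_positions(ref_row)
    if not hits:
        return 0.0
    conserved = sum(
        all(GAL4_SITE.match(row.upper(), pos) for row in other_rows)
        for pos in hits
    )
    return conserved / len(hits)


site = "CGG" + "ATATATATATA" + "CCG"
reference = "TT" + site + "TT"
mutated = reference.replace("CGGA", "CAGA")  # breaks the first half-site
```

Comparing `conservation_rate` on intergenic and genic alignments, and against shuffled control motifs, would give the kind of ratios quoted on the slide.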

  18. Conservation Criteria (CC) • CC1: intergenic conservation: a motif shows a significantly high conservation rate in intergenic regions

  19. Conservation Criteria (CC) • CC2: intergenic–genic conservation: a motif shows significantly higher conservation in intergenic regions than in genic regions

  20. Conservation Criteria (CC) • CC3: upstream–downstream conservation: a motif shows significantly different conservation rates when it occurs upstream compared with downstream of a gene.

  21. A motif discovery pipeline

  22. A motif discovery pipeline 72 full motifs (42 novel)

  23. Assigning motif functionality • GAL4 site: CGGn(11)CCG • 126 carbohydrate metabolism genes • Their intergenic regions make up only 2% of all intergenic regions, • but contain 7% of the occurrences of the Gal4 motif in S. cerevisiae (3.5-fold enrichment) • and 29% of the conserved occurrences across the four species (15-fold enrichment).
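The fold enrichments follow from dividing each share of motif occurrences by the 2% background share of intergenic regions. The second figure is rounded up on the slide:

```latex
\[
\frac{7\%}{2\%} = 3.5,
\qquad
\frac{29\%}{2\%} = 14.5 \approx 15 .
\]
```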

  24. Assigning motif functionality • Idea: Characterize a motif by the function of the genes adjacent to its conserved occurrences (targets) • Compare the targets to various gene sets (GO, expression, ChIP) • CC4: enriched in the intergenic regions of genes in the category • Combinatorial control: conserved co-occurrence

  25. Conclusions for this part • Identifying functional elements in a model genome • Improved gene catalog and gene boundaries • Regulatory element identification

  26. Hypothesis (1997): Whole Genome Duplication (WGD)? (Figure: the species tree, annotated ~100M.)

  27. Hypothetical resolution of WGD • A 1:2 mapping where • nearly every region in species Y would correspond to two sister regions in S. cerevisiae • the two sister regions in S. cerevisiae would contain ordered interleaving subsequences of the genes in the corresponding region of species Y • nearly every region of S. cerevisiae would correspond to one region of species Y, and thus be paired to a sister region in S. cerevisiae
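The "ordered interleaving subsequences" condition can be checked mechanically. The sketch below is a hedged illustration: it assumes each sister region has already been translated into the names of its orthologs in species Y, it requires exact coverage where the slide only says "nearly every", and it ignores inverted regions. All names are hypothetical.

```python
def is_subsequence(sub, full):
    """True if the items of sub appear in full in the same relative order."""
    remaining = iter(full)
    return all(item in remaining for item in sub)


def looks_like_wgd_region(y_order, sister_1, sister_2):
    """Check the 1:2 signature for one region of species Y.

    y_order: genes of the species Y region, in chromosomal order.
    sister_1, sister_2: the Y orthologs found in each S. cerevisiae sister
    region, in S. cerevisiae order.
    Assumptions: same orientation, and every Y gene survives in at least
    one sister region.
    """
    return (
        is_subsequence(sister_1, y_order)
        and is_subsequence(sister_2, y_order)
        and set(sister_1) | set(sister_2) == set(y_order)
    )


y_region = ["g1", "g2", "g3", "g4", "g5", "g6"]
ok = looks_like_wgd_region(y_region, ["g1", "g3", "g4"], ["g2", "g3", "g5", "g6"])
scrambled = looks_like_wgd_region(y_region, ["g3", "g1", "g4"], ["g2", "g5", "g6"])
```

In the first call g3 is retained in both sister regions, as a surviving duplicate pair would be; the second call fails because one sister region breaks the gene order.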

  28. Aligning the S. cerevisiae and K. waltii genomes • Use the BUS algorithm to identify orthologous regions. • Most regions in K. waltii mapped to two regions in S. cerevisiae with each containing matches to only a subset of the K. waltii genes

  29. Double Conserved Synteny (DCS) blocks • DCS: maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae • 253 DCS blocks, containing 75% of K. waltii genes and 81% of S. cerevisiae genes

  30. Duplication covers the whole S. cerevisiae genome

  31. What happens to genes post WGD? • 12% of paralogous gene pairs were retained • Alternative hypotheses • Ohno: one paralog maintains ancestral function, the other diverges • both would diverge more rapidly than non-duplicated genes

  32. Accelerated paralog evolution • 76 of the 457 gene pairs (17%) show accelerated protein evolution • In 95% of those accelerated evolution was confined to only one of the two paralogues

  33. Derived paralogs • Protein kinases, transcriptional regulators, metabolism • Specialized in their cellular localization or temporal expression. • Deletion was never lethal in rich medium (deletion of the “ancestral” paralogue was lethal in 18% of cases)

  34. Decelerated paralog evolution • 60 of the 457 pairs show decelerated protein evolution • ribosomal proteins (25 pairs), histone proteins (2 pairs) and translation initiation/elongation factors (4 pairs). • The paralogs tend to be very similar (98% amino acid identity versus 55% for all pairs), suggesting periodic gene conversion

  35. Conclusion • Studying evolution with comparative genomics: Proving the WGD event • Duplication “allows” one paralog to evolve fast following gene duplication, sometimes “releasing” a highly conserved function • How was the WGD resolved?

  36. How was the WGD resolved? (Figure: the species tree, annotated ~100M.)

  37. YGOB: Yeast Gene Order Browser • Resource: http://wolfe.gen.tcd.ie/ygob/ (Figure label: ancestral locus.)

  38. Gene loss • What happened to ancestral loci?

  39. Reciprocal gene loss • Reciprocal gene loss: when opposite members of a gene pair are lost in two daughter species • 176 of the 2,723 loci (6.4%) between S. cerevisiae and S. castellii • 198 (7.3%) between C. glabrata and S. castellii • 100 (3.7%) between S. cerevisiae and C. glabrata
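The definition of reciprocal loss can be made concrete with a small classifier. This is a sketch under an assumed encoding: for each ancestral locus, each species is described by the set of duplicate copies it still carries, labeled "A" and "B". The labels and the function name are hypothetical.

```python
def is_reciprocal_loss(retained_1, retained_2):
    """True when two species each kept a single, different copy.

    retained_1, retained_2: sets drawn from {"A", "B"} naming the duplicate
    copies each species still carries at one ancestral locus.
    """
    return len(retained_1) == 1 and len(retained_2) == 1 and retained_1 != retained_2


# Hypothetical loci: (copies kept in species 1, copies kept in species 2)
loci = [({"A"}, {"B"}), ({"A"}, {"A"}), ({"A", "B"}, {"B"}), ({"B"}, {"A"})]
n_reciprocal = sum(is_reciprocal_loss(s1, s2) for s1, s2 in loci)
```

Dividing such a count by the number of loci compared gives percentages of the kind listed on the slide.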

  40. Reciprocal gene loss and speciation • Bateson–Dobzhansky–Muller (BDM) model: Reciprocal gene loss promotes interspecific genomic incompatibility • For a single reciprocally lost locus that contributes to fitness, a quarter of hybrid spores will have reduced viability • The large number of reciprocal losses observed among the post-WGD species (46 encode essential genes) is ample to account for their reproductive isolation.
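The "quarter of spores" figure can be reproduced by enumerating what a spore from a hybrid can inherit. The sketch assumes the two sister loci segregate independently, and the extension to several loci further assumes that the loci are unlinked and act independently; neither assumption is stated on the slide, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def affected_fraction(n_loci=1):
    """Fraction of hybrid spores lacking the gene at one or more loci.

    Per locus: species 1 kept the copy at sister site A only, species 2 kept
    the copy at sister site B only. A spore draws each site from either
    parent; it has no copy if site A comes from species 2 and site B from
    species 1.
    Assumptions: independent segregation, independent loci.
    """
    outcomes = list(product((1, 2), repeat=2))  # (parent of site A, parent of site B)
    empty = [o for o in outcomes if o == (2, 1)]
    p_empty = Fraction(len(empty), len(outcomes))
    return 1 - (1 - p_empty) ** n_loci


one_locus = affected_fraction(1)
two_loci = affected_fraction(2)
```

With one locus the result is 1/4, matching the slide, and the fraction of affected spores rises quickly as more reciprocally lost loci are added.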

  41. Reciprocal gene loss and speciation • A precipitous loss of duplicated genes in the time interval between WGD and the first speciation event • The vast majority of reciprocal losses must have occurred at around the time of the two speciation events.

  42. Which copy is lost? • Hypothesis: The two copies are equivalent; the ‘choice’ of which copy to delete was arbitrary. • Test: ancestral loci that have been resolved independently in more than one post-WGD lineage. The two retained genes are more often orthologues than paralogues. Why?

  43. Which copy is lost? • Hypothesis: at some loci the two copies were not functionally identical at the time of duplication, and the same (“better functioning”) copy was retained on both occasions • Test: neutral gene loss is expected to be more frequent at ancestral loci that are slowly evolving or involved in highly conserved biological processes • Result: Loci in Class 3 (all reciprocally lost) evolve on average 30% slower than loci in Class 4 (no reciprocal gene loss), and are enriched in ribosome biogenesis, RNA binding, and nucleolar genes. • This further increases the potential contribution of reciprocal gene-loss loci to reproductive isolation

  44. Conclusion: Reciprocal gene loss and speciation

  45. Further reading • Comparative genomics of Ascomycota fungi • Dujon B, et al. Genome evolution in yeasts. Nature. 2004 Jul 1;430(6995):35-44. • Galagan JE, et al. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005 Dec 22;438(7071):1105-15. • Dean RA, et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature. 2005 Apr 21;434(7036):980-6. • Galagan JE, et al. The genome sequence of the filamentous fungus Neurospora crassa. Nature. 2003 Apr 24;422(6934):859-68. • Yeast Whole Genome Duplication • Wolfe, K. H. (2006). Comparative genomics and genome evolution in yeasts. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 361, 403-412. • Fischer G, Rocha EP, Brunet F, Vergassola M, Dujon B. Highly variable rates of genome rearrangements between hemiascomycetous yeast lineages. PLoS Genet. 2006 Mar;2(3):e32. • Byrne KP, Wolfe KH. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 2005 Oct;15(10):1456-61. • Conant, G. C. & Wolfe, K. H. (2006). Functional partitioning of yeast co-expression networks after genome duplication. PLoS Biol. 4, e109. • Comparative analysis and motif finding in yeasts • Tanay A, Gat-Viks I, Shamir R. A global view of the selection forces in the evolution of yeast cis-regulation. Genome Res. 2004 May;14(5):829-34. • Siddharthan R, Siggia ED, van Nimwegen E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol. 2005 Dec;1(7):e67.

  46. Further reading • Vertebrate comparative genomics (tiny sample) • Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature. 2005 Mar 17;434(7031):338-45. • Jaillon O, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004 Oct 21;431(7011):946-57. • Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006 May 4;441(7089):87-90. • Bejerano G, Siepel AC, Kent WJ, Haussler D. Computational screening of conserved genomic DNA in search of functional noncoding elements. Nat Methods. 2005 Jul;2(7):535-45. • Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. • Bejerano G, Haussler D, Blanchette M. Into the heart of darkness: large-scale clustering of human non-coding DNA. Bioinformatics. 2004 Aug 4;20 Suppl 1:I40-I48. • Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D. Ultraconserved elements in the human genome. Science. 2004 May 28;304(5675):1321-5.
