
The impact of whole genome duplications: insights from Paramecium tetraurelia


Presentation Transcript


  1. The impact of whole genome duplications: insights from Paramecium tetraurelia

  2. Genome annotation: ab initio gene predictions, a comparative approach, and 90,000 ESTs.

  3. A compact macronuclear (MAC) genome: protein-coding regions make up 78% of the genome; intergenic regions are short (average 352 bp); introns are short (average 25 bp)… but numerous: 80% of genes contain introns (average 2.9 introns per gene).

  4. Gene content: 39,642 annotated genes in P. tetraurelia. This high number is not due to annotation artefacts (controls with cDNA data, distribution of protein length, manual curation of chromosome 1, …). [Figure: bar chart of the number of genes per genome in P. tetraurelia compared with T. brucei, O. sativa, N. crassa, A. thaliana, H. sapiens, E. cuniculi, X. tropicalis, T. nigroviridis, S. cerevisiae, M. musculus, P. falciparum, D. discoideum, T. pseudonana, D. melanogaster, C. elegans and C. intestinalis.]

  5. Computing best reciprocal hits (BRH) within Paramecium proteins: Smith-Waterman (SW) comparisons of the 39,642 proteins, plus filtering, yield 13,085 pairs of proteins in BRH. Many genes belong to multigene families.
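
The BRH criterion can be illustrated with a minimal Python sketch. It assumes that all-against-all similarity scores (for example Smith-Waterman scores) are already available in a dictionary keyed by (query, subject); the function name, the toy protein identifiers and the scores are hypothetical, and the filtering step mentioned on the slide is not reproduced.

```python
from typing import Dict, List, Tuple


def best_reciprocal_hits(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit.

    `scores` maps (query, subject) to a similarity score; self-hits are ignored.
    On ties, the first hit encountered is kept.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        if query == subject:
            continue  # a protein is trivially its own best hit
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)

    pairs = set()
    for query, (_, subject) in best.items():
        # keep the pair only if the best hit is reciprocal
        if subject in best and best[subject][1] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Hypothetical toy scores: p1/p2 are reciprocal best hits, p3 and p4 are not.
toy_scores = {
    ("p1", "p2"): 900, ("p2", "p1"): 880,
    ("p1", "p3"): 300, ("p3", "p1"): 310,
    ("p3", "p4"): 200, ("p4", "p3"): 150,
    ("p4", "p2"): 400, ("p2", "p4"): 100,
}
print(best_reciprocal_hits(toy_scores))  # [('p1', 'p2')]
```

Because each protein keeps a single best hit, members of larger multigene families can fail the reciprocity test, which may be one reason why only part of the proteome ends up in BRH pairs.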

  6. BRH are found in large duplicated blocks (paralogons). Example: scaffolds 1 and 8.

  7. Building paralogons: • use a sliding window of w genes; • for each window, select a paralogous region if at least p% of the w genes are BRH with that region; • merge overlapping windows; • add syntenic genes that do not have a BRH.
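
The sliding-window procedure can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: it assumes each gene along a scaffold is annotated only with the scaffold carrying its BRH partner (or None), and the function name and toy data are hypothetical. Genes without a BRH that fall inside a merged block are included in it, which loosely mirrors the last step on the slide.

```python
from typing import Dict, List, Optional, Tuple


def find_paralogons(partners: List[Optional[str]], w: int,
                    p_percent: int) -> List[Tuple[int, int, str]]:
    """Sliding-window search for blocks of one scaffold paired with another.

    partners[i] names the scaffold carrying the BRH partner of the i-th gene
    along this scaffold, or None if that gene has no BRH.
    A window of w consecutive genes is selected for partner scaffold S if at
    least p_percent % of its w genes have their BRH on S.
    Overlapping selected windows with the same partner are merged.
    Returns (first_index, last_index, partner_scaffold) tuples.
    """
    needed = (p_percent * w + 99) // 100  # ceil(p% of w) in integer arithmetic
    windows: List[Tuple[int, int, str]] = []
    for start in range(len(partners) - w + 1):
        counts: Dict[str, int] = {}
        for scaffold in partners[start:start + w]:
            if scaffold is not None:
                counts[scaffold] = counts.get(scaffold, 0) + 1
        for scaffold, n in counts.items():
            if n >= needed:
                windows.append((start, start + w - 1, scaffold))

    # merge overlapping windows that point to the same partner scaffold
    merged: List[Tuple[int, int, str]] = []
    for start, end, scaffold in sorted(windows, key=lambda x: (x[2], x[0])):
        if merged and merged[-1][2] == scaffold and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end), scaffold)
        else:
            merged.append((start, end, scaffold))
    return merged


# Hypothetical scaffold of 10 genes, most with BRH partners on scaffold "s8".
example = [None, "s8", "s8", "s8", None, "s8", "s8", "s8", None, "s3"]
print(find_paralogons(example, w=5, p_percent=60))  # [(0, 9, 's8')]
```

With the settings of the recent WGD (w = 10, p = 61%), a window would need at least 7 of its 10 genes to have a BRH on the same partner scaffold.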

  8. Whole genome duplication (WGD). Settings: w = 10, p = 61%. Coverage: 61.3 Mb (85%), 35,503 genes (90%). Results: 24,052 genes in 2 copies (68%); 11,451 genes in 1 copy (32%); 51% of ancestral genes are still in 2 copies.
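
The figure of 51% follows from the gene counts: the 24,052 genes in 2 copies correspond to 12,026 ancestral loci, and adding the 11,451 single-copy genes gives 23,477 ancestral loci. The small Python helper below (a hypothetical name, written only to make the arithmetic explicit) reproduces this, assuming every gene of a retained pair is counted individually and every single-copy gene corresponds to one ancestral gene.

```python
def ancestral_duplicate_fraction(genes_in_pairs: int, single_copy_genes: int) -> float:
    """Fraction of pre-WGD (ancestral) genes still present in 2 copies.

    genes_in_pairs counts current genes that belong to a retained pair, so
    each ancestral locus kept in duplicate contributes 2 to it.
    """
    if genes_in_pairs % 2:
        raise ValueError("genes in pairs should come in multiples of 2")
    retained_loci = genes_in_pairs // 2
    ancestral_loci = retained_loci + single_copy_genes
    return retained_loci / ancestral_loci


frac = ancestral_duplicate_fraction(24_052, 11_451)
print(f"{frac:.1%}")  # 51.2%
```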

  9. About 1,500 recent pseudogenes are still recognizable. The length distribution of genic and intergenic sequences points to relics of more ancient pseudogenes in intergenic regions: gene duplicates are progressively lost. [Figure: frequency (%) versus sequence length (bp) for single-copy genes, intergenic regions encompassing a gene loss, and other intergenic regions.]

  10. BRH from supercontig 8 [figure]. A number of BRH (>3,000) remain outside paralogons.

  11. Inferring ancestral blocks [diagram: paralogous genes placed in an arbitrary order to form ancestral blocks]. Paralogons are then built using the 131 ancestral blocks.
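
One way to picture the inference of an ancestral block is to collapse a pair of paralogous segments into a single ancestral gene order: every BRH pair becomes one ancestral locus in 2 copies, every gene without a partner becomes one locus in 1 copy, and single-copy genes lying between two consecutive pairs can only be placed in an arbitrary order relative to each other. The Python sketch below assumes the two segments are collinear; the function name and input format are hypothetical and not the authors' method.

```python
from typing import Dict, List, Tuple


def collapse_paralogon(seg_a: List[str], seg_b: List[str],
                       pairs: Dict[str, str]) -> List[Tuple[str, ...]]:
    """Merge two collinear paralogous segments into one ancestral block.

    pairs maps a gene of seg_a to its BRH partner in seg_b.
    Returns ancestral loci in order: (a, b) for a locus kept in 2 copies,
    (g,) for a locus now present in a single copy. Between two consecutive
    pairs, seg_a singletons are emitted before seg_b singletons; this order
    is arbitrary because it cannot be recovered from the data.
    """
    pos_b = {gene: i for i, gene in enumerate(seg_b)}
    block: List[Tuple[str, ...]] = []
    j = 0  # next unvisited position in seg_b
    for gene in seg_a:
        partner = pairs.get(gene)
        if partner is None:
            block.append((gene,))  # single-copy locus from segment A
            continue
        k = pos_b[partner]
        if k < j:
            raise ValueError(f"pair {gene}/{partner} breaks collinearity")
        block.extend((g,) for g in seg_b[j:k])  # single-copy loci from segment B
        block.append((gene, partner))
        j = k + 1
    block.extend((g,) for g in seg_b[j:])  # trailing single-copy loci from B
    return block


block = collapse_paralogon(["a1", "a2", "a3", "a4"], ["b1", "b2", "b3"],
                           {"a1": "b1", "a3": "b3"})
print(block)  # [('a1', 'b1'), ('a2',), ('b2',), ('a3', 'b3'), ('a4',)]
```

Counting 2-copy and 1-copy loci over all such blocks is presumably the kind of tally summarized as "content before WGD" for the intermediate and old WGDs.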

  12. Intermediate WGD. Settings: w = 10, p = 40%. Coverage: 31,129 genes (79%). Content before the WGD: 20,578 genes, of which 7,996 are in 2 copies (39%) and 12,582 in 1 copy (61%).

  13. Old WGD. Settings: w = 20, p = 30%. Coverage: 18,792 genes (47%). Content before the WGD: 9,999 genes, of which 1,530 are in 2 copies (15%) and 8,469 in 1 copy (85%).

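  14. Gene content at each WGD: 19,552 genes before the old WGD; 21,172 after it (×1.1); 26,214 after the intermediate WGD (×1.2); 39,642 after the recent WGD (×1.5). Overall, the three WGDs multiplied the number of genes by 2 (not by 8).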
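
The fold changes on this slide are ratios of successive gene counts, rounded to one decimal. The short Python sketch below (hypothetical helper name, gene counts copied from the slide) recomputes them and contrasts the overall increase with the ×8 expected if no duplicate had ever been lost.

```python
# Gene numbers read from the slide (before the old WGD, then after each WGD).
gene_counts = [
    ("before old WGD", 19_552),
    ("after old WGD", 21_172),
    ("after intermediate WGD", 26_214),
    ("after recent WGD", 39_642),
]


def fold_changes(counts):
    """Fold change in gene number at each successive step, rounded to 2 decimals."""
    return [(label, round(n / prev_n, 2))
            for (_, prev_n), (label, n) in zip(counts, counts[1:])]


for label, fold in fold_changes(gene_counts):
    print(f"{label}: x{fold}")  # x1.08, x1.24, x1.51
overall = gene_counts[-1][1] / gene_counts[0][1]
print(f"overall: x{overall:.2f} (three doublings would give x{2 ** 3})")
```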

  15. Protein sequence similarity between duplicates (ohnologs) [figure: distributions for the recent, intermediate and old WGDs].

  16. Distribution of the rate of synonymous substitution (dS) between ohnologs, computed with PAML [figure: distributions for the old, intermediate and recent WGDs, with annotations marking saturation and recent gene conversion].

  17. Distribution of dN/dS between ohnologs from the recent WGD [figure: frequency (%) versus dN/dS]. => Both ohnologs are under strong negative selective pressure. Yet… the fate of most ohnologs is to be pseudogenized! => Gene-silencing mutations can be tolerated… but deleterious mutations affecting the coding sequence of one copy are counterselected (i.e. mutations have a dominant effect, despite the presence of a duplicate). Once a gene has been silenced (e.g. by mutation of its regulatory elements), mutations can accumulate in its coding region.

  18. Gene duplicates are evolutionarily unstable [diagram over time, from gene duplication to ancient paralogs, with the labels "selective pressure to maintain 2 copies" and "pseudogene"].

  19. Retention of gene duplicates. Different (non-exclusive) models have been proposed for the retention of gene duplicates: robustness against mutations; functional changes (neo- or subfunctionalization); dosage constraints. Which genes are preferentially retained after a WGD? How does the pattern of gene retention vary with time? Approach: compare the pattern of retention after a recent WGD and after a more ancient WGD. Paramecium offers 3 successive WGDs!

  20. Mutational robustness. Under certain conditions (high mutation rate and very large population size), redundant genes may be maintained by selection acting against double-null alleles (Force et al. 1999). Essential genes (e.g. ribosomal proteins) are retained more often than average… but most of them are present in more than 2 copies! So their high retention rate may be due to other factors (see later).

  21. Functional changes (Force et al. 1999): changes in the gene expression pattern; changes in the encoded protein. [Diagram over time: in subfunctionalization (neutral evolution), an ancestral gene with functions F1F2 gives two copies with functions F1 and F2; in neofunctionalization (adaptation), an ancestral gene with function F gives one copy that keeps F and one that acquires a new function F'.]

  22. Prediction of the subfunctionalization model: a gene that has been preserved by subfunctionalization at a given WGD is less likely to be retained in two copies at a subsequent WGD (Force et al. 1999). [Diagram comparing, across WGD1 and WGD2, a gene preserved as subfunctionalized copies F1 and F2 with a gene that kept both functions F1F2.]

  23. Test of the subfunctionalization model (1): retention at the recent WGD of genes from the intermediate WGD. Genes retained in 2 copies at the intermediate WGD (N = 7,996): 57% retained at the recent WGD; genes in 1 copy (N = 12,582): 47% retained. This is an apparent contradiction with the subfunctionalization model. Could it be due to variations in retention rate between functional classes?

  24. Test of the subfunctionalization model (2) [figure: old WGD, intermediate WGD; N = 343 gene families; retention at the recent WGD: 60% vs 67%]. A gene that has been preserved at a given WGD is less likely to be retained in two copies at a subsequent WGD. The difference is significant (p < 5%), but not very strong. Subfunctionalization is an unlikely evolutionary pathway in species with large population sizes (Lynch 2005).

  25. Test of the neofunctionalization model. Analysis of gene expression (work in progress). Analysis of the rate of protein evolution, comparing each pair of ohnologs with an outgroup (outgroup: function F; ohnolog 1: function F; ohnolog 2: function F'). • Relative rate test (PAML), with correction for multiple tests. • Frequency of ohnologs with asymmetric substitution rates: recent WGD (N = 2,297): 11%; intermediate WGD (N = 293): 16%. • More functional redundancy among recent duplicates. • Functional changes account for retention in the long term.
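
The idea of a relative rate test can be sketched in Python. The slide reports a likelihood-based relative rate test run with PAML; to keep the example short, this sketch swaps in Tajima's (1993) simpler 1D relative rate test, which counts sites where only one ohnolog differs from both the other ohnolog and the outgroup. The function name and the aligned toy sequences are hypothetical.

```python
import math
from typing import Tuple


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> Tuple[int, int, float, float]:
    """Tajima's 1D relative rate test on three aligned sequences.

    m1: sites where seq1 differs while seq2 matches the outgroup;
    m2: sites where seq2 differs while seq1 matches the outgroup.
    Under equal rates, (m1 - m2)^2 / (m1 + m2) follows a chi-square
    distribution with 1 degree of freedom.
    Returns (m1, m2, chi2, p_value).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip gapped columns
        if a != b and b == o:
            m1 += 1  # change on the lineage leading to seq1
        elif a != b and a == o:
            m2 += 1  # change on the lineage leading to seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) upper tail
    return m1, m2, chi2, p_value


# Hypothetical aligned sequences: ohnolog 1 carries all the changes.
m1, m2, chi2, p = tajima_relative_rate("CCCCCCAAAA", "AAAAAAAAAA", "AAAAAAAAAA")
print(m1, m2, chi2, round(p, 4))  # 6 0 6.0, with p of about 0.014
```

In a genome-wide scan, the p-values of all ohnolog pairs would then be corrected for multiple testing (the slide does not say which correction was used) before counting the pairs with asymmetric rates.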

  26. Fate of neofunctionalized genes at a subsequent WGD: for ohnologs from the intermediate WGD (N = 62), retention at the recent WGD is 26% for the fast-evolving copy and 66% for the slow-evolving copy. Neofunctionalized genes are more prone to pseudogenization at a subsequent WGD.

  27. Retention for dosage constraints (1): high expression level. Genes that must be expressed at very high levels are often present in multiple copies (e.g. histones). The loss of one copy is counterselected because it cannot be compensated for by upregulation of the other copies => more retention among highly expressed genes.

  28. Retention rates. For each WGD, the retention rate of a given gene category is: Ratio = (proportion of genes retained in duplicate in this category) / (proportion of all genes retained in duplicate). Ratio = 1: no specific retention above the mean value for all genes; ratio > 1: over-retained category; ratio < 1: under-retained category.
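
The ratio defined on this slide can be written as a small Python function. The function names and the gene counts in the examples are hypothetical; exact fractions are used so that a category retained exactly at the genome-wide rate yields a ratio of exactly 1.

```python
from fractions import Fraction


def retention_ratio(cat_retained: int, cat_total: int,
                    all_retained: int, all_total: int) -> float:
    """Over- or under-retention of a gene category after one WGD.

    Ratio = (fraction of the category retained in duplicate)
            / (fraction of all genes retained in duplicate).
    """
    ratio = Fraction(cat_retained, cat_total) / Fraction(all_retained, all_total)
    return float(ratio)


def classify(ratio: float) -> str:
    """Label a category according to its retention ratio."""
    if ratio > 1:
        return "over-retained"
    if ratio < 1:
        return "under-retained"
    return "no specific retention"


# Hypothetical category: 60 of 100 genes retained, vs 300 of 1,000 genome-wide.
r = retention_ratio(60, 100, 300, 1000)
print(r, classify(r))  # 2.0 over-retained
```

In practice a ratio is rarely exactly 1, so whether a category is significantly over- or under-retained would still need a statistical test.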

  29. Expression versus Retention

  30. Retention for dosage constraints (2): the balance hypothesis (Papp et al. 2003). The relative expression levels of proteins involved in the same functional network have to be controlled to ensure the proper stoichiometry of the network. Initially, the loss of one copy is counterselected because it creates an imbalance within the network. In the long term, gene losses may occur because they can be compensated for by upregulation of the other copies.

  31. Testing the balance hypothesis (1): genes involved in multi-protein complexes. Protein complexes were predicted by homology with yeast, using the MIPS database (curated from the literature) and TAP/MS data (Gavin et al., Nature 2006).

  32. Multi-protein complexes: genes encoding components of protein complexes are initially over-retained.

  33. Additive effects of expression level and inclusion in a complex

  34. Proteins involved in complexes are over-retained at the recent WGD Does this mean that complex stoichiometry tends to be conserved ?

  35. Constraint of stoichiometry and fate of duplicates [diagram: a complex formed by subunits A and B; axes: number of copies of A, number of copies of B]. Complexes with conserved stoichiometry:

                      MIPS complexes            Complexes from Gavin et al. (Nature 2006)
      WGD             n (%)       p-value       n (%)       p-value
      Recent          265 (44%)   2.6×10^-2     74 (68%)    4.3×10^-4
      Intermediate    114 (20%)   1.5×10^-3     43 (43%)    2.4×10^-4
      Old             106 (24%)   1.2×10^-5     26 (43%)    2.5×10^-3
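
The slide does not say exactly how "conserved stoichiometry" was defined or how the p-values were computed. As one possible illustration, the Python sketch below assumes that a complex has conserved stoichiometry when all its subunit genes have the same copy number, and estimates a one-sided p-value with a permutation test that shuffles copy numbers among genes. Both the definition and the null model, as well as the function names and toy data, are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def is_conserved(subunits: Sequence[str], copies: Dict[str, int]) -> bool:
    """Assumed definition: all subunit genes have the same copy number."""
    return len({copies[g] for g in subunits}) == 1


def stoichiometry_test(complexes: List[Sequence[str]], copies: Dict[str, int],
                       n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Count complexes with conserved stoichiometry and estimate a one-sided
    permutation p-value by shuffling copy numbers among all genes."""
    observed = sum(is_conserved(c, copies) for c in complexes)
    genes = list(copies)
    values = [copies[g] for g in genes]
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(genes, values))
        if sum(is_conserved(c, shuffled) for c in complexes) >= observed:
            as_extreme += 1
    # add-one correction so the estimated p-value is never exactly 0
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical copy numbers after one WGD and two toy complexes.
toy_copies = {"g1": 2, "g2": 2, "g3": 1, "g4": 2, "g5": 1, "g6": 1}
toy_complexes = [["g1", "g2"], ["g5", "g6"]]
print(stoichiometry_test(toy_complexes, toy_copies))
```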

  36. Testing the balance hypothesis (2): genes involved in central metabolism

  37. Retention of central metabolism gene duplicates: genes involved in central metabolism are initially over-retained and then under-retained (less neofunctionalization?).

  38. Dating genome duplications: phylogenetic analyses of orthologous genes in other ciliate species, to date the WGDs relative to speciation events.

  39. Paramecium aurelia complex: 15 sibling species (same kind of habitat, initially thought to correspond to a single species). [Figure: phylogeny placing the old, intermediate and recent WGDs; taxa shown: Tetrahymena thermophila, P. tetraurelia, P. primaurelia, P. octaurelia, P. pentaurelia, P. novaurelia, P. quadecaurelia, P. tredecaurelia, P. sexaurelia, P. jenningsi, P. caudatum, P. multimicronucleatum, P. bursaria, P. putrinum, P. duboscqui, P. polycaryum and P. nephridiatum.]

  40. How does WGD relate to speciation?

  41. Polyploid paramecia [figure: P. tetraurelia (Ptetra) and P. primaurelia (Pprim)]. With the kind permission of K. Wolfe.

  42. Polyploid paramecia [figure: P. tetraurelia (Ptetra) and P. primaurelia (Pprim); mating, meiosis].

  43. Dobzhansky-Muller incompatibility by reciprocal gene loss: for 1 locus, 1/4 of the offspring is inviable; for n loci, offspring viability is (3/4)^n. => Reproductive isolation.
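
The (3/4)^n figure can be checked with a small Python simulation. The sketch assumes unlinked loci; that at each locus one parental lineage kept copy X and lost copy Y while the other kept Y and lost X; and that each offspring inherits a single meiotic product, so it lacks both functional copies with probability 1/2 × 1/2 = 1/4. These assumptions and the function names are illustrative, not taken from the talk.

```python
import random


def expected_viability(n_loci: int) -> float:
    """Viability predicted on the slide: (3/4)^n for n independent loci."""
    return 0.75 ** n_loci


def simulate_viability(n_loci: int, n_offspring: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of offspring viability after reciprocal gene loss.

    At each locus, parent 1 kept copy X only and parent 2 kept copy Y only.
    An offspring receives one allele at X and one at Y (unlinked, 50:50),
    and is inviable if, at any locus, it ends up with neither functional copy.
    """
    rng = random.Random(seed)
    viable = 0
    for _ in range(n_offspring):
        alive = True
        for _ in range(n_loci):
            has_x = rng.random() < 0.5  # inherited parent 1's functional X
            has_y = rng.random() < 0.5  # inherited parent 2's functional Y
            if not (has_x or has_y):
                alive = False
                break
        viable += alive
    return viable / n_offspring


for n in (1, 2, 5):
    print(n, expected_viability(n), round(simulate_viability(n), 3))
```

Under these assumptions, viability falls quickly with the number of loci: with 10 reciprocally lost duplicate pairs it is already about 5.6%.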

  44. Conclusions (1): at least 3 WGDs in Paramecium (probably 4). WGDs are rare events… that occurred recurrently in the evolution of eukaryotes (fungi, animals, plants, ciliates…). They have a major impact on the evolution of the gene repertoire.

  45. Conclusions (2): dosage constraints appear to be an essential force shaping the gene repertoire after a WGD. Functional changes contribute to gene retention in the long term… but the fate of the vast majority of genes is to be pseudogenized.

  46. Conclusions (3): relationship between the number of genes and organism complexity. The number of genes is driven by selection… and by contingency (time since the last WGD). WGDs may be responsible for (non-adaptive) explosive radiation of species (Dobzhansky-Muller incompatibility by reciprocal gene loss).

  47. CNRS-UPR2167 - CGM - Gif-sur-Yvette: Jean Cohen, Linda Sperling
      CNRS-UMR8541 - ENS - Paris: Eric Meyer, Mireille Bétermier
      CNRS-UMR8125 - IGR - Villejuif: Philippe Dessen
      CNRS-UMR5558 - PBIL - Lyon: Laurent Duret, Vincent Daubin
      Genoscope - CNRS UMR 8030: Jean-Marc Aury, Olivier Jaillon, Benjamin Noel, Betina Porcel, Vincent Schachter, Patrick Wincker, Jean Weissenbach
