
Comparative analysis of primate genomes: where has natural selection gone?

Laurent Duret, Nicolas Galtier, Peter Arndt. ACI-IMPBIO, 4-5 October 2007.


Presentation Transcript


  1. Comparative analysis of primate genomes: where has natural selection gone? Laurent Duret, Nicolas Galtier, Peter Arndt. ACI-IMPBIO, 4-5 October 2007

  2. What’s in our genome ? • 3.1 × 10^9 bp • Repeated sequences: ~50% • 20,000-25,000 protein-coding genes • Protein-coding regions : 1.2% • Other functional elements in non-coding regions: 4-10%

  3. How to identify functional elements ?

  4. What makes chimps different from us ? • What are the functional elements responsible for adaptive evolution ? • 30 × 10^6 point substitutions + indels + duplications (copy number variations)

  5. Genome annotation by comparative genomics • Basic principle : • Functional element <=> constrained by natural selection • Detecting the hallmarks of selection in genomic sequences • Negative selection (conservation) • Positive selection (adaptation)

  6. Evolution : mutation, selection, drift • Premutation = base modification, replication error, deletion, insertion, ... (subject to DNA repair) • Mutation in an individual: soma (no transmission to the offspring) or germline (transmission to the offspring) • Population (N): polymorphism, then loss of the allele or fixation (= substitution)

  7. Evolution : mutation, selection, drift • Probability of fixation: p = f(s, Ne) • s : relative impact on fitness • s = 0 : neutral mutation (random genetic drift) • s < 0 : disadvantageous mutation = negative (purifying) selection • s > 0 : advantageous mutation = positive (directional) selection • Ne : effective population size: stochastic effects of gamete sampling are stronger in small populations • |Ne s| < 1 : effectively neutral mutation
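The slide leaves f(s, Ne) unspecified. One common choice, assumed here and not taken from the talk, is Kimura's diffusion approximation for a new mutation at initial frequency 1/(2 Ne) in a diploid population whose census size equals Ne. The function name is hypothetical.

```python
import math

def fixation_probability(s, ne):
    """Kimura's diffusion approximation (an assumption, not from the slides).

    s  : selection coefficient of the new allele
    ne : effective population size (diploid, census size taken equal to Ne)
    """
    scaled = 4.0 * ne * s
    if abs(scaled) < 1e-9:
        # Neutral limit: a new mutation fixes with probability 1/(2 Ne).
        return 1.0 / (2.0 * ne)
    if scaled < -700.0:
        # Strongly deleterious: probability is numerically zero.
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4 Ne s)), written with expm1 for accuracy.
    return math.expm1(-2.0 * s) / math.expm1(-scaled)

if __name__ == "__main__":
    ne = 10_000
    for s in (-1e-3, -1e-5, 0.0, 1e-5, 1e-3):
        print(f"s = {s:+.0e}  |Ne s| = {abs(ne * s):g}  p = {fixation_probability(s, ne):.3e}")
```

Under these assumptions, mutations with |Ne s| well below 1 fix with a probability close to the neutral value 1/(2 Ne), which is the sense of "effectively neutral" on the slide.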

  8. Demonstrate the action of selection = reject the predictions of the neutral model • Individual: mutation (base modification, replication error, deletion, insertion, etc.) • Population (Ne): polymorphism, then fixation (= substitution) • Substitution rate = f(mutation rate, fixation probability) • |Ne s| < 1 : substitution rate = mutation rate

  9. Tracking natural selection ... • Mutation rate: u • Substitution rate: K • Negative selection => K < u • Neutral evolution => K = u • Positive selection => K > u How to estimate u ? => Use of neutral markers
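The three cases K < u, K = u and K > u can be illustrated numerically. This is a minimal sketch, assuming the standard small-s diffusion result for a diploid population, K/u = S / (1 - exp(-S)) with S = 4 Ne s. Neither the formula nor the function name comes from the slides.

```python
import math

def relative_substitution_rate(scaled_s):
    """Return K/u for a scaled selection coefficient S = 4 * Ne * s.

    Assumes small s, so the fixation probability is about
    2s / (1 - exp(-S)), and new mutations arise at rate 2 * Ne * u.
    """
    if abs(scaled_s) < 1e-9:
        return 1.0  # neutral evolution: K = u
    if scaled_s < -700.0:
        return 0.0  # numerically no substitutions
    return -scaled_s / math.expm1(-scaled_s)

if __name__ == "__main__":
    for big_s in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(f"S = {big_s:+5.1f}  K/u = {relative_substitution_rate(big_s):.4f}")
```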

  10. Tracking natural selection ... • Synonymous substitution rate: Ks • Non-synonymous substitution rate: Ka • Hypothesis: synonymous sites evolve (nearly) neutrally • Ks ~ u • Negative selection => Ka < Ks • Neutral evolution => Ka = Ks • Positive selection => Ka > Ks
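The decision rule on this slide can be written as a small helper. This is only a naive sketch: the `tolerance` band around Ka/Ks = 1 is a hypothetical parameter introduced for illustration, and a real analysis would rely on a statistical test rather than a fixed threshold.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Label a sequence by its Ka/Ks ratio.

    tolerance is a hypothetical band around 1 treated as 'neutral';
    it is not part of the original slides.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    omega = ka / ks
    if omega < 1.0 - tolerance:
        return "negative selection"
    if omega > 1.0 + tolerance:
        return "positive selection"
    return "neutral evolution"

if __name__ == "__main__":
    print(classify_selection(ka=0.01, ks=0.10))
    print(classify_selection(ka=0.10, ks=0.10))
    print(classify_selection(ka=0.30, ks=0.10))
```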

  11. Tracking natural selection ... is not so easy • Patterns of neutral substitution vary along chromosomes • Impact of molecular processes (replication, DNA-repair, transcription, recombination, …) • Genomic environment (susceptibility to mutagens)

  12. Mammalian genomic landscapes • Large scale variations of base composition along chromosomes (isochores) • [Figure: GC% (30 to 60%) along chromosome 19 and chromosome 21, positions 0 to 1000 kb; sliding windows : 20 kb, step = 2 kb]
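A sliding-window GC profile of the kind plotted on this slide can be computed as follows. This is a sketch only: the function name is hypothetical, and skipping windows that contain no A, C, G or T is an assumption about how ambiguous bases are handled.

```python
def sliding_gc(seq, window=20_000, step=2_000):
    """Return (start, GC%) pairs for windows of `window` bp every `step` bp.

    Bases other than A, C, G, T (e.g. N) are ignored; windows with no
    unambiguous base are skipped.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at > 0:
            profile.append((start, 100.0 * gc / (gc + at)))
    return profile

if __name__ == "__main__":
    # Toy sequence with a small window so the output is easy to check by eye.
    for start, gc_percent in sliding_gc("GGCCAATT", window=4, step=2):
        print(start, gc_percent)
```

The defaults mirror the window and step sizes given on the slide; for real chromosomes the sequence would be read from a FASTA file first.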

  13. GC content variations affect both coding and non-coding regions 3661 human genes from 1652 large genomic sequences (> 50 kb; average = 134 kb). Total = 221 Mb (98% non-coding)

  14. What is the evolutionary process responsible for these large-scale variations in base composition ?

  15. Variation in mutation patterns ? • Analysis of polymorphism data: in GC-rich regions, AT->GC mutations have a higher probability of fixation than GC->AT mutations (Eyre-Walker 1999; Duret et al. 2002; Spencer et al. 2006)

  16. Selection ? • What could be the selective advantage conferred by a single AT->GC mutation in a Mb-long genomic region ?

  17. Biased Gene Conversion ?

  18. Biased Gene Conversion (BGC) • Molecular events of meiotic recombination (crossover and non-crossover) produce heteroduplex DNA, which can carry a mismatch (e.g. T paired with G) • DNA mismatch repair resolves the mismatch in one of two directions: T -> C or G -> A • If DNA mismatch repair is biased (i.e. probability of repair is not 50% in favor of each base) => BGC

  19. BGC: a neutral process that looks like selection • The dynamics of the fixation process for one locus under BGC is identical to that under directional selection (Nagylaki 1983) • BGC intensity depends on: • Recombination rate • Bias in the repair of DNA mismatches • Effective population size • GC-alleles have a higher probability of fixation than AT-alleles (Eyre-Walker 1999, Duret et al. 2002, Lercher et al. 2002, Spencer et al. 2006) • This fixation bias in favor of GC-alleles increases with recombination rate (Spencer et al. 2006)

  20. Does BGC affect substitution patterns ? • BGC should affect the relative rates of AT->GC vs GC->AT substitutions in regions of high recombination • Relationship between neutral substitution patterns and recombination rate ?

  21. Substitution patterns in the Hominidae lineage • Human, chimp, macaque whole genome alignments: • Genomicro: database of whole genome alignments • 2700 Mb (introns and intergenic regions) • Substitutions inferred by maximum likelihood approach (collaboration with Peter Arndt, Berlin) • Substitution rates: • 4 transversion rates: A->T; C->G; A->C; C->A • 2 transition rates: A->G; G->A • transitions at CpG sites: G->A • Cross-over rate: HAPMAP

  22. GC-content expected at equilibrium (GC*) • Equilibrium GC-content : the GC content that sequences would reach if the pattern of substitution remains constant over time = the future of GC-content • Ratio of AT->GC over GC->AT substitution rates (taking into account CpG hypermutability)
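In its simplest form the equilibrium GC-content follows directly from the two rates named on this slide. The sketch below assumes a two-state model (AT vs. GC) with per-site rates and deliberately ignores the CpG hypermutability correction mentioned above; the function name is hypothetical.

```python
def equilibrium_gc(rate_at_to_gc, rate_gc_to_at):
    """Equilibrium GC-content GC* under a two-state substitution model.

    At equilibrium the flux (1 - GC*) * rate_at_to_gc equals
    GC* * rate_gc_to_at, hence GC* = r_AT->GC / (r_AT->GC + r_GC->AT).
    Simplification: CpG hypermutability is not modelled here.
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("at least one substitution rate must be positive")
    return rate_at_to_gc / total

if __name__ == "__main__":
    # Illustrative rates only (arbitrary units), not values from the talk.
    print(equilibrium_gc(rate_at_to_gc=1.0, rate_gc_to_at=2.0))
```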

  23. GC-content expected at equilibrium and recombination • [Figure: equilibrium GC-content GC* (30% to 60%) vs. cross-over rate (0 to 9 cM/Mb)] • R² = 36%, p < 0.0001 • N = 2707 non-overlapping windows (1 Mb), from autosomes

  24. GC-content and Recombination • Strong correlation: suggests direct causal relationship • GC-rich sequences promote recombination ? • Gerton et al. (2000), Petes & Merker (2002), Spencer et al. (2006) • Recombination promotes AT->GC substitutions ?

  25. GC-content and recombination • [Figure: present GC-content (40% to 70%) vs. cross-over rate (0 to 9 cM/Mb)] • N = 2707, R² = 14%, p < 0.001

  26. GC-content expected at equilibrium and recombination • [Figure: equilibrium GC-content GC* (30% to 60%) vs. cross-over rate (0 to 9 cM/Mb)] • R² = 36%, p < 0.0001 • N = 2707 non-overlapping windows (1 Mb), from autosomes

  27. Recombination and GC-content • Molecular events of meiotic recombination • Recombination events: crossover + non-crossover • Genetic maps: crossover only => The correlation between GC* and crossover rate might underestimate the real correlation between GC* and recombination

  28. Evolution of GC-content: distance to telomeres • [Figure: equilibrium GC-content GC* (0.30 to 0.60) vs. distance to telomere (0.1 to 100 Mb, log scale)] • N = 2707, R² = 41%, p < 0.0001 • GC* vs. crossover rate + distance to telomeres: R² = 53%

  29. BGC: a realistic model ? • Recombination occurs predominantly in hotspots that cover only 3% of the genome (Myers et al. 2005) • Recombination hotspots evolve rapidly (their location is not conserved between human and chimp) (Ptak et al. 2005, Winkler et al. 2005) • Can BGC affect the evolution of Mb-long isochores ?

  30. BGC: a realistic model ? • Probability of fixation of an AT-allele • Probability of fixation of a GC-allele • Effective population size N ~ 10,000 • s : BGC coefficient • Recombination hotspots: s = 1.3 × 10^-4 (Spencer et al. 2006) • No BGC outside hotspots: s = 0 • Hotspots density: 3% (on average), variations along chromosomes (0.05% to 10.7%) • Pattern of mutation: constant across chromosomes
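The fixation formulas for this slide did not survive in the transcript, so the following is only a sketch of how such a model could be set up, not the authors' implementation. It treats BGC like directional selection (+s for GC-alleles, -s for AT-alleles inside hotspots, s = 0 elsewhere), uses the small-s diffusion result K/u = S / (1 - exp(-S)) with S = 4 N s, and averages over hotspot density. The scaling convention, the `mutation_gc_star` value (the GC* expected from mutation alone) and all function names are assumptions.

```python
import math

def relative_rate(scaled_s):
    """K/u for a scaled coefficient S = 4 * N * s (small-s diffusion result)."""
    if abs(scaled_s) < 1e-9:
        return 1.0
    return -scaled_s / math.expm1(-scaled_s)

def predicted_gc_star(hotspot_density, s=1.3e-4, n=10_000, mutation_gc_star=0.35):
    """GC* of a region in which a fraction `hotspot_density` is under BGC.

    mutation_gc_star is a hypothetical placeholder for the equilibrium
    GC-content under mutation alone (constant across chromosomes).
    """
    big_s = 4.0 * n * s
    outside = 1.0 - hotspot_density
    # AT->GC substitutions are boosted in hotspots, GC->AT are depressed.
    at_to_gc = mutation_gc_star * (outside + hotspot_density * relative_rate(big_s))
    gc_to_at = (1.0 - mutation_gc_star) * (outside + hotspot_density * relative_rate(-big_s))
    return at_to_gc / (at_to_gc + gc_to_at)

if __name__ == "__main__":
    # Hotspot densities quoted on the slide: 0.05%, 3% (average), 10.7%.
    for density in (0.0005, 0.03, 0.107):
        print(f"hotspot density = {density:6.2%}  predicted GC* = {predicted_gc_star(density):.3f}")
```

With no hotspots the function returns the mutation-only value, and the predicted GC* rises with hotspot density, which is the qualitative behaviour the slide's model is meant to test against the observations.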

  31. BGC: a realistic model ? • [Figure: equilibrium GC-content GC* vs. crossover rate (cM/Mb), comparing observations with predictions of the BGC model]

  32. Summary (1) • Recombination : • Strong impact on patterns of substitutions • drives the evolution of GC-content • Most probably a consequence of BGC • Mutation: does not explain the fixation bias favoring GC alleles • Selection: does not explain the correlation with recombination rate • BGC: all observations fit the predictions of the model

  33. BGC can affect functional regions • Fxy gene : translocated in the pseudoautosomal region (PAR) of the X chromosome in Mus musculus • X specific: recombination rate normal; GC at synonymous sites normal (55%) • PAR: recombination rate extreme; GC at synonymous sites very high (90%)

  34. Amino-acid substitutions in Fxy • [Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus with a time axis (0 to 80 Myrs), showing the location of Fxy (X, Y, PAR) and per-branch counts of amino-acid substitutions in the 5’ part and the 3’ part of Fxy]

  35. Amino-acid substitutions in Fxy • [Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus with a time axis (0 to 80 Myrs), showing per-branch counts of amino-acid substitutions in the 5’ part and the 3’ part of Fxy] • 28 non-synonymous substitutions, all AT->GC • NB: strong negative selection (Ka/Ks < 0.1)

  36. Amino-acid substitutions in Fxy BGC can drive the fixation of deleterious mutations

  37. BGC: a neutral process that looks like selection • BGC can confound selection tests

  38. HARs: human-accelerated regions • Pollard et al. (Nature, PLoS Genet. 2006) : searching for positive selection in non-coding regulatory elements • Identify regulatory elements that have significantly accelerated in the human lineage = HARs

  39. Positive selection in the human lineage ? • 49 significant HARs • HAR1: 120 bp • Rate of evolution >> neutral rate (18 fixed substitutions in the human lineage, vs. 0.7 expected) • Part of a non-coding RNA gene • Expressed in the brain • Involved in the evolution of human-specific brain features ?
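To get a feel for how extreme 18 substitutions are when 0.7 are expected, one can compute a Poisson tail probability. This is a back-of-the-envelope sketch that assumes substitutions follow a Poisson distribution under neutrality; it is not the test used in the cited study, and the function name is hypothetical.

```python
import math

def poisson_tail(k_min, lam, terms=200):
    """P(X >= k_min) for X ~ Poisson(lam), summed term by term.

    The tail is summed directly (rather than as 1 - CDF) so that very
    small probabilities are not lost to floating-point cancellation.
    """
    term = math.exp(-lam) * lam ** k_min / math.factorial(k_min)
    total = 0.0
    k = k_min
    for _ in range(terms):
        total += term
        k += 1
        term *= lam / k  # ratio of consecutive Poisson probabilities
    return total

if __name__ == "__main__":
    # 18 observed substitutions in HAR1 vs. 0.7 expected (values from the slide).
    print(f"P(X >= 18 | mean 0.7) = {poisson_tail(18, 0.7):.3e}")
```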

  40. Positive selection ? • GC-biased substitution pattern in HARs • HAR1: the 18 substitutions are all AT->GC changes • Known functional elements (coding or non-coding) are not GC-rich • HAR1-5: no evidence of selective sweep (Pollard et al. 2006) • HAR1: the accelerated region covers >1 kb, i.e. is not restricted to the functional element

  41. Positive selection or BGC ? • HARs are located in regions of high recombination • Recombination occurs in hotspots (<2 kb) • Given known parameters (population size, fixation bias), the BGC model predicts substitution hotspots within recombination hotspots • HARs = substitution hotspots caused by BGC in recombination hotspots

  42. Conclusion (1) • Recombination drives the evolution of GC-content in mammals • GC-rich isochores = result of BGC in highly recombining parts of the genome • Probably a universal process: correlation GC / recombination in many taxa (yeast, Drosophila, nematode, paramecia, …)

  43. Conclusion (2) • BGC => substitution hotspots in recombination hotspots • Recombination hotspots = the Achilles’ heel of our genome

  44. Conclusion (3) • Probability of fixation depends on: selection, drift (population size), BGC • Extending the null hypothesis of neutral evolution: mutation + BGC • Galtier & Duret (2007) Trends Genet

  45. Thanks • Vincent Lombard (Génomicro) • Nicolas Galtier (Montpellier) • Peter Arndt (Berlin) • Katherine Pollard (UC Davis)

  46. Sex-specific effects • Correlation GC* / crossover rate (deCODE genetic map): • male: R² = 31% • female: R² = 15% • The rate of cross-over is a poor predictor of the total recombination rate in females: more variability in the ratio non-crossover / crossover along chromosomes ?

  47. Chromosome size, recombination and GC-content • [Figure: four panels. Human: crossover rate (cM/Mb) vs. chromosome length (Mb), and GC* vs. crossover rate (cM/Mb); R² values shown: 0.84 and 0.66. Chicken: crossover rate (cM/Mb) vs. chromosome length (Mb), and current GC vs. crossover rate (cM/Mb); R² values shown: 0.82 and 0.81]

  48. Recombination and GC-content: a universal relationship ?

  49. G+C content vs. chromosome length: yeast • R² = 61% • Bradnam et al. (1999) Mol Biol Evol

  50. G+C content vs. chromosome length: Paramecium • R² = 67% • [Figure: GC-content vs. chromosome size (kb)]
