Comparative analysis of primate genomes: where has natural selection gone? Laurent Duret, Nicolas Galtier, Peter Arndt. ACI-IMPBIO, 4-5 October 2007.
What's in our genome?
• 3.1 × 10^9 bp
• Repeated sequences: ~50%
• 20,000-25,000 protein-coding genes
• Protein-coding regions: 1.2%
• Other functional elements in non-coding regions: 4-10%
What makes chimps different from us?
• What are the functional elements responsible for adaptive evolution?
• 30 × 10^6 point substitutions + indels + duplications (copy number variations)
Genome annotation by comparative genomics
• Basic principle:
  - Functional element <=> constrained by natural selection
• Detecting the hallmarks of selection in genomic sequences:
  - Negative selection (conservation)
  - Positive selection (adaptation)
Evolution: mutation, selection, drift
[Diagram: in an individual, a premutation (base modification, replication error, deletion, insertion, ...) is either corrected by DNA repair or becomes a mutation. A somatic mutation is not transmitted to the offspring; a germline mutation is transmitted (polymorphism). In the population (N), the new allele is either lost or reaches fixation (substitution).]
Evolution: mutation, selection, drift
• Probability of fixation: p = f(s, Ne)
• s: relative impact on fitness
  - s = 0: neutral mutation (random genetic drift)
  - s < 0: disadvantageous mutation = negative (purifying) selection
  - s > 0: advantageous mutation = positive (directional) selection
• Ne: effective population size; stochastic effects of gamete sampling are stronger in small populations
• |Ne·s| < 1: effectively neutral mutation
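The slide only states p = f(s, Ne). A minimal sketch of one common choice for f is given here, assuming Kimura's diffusion approximation for a new mutation in a diploid population whose census size equals Ne. The function name and the numeric values in the usage note are illustrative assumptions, not taken from the talk.

```python
import math

def fixation_probability(s: float, n_e: float) -> float:
    """Fixation probability of a single new mutation.

    Assumption: Kimura's diffusion approximation, diploid population,
    census size equal to the effective size n_e:
        p = (1 - exp(-2s)) / (1 - exp(-4 * n_e * s))
    """
    if abs(s) < 1e-12:
        # Neutral limit: one copy among 2 * n_e gene copies.
        return 1.0 / (2.0 * n_e)
    # expm1(x) = exp(x) - 1 keeps precision when s is tiny;
    # the two minus signs cancel in the ratio.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)

if __name__ == "__main__":
    n_e = 10_000  # hypothetical population size
    for s in (-1e-3, -1e-6, 0.0, 1e-6, 1e-3):
        p = fixation_probability(s, n_e)
        print(f"s = {s:+.0e}  p = {p:.3e}  relative to neutral = {p * 2 * n_e:.3f}")
```

With |Ne·s| well below 1 the result stays close to the neutral value 1/(2Ne), which is what "effectively neutral" means on the slide.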
Evolution: mutation, selection, drift
[Diagram: in an individual, base modification, replication error, deletion, insertion, etc. produce a mutation; in the population (Ne), the mutation is a polymorphism until fixation, when it becomes a substitution.]
• Demonstrating the action of selection = rejecting the predictions of the neutral model
• Substitution rate = f(mutation rate, fixation probability)
• |Ne·s| < 1: substitution rate = mutation rate
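The last statement follows from a short calculation, sketched here under the assumption of a diploid population of N individuals, with u the mutation rate per site per generation, K the substitution rate and p the fixation probability of a new mutation:

```latex
K \;=\; \underbrace{2N u}_{\text{new mutations per generation}}
        \times \underbrace{p}_{\text{fixation probability}},
\qquad
p_{\text{neutral}} = \frac{1}{2N}
\;\Longrightarrow\;
K_{\text{neutral}} = 2N u \cdot \frac{1}{2N} = u .
```

The population size cancels out, so the neutral substitution rate depends only on the mutation rate.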
Tracking natural selection ...
• Mutation rate: u
• Substitution rate: K
• Negative selection => K < u
• Neutral evolution => K = u
• Positive selection => K > u
How to estimate u? => Use of neutral markers
Tracking natural selection ...
• Synonymous substitution rate: Ks
• Non-synonymous substitution rate: Ka
• Hypothesis: synonymous sites evolve (nearly) neutrally
  - Ks ~ u
• Negative selection => Ka < Ks
• Neutral evolution => Ka = Ks
• Positive selection => Ka > Ks
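The decision rule above can be written as a small helper. This is only an illustrative sketch: the function name and the `tolerance` parameter are assumptions, and a real analysis would test whether Ka/Ks differs significantly from 1 rather than apply a fixed threshold.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Classify a gene from its Ka/Ks ratio.

    `tolerance` is a hypothetical margin around Ka/Ks = 1 used here in
    place of a proper statistical test.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    omega = ka / ks
    if omega < 1.0 - tolerance:
        return "negative selection"
    if omega > 1.0 + tolerance:
        return "positive selection"
    return "neutral evolution"

if __name__ == "__main__":
    # Made-up rates, for illustration only.
    for ka, ks in [(0.01, 0.20), (0.20, 0.20), (0.45, 0.20)]:
        print(f"Ka = {ka}, Ks = {ks}: {classify_selection(ka, ks)}")
```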
Tracking natural selection ... is not so easy
• Patterns of neutral substitution vary along chromosomes
• Impact of molecular processes (replication, DNA repair, transcription, recombination, …)
• Genomic environment (susceptibility to mutagens)
Mammalian genomic landscapes
• Large-scale variations of base composition along chromosomes (isochores)
[Figure: GC% (30-60%) along 1000 kb of chromosome 19 and of chromosome 21; sliding windows: 20 kb, step = 2 kb; scale bar: 100 kb]
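A minimal sketch of how such a GC profile can be computed, using the window size and step quoted on the slide as defaults. The function name and the handling of ambiguous bases (ignored in the denominator) are assumptions.

```python
def sliding_gc(seq: str, window: int = 20_000, step: int = 2_000):
    """Return a list of (window_start, GC%) over full windows of `seq`.

    Bases other than A, C, G, T (e.g. N) are ignored; windows with no
    unambiguous base are skipped.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        if acgt == 0:
            continue
        profile.append((start, 100.0 * gc / acgt))
    return profile

if __name__ == "__main__":
    # Toy sequence with tiny windows so the output is easy to check by eye.
    print(sliding_gc("GGCCAATT", window=4, step=2))
```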
GC content variations affect both coding and non-coding regions
• 3661 human genes from 1652 large genomic sequences (> 50 kb; average = 134 kb). Total = 221 Mb (98% non-coding)
What is the evolutionary process responsible for these large-scale variations in base composition ?
Variation in mutation patterns ? • Analysis of polymorphism data: in GC-rich regions, AT->GC mutations have a higher probability of fixation than GC->AT mutations (Eyre-Walker 1999; Duret et al. 2002; Spencer et al. 2006)
Selection?
• What could be the selective advantage conferred by a single AT->GC mutation in a Mb-long genomic region?
Biased Gene Conversion (BGC)
[Diagram: molecular events of meiotic recombination. Heteroduplex DNA forms between the two parental alleles, producing a DNA mismatch (e.g. T opposite G) that is resolved by DNA mismatch repair; the event ends as a crossover or a non-crossover.]
• If DNA mismatch repair is biased (i.e. the probability of repair is not 50% in favor of each base) => BGC
BGC: a neutral process that looks like selection
• The dynamics of the fixation process for one locus under BGC is identical to that under directional selection (Nagylaki 1983)
• BGC intensity depends on:
  - Recombination rate
  - Bias in the repair of DNA mismatches
  - Effective population size
• GC-alleles have a higher probability of fixation than AT-alleles (Eyre-Walker 1999, Duret et al. 2002, Lercher et al. 2002, Spencer et al. 2006)
• This fixation bias in favor of GC-alleles increases with recombination rate (Spencer et al. 2006)
Does BGC affect substitution patterns?
• BGC should affect the relative rates of AT->GC vs GC->AT substitutions in regions of high recombination
• Relationship between neutral substitution patterns and recombination rate?
Substitution patterns in the Hominidae lineage
• Human, chimp, macaque whole genome alignments:
  - Genomicro: database of whole genome alignments
  - 2700 Mb (introns and intergenic regions)
• Substitutions inferred by a maximum likelihood approach (collaboration with Peter Arndt, Berlin)
• Substitution rates:
  - 4 transversion rates: A->T; C->G; A->C; C->A
  - 2 transition rates: A->G; G->A
  - transitions at CpG sites: G->A
• Crossover rate: HapMap
GC-content expected at equilibrium (GC*)
• Equilibrium GC-content: the GC-content that sequences would reach if the pattern of substitution remained constant over time = the future of GC-content
• Ratio of AT->GC over GC->AT substitution rates (taking into account CpG hypermutability)
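A minimal sketch of the idea, assuming the simplest two-state model (AT vs GC) with constant rates. It deliberately ignores CpG hypermutability, which the estimate on the slide does take into account, so it illustrates the principle rather than the authors' estimator. The function and parameter names are hypothetical.

```python
def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """Equilibrium GC-content under a two-state substitution model.

    At equilibrium the AT->GC and GC->AT fluxes balance:
        (1 - GC*) * rate_at_to_gc = GC* * rate_gc_to_at
    hence GC* = rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at).
    Simplification: no CpG-specific rate.
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("at least one substitution rate must be positive")
    return rate_at_to_gc / total

if __name__ == "__main__":
    # Made-up rates: GC->AT twice as frequent as AT->GC.
    print(f"GC* = {equilibrium_gc(1.0, 2.0):.3f}")
```

Only the ratio of the two rates matters, which is why the slide describes GC* in terms of a ratio.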
GC-content expected at equilibrium and recombination
[Scatter plot: equilibrium GC-content GC* (30-60%) vs. crossover rate (0-9 cM/Mb); N = 2707 non-overlapping windows (1 Mb), from autosomes; R2 = 36%, p < 0.0001]
GC-content and recombination
• Strong correlation: suggests a direct causal relationship
• GC-rich sequences promote recombination?
  - Gerton et al. (2000), Petes & Merker (2002), Spencer et al. (2006)
• Recombination promotes AT->GC substitutions?
GC-content and recombination
[Scatter plot: present GC-content (40-70%) vs. crossover rate (0-9 cM/Mb); N = 2707; R2 = 14%, p < 0.001]
Recombination and GC-content
[Diagram: molecular events of meiotic recombination, ending as a crossover or a non-crossover]
• Recombination events: crossover + non-crossover
• Genetic maps: crossover only
=> The correlation between GC* and crossover rate might underestimate the real correlation between GC* and recombination
Evolution of GC-content: distance to telomeres
[Scatter plot: equilibrium GC-content GC* (0.30-0.60) vs. distance to telomere (0.1-100 Mb, log scale); N = 2707; R2 = 41%, p < 0.0001]
• GC* vs. crossover rate + distance to telomeres: R2 = 53%
BGC: a realistic model?
• Recombination occurs predominantly in hotspots that cover only 3% of the genome (Myers et al. 2005)
• Recombination hotspots evolve rapidly (their location is not conserved between human and chimp) (Ptak et al. 2005, Winkler et al. 2005)
• Can BGC affect the evolution of Mb-long isochores?
BGC: a realistic model?
• Probability of fixation of an AT-allele
• Probability of fixation of a GC-allele
• Effective population size N ~ 10,000
• s: BGC coefficient
  - Recombination hotspots: s = 1.3 × 10^-4 (Spencer et al. 2006)
  - No BGC outside hotspots: s = 0
• Hotspot density: 3% on average, with variations along chromosomes (0.05% to 10.7%)
• Pattern of mutation: constant across chromosomes
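The two fixation-probability formulas on this slide did not survive extraction. The sketch below is therefore not the authors' model. It assumes Kimura's standard formula with the BGC coefficient s acting like a selection coefficient (+s for AT->GC changes, -s for GC->AT changes inside hotspots), and a hypothetical `mutation_bias` parameter (GC->AT mutation rate divided by AT->GC mutation rate) that the slides do not provide.

```python
import math

def relative_fixation(s: float, n: float) -> float:
    """Fixation probability relative to a neutral allele, 2N * p(s).

    Assumption: Kimura's diffusion approximation, diploid population.
    """
    if s == 0:
        return 1.0
    return 2.0 * n * math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)

def predicted_gc_star(hotspot_fraction: float,
                      s: float = 1.3e-4,           # BGC coefficient in hotspots (slide value)
                      n: float = 10_000,           # effective population size (slide value)
                      mutation_bias: float = 2.0   # hypothetical, not from the slides
                      ) -> float:
    """Equilibrium GC-content of a window given its hotspot density."""
    h = hotspot_fraction
    # Substitution rates averaged over hotspot and non-hotspot sites,
    # in units of the AT->GC mutation rate.
    at_to_gc = (1.0 - h) + h * relative_fixation(+s, n)
    gc_to_at = mutation_bias * ((1.0 - h) + h * relative_fixation(-s, n))
    return at_to_gc / (at_to_gc + gc_to_at)

if __name__ == "__main__":
    # Hotspot densities quoted on the slide: minimum, average, maximum.
    for h in (0.0005, 0.03, 0.107):
        print(f"hotspot density {h:.2%}: GC* = {predicted_gc_star(h):.3f}")
```

Under these assumptions, GC* rises with hotspot density even though BGC acts on only a few percent of sites, which is the qualitative behaviour the model is meant to test.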
BGC: a realistic model?
[Plot: equilibrium GC-content GC* vs. crossover rate (cM/Mb), comparing observations with the predictions of the BGC model]
Summary (1)
• Recombination:
  - Strong impact on patterns of substitutions
  - Drives the evolution of GC-content
  - Most probably a consequence of BGC
• Mutation: does not explain the fixation bias favoring GC alleles
• Selection: does not explain the correlation with recombination rate
• BGC: all observations fit the predictions of the model
BGC can affect functional regions
• Fxy gene: translocated into the pseudoautosomal region (PAR) of the X chromosome in Mus musculus

                          X-specific      PAR
  Recombination rate      normal          extreme
  GC at synonymous sites  normal (55%)    very high (90%)
Amino-acid substitutions in Fxy
[Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus (time axis: 0-80 Myrs), with the number of amino-acid substitutions on each branch shown separately for the 5' part and the 3' part of Fxy, and the X, Y or PAR location of the gene indicated]
• 28 non-synonymous substitutions, all AT->GC
• NB: strong negative selection (Ka/Ks < 0.1)
Amino-acid substitutions in Fxy
=> BGC can drive the fixation of deleterious mutations
BGC: a neutral process that looks like selection • BGC can confound selection tests
HARs: human-accelerated regions
• Pollard et al. (Nature, PLoS Genet. 2006): searching for positive selection in non-coding regulatory elements
• Identify regulatory elements whose evolution has significantly accelerated in the human lineage = HARs
Positive selection in the human lineage?
• 49 significant HARs
• HAR1: 120 bp
  - Rate of evolution >> neutral rate (18 fixed substitutions in the human lineage, vs. 0.7 expected)
  - Part of a non-coding RNA gene
  - Expressed in the brain
  - Involved in the evolution of human-specific brain features?
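To get a feel for how extreme "18 observed vs. 0.7 expected" is, the sketch below computes the probability of seeing at least 18 substitutions if counts were Poisson-distributed with mean 0.7. The Poisson assumption is made here for illustration only; it is not the test used by Pollard et al.

```python
import math

def poisson_upper_tail(k_min: int, mean: float, terms: int = 200) -> float:
    """P(X >= k_min) for X ~ Poisson(mean).

    The tail is summed term by term rather than computed as 1 - CDF,
    which would lose all precision for probabilities this small.
    """
    # First term: exp(-mean) * mean**k_min / k_min!, built in log space.
    term = math.exp(-mean + k_min * math.log(mean) - math.lgamma(k_min + 1))
    total = 0.0
    k = k_min
    for _ in range(terms):
        total += term
        k += 1
        term *= mean / k  # ratio of consecutive Poisson terms
    return total

if __name__ == "__main__":
    p = poisson_upper_tail(18, 0.7)
    print(f"P(at least 18 substitutions | mean 0.7) = {p:.2e}")
```

A vanishingly small probability rejects the neutral rate, but it does not by itself distinguish positive selection from other processes that raise the fixation probability.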
Positive selection?
• GC-biased substitution pattern in HARs
  - HAR1: the 18 substitutions are all AT->GC changes
  - Known functional elements (coding or non-coding) are not GC-rich
• HAR1-5: no evidence of selective sweep (Pollard et al. 2006)
• HAR1: the accelerated region covers >1 kb, i.e. is not restricted to the functional element
Positive selection or BGC?
• HARs are located in regions of high recombination
• Recombination occurs in hotspots (<2 kb)
• Given known parameters (population size, fixation bias), the BGC model predicts substitution hotspots within recombination hotspots
• HARs = substitution hotspots caused by BGC in recombination hotspots
Conclusion (1)
• Recombination drives the evolution of GC-content in mammals
• GC-rich isochores = result of BGC in highly recombining parts of the genome
• Probably a universal process: correlation GC / recombination in many taxa (yeast, Drosophila, nematode, paramecia, …)
Conclusion (2)
• BGC => substitution hotspots in recombination hotspots
• Recombination hotspots = the Achilles' heel of our genome
Conclusion (3)
Probability of fixation depends on:
- selection
- drift (population size)
- BGC
Extending the null hypothesis of neutral evolution: mutation + BGC
Galtier & Duret (2007) Trends Genet
Thanks • Vincent Lombard (Génomicro) • Nicolas Galtier (Montpellier) • Peter Arndt (Berlin) • Katherine Pollard (UC Davis)
Sex-specific effects
• Correlation GC* / crossover rate (deCODE genetic map):
  - male: R2 = 31%
  - female: R2 = 15%
• The crossover rate is a poor predictor of the total recombination rate in females: more variability in the ratio non-crossover / crossover along chromosomes?
Chromosome size, recombination and GC-content
[Figure, four panels. Human: crossover rate (cM/Mb) vs. chromosome length (Mb), and GC* vs. crossover rate (cM/Mb); R2 values shown: 0.84 and 0.66. Chicken: crossover rate (cM/Mb) vs. chromosome length (Mb), and current GC vs. crossover rate (cM/Mb); R2 values shown: 0.82 and 0.81.]
G+C content vs. chromosome length: yeast
[Plot: G+C content vs. chromosome length; R2 = 61%] Bradnam et al. (1999) Mol Biol Evol
G+C content vs. chromosome length: Paramecium
[Plot: GC-content vs. chromosome size (kb); R2 = 67%]