Cryptic Variation in the Human mutation rate Alan Hodgkinson Adam Eyre-Walker, Manolis Ladoukakis
Variation in the mutation rate: • Between different chromosomes • Between regions on chromosomes • Neighbouring nucleotides
Simple context effects: Hwang and Green (2004) PNAS 101: 13994-14001
Cryptic Variation: • Remote context: • AGTCGGTTACCGTGACGTTGAACGTGT • Degenerate context: • AGTCGGTTACCGTGYSRGYGAACGTGT • No context / Complex context
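The degenerate-context example is written with IUPAC ambiguity codes (Y = C or T, S = C or G, R = A or G). A minimal sketch of how such a motif could be checked against a fully specified sequence follows; the function name and the first test sequence are illustrative and do not come from the talk.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches_degenerate(pattern: str, seq: str) -> bool:
    """Return True if seq is one of the sequences encoded by pattern."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, seq))

motif = "AGTCGGTTACCGTGYSRGYGAACGTGT"  # degenerate context from the slide

# Illustrative sequence with C, G, A, G, T at the five central positions.
print(matches_degenerate(motif, "AGTCGGTTACCGTGCGAGTGAACGTGT"))  # True

# The remote-context sequence has A where the motif requires C or T.
print(matches_degenerate(motif, "AGTCGGTTACCGTGACGTTGAACGTGT"))  # False
```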
Our approach to the problem • Search for SNPs in human sequences that also have a SNP in the orthologous position in chimp. • Do we see more coincident SNPs than expected by chance? [Figure: aligned human and chimp sequences]
The method • Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome-by-chromosome basis. • Extract all chimp SNPs from dbSNP with 50bp either side of the SNP. • BLAST chimp SNPs against the human database. • Extract results above a certain level of homology where there is a SNP on both sequences, and reduce to 40bp either side of the central position. • Repeat the analysis both including and excluding CpG effects.
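The trimming step can be pictured with a small sketch. It assumes the chimp SNP sits at the centre of its extracted sequence and that human and chimp SNP positions are already expressed in the same alignment coordinates; the helper names and the toy sequence are hypothetical, and the BLAST search itself is not shown.

```python
def trim_window(seq: str, snp_index: int, flank: int = 40) -> str:
    """Return the SNP position plus `flank` bases on each side."""
    start = snp_index - flank
    end = snp_index + flank + 1
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence")
    return seq[start:end]

def is_coincident(human_snp_index: int, chimp_snp_index: int) -> bool:
    """In alignment coordinates, two SNPs coincide if they share a column."""
    return human_snp_index == chimp_snp_index

# A chimp SNP extracted with 50 bp either side sits at index 50 of 101 bases.
chimp_seq = "A" * 50 + "G" + "C" * 50  # toy sequence, not real data
window = trim_window(chimp_seq, snp_index=50)
print(len(window))  # 81
```

After trimming, the SNP is at index 40 of the 81bp window, so a human SNP is coincident when it also falls at index 40.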
Results • ~1.5 million chimp SNPs. • ~310,000 81bp alignments containing a human and chimp SNP. • Observe the number of coincident SNPs. • Calculate the expected number, taking into account the effects of neighbouring nucleotides.
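One way to picture the expected-number calculation is sketched below. It is a simplified null model, not necessarily the calculation used in the talk: it assumes each alignment carries exactly one human SNP, placed at an interior site with probability proportional to a relative mutability set by the site's triplet context. The `rate` table is hypothetical.

```python
def expected_coincident(alignments, rate):
    """Sum, over alignments, the chance that the human SNP hits the centre.

    alignments: odd-length sequences with the chimp SNP at the centre.
    rate: dict mapping a triplet to the relative mutability of its middle
          base (hypothetical values; missing triplets default to 1.0).
    """
    total = 0.0
    for seq in alignments:
        centre = len(seq) // 2
        # One weight per interior site, determined by its triplet context.
        weights = [rate.get(seq[i - 1:i + 2], 1.0)
                   for i in range(1, len(seq) - 1)]
        total += weights[centre - 1] / sum(weights)
    return total

# Toy 81 bp alignment whose central triplet is ACG.
toy = "T" * 39 + "ACG" + "T" * 39
uniform = expected_coincident([toy], {})
cpg_like = expected_coincident([toy], {"ACG": 10.0})
print(uniform, cpg_like)
```

With uniform weights each alignment contributes 1/79 (79 interior sites); raising the weight of the central context raises the expected number, which is why neighbouring nucleotides have to be taken into account.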
Alternative Explanations • Bias in the Method • Selection • Ancestral Polymorphism • Paralogous SNPs
Methodological Bias • Simulated data with the same density of human and chimp SNPs as dbSNP, under different divergence and mutation patterns. • The method worked well under realistic conditions.
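As a rough illustration of the chance baseline such a simulation provides, the toy below places human and chimp SNPs independently and uniformly and counts how many coincide. It ignores divergence and mutation patterns, and the site and SNP counts are made up for the example, not taken from dbSNP.

```python
import random

def simulate_coincident(n_sites, n_human, n_chimp, seed=0):
    """Count sites that carry both a human and a chimp SNP by chance."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    human = set(rng.sample(range(n_sites), n_human))
    chimp = set(rng.sample(range(n_sites), n_chimp))
    return len(human & chimp)

n_sites, n_human, n_chimp = 1_000_000, 10_000, 10_000  # hypothetical
observed = simulate_coincident(n_sites, n_human, n_chimp)
expected = n_human * n_chimp / n_sites  # 100 under independence
print(observed, expected)
```

If the pipeline recovers a count close to `expected` on data like these, any excess seen in real data is less likely to be a methodological artefact.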
Methodological Bias • All sites (H&G): [figure not preserved] • Non-CpG sites (H&G): [figure not preserved]
Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs
Selection • Areas of low SNP density result in clustering: [Figure: human and chimp sequences] • Apparent excess of coincident SNPs
Selection • No clustering: [figure not preserved]
Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs
Ancestral Polymorphism • SNP inherited from common ancestor of chimp and human: [Figure: genealogy from the common ancestor to human and chimp, with T and A alleles] • Increase in coincident SNPs
Ancestral Polymorphism • Expect the observed/expected ratio to be the same for all transitions: [figure not preserved]
Ancestral Polymorphism • Repeated the initial analysis with macaque data. • Humans and macaques split ~23-24 million years ago, so we expect there to be no shared polymorphisms.
Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs
Paralogous SNPs • The excess of coincident SNPs could be a consequence of artifactual SNPs called as a result of substitutions in paralogous regions. • Musumeci et al (2010): 8.32% of human variation in dbSNP may be due to paralogy. • Example: AGCTGCACGT T CGGCATCCAA (Chromosome 1) and AGCTGCACGT A CGGCATCCAA (Chromosome 7) are called as the artifactual SNP AGCTGCACGT (T/A) CGGCATCCAA.
Paralogous SNPs • [Figure: AGCTGCACGT (T/A) CGGCATCCAA | AGCTGCACGT T CGGCATCCAA | AGCTGCACGT (T/A) CGGCATCCAA | AGCTGCACGT T CGGCATCCAA | AGCTGCACGT A CGGCATCCAA] • 3.6% of coincident SNPs are possibly a consequence of paralogous sequences.
Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs • Remaining explanation: cryptic variation in the mutation rate
Context Analysis • 4517 sequences containing non-CpG coincident SNPs, each flanked by 200bp. • Tabulate triplet frequencies at each position in the surrounding sequences. • Test whether the proportions of triplets we observe at each position are significantly different from the proportions in the sequences as a whole.
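The tabulate-and-test step could look roughly like the sketch below. It assumes equal-length sequences and uses a plain Pearson chi-square statistic; the talk does not say which test was applied, and no p-value or multiple-testing correction is computed here.

```python
from collections import Counter

def triplet_chi_square(seqs, pos):
    """Chi-square statistic comparing the triplets that start at `pos`
    with triplet proportions over all positions of all sequences."""
    overall = Counter()
    for s in seqs:
        overall.update(s[i:i + 3] for i in range(len(s) - 2))
    total = sum(overall.values())

    at_pos = Counter(s[pos:pos + 3] for s in seqs)
    n = len(seqs)

    stat = 0.0
    for triplet, count in overall.items():
        expected = n * count / total  # expected count at this position
        stat += (at_pos.get(triplet, 0) - expected) ** 2 / expected
    return stat

# Toy input: identical homopolymers show no positional signal.
print(triplet_chi_square(["AAAAAA", "AAAAAA"], 2))  # 0.0
```

Running this for every position and looking for values that stand out from the rest is one way to search for a context around the central SNP.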
Context Analysis • Coincident SNP in central position: [figure not preserved] • No obvious context surrounding coincident SNPs
Genomic Distribution • Tallied the number of coincident SNPs per Mb: • 3.91 coincident SNPs per Mb. • 1.68 non-CpG coincident SNPs per Mb. • If randomly distributed, expect a Poisson distribution with μ = σ² = 3.91. • Observed σ² = 13.27 (p<0.001), so sampling variance explains approximately 30% of total variance.
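The 30% figure follows from the Poisson property that the variance equals the mean: sampling alone would give a variance of 3.91, against an observed variance of 13.27. A short check is given below; the helper name is hypothetical, and only the two summary values are taken from the slide.

```python
from statistics import mean, pvariance

def sampling_variance_fraction(counts):
    """Poisson sampling variance (= mean) as a share of observed variance."""
    return mean(counts) / pvariance(counts)

# Summary values quoted on the slide.
mu, var = 3.91, 13.27
print(round(mu / var, 3))  # 0.295, i.e. roughly 30%
```

Given the per-Mb counts themselves, `sampling_variance_fraction(counts)` would return the same ratio directly.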
Genomic Distribution • SNP densities must drive coincident SNP densities to a certain extent as approximately half of coincident SNPs are created by chance alone. • Recombination rate positively correlated with SNP density (r = 0.242, p<0.001). • Partial correlation controlling for SNP density: r = 0.048, p=0.011**. • SNP densities explain 6.5% of the variance, recombination rate explains 0.2% of the variance of coincident SNPs.
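The variance explained by a correlation is its square, so r = 0.048 corresponds to about 0.2%. The sketch below shows that arithmetic together with the standard first-order partial correlation formula. It is a generic illustration: the talk does not give the pairwise correlations needed to reproduce its partial correlation.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def variance_explained(r):
    """Proportion of variance explained by a correlation coefficient."""
    return r ** 2

# r = 0.048 is the partial correlation quoted on the slide.
print(round(100 * variance_explained(0.048), 1))  # 0.2 (percent)
```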