Cryptic Variation in the Human mutation rate

Presentation Transcript


  1. Cryptic Variation in the Human mutation rate. Alan Hodgkinson, Adam Eyre-Walker, Manolis Ladoukakis

  2. Variation in the mutation rate: • Between different chromosomes • Between regions on chromosomes • Neighbouring nucleotides

  3. Simple context effects: Hwang and Green (2004) PNAS 101: 13994-14001

  4. Cryptic Variation: • Remote context: • AGTCGGTTACCGTGACGTTGAACGTGT

  5. Cryptic Variation: • Remote context: • AGTCGGTTACCGTGACGTTGAACGTGT • Degenerate context: • AGTCGGTTACCGTGYSRGYGAACGTGT

  6. Cryptic Variation: • Remote context: • AGTCGGTTACCGTGACGTTGAACGTGT • Degenerate context: • AGTCGGTTACCGTGYSRGYGAACGTGT • No context / Complex context
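
The degenerate context on the slide is written with IUPAC ambiguity codes (Y = C or T, S = C or G, R = A or G). As a minimal sketch of what such a context means in practice, the snippet below turns an IUPAC motif into a regular expression. The function name `motif_to_regex` and the `matching` sequence are hypothetical and only serve the illustration; nothing here is taken from the speakers' own analysis.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regular expression (hypothetical helper)."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Unambiguous bases stay literal; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


degenerate = "AGTCGGTTACCGTGYSRGYGAACGTGT"  # degenerate context from the slide
exact = "AGTCGGTTACCGTGACGTTGAACGTGT"       # remote-context example from the slide
matching = "AGTCGGTTACCGTGCGAGTGAACGTGT"    # hypothetical sequence that fits the motif

pattern = motif_to_regex(degenerate)
print(pattern.pattern)
print("hypothetical sequence fits:", pattern.fullmatch(matching) is not None)
print("remote-context example fits:", pattern.fullmatch(exact) is not None)
```

The two slide sequences are separate illustrations, so the remote-context example is not expected to fit the degenerate motif.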

  7. Our approach to the problem • Search for SNPs in human sequences that also have a SNP in the orthologous position in chimp. [Figure labels: Human, Chimp]

  8. Our approach to the problem • Search for SNPs in human sequences that also have a SNP in the orthologous position in chimp. • Do we see more coincident SNPs than expected by chance? [Figure labels: Human, Chimp]

  9. The method • Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome by chromosome basis.

  10. The method • Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome by chromosome basis. • Extract all chimp SNPs from dbSNP with 50bp either side of SNP.

  11. The method • Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome by chromosome basis. • Extract all chimp SNPs from dbSNP with 50bp either side of SNP. • BLAST chimp SNPs against human database.

  12. The method • Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome by chromosome basis. • Extract all chimp SNPs from dbSNP with 50bp either side of SNP. • BLAST chimp SNPs against human database. • Extract results above a certain level of homology where there is a SNP on both sequences and reduce to 40bp either side of central position.

  13. The method • Extract all human SNPs from dbSNP and construct a BLAST database on a chromosome by chromosome basis. • Extract all chimp SNPs from dbSNP with 50bp either side of SNP. • BLAST chimp SNPs against human database. • Extract results above a certain level of homology where there is a SNP on both sequences and reduce to 40bp either side of central position. • Repeating both including and excluding CpG effects.

  14. Results • ~1.5 million chimp SNPs. • ~310,000 81bp alignments containing a human and chimp SNP.

  15. Results • ~1.5 million chimp SNPs. • ~310,000 81bp alignments containing a human and chimp SNP. • Observe the number of coincident SNPs. • Calculate the expected number, taking into account the effects of neighbouring nucleotides.
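
The transcript does not give the formula used for the expected number, so the following is only a rough sketch under stated assumptions: each 81bp alignment holds exactly one human SNP, the chimp SNP sits at the central position, and the chance that the human SNP lands on the centre is proportional to a per-site relative mutability. The function names and the mutability values are hypothetical.

```python
from typing import Sequence


def expected_uniform(n_alignments: int, window: int = 81) -> float:
    """Expected coincident SNPs if the human SNP is equally likely at every site."""
    return n_alignments / window


def centre_probability(weights: Sequence[float], centre: int) -> float:
    """P(human SNP is at the centre) given per-site relative mutabilities."""
    return weights[centre] / sum(weights)


# Naive expectation for the ~310,000 alignments reported on the slide.
print(round(expected_uniform(310_000)))

# Hypothetical context effect: the central site mutates 10x faster than its neighbours.
weights = [1.0] * 81
weights[40] = 10.0
print(centre_probability(weights, 40))
```

Under the uniform assumption the probability is simply 1/81 per alignment; giving the centre a higher weight shows how neighbouring-nucleotide effects would raise the expected count.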

  16. Results

  17. Results

  18. Alternative Explanations • Bias in the Method • Selection • Ancestral Polymorphism • Paralogous SNPs

  20. Methodological Bias • Simulated data with same density of human and chimp SNPs as dbSNP under different divergence and mutation patterns. • Method worked well under realistic conditions.
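
A check of this kind can be sketched as a null simulation: place human and chimp SNPs independently and uniformly at random, count the sites that carry both, and compare with the count expected by chance. This is only the simplest uniform-rate case, not the speakers' simulation (which varied divergence and mutation patterns); the sequence length, SNP counts and function names are hypothetical.

```python
import random


def count_coincident(n_sites, n_human, n_chimp, rng):
    """Number of sites that carry both a human and a chimp SNP in one replicate."""
    human = set(rng.sample(range(n_sites), n_human))
    chimp = set(rng.sample(range(n_sites), n_chimp))
    return len(human & chimp)


def observed_over_expected(n_sites, n_human, n_chimp, replicates, seed=0):
    """Mean observed coincident SNPs divided by the chance expectation."""
    rng = random.Random(seed)
    # With independent uniform placement, the mean overlap is n_human * n_chimp / n_sites.
    expected = n_human * n_chimp / n_sites
    total = sum(
        count_coincident(n_sites, n_human, n_chimp, rng) for _ in range(replicates)
    )
    return (total / replicates) / expected


ratio = observed_over_expected(100_000, 2_000, 2_000, replicates=200)
print(f"observed/expected = {ratio:.3f}")
```

With no variation in the mutation rate the ratio should sit close to 1, so a clear excess in real data would need another explanation.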

  21. Methodological Bias All sites (H&G): Non CpG sites (H&G):

  23. Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs

  24. Selection • Areas of low SNP density result in clustering: [Figure labels: Human, Chimp]

  25. Selection • Areas of low SNP density result in clustering: [Figure labels: Human, Chimp] • Apparent excess of coincident SNPs

  26. Selection • No clustering:

  27. Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs

  28. Ancestral Polymorphism • SNP inherited from common ancestor of chimp and human: [Figure: T and A alleles traced from the common ancestor to human and chimp]

  29. Ancestral Polymorphism • SNP inherited from common ancestor of chimp and human: [Figure: T and A alleles traced from the common ancestor to human and chimp] • Increase in coincident SNPs

  30. Ancestral Polymorphism • Expect observed/expected ratio to be same for all transitions:

  31. Ancestral Polymorphism • Repeated initial analysis with macaque data. • Humans and macaques split ~23-24 million years ago, so we expect there to be no shared polymorphisms.

  33. Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs

  34. Paralogous SNPs • Excess of coincident SNPs a consequence of artifactual SNPs called as a result of substitutions in paralogous regions.

  35. Paralogous SNPs • Excess of coincident SNPs a consequence of artifactual SNPs called as a result of substitutions in paralogous regions. • Musumeci et al (2010): 8.32% of human variation in dbSNP may be due to paralogy.

  36. Paralogous SNPs • Excess of coincident SNPs a consequence of artifactual SNPs called as a result of substitutions in paralogous regions. • Musumeci et al (2010): 8.32% of human variation in dbSNP may be due to paralogy.
      AGCTGCACGT Y CGGCATCCAA   SNP
      AGCTGCACGT T CGGCATCCAA   Chromosome 1
      AGCTGCACGT A CGGCATCCAA   Chromosome 7
      Artifactual SNP

  37. Paralogous SNPs
      AGCTGCACGT (T/A) CGGCATCCAA
      AGCTGCACGT T CGGCATCCAA
      AGCTGCACGT (T/A) CGGCATCCAA
      AGCTGCACGT T CGGCATCCAA
      AGCTGCACGT A CGGCATCCAA

  38. Paralogous SNPs
      AGCTGCACGT (T/A) CGGCATCCAA
      AGCTGCACGT T CGGCATCCAA
      AGCTGCACGT (T/A) CGGCATCCAA
      AGCTGCACGT T CGGCATCCAA
      AGCTGCACGT A CGGCATCCAA
      • 3.6% of coincident SNPs are possibly a consequence of paralogous sequences

  39. Alternative Explanations • Bias in the method • Selection • Ancestral Polymorphism • Paralogous SNPs Cryptic variation in the mutation rate

  40. Context Analysis • 4517 sequences containing non-CpG coincident SNPs flanked by 200bp. • Tabulate triplet frequencies at each position in surrounding sequences. • Test whether the proportions of triplets we observe at each position are significantly different from the proportions in the sequences as a whole.
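
The tabulate-and-test steps above can be sketched as follows. This is a minimal illustration, not the speakers' code: it counts triplets starting at one position across equal-length sequences, counts triplets over all positions as the background, and computes a Pearson chi-square statistic. The toy sequences and function names are hypothetical, and turning the statistic into a p-value (against a chi-square distribution) is left out.

```python
from collections import Counter


def triplets_at(seqs, pos):
    """Count the triplet starting at position `pos` in every sequence."""
    return Counter(s[pos:pos + 3] for s in seqs)


def triplets_overall(seqs):
    """Count every overlapping triplet across all sequences (the background)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 2):
            counts[s[i:i + 3]] += 1
    return counts


def chi_square(observed, background):
    """Pearson statistic comparing position-specific counts with background proportions."""
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    stat = 0.0
    for triplet, bg_count in background.items():
        expected = n_obs * bg_count / n_bg
        stat += (observed.get(triplet, 0) - expected) ** 2 / expected
    return stat


# Hypothetical toy data; the real analysis used 4517 sequences with 200bp flanks.
seqs = ["ACGTACG", "ACGTTCG", "ACGAACG"]
background = triplets_overall(seqs)
for pos in range(len(seqs[0]) - 2):
    print(pos, round(chi_square(triplets_at(seqs, pos), background), 2))
```

A position whose statistic stands out from the rest would point to a sequence context associated with the central SNP.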

  41. Context Analysis • Coincident SNP in central position:

  42. Context Analysis • Coincident SNP in central position: No obvious context surrounding coincident SNPs

  43. Genomic Distribution • Tallied the number of coincident SNPs per Mb: • 3.91 coincident SNPs per Mb. • 1.68 non-CpG coincident SNPs per Mb.

  44. Genomic Distribution • Tallied the number of coincident SNPs per Mb: • 3.91 coincident SNPs per Mb. • 1.68 non-CpG coincident SNPs per Mb. • If randomly distributed, expect a Poisson distribution with μ = σ² = 3.91

  45. Genomic Distribution • Tallied the number of coincident SNPs per Mb: • 3.91 coincident SNPs per Mb. • 1.68 non-CpG coincident SNPs per Mb. • If randomly distributed, expect a Poisson distribution with μ = σ² = 3.91 • Observed σ² = 13.27 (p<0.001), so sampling variance explains approximately 30% of total variance.
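
The "approximately 30%" figure follows from the Poisson property that the variance equals the mean. A short worked check, using only the two numbers on the slide (the variable and function names are mine, and the per-Mb counts at the end are hypothetical):

```python
import statistics

mean_per_mb = 3.91          # coincident SNPs per Mb (from the slide)
observed_variance = 13.27   # variance across Mb windows (from the slide)

# Under a Poisson model the sampling variance equals the mean.
poisson_variance = mean_per_mb
sampling_fraction = poisson_variance / observed_variance
dispersion_index = observed_variance / mean_per_mb

print(f"sampling variance / total variance = {sampling_fraction:.3f}")
print(f"variance-to-mean ratio = {dispersion_index:.2f}")


def sampling_fraction_from_counts(counts):
    """Same ratio computed from raw per-window counts."""
    return statistics.mean(counts) / statistics.variance(counts)


# Hypothetical per-Mb counts, only to show the helper in use.
print(sampling_fraction_from_counts([0, 2, 9, 1, 7, 3]))
```

A variance-to-mean ratio well above 1 indicates that coincident SNPs are more clumped across the genome than random placement would produce.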

  46. Genomic Distribution

  47. Genomic Distribution • SNP densities must drive coincident SNP densities to a certain extent as approximately half of coincident SNPs are created by chance alone.

  48. Genomic Distribution • SNP densities must drive coincident SNP densities to a certain extent as approximately half of coincident SNPs are created by chance alone. • Recombination rate positively correlated with SNP density (r = 0.242, p<0.001). • Partial correlation controlling for SNP density: r = 0.048, p=0.011**.

  49. Genomic Distribution • SNP densities must drive coincident SNP densities to a certain extent as approximately half of coincident SNPs are created by chance alone. • Recombination rate positively correlated with SNP density (r = 0.242, p<0.001). • Partial correlation controlling for SNP density: r = 0.048, p=0.011**. • SNP densities explain 6.5% of the variance, recombination rate explains 0.2% of the variance of coincident SNPs.
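
The variance figures are consistent with squaring the correlation coefficients: a partial correlation of r = 0.048 gives r² of about 0.002, or 0.2%. The sketch below shows the standard first-order partial correlation formula and that arithmetic. Reading x as coincident SNP density, y as recombination rate and z as SNP density is my interpretation of the slide, and the inputs in the second example are hypothetical because the transcript does not report all three pairwise correlations.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z (first-order partial correlation)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def variance_explained(r):
    """Proportion of variance explained by a correlation of r."""
    return r ** 2


# Value from the slide: partial r = 0.048.
print(f"{variance_explained(0.048):.4f}")

# Hypothetical pairwise correlations, only to show the formula in use.
print(round(partial_corr(r_xy=0.10, r_xz=0.25, r_yz=0.242), 3))
```

By the same arithmetic, the 6.5% attributed to SNP density corresponds to a correlation of roughly 0.25 in absolute value.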

  50. Genomic Distribution
