
Using genetics to study human history and natural selection

Using genetics to study human history and natural selection. David Reich, Harvard Medical School Department of Genetics, Broad Institute.


Presentation Transcript


  1. Using genetics to study human history and natural selection. David Reich, Harvard Medical School Department of Genetics, Broad Institute

  2. [Slide image: several thousand bases of raw human DNA sequence filling the slide (beginning tttctccatttgtcgtgacacc), ending in a spaced-out run of single bases]

  3. A 2-part talk: Section 1: How human history affects human genetic variation. Section 2: Detecting selection by the pattern of genetic variation and finding disease genes

  4. Section 1 How does human history affect genetic variation? A genome-wide survey of linkage disequilibrium. Linkage disequilibrium is a phenomenon whereby genetic variants are associated: people who have one tend to have a second as well
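
The association described on this slide can be put into numbers with the standard pairwise LD statistics D, D' and r². The following is only a minimal sketch, not the measure used in the talk: the function name and the two-character haplotype encoding are assumptions made for illustration.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, D_prime, r_squared) for two biallelic sites.

    `haplotypes` is a list of 2-character strings (hypothetical encoding):
    the first character is the allele at site 1, the second the allele
    at site 2, e.g. "AB", "Ab", "aB", "ab".
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Use the alleles on the first haplotype as the reference alleles.
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(1 for h in haplotypes if h[0] == a) / n
    p_b = sum(1 for h in haplotypes if h[1] == b) / n
    p_ab = counts[a + b] / n

    # D: how much more often the two alleles travel together than
    # expected if the two sites were independent.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    r_squared = d * d / denom

    # D' rescales |D| by the largest value it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return d, d_prime, r_squared
```

With ten chromosomes in which the two variants always occur together (`["AB"] * 5 + ["ab"] * 5`), both D' and r² come out as 1; with all four combinations equally common, both are 0.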

  5. Section 1 Linkage Disequilibrium Explained [Figure: Emergence of Variations Over Time, from a Common Ancestor to the present time; labels: Disease Mutation; Variations in Chromosomes Within a Population]

  6. Section 1 What Determines Extent of LD? [Figure: Disease-Causing Mutation; time points: 2,000 gens. ago, 1,000 gens. ago, present]
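
The role of time on this slide can be illustrated with the textbook result that, under random mating in a large population, the LD coefficient between two sites shrinks by a factor (1 - c) each generation, where c is the recombination fraction between them. This is a sketch under that idealized model, not a calculation from the talk; the function name and the example value of c are assumptions.

```python
def ld_after_generations(d0, recomb_fraction, generations):
    """Expected LD coefficient D after `generations` of random mating.

    Implements D_t = D_0 * (1 - c)**t, where c is the recombination
    fraction between the two sites per generation (0 <= c <= 0.5).
    """
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - recomb_fraction) ** generations
```

For example, with a hypothetical c = 0.001 (roughly 100 kb if one assumes about 1 cM per Mb), about 37% of the initial LD is left after 1,000 generations and about 14% after 2,000. The older the mutation, the shorter the stretch of chromosome over which association survives.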

  7. Section 1 How Far Does Association (LD) Extend Between Neighboring Common Sites? • Theoretical: 3-8 kb [Figure: distance axis 0kb, 5kb, 10kb, 20kb, 40kb, 80kb, 160kb, with the range of uncertainty marked]

  8. Section 1 Strategy for Assessing Extent of LD • 19 regions • 44 Caucasian samples from Utah • a great deal of DNA sequencing per sample [Figure: distance from core single nucleotide polymorphism (SNP): 0kb, 5kb, 10kb, 20kb, 40kb, 80kb, 160kb]

  9. Section 1

  10. Section 1 A Genome-Wide Assessment of Linkage Disequilibrium [Labels: Disease Gene Mapping; Human history]

  11. Section 1 MYSTERY: What explains the long-range LD? Important event in population history?

  12. Section 1 Positive Control: 48 Swedes. Identical pattern to Utah

  13. Section 1 96 Nigerians (Yoruba): Much Less LD. Associations in Africans a SUBSET of those in Caucasians. MUST be influenced by population history

  14. Section 1 Confirmation of less LD in Africans from Direct DNA Sequencing. Anna DiRienzo also shows this pattern

  15. Section 1 More evidence from Genotyping ~5,000 SNPs (Gabriel et al. 2002). K. Kidd, J. Kidd, Sarah Tishkoff also show this

  16. Section 1 Explanation: Bottleneck or ‘Founder Effect’ in History of North Europeans • likely <10 founding chromosomes ~100,000 years ago. What was this event? (1) Out of Africa? (2) Founding of Europe? [Figure labels: Ancestral Population; North Europeans; Yoruba Ancestors]

  17. Section 1 Open Mysteries • what caused the bottleneck event? “Out of Africa” migration? • how many people involved? When did it occur? • can we better understand when the founder event occurred, and how many people involved?

  18. Acknowledgements for Section 1. Samples: Leif Groop, Richard Cooper, Charles Rotimi. Collaborators: Michele Cargill, Stacey Bolk, James Ireland, Pardis C. Sabeti, Daniel J. Richter, Thomas Lavery, Rose Kouyoumjian, Shelli F. Farhadian, Ryk Ward, Eric S. Lander

  19. Section 2 Using Long-Range Linkage Disequilibrium to Detect Positive Selection in the Genome

  20. Section 2 Overview 1. The difficulty of detecting genomic regions affected by natural selection 2. The long-range haplotype test 3. Results for two genes: G6PD and CD40 ligand

  21. Section 2 Existing formal tests for selection. DNA sequence analysis (weak): Tajima’s D, HKA test, McDonald and Kreitman, Fu and Li’s D, Ka/Ks ratio. Genotyping-based tests: not general at present

  22. Section 2 Our test is based on the relationship between allele frequency and extent of linkage disequilibrium. No selection: young alleles: • low frequency • long-range LD; old alleles: • low or high frequency • short-range LD. Positive selection: young alleles: • high frequency • long-range LD

  23. Section 2 The signal of selection [Plot: Linkage Disequilibrium (Homozygosity) vs. frequency; regions labeled Positive Selection and Neutrality]

  24. Section 2 Paradigm of the Core Region [Figure: a gene with Core Haplotypes numbered 1 to 5]

  25. Section 2 Decay of LD [Figure: core markers and long-range markers (C/T, A/G, A/G, C/T, C/T, C/T) around a gene; core haplotypes 1 to 5; long-range multi-SNP haplotypes]

  26. Section 2 Decay of homozygosity (probability, at any distance, that any two haplotypes that start out the same have all the same SNP genotypes) [Figure: core markers and long-range markers (C/T, A/G, C/T, C/T, C/T, A/G) around a gene; long-range multi-SNP haplotypes for core haplotype 3; homozygosity values 100%, 75%, 35%, 18%]
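
The definition of homozygosity on this slide translates fairly directly into code: among chromosomes that carry the same core haplotype, count the fraction of pairs that are still identical at every marker out to a given distance. The sketch below is one straightforward reading of that definition, not the authors' implementation; the function name and the string encoding of the long-range markers are assumptions.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, n_markers):
    """Homozygosity of a set of haplotypes out to `n_markers` markers.

    `haplotypes` lists the long-range marker alleles (one string per
    chromosome, ordered by increasing distance from the core) for
    chromosomes that all carry the same core haplotype.  The result is
    the probability that two distinct chromosomes drawn at random are
    identical at all of the first `n_markers` markers.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")

    # Group chromosomes that are identical over the first n_markers sites.
    groups = Counter(h[:n_markers] for h in haplotypes)

    # Pairs drawn from within one group are still identical.
    identical_pairs = sum(comb(size, 2) for size in groups.values())
    return identical_pairs / comb(n, 2)
```

For four hypothetical chromosomes `["TTA", "TTA", "TTG", "TCG"]`, the value is 1.0 at the first marker, 0.5 after two markers and about 0.17 after three, which is the kind of decay with distance that the slide's percentages describe.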

  27. Section 2 Two genes associated with malaria resistance. G6PD (1960’s): • well established association to malaria resistance • selection demonstrated in 2001 by Tishkoff et al. CD40 ligand (2002): • Recent association by Sabeti et al. • involved in immune regulation

  28. Section 2 Experimental Design. G6PD (11 SNPs in core, 14 at long distances): region from -480kb to +220kb around the gene. CD40 ligand, TNFSF5 (7 SNPs in core, 14 at long distances): region from -180kb to +520kb around the gene [Figure labels: telomere; Gene]

  29. Section 2 Experimental Design • DNA samples from 231 African men: Yoruba (Nigeria), Beni (Nigeria), Shona (Zimbabwe) • Perfect phase (X chromosome)

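  30. Section 2 Core haplotypes. G6PD: Africans (230), non-Africans (95). CD40 ligand: Africans (231), non-Africans (91) [Figure: counts per core haplotype, with the “A-” protective haplotype marked; values as extracted: 38 72 4 28 28 14 41 5 4 61 13 17 5 91 9 78 30 1 77 21 7 7; haplotype labels as extracted: 1 1 2 2 3 3 4 4 5 5 6 6 7 8 9]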

  31. Section 2 G6PD: long-range haplotype diversity [Figure panels: G6PD-corehap1, G6PD-corehap6, G6PD-corehap3, G6PD-corehap7, G6PD-corehap4, G6PD-corehap8, G6PD-corehap8 (“A-” protective haplotype), G6PD-corehap5, G6PD-corehap]

  32. Section 2 G6PD: homozygosity vs. distance [Plot: EHH vs. distance from the core region (kb)]

  33. Section 2 G6PD: computer simulation vs. data. Core haplotype 8: P << 0.0008 [Plot: Relative EHH vs. core haplotype frequency]

  34. Section 2 G6PD: P-values from simulation [Plot: P-value vs. distance from the core region (kb)]

  35. Section 2 G6PD also stands out in comparison to 7 control regions [Plot: Relative EHH vs. core haplotype frequency]

  36. Section 2 CD40 ligand: long-range haplotype diversity [Figure panels: corehap1, corehap4, corehap4, corehap2, corehap5, corehap3]

  37. Section 2 CD40 ligand: homozygosity vs. distance [Plot: EHH vs. distance from the core region (kb)]

  38. Section 2 CD40 ligand: computer simulation vs. data. Core haplotype 4: P << 0.0011 [Plot: Relative EHH vs. core haplotype frequency]

  39. Section 2 CD40 ligand: P-values from simulation [Plot: P-value vs. distance from the core region (kb)]

  40. Section 2 CD40 ligand also stands out in comparison to 7 control regions [Plot: Relative EHH vs. core haplotype frequency]

  41. Section 2 Malaria resistance arose in the last 10,000 years in Africa. Long-range linkage disequilibrium also gives a direct estimate of the date: ~2,500 years ago for G6PD; ~6,500 years ago for CD40 ligand
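
One simple way a date can be read off long-range LD is to assume that all chromosomes carrying the selected haplotype descend from a single ancestor g generations ago, so that the homozygosity at genetic distance r from the core decays roughly as exp(-2rg); solving for g gives an age. This is a sketch under that assumed model and is not necessarily the calculation behind the dates on the slide. The function names, the input values in the usage note, and the 25-year generation time are all hypothetical.

```python
import math


def allele_age_generations(ehh, genetic_distance_morgans):
    """Estimate allele age in generations from homozygosity at one distance.

    Assumes homozygosity = exp(-2 * r * g), where r is the genetic
    distance from the core (in Morgans) and g the number of generations
    since the common ancestor, and solves for g.
    """
    if not 0.0 < ehh <= 1.0:
        raise ValueError("homozygosity must be in (0, 1]")
    if genetic_distance_morgans <= 0.0:
        raise ValueError("genetic distance must be positive")
    return -math.log(ehh) / (2.0 * genetic_distance_morgans)


def generations_to_years(generations, years_per_generation=25.0):
    """Convert generations to years (generation time is an assumption)."""
    return generations * years_per_generation
```

With made-up inputs, a homozygosity of 0.5 at 0.25 cM (0.0025 Morgans) from the core corresponds to about 139 generations, or roughly 3,500 years at 25 years per generation. The estimate is sensitive to the recombination rate and generation time that are plugged in.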

  42. Section 2 Traditional tests fail to detect the effect: Tajima’s D, HKA test, McDonald and Kreitman, Fu and Li’s D, Ka/Ks ratio. Not significant in our data. This test is a powerful way to detect selection in the last 10,000 years

  43. Section 2 Conclusions: Powerful general approach for detecting selection [Figure: items numbered 1 to 4]

  44. Section 2 Conclusions: Powerful general approach for detecting selection [Figure: items numbered 1 to 5]

  45. Section 2 Conclusions: Powerful general approach for detecting selection. Screen the genome for Positive Selection [Figure: items numbered 1 to 4]

  46. Section 2 Conclusions: Genome-wide screen for natural selection • We can find disease genes without patients!

  47. Section 2 What’s coming… • Generalization of the long-range haplotype test • Application of the approach genome-wide • Haplotype map data set • Disease gene screen data sets

  48. Acknowledgements for Section 2: Pardis C. Sabeti, John Higgins, Haninah Z.P. Levine, Daniel J. Richter, Stephen F. Schaffner, Stacey Gabriel, Jill V. Platko, Nicholas J. Patterson, Gavin J. McDonald, Hans C. Ackerman, Sarah J. Campbell, David Altshuler, Richard Cooper, Ryk Ward, Eric S. Lander

  49. Note: The 3rd section of the talk is not included here because it presents data that have not yet been published.
