Using genetics to study human history and natural selection. David Reich, Harvard Medical School Department of Genetics, Broad Institute.
[Slide graphic: a long stretch of raw DNA sequence]
A 2-part talk: Section 1: How human history affects human genetic variation. Section 2: Detecting selection by the pattern of genetic variation and finding disease genes.
Section 1 How does human history affect genetic variation? A genome-wide survey of linkage disequilibrium (LD) • Linkage disequilibrium is a phenomenon whereby genetic variants at nearby sites are associated: chromosomes (and hence people) that carry one variant tend to carry a second as well
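To make "associated" concrete, the standard pairwise LD measures D, D' and r² can be computed from two-SNP haplotypes. This is a minimal sketch, assuming the haplotypes are already phased and both SNPs are biallelic; the function name `ld_stats` and the toy data are hypothetical, and the slides do not say which LD measure the survey used.

```python
from collections import Counter

def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic SNPs.

    haplotypes: one 2-character string per chromosome; "AG" means
    allele A at the first SNP and allele G at the second.
    """
    n = len(haplotypes)
    alleles1 = sorted({h[0] for h in haplotypes})
    alleles2 = sorted({h[1] for h in haplotypes})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic in the sample")
    a, b = alleles1[0], alleles2[0]  # reference alleles; the choice only affects the signs of D and D'
    counts = Counter(haplotypes)
    p_ab = counts[a + b] / n                            # frequency of the a-b haplotype
    p_a = sum(1 for h in haplotypes if h[0] == a) / n   # frequency of allele a
    p_b = sum(1 for h in haplotypes if h[1] == b) / n   # frequency of allele b
    d = p_ab - p_a * p_b                                # deviation from independence
    # D' divides D by the largest |D| these allele frequencies allow
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    # r^2 is the squared correlation between the two allele indicators
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Perfect association: every chromosome is either A-G or T-C
print(ld_stats(["AG"] * 5 + ["TC"] * 5))  # (-0.25, -1.0, 1.0)
# No association: all four haplotypes equally common
print(ld_stats(["AC", "AG", "TC", "TG"]))  # (0.0, 0.0, 0.0)
```

|D'| = 1 says no recombinant haplotype is seen, while r² = 1 additionally requires matching allele frequencies, so one SNP fully predicts the other.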
Section 1 Linkage Disequilibrium Explained [Figure: variations in chromosomes within a population emerging over time, from a common ancestor to the present time, including a disease mutation]
Section 1 What Determines Extent of LD? [Figure: a disease-causing mutation traced from 2,000 generations ago, through 1,000 generations ago, to the present]
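The figure's point is that the older a mutation is, the more generations of recombination have shortened the stretch of ancestral chromosome around it. Under the textbook model (assumptions: random mating, a large population, no new mutation), D decays by a factor of (1 - r) each generation, and the segment still flanking a mutation of age t generations has mean length 1/t Morgans on each side. This sketch converts Morgans to kb with an assumed average of 1 cM per Mb; the function names are hypothetical.

```python
def d_after(d0, r, generations):
    """Expected LD coefficient after `generations` of random mating in a
    large population: D_t = D_0 * (1 - r)**t, with r the recombination
    fraction between the two sites."""
    return d0 * (1 - r) ** generations

def mean_segment_kb(generations, cm_per_mb=1.0):
    """Mean length (kb) of the ancestral segment still flanking a mutation
    on ONE side after `generations` meioses: exponential with mean
    1/generations Morgans (ignores drift and shared ancestry)."""
    centimorgans = 100.0 / generations      # 1/t Morgans expressed in cM
    return centimorgans / cm_per_mb * 1000  # cM -> Mb (assumed rate) -> kb

for t in (1000, 2000):
    print(t, "generations:", round(mean_segment_kb(t)), "kb per side")
# 1000 generations: 100 kb per side
# 2000 generations: 50 kb per side
```

Halving the age doubles the expected extent of LD, which is why the age of a variant matters so much for how far its associations reach.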
Section 1 How Far Does Association (LD) Extend Between Neighboring Common Sites? • Theoretical: 3-8 kb [Figure: range of uncertainty on a distance axis from 0 to 160 kb]
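The slide does not name the model behind the theoretical 3-8 kb figure. One classic approximation for a population at drift-recombination equilibrium is Sved's (1971) E[r²] ≈ 1/(1 + 4·Ne·c). The sketch below evaluates it under assumed values (Ne = 10,000 and a recombination rate of 1e-8 per bp, roughly 1 cM/Mb); it illustrates how quickly expected LD falls with distance in such models and is not presented as the calculation used in the talk.

```python
def sved_expected_r2(distance_bp, ne=10_000, rate_per_bp=1e-8):
    """Sved (1971) approximation to expected r^2 between two sites:
    E[r^2] ~ 1 / (1 + 4 * Ne * c), with c the recombination fraction,
    approximated here as rate_per_bp * distance_bp (valid for small c).
    Ne and rate_per_bp are assumed defaults, not values from the talk."""
    c = rate_per_bp * distance_bp
    return 1.0 / (1.0 + 4.0 * ne * c)

for kb in (1, 3, 5, 8, 20, 80, 160):
    print(f"{kb:>4} kb: E[r^2] ~ {sved_expected_r2(kb * 1000):.3f}")
```

With these assumptions expected r² drops to one half at 2.5 kb, the same order as the slide's theoretical range and far short of the long-range LD the survey went on to find.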
Section 1 Strategy for Assessing Extent of LD [Figure: SNPs at 5, 10, 20, 40, 80, and 160 kb from a core single nucleotide polymorphism (SNP)] • 19 regions • 44 Caucasian samples from Utah • a great deal of DNA sequencing per sample
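The design measures LD between a core SNP and SNPs at increasing distances, then summarizes across regions. A minimal sketch, assuming phased haplotypes coded 0/1 (the Utah samples are autosomal, so real data would need phasing or a genotype-based LD estimate) and using r² as the LD measure, which is itself an assumption; the function names and the toy region are hypothetical.

```python
import math
from statistics import mean

def r_squared(x, y):
    """r^2 between two biallelic SNPs coded 0/1 on the same phased haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    denom = px * (1 - px) * py * (1 - py)
    return d * d / denom if denom > 0 else math.nan  # undefined if a SNP is monomorphic

def ld_by_distance(core, distal):
    """core: 0/1 alleles at the core SNP, one entry per haplotype.
    distal: {distance_kb: 0/1 alleles at the SNP that far from the core}.
    Returns {distance_kb: r^2 with the core SNP}."""
    return {kb: r_squared(core, alleles) for kb, alleles in sorted(distal.items())}

def mean_ld_profile(regions):
    """Average the per-distance r^2 over many regions, skipping undefined values."""
    by_kb = {}
    for core, distal in regions:
        for kb, r2 in ld_by_distance(core, distal).items():
            if not math.isnan(r2):
                by_kb.setdefault(kb, []).append(r2)
    return {kb: mean(vals) for kb, vals in sorted(by_kb.items())}

# Toy region: the 5 kb SNP tracks the core SNP, the 160 kb SNP does not
region = ([0, 0, 1, 1], {5: [0, 0, 1, 1], 160: [0, 1, 0, 1]})
print(mean_ld_profile([region]))  # {5: 1.0, 160: 0.0}
```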
Section 1 A Genome-Wide Assessment of Linkage Disequilibrium [Figure: implications for disease gene mapping and for human history]
Section 1 MYSTERY: What explains the long-range LD? Important event in population history?
Section 1 Positive Control: 48 Swedes Identical pattern to Utah
Section 1 96 Nigerians (Yoruba) • Much less LD • Associations in Africans are a SUBSET of those in Caucasians • LD MUST be influenced by population history
Section 1 Confirmation of less LD in Africans from Direct DNA Sequencing • Anna Di Rienzo also shows this pattern
Section 1 More evidence from Genotyping ~5,000 SNPs (Gabriel et al. 2002) • K. Kidd, J. Kidd, and Sarah Tishkoff also show this
Section 1 Explanation: Bottleneck or ‘Founder Effect’ in the History of North Europeans [Figure: an ancestral population giving rise to the ancestors of the Yoruba and of North Europeans] • likely <10 founding chromosomes ~100,000 years ago • What was this event? (1) Out of Africa? (2) Founding of Europe?
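The bottleneck explanation says a few founding chromosomes (fewer than 10, per the slide) carried only some of the ancestral haplotypes, so alleles at neighboring SNPs co-occur in fewer combinations, which is what elevated LD means. The toy simulation below illustrates that mechanism only; the founder count of 8 is chosen to match "<10", while the ancestral pool, the regrowth rule and the function name are hypothetical and not the model used in the talk.

```python
import random
from itertools import product

def founder_population(ancestral, n_founders, n_descendants, seed=0):
    """Bottleneck sketch: draw n_founders chromosomes from the ancestral
    pool, then let the population regrow by copying founders at random.
    No mutation or recombination is modelled."""
    rng = random.Random(seed)
    founders = rng.sample(ancestral, n_founders)
    return [rng.choice(founders) for _ in range(n_descendants)]

# Hypothetical ancestral pool: every combination of 6 SNP alleles, 10 copies each
ancestral = ["".join(bits) for bits in product("01", repeat=6)] * 10
descendants = founder_population(ancestral, n_founders=8, n_descendants=500)

print(len(set(ancestral)), "distinct haplotypes before the bottleneck")
print(len(set(descendants)), "distinct haplotypes after")
```

Every haplotype in the founded population already existed in the ancestral one, but at most 8 of the 64 survive, so many SNP pairs become strongly associated.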
Section 1 Open Mysteries • What caused the bottleneck event? The “Out of Africa” migration? • Can we better understand when the founder event occurred and how many people were involved?
Acknowledgements for Section 1 Samples: Leif Groop, Richard Cooper, Charles Rotimi. Collaborators: Michele Cargill, Stacey Bolk, James Ireland, Pardis C. Sabeti, Daniel J. Richter, Thomas Lavery, Rose Kouyoumjian, Shelli F. Farhadian, Ryk Ward, Eric S. Lander
Section 2 Using Long-Range Linkage Disequilibrium to Detect Positive Selection in the Genome
Section 2 Overview 1. The difficulty of detecting genomic regions affected by natural selection 2. The long-range haplotype test 3. Results for two genes: G6PD and CD40 ligand
Section 2 Existing formal tests for selection • DNA sequence analysis: Tajima’s D, HKA test, McDonald and Kreitman, Fu and Li’s D, Ka/Ks ratio (weak) • Genotyping-based tests: not general at present
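Of the sequence-based tests listed, Tajima's D is the simplest to write down: it compares two estimates of diversity, the mean pairwise difference π and S/a₁ (S = number of segregating sites), which agree in expectation under neutrality and constant population size. This is a minimal sketch of the standard Tajima (1989) formulas; the function names and toy sequences are hypothetical, and no significance test is included.

```python
import math
from itertools import combinations

def diversity_stats(seqs):
    """Return (n, S, pi) for equal-length aligned sequences: S is the number
    of segregating sites, pi the mean number of pairwise differences."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return n, s, sum(diffs) / len(diffs)

def tajimas_d(n, s, pi):
    """Tajima's D: the difference pi - S/a1 divided by its standard error
    under the neutral model."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if s == 0:
        return math.nan  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

n, s, pi = diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"])
print(n, s, round(pi, 3), round(tajimas_d(n, s, pi), 3))
```

Because D summarizes the whole allele-frequency spectrum of a region, a single recent selective sweep on one haplotype may barely move it, which is consistent with the slide's verdict that these tests are weak here.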
Section 2 Our test is based on the relationship between allele frequency and extent of linkage disequilibrium • No selection: young alleles have low frequency and long-range LD; old alleles have low or high frequency and short-range LD • Positive selection: young alleles have high frequency and long-range LD
Section 2 The signal of selection [Figure: linkage disequilibrium (homozygosity) vs. allele frequency, under neutrality and under positive selection]
Section 2 Paradigm of the Core Region [Figure: core haplotypes 1-5 in a core region near a gene]
Section 2 Decay of LD [Figure: long-range multi-SNP haplotypes for core haplotypes 1-5, built from core markers near the gene and long-range markers (C/T and A/G SNPs) farther away]
Section 2 Decay of homozygosity [Figure: long-range multi-SNP haplotypes carrying core haplotype 3, built from core markers and long-range markers (C/T and A/G SNPs) around a gene; homozygosity falls from 100% to 75%, 35%, and 18% with increasing distance] • Homozygosity: the probability, at any distance, that two haplotypes that start out the same (share a core haplotype) have all the same SNP genotypes out to that distance
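The homozygosity defined on this slide is the extended haplotype homozygosity (EHH) of Sabeti et al. (2002): among carriers of one core haplotype, the fraction of pairs that remain identical from the core out to a given marker. A minimal one-sided sketch, assuming phased haplotypes with long-range alleles listed outward from the core; the function name and data are hypothetical.

```python
from collections import Counter
from math import comb

def ehh(carriers, k):
    """Extended haplotype homozygosity out to long-range SNP index k.

    carriers: haplotypes (strings) that all carry the same core haplotype,
    with long-range SNP alleles listed outward from the core, one character
    per SNP. Returns the probability that two randomly chosen carriers are
    identical at every SNP from the core out to SNP k.
    """
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers")
    counts = Counter(h[: k + 1] for h in carriers)  # group identical extended haplotypes
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)

carriers = ["CTA", "CTA", "CTG", "CCG"]  # hypothetical carriers of one core haplotype
print([round(ehh(carriers, k), 3) for k in range(3)])  # [1.0, 0.5, 0.167]
```

In practice EHH is computed separately on each side of the core, since recombination breaks the two flanks independently.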
Section 2 Two genes associated with malaria resistance • G6PD (1960s): well-established association with malaria resistance; selection demonstrated in 2001 by Tishkoff et al. • CD40 ligand (2002): recent association by Sabeti et al.; involved in immune regulation
Section 2 Experimental Design • G6PD: 11 SNPs in the core and 14 at long distances, spanning -480 kb to +220 kb around the gene • CD40 ligand (TNFSF5): 7 SNPs in the core and 14 at long distances, spanning -180 kb to +520 kb around the gene [Figure: SNP positions relative to each gene and the telomere]
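Section 2 Experimental Design • DNA samples from 231 African men: Yoruba (Nigeria), Beni (Nigeria), Shona (Zimbabwe) • Perfect phase (X chromosome): both genes are X-linked and men carry a single X chromosome, so each man's haplotype is read directly from his genotypes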
Section 2 Core haplotypes [Table: core haplotype counts at G6PD (230 Africans, 95 non-Africans) and CD40 ligand (231 Africans, 91 non-Africans); at G6PD, core haplotype 8 is the “A-” protective haplotype]
Section 2 G6PD: long-range haplotype diversity [Figure: long-range haplotypes for each G6PD core haplotype; core haplotype 8 is the “A-” protective haplotype]
Section 2 G6PD: homozygosity vs. distance [Figure: extended haplotype homozygosity (EHH) vs. distance from the core region (kb)]
Section 2 G6PD: computer simulation vs. data [Figure: relative EHH vs. core haplotype frequency; core haplotype 8, P << 0.0008]
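The simulation comparison plots relative EHH against core haplotype frequency. A minimal sketch, assuming relative EHH is the EHH of the tested core haplotype divided by the EHH of all other core haplotypes at the locus pooled, with pairs formed only within each other core haplotype (the pooling rule is an assumption here); the function names and data are hypothetical.

```python
from collections import Counter
from math import comb

def identical_pairs(haps, k):
    """Number of haplotype pairs identical from the core out to SNP index k."""
    counts = Counter(h[: k + 1] for h in haps)
    return sum(comb(c, 2) for c in counts.values())

def relative_ehh(by_core, tested, k):
    """EHH on the tested core haplotype divided by the pooled EHH of all
    other core haplotypes (pairs are only formed within a core haplotype).

    by_core: {core haplotype label: list of extended haplotypes}.
    """
    haps = by_core[tested]
    if len(haps) < 2:
        raise ValueError("tested core haplotype needs at least two carriers")
    ehh_tested = identical_pairs(haps, k) / comb(len(haps), 2)
    others = [h for label, h in by_core.items() if label != tested]
    num = sum(identical_pairs(h, k) for h in others)
    den = sum(comb(len(h), 2) for h in others)
    if num == 0:
        return float("inf")  # the other core haplotypes have fully decayed
    return ehh_tested / (num / den)

by_core = {  # hypothetical data
    "8": ["AAA", "AAA", "AAA", "AAT"],
    "1": ["ACG", "ACG", "TCA", "TGA"],
}
print(round(relative_ehh(by_core, "8", k=2), 3))  # 3.0
```

Dividing by the other core haplotypes at the same locus controls for local recombination rate, which affects every haplotype in the region alike.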
Section 2 G6PD: P-values from simulation [Figure: P-value vs. distance from the core region (kb)]
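The slides do not describe how simulated values were turned into P-values. One generic approach, sketched here under that caveat, compares the observed relative EHH only with neutrally simulated core haplotypes of similar frequency and reports the fraction that are at least as extreme. The frequency bin width, the function name and the data are hypothetical.

```python
def empirical_p(observed_rehh, observed_freq, simulated, bin_width=0.05):
    """Fraction of neutrally simulated core haplotypes of similar frequency
    (within bin_width) whose relative EHH is at least the observed value.

    simulated: list of (core haplotype frequency, relative EHH) pairs.
    """
    matched = [r for f, r in simulated if abs(f - observed_freq) <= bin_width]
    if not matched:
        raise ValueError("no simulated core haplotypes at this frequency")
    extreme = sum(1 for r in matched if r >= observed_rehh)
    return extreme / len(matched)

simulated = [(0.20, 1.0), (0.21, 2.0), (0.19, 5.0), (0.50, 10.0)]  # hypothetical
print(empirical_p(4.0, 0.20, simulated))
```

Conditioning on frequency matters because, as the earlier slides note, young alleles show long-range LD even without selection; only the combination of high frequency and long-range LD is unusual. When no simulated value is as extreme as the data, this estimate is 0, and the honest statement is only that P is below 1 / len(matched).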
Section 2 G6PD also stands out in comparison to 7 control regions [Figure: relative EHH vs. core haplotype frequency]
Section 2 CD40 ligand: long-range haplotype diversity [Figure: long-range haplotypes for CD40 ligand core haplotypes 1-5]
Section 2 CD40 ligand: homozygosity vs. distance [Figure: extended haplotype homozygosity (EHH) vs. distance from the core region (kb)]
Section 2 CD40 ligand: computer simulation vs. data [Figure: relative EHH vs. core haplotype frequency; core haplotype 4, P << 0.0011]
Section 2 CD40 ligand: P-values from simulation [Figure: P-value vs. distance from the core region (kb)]
Section 2 CD40 ligand also stands out in comparison to 7 control regions [Figure: relative EHH vs. core haplotype frequency]
Section 2 Malaria resistance arose in the last 10,000 years in Africa • Long-range linkage disequilibrium also gives a direct estimate of the date: ~2,500 years ago for G6PD, ~6,500 years ago for CD40 ligand
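The slide does not say how the dates were computed. A minimal sketch of one simple approach: if recombination breaks the ancestral haplotype between the core and a marker with probability r per generation, the fraction of carrier chromosomes still intact after g generations is roughly (1 - r)^g, which can be solved for g. The 25-year generation time, the function names and the inputs are assumptions; the estimate ignores mutation, drift and the changing frequency of a selected allele.

```python
import math

def generations_since_origin(p_intact, r):
    """Solve p_intact = (1 - r)**g for g.

    p_intact: fraction of chromosomes carrying the selected core haplotype on
    which the ancestral haplotype is still unbroken out to a marker.
    r: recombination fraction between the core and that marker.
    """
    if not (0 < p_intact < 1 and 0 < r < 1):
        raise ValueError("p_intact and r must lie strictly between 0 and 1")
    return math.log(p_intact) / math.log(1 - r)

def to_years(generations, years_per_generation=25):
    return generations * years_per_generation  # generation time is an assumption

g = generations_since_origin(p_intact=0.5, r=0.01)  # hypothetical inputs
print(round(g), "generations, about", round(to_years(g), -2), "years")
```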
Section 2 Traditional tests fail to detect the effect • Tajima’s D, HKA test, McDonald and Kreitman, Fu and Li’s D, Ka/Ks ratio: not significant in our data • The long-range haplotype test is a powerful way to detect selection in the last 10,000 years
Section 2 Conclusions: Powerful general approach for detecting selection [Figure: numbered steps 1-4] • Screen the genome for positive selection
Section 2 Conclusions: Genome-wide screen for natural selection • We can find disease genes without patients!
Section 2 What’s coming… • Generalization of the long-range haplotype test • Application of the approach genome-wide • Haplotype map data set • Disease gene screen data sets
Acknowledgements for Section 2 Pardis C. Sabeti, John Higgins, Haninah Z.P. Levine, Daniel J. Richter, Stephen F. Schaffner, Stacey Gabriel, Jill V. Platko, Nicholas J. Patterson, Gavin J. McDonald, Hans C. Ackerman, Sarah J. Campbell, David Altshuler, Richard Cooper, Ryk Ward, Eric S. Lander
Note: The third section of the talk is not included here because it presents data that have not yet been published.