Disease, natural selection and the 1000 Genomes Project Elinor K. Karlsson Sabeti lab @ Harvard University and Broad Institute
Human migration & recent evolution • Most diversity in Africa • “Out of Africa” migration ~50,000 years ago
1000 Genomes populations • 5 European populations • 5 East Asian populations • 5 Southeast Asian populations • 5 African populations (most diversity) • “Out of Africa” ~50,000 years ago
Genomic Signals of Natural Selection Figure not shown (prevalence vs. generations*, before and after selection) * In humans, tests are sensitive to events within the last ~50,000 years
Genomic Signals of Natural Selection Tests: • 1) Long-range correlations (iHS, XP-EHH) Figure not shown (panels: before, after; labels: broken haplotype, long correlations)
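The long-range correlation tests named on this slide (iHS, XP-EHH) build on extended haplotype homozygosity (EHH): the chance that two chromosomes carrying the same core allele are identical over a stretch of sequence. The sketch below is a toy illustration only, not the iHS or XP-EHH statistic itself; the function name `ehh` and the example haplotypes are made up for illustration.

```python
from collections import Counter

def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity between site `core` and site `end`.

    `haplotypes` holds only chromosomes carrying the core allele, each as a
    string (or sequence) of alleles. Returns the fraction of chromosome
    pairs that are identical across the whole interval.
    """
    lo, hi = min(core, end), max(core, end)
    segments = [tuple(h[lo:hi + 1]) for h in haplotypes]
    n = len(segments)
    if n < 2:
        return 0.0
    counts = Counter(segments)
    identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return identical_pairs / (n * (n - 1) // 2)

# Hypothetical data: the core allele sits at index 0 in every haplotype.
swept = ["1111", "1111", "1111", "1111"]    # long, unbroken haplotype
neutral = ["1010", "1101", "1011", "1110"]  # haplotype broken up by recombination

for distance in range(4):
    print(distance, ehh(swept, 0, distance), ehh(neutral, 0, distance))
```

In the swept example EHH stays high far from the core site, while in the broken-haplotype example it decays quickly with distance.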
Genomic Signals of Natural Selection Tests: • 1) Long-range correlations (iHS, XP-EHH) • 2) High-frequency derived alleles Figure not shown (derived allele frequency vs. position, before and after; tree: ancestor, chimp, human, gorilla)
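A derived allele is one that differs from the ancestral state, which the slide's tree infers from the chimp and gorilla outgroups. The sketch below assumes a deliberately simple rule (call the ancestral allele only when both outgroups agree) and hypothetical function names; real pipelines use more careful ancestral-state inference.

```python
def infer_ancestral(chimp, gorilla):
    """Return the outgroup allele if both outgroups agree, else None (ambiguous)."""
    return chimp if chimp == gorilla else None

def derived_allele_frequency(human_alleles, ancestral):
    """Fraction of sampled human chromosomes carrying a non-ancestral allele."""
    if ancestral is None or not human_alleles:
        return None
    derived = sum(1 for allele in human_alleles if allele != ancestral)
    return derived / len(human_alleles)

# Hypothetical site: both outgroups carry A, most sampled humans carry G.
ancestral = infer_ancestral("A", "A")
sample = list("GGGGGGGGAA")
print(derived_allele_frequency(sample, ancestral))
```

A newly arisen allele normally starts rare, so a derived allele at high frequency is the kind of signal test 2 looks for.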
Genomic Signals of Natural Selection Tests: • 1) Long-range correlations (iHS, XP-EHH) • 2) High-frequency derived alleles • 3) High differentiation (FST) Figure not shown (differentiation vs. position, before and after)
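Population differentiation at a single biallelic variant can be illustrated with a basic FST calculation from allele frequencies. The sketch below uses the textbook heterozygosity form, FST = (H_T - H_S) / H_T, and assumes two populations of equal size; the function name is hypothetical, and the estimator used in the talk's analyses is not specified in the slides.

```python
def fst_two_populations(p1, p2):
    """Wright's FST for one biallelic site, given the allele frequency in
    each of two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                     # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # site is monomorphic overall, so no differentiation to measure
    return (h_total - h_within) / h_total

print(fst_two_populations(0.5, 0.5))  # same frequency in both populations
print(fst_two_populations(0.9, 0.1))  # strongly differentiated
print(fst_two_populations(1.0, 0.0))  # fixed difference
```

Values near 0 mean the populations share similar allele frequencies; values near 1 mean the allele is close to fixed in one population and absent in the other.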
Localize: Composite of Multiple Signals • 1. Long-range correlation • 2. High frequency of newer allele • 3. Differentiation Figure not shown (each signal plotted against position)
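One way to picture a composite of multiple signals is as a product of per-test posterior probabilities, treating the tests as independent. The sketch below is an illustration under that assumption, not the published CMS implementation: the Gaussian score distributions, their parameters, the prior and all names are invented for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def composite_score(scores, selected, neutral, prior):
    """Multiply, across tests, the posterior probability that a variant is
    selected given its score on that test.

    `selected` and `neutral` map each test name to the (mean, sd) of its
    score distribution under each scenario (assumed values here).
    """
    product = 1.0
    for test, score in scores.items():
        p_sel = normal_pdf(score, *selected[test])
        p_neu = normal_pdf(score, *neutral[test])
        product *= p_sel * prior / (p_sel * prior + p_neu * (1 - prior))
    return product

# Hypothetical score distributions for the three signal types on the slide.
tests = ["long_range", "derived_frequency", "differentiation"]
selected = {t: (2.0, 1.0) for t in tests}
neutral = {t: (0.0, 1.0) for t in tests}

all_high = dict.fromkeys(tests, 3.0)
one_high = {"long_range": 3.0, "derived_frequency": 0.0, "differentiation": 0.0}

print(composite_score(all_high, selected, neutral, prior=0.01))
print(composite_score(one_high, selected, neutral, prior=0.01))
```

A variant that scores high on all three signals gets a much larger composite than one that stands out on a single test, which is what lets a composite narrow a broad region to a few candidate variants.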
Simulations: CMS narrows signal of selection • 1 Mb to 104 kb (dense genotype data) • 1 Mb to 89 kb (full sequence data) • 500-1500 variants to 100 • Causal variant among top 20 variants in 50% of tests • Causal variant was top variant in 25% of tests CMS on real data: 185 selected regions from human haplotype map ...
1000 Genomes Project = perfect for selection tests • Positive selection tests detect common variants (>20% frequency) • 1000 Genomes Project – all variants over 1-5% frequency in population • Find candidate variants • Test function
1000 Genomes Project (European, East Asian, Yoruban) 412 candidate regions: • 35 nonsynonymous SNPs • 147 with single gene • 88 with multiple genes • 48 lincRNAs • 56 eQTLs
Non-synonymous mutation in TLR5 (Yorubans) TLR5: sensing & clearance of bacterial pathogens Figure not shown (label: dimerization and activation domain)
Selection + Association • Strength of selection: 1000 Genomes data; selection signal; population • Trait association: GWAS data; association signal; affected vs. unaffected
Pathogens = recent, strong selective pressure Global migration and agricultural revolution ... • new pathogen environments • increased population density • closer contact with animal disease vectors • Neolithic demographic transition (~12,000 years ago)
Pathogens = recent, strong selective pressure Many pathogen receptors / modifiers in selected regions • RHOA and OTUB1: Yersinia pestis • DAG1: Mycobacterium leprae • TLR1: H. pylori, M. leprae and others • TLR5: Salmonella typhimurium and others • LARGE: Lassa virus • DARC: Plasmodium vivax malaria • PVRL4: measles virus • VDR: Mycobacterium tuberculosis • APOL1: Trypanosoma brucei
Few GWAS of pathogen susceptibility • 3% pathogen-related (NHGRI GWAS Catalog)
Project 1: Lassa fever susceptibility in West Africa • Lassa virus causes hemorrhagic fever, which kills >20,000 people each year • Endemic in West Africa: Nigeria, Mano River Union • Persistently infected rodent reservoir: M. natalensis
Signal of selection at LARGE in Yoruba population Function of LARGE connected to Lassa fever: Sabeti PC et al, Nature (2007)
1000 Genomes Project West African populations: Mende (Sierra Leone), Esan & Yoruba (Nigeria)
Genetically distinct populations Figure not shown
Selection signal at LARGE in all three populations Figure not shown Yoruba (Nigeria) Figure not shown Esan (Nigeria) Figure not shown Mende (Sierra Leone)
Association signal at LARGE in Mende and Esan Figure not shown
African populations have low LD and poor tagging Figure not shown (Illumina 2.5M array: % captured at r2 > 0.8)
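The "% captured (r2 > 0.8)" measure can be illustrated with a small calculation: a variant counts as captured if at least one genotyped array SNP is correlated with it at r2 above the threshold. The sketch below assumes phased haplotypes coded as 0/1 and uses hypothetical function names and toy data; it is not the procedure used to produce the slide's figure.

```python
def r_squared(a, b):
    """Squared correlation (LD r^2) between two biallelic sites, each given
    as a list of 0/1 alleles across the same haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one site is monomorphic in this sample
    d = p_ab - p_a * p_b
    return d * d / denom

def fraction_captured(variants, array_snps, threshold=0.8):
    """Share of variants tagged by at least one array SNP at r^2 > threshold."""
    captured = sum(
        1 for v in variants
        if any(r_squared(v, tag) > threshold for tag in array_snps)
    )
    return captured / len(variants)

# Toy haplotypes: one array SNP, two untyped variants.
array_snp = [0, 0, 1, 1]
in_ld = [0, 0, 1, 1]       # perfectly correlated with the array SNP
not_in_ld = [0, 1, 0, 1]   # uncorrelated with the array SNP

print(fraction_captured([in_ld, not_in_ld], [array_snp]))
```

When LD is low, fewer untyped variants have a well-correlated neighbor on the array, so the captured fraction drops.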
Impute with 1000 Genomes data • Imputation with 1000G • Association analysis
Imputation boosts association signal Figure not shown
Association signal overlaps with selection signal Association in Mende (Sierra Leone) Figure not shown Association in Esan (Nigeria) Figure not shown Signal of selection in Yoruba (Nigeria) Figure not shown
Next: more data • Selection scan with new 1000 Genomes data • Bigger GWAS with imputation • Combine selection and association genome-wide
Project 2: Cholera in Bangladesh • Ancient disease (5th century BC) • Common disease (50% exposed by age 15) • High fatality (historically up to 70%; higher in children) • Risk is heritable (first-degree relatives have ~3x higher susceptibility)
CMS: ~300 regions of natural selection* Figure not shown (CMS selection score vs. position in genome, by chromosome) * FPR = 0.1%
INRICH enrichment analysis of gene sets • Genes ~ IKBKG (p = 5 × 10^-5) • Potassium ion transport (p = 2 × 10^-3) • Kell blood group genes (p = 4 × 10^-3) Figure not shown (by chromosome, 1-22) http://atgu.mgh.harvard.edu/inrich/ (Lee et al 2012)
Top region of selection = top region of association Figure not shown (CMS selection score and -log10 p for association*) * Test = PLINK DFAM (combines data from discordant sibships, parent-offspring trios and unrelated cases/controls)
Next: More data • 1. Natural selection scan: NF-κB pathway & inflammasome enriched for selection in Bangladesh; next: selection scan with 1000 Genomes data • 2. Cholera susceptibility association study: strongest selected locus associated with cholera susceptibility; next: GWAS with imputation • 3. Experimental follow-up: cholera toxin stimulates inflammasome; next: RNAi / RNA-Seq
Same approach can be applied to other diseases Malaria, dengue fever, leprosy, tuberculosis ... • Strong (dead = no children) • Recent (many new pathogens in last 50,000 years) • Diverse (varies by population) Other diseases? Autoimmune disorders?
Acknowledgements • Sabeti Lab: Pardis Sabeti, Shervin Tabrizi, Ilya Shlyakhter, Shari Grossman, Danny Park, Sameer Gupta, Yana Kamberov, Kristian Andersen, Rachel Sealfon, Stephen F. Schaffner, Ridhi Tariyal, Matt Stremlau, Stephen Gire, Christian Matranga, Sarah Winnicki, and everyone else! • MGH: Regina LaRocque, Jason Harris, Crystal Ellis, Christine Becker, Ed Ryan, Lynda Stuart, Steve Calderwood, Sarah Shin • Broad Institute: Colm O'Dushlaine, Phil Hyoun Lee (MGH), Shaun Purcell (MGH), Nick Patterson, Yves Boie, Andrew Crenshaw, Scott Mahan, Shannon Power, Genomics Platform • Sierra Leone: Richard Fonnie, Augustine Goba, Donald Grant, Simbirie Jalloh, Lansana Kanneh, Danielle Levy • Bangladesh: Firdausi Qadri, Atiqur Rahman • Nigeria: Christian Happi, Wunmi Omoniwa, Philomena Egiaghe, Odia Ikponmwosa • Tulane University: Robert Garry, John Schieffelin