The Ashkenazi Genome Project. Shai Carmi, Pe’er lab, Columbia University and The Ashkenazi Genome Consortium (TAGC). Personal Genomes & Medical Genomics, Cold Spring Harbor, NY, November 2012.
Recent History of Ashkenazi Jews • Mediterranean origin (?) • Ca. 1000: Small communities in N. France, Rhineland • Migration east • Expansion • ~10M today, mostly in US and Israel • Relative isolation
Ashkenazi Jewish Genetics • Recently, AJ shown to be a genetically distinct group • Close to Middle-Eastern & South-European populations (Figure: 300 Jewish individuals, SNP arrays; groups labeled Jewish non-AJ, AJ, Europeans, Middle-Eastern.) Price et al., PLoS Genetics 2008. Olshen et al., BMC Genetics 2008. Need et al., Genome Biology 2009. Kopelman et al., BMC Genetics 2009. Atzmon et al., AJHG 2010. Behar et al., Nature 2010. Bray et al., PNAS 2010. Guha et al., Genome Biology 2012.
Recent Demography & IBD In small populations, common ancestors are likely recent. • IBD is highly informative on recent history! • IBD is common in AJ (Gusev et al., MBE 2011). • Many long haplotypes are identical-by-descent. (Figure: individuals A and B carrying a shared segment.)
AJ Genetic History • High potential for genetic studies! • Power of imputation by IBD • Expansion rate ≈34% per generation (Figure: effective size N versus years ago, with UK and AJ labeled; values shown: 2,300, 45,000, 270, 800, and 4,300,000 at present. Palamara et al., AJHG 2012.)
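As a rough consistency check on the numbers in this figure, the sketch below asks how many generations of 34% growth would take a population from 270 to 4,300,000. Reading 270 as the bottleneck size and 4,300,000 as the present size is an interpretation of the figure labels, and the 25-year generation time is an assumption, not something stated on the slide.

```python
import math


def generations_to_grow(n_start, n_end, rate_per_generation):
    """Generations of constant exponential growth needed to go from n_start to n_end."""
    return math.log(n_end / n_start) / math.log(1.0 + rate_per_generation)


# Values read off the slide; their pairing is an interpretation of the figure.
gens = generations_to_grow(270, 4_300_000, 0.34)

# Assumed generation time of 25 years (not given in the source).
years = gens * 25

print(f"{gens:.1f} generations, about {years:.0f} years")
```

Under these assumptions the result lands close to the 800 shown on the time axis, which suggests the labels are mutually consistent.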
The Ashkenazi Genome Consortium Goal: • Sequence hundreds of healthy AJ individuals to high coverage • Use as a reference panel for association studies, imputation, and clinical interpretation • Understand population history and functional genetic variation in AJ Phase I: • 58 AJ personal genomes (86 under way) • ~60 years old, healthy controls • Unrelated, PCA-validated AJ • Technology: Complete Genomics
Quality Control Ti/Tv
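For readers unfamiliar with the metric, Ti/Tv is the ratio of transitions (A<->G, C<->T) to transversions (all other single-base changes) among called SNPs. The following is a minimal sketch of how it could be computed from (ref, alt) pairs; the function name and input format are illustrative and not taken from the TAGC pipeline.

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) bases."""
    ti = tv = 0
    for ref, alt in snps:
        if ref == alt:
            continue  # not a substitution
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


# Toy input: 4 transitions and 2 transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(example))
```

In human whole-genome data the genome-wide ratio is typically around 2, so a markedly lower value is commonly taken as a sign of excess false-positive calls.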
Variant Statistics & Comparison to Europeans (Figure: variant counts for TAGC and 14 Flemish genomes (Belgium); axis units (M) and (k).) Similar results in 13 CG European public genomes.
Comparison to Europeans • Allele frequency spectrum: • No excess singletons. • Slight excess of doubletons. • More novel SNPs in AJ (3.8% vs. 3.1%). (Figure: singletons and doubletons.)
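The singleton and doubleton counts come from the allele frequency spectrum, which tallies sites by how many copies of the alternate allele appear in the sample. A minimal sketch, assuming genotypes are coded as alternate-allele counts (0, 1, 2) per individual; the data layout and names are illustrative only.

```python
from collections import Counter


def allele_count_spectrum(sites):
    """Map alt-allele count -> number of sites with that count (variant sites only).

    sites: list of per-site lists, each holding one alt-allele count (0/1/2)
    per individual.
    """
    return Counter(sum(site) for site in sites if sum(site) > 0)


# Toy data for 4 individuals (made up for illustration).
sites = [
    [0, 1, 0, 0],  # one copy: singleton
    [1, 1, 0, 0],  # two copies in two carriers: doubleton
    [0, 0, 2, 0],  # two copies in one homozygote: doubleton
    [0, 0, 0, 1],  # singleton
    [1, 1, 1, 0],  # three copies
]

spectrum = allele_count_spectrum(sites)
print("singletons:", spectrum[1], "doubletons:", spectrum[2])
```

Note that a doubleton can be either two heterozygous carriers or a single homozygote; the spectrum alone does not distinguish them.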
Quality Control (2) • False positive rate assessment by runs of homozygosity (ROH): • Assume hets in high-confidence ROH are FP. • Genome-wide extrapolation: ~20,000 per genome. • QC: • Discard putatively low-quality variants • Discard HWE violations, low call rate • FP after QC: ~5,000 per genome. (Figure: hets on paternal and maternal haplotypes.)
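The extrapolation step can be sketched as simple proportional scaling: the het rate observed inside high-confidence ROH is applied to the whole genome. The slide does not say exactly how the extrapolation was done, so the scaling below is an assumption, and the numbers are invented for illustration.

```python
def extrapolate_false_positives(hets_in_roh, roh_length_bp, genome_length_bp):
    """Scale the het-call rate inside ROH up to the whole genome.

    Assumes every het call inside an ROH is a false positive and that the
    false-positive rate per base is uniform across the genome.
    """
    if roh_length_bp <= 0:
        raise ValueError("ROH length must be positive")
    rate_per_bp = hets_in_roh / roh_length_bp
    return rate_per_bp * genome_length_bp


# Hypothetical inputs: 50 het calls in 10 Mb of ROH, 3 Gb genome.
estimate = extrapolate_false_positives(50, 10_000_000, 3_000_000_000)
print(round(estimate))
```

Running the same function on het counts before and after filtering would give a before/after comparison of the kind reported on the slide.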
Applicability to Clinical Genomics • Variants of unknown significance • Technical false positives • True variants without health impact (Figure: novel variants per sample, not in TAGC.)
Demographic Inference • Use allele frequency spectrum and coalescent simulations. • Assume the demographic model previously mentioned. (Figure: % sites, log scale.) • Parameters qualitatively similar to those inferred from IBD • Bottleneck 35 generations before present (gbp), of size 500; pre-bottleneck size 90,000
Summary • IBD reveals the AJ population bottleneck and expansion, and the potential for genetic studies. • High-quality genomes sequenced by TAGC indicate utility in a clinical setting. • The genomes confirm the demography and demonstrate subtle differences from Europeans. • Ongoing analysis: • Imputation power using TAGC vs. 1kG as ref panels • Local ancestry inference • Functional variants; AJ disease genes • Mobile element insertions
Thank you! TAGC consortium members: Columbia University Computer Science: Itsik Pe’er, Pier Francesco Palamara; Undergrads: Fillan Grady, Ethan Kochav, James Xue; IT: Shlomo Hershkop. Long-Island Jewish Medical Center: Todd Lencz, Semanti Mukherjee, Saurav Guha. Columbia University Medical Center: Lorraine Clark, Xinmin Liu. Albert Einstein College of Medicine: Gil Atzmon, Harry Ostrer. Mount Sinai School of Medicine: Inga Peter, Laurie Ozelius. Memorial Sloan Kettering Cancer Center: Ken Offit, Vijai Joseph. Yale School of Medicine: Judy Cho, Ken Hui, Monica Bowen. The Hebrew University of Jerusalem: Ariel Darvasi. VIB, Gent, Belgium: Herwig Van Marck, Stephane Plaisance. Complete Genomics: Jason Laramie. Funding: Human Frontier Science Program.
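Formal Inference Using IBD • Assume a population of historical size N(t). • Total shared segments of length [symbol lost in extraction]: [expression lost in extraction] • Detect IBD in sample → Infer history N(t). Palamara et al., AJHG 2012 (Figure: individuals A and B carrying a shared segment.)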
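To illustrate why IBD sharing carries information about population size, here is a sketch of the simplest textbook case: a constant-size population. It is not necessarily the expression that appeared on the slide. The assumptions are: pairwise coalescence at rate 1/N per generation (N counted in haploid genomes), and, given a common ancestor t generations back, a shared segment around a site whose length is the sum of two exponential distances with rate 2t (in Morgans). Integrating over t gives an expected fraction of the genome in segments of at least u Morgans of (1 + 4uN) / (1 + 2uN)^2.

```python
def expected_fraction_shared(n_haploid, min_length_morgans):
    """Expected genome fraction in IBD segments >= min_length_morgans.

    Constant-size model only; n_haploid is the effective number of haploid
    genomes. Closed form: (1 + 4uN) / (1 + 2uN)^2.
    """
    x = 2.0 * min_length_morgans * n_haploid
    return (1.0 + 2.0 * x) / (1.0 + x) ** 2


# Sharing in segments of at least 3 cM for a few illustrative sizes.
for n in (500, 5_000, 50_000):
    print(n, expected_fraction_shared(n, 0.03))
```

The fraction falls steeply as N grows, which is the sense in which small populations show abundant long IBD segments. Inference of a time-varying history generalizes this idea by matching observed sharing across several length ranges.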
Data processing • CGA tools VCF generator: called sites only. • Correct multi-nucleotide substitution bug. • Compress, index, and distribute. • Generate high-quality genotype set for population genetic analyses: • Remove indels and multi-nucleotide substitutions. • Remove low-quality SNPs. • Remove multi-allelic SNPs. • Remove half-calls. • Remove SNPs with high no-call rate. • Remove SNPs not in Hardy-Weinberg equilibrium. • Remove monomorphic reference SNPs. • Remove an inbred individual. • Format as PLINK file.
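Three of the per-SNP filters listed above (no-call rate, monomorphic reference, Hardy-Weinberg) can be sketched as follows. This is an illustration only: the thresholds are hypothetical, genotypes are assumed to be coded 0/1/2 with None for a no-call, and a one-degree-of-freedom chi-square test stands in for whatever HWE test the pipeline actually used.

```python
import math


def hwe_p_value(n_ref_hom, n_het, n_alt_hom):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(genotypes, max_no_call_rate=0.05, min_hwe_p=1e-6):
    """Return True if a SNP passes the three filters (thresholds are hypothetical)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    if 1.0 - len(called) / len(genotypes) > max_no_call_rate:
        return False  # high no-call rate
    counts = [called.count(k) for k in (0, 1, 2)]
    if counts[1] == 0 and counts[2] == 0:
        return False  # monomorphic reference
    return hwe_p_value(*counts) >= min_hwe_p


print(keep_snp([0] * 25 + [1] * 50 + [2] * 25))  # genotypes in exact HWE proportions
print(keep_snp([0] * 50 + [2] * 50))             # no heterozygotes at all
```

An exact test is often preferred over chi-square when genotype counts are small; the structure of the filter would stay the same.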