IBD sharing: Theory and applications in the Ashkenazi Jewish population • Shai Carmi, Pe’er lab, Columbia University • Mt. Sinai, NY, March 2014
About Me • 2006-2008: Empirical network analysis (computational) • 2007-2010: Diffusion/navigation in random networks (theory) • 2010-2011: Anomalous diffusion (theory) • 2008-2011: RNA splicing and editing (computational/experimental) • 2012-2014: Population genetics, with Itsik Pe’er
Outline • IBD Sharing: Introduction • Ashkenazi Jewish Genetics • Demographic inference • Imputation • Future Directions & Summary
Identical-by-Descent (IBD) Sharing • Definition: A segment is shared IBD if it is inherited from a single recent common ancestor. [Figure: individuals A and B, a common ancestor g generations back, and a shared segment]
What’s “recent”? • Definition: A segment is shared IBD if it is inherited from a single recent common ancestor. • Textbook/Pedigrees: the MRCA is more recent than a given time (Thompson, Genetics, 2013) • In practice: • A segment is IBD if it is longer than a cutoff • Allow small differences • Present methods can detect segments longer than ≈1 cM [Figure: individuals A and B, a common ancestor g generations back, and a shared segment]
When is the Common Ancestor “recent”? [Figure: genealogy in a population with N=10; time axis in generations, from the present back to a common ancestor at g=7]
Why is IBD Useful? • Segments are rare but long • Probability of a site to be shared • Segment length [Figure: individuals A and B, a common ancestor g generations back, and a shared segment]
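The "rare but long" point can be illustrated with a small simulation. This is a minimal sketch under a standard textbook approximation that is assumed here rather than taken from the slide: two chromosomes whose common ancestor lived g generations ago are separated by 2g meioses, crossovers fall as a Poisson process with rate 2g per Morgan, and chromosome ends are ignored. The function name `simulate_segment_lengths` is hypothetical.

```python
import random


def simulate_segment_lengths(g, n_samples=20000, seed=1):
    """Simulate lengths (in Morgans) of the shared segment covering a focal site.

    Assumption: crossovers accumulate at rate 2*g per Morgan, so the distance
    from the site to the nearest crossover on each side is exponential with
    rate 2*g. The segment length is the sum of the two distances.
    """
    rng = random.Random(seed)
    rate = 2.0 * g
    return [rng.expovariate(rate) + rng.expovariate(rate) for _ in range(n_samples)]


if __name__ == "__main__":
    for g in (5, 10, 50):
        lengths = simulate_segment_lengths(g)
        mean_cm = 100.0 * sum(lengths) / len(lengths)
        # Under the assumption above the mean is 1/g Morgans, i.e. 100/g cM.
        print(f"g={g:3d}: simulated mean {mean_cm:6.2f} cM, approximation {100.0 / g:6.2f} cM")
```

Under this approximation a segment surrounding a shared site has mean length of about 100/g cM, so an ancestor 10 generations back leaves segments of roughly 10 cM, well above the ≈1 cM detection limit.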
Applications • A segment indicates recent co-ancestry: • Disease mapping • Pedigree reconstruction • Detecting natural selection • Demographic (historical) inference • Identical sequence across individuals: • Phasing • Imputation • Estimating heritability • Estimating genotyping error rate (Browning and Browning, Annu. Rev. Genet., 2012) [Figure: individuals A and B, a common ancestor g generations back, and a shared segment]
IBD Sharing Theory • Model: • A population with constant effective size N • A minimal segment length m • Two chromosomes of length L • The fraction of the chromosome in shared segments? • The number of shared segments?
The IBD Process along the Chromosome [Figure: the chromosome (coordinate 0 to L) divided into segments ℓ1 to ℓ10 with coalescence times t1 to t10 and a length cutoff m; annotations: segment lengths given the times, coalescent theory for the times]
Sample Results • The avg. fraction of the chr. in shared segments • The avg. number of shared segments • Implicit expressions for the distributions (Palamara et al., AJHG, 2012; Carmi et al., Genetics, 2013; Carmi and Pe’er, arXiv, 2014)
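As a hedged illustration of how an average shared fraction can be computed in this kind of model, the sketch below uses assumptions that are mine rather than the slide's: N counts haploid chromosomes, so a pair coalesces at rate 1/N per generation; m is in Morgans; and the segment around a site with coalescence time t has an Erlang-2 length with rate 2t. Under these assumptions the mean fraction in segments longer than m works out to (1 + 4mN) / (1 + 2mN)^2. The cited papers may use conventions that differ by factors of 2 or 100 (diploid N, m in cM).

```python
import math


def mean_fraction_closed_form(N, m):
    """Closed form under the stated assumptions: (1 + 4mN) / (1 + 2mN)^2."""
    x = m * N
    return (1.0 + 4.0 * x) / (1.0 + 2.0 * x) ** 2


def mean_fraction_numeric(N, m, steps=200000):
    """Trapezoid-rule check of the same quantity.

    Integrates (coalescence density at t) * P(segment longer than m | t),
    where P(segment longer than m | t) = (1 + 2mt) * exp(-2mt).
    """
    decay = 1.0 / N + 2.0 * m          # overall exponential decay rate
    t_max = 40.0 / decay               # tail beyond this is negligible
    h = t_max / steps

    def integrand(t):
        return (1.0 / N) * math.exp(-t / N) * (1.0 + 2.0 * m * t) * math.exp(-2.0 * m * t)

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h


if __name__ == "__main__":
    N, m = 1000, 0.01                  # hypothetical values: 1000 chromosomes, 1 cM cutoff
    print(mean_fraction_closed_form(N, m), mean_fraction_numeric(N, m))
```

The numerical integral is only a consistency check on the algebra; it does not validate the modelling assumptions themselves.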
Outline • IBD Sharing: Introduction • Ashkenazi Jewish Genetics • Demographic inference • Imputation • Future Directions & Summary
Founder Populations [Figure: population size over time for a founder population and a non-founder population, with disease alleles marked]
Founder Populations • Recent successes: • Greece (Tachmazidou et al., Nat. Comm., 2013) • Finland (Kurki et al., PLoS Genet., 2014) • Iceland (deCODE) (many papers; most recently Steinthorsdottir et al., Nat. Genet., 2014; Grarup, PLoS Genet., 2013)
A Brief History of Ashkenazi Jews • Unclear origin • Ca. 1000: Small communities in Northern France, Rhineland • Migration east • Expansion • Migration to US and Israel • ≈10M today • Relative isolation
Ashkenazi Jewish (AJ) Genetics [Figure: AJ shown relative to Europe, the Middle-East, and Jewish non-AJ populations] • Behar et al., Nature, 2010 • Bray et al., PNAS, 2010 • Guha et al., Genome Biol, 2012 • Behar et al., Hum. Biol., 2014 • Price et al., PLoS Genet., 2008 • Olshen et al., BMC Genet, 2008 • Need et al., Genome Biol, 2009 • Kopelman et al., BMC Genet, 2009 • Atzmon et al., AJHG, 2010
AJ Genetics: Interim Summary • Current large population (≈10M) • IBD analysis: bottleneck of effective size ≈300 (later) • Mendelian disorders, high frequency risk alleles • Insight on both European & Middle-Eastern past • No genealogies
The Ashkenazi Genome Consortium • NY area labs interested in specific diseases • Phase I: 128 whole genomes (CG; completed) • Phase II: ≈300 whole genomes (NYGC; under way) • Impute large genotyped cohorts • Learn about population history • Quantify utility in medical genetics
Results Highlights • Low false positive rate at ≈5,000 per genome • 50% more novel variants per genome in AJ (compared to non-Jewish Europeans) • More genetic diversity in AJ (θ), but less projected for large samples • More AJ-specific variants compared to EU-specific variants • A model for EU-Middle-East-AJ ancient history • A model for AJ recent history • The panel is necessary for screening clinical AJ genomes • Catalog of mutations in known AJ disease genes • Slightly higher mutation burden in AJ • The panel is useful for imputation (S. C. et al., submitted)
Outline • IBD Sharing: Introduction • Ashkenazi Jewish Genetics • Demographic inference • Imputation • Future Directions & Summary
A Simple Approach • Model: • A constant effective population size N • A single chromosome of length L • Sample size n • For each pair, detect all segments of length >m • Compute <fT>, the average fraction of the chr. shared • Inference: • Method of moments • Can prove: (Palamara et al., AJHG, 2012; Carmi et al., Genetics, 2013)
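A method-of-moments estimate along these lines can be sketched by inverting a closed-form mean. The sketch assumes, as my own convention rather than the slide's, that N counts haploid chromosomes, m is in Morgans, and the expected fraction is (1 + 4mN) / (1 + 2mN)^2; solving that for N gives N = (1 - f + sqrt(1 - f)) / (2 m f). The function names are hypothetical.

```python
import math


def expected_fraction(N, m):
    """Expected shared fraction under the assumed model: (1 + 4mN) / (1 + 2mN)^2."""
    x = m * N
    return (1.0 + 4.0 * x) / (1.0 + 2.0 * x) ** 2


def estimate_population_size(mean_fraction, m):
    """Method-of-moments estimate of N from the observed mean shared fraction.

    Sets the observed mean equal to expected_fraction(N, m) and solves the
    resulting quadratic in x = mN, keeping the non-negative root.
    """
    if not 0.0 < mean_fraction < 1.0:
        raise ValueError("mean_fraction must lie strictly between 0 and 1")
    if m <= 0.0:
        raise ValueError("m must be positive")
    f = mean_fraction
    x = (1.0 - f + math.sqrt(1.0 - f)) / (2.0 * f)
    return x / m


if __name__ == "__main__":
    true_N, m = 5000, 0.01             # hypothetical values for a round-trip check
    f = expected_fraction(true_N, m)
    print(f, estimate_population_size(f, m))
```

In practice the observed mean would be averaged over all pairs in the sample, and the estimate inherits any bias from IBD detection errors.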
A Maximum Likelihood Approach Carmi and Pe’er, arXiv, 2014
A Practical Approach • Assume historical size N(t)=N0 λ(t) • Time scaled by 2N0 • Avg. fraction of the genome in segments of length ℓ1<ℓ<ℓ2: Eq. (1) • Method: • Detect IBD in sample • Plot the empirical P(ℓ) • Using Eq. (1), find the history N(t) that best fits P(ℓ) [Figure: P(ℓ) vs. segment length ℓ] (Palamara et al., AJHG, 2012)
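The fitting step needs the expected fraction of the genome in a length bin for a candidate history. The sketch below is one way to compute it and is not the expression from the slide: it works in discrete generations instead of time scaled by 2N0, treats sizes as haploid chromosome counts, and assumes an Erlang-2 segment length with rate 2g for a site whose ancestor lived g generations ago. All names and the example history are hypothetical.

```python
import math


def coalescence_probs(sizes):
    """P(common ancestor exactly g generations ago), g = 1..len(sizes).

    sizes[g - 1] is the assumed haploid population size g generations ago.
    """
    probs = []
    not_yet = 1.0                       # probability of no coalescence so far
    for n in sizes:
        probs.append(not_yet / n)
        not_yet *= 1.0 - 1.0 / n
    return probs


def frac_longer_than(length, g):
    """P(segment around a site exceeds `length` Morgans | ancestor at generation g)."""
    if math.isinf(length):
        return 0.0
    a = 2.0 * g * length
    return (1.0 + a) * math.exp(-a)


def expected_fraction_in_range(sizes, l1, l2):
    """Expected fraction of the genome in segments with l1 < length < l2 (Morgans)."""
    total = 0.0
    for g, p in enumerate(coalescence_probs(sizes), start=1):
        total += p * (frac_longer_than(l1, g) - frac_longer_than(l2, g))
    return total


if __name__ == "__main__":
    constant = [10000] * 2030
    bottleneck = [10000] * 20 + [300] * 10 + [10000] * 2000   # made-up history
    for lo, hi in [(0.01, 0.03), (0.03, 0.05), (0.05, math.inf)]:
        print(lo, hi,
              expected_fraction_in_range(constant, lo, hi),
              expected_fraction_in_range(bottleneck, lo, hi))
```

A fit would then search over candidate histories for the one whose predicted values across length bins are closest to the empirical P(ℓ); the search itself is omitted here.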
IBD Sharing in AJ • ≈50 cM per pair in segments >3 cM • Atzmon et al., AJHG, 2010 • Bray et al., PNAS, 2010 • Gusev et al., MBE, 2012
An AJ Bottleneck [Figure: time axis in years] (S. C. et al., submitted)
Caveats • Phasing and genotyping errors; IBD detection errors • Reasonable power only for 10-50 generations ago • Model specification (e.g. prolonged bottleneck, admixture) • Fitting
Outline • IBD Sharing: Introduction • Ashkenazi Jewish Genetics • Demographic inference • Imputation • Future Directions & Summary
Imputation • Cost-effective association study design: • Fully sequence a small reference panel • Impute many sparsely genotyped individuals Impute2
AJ Panel Performance Fraction of non-ref variants with maf ≤1% wrongly imputed: 13% for AJ, 35% for CEU
Imputation by IBD [Figure: sequenced individual A] (Gusev et al., Genetics, 2012)
Imputation by IBD • How to select individuals for sequencing? • Is there enough IBD sharing? • How to impute effectively? (Palin et al., Genet. Epidemiol., 2011; Kong et al., Nat. Genet., 2008) [Figure: sequenced individual A]
Selection for Sequencing • Improve performance by selecting top-sharing samples (Gusev et al., Genetics, 2012: INFOSTIP) • Theory for coverage in a population model (Carmi et al., Genetics, 2013) • Not terribly important
Coverage by IBD • TAGC (sequencing; n=128) • SZ study (genotyping; n=2500) • Fit to:
Coverage by IBD: Theory • Exact solution [Figure: genealogy with time in generations, from the present back to generations g and g+1, with labels B and 1-α]
Outline • IBD Sharing: Introduction • Ashkenazi Jewish Genetics • Demographic inference • Imputation • Future Directions & Summary
Future Directions • N-way IBD sharing • Derived P(ℓ1<ℓ<ℓ2) for three chromosomes • Important for demographic inference, disease mapping, detecting natural selection • Dating mutations using IBD • Phasing/imputation using IBD • A fast approach needed
Summary • IBD is useful in genetics • We characterized IBD in population models • IBD is abundant in AJ and can be used for historical inference and imputation • Many interesting future applications
Acknowledgements • Itsik Pe’er’s lab: James Xue, Ethan Kochav, Yunzhi Ye • TAGC consortium members: Todd Lencz, Semanti Mukherjee (LIJMC); Lorraine Clark, Xinmin Liu (CUMC); Gil Atzmon, Harry Ostrer, Danny Ben-Avraham (AECOM); Inga Peter, Judy Cho (MSSM); Joseph Vijai (MSKCC); Ken Hui (Yale) • Funding: Human Frontier Science Program • Thank you for your attention!