20 likes | 166 Views
Theory of IBD sharing in the Wright-Fisher model. m. Shai Carmi , Pier Francesco Palamara , Vladimir Vacic , and Itsik Pe’er. Department of Computer Science, Columbia University, New York, NY. Background: Identity-by-descent.
E N D
Theory of IBD sharing in the Wright-Fisher model m Shai Carmi, Pier Francesco Palamara, Vladimir Vacic, and ItsikPe’er Department of Computer Science, Columbia University, New York, NY Background: Identity-by-descent • How does the amount of sharing depend on the demographic history of the population? • The Wright-Fisher model: • Non-overlapping, discrete generations. • Constant size of N haploid individuals,or,changing size • Ignore recent mutations. • Recombination is a Poisson process. • Each pair of individuals (linages) has probability 1/N to coalesce in the previous generation. • For continuous-time and large population size, approximated by the coalescent. • (Scaled) Time to most recent common ancestor: ( for constant size). B A A • Assume a segment can be detected only if it is longer than m (Morgans). • Denote the fraction of the chromosome shared between two random individuals as the total sharing fT. • Palamara et al. [4]:. • For constant size, • Used to infer population histories. B • In populations that have recently underwent strong genetic drift, most individuals share a very recent common ancestor. • Long haplotypes are frequently shared identical-by-descent (IBD). • Algorithms can detect IBD shared segments between all pairs in large cohorts based on either segment length or frequency [1,2]. • Applications: • Demographic inference • Imputation • Phasing • Association of rare variants/haplotypes • Pedigree reconstruction • Detection of positive selection. • See review [3] A shared segment • Questions: • Distribution? Higher moments? • Differences between individuals? • Applications [imputation by IBD, sharing between siblings] Distribution of the total sharing: renewal theory • Li and Durbin [5] showed that at segment ends: • For a given t, the probability of no recombination at distance ℓ is . Therefore (see also [4]), • For constant size, , . • Find the distribution of ℓTusing renewal theory. Map: • Coordinate on chromosome → time (t) • Shared segments → waiting times between events • L → T, ℓT→ tS • Segment length PDF P(ℓ) → waiting time PDF ψ(τ). • Laplace transform the PDF PT(tS) → Ps(u). ℓT=ℓ1+ℓ5+ℓ9 A The PDF of the number of shared segments (Laplace transformed T → s) B ℓ11 ℓ5 ℓ1 ℓ9 ℓ7 ℓ3 ℓ10 ℓ8 ℓ6 ℓ2 ℓ4 0 L coordinate • In each block, the two chromosomes maintain the same ancestor. • Blocks (segments) end at recombination events. • Define ℓT as the total length of segments having length ≥m. • In the Sequentially Markov Coalescent, fT=ℓT/L. Direct calculation of the variance • A full solution of the variance: (only key equations shown) • . • pnr: probability of no recombination between the two sites in the history of the two chromosomes. • πnr: the probability of the two sites to lie on shared segments, given that there was no recombination (similarly for πr). • For a discrete ancestral process and distance d between the sites: • When there was recombination, calculation of πr is complicated by the fact that the segments are bounded on one end. • Solve by explicit calculations on the coalescent with recombination. • A general equation for the variance of the total sharing fT: • M: number of markers; sum is over all markers • I(s): indicator of a site to lie on a shared segment; with probability π. • π2(s1,s2): probability of both sites s1 and s2 to lie on shared segments. • A simple approximation: • For a constant size population: The cohort-averaged sharing and imputation by IBD • Downstream effect on power of association: • The effective number of sequenced individuals increases with imputation success rate. • Power to detect variant of frequency β appearing in cases only [7]: • Imputation by IBD: • Assume a cohort of size n is genotyped and IBD sharing is detected between all pairs. • A fraction ns/n of the individuals is selected for sequencing. • Non-sequenced individuals are imputed using the sequenced individuals along segments of IBD sharing. • What is the expected imputation success rate when individuals are randomly selected? • What is the success rate when individuals are selected according to their cohort-averaged sharing [6]? • Define pc as the fraction of the genome covered by IBD segments shared with the sequenced individuals. • Define the cohort-averaged sharing: • For each individual: the average sharing to the rest of the cohort. • Approximate the variance: • . • For small n, • For large n, , independent of n. • Distribution is approximately normal. , More applications and conclusions • Summary and discussion: • We obtained analytical results for properties of IBD sharing in the Wright-Fisher model. • Calculated the distribution using renewal theory and the variance using two methods. • Treat genotyping/phasing errors by increasing the length cutoff m. If segments are missed with probability ε, can show that both mean and variance are scaled by (1-ε). • Other analytical approaches and applications to demographic inferences in [4] and talk here. • The sharing per individual (averaged over cohort) exhibits a surprisingly wide distribution even for large cohorts. • Can be taken advantage of in imputation by IBD. • A simple estimator of the population size: • Use , isolate N and simplify: , where is the total sharing averaged over all pairs. • Can be seen that • The variance of the estimator:. • Sharing between siblings: • The variance in sharing between (same parent) chromosomes of siblings is known. • What happens when siblings come from an inbred population and thus share also due to remote ancestry? • The mean sharing is • When calculating variance, decompose sharing into either same-grandparent or remote. References A. Gusev et al., Whole population, genome-wide mapping of hidden relatedness, Genome Res. 19, 318 (2009). B. L. Browning and S. R. Browning, A fast, powerful method for detecting identity-by-descent, AJHG 88, 173 (2011). S. R. Browning and B. L. Browning, Identity by Descent Between Distant Relatives: Detection and Applications, Annu. Rev. Genet. 46, 615 (2012). P. F. Palamara et al. Length Distributions of Identity by Descent Reveal Fine-Scale Demographic History, AJHG (2012). H. Li and R. Durbin, Inference of human population history from individual whole-genome sequences, Nature 449, 851 (2011). A. Gusev et al. Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population, Genetics 190, 679 (2012). Y. Shen et al., Coverage tradeoffs and power estimation in the design of whole-genome sequencing experiments for detecting association, Bioinformatics 27, 1995 (2011). See our paper: S. Carmi, P. F. Palamara, V. Vacic, T. Lencz, A. Darvasi, and I. Pe’er, The variance of identity-by-descent sharing in the Wright-Fisher model, Submitted (2012). arXiv:1206.4745.