
Theory of IBD sharing in the Wright-Fisher model

Shai Carmi, Pier Francesco Palamara, Vladimir Vacic, and Itsik Pe’er. Department of Computer Science, Columbia University, New York, NY.



Background: Identity-by-descent

• In populations that have recently undergone strong genetic drift, most individuals share a very recent common ancestor.
• Long haplotypes are frequently shared identical-by-descent (IBD).
• Algorithms can detect IBD shared segments between all pairs in large cohorts based on either segment length or frequency [1,2].
• Applications (see review [3]):
  • Demographic inference
  • Imputation
  • Phasing
  • Association of rare variants/haplotypes
  • Pedigree reconstruction
  • Detection of positive selection
• How does the amount of sharing depend on the demographic history of the population?
• The Wright-Fisher model:
  • Non-overlapping, discrete generations.
  • Constant size of N haploid individuals, or changing size.
  • Ignore recent mutations.
  • Recombination is a Poisson process.
  • Each pair of individuals (lineages) has probability 1/N to coalesce in the previous generation.
  • For continuous time and large population size, approximated by the coalescent.
  • (Scaled) time to most recent common ancestor: [equation missing] ([equation missing] for constant size).

[Figure: two chromosomes, A and B, with a shared segment marked.]

• Assume a segment can be detected only if it is longer than m (Morgans).
• Denote the fraction of the chromosome shared between two random individuals as the total sharing fT.
• Palamara et al. [4]: [equation missing]
• For constant size, [equation missing]
• Used to infer population histories.
• Questions:
  • Distribution? Higher moments?
  • Differences between individuals?
  • Applications [imputation by IBD, sharing between siblings]

Distribution of the total sharing: renewal theory

• Li and Durbin [5] showed that at segment ends: [equation missing]
• For a given t, the probability of no recombination at distance ℓ is [equation missing]. Therefore (see also [4]), [equation missing]
• For constant size, [equation missing], [equation missing].
• Find the distribution of ℓT using renewal theory.
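The constant-size expressions referred to above are not reproduced here. As a hedged sketch, the mean of the total sharing can be rebuilt from the stated model under assumptions that are ours and not taken from the poster: time t is measured in units of N generations, so t is exponential with mean 1; given t, each side of a site recombines at rate 2Nt per Morgan; and the segment around a site is the sum of its two arms. Under these assumptions P(segment ≥ m | t) = (1 + 2Nmt)·exp(−2Nmt), and averaging over t gives (1 + 4mN)/(1 + 2mN)². All function names are hypothetical.

```python
import math


def prob_segment_at_least(m, N, t):
    """P(segment around a site is >= m Morgans | scaled TMRCA t).

    Assumption: the two arms are independent Exp(2*N*t) lengths,
    so their sum is Erlang-2 distributed.
    """
    a = 2.0 * N * t * m
    return (1.0 + a) * math.exp(-a)


def mean_sharing_closed_form(m, N):
    """Closed form of the integral below (a reconstruction, not a quoted formula)."""
    x = 2.0 * m * N
    return (1.0 + 2.0 * x) / (1.0 + x) ** 2


def mean_sharing_numeric(m, N, t_max=50.0, steps=200_000):
    """Integrate exp(-t) * P(segment >= m | t) over t with composite Simpson.

    steps must be even; t_max truncates the exponential tail.
    """
    h = t_max / steps

    def g(t):
        return math.exp(-t) * prob_segment_at_least(m, N, t)

    total = g(0.0) + g(t_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    m, N = 0.01, 10_000  # 1 cM cutoff, 10,000 haploid individuals
    print(mean_sharing_numeric(m, N), mean_sharing_closed_form(m, N))
```

For large mN the closed form behaves like 1/(mN), so under these assumptions the expected sharing falls roughly in inverse proportion to the population size.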
The problem maps onto a renewal process:
• Coordinate on chromosome → time (t)
• Shared segments → waiting times between events
• L → T, ℓT → tS
• Segment length PDF P(ℓ) → waiting time PDF ψ(τ)
• Laplace transform the PDF: PT(tS) → Ps(u).
• The PDF of the number of shared segments (Laplace transformed, T → s): [equation missing]

[Figure: chromosomes A and B along the coordinate axis from 0 to L, divided into segments ℓ1 to ℓ11; in the example shown, ℓT = ℓ1 + ℓ5 + ℓ9.]

• In each block, the two chromosomes maintain the same ancestor.
• Blocks (segments) end at recombination events.
• Define ℓT as the total length of segments having length ≥ m.
• In the Sequentially Markov Coalescent, fT = ℓT/L.

Direct calculation of the variance

• A general equation for the variance of the total sharing fT. From the definitions below, fT = (1/M) Σ_s I(s), so Var[fT] = (1/M²) Σ_{s1,s2} [π2(s1,s2) − π²].
  • M: number of markers; the sum is over all markers.
  • I(s): indicator that site s lies on a shared segment; equals 1 with probability π.
  • π2(s1,s2): probability that both sites s1 and s2 lie on shared segments.
• A simple approximation: [equation missing]
• For a constant size population: [equation missing]
• A full solution of the variance (only key equations shown): [equation missing]
  • pnr: probability of no recombination between the two sites in the history of the two chromosomes.
  • πnr: probability that the two sites lie on shared segments, given that there was no recombination (similarly for πr).
  • For a discrete ancestral process and distance d between the sites: [equation missing]
  • When there was recombination, the calculation of πr is complicated by the fact that the segments are bounded on one end.
  • Solve by explicit calculations on the coalescent with recombination.

The cohort-averaged sharing and imputation by IBD

• Downstream effect on power of association:
  • The effective number of sequenced individuals increases with imputation success rate.
  • Power to detect a variant of frequency β appearing in cases only [7]: [equation missing]
• Imputation by IBD:
  • Assume a cohort of size n is genotyped and IBD sharing is detected between all pairs.
  • A fraction ns/n of the individuals is selected for sequencing.
  • Non-sequenced individuals are imputed using the sequenced individuals along segments of IBD sharing.
  • What is the expected imputation success rate when individuals are randomly selected?
  • What is the success rate when individuals are selected according to their cohort-averaged sharing [6]?
  • Define pc as the fraction of the genome covered by IBD segments shared with the sequenced individuals.
• Define the cohort-averaged sharing: [equation missing]
  • For each individual: the average sharing to the rest of the cohort.
• Approximate the variance: [equation missing]
  • For small n, [equation missing]
  • For large n, [equation missing], independent of n.
• Distribution is approximately normal.

More applications and conclusions

• Summary and discussion:
  • We obtained analytical results for properties of IBD sharing in the Wright-Fisher model.
  • Calculated the distribution using renewal theory and the variance using two methods.
  • Treat genotyping/phasing errors by increasing the length cutoff m. If segments are missed with probability ε, it can be shown that both mean and variance are scaled by (1-ε).
  • For other analytical approaches and applications to demographic inference, see [4] and the talk here.
  • The sharing per individual (averaged over the cohort) exhibits a surprisingly wide distribution even for large cohorts.
  • This can be taken advantage of in imputation by IBD.
• A simple estimator of the population size:
  • Use [equation missing], isolate N and simplify: [equation missing], where [symbol missing] is the total sharing averaged over all pairs.
  • It can be seen that [equation missing]
  • The variance of the estimator: [equation missing]
• Sharing between siblings:
  • The variance in sharing between (same parent) chromosomes of siblings is known.
  • What happens when siblings come from an inbred population and thus also share due to remote ancestry?
  • The mean sharing is [equation missing]
  • When calculating the variance, decompose sharing into either same-grandparent or remote.

References

[1] A. Gusev et al., Whole population, genome-wide mapping of hidden relatedness, Genome Res. 19, 318 (2009).
[2] B. L. Browning and S. R. Browning, A fast, powerful method for detecting identity-by-descent, AJHG 88, 173 (2011).
[3] S. R. Browning and B. L. Browning, Identity by Descent Between Distant Relatives: Detection and Applications, Annu. Rev. Genet. 46, 615 (2012).
[4] P. F. Palamara et al., Length Distributions of Identity by Descent Reveal Fine-Scale Demographic History, AJHG (2012).
[5] H. Li and R. Durbin, Inference of human population history from individual whole-genome sequences, Nature 475, 493 (2011).
[6] A. Gusev et al., Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population, Genetics 190, 679 (2012).
[7] Y. Shen et al., Coverage tradeoffs and power estimation in the design of whole-genome sequencing experiments for detecting association, Bioinformatics 27, 1995 (2011).

See our paper: S. Carmi, P. F. Palamara, V. Vacic, T. Lencz, A. Darvasi, and I. Pe’er, The variance of identity-by-descent sharing in the Wright-Fisher model, Submitted (2012). arXiv:1206.4745.
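The formula of the population-size estimator mentioned under "More applications and conclusions" is not reproduced above. A minimal sketch of the "isolate N" step follows, assuming the constant-size mean sharing is (1 + 4mN)/(1 + 2mN)²; this expression is our reconstruction from the model assumptions and may differ from the one used on the poster. Solving the resulting quadratic in 2mN gives N = (1 − f + √(1 − f)) / (2mf), where f is the sharing averaged over all pairs. The function names are hypothetical.

```python
import math


def mean_sharing(m, N):
    """Assumed constant-size mean of the total sharing (a reconstruction)."""
    x = 2.0 * m * N
    return (1.0 + 2.0 * x) / (1.0 + x) ** 2


def estimate_population_size(f_bar, m):
    """Invert the assumed mean-sharing formula for N.

    f_bar: total sharing averaged over all pairs, strictly between 0 and 1.
    m:     minimal detectable segment length in Morgans.
    """
    if not 0.0 < f_bar < 1.0:
        raise ValueError("f_bar must lie strictly between 0 and 1")
    if m <= 0.0:
        raise ValueError("m must be positive")
    return (1.0 - f_bar + math.sqrt(1.0 - f_bar)) / (2.0 * m * f_bar)


if __name__ == "__main__":
    m, N = 0.01, 5_000
    f_bar = mean_sharing(m, N)                 # noise-free sharing for a known N
    print(estimate_population_size(f_bar, m))  # recovers N up to rounding error
```

With real data f would be a noisy average over pairs, so the estimate inherits the variance of the sharing; this sketch covers only the algebraic inversion.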
