
Continuous Coalescent Model


lexine

Presentation Transcript


  1. Continuous Coalescent Model
     • The continuous coalescent lends itself to generative models
     • Algorithm to construct a plausible genealogy for n genes
     • Note that this model runs backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture
     1. Start with k = n genes
     2. Simulate the waiting time, $T_k$, to the next event, $T_k \sim \mathrm{Exp}\left(\binom{k}{2}\right)$
     3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs
     4. Merge i and j into one gene and decrease the sample size by one, k ← k − 1
     5. Repeat from step 2 while k > 1
     Comp 790– Continuous-Time Coalescence

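  2. In Python
     • A simulator in 12 lines (not counting the import and the choice of N)

      from random import expovariate, randint

      N = 10                               # number of sampled genes (example value)
      T = [[i, 0.0] for i in range(N)]     # leaves: [gene id, time of merge]
      k = N                                # current number of lineages
      t = 0.0                              # time before the present
      while k > 1:
          t += expovariate(0.5*k*(k-1))    # waiting time with rate k(k-1)/2
          i = randint(0, k-1)              # pick two distinct lineages
          j = randint(0, k-1)
          while i == j:
              j = randint(0, k-1)
          T[i] = [T[i], T[j], t]           # merged node: [left, right, merge time]
          T.pop(j)                         # lineage j is absorbed into i
          k -= 1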

  3. Properties of a Coalescent Tree
     • The height, $H_n$, of the tree is the sum of the time epochs, $T_j$, during which there are j = n, n−1, n−2, …, 2 ancestors: $H_n = \sum_{j=2}^{n} T_j$.
     • As $n \to \infty$, $E(H_n) \to 2$, and, if n = 2, $E(H_2) = 1$. Thus, the expected waiting time for n genes to find their common ancestor is less than twice that for 2!
     • As $n \to \infty$, $\mathrm{Var}(H_n) \to 4(\pi^2 - 9)/3 \approx 1.16$, and, if n = 2, $\mathrm{Var}(H_2) = 1$.
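The height properties above can be checked numerically. The following is a minimal sketch, assuming (as in the simulator on slide 2) that the epochs $T_j$ are independent exponentials with rate j(j−1)/2; the function names are illustrative and not part of the original slides.

```python
import random
import statistics


def sample_height(n, rng):
    """Draw one tree height H_n = T_n + ... + T_2, with T_j ~ Exp(rate j(j-1)/2)."""
    return sum(rng.expovariate(0.5 * j * (j - 1)) for j in range(2, n + 1))


def expected_height(n):
    """E(H_n) = sum of 2/(j(j-1)) for j = 2..n, which telescopes to 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


def variance_height(n):
    """Var(H_n) = sum of (2/(j(j-1)))^2, since the T_j are assumed independent."""
    return sum((2.0 / (j * (j - 1))) ** 2 for j in range(2, n + 1))


if __name__ == "__main__":
    rng = random.Random(790)  # fixed seed so repeated runs agree
    n = 20
    heights = [sample_height(n, rng) for _ in range(100_000)]
    print("mean:    ", statistics.mean(heights), "vs exact", expected_height(n))
    print("variance:", statistics.variance(heights), "vs exact", variance_height(n))
```

The sampled mean and variance should land close to the exact values, and `variance_height(n)` approaches 4(π² − 9)/3 as n grows.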

  4. Sampled Distribution
     • N = 1000000

  5. Example Trees
     • Observation: The contribution of $T_2$, where the last two ancestors converge to a common root, is disproportionately large

  6. Total Branch Length
     • In contrast to $H_n$, the distribution of the total branch length, $L_n$, has a simple form: $P(L_n \le t) = \left(1 - e^{-t/2}\right)^{n-1}$
     • The mean of $L_n$ is found by weighting the coalescent times by the number of active lineages: $E(L_n) = \sum_{j=2}^{n} j\,E(T_j) = 2\sum_{j=1}^{n-1} \frac{1}{j}$
     • This sum does not converge for large n, but grows slowly. In fact, it is approximately proportional to log(n)
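To make the slow growth concrete, here is a small sketch that evaluates the mean total branch length exactly and compares it with a logarithmic approximation. It assumes $E(T_j) = 2/(j(j-1))$, which follows from the rate used in the simulator on slide 2; the helper names and the approximation 2(ln n + γ) are illustrative additions, not taken from the slides.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_total_length(n):
    """E(L_n) = sum_{j=2}^{n} j * E(T_j) = 2 * sum_{j=1}^{n-1} 1/j."""
    return 2.0 * sum(1.0 / j for j in range(1, n))


def log_approximation(n):
    """Harmonic-sum approximation: E(L_n) is roughly 2 * (ln n + gamma)."""
    return 2.0 * (math.log(n) + EULER_GAMMA)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, expected_total_length(n), log_approximation(n))
```

Each tenfold increase in n adds only about 2 ln 10 ≈ 4.6 to the expected total branch length.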

  7. Shared History
     • $E(L_n)$ can be used to get a sense of how much history genes share.
     • Genes would share the least history if they all arose from a common ancestor long ago and then propagated along distinct lineages.
     • If the mean time to the common ancestor is $E(H_n) = 2(1 - 1/n)$, and we assume the split was as early as possible (thus minimizing the shared history), then the total branch length would be $n\,E(H_n) = 2(n-1)$.
     • Comparing $E(L_n)$ to this minimum shared-history case, as a fraction, gives: …

  8. Plot of Shared History
     • Even for small n, samples, on average, share considerable history
     • share(5) = 48%
     • share(10) = 69%
     • share(20) = 81%
     • Sharing is the fraction of a genealogy that an average gene shares with two or more other extant genes
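The sharing formula itself did not survive in this transcript. The sketch below assumes share(n) = 1 − E(L_n) / (n E(H_n)), that is, one minus the expected total branch length as a fraction of the minimum shared-history case from slide 7. This assumed form reproduces the three percentages listed above, but it should be read as a reconstruction rather than the slide's exact expression.

```python
def expected_height(n):
    """E(H_n) = 2(1 - 1/n), as stated on slide 7."""
    return 2.0 * (1.0 - 1.0 / n)


def expected_total_length(n):
    """E(L_n) = 2 * sum_{j=1}^{n-1} 1/j (assumes E(T_j) = 2/(j(j-1)))."""
    return 2.0 * sum(1.0 / j for j in range(1, n))


def share(n):
    """Assumed definition: 1 - E(L_n) / (n * E(H_n))."""
    return 1.0 - expected_total_length(n) / (n * expected_height(n))


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"share({n}) = {100 * share(n):.0f}%")
    # prints:
    # share(5) = 48%
    # share(10) = 69%
    # share(20) = 81%
```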

  9. Variance of Total Branch Length
     • The variance in the total branch length is: $\mathrm{Var}(L_n) = 4\sum_{j=1}^{n-1} \frac{1}{j^2}$, which converges to $2\pi^2/3 \approx 6.58$ as $n \to \infty$.
     • This implies that for large n, $L_n$ is narrowly centered around $E(L_n)$ relative to its size, since the variance stays bounded while the mean keeps growing. Likewise, sharing is also relatively consistent.
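A short numerical check of the limit, as a sketch under the same assumption as the simulator on slide 2 (independent $T_j$ with rate j(j−1)/2). Under that assumption, Var(L_n) = Σ j² Var(T_j) = 4 Σ_{j=1}^{n−1} 1/j². The helper names, including `relative_spread`, are illustrative.

```python
import math


def expected_total_length(n):
    """E(L_n) = 2 * sum_{j=1}^{n-1} 1/j."""
    return 2.0 * sum(1.0 / j for j in range(1, n))


def variance_total_length(n):
    """Var(L_n) = sum_{j=2}^{n} j^2 * Var(T_j) = 4 * sum_{j=1}^{n-1} 1/j^2."""
    return 4.0 * sum(1.0 / (j * j) for j in range(1, n))


def relative_spread(n):
    """Standard deviation of L_n divided by its mean."""
    return math.sqrt(variance_total_length(n)) / expected_total_length(n)


if __name__ == "__main__":
    limit = 2.0 * math.pi ** 2 / 3.0
    for n in (10, 100, 1000, 10_000):
        print(n, variance_total_length(n), limit, relative_spread(n))
```

The variance settles near 2π²/3 quickly, while the relative spread keeps shrinking because the mean continues to grow like log(n).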

  10. Implications on Sampling Paths
     • Sampling multiple paths from extant genes along their ancestors is less effective than one might think.
     • Most long branches are covered by relatively few samples
     • This is not surprising, since $E(H_{40}) = 1.95$ and $E(H_{10}) = 1.8$ (a 4x increase in samples increases the expected height by less than 10%).

  11. Effective Population Size
     • Real populations are not likely to satisfy the Wright-Fisher model.
     • In particular, most real populations show some sort of reproductive structure, due to either geography or societal constraints
     • It is also likely that the number of descendants in a generation depends on many factors (health, disease, etc.), as opposed to the implicit Poisson model
     • Total population size is not fixed, but changes over time

  12. Sanity Check
     • When the Wright-Fisher model, or the basic coalescent, is used to model a real population, the size of the population (2N) cannot be taken literally.
     • For example, many human genes have an MRCA less than 200,000 years ago. If we consider one generation per 20 years, then N should be less than 200,000/(4*20) = 2500, which is too small (recall that the expected tree height for the entire population is at most 2 time units of 2N generations each, so the height in years is at most 2 * 2N * generation_time = 4N * 20)
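The arithmetic in the example can be written out as a small helper. This is a sketch, assuming one coalescent time unit corresponds to 2N generations (the reading that yields the factor of 4 in the slide) and a tree height of at most 2 units; the function and parameter names are hypothetical.

```python
def max_effective_size(mrca_years, generation_years, height_units=2.0):
    """Largest N consistent with an MRCA age, assuming one coalescent
    time unit equals 2N generations and a tree height of `height_units`."""
    generations = mrca_years / generation_years
    # height_units * 2N generations >= observed generations  =>  N bound
    return generations / (2.0 * height_units)


if __name__ == "__main__":
    print(max_effective_size(200_000, 20))  # prints 2500.0
```

Changing the generation time or the assumed MRCA age shows how sensitive the implied N is to those inputs.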
