
The Coalescent

The Coalescent. Chris Cannings, University of Sheffield, c.cannings@shef.ac.uk http://www.amorph.group.shef.ac.uk/ Limerick, 07/05/2010. Genetic Drift: stochastic model of evolution of a population of neutral genes. The Coalescent: reversing time.


Presentation Transcript


  1. The Coalescent Chris Cannings, University of Sheffield, c.cannings@shef.ac.uk http://www.amorph.group.shef.ac.uk/ Limerick, 07/05/2010

  2. The Coalescent • Genetic Drift. Stochastic model of evolution of population of neutral genes. • The Coalescent. Reversing time. • Tree topology. Catalan numbers. • External Edges. Distribution of lengths. • Stepwise Mutation Model. Distributions and moments. Bell Polynomials. Chris Cannings, U of Sheffield

  3. Genetic Drift. [Figure: n unisexual individuals in one generation giving rise to n individuals in the next; discrete generations.]

  4. Genetic Drift. [Figure: n individuals per generation; discrete generations.] Individuals of varying types. Possibly mutation, recombination, etc. Mutation: offspring type differs from parent type.

  5. Wright-Fisher Model • Each individual in the m'th generation is the offspring of an individual in the (m-1)'th generation, selected at random (prob = 1/n) independently of all others. • Cannings (1974): the n individuals produce X1, X2, ..., Xn offspring, these being exchangeable r.v.'s.
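A minimal sketch of one Wright-Fisher generation, as described on the slide above. The function names and the fixed seed are illustrative assumptions, not part of the original talk.

```python
import random
from collections import Counter


def wright_fisher_generation(n, rng):
    """Return, for each of the n offspring, the index of its parent.

    Each offspring picks a parent uniformly at random (prob 1/n),
    independently of all the others.
    """
    return [rng.randrange(n) for _ in range(n)]


def offspring_counts(parents, n):
    """Number of offspring X_1, ..., X_n left by each of the n parents."""
    counts = Counter(parents)
    return [counts.get(i, 0) for i in range(n)]


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
parents = wright_fisher_generation(10, rng)
counts = offspring_counts(parents, 10)
print(parents)
print(counts, sum(counts))  # the offspring counts always sum to n
```

Because the population size is fixed, the counts X1, ..., Xn always sum to n; under this sampling scheme they are exchangeable, which is the property the Cannings (1974) model generalises.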

  6. Genetic Drift (multiple types). [Figure: n individuals per generation; discrete generations.]

  7. The Coalescent (Kingman, 1982). [Figure: n individuals per generation, with a time axis.]

  8. Coalescent • For a finite population undergoing random reproduction (i.e. not just one offspring for each adult), the current generation will all be descended from a single common ancestor, the Most Recent Common Ancestor (MRCA), provided the population has been running for sufficient time. • Tracing back to that MRCA produces a tree, and in this tree lines coalesce as we run backwards in time.

  9. Coalescent • The coalescent allows one to track the behaviour of a population without the necessity of keeping track of all n individuals through the generations. • The process is separated into two pieces: (1) produce the coalescent tree (time backwards); (2) run the types, with mutation and recombination, on the tree (time forwards).

  10. Coalescent • At each point in time (which is run backwards) keep track only of the individuals who are ancestors of the n individuals at time 0. • Replace discrete generations with continuous time, and assume that any pair of individuals at time t will be the offspring of one individual at t-δt with the same prob (λδt) as any other pair, independently. • Scale time.

  11. The Coalescent • Now suppose that for a population of N individuals we use the Wright-Fisher model. The assumption of this model is that each offspring “selects” its parent at random from the N in the previous generation. • Consider 2 individuals in generation 0 (time will be measured backwards) and their lineages back through time.

  12. The Coalescent • These two lineages will at some time coalesce. When? The probability that they coalesce at time 1 is 1/N, i.e. Prob{not coalesced by t=1} = 1 - 1/N. • Each step back is independent, so Prob{not coalesced by t} = $(1-1/N)^t = (1-1/N)^{N\tau} \approx e^{-\tau}$, where $\tau = t/N$.

  13. The Coalescent • Thus Prob{2 lineages not coalesced by τ} ≈ $e^{-\tau}$, which is the tail of a negative exponential distribution with rate 1, and the expected time to coalescence is 1 unit of scaled time (= N generations).
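The approximation $(1-1/N)^{N\tau} \approx e^{-\tau}$ can be checked numerically. The sketch below is illustrative only; the function name and the chosen values of N are assumptions.

```python
import math


def prob_not_coalesced(N, tau):
    """Exact Wright-Fisher probability that two lineages are still distinct
    after t = N * tau generations (t rounded to the nearest integer)."""
    t = round(N * tau)
    return (1 - 1 / N) ** t


# Compare the exact discrete-generation value with the exponential limit.
for N in (10, 100, 10_000):
    print(N, prob_not_coalesced(N, tau=1.0), math.exp(-1.0))
```

The exact value lies slightly below $e^{-\tau}$ for finite N and approaches it as N grows.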

  14. The Coalescent • Now consider k lineages (perhaps corresponding to a sample from the current generation). Prob{none coalesce in the previous generation} = Prob{they have k distinct parents} = $(1-1/N)(1-2/N)\cdots(1-(k-1)/N) = \prod_{i=0}^{k-1}(1-i/N) = 1 - \binom{k}{2}/N + O(1/N^2)$.
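A short numerical check of the expansion above, comparing the exact product with the first-order term $1 - \binom{k}{2}/N$. The function names are illustrative assumptions.

```python
from math import comb, prod


def prob_k_distinct_parents(k, N):
    """Exact probability that k lineages have k distinct parents."""
    return prod(1 - i / N for i in range(k))


def first_order_approx(k, N):
    """First-order approximation 1 - C(k, 2) / N."""
    return 1 - comb(k, 2) / N


# The error should shrink roughly 100-fold when N grows 10-fold (O(1/N^2)).
for N in (1_000, 10_000):
    exact = prob_k_distinct_parents(5, N)
    approx = first_order_approx(5, N)
    print(N, exact, approx, exact - approx)
```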

  15. The Coalescent • The probability that more than two lineages coalesce in any one generation is negligible (it is of order 1/N^2). • Let T(k) = (scaled) time to the first coalescence among k lineages; then, as earlier, T(k) is negative exponential with rate $\binom{k}{2}$, i.e. mean 2/(k(k-1)), and these coalescences take place one at a time.
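The waiting times T(k) can be sampled directly. This is a minimal sketch assuming independent T(k) ~ Exp(rate = C(k, 2)) in scaled time; the function names and seed are illustrative.

```python
import random
from math import comb


def sample_coalescence_times(n, rng):
    """Return {k: T(k)} for k = n, ..., 2, where T(k) is the scaled time
    during which exactly k lineages exist, drawn as Exp(rate = C(k, 2))."""
    return {k: rng.expovariate(comb(k, 2)) for k in range(n, 1, -1)}


def expected_time(k):
    """Mean of T(k), namely 2 / (k (k - 1))."""
    return 2 / (k * (k - 1))


rng = random.Random(1)  # arbitrary fixed seed
times = sample_coalescence_times(5, rng)
for k, t in times.items():
    print(k, round(t, 4), expected_time(k))
```

Summing the sampled values gives one realisation of the time back to the common ancestor of the sample.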

  16. The Coalescent (n=5). [Figure: coalescent tree whose successive inter-coalescence times are x10 ~ Exp(10), x6 ~ Exp(6), x3 ~ Exp(3) and x1 ~ Exp(1).] Assume no multiple coalescents; only pairwise coalescences are allowed.

  17. The Coalescent • The times we are interested in will be sums of negative exponentials (see last slide). • There is a general expression for a sum of independent negative exponential random variables.

  18. [Equation image not preserved: distribution of the sum η1 + ... + ηn] for independent ηi ~ Exp(λi).

  19. Tree topology • We need to find the probabilities for the possible tree topologies (shapes).

  20. n=3 • Only one possible shape.

  21. n=4. [Figure: the two possible tree shapes, with weights 2 and 1, i.e. probabilities 2/3 and 1/3.]

  22. The Coalescent, n=5. [Figure: the three possible tree shapes, with probabilities 1/3, 1/2 and 1/6.]

  23. Distribution of Tree Topology • Suppose we have a population of size n; what is the probability that the split at the top of the tree is (k, n-k)? [Figure: root with n leaves below it, dividing into subtrees with k and n-k leaves.]

  24. Distribution of Tree Topology • There will be $\binom{n}{2}\binom{n-1}{2}\cdots\binom{3}{2}\binom{2}{2} = \frac{n(n-1)}{2}\cdot\frac{(n-1)(n-2)}{2}\cdots\frac{3\cdot 2}{2}\cdot\frac{2\cdot 1}{2} = \frac{n\,((n-1)!)^2}{2^{n-1}}$ orders for choosing the pairs, and similarly $\binom{k}{2}\binom{k-1}{2}\cdots\binom{2}{2} = \frac{k\,((k-1)!)^2}{2^{k-1}}$ within the k set and $\binom{n-k}{2}\binom{n-k-1}{2}\cdots\binom{2}{2} = \frac{(n-k)\,((n-k-1)!)^2}{2^{n-k-1}}$ within the (n-k) set.

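  25. Distribution of Tree Topology • The k-1 joins in the k set and the n-k-1 joins in the (n-k) set can be interleaved in $\binom{n-2}{k-1}$ ways, and the k set can be chosen in $\binom{n}{k}$ ways, so the probability that the top split is into sets of sizes k and n-k (with k ≠ n-k) is $\binom{n}{k}\binom{n-2}{k-1}\,\frac{k\,((k-1)!)^2}{2^{k-1}}\,\frac{(n-k)\,((n-k-1)!)^2}{2^{n-k-1}} \Big/ \frac{n\,((n-1)!)^2}{2^{n-1}} = \frac{2}{n-1}$. • Equivalently, if the two subtrees are distinguished as left and right, each ordered split (k, n-k), k = 1, ..., n-1, has probability 1/(n-1).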
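The top-split probabilities can be confirmed for small n by brute force. The sketch below enumerates every order in which pairs of lineages can coalesce and tallies the sizes of the last two blocks; the function name is an illustrative assumption, and the enumeration is only practical for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def top_split_distribution(n):
    """Exact distribution of the unordered top split {k, n-k} for n >= 2,
    obtained by enumerating all equally likely coalescence orders."""
    counts = Counter()

    def recurse(blocks):
        if len(blocks) == 2:
            # Two lineages left: their sizes give the split at the root.
            counts[tuple(sorted(len(b) for b in blocks))] += 1
            return
        for a, b in combinations(range(len(blocks)), 2):
            merged = blocks[a] | blocks[b]
            rest = [blk for i, blk in enumerate(blocks) if i not in (a, b)]
            recurse(rest + [merged])

    recurse([frozenset([i]) for i in range(n)])
    total = sum(counts.values())
    return {sizes: Fraction(c, total) for sizes, c in counts.items()}


print(top_split_distribution(4))
print(top_split_distribution(5))
```

An unordered split {k, n-k} with k ≠ n-k collects the two ordered splits (k, n-k) and (n-k, k), which is why it shows up with twice the probability of a single ordered split.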

  26. Distribution of Tree Topology • Top-down (for n leaves): the first sub-trees have r and (n-r) leaves with probability 1/(n-1) for each r = 1, ..., n-1; their subtrees then split similarly, but you need to allow for the choice of which subtree splits next.

  27. Distribution of Tree Topology • n=5. [Figure: the three tree shapes, with weights 8, 12 and 4 (out of 24).]
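The top-down rule can be turned into a recursion for the probability of each unlabelled tree shape. This is a sketch under the assumption that, given the sizes of the two subtrees, their shapes are independent and follow the same rule; the string encoding of shapes ("*" for a leaf) and the function name are illustrative choices, and the order in which subtrees split is ignored here.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def shape_probs(n):
    """Map each unordered tree shape with n leaves to its probability.

    Shapes are encoded as nested strings, with "*" for a single leaf.
    The returned dict is cached, so callers should not modify it.
    """
    if n == 1:
        return {"*": Fraction(1)}
    probs = {}
    for k in range(1, n):  # ordered split (k, n-k), probability 1/(n-1)
        for left, p_left in shape_probs(k).items():
            for right, p_right in shape_probs(n - k).items():
                a, b = sorted((left, right))  # canonical unordered form
                shape = f"({a},{b})"
                probs[shape] = probs.get(shape, Fraction(0)) + Fraction(1, n - 1) * p_left * p_right
    return probs


for shape, p in shape_probs(5).items():
    print(shape, p)
```

For n = 4 and n = 5 this reproduces the probabilities shown on the earlier slides (2/3 and 1/3; 1/3, 1/2 and 1/6).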

  28. Catalan Numbers (a brief detour) • Number of distinct rooted tree topologies with n leaves, c_n: 1, 2, 5, 14, 42, ... [Figure: annotated diagram showing how successive terms are built up.]

  29. Catalan Numbers (a brief detour) • Example: n=5, with top splits (1,4), (3,2), (2,3), (4,1).

  30. Catalan Numbers • Generating function. [Slides 30 to 34 build up a single derivation step by step: the generating function, a recurrence that holds for n>1, and the results that follow from it. The equation images are not preserved in the transcript.]
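The equations for the generating-function slides did not survive in this transcript. For orientation, the standard argument is sketched below. It assumes the indexing suggested by the n=5 example (top splits (1,4), (3,2), (2,3), (4,1)), namely $c_1 = 1$ and a left/right-ordered split at the root; the notation on the original slides may differ.

```latex
\begin{align*}
C(x) &= \sum_{n \ge 1} c_n x^n, \qquad c_1 = 1, \\
c_n &= \sum_{k=1}^{n-1} c_k \, c_{n-k} \quad \text{for } n > 1, \\
\text{so}\quad C(x) - x &= C(x)^2, \\
\text{and so}\quad C(x) &= \frac{1 - \sqrt{1 - 4x}}{2}
  \quad \text{(the root with } C(0) = 0\text{)}, \\
\text{and so}\quad c_n &= \frac{1}{n}\binom{2n-2}{n-1},
  \qquad \text{giving } 1, 1, 2, 5, 14, 42, \dots \text{ for } n = 1, 2, \dots
\end{align*}
```

With this indexing the n=5 example reads $c_5 = c_1 c_4 + c_3 c_2 + c_2 c_3 + c_4 c_1 = 5 + 2 + 2 + 5 = 14$.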

  35. The Coalescent. MRCA • Thus if we start with k lineages we have T(k), T(k-1), ..., T(3), T(2) as the successive times back to the MRCA (Most Recent Common Ancestor). • Take $TT(k) = \sum_{i=2}^{k} T(i)$, so the expected time is $E(TT(k)) = \sum_{i=2}^{k} \frac{2}{i(i-1)} = \sum_{i=2}^{k} 2\left(\frac{1}{i-1} - \frac{1}{i}\right) = 2\left(1 - \frac{1}{k}\right) \approx 2$ for large k.

  36. The Coalescent. MRCA • Thus E(T(2)) = E(TT(2)) = 1 is more than ½ of the expected time to the MRCA for any k. • Typical tree, n=4: E(T2) = 1, E(T3) = 1/3, E(T4) = 1/6. [Figure: tree with the intervals T2, T3 and T4 marked.]
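The telescoping sum for the expected time to the MRCA can be checked with exact rational arithmetic. The function name is an illustrative assumption.

```python
from fractions import Fraction


def expected_tmrca(k):
    """E(TT(k)) = sum over i = 2..k of 2 / (i (i - 1)), in scaled time."""
    return sum(Fraction(2, i * (i - 1)) for i in range(2, k + 1))


for k in (2, 4, 10, 100):
    # Second column: share of the expected total contributed by E(T(2)) = 1.
    print(k, expected_tmrca(k), float(1 / expected_tmrca(k)))
```

For k = 4 the sum is 1 + 1/3 + 1/6 = 3/2, matching the typical-tree example, and the share contributed by the final interval T(2) stays above one half for every k.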

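  37. The Coalescent • Variances: $V(T_i) = \left(\frac{2}{i(i-1)}\right)^2$, so V(T2) = 1. Since $TT(k) = \sum_{i=2}^{k} T(i)$ with independent terms, $V(TT(k)) = \sum_{i=2}^{k}\left(\frac{2}{i(i-1)}\right)^2 = 4\sum_{i=2}^{k}\left(\frac{1}{i-1}-\frac{1}{i}\right)^2 = 4\sum_{i=2}^{k}\left(\frac{1}{(i-1)^2}+\frac{1}{i^2}-\frac{2}{i(i-1)}\right)$.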

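  38. The Coalescent • $V(TT(k)) = 4\sum_{i=2}^{k}\left(\frac{1}{(i-1)^2}+\frac{1}{i^2}-\frac{2}{i(i-1)}\right) = 8\sum_{i=1}^{k-1}\frac{1}{i^2} + \frac{4}{k^2} - 4 - 8\left(1-\frac{1}{k}\right)$, using $\sum_{i=2}^{k}\frac{2}{i(i-1)} = 2(1-1/k)$ from earlier, $= 8\sum_{i=1}^{k-1}\frac{1}{i^2} - \frac{4(3k+1)(k-1)}{k^2}$.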
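As a check on the algebra, the direct sum of the variances can be compared with the closed form, with the last term written as $4(3k+1)(k-1)/k^2$ (which is what $4/k^2 - 4 - 8(1-1/k)$ simplifies to, up to sign). The function names are illustrative assumptions.

```python
import math
from fractions import Fraction


def var_tmrca_direct(k):
    """V(TT(k)) as the sum of the variances (2 / (i (i - 1)))^2, i = 2..k."""
    return sum(Fraction(2, i * (i - 1)) ** 2 for i in range(2, k + 1))


def var_tmrca_closed(k):
    """Closed form 8 * sum_{i=1}^{k-1} 1/i^2 - 4 (3k + 1)(k - 1) / k^2."""
    partial = sum(Fraction(1, i * i) for i in range(1, k))
    return 8 * partial - Fraction(4 * (3 * k + 1) * (k - 1), k * k)


limit = 8 * math.pi ** 2 / 6 - 12  # value approached as k grows
for k in (2, 5, 50):
    print(k, float(var_tmrca_direct(k)), float(var_tmrca_closed(k)), limit)
```

For k = 2 both expressions give 1, the variance of T(2) alone.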

  39. The Coalescent. V(TT(k)) • $V(TT(k)) \le 8\pi^2/6 - 12 \approx 1.16$, so that V(T2) = 1 accounts for most of the variance. • The following shows six realisations for a sample of size five.

  40. The Coalescent. [Figure: six realisations of the coalescent tree for a sample of size five.]

  41. The Coalescent • Adding recombination to the forward process on the tree is somewhat difficult, and requires MCMC (Markov chain Monte Carlo), a numerical estimation method.

  42. External Edges • Each external edge is the sum of exponential random variables. [Figure: coalescent tree for n=5 with x10 ~ Exp(10), x6 ~ Exp(6), x3 ~ Exp(3) and x1 ~ Exp(1).]

  43. External Edges • Here a random external edge is x10, x10+x6, or x10+x6+x3, with probabilities 2/5, 2/5 and 1/5. NB. These are sums of consecutive xi's, from n down to k. [Figure: the same n=5 tree, with x10 ~ Exp(10), x6 ~ Exp(6), x3 ~ Exp(3) and x1 ~ Exp(1).]

  44. [Equation image not preserved: distribution of the sum η1 + ... + ηn] for independent ηi ~ Exp(λi).

  45. For λ's = 1, 3, 6, 10 the coefficients are: for λ=10, (1·3·6)/((-9)(-7)(-4)) = -1/14; for λ=1, (3·6·10)/(2·5·9) = 2; for λ=3, (1·6·10)/((-2)·3·7) = -10/7; for λ=6, (1·3·10)/((-5)(-3)·4) = 1/2. NB. Coefficients alternate in sign.
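The general formula is not preserved in the transcript, but the coefficients listed above are consistent with the standard expression for a sum of independent exponentials with distinct rates: $P(\eta_1 + \dots + \eta_n > x) = \sum_i c_i e^{-\lambda_i x}$ with $c_i = \prod_{j \ne i} \lambda_j / (\lambda_j - \lambda_i)$. The sketch below assumes that form; the function name is illustrative.

```python
from fractions import Fraction


def hypoexp_coefficients(rates):
    """Coefficients c_i = prod_{j != i} rate_j / (rate_j - rate_i).

    Assumes integer rates that are all distinct; returns {rate: coefficient}.
    """
    coeffs = {}
    for lam_i in rates:
        c = Fraction(1)
        for lam_j in rates:
            if lam_j != lam_i:
                c *= Fraction(lam_j, lam_j - lam_i)
        coeffs[lam_i] = c
    return coeffs


coeffs = hypoexp_coefficients([1, 3, 6, 10])
for lam, c in coeffs.items():
    print(lam, c)
print(sum(coeffs.values()))  # the coefficients sum to 1
```

The coefficients sum to 1 because the tail probability equals 1 at x = 0.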

  46. Prob of trees • We can look at the probability of the different tree topologies. • It is easier to find the distribution of edge probabilities.

  47. Probability(edge is Σ n->k) • Pick one of the n individuals; its external edge is $\mathrm{Exp}(\binom{n}{2})$ with prob $= 2/n = (n-1)/\binom{n}{2}$. • The edge is $\mathrm{Exp}(\binom{n}{2}) + \mathrm{Exp}(\binom{n-1}{2})$ with prob $= (1-2/n)\cdot 2/(n-1) = (n-2)/\binom{n}{2}$. • In general the edge is $\sum_{i=k}^{n} \mathrm{Exp}(\binom{i}{2})$ with prob $(k-1)/\binom{n}{2}$, for k = 2, ..., n.
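The edge probabilities above can be tabulated and checked to sum to 1, and combined with the means 1/C(i, 2) to give the expected length of a randomly chosen external edge. The function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def external_edge_distribution(n):
    """{k: prob} that a random external edge ends at the coalescence
    that takes the number of lineages from k to k - 1."""
    return {k: Fraction(k - 1, comb(n, 2)) for k in range(n, 1, -1)}


def expected_external_edge(n):
    """Expected (scaled) length of a randomly chosen external edge."""
    total = Fraction(0)
    for k, p in external_edge_distribution(n).items():
        # Mean of Exp(C(n,2)) + ... + Exp(C(k,2)).
        mean_length = sum(Fraction(1, comb(i, 2)) for i in range(k, n + 1))
        total += p * mean_length
    return total


dist = external_edge_distribution(5)
print(dist, sum(dist.values()))
print(expected_external_edge(5))
```

For n = 5 the probabilities are 4/10, 3/10, 2/10 and 1/10 for k = 5, 4, 3, 2. These average over tree shapes, unlike the 2/5, 2/5, 1/5 quoted for the single tree pictured earlier.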

  48. n=5. [Figure or table not preserved in the transcript.]

  49. Distributions, n=2(1)6. [Figure or table not preserved in the transcript.] NB. All the coefficients are positive.

  50. Result on sequence of exp • Suppose that we have a set of λi's, and that the coefficient of exp(-λk x) is ak in the distribution of the sum of the corresponding exponential random variables. Now add an extra exponential term with rate µ; then the coefficient of exp(-λk x) becomes simply ak × µ/(µ - λk), irrespective of the details of the sequence of λi's. For example, with rates 1, 3, 6 the coefficient for λ=1 is 9/5, and adding µ=10 gives 9/5 × 10/9 = 2.
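The update rule can be checked against the λ = 1, 3, 6, 10 coefficients given earlier. The sketch assumes the coefficient form $c_i = \prod_{j \ne i} \lambda_j / (\lambda_j - \lambda_i)$ and writes the update factor as µ/(µ - λk), the sign convention that reproduces those values; the function names are illustrative.

```python
from fractions import Fraction


def coefficients(rates):
    """Coefficients prod_{j != i} rate_j / (rate_j - rate_i), distinct rates."""
    out = {}
    for lam_i in rates:
        c = Fraction(1)
        for lam_j in rates:
            if lam_j != lam_i:
                c *= Fraction(lam_j, lam_j - lam_i)
        out[lam_i] = c
    return out


def add_rate(coeffs, mu):
    """Update the existing coefficients when an Exp(mu) term is added.

    Only the old rates are updated; the coefficient of the new rate mu
    is whatever makes the full set sum to 1.
    """
    return {lam: a * Fraction(mu, mu - lam) for lam, a in coeffs.items()}


updated = add_rate(coefficients([1, 3, 6]), 10)
full = coefficients([1, 3, 6, 10])
print(updated)
print({lam: full[lam] for lam in updated})
```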
