
Coalescence with Mutations


cortina

Presentation Transcript


  1. Coalescence with Mutations
     • Towards incorporating greater realism
     • Last time we discussed 2 idealized models: Infinite Alleles and Infinite Sites
     • A realistic model would also incorporate:
       • Insertions, deletions, inversions
       • Sequences that are large, but not infinite
       • Mutations that occur at the same site but on different lineages
     Comp 790– Coalescence with Mutations

  2. Finite Sites Model
     • Jukes-Cantor model: all positions are equally likely to mutate, and mutations to any of the 3 other nucleotides are equally probable
     • Kimura model: accommodates the fact that transitions (A<->G and C<->T) occur more frequently than transversions
     • Positions evolve independently
     [Figure: example genealogy of sequences under the finite sites model: ACCTGCAT, ACGTGCAT, ACGTGCTT, TCCTGCAT, ACGTGCTA, ACGTGCAA]
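A minimal sketch of how a single mutation might be drawn under the two models. The function names and the `kappa` weight (how strongly transitions are favored over each transversion) are illustrative assumptions, not part of the slides.

```python
import random

NUCLEOTIDES = "ACGT"
# Transition partners: purine<->purine (A<->G), pyrimidine<->pyrimidine (C<->T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutate_jc(base, rng=random):
    """Jukes-Cantor: each of the 3 other nucleotides is equally likely."""
    return rng.choice([b for b in NUCLEOTIDES if b != base])


def mutate_kimura(base, kappa=2.0, rng=random):
    """Kimura-style choice: the transition gets weight kappa, each transversion weight 1.

    kappa is a hypothetical tuning parameter for this sketch; kappa = 1
    reduces to the Jukes-Cantor choice.
    """
    others = [b for b in NUCLEOTIDES if b != base]
    weights = [kappa if b == TRANSITION[base] else 1.0 for b in others]
    return rng.choices(others, weights=weights, k=1)[0]


def mutate_sequence(seq, mutate=mutate_jc, rng=random):
    """Mutate one uniformly chosen position; positions are treated independently."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + mutate(seq[pos], rng=rng) + seq[pos + 1:]
```

For example, `mutate_sequence("ACCTGCAT", mutate=mutate_kimura)` returns a copy that differs from the input at exactly one position.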

  3. Wright-Fisher with Mutations
     • Each gene passed on to the next generation is subject to mutation with probability u (with probability 1 - u it is copied without modification)
     • Can accommodate any one of Infinite Alleles, Infinite Sites, or Finite Sites
     • Overall structure is the same
     • Working backwards from the present, the probability that a lineage experiences its first mutation j generations in the past is: u(1 - u)^(j-1) (j - 1 unmodified copies followed by one mutation)
     [Figure: population of 8 with 4 alleles; one group of 5, and 3 of 1]
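The geometric form of the waiting time to the first mutation can be checked by simulation. This is a sketch only; the function names and the value of u are chosen for illustration.

```python
import random


def generations_to_first_mutation(u, rng=random):
    """Walk back one generation at a time; each copy mutates with probability u."""
    j = 1
    while rng.random() >= u:  # copied without modification, go back one more
        j += 1
    return j


def geometric_pmf(j, u):
    """Probability of j - 1 clean copies followed by one mutation."""
    return u * (1.0 - u) ** (j - 1)


if __name__ == "__main__":
    u = 0.05  # illustrative per-generation mutation probability
    draws = [generations_to_first_mutation(u) for _ in range(100_000)]
    print("sample mean:", sum(draws) / len(draws), "expected:", 1 / u)
```

The sample mean should settle near 1/u, the mean of a geometric variable with parameter u.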

  4. Return of the Basic Coalescent
     • Formula is similar in form to the discrete coalescent
     • T_M denotes the number of generations until the first mutation event. It is a geometric variable with parameter u.
     • If time is measured in units of 2N generations then: P(T_M <= t) = 1 - (1 - u)^(2Nt) ≈ 1 - e^(-θt/2)
     • Here θ = 4Nu is the population mutation rate; it can be interpreted as the expected number of mutations separating two samples
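The step from the geometric waiting time to the exponential one can be written out as follows. This is a sketch of the standard limiting argument, assuming u is small and N is large with θ = 4Nu held fixed; t is measured in units of 2N generations.

```latex
\begin{align*}
P(T_M > 2Nt) &= (1-u)^{\lfloor 2Nt \rfloor}
              = \left(1 - \frac{\theta}{4N}\right)^{\lfloor 2Nt \rfloor} \\
             &\longrightarrow e^{-\theta t/2}
              \qquad (N \to \infty,\ \theta = 4Nu \text{ fixed}).
\end{align*}
```

So on the coalescent time scale each lineage mutates at rate θ/2. Two lineages coalesce after an expected time of 1 on this scale, so together they accumulate 2 · (θ/2) · 1 = θ mutations on average, which is the interpretation of θ given above.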

  5. Probabilities of Events
     • If we consider n disjoint lineages, the time to the first mutation along any of them is exponentially distributed with parameter nθ/2
     • If we wait for either a mutation or a coalescence event, the parameter is the sum of the two parameters, nθ/2 + n(n-1)/2 = n(n-1+θ)/2 (a consequence of min(U,V) ~ Exp(a+b) for independent U ~ Exp(a) and V ~ Exp(b))
     • Whether the first event is a coalescence or a mutation is determined by a biased Bernoulli trial, with probability (n-1)/(n-1+θ) of coalescence and θ/(n-1+θ) of mutation
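The competing-clocks argument can be illustrated by drawing the two exponential waiting times separately and keeping the earlier one. This is a sketch; `first_event` is a hypothetical helper name, and it assumes n >= 2 and θ > 0 so that both rates are positive.

```python
import random


def first_event(n, theta, rng=random):
    """Draw the waiting time and type of the first event among n lineages.

    Mutation clock: rate n*theta/2. Coalescence clock: rate n*(n-1)/2.
    The earlier of the two independent exponential clocks decides the event.
    Requires n >= 2 and theta > 0.
    """
    t_mut = rng.expovariate(n * theta / 2.0)
    t_coal = rng.expovariate(n * (n - 1) / 2.0)
    if t_mut < t_coal:
        return t_mut, "mutation"
    return t_coal, "coalescence"
```

Over many draws the mean waiting time should approach 2/(n(n-1+θ)) and the fraction of mutation events should approach θ/(n-1+θ), matching a single exponential draw followed by a Bernoulli trial.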

  6. Simulating Sequence Evolution
     • Simulating a set of genes with mutations
     • Put k = n, where n is the sample size
     • Draw the waiting time to the next event from an exponential distribution with parameter k(k-1+θ)/2
     • With probability (k-1)/(k-1+θ) the event is a coalescent event, and with probability θ/(k-1+θ) it is a mutation event
     • If a coalescent event, choose a pair at random to coalesce and set k ← k-1
     • If a mutation event, choose a lineage at random to mutate
     • Continue until k is one

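  7. Coalescent in Python
     • Straightforward translation into Python

         from random import expovariate, random, randint

         N = 8                              # sample size (value assumed here)
         T = [[i, 0.0] for i in range(N)]   # gene number and time of merge
         k = N
         theta = 4.0*N*0.1                  # 4*N*u with u = 0.1; N is reused as population size
         t = 0.0
         while k > 1:
             # waiting time to the next event (mutation or coalescence)
             t += expovariate(0.5*k*(k-1+theta))
             if random() < theta/(k-1+theta):
                 # mutation event: mark a random lineage with the event time
                 i = randint(0, k-1)
                 T[i] = [T[i], t]
             else:
                 # coalescent event: merge two distinct random lineages
                 i = randint(0, k-1)
                 j = randint(0, k-1)
                 while i == j:
                     j = randint(0, k-1)
                 T[i] = [T[i], T[j], t]
                 T.pop(j)
                 k -= 1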

  8. An Alternate Algorithm
     • The waiting time until a mutation along a lineage is exponentially distributed with parameter θ/2 (slide 4)
     • Equivalently, the number of mutations along a path of length t is Poisson distributed with parameter (and mean) tθ/2
     • Given the number of mutations on a branch, their times are uniformly distributed along the branch
     • The number and times of mutations on each branch are independent of those on other branches
     • Thus, coalescence can be computed first, then mutations can be inserted along each branch
     • Placing mutations on branches in this way is called a “Poisson process”
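The equivalence between exponential waiting times and Poisson counts can be seen by laying exponential gaps end to end along a branch. This is a sketch; `mutation_times_on_branch` is an illustrative name, and it assumes θ > 0.

```python
import random


def mutation_times_on_branch(t, theta, rng=random):
    """Return the mutation times on a branch of length t.

    Gaps between successive mutations are Exp(theta/2); the number of
    mutations that fit within t is then Poisson with mean t*theta/2.
    """
    times = []
    clock = rng.expovariate(theta / 2.0)
    while clock < t:
        times.append(clock)
        clock += rng.expovariate(theta / 2.0)
    return times
```

Counting `len(mutation_times_on_branch(t, theta))` over many repetitions should give a sample mean and variance both close to tθ/2, as expected for a Poisson variable.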

  9. Poisson Algorithm
     • A variant of the continuous-time coalescent with mutations
     • Simulate the genealogy of N sequences according to a coalescent process with rate k(k-1)/2 while k lineages remain (algorithm from lecture 4)
     • For each branch, generate a random number, M_t, from a Poisson distribution with parameter tθ/2, where t is the length of the branch
     • For each branch, the times of the M_t mutation events are chosen uniformly at random along the branch
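A two-stage sketch of this procedure, using only the standard library. The helper names and the branch representation `(child, parent, length)` are assumptions made for illustration. The Poisson sampler uses Knuth's multiplication method, which is a swapped-in technique (the slides do not specify one) and is only suitable for small means.

```python
import math
import random


def poisson(mean, rng=random):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def coalescent_branches(n, rng=random):
    """Stage 1: basic coalescent. Returns a list of (child, parent, length)."""
    active = [(i, 0.0) for i in range(n)]  # (node id, time the node was created)
    branches, t, next_id = [], 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2.0)
        a, b = rng.sample(range(k), 2)  # two distinct lineages
        for idx in sorted((a, b), reverse=True):  # pop higher index first
            node, born = active.pop(idx)
            branches.append((node, next_id, t - born))
        active.append((next_id, t))
        next_id += 1
    return branches


def add_mutations(branches, theta, rng=random):
    """Stage 2: Poisson(length*theta/2) mutations per branch, uniform in time."""
    placed = {}
    for child, _parent, length in branches:
        m = poisson(length * theta / 2.0, rng)
        placed[child] = sorted(rng.uniform(0.0, length) for _ in range(m))
    return placed
```

For example, `add_mutations(coalescent_branches(6), theta=2.0)` maps each non-root node to the mutation times on the branch above it, measured from the node toward its parent. Because the two stages are separate, the same genealogy can be reused with different values of θ.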

  10. Benefits of New Algorithm
     • Mutations can be added onto a genealogy in retrospect
     • Mutation and coalescent parameters can be tweaked independently
     • Mutation does not affect the fundamental results of coalescence
     • Mutations only impact the extant alleles
     • How long ago did N haplotypes diverge?
     • What mutation rate would explain the diversity seen?

  11. Book
     • What’s ahead
     • I will finish chapter 2 next Tuesday
     • From there on, one of you will be responsible for each chapter
     • Each chapter in 2 lectures (pick up the pace a bit), a Thursday followed by a Tuesday
     • You’ll have a weekend to prepare each lecture
     • I will do chapter 8
     Comp 790– Continuous-Time Coalescence
