Explore the importance of relaxing the last assumption in the Wright-Fisher model - the absence of recombination. Discover how recombination affects genealogical relationships and the mathematical complexity of its analysis.
Six Assumptions of the Wright-Fisher Model • Discrete and non-overlapping generations • Haploid individuals or two subpopulations • The population size is constant • All individuals are equally fit • The population has no geographical or social structure • The genes are not recombining [Slide legend: no need to be relaxed; have been relaxed in Chapter 4; to be relaxed soon] Comp 790-Coalescent with recombination
No recombination: the last assumption • The last assumption that needs to be relaxed. • Why does it need to be relaxed? • Recombination occurs in most real data sets. • Why is it the last one to be relaxed? • It is more mathematically complex to analyze. • The sampled sequences are no longer related by a tree, but by a graph or a collection of trees.
Outline • What is recombination? • An example of recombination • Hudson’s model of recombination • Wright-Fisher model with recombination • ARG Simulation Algorithm
What is recombination? • Recall the slides in lecture 5. • Recombination • A process in which new gene combinations are introduced • E.g., crossover, gene conversion
What is the result of recombination? [Figure: panels "No recombination" and "Recombination"; layers labeled grandparents, parents, and children.]
An example of recombination • The Apolipoprotein E gene • 31 different haplotypes (rows) • 21 segregating sites (columns) • Some pairs of sites cannot be fitted on a single tree. • There must have been recombination.
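The slide does not say how incompatible pairs of sites were detected. One common check is the four-gamete test, sketched below under the assumption that each site is biallelic and coded as 0/1 and that each site mutated at most once. The function name and the toy haplotypes are hypothetical and are not the Apolipoprotein E data.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return site index pairs (i, j) at which all four gametes 00, 01, 10, 11 occur.

    Assumes every haplotype is a string of '0'/'1' characters of equal length.
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site combinations seen in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


# Hypothetical toy samples: the first fits a single tree, the second does not.
print(incompatible_pairs(["00", "01", "11"]))
print(incompatible_pairs(["00", "01", "10", "11"]))
```

Under these assumptions, a pair of sites showing all four gametes cannot be placed on one tree with a single mutation per site, which is the kind of incompatibility the slide describes.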
Pair-wise LD measure • LD (linkage disequilibrium) is an indirect measure of the correlation of genealogical trees for different segregating sites. • The higher the LD, the more correlated the pair of sites. • The color denotes the significance. • There is a weak tendency for highly significant LD to be found between close sites.
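The slide does not state which LD statistic or significance test was plotted. As an illustration only, the sketch below computes the widely used r² statistic for two biallelic sites coded as 0/1. The function name and the toy columns are hypothetical.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites given as equal-length lists of 0/1 alleles.

    Assumes both sites are segregating, so no frequency is 0 or 1.
    """
    n = len(site_a)
    p_a = sum(site_a) / n                                  # frequency of allele 1 at site A
    p_b = sum(site_b) / n                                  # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n  # frequency of haplotype 1-1
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical toy columns: perfectly correlated sites, then uncorrelated sites.
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```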
LD at different distances • LD is smaller the further apart the sites are. • Recombination leads to this pattern. • Sites far apart experience more recombination events between them.
A summary of the example • We cannot fit these sequences with the previous model, which has no recombination. • Recombination is the cause. • Recombination can generate incompatibilities between pairs of sites. • Segregating sites far apart experience more recombination events, so they become less correlated.
Hudson’s model of recombination • Forward perspective: a parental chromosome is assembled from two grandparental chromosomes, A and B. • Choose a point on the chromosome uniformly at random. • Copy the genetic material from chromosome A to the left of that point. • Copy the genetic material from chromosome B to the right of that point.
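The forward step can be sketched in a few lines. This is a discrete simplification, assuming chromosomes are equal-length strings and the breakpoint falls between two adjacent positions. The function name `recombine` and the example strings are hypothetical.

```python
import random


def recombine(chrom_a, chrom_b, rng=random):
    """Build a child chromosome from A (left of the breakpoint) and B (right of it)."""
    if len(chrom_a) != len(chrom_b):
        raise ValueError("chromosomes must have equal length")
    # Breakpoint chosen uniformly among the gaps between adjacent positions,
    # so the child carries at least one position from each parent.
    point = rng.randint(1, len(chrom_a) - 1)
    return chrom_a[:point] + chrom_b[point:], point


child, point = recombine("AAAAAA", "BBBBBB", random.Random(1))
print(child, point)
```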
Hudson’s model of recombination (cont.) • Reversed perspective: choose a chromosome from a parent. • Going back in time, the chromosome splits into two grandparental chromosomes.
Modeling recombination and coalescence • Recombination events are the opposite of coalescent events. • Looking backwards • Coalescence is a combining event. • Recombination is a splitting event. • But how can we model both of these events? • Use an idea similar to the one we used before (when adding mutation events to the coalescent). • Question 1: What is this idea?
Another exponential distribution • We model the waiting time until a recombination event as exponentially distributed. • This distribution is independent of the coalescent process. • The parameter (the intensity of recombination) is proportional to the recombination rate (ρ) of a sequence times the number of ancestral lineages.
From Hudson’s model to the Wright-Fisher model • Hudson’s model simplifies the biology of the recombination process. • The mechanisms of recombination are complicated and differ greatly among eukaryotes, bacteria, and viruses. • The process is still not well understood at the molecular level. • Still, the model forms the basis for most applications of coalescent theory to recombining sequences. • Now we modify the Wright-Fisher model to include this simplified model of recombination.
Wright-Fisher model with recombination • Diploid Wright-Fisher Model • An individual perspective
Wright-Fisher model with recombination (cont.) • Haploid Wright-Fisher Model • We can ignore the existence of individuals under some conditions. • A sequence perspective
Discrete time formulation • In the discrete model, let r be the probability of recombination per sequence per generation. • T_R denotes the number of generations (looking back) until the first recombination event. • The probability that a sequence was created by recombination j generations ago is P(T_R = j) = r(1 − r)^(j−1). • T_R is geometrically distributed.
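A small simulation can make the geometric waiting time concrete. The sketch assumes the standard geometric form P(T_R = j) = r(1 − r)^(j−1), that is, no recombination for j − 1 generations and then one. The function names and the value r = 0.1 are hypothetical choices for illustration.

```python
import random


def geometric_pmf(j, r):
    """P(T_R = j): no recombination for j - 1 generations, then a recombination."""
    return r * (1 - r) ** (j - 1)


def sample_generations_until_recombination(r, rng):
    """Step back one generation at a time until a recombination occurs."""
    j = 1
    while rng.random() >= r:  # with probability 1 - r there is no recombination
        j += 1
    return j


rng = random.Random(42)
r = 0.1  # hypothetical per-generation recombination probability
draws = [sample_generations_until_recombination(r, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # should be close to the geometric mean 1 / r
```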
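Continuous time approximation • Let the scaled recombination rate be ρ = 4Nr, similar to θ for mutation, and measure time in units of 2N generations: j = 2Nt. • Then P(T_R > j) = (1 − r)^j = (1 − ρ/(4N))^(2Nt) ≈ e^(−ρt/2), so the scaled waiting time T_R/(2N) is exponentially distributed with parameter ρ/2. • Note that so far this probability is for only one sequence.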
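The quality of the approximation can be checked numerically. The sketch assumes ρ = 4Nr and j = 2Nt as on the slide, and compares the discrete survival probability (1 − r)^j with the exponential form e^(−ρt/2). The function names and the parameter values are hypothetical.

```python
import math


def survival_discrete(t, rho, n_pop):
    """P(no recombination in j = 2Nt generations) in the discrete model."""
    r = rho / (4 * n_pop)      # per-generation probability, from rho = 4Nr
    j = round(2 * n_pop * t)   # number of generations corresponding to scaled time t
    return (1 - r) ** j


def survival_continuous(t, rho):
    """P(T > t) for an exponential waiting time with parameter rho / 2."""
    return math.exp(-rho * t / 2)


# Hypothetical values: N = 10000, rho = 2.
for t in (0.1, 0.5, 1.0, 2.0):
    print(t, survival_discrete(t, 2.0, 10000), survival_continuous(t, 2.0))
```

For large N the two columns should agree to several decimal places.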
Continuous time approximation (cont.) • If there are k sequences, the parameter of the exponential distribution is kρ/2. • Question 2: Why? • The waiting times for recombination in each sequence are independent and exponentially distributed, i.e. Exp(ρ/2). • The intensity of recombination in any of the k sequences equals the sum of the intensities in each sequence.
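One way to see the answer to Question 2 is to write out the survival function of the first recombination among k sequences. The symbols T_1, ..., T_k for the per-sequence waiting times are introduced here for illustration and are assumed independent with distribution Exp(ρ/2).

```latex
P\bigl(\min(T_1,\dots,T_k) > t\bigr)
  = \prod_{i=1}^{k} P(T_i > t)
  = \bigl(e^{-\rho t/2}\bigr)^{k}
  = e^{-k\rho t/2}
```

This is the survival function of an Exp(kρ/2) variable, so the intensities add.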
Continuous time approximation (cont.) • Again, the waiting times for a coalescence event and for a recombination event among k sequences are independent and exponentially distributed. • The waiting time until the first of these events is Exp(k(k − 1)/2 + kρ/2). • The probability that the first event is a coalescence is (k − 1)/(k − 1 + ρ). • The probability that it is a recombination is ρ/(k − 1 + ρ).
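As a quick check of the competing-event probabilities, the sketch below takes the coalescence intensity k(k − 1)/2 and the recombination intensity kρ/2 and divides each by their sum. The function names and the example values k = 5, ρ = 2 are hypothetical.

```python
def event_rates(k, rho):
    """Coalescence and recombination intensities for k ancestral sequences."""
    return k * (k - 1) / 2, k * rho / 2


def first_event_probabilities(k, rho):
    """Probability that the next event is a coalescence or a recombination."""
    coal, rec = event_rates(k, rho)
    total = coal + rec  # rate of the exponential waiting time to the next event
    return coal / total, rec / total


p_coal, p_rec = first_event_probabilities(5, 2.0)
print(p_coal, p_rec)  # these should match (k - 1)/(k - 1 + rho) and rho/(k - 1 + rho)
```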
ARG Simulation algorithm • 1. Start with k = n genes. • 2. For k sequences with ancestral material, draw a random number from the exponential distribution with parameter k(k − 1)/2 + kρ/2. This is the time to the next event. • 3. With probability (k − 1)/(k − 1 + ρ) the event is a coalescence event; otherwise it is a recombination event. • 4. If it is a coalescence event, choose two of the ancestral sequences at random and merge them into one sequence that inherits the ancestral material of both. Decrease k by one. If k = 1, end the process; otherwise go to step 2.
ARG Simulation algorithm (cont.) • 5. If it is a recombination event, choose a sequence at random and a random point on it. Create one ancestor sequence carrying the ancestral material to the left of the chosen point and a second ancestor carrying the ancestral material to the right of it. Increase the number of ancestral sequences k by one and go to step 2. • Question 3: Where can we find the missing material of the ancestors? [Figure: a sequence splitting at a random point.]
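The five steps can be sketched as a short simulation. This is a minimal illustration, not a reference implementation: it assumes each sequence is the unit interval [0, 1), stores ancestral material as a list of half-open intervals, and keeps every ancestor as a lineage even when its share of ancestral material is empty. All names (`simulate_arg`, `merge_intervals`, `split_intervals`) are hypothetical.

```python
import random


def merge_intervals(intervals):
    """Union of half-open intervals, sorted, with overlapping pieces joined."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_intervals(intervals, point):
    """Ancestral material to the left and to the right of the breakpoint."""
    left = [(s, min(e, point)) for s, e in intervals if s < point]
    right = [(max(s, point), e) for s, e in intervals if e > point]
    return left, right


def simulate_arg(n, rho, rng):
    """Return a list of (time, kind, children, parents) events back to k = 1."""
    lineages = {i: [(0.0, 1.0)] for i in range(n)}  # step 1: k = n
    next_id, time, events = n, 0.0, []
    while len(lineages) > 1:
        k = len(lineages)
        # Step 2: waiting time to the next event.
        time += rng.expovariate(k * (k - 1) / 2 + k * rho / 2)
        # Step 3: choose the event type.
        if rng.random() < (k - 1) / (k - 1 + rho):
            # Step 4: coalescence merges the ancestral material of two lineages.
            a, b = rng.sample(sorted(lineages), 2)
            lineages[next_id] = merge_intervals(lineages.pop(a) + lineages.pop(b))
            events.append((time, "coalescence", (a, b), (next_id,)))
            next_id += 1
        else:
            # Step 5: recombination splits one lineage at a uniform point.
            child = rng.choice(sorted(lineages))
            left, right = split_intervals(lineages.pop(child), rng.random())
            lineages[next_id], lineages[next_id + 1] = left, right
            events.append((time, "recombination", (child,), (next_id, next_id + 1)))
            next_id += 2
    return events


events = simulate_arg(5, 1.0, random.Random(0))
print(len(events), events[-1][1])
```

Because k falls by one at each coalescence and rises by one at each recombination, the number of coalescences always exceeds the number of recombinations by n − 1 when the loop ends.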
Is the single ancestor ever reached? • A coalescence event decreases k by one. • A recombination event increases k by one. • Question 4: Is there an end for the process? • YES! • Why? • It is a birth-death process. • The recombination intensity is kρ/2 [birth rate, linear in k] • The coalescent intensity is k(k-1)/2 [death rate, quadratic in k] • k(k-1)/2 > kρ/2 whenever k > ρ + 1, so a large k is pushed back down. • The GMRCA (grand most recent common ancestor) is always found. But it may take a LONG time.
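To get a feel for how long "LONG" can be, one can follow only the number of lineages k and count events until k = 1. The sketch assumes the event probabilities (k − 1)/(k − 1 + ρ) and ρ/(k − 1 + ρ). The function name and the values of n and ρ are hypothetical.

```python
import random


def events_until_gmrca(n, rho, rng):
    """Count coalescence and recombination events until a single lineage remains."""
    k, events = n, 0
    while k > 1:
        p_coal = (k - 1) / (k - 1 + rho)
        # Coalescence lowers k by one; recombination raises it by one.
        k += -1 if rng.random() < p_coal else 1
        events += 1
    return events


rng = random.Random(7)
for rho in (0.0, 1.0, 2.0):
    runs = [events_until_gmrca(5, rho, rng) for _ in range(200)]
    print(rho, sum(runs) / len(runs))  # average event count for this rho
```

With ρ = 0 the count is exactly n − 1; each recombination adds two events, since the extra lineage must later coalesce.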
Genealogical structure: From tree to graph • With recombination, we must use a graph to model the sequence relations rather than a tree. • ARG (Ancestral Recombination Graph) • The graph resulting from the algorithm
Genealogical structure: From graph to a collection of trees • However, if we focus on a single point on the sequence, there will be no recombination! • Question 5: Why? • A given point of a child sequence is always inherited from exactly one parent sequence. • Local tree • The tree relating the sequences at a single position • The genealogy graph can be seen as a collection of local trees, one for each position.
Next time • More on simulation algorithm • Effect of a single recombination event • Coalescent events with gene conversion