Coalescent Theory
• Considers the genealogical history of genes in populations
• Uses DNA sequence data to make inferences about population size, genetic structure, and evolutionary processes
Coalescent Process
[Figure: gene tree showing the coalescence intervals $t_2$, $t_3$, $t_4$, and $t_5$.]
$t_m$ is the time for coalescence from $m$ to $m-1$ sequences.
How long will the coalescence process take?
Simplest case: if we pick two random gene copies, the probability that the second derives from the same gene copy in the previous generation as the first is $1/(2N_e)$. This is the probability that two alleles coalesce in the previous generation. It follows that $1 - 1/(2N_e)$ is the probability that two sequences were derived from different sequences in the preceding generation. Therefore, the probability that two sequences were derived from the same ancestor 2 generations ago (grandparent) is $[1 - 1/(2N_e)] \times 1/(2N_e)$. It can be shown that the probability that two sequences were derived from the same ancestor $t$ generations ago is:
$$P(t) = \left[1 - \frac{1}{2N_e}\right]^{t-1} \times \frac{1}{2N_e} \approx \frac{1}{2N_e}\, e^{-t/(2N_e)}$$
$$\left[1 - \frac{1}{2N_e}\right]^{g-1} \times \frac{1}{2N_e}$$
Because $N_e$ is in the denominator, the probability depends on population size. Consider the probability of common ancestry $g$ generations ago:

Generations ago   Prob(N=5)   Prob(N=10)
1                 0.400       0.200
2                 0.320       0.182
3                 0.256       0.162

It can be shown that the average time back to common ancestry of a pair of genes in a diploid population is $2N_e$ generations, and the average time back to common ancestry of all gene copies is $4N_e$ generations.
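The geometric form above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the choice $N_e = 50$ are illustrative assumptions. It evaluates the exact per-generation probability, the exponential approximation, and the mean of the distribution, which should come out close to $2N_e$.

```python
import math


def coalescence_prob(t, ne):
    """Probability that two gene copies first share an ancestor exactly t generations ago.

    Geometric form: no coalescence for t - 1 generations, then coalescence.
    """
    p = 1.0 / (2 * ne)
    return (1.0 - p) ** (t - 1) * p


def coalescence_prob_approx(t, ne):
    """Exponential approximation, reasonable when 2 * ne is large."""
    return math.exp(-t / (2 * ne)) / (2 * ne)


def mean_coalescence_time(ne, max_t=None):
    """Mean of the geometric distribution, summed up to a truncation point.

    The truncation point (100 * ne by default) is an arbitrary choice that
    leaves a negligible tail.
    """
    if max_t is None:
        max_t = 100 * ne
    return sum(t * coalescence_prob(t, ne) for t in range(1, max_t + 1))


if __name__ == "__main__":
    ne = 50  # hypothetical effective population size
    for t in (1, 2, 3):
        print(t, coalescence_prob(t, ne), coalescence_prob_approx(t, ne))
    print("mean pairwise coalescence time:", mean_coalescence_time(ne))
```

With a larger `ne`, the exact and approximate columns move closer together, and the mean scales in proportion to `ne`.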
[Figure: time back to common ancestor, compared for a large population and a small population.]
Coalescence with no mutation
The average degree of relatedness increases with time. All of the gene copies in a population can be traced back to a single ancestral gene. A population will eventually become monomorphic for one allele or another, and the probability that a given allele is the one that remains is determined by the initial allele frequencies.
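The link between initial frequency and eventual fixation can be illustrated with a small drift simulation. This is a sketch under assumptions that are not in the slides: a Wright-Fisher model with a fixed number of gene copies, no mutation or selection, and a hypothetical function name and parameter values. Under those assumptions the fraction of replicates in which the tracked allele fixes is expected to be close to its starting frequency.

```python
import random


def fixation_fraction(initial_count, copies, replicates, seed=0):
    """Fraction of replicate populations in which the tracked allele reaches fixation.

    Each generation, every one of the `copies` gene copies picks its parent at
    random from the previous generation (Wright-Fisher sampling, assumed here).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = initial_count
        # Continue until the allele is either lost (0) or fixed (copies).
        while 0 < count < copies:
            freq = count / copies
            count = sum(rng.random() < freq for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    # 6 of 20 gene copies carry the allele, so the starting frequency is 0.3.
    print(fixation_fraction(initial_count=6, copies=20, replicates=2000))
```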
Coalescence with mutation
If each lineage experiences $m$ mutations per generation, then the number of base pair differences between two gene copies will be $\#\text{dif} = 2 m t_{ca}$, where $t_{ca}$ is the time back to their common ancestor. If the average time to coalescence is $2N_e$ for two randomly chosen gene copies, then $\#\text{dif} = 2m(2N_e) = 4N_e m$. Therefore, we expect the average number of base pair differences between gene copies to be greater in a larger population.
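As a quick worked check of the relationship above, the sketch below plugs in numbers. The function name and the values of `m` and `ne` are hypothetical, chosen only to show that the expected number of differences grows in proportion to population size.

```python
def expected_pairwise_differences(m, ne):
    """Expected base pair differences between two random gene copies.

    Uses #dif = 2 * m * t_ca with the average pairwise coalescence time
    t_ca = 2 * ne, as stated in the text.
    """
    t_ca = 2 * ne
    return 2 * m * t_ca


if __name__ == "__main__":
    m = 0.001  # hypothetical mutations per lineage per generation
    for ne in (1_000, 10_000):
        print(ne, expected_pairwise_differences(m, ne))
```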
Polymorphism
Polymorphism: when two or more alleles at a locus exist in a population at the same time.
Nucleotide diversity: $P = \sum_{i<j} x_i x_j p_{ij}$ considers the number of differences and the allele frequencies.

Sequence   Bases        Freq (x)
Seq 1      GAGGTGCAAC   0.4
Seq 2      GAGGACCAAC   0.5
Seq 3      GAGCTGGAAG   0.1

Proportion of sites that differ between each pair ($p_{ij}$):

        Seq 2   Seq 3
Seq 1   0.2     0.3
Seq 2           0.5

$P = \underbrace{(0.4)(0.5)(0.2)}_{p_{12}} + \underbrace{(0.4)(0.1)(0.3)}_{p_{13}} + \underbrace{(0.5)(0.1)(0.5)}_{p_{23}} = 0.077$
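The worked example above can be reproduced in a few lines. This is a minimal sketch with hypothetical function names. It sums over each pair of sequences once, as the slide's calculation does, and takes $p_{ij}$ as the proportion of aligned sites that differ.

```python
from itertools import combinations


def proportion_different(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(seqs, freqs):
    """Sum of x_i * x_j * p_ij over each unordered pair of sequences."""
    total = 0.0
    for i, j in combinations(range(len(seqs)), 2):
        total += freqs[i] * freqs[j] * proportion_different(seqs[i], seqs[j])
    return total


if __name__ == "__main__":
    seqs = ["GAGGTGCAAC", "GAGGACCAAC", "GAGCTGGAAG"]
    freqs = [0.4, 0.5, 0.1]
    print(round(nucleotide_diversity(seqs, freqs), 3))
```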
Polymorphism is also estimated by:
K = number of segregating (variable) sites in a sample of alleles.

ATCCGGCTTTCGA
ATCCGAATTTCGA
ATTCGCCTTTCGA

K = 3 for these three sequences.
In theory: $K/a = P$, where $a$ is a coefficient that considers the number of sequences examined.
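Counting segregating sites is a column-wise check of the alignment, as the sketch below shows. The slides do not give the formula for the coefficient $a$; the code assumes the commonly used form $a = \sum_{i=1}^{n-1} 1/i$ for a sample of $n$ sequences, and that assumption is labeled in the code. Function names are hypothetical.

```python
def segregating_sites(seqs):
    """Number of alignment columns containing more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def sample_size_coefficient(n):
    """Coefficient a for n sequences.

    Assumption (not stated in the slides): a = 1 + 1/2 + ... + 1/(n - 1).
    """
    return sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    seqs = ["ATCCGGCTTTCGA", "ATCCGAATTTCGA", "ATTCGCCTTTCGA"]
    k = segregating_sites(seqs)
    a = sample_size_coefficient(len(seqs))
    print("K =", k, " a =", a, " K/a =", k / a)
```

Note that `k / a` here is a count per sequence rather than per site; dividing by the sequence length would put it on the same per-site scale as a proportion-based $P$.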
Testing DNA Sequences for Neutrality
Using the difference in estimates of polymorphism to detect deviation from neutrality.
Tajima's Test (1989):
$$D = \frac{P - K/a}{\sqrt{V(P - K/a)}}$$
The denominator is a normalizing factor. $P$ and $K$ are differentially influenced by the frequency of alleles.
Rationale:
$P > K/a$: few alleles at intermediate frequency
$P < K/a$: many low-frequency, variable alleles
$D = 0$: neutral prediction
$D > 0$: balancing selection
$D < 0$: directional selection
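The sign behavior described above can be tried on toy alignments. The sketch below is illustrative rather than a definitive implementation. It assumes $P$ is computed as the mean number of pairwise differences in the sample, $a$ is $\sum_{i=1}^{n-1} 1/i$, and the variance uses the constants usually attributed to Tajima (1989), none of which are spelled out in the slides. The function name and the two toy data sets are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned sequences (sketch under stated assumptions)."""
    n = len(seqs)
    # K: number of segregating sites.
    k = sum(len(set(column)) > 1 for column in zip(*seqs))
    if n < 2 or k == 0:
        return float("nan")

    # P: mean number of differences over all pairs of sampled sequences.
    pairs = list(combinations(seqs, 2))
    p = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)

    # Sample-size constants (assumed standard form of the variance estimate).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * k + e2 * k * (k - 1)
    if variance <= 0:
        return float("nan")
    return (p - k / a1) / sqrt(variance)


if __name__ == "__main__":
    # Every variant is carried by a single sequence (low-frequency alleles).
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
    # Two haplotypes, each at intermediate frequency.
    print(tajimas_d(["AAA", "AAA", "TTT", "TTT"]))
```

The first sample gives a negative value and the second a positive one, matching the rationale on the slide.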
Gene genealogies under no selection (that is, no positive, balancing, or background selection)
No Selection: 7 neutral mutations accumulate since the time of the last common ancestor. $D = 0$
Consider the Effects of Selection on Neutral Sites Linked to a Selected Site
Positive Selection: neutral variation at linked sites will be eliminated (swept away) as the advantageous allele is quickly fixed in the population. This process is also called hitch-hiking.
[Figure: gene genealogy plotted against time.] $D < 0$
Consider the Effects of Selection on Neutral Sites Linked to a Selected Site
Balancing Selection: neutral variation at linked sites accumulates during the long period of time that both allele lineages are maintained.
[Figure: gene genealogy plotted against time.] $D > 0$
Consider the Effects of Selection on Neutral Sites Linked to a Selected Site
Background Selection: gene lineages become extinct not only by chance, but also because they are linked to deleterious mutations, whose removal eliminates some gene copies.
[Figure: gene genealogy plotted against time.] $D < 0$