Sequence Evolution

Explore mutations, natural selection, and evolutionary analyses to understand how genetic variations shape the survival of mutations over time in populations.

mgadsden

Presentation Transcript


  1. Sequence Evolution
     • What are mutations and what do they tell us about evolution?
     • What is recombination and how can it be factored into evolutionary analyses?
     • Analysing natural selection

  2. Different types of mutation
     4 types of mutant:
     • Harmful
     • Neutral
     • Conditionally useful
     • Useful
     Potentially useful mutations may have no immediate value and might only be beneficial under certain circumstances.
     Most non-neutral mutations will be harmful.
     Useful mutations will occur in genomes that contain mutations that are harmful.

  3. The survival of mutations
     [Figure: frequency of a mutation in the population (less common to more common) plotted against time since the mutation arose]
     Consider how we would analyse natural selection in these sequences if we could only sample populations at one or a few time-points.

  4. The survival of mutations
     [Figure: frequency of a mutation in the population (less common to more common) plotted against time since the mutation arose, with trajectories labelled for sites 1 to 9]
     To make things simple we'll only consider a situation where 9 sites have varied.

  5. The survival of mutations
     [Figure: frequency of a mutation in the population (less common to more common) plotted against time since the mutation arose, with trajectories labelled for sites 1 to 9]
     To make things even simpler we'll pretend that all sequences begin as a string of A's: AAAAAAAAA

  6. The survival of mutations
     [Figure: frequency of a mutation in the population (less common to more common) plotted against time since the mutation arose, with trajectories labelled for sites 1 to 9 and sampling time-points 0, 1 and 2 marked]
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     If we sampled 5 sequences at time-point 0 we would get this alignment.

  7. The survival of mutations
     [Figure: frequency of a mutation in the population (less common to more common) plotted against time since the mutation arose, with trajectories labelled for sites 1 to 9 and sampling time-points 0, 1 and 2 marked]
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     If we sampled again at time-point 1 we would get this.

  8. The survival of mutations
     [Figure: frequency of a mutation in the population (less common to more common) plotted against time since the mutation arose, with trajectories labelled for sites 1 to 9 and sampling time-points 0, 1 and 2 marked]
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we sampled again at time-point 2 we would get this.

  9. The survival of mutations
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we were able to track this process (as is possible for some rapidly evolving viruses like HIV) we could speculate:
     • Mutations at sites 1, 3, 4, 5 and 6 were either neutral or adaptive, i.e. the mutant allele rises in frequency and sometimes becomes fixed.

  10. The survival of mutations
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we were able to track this process (as is possible for some rapidly evolving viruses like HIV) we could speculate:
     • Mutations at sites 1, 3, 4, 5 and 6 were either neutral or adaptive.
     • Mutations at sites 8 and 9 were either neutral or harmful, i.e. the mutant alleles arise but are then lost or decrease in frequency below the detection threshold.

  11. The survival of mutations
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we were able to track this process (as is possible for some rapidly evolving viruses like HIV) we could speculate:
     • Mutations at sites 1, 3, 4, 5 and 6 were either neutral or adaptive.
     • Mutations at sites 8 and 9 were either neutral or harmful.
     • Sites 2, 7, 8 and 9 are evolving under negative/purifying selection, i.e. the wild-type (wt) allele is the best there is, since all observed non-wt alleles only ever occur at low frequencies and/or are lost.

  12. The survival of mutations
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we were able to track this process (as is possible for some rapidly evolving viruses like HIV) we could speculate:
     • Mutations at sites 1, 3, 4, 5 and 6 were either neutral or adaptive.
     • Mutations at sites 8 and 9 were either neutral or harmful.
     • Sites 2, 7, 8 and 9 are evolving under negative/purifying selection.
     • Sites 3, 4, 5 and 6 are evolving under positive/diversifying selection, i.e. mutant alleles displace wt alleles.

  13. The survival of mutations
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we were able to track this process (as is possible for some rapidly evolving viruses like HIV) we could speculate:
     • Mutations at sites 1, 3, 4, 5 and 6 were either neutral or adaptive.
     • Mutations at sites 8 and 9 were either neutral or harmful.
     • Sites 2, 7, 8 and 9 are evolving under negative/purifying selection.
     • Sites 3, 4, 5 and 6 are evolving under positive/diversifying selection.
     • Site 1 is evolving either neutrally or under weak positive selection.

  14. The survival of mutations
     Time-point 0: AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA AAAAAAAAA
     Time-point 1: AAAAGAAAG AAGAGAAAA GAGAGGAAA AAAAGAAAA AAGAGGAGA
     Time-point 2: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     If we were able to track this process (as is possible for some rapidly evolving viruses like HIV) we could infer:
     • Mutations at sites 1, 3, 4, 5 and 6 were either neutral or adaptive.
     • Mutations at sites 8 and 9 were either neutral or harmful.
     • Sites 2, 7, 8 and 9 are evolving under negative/purifying selection.
     • Sites 3, 4, 5 and 6 are evolving under positive/diversifying selection.
     • Site 1 is evolving either neutrally or under weak positive selection.
     Although quite powerful, this experimental approach highlights a problem with inference: it is sometimes wrong.
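
The per-site reasoning on slides 9 to 14 rests on how the frequency of the non-A allele changes between the three samples. A minimal Python sketch of that bookkeeping is given below. It assumes, as the slides do, that "A" is the wild-type allele at every site; the names `samples`, `mutant_frequencies` and `trajectories` are illustrative and do not come from the presentation.

```python
# Alignments sampled at the three time-points shown on the slides.
samples = {
    0: ["AAAAAAAAA"] * 5,
    1: ["AAAAGAAAG", "AAGAGAAAA", "GAGAGGAAA", "AAAAGAAAA", "AAGAGGAGA"],
    2: ["AGGGGAAAA", "GAGGGGAAA", "AAGAGGAAA", "GAGGGGAAA", "AAGGGGAAA"],
}


def mutant_frequencies(alignment, wild_type="A"):
    """For each site, return the fraction of sequences that differ from wild_type."""
    n = len(alignment)
    return [
        sum(seq[i] != wild_type for seq in alignment) / n
        for i in range(len(alignment[0]))
    ]


# trajectories[t][i] is the mutant frequency at site i + 1 at time-point t.
trajectories = {t: mutant_frequencies(aln) for t, aln in samples.items()}

for site in range(9):
    print(site + 1, [trajectories[t][site] for t in sorted(trajectories)])
```

Sites whose mutant frequency rises across the samples (for example site 3) match the "neutral or adaptive" reading, while sites where a mutant appears and then vanishes (sites 8 and 9) match the "neutral or harmful" reading.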

  15. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     In reality, for most organisms, effectively only a single sampling time-point is available: we cannot go back thousands of years and take population samples.

  16. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Just from looking at this one time-point we can infer:
     • Invariant sites like 3, 5, 7, 8 and 9 are evolving under negative selection, i.e. the current alleles at these sites are better than any mutants that might arise.
     How can this be? Previously we saw that sites 3 and 5 had evolved under positive selection.

  17. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Just from looking at this one time-point we can infer:
     • Invariant sites like 3, 5, 7, 8 and 9 are evolving under negative selection, i.e. the current alleles at these sites are better than any mutants that might arise.
     Remember that every current wild-type allele that is evolving under negative selection was once a mutant allele that was driven to fixation by positive selection.

  18. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Just from looking at this one time-point we can infer:
     • Invariant sites like 3, 5, 7, 8 and 9 are evolving under negative selection, i.e. the current alleles at these sites are better than any mutants that might subsequently arise.
     Despite their names, negative selection and positive selection are not opposites: they are different aspects of the same process, natural selection.

  19. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Just from looking at this one time-point we can infer:
     • Invariant sites like 3, 5, 7, 8 and 9 are evolving under negative selection.
     • Sites like 1 with intermediate frequency alleles are evolving under positive selection, neutral genetic drift or balancing selection.
     These are very difficult to distinguish from data taken at a single time-point.

  20. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Just from looking at this one time-point we can infer:
     • Invariant sites like 3, 5, 7, 8 and 9 are evolving under negative selection.
     • Sites like 1 with intermediate frequency alleles are evolving under positive selection, neutral genetic drift or balancing selection.
     • Sites like 2, 4 and 6 that have low frequency polymorphisms are potentially evolving under weak purifying selection that favors the higher frequency allele, neutral genetic drift or balancing selection.
     Again, these are very difficult to distinguish from data taken at a single time-point.

  21. Detecting selection in reality
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Just from looking at this one time-point we can infer:
     • Invariant sites like 3, 5, 7, 8 and 9 are evolving under negative selection.
     • Sites like 1 with intermediate frequency alleles are evolving under positive selection, neutral genetic drift or balancing selection.
     • Sites like 2, 4 and 6 that have low frequency polymorphisms are potentially evolving under weak purifying selection that favors the higher frequency allele, neutral genetic drift or balancing selection.
     It is, however, possible to reliably detect selection if we look at the alignment as a whole.

  22. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency alleles.

  23. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     E.g. in this alignment there is one intermediate frequency minor allele (the "G" at site 1) and three low frequency minor alleles (the "G" at site 2, and the "A"s at sites 4 and 6).
     Note that here we have taken 0.2 or lower to be low frequency and above 0.2 up to 0.5 to be intermediate frequency.
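
The counting on slide 23 can be reproduced with a short Python sketch. It uses the slide's own threshold (0.2 or lower counts as low frequency) and assumes each variable site carries only two alleles, which holds for this alignment; the function name `minor_allele_classes` and the parameter `low_max` are illustrative, not from the presentation.

```python
from collections import Counter

alignment = ["AGGGGAAAA", "GAGGGGAAA", "AAGAGGAAA", "GAGGGGAAA", "AAGGGGAAA"]


def minor_allele_classes(alignment, low_max=0.2):
    """Map each variable site (1-based) to (minor allele, frequency, class).

    Assumes biallelic sites; invariant sites are skipped.
    """
    n = len(alignment)
    classes = {}
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        if len(counts) < 2:
            continue  # invariant site, no minor allele
        allele, count = min(counts.items(), key=lambda item: item[1])
        freq = count / n
        label = "low" if freq <= low_max else "intermediate"
        classes[i + 1] = (allele, freq, label)
    return classes


classes = minor_allele_classes(alignment)
for site, (allele, freq, label) in classes.items():
    print(site, allele, freq, label)
```

For the slide's alignment this reports site 1 as intermediate and sites 2, 4 and 6 as low, in line with the counts given on the slide.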

  24. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     The predominance of sites with low frequency minor alleles relative to sites with intermediate frequency minor alleles in this alignment implies a predominance of negative/purifying selection, or the occurrence of a selective sweep that wiped out much of the diversity in this population of sequences.

  25. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     If, conversely, there was a predominance of sites with intermediate frequency minor alleles, it would imply a predominance of neutral genetic drift, balancing selection or positive selection during the evolution of these sequences.

  26. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     The most popular "summary statistic" selection detection methods are:
     • Fu and Li's F test
     • Tajima's D test
     • McDonald-Kreitman test
     All are implemented in the program DNASP.

  27. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     The most popular "summary statistic" selection detection methods are:
     • Fu and Li's F test
     • Tajima's D test
     • McDonald-Kreitman test
     These tests yield a summary statistic, essentially a number with either a negative or positive value.

  28. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     The most popular "summary statistic" selection detection methods (all implemented in the program DNASP) are:
     • Fu and Li's F test
     • Tajima's D test
     • McDonald-Kreitman test
     A significantly negative score implies purifying/negative selection and a significantly positive score implies positive/diversifying selection.

  29. Summary statistics of selection
     Alignment: AGGGGAAAA GAGGGGAAA AAGAGGAAA GAGGGGAAA AAGGGGAAA
     Various different methods test whole alignments for the relative ratios of low and intermediate frequency minor alleles.
     The most popular "summary statistic" selection detection methods are:
     • Fu and Li's F test
     • Tajima's D test
     • McDonald-Kreitman test
     Importantly, these tests can be very easily confounded (i.e. messed up) by non-random sampling, population subdivisions, and population growth.
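
To make "a number with either a negative or positive value" concrete, here is a hedged Python sketch of one of the named statistics, Tajima's D, applied to the five-sequence alignment from the slides. The presentation gives no formula; the one below is the standard published definition, which compares the mean number of pairwise differences with the number of segregating sites. The function name `tajimas_d` is illustrative, the sketch does not assess significance, and it is not a substitute for DNASP.

```python
from itertools import combinations
from math import sqrt

alignment = ["AGGGGAAAA", "GAGGGGAAA", "AAGAGGAAA", "GAGGGGAAA", "AAGGGGAAA"]


def tajimas_d(alignment):
    """Return (D, pi, S) for an alignment of equal-length sequences."""
    n = len(alignment)
    length = len(alignment[0])

    # S: number of segregating (variable) sites.
    seg_sites = sum(len({seq[i] for seq in alignment}) > 1 for i in range(length))
    if seg_sites == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(alignment, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants from the standard variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return d, pi, seg_sites


d, pi, seg_sites = tajimas_d(alignment)
print(f"S = {seg_sites}, pi = {pi:.2f}, D = {d:.2f}")
```

For this alignment the statistic comes out slightly negative, which agrees in sign with the excess of low frequency minor alleles noted on slide 24; with only five sequences and four variable sites it should not be read as a significant result.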

  30. dN/dS based selection analysis
     It is also possible to detect selection acting on protein coding regions.
     The basis of these tests is that each amino acid is encoded by more than one codon.

  31. dN/dS based selection analysis
     It is also possible to detect selection acting on protein coding regions.
     Some nucleotide substitutions within coding regions will be "silent" in that they will not result in an amino acid change.

  32. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG.
     6 different codons encode leucine.

  33. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG (leu).
     Mutations at position:
       1          2          3
       AUG met    CAG gln    CUA leu
       GUG val    CCG pro    CUC leu
       UUG leu    CGG arg    CUU leu
     4/9 single nucleotide substitutions within this codon will yield a different codon that still encodes leucine (i.e. the mutations will be silent or synonymous).

  34. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG (leu).
     Mutations at position:
       1          2          3
       AUG met    CAG gln    CUA leu
       GUG val    CCG pro    CUC leu
       UUG leu    CGG arg    CUU leu
     5/9 single nucleotide substitutions within this codon will result in an amino acid substitution (i.e. the mutations will be non-synonymous).
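
The 4/9 and 5/9 proportions for CUG can be checked by enumerating all nine single-nucleotide neighbours of a codon. The Python sketch below does this with the standard genetic code written in RNA letters. Two things in it are assumptions rather than anything stated on the slides: a change to a stop codon is counted as non-synonymous, and the names `CODON_TABLE` and `substitution_counts` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ("*" marks stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_counts(codon):
    """Return (synonymous, non_synonymous) counts over the 9 single-base changes."""
    synonymous = 0
    non_synonymous = 0
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                synonymous += 1
            else:
                non_synonymous += 1  # includes changes to a stop codon
    return synonymous, non_synonymous


for codon in ("CUG", "AUA"):
    syn, nonsyn = substitution_counts(codon)
    print(codon, CODON_TABLE[codon], f"{syn}/9 synonymous, {nonsyn}/9 non-synonymous")
```

Running it for other codons shows how much these proportions vary from codon to codon, which is why observed substitution counts need to be normalised before they are compared.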

  35. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG (leu).
     Mutations at position:
       1          2          3
       AUG met    CAG gln    CUA leu
       GUG val    CCG pro    CUC leu
       UUG leu    CGG arg    CUU leu
     For any gene there are ~2-3 times more possible non-synonymous substitutions than there are synonymous substitutions.

  36. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG (leu).
     Mutations at position:
       1          2          3
       AUG met    CAG gln    CUA leu
       GUG val    CCG pro    CUC leu
       UUG leu    CGG arg    CUU leu
     To compare rates of synonymous and non-synonymous substitution it is necessary to use normalised rates of each.

  37. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG (leu).
     Mutations at position:
       1          2          3
       AUG met    CAG gln    CUA leu
       GUG val    CCG pro    CUC leu
       UUG leu    CGG arg    CUU leu
     dS = normalised synonymous substitution rate = the observed number of synonymous substitutions divided by the expected number.

  38. dN/dS based selection analysis
     E.g. consider the leucine-encoding codon CUG (leu).
     Mutations at position:
       1          2          3
       AUG met    CAG gln    CUA leu
       GUG val    CCG pro    CUC leu
       UUG leu    CGG arg    CUU leu
     dN = normalised non-synonymous substitution rate = the observed number of non-synonymous substitutions divided by the expected number.

  39. dN/dS based selection analysis
     To work dN/dS out for a group of sequences we must first draw a tree.
     [Tree figure: tip sequences 1 AUA, 2 CUA, 3 CUU, 4 UUA, 5 UUG]

  40. dN/dS based selection analysis
     Then we identify the synonymous and non-synonymous mutations.
     [Tree figure: tip sequences 1 AUA, 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]

  41. dN/dS based selection analysis
     In this example we have 4 synonymous mutations and 1 non-synonymous mutation.
     [Tree figure: tip sequences 1 AUA, 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]

  42. dN/dS based selection analysis
     Given 5 mutations, under neutral conditions, we expect (4/9)*5 = 2.2 to be synonymous and (5/9)*5 = 2.8 to be non-synonymous.
     [Tree figure: tip sequences 1 AUA, 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]
     Note that the "4/9" and "5/9" numbers given above are specific to the CUG codon; other codons have their own proportions of synonymous and non-synonymous mutations. To simplify things here I've just pretended that they are all the same as for CUG.

  43. dN/dS based selection analysis
     Given 5 mutations, under neutral conditions, we expect (4/9)*5 = 2.2 to be synonymous and (5/9)*5 = 2.8 to be non-synonymous.
     [Tree figure: tip sequences 1 AUA, 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]
     E.g. for the isoleucine codon AUA, 2/9 of the mutations are synonymous and 7/9 are non-synonymous.

  44. dN/dS based selection analysis
     dS = 4/2.2 = 1.82
     dN = 1/2.8 = 0.36
     dN/dS = 0.36/1.82 = 0.198
     [Tree figure: tip sequences 1 AUA (a change from leu to ile), 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]

  45. dN/dS based selection analysis
     dN/dS = 0.36/1.82 = 0.198
     dN-dS = 0.36 - 1.82 = -1.46
     Implies strong purifying/negative selection.
     [Tree figure: tip sequences 1 AUA (a change from leu to ile), 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]

  46. dN/dS based selection analysis
     dN/dS = 0.36/1.82 = 0.198
     dN-dS = 0.36 - 1.82 = -1.46
     Implies strong purifying/negative selection.
     [Tree figure: tip sequences 1 AUA (a change from leu to ile), 2 CUA, 3 CUU, 4 UUA, 5 UUG; codons at internal nodes CUA, CUA, CUG, UUG]
     dN/dS or dN-dS can be calculated for whole genes or for individual sites within genes.
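
The arithmetic on slides 42 to 46 can be reproduced in a few lines of Python. The sketch below keeps the slides' simplification that every codon has the same 4/9 synonymous proportion as CUG; the function name `dn_ds` and its parameter names are illustrative. Because the slides round the expected counts to 2.2 and 2.8 before dividing, the unrounded results differ slightly from the slide values (about 1.80, 0.36, 0.20 and -1.44 rather than 1.82, 0.36, 0.198 and -1.46).

```python
def dn_ds(observed_syn, observed_nonsyn, syn_fraction):
    """Return (dN, dS): observed counts divided by their neutral expectations.

    syn_fraction is the proportion of possible single-nucleotide changes that
    are synonymous; here the slides' simplification applies one value to all codons.
    """
    total = observed_syn + observed_nonsyn
    expected_syn = syn_fraction * total
    expected_nonsyn = (1 - syn_fraction) * total
    ds = observed_syn / expected_syn
    dn = observed_nonsyn / expected_nonsyn
    return dn, ds


# 4 synonymous and 1 non-synonymous mutation, with the CUG proportion of 4/9.
dn, ds = dn_ds(observed_syn=4, observed_nonsyn=1, syn_fraction=4 / 9)
print(f"dS = {ds:.2f}, dN = {dn:.2f}, dN/dS = {dn / ds:.2f}, dN-dS = {dn - ds:.2f}")
```

A ratio well below 1 (equivalently, a negative difference) is what the slides read as purifying/negative selection; a full analysis would use codon-specific proportions rather than a single shared value.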

  47. Programs for analysing dN/dS
     • Mega
     • MrBayes
     • CodeML
     • Hyphy
     • DataMonkey

  48. Programs for analysing dN/dS
     • Mega: maximum likelihood estimation of dN/dS; can handle site-by-site estimates
     • CodeML
     • MrBayes
     • Hyphy
     • DataMonkey

  49. Programs for analysing dN/dS
     • Mega: maximum likelihood estimation of dN/dS; can handle site-by-site estimates
     • CodeML: maximum likelihood estimation of dN/dS; can handle site-by-site estimates
     • MrBayes
     • Hyphy
     • DataMonkey

  50. Programs for analysing dN/dS
     • Mega
     • CodeML
     • MrBayes
     • Hyphy
     • DataMonkey
     Bayesian/ML estimation of dN/dS; can handle site-by-site estimates.
