
Introduction to Bioinformatics

Substitution Patterns. Substitutions accumulate at the level of DNA. Analyses of the number and nature of substitutions are important for studying molecular evolution and for recognizing functionally important genes (which are shaped by natural selection).

enya


Presentation Transcript


  1. Introduction to Bioinformatics Substitution Patterns

  2. Substitution Patterns • Substitutions accumulate at the level of DNA • Analyses of the number and nature of substitutions are important for • Studying molecular evolution • Recognizing functionally important genes (which are shaped by natural selection)

  3. Substitution Patterns within Genes • Mutation and mutation rate • Functional constraint • Synonymous vs. nonsynonymous substitutions • Indels and pseudogenes • Substitutions vs. mutations

  4. Mutation and Mutation Rate • Mutations • Deleterious • Advantageous • Neutral (no effect on fitness of organism) • Well-accepted facts • Advantageous changes are in a substantial minority • Some changes in nucleotide sequences have greater consequences for an organism than do others

  5. Mutation and Mutation Rate • Mutation rates • $r = K / (2T)$, where $r$ is the substitution rate, $K$ is the number of substitutions per site, and $T$ is the divergence time • Number of substitutions per site • The number of substitutions two sequences have undergone since they last shared a common ancestor • Can be estimated by counting the differences between the two sequences
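
The rate formula on this slide can be sketched in a few lines of Python. This is only an illustration: the two short sequences and the divergence time of 10 million years are made-up values, and the function names are hypothetical. The raw proportion of differing sites is used as a first estimate of K.

```python
def count_differences_per_site(seq_a, seq_b):
    """Proportion of aligned, ungapped sites that differ (a raw estimate of K)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns that contain a gap in either sequence.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def substitution_rate(k, t_years):
    """r = K / (2T); the factor 2 counts both lineages since their divergence."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t_years)


if __name__ == "__main__":
    # Hypothetical aligned sequences and divergence time (assumptions).
    k = count_differences_per_site("ACGTACGTAC", "ACGTTCGTAA")
    print(k, substitution_rate(k, 1e7))
```

The division by 2T rather than T reflects that both sequences have been accumulating changes independently since the common ancestor.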

  6. Functional Constraint • Functional constraint • Especially important portions of genes are under functional constraint • Changes are subject to natural selection • Slow to accumulate changes • Synonymous substitutions • Less subject to correction by natural selection • Accumulate relatively quickly

  7. Functional Constraint Example • Which gene regions have a high substitution rate?

  8. Functional Constraint Example

  9. Synonymous vs Nonsynonymous • Discussed in sequence alignments • Not all positions in a codon are equally likely to result in nonsynonymous substitutions • Categories of a nucleotide position in a codon • Nondegenerate sites • Codon positions where mutations always result in amino acid substitutions (example?) • Twofold degenerate sites • Two of the four possible nucleotides specify the same amino acid; the other two do not (example?) • Fourfold degenerate sites • All four possible nucleotides specify the same amino acid (example?)
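
One way to answer the "example?" prompts is to classify codon positions programmatically. The sketch below assumes the standard genetic code and uses hypothetical helper names; it counts how many of the three possible single-base changes at a position leave the amino acid unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_changes(codon, position):
    """Count single-base changes at `position` (0, 1 or 2) that keep the amino acid."""
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            count += 1
    return count


def classify_site(codon, position):
    """Map the number of synonymous changes to the slide's categories."""
    labels = {0: "nondegenerate",
              1: "twofold degenerate",
              3: "fourfold degenerate"}
    # Two synonymous changes (e.g. the third position of isoleucine codons)
    # do not fit the three categories listed on the slide.
    return labels.get(synonymous_changes(codon, position), "other (threefold)")


if __name__ == "__main__":
    for codon, pos in [("GAT", 1), ("GAT", 2), ("GGT", 2)]:
        print(codon, pos + 1, classify_site(codon, pos))
```

With this table, the second position of GAT (Asp) is nondegenerate, its third position is twofold degenerate (GAT and GAC both encode Asp), and the third position of GGT (Gly) is fourfold degenerate.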

  10. Synonymous vs Nonsynonymous • Which sites have the highest substitution rate?

  11. Indels and Pseudogenes • Indels • 10 times less likely to occur than substitutions • Why? • Pseudogenes • Nonfunctional and transcriptionally inactive genes • Mammalian pseudogenes tend to accumulate substitutions at a fast rate • Almost 4 substitutions per site per 100 million years • How does this compare to the functional regions of genes?

  12. Substitutions vs. Mutations • Mutations are changes in nucleotide sequences that occur due to mistakes in DNA replication or repair processes • Substitutions are mutations that have passed through the filter of selection on at least some level • Synonymous substitution rates are generally considered to be fairly reflective of the actual mutation rate (why?)

  13. Substitutions vs. Mutations

  14. Estimating Substitution Numbers • Is counting differences enough? • Jukes-Cantor model • Kimura’s two-parameter model • Models with even more parameters

  15. Jukes-Cantor Model • $P_C(t) = (1 - 3\alpha)\,P_C(t-1) + \alpha\,(1 - P_C(t-1))$

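  16. Jukes-Cantor Model • $P_C(t) = (1 - 3\alpha)\,P_C(t-1) + \alpha\,(1 - P_C(t-1))$ • $P_C(t) = \frac{1}{4} + \frac{3}{4}\,e^{-4\alpha t}$ • $K = -\frac{3}{4}\ln\left[1 - \frac{4}{3}\,p\right]$, where $p$ is the fraction of sites that differ between the two sequences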
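
The Jukes-Cantor formulas can be checked numerically. The sketch below is a minimal illustration with hypothetical function names and an arbitrary rate (alpha = 1e-4 per time step, an assumption): it iterates the recurrence, compares it with the closed form, and converts an observed fraction of differing sites p into the corrected distance K.

```python
import math


def p_same_recurrence(alpha, steps):
    """Iterate P(t) = (1 - 3a) P(t-1) + a (1 - P(t-1)), starting from P(0) = 1."""
    p = 1.0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def p_same_closed_form(alpha, t):
    """Continuous-time solution: P(t) = 1/4 + (3/4) exp(-4 a t)."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def jukes_cantor_distance(p):
    """K = -(3/4) ln(1 - (4/3) p), defined only for 0 <= p < 3/4."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    alpha, t = 1e-4, 1000  # assumed values, for illustration only
    print(p_same_recurrence(alpha, t), p_same_closed_form(alpha, t))
    print(jukes_cantor_distance(0.2))
```

For small alpha the recurrence and the exponential agree closely. The corrected distance K is always at least as large as the raw fraction p, because it accounts for multiple substitutions at the same site.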

  17. Kimura’s Two-Parameter Model • Takes transitions and transversions into consideration

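  18. Kimura’s Two-Parameter Model • Transitions occur at least three times as frequently as transversions • $P_{CC}(t) = (1 - \alpha - 2\beta)\,P_{CC}(t-1) + \beta\,P_{GC}(t-1) + \beta\,P_{AC}(t-1) + \alpha\,P_{TC}(t-1)$, where $\alpha$ is the transition rate and $\beta$ is the transversion rate • $P_{CC}(t) = \frac{1}{4} + \frac{1}{4}\,e^{-4\beta t} + \frac{1}{2}\,e^{-2(\alpha + \beta)t}$ • $K = \frac{1}{2}\ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4}\ln\left[\frac{1}{1 - 2Q}\right]$ • $P$ : fraction of transitions • $Q$ : fraction of transversions • There are other models with even more parameters.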
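
The two-parameter distance can be sketched the same way. The code below is an illustration under stated assumptions: the function names and the two short sequences are hypothetical, and sites containing anything other than A, C, G or T are simply skipped. It counts the fraction of transitions (P) and transversions (Q), then applies the formula for K.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq_a, seq_b):
    """Return (P, Q): fractions of compared sites showing a transition / transversion."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def kimura_distance(p, q):
    """K = 1/2 ln[1/(1 - 2P - Q)] + 1/4 ln[1/(1 - 2Q)]."""
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for this correction")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


if __name__ == "__main__":
    p, q = transition_transversion_fractions("ACGTACGTAC", "GCGTACGTAA")
    print(p, q, kimura_distance(p, q))
```

As a sanity check, when transversions are exactly twice as frequent as transitions (Q = 2P, which is what equal rates for all changes would give), the formula reduces to the Jukes-Cantor distance with p = P + Q.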

  19. Rate Variations between Genes • Evolutionary rates also vary between genes within a genome • Examples • Table 3.3 • Histones vs. apolipoproteins • While amino acid substitutions within many genes are generally deleterious, natural selection actually favors variability within populations for some genes • Example: human leukocyte antigen • While host populations are under pressure to maintain diverse immune systems, viruses are under pressure to evolve rapidly • Example: the nucleotide substitution rate within influenza NS genes is $1.9 \times 10^{-3}$ per site per year, roughly a million times greater than that of mammalian genes.

  20. Molecular Clock • (Hypothesis) For a given DNA sequence, mutations accumulate at a constant rate in all evolutionary lineages (suggested by E. Zuckerkandl and L. Pauling, 1965) • The molecular clock may run at different rates in different proteins • But the number of differences between two homologous proteins appeared to be very well correlated with the amount of time since speciation caused them to diverge independently • Stimulated intense interest in molecular evolution studies • Controversial

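  21. Relative Rate Test • Estimates the overall substitution rate in different lineages without specific knowledge of divergence time • To determine the relative rate for species 1 and 2, find a less related species 3 as an outgroup • $d_{13} = d_{A1} + d_{A3}$ • $d_{23} = d_{A2} + d_{A3}$ • $d_{12} = d_{A1} + d_{A2}$ • $d_{A1} = (d_{12} + d_{13} - d_{23}) / 2$ • $d_{A2} = (d_{12} + d_{23} - d_{13}) / 2$ • (Tree diagram: species 1 and 2 join at their common ancestor A with branch lengths $d_{A1}$ and $d_{A2}$; outgroup species 3 connects to A with branch length $d_{A3}$.)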
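
The relative rate test amounts to solving the three pairwise-distance equations for the branch lengths. A minimal sketch follows, with a hypothetical function name and made-up pairwise distances. The expression for the outgroup branch dA3 is not on the slide; it follows from the same three equations.

```python
def relative_rate_branches(d12, d13, d23):
    """Solve d12 = dA1 + dA2, d13 = dA1 + dA3, d23 = dA2 + dA3 for the branches."""
    d_a1 = (d12 + d13 - d23) / 2
    d_a2 = (d12 + d23 - d13) / 2
    d_a3 = (d13 + d23 - d12) / 2  # derived from the same equations
    return d_a1, d_a2, d_a3


if __name__ == "__main__":
    # Hypothetical pairwise distances (substitutions per site).
    d_a1, d_a2, d_a3 = relative_rate_branches(d12=0.30, d13=0.50, d23=0.40)
    print(d_a1, d_a2, d_a3)
    # If the clock were uniform, dA1 and dA2 would be roughly equal;
    # here lineage 1 has accumulated about twice as many changes as lineage 2.
```

Because species 1 and 2 have had the same amount of time since ancestor A, any difference between dA1 and dA2 reflects a difference in rate, not in elapsed time.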

  22. Molecular Clocks? • Substitution rates in rats and mice: largely the same • Molecular evolution in humans: half as rapid as in Old World monkeys • Accumulated substitutions in rodents: twice the rate of primates • To date a recent common ancestor using a molecular clock, it is necessary to demonstrate that the species being examined have a uniform clock • Causes of rate variations between lineages • Differences in generation time • Average repair efficiency • Metabolic rate • Adaptation to new environments

  23. Evolution in Organelles • Mammalian mitochondrial DNA (mtDNA) • ~16,000 bp • Maternally inherited; used to trace human origins • Chloroplast DNA (cpDNA) • 120,000 to 220,000 bp • Single circular chromosome • Protein- and RNA-encoding genes • The substitution rate in mtDNA is almost 10-fold higher than in nuclear DNA • Comparisons of mtDNA are often used to study relationships between closely related populations of organisms • Less useful for species that diverged more than 10 million years ago • cpDNA has a slower substitution rate than mtDNA
