Introduction to Bioinformatics Substitution Patterns
Substitution Patterns • Substitutions at the level of DNA are accumulated • Analyses of number and nature of substitutions are important to • Molecular evolution study • Recognition of functionally important genes (due to natural selection)
Substitution Patterns within Genes • Mutation and mutation rate • Functional constraint • Synonymous vs. nonsynonymous substitutions • Indels and pseudogenes • Substitutions vs. mutations
Mutation and Mutation Rate • Mutations • Deleterious • Advantageous • Neutral (no effect on fitness of organism) • Well-accepted facts • Advantageous changes are in a substantial minority • Some changes in nucleotide sequences have greater consequences for an organism than do others
Mutation and Mutation Rate • Mutation rates • r = K / (2T), where r: substitution rate, K: number of substitutions per site, T: divergence time • The factor of 2 reflects that both lineages have accumulated substitutions since they diverged • Number of substitutions per site • Number of substitutions two sequences have undergone since they last shared a common ancestor • Can be estimated by counting the differences between the two sequences
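The rate formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `p_distance` and `substitution_rate` are hypothetical, the sequences are assumed to be already aligned and gap-free, and the raw fraction of differing sites is used as a stand-in for K (a simplification that later slides refine).

```python
def p_distance(seq1, seq2):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def substitution_rate(k, divergence_time):
    """r = K / (2T): substitutions per site per unit of time.

    The factor of 2 accounts for both lineages accumulating
    substitutions independently since the common ancestor.
    """
    if divergence_time <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2 * divergence_time)


if __name__ == "__main__":
    # Toy example (made-up sequences and divergence time).
    k = p_distance("ACGTACGTAC", "ACGTACGTAT")
    print(k, substitution_rate(k, 50e6))
```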
Functional Constraint • Functional constraint • Especially important portions of genes are under functional constraint • Changes are subject to natural selection • Slow to accumulate changes • Synonymous substitutions • Less subject to correction by natural selection • Accumulate relatively quickly
Functional Constraint Example • Which gene regions have a high substitution rate?
Synonymous vs Nonsynonymous • Discussed in sequence alignments • Not all positions in a codon are equally likely to result in nonsynonymous substitutions • Categories of a nucleotide in a codon • Nondegenerate sites • Codon positions where mutations always result in amino acid substitutions (example?) • Twofold degenerate sites • Two of the four possible nucleotides specify the same amino acid; the other two specify a different one (example?) • Fourfold degenerate sites • All four possible nucleotides specify the same amino acid (example?)
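The site categories above can be checked mechanically against the standard genetic code. The sketch below is an illustration added here, not taken from the slides: the helper name `degeneracy` is hypothetical, and it assumes the standard nuclear genetic code with DNA letters (T rather than U). It counts how many of the four nucleotides at a codon position yield the same amino acid (1 = nondegenerate, 2 = twofold, 4 = fourfold).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, position):
    """Number of nucleotides at `position` (0, 1 or 2) that keep the amino acid.

    Returns 1 for a nondegenerate site, 2 for twofold, 4 for fourfold
    (3 occurs only for the third position of isoleucine codons).
    """
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == original:
            count += 1
    return count


if __name__ == "__main__":
    for codon in ("GGT", "GAT", "ATG"):
        print(codon, [degeneracy(codon, i) for i in range(3)])
```

For instance, the third position of glycine codon GGT is fourfold degenerate, the third position of aspartate codon GAT is twofold degenerate, and every position of ATG (methionine) is nondegenerate.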
Synonymous vs Nonsynonymous • Which sites have the highest substitution rate?
Indels and Pseudogenes • Indels • 10 times less likely to occur than substitutions • Why? • Pseudogenes • Nonfunctional and transcriptionally inactive genes • Mammalian pseudogenes tend to accumulate substitutions at a fast rate • Almost 4 substitutions per site per 100 million years • How does this compare with the different regions of a functional gene?
Substitutions vs. Mutations • Mutations are changes in nucleotide sequences that occur due to mistakes in DNA replication or repair processes • Substitutions are mutations that have passed through the filter of selection on at least some level • Synonymous substitution rates are generally considered to be fairly reflective of the actual mutation rate (why?)
Estimating Substitution Numbers • Is counting differences enough? • Jukes-Cantor model • Kimura’s two-parameter model • Models with even more parameters
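Jukes-Cantor Model • $P_{C}(t) = (1 - 3\alpha)P_{C}(t-1) + \alpha\,(1 - P_{C}(t-1))$ • $\alpha$: rate of change from one nucleotide to each of the other three per unit time • $P_{C}(t)$: probability that a site that started as C is C at time t
Jukes-Cantor Model • $P_{C}(t) = (1 - 3\alpha)P_{C}(t-1) + \alpha\,(1 - P_{C}(t-1))$ • $P_{C}(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$ • $K = -\frac{3}{4}\ln\left[1 - \frac{4}{3}p\right]$ • p: observed fraction of sites that differ between the two sequences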
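The Jukes-Cantor correction can be tried out with a short Python sketch. It is an illustration under stated assumptions rather than anything from the original slides: the function names are hypothetical, and the missing rate symbol in the recurrence is taken to be the per-step rate of change to each other nucleotide (called `alpha` here), which is the standard form of the model. The first function applies the distance formula; the other two compare the step-by-step recurrence with its closed-form approximation.

```python
import math


def jukes_cantor_distance(p):
    """K = -(3/4) ln(1 - (4/3) p), with p the observed fraction of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def prob_unchanged_recurrence(alpha, steps):
    """Iterate P(t) = (1 - 3*alpha) P(t-1) + alpha (1 - P(t-1)), starting from P(0) = 1."""
    prob = 1.0
    for _ in range(steps):
        prob = (1 - 3 * alpha) * prob + alpha * (1 - prob)
    return prob


def prob_unchanged_closed_form(alpha, t):
    """Continuous-time approximation P(t) = 1/4 + (3/4) exp(-4 alpha t)."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


if __name__ == "__main__":
    # The corrected distance exceeds the raw fraction of differences,
    # and the gap widens as p grows (multiple hits at the same site).
    for p in (0.01, 0.10, 0.30, 0.50):
        print(p, round(jukes_cantor_distance(p), 4))
    # Recurrence and closed form agree closely when alpha is small.
    print(prob_unchanged_recurrence(1e-4, 1000), prob_unchanged_closed_form(1e-4, 1000))
```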
Kimura’s Two-Parameter Model • Takes transitions and transversions into consideration
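Kimura’s Two-Parameter Model • Transitions occur at least three times as frequently as transversions • $\alpha$: transition rate, $\beta$: transversion rate • $P_{CC}(t) = (1 - \alpha - 2\beta)P_{CC}(t-1) + \beta P_{GC}(t-1) + \beta P_{AC}(t-1) + \alpha P_{TC}(t-1)$ • $P_{CC}(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha + \beta)t}$ • $K = \frac{1}{2}\ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4}\ln\left[\frac{1}{1 - 2Q}\right]$ • P: fraction of sites that differ by a transition • Q: fraction of sites that differ by a transversion • There are other models with even more parameters.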
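The two-parameter distance can be sketched the same way. This example is an added illustration with hypothetical function names; it assumes aligned, gap-free DNA sequences containing only A, C, G and T, and it classifies each difference as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion before applying the formula for K.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq1, seq2):
    """Return (P, Q): fractions of sites differing by a transition / transversion."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def kimura_distance(p, q):
    """K = 1/2 ln[1/(1 - 2P - Q)] + 1/4 ln[1/(1 - 2Q)]."""
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences are too divergent for this correction")
    return 0.5 * math.log(1 / (1 - 2 * p - q)) + 0.25 * math.log(1 / (1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences: two transitions (A>G, C>T) and one transversion (G>T).
    s1 = "ACGTACGTACGTACGTACGT"
    s2 = "GTTTACGTACGTACGTACGT"
    p, q = transition_transversion_fractions(s1, s2)
    print(p, q, kimura_distance(p, q))
```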
Rate Variations between Genes • Evolutionary rates also vary between genes in a genome! • Examples • Table 3.3 • Histones vs. apolipoproteins • While amino acid substitutions within many genes are generally deleterious, natural selection actually favors variability within populations for some genes • Example: human leukocyte antigen • While host populations are under pressure to maintain diverse immune systems, viruses are under pressure to evolve rapidly • Example: the nucleotide substitution rate within influenza NS genes is 1.9 × 10^-3 per site per year, roughly a million times greater than that of mammalian genes.
Molecular Clock • (Hypothesis) For a given DNA sequence, mutations accumulate at a constant rate in all evolutionary lineages (suggested by E. Zuckerkandl and L. Pauling, 1965) • The molecular clock may run at different rates in different proteins • But the number of differences between two homologous proteins appeared to be very well correlated with the amount of time since speciation caused them to diverge independently • Stimulated intense interest in molecular evolution studies • Controversial
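Relative Rate Test • Estimate overall substitution rate in different lineages without specific knowledge of divergence time • To determine the relative rate for species 1 and 2, find a less related species 3 as an outgroup • $d_{13} = d_{A1} + d_{A3}$ • $d_{23} = d_{A2} + d_{A3}$ • $d_{12} = d_{A1} + d_{A2}$ • $d_{A1} = (d_{12} + d_{13} - d_{23}) / 2$ • $d_{A2} = (d_{12} + d_{23} - d_{13}) / 2$ • [Figure: tree in which node A is the common ancestor of species 1 and 2; branches of length $d_{A1}$, $d_{A2}$ and $d_{A3}$ lead from A to species 1, species 2 and outgroup species 3]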
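The branch-length formulas of the relative rate test translate directly into code. The sketch below is an added illustration: the function name `relative_rate_branches` is hypothetical, the pairwise distances are made-up numbers, and the expression for the outgroup branch is not on the slide but follows from solving the same three equations.

```python
def relative_rate_branches(d12, d13, d23):
    """Solve d12 = dA1 + dA2, d13 = dA1 + dA3, d23 = dA2 + dA3 for the branches.

    Species 3 is the outgroup; A is the common ancestor of species 1 and 2.
    Returns (dA1, dA2, dA3).
    """
    d_a1 = (d12 + d13 - d23) / 2
    d_a2 = (d12 + d23 - d13) / 2
    # Derived from the same system of equations (not stated on the slide).
    d_a3 = (d13 + d23 - d12) / 2
    return d_a1, d_a2, d_a3


if __name__ == "__main__":
    # Made-up pairwise distances (substitutions per site).
    d_a1, d_a2, d_a3 = relative_rate_branches(d12=0.3, d13=0.5, d23=0.6)
    print(d_a1, d_a2, d_a3)
    # If dA1 and dA2 differ markedly, lineages 1 and 2 have not
    # accumulated substitutions at the same rate since node A.
    print("ratio of rates (lineage 2 / lineage 1):", d_a2 / d_a1)
```

Because both lineages have had the same amount of time since node A, the ratio of the two branch lengths is also the ratio of their substitution rates, which is why no divergence time is needed.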
Molecular Clocks ?? • Substitution rates in rats and mice are largely the same • Molecular evolution in humans is about half as rapid as in Old World monkeys • Rodents accumulate substitutions at about twice the rate of primates • To date the most recent common ancestor using a molecular clock, it is necessary to demonstrate that the species being examined have a uniform clock • Causes of rate variation among lineages • Differences in generation time • Average repair efficiency • Metabolic rate • Adaptation to new environments
Evolution in Organelles • Mammalian mitochondrial DNA (mtDNA) • ~16,000 bp • Maternally inherited; used to trace human origins • Chloroplast DNA (cpDNA) • 120,000 to 220,000 bp • Single circular chromosome • Protein- and RNA-encoding genes • Substitution rate in mtDNA is almost 10-fold higher than in nuclear DNA • Comparisons of mtDNA are often used to study relationships between closely related populations of organisms • Less useful for species that diverged more than 10 million years ago • cpDNA has a slower substitution rate than mtDNA