Evolutionary rates Reference: Dan’s book chapter 4
Evolutionary rates - history • The first to suggest using DNA and proteins to investigate evolutionary history. • (They discussed molecular evolution before the genetic code was established).
Linus Pauling (1901-1994) • The only person ever to receive two unshared Nobel Prizes—for Chemistry (1954) and for Peace (1962). • His introductory textbook General Chemistry, revised three times since its first printing in 1947 and translated into 13 languages, has been used by generations of undergraduates.
Linus Pauling (1901-1994) • Also wrote popular science books, e.g., “How to Live Longer and Feel Better”, and “Vitamin C and the Common Cold”. • Published over 1,000 articles and books. • Used to protest against nuclear testing.
Linus Pauling (1901-1994) • He received a Ph.D. in chemistry and mathematical physics from California Institute of Technology (Caltech) in 1925 (age 24).
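Evolutionary rates Rate is distance divided by time: r = d / (2T), where d is the distance between the two sequences (number of substitutions per site) and T is the time in years since they diverged. The time must be doubled because the two sequences evolved independently after the split.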
Evolutionary rates This formula is not accurate for closely related taxa, in which polymorphism must be taken into account (Takahata and Satta 1997).
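The rate calculation described above can be sketched in a few lines of Python. The function name and the example numbers (a distance of 0.16 substitutions per site and a divergence time of 80 million years) are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def substitution_rate(distance: float, divergence_time_years: float) -> float:
    """Return r = d / (2T) in substitutions per site per year.

    distance: number of substitutions per site between the two sequences.
    divergence_time_years: time since the two lineages split, in years.
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    # The time is doubled because both lineages accumulated changes
    # independently after the split.
    return distance / (2.0 * divergence_time_years)


# Hypothetical example values (not from the slides).
example_rate = substitution_rate(0.16, 80e6)
print(f"{example_rate:.2e} substitutions/site/year")
```

As the slide notes, this simple estimate should not be trusted for closely related taxa, where polymorphism has to be taken into account.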
Mean Rate of Nucleotide Substitutions in Mammalian Genomes ~10^-9 substitutions/site/year. Evolution is a very slow process at the molecular level (“Nothing happens…”)
Sequence alignments An alignment is needed for phylogeny and for molecular evolution. We will assume that the alignment is given. How to construct an alignment is outside the scope of this course.
Synonymous vs. nonsynonymous substitutions For most proteins, the rate of synonymous (silent) substitutions is much higher than the rate of nonsynonymous (amino-acid-altering) substitutions. UUU -> UUC (both encode phenylalanine): synonymous. UUU -> CUU (phenylalanine to leucine): nonsynonymous.
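The two examples above can be checked with a small sketch that looks both codons up in the standard genetic code. The table construction and the function name classify_substitution are illustrative assumptions, not part of the slides; RNA codons are converted to DNA letters (U to T) before the lookup.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as 'synonymous', 'nonsynonymous' or 'identical'."""
    before = codon_before.upper().replace("U", "T")
    after = codon_after.upper().replace("U", "T")
    if before == after:
        return "identical"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```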
Empirical findings: Important proteins evolve more slowly than unimportant ones.
Insulin In 1953, Frederick Sanger determined the amino-acid sequence of insulin. This was the FIRST protein whose amino-acid sequence was determined. It demonstrated that insulin is composed only of L-amino acids.
Insulin Insulin was found to be composed of two chains, A (21 AA) and B (30 AA), linked together by S-S bonds.
Insulin • How is the two-chain protein synthesized? • Donald Steiner (University of Chicago) gave the answer. • He studied an islet-cell adenoma of the pancreas, a rare human tumor that produces large amounts of insulin.
Adenoma • An adenoma is a benign (not malignant) tumor. In everyday English, benign = harmless. • Benign tumor: a tumor that does not recur locally and does not spread to other parts of the body. • An adenoma is of glandular origin (i.e., it arises from a gland). • Adenomas can grow in many organs, including the colon, adrenal, pituitary, and thyroid.
Insulin • He sliced the pancreatic tumor, incubated the slices with tritiated leucine, and then analyzed them. • He found a new protein that was later proven to be the biosynthetic precursor of insulin: proinsulin.
Insulin • Proinsulin has 30 residues that are absent from insulin.
Insulin • There is an even earlier precursor of proinsulin, called preproinsulin. It contains an additional 19 AA at the N-terminus. This hydrophobic 19-AA stretch directs preproinsulin to the ER. • Preproinsulin -> Proinsulin (ER membrane) • From the ER it moves on to the Golgi and then to secretory granules. • Proinsulin -> Insulin (Granules)
Alignment of preproinsulin

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** :  * *.*: *:..* :.*: ****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         **************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******
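As a rough illustration of how conservation varies along this alignment, the sketch below counts identical columns in each of the four line-wrapped blocks shown above. The blocks are only the chunks in which the alignment is printed, not functional regions, and the choice to count a column containing a gap as non-identical is an assumption of this sketch.

```python
# The Xenopus/Bos alignment, split into the four blocks printed on the slide.
ALIGNMENT_BLOCKS = [
    ("MALWMQCLP-LVLVLLFSTPNTEALANQHL", "MALWTRLRPLLALLALWPPPPARAFVNQHL"),
    ("CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ", "CGSHLVEALYLVCGERGFFYTPKARREVEG"),
    ("AQVNGPQDNELDG-MQFQPQEYQKMKRGIV", "PQVG---ALELAGGPGAGGLEGPPQKRGIV"),
    ("EQCCHSTCSLFQLENYCN", "EQCCASVCSLYQLENYCN"),
]


def count_identical_columns(seq_a: str, seq_b: str) -> int:
    """Count aligned columns where both sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A column with a gap ('-') is treated as non-identical (assumption).
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")


identity_per_block = [
    (count_identical_columns(xenopus, bos), len(xenopus))
    for xenopus, bos in ALIGNMENT_BLOCKS
]
for identical, length in identity_per_block:
    print(f"{identical}/{length} identical columns")
```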
Empirical findings: Functional regions evolve more slowly than nonfunctional regions.
Clotting - The end reaction: fibrinogen -> fibrin (catalyzed by thrombin)
Synonymous vs. nonsynonymous substitutions Histone H4 between human and wheat: excess of synonymous substitutions
Mean nonsynonymous rate: 0.74 ± 0.67 (× 10^-9 substitutions per site per year). Mean synonymous rate: 3.51 ± 1.01 (× 10^-9 substitutions per site per year).
The coefficient of variation is an attribute of a distribution: its standard deviation divided by its mean. Coefficient of variation of the nonsynonymous rate: 91%. Coefficient of variation of the synonymous rate: 29%.
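The two percentages can be reproduced with a short sketch. It reads the second number of each pair on the mean-rates slide (0.67 and 1.01) as the standard deviation; that reading is an assumption, although it does give back the quoted 91% and 29%.

```python
def coefficient_of_variation(mean: float, standard_deviation: float) -> float:
    """Return the standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return standard_deviation / mean


# Rates in units of 10^-9 substitutions per site per year.
cv_nonsynonymous = coefficient_of_variation(0.74, 0.67)
cv_synonymous = coefficient_of_variation(3.51, 1.01)

print(f"nonsynonymous: {cv_nonsynonymous:.0%}")
print(f"synonymous:    {cv_synonymous:.0%}")
```

The much larger value for the nonsynonymous rate reflects how strongly that rate varies from protein to protein.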
Transition vs. transversion rates Ratio by degeneracy class: 4-fold degenerate sites: 1.5; 2-fold degenerate sites: 4.4; nondegenerate (0-fold) sites: 1.1.
Computing synonymous and non-synonymous rates Silent and non-silent…
Ka/Ks • Our goal is to compare two (or later, more) sequences and to compare the rate of neutral evolution (determined by the synonymous rate) with the non-synonymous rate. • The lower the ratio of non-synonymous substitutions to synonymous ones, the higher the intensity of purifying selection.
Computing synonymous and non-synonymous rates p-distance of synonymous subs. = 3/6; p-distance of nonsynonymous subs. = 3/6. Problem: the p-distance does not correct for multiple substitutions at the same site. Solution: apply the Jukes-Cantor (JC) correction to the p-distance.
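A minimal sketch of the JC correction, using the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) and applying it to the p-distance of 3/6 from the slide. The function name is illustrative.

```python
import math


def jukes_cantor_distance(p_distance: float) -> float:
    """Correct a p-distance for multiple substitutions (Jukes-Cantor model).

    d = -(3/4) * ln(1 - (4/3) * p), defined only for 0 <= p < 0.75.
    """
    if not 0.0 <= p_distance < 0.75:
        raise ValueError("p-distance must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_distance / 3.0)


p = 3 / 6  # p-distance from the slide
corrected = jukes_cantor_distance(p)
print(f"p = {p:.3f}, JC-corrected distance = {corrected:.3f}")
```

The corrected distance is larger than the raw p-distance, because some sites have changed more than once and the raw count misses those extra substitutions.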
Computing synonymous and non-synonymous rates Assume a protein without selection (evolving neutrally). The codon AAA (Lys) can change by a single nucleotide substitution into nine codons: CAA (Gln), GAA (Glu), TAA (Stop), ACA (Thr), AGA (Arg), ATA (Ile), AAC (Asn), AAG (Lys), AAT (Asn). Only one of the nine (AAG) is synonymous, so the random chance of a synonymous substitution is much smaller than the chance of a nonsynonymous one.
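Computing synonymous and non-synonymous rates Assume a protein without selection (evolving neutrally). The codon GCA (Ala) can change by a single nucleotide substitution into nine codons: ACA (Thr), CCA (Pro), TCA (Ser), GAA (Glu), GGA (Gly), GTA (Val), GCC (Ala), GCG (Ala), GCT (Ala). Here three of the nine are synonymous, so the chance of a synonymous substitution also differs between codons.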
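The two codon examples above (AAA and GCA) can be reproduced by enumerating every codon that is one nucleotide substitution away and checking it against the standard genetic code. The helper names are illustrative, and the sketch assumes all nine single-nucleotide changes are equally likely.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_step_neighbors(codon: str) -> list:
    """Return the nine codons that differ from `codon` at exactly one position."""
    neighbors = []
    for position in range(3):
        for base in BASES:
            if base != codon[position]:
                neighbors.append(codon[:position] + base + codon[position + 1:])
    return neighbors


def count_synonymous_neighbors(codon: str) -> int:
    """Count one-step neighbors that encode the same amino acid as `codon`."""
    amino_acid = CODON_TABLE[codon]
    return sum(1 for n in single_step_neighbors(codon) if CODON_TABLE[n] == amino_acid)


for codon in ("AAA", "GCA"):
    print(codon, count_synonymous_neighbors(codon), "of 9 neighbors are synonymous")
```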
Computing synonymous and non-synonymous rates So when one observes 6 times more nonsynonymous substitutions than synonymous ones, does this indicate that the protein is under purifying selection? We must normalize for the potential for silent vs. non-silent mutations in the codons in question.
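The normalization argument can be illustrated with made-up numbers. In the sketch below, the counts of observed differences (6 nonsynonymous, 1 synonymous) and the numbers of sites (240 nonsynonymous, 40 synonymous) are hypothetical values picked so that the arithmetic is easy to follow; they do not come from any real gene.

```python
def proportion_of_differences(observed_differences: float, number_of_sites: float) -> float:
    """Return observed differences per available site."""
    if number_of_sites <= 0:
        raise ValueError("number of sites must be positive")
    return observed_differences / number_of_sites


# Hypothetical counts: 6 times more nonsynonymous than synonymous differences...
nonsynonymous_differences, synonymous_differences = 6, 1
# ...but also 6 times more nonsynonymous than synonymous sites.
nonsynonymous_sites, synonymous_sites = 240, 40

p_n = proportion_of_differences(nonsynonymous_differences, nonsynonymous_sites)
p_s = proportion_of_differences(synonymous_differences, synonymous_sites)
print(f"pN = {p_n:.3f}, pS = {p_s:.3f}, pN/pS = {p_n / p_s:.2f}")
```

With these numbers the per-site proportions are equal, so the raw 6:1 count by itself says nothing about selection. The proportions used here are uncorrected; a multiple-substitution correction would normally be applied before taking the ratio.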
Nei & Gojobori (1986) method Masatoshi Nei, Takashi Gojobori
Counting synonymous sites Consider a particular position in a codon (j=1,2,3). Let fj be the fraction of synonymous changes at this site.
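A minimal sketch of this definition: for each codon position j, take the three possible nucleotide changes and record the fraction that leave the amino acid unchanged. The sketch assumes the usual Nei-Gojobori convention that the number of synonymous sites of a codon is s = f1 + f2 + f3 and the number of nonsynonymous sites is n = 3 - s. It also counts changes to a stop codon as nonsynonymous, which is a simplifying assumption; implementations differ on how they treat stop codons.

```python
from fractions import Fraction

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_fractions(codon: str) -> list:
    """Return [f1, f2, f3]: the fraction of synonymous changes at each position."""
    amino_acid = CODON_TABLE[codon]
    fractions = []
    for position in range(3):
        synonymous_changes = sum(
            1
            for base in BASES
            if base != codon[position]
            and CODON_TABLE[codon[:position] + base + codon[position + 1:]] == amino_acid
        )
        # Three alternative nucleotides are possible at every position.
        fractions.append(Fraction(synonymous_changes, 3))
    return fractions


def synonymous_sites(codon: str) -> Fraction:
    """Return s = f1 + f2 + f3 (assumed convention; n = 3 - s)."""
    return sum(synonymous_fractions(codon), Fraction(0))


for codon in ("AAA", "GCA", "TTA"):
    s = synonymous_sites(codon)
    print(codon, "f =", [str(f) for f in synonymous_fractions(codon)], "s =", s, "n =", 3 - s)
```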