240 likes | 357 Views
Doug Raiford Lesson 10. Phylogeny (part III). Review. Three methods Have looked at two Distance and parsimony Leaves maximum likelihood. Review distance. UPGMA: hierarchical clustering Start by finding closest two Combine closest pair. A. B. C. D. Review parsimony.
E N D
Doug Raiford Lesson 10 Phylogeny (part III) Phylogenetics Part III
Review • Three methods • Have looked at two • Distance and parsimony • Leaves maximum likelihood Phylogenetics Part III
Review distance • UPGMA: hierarchical clustering • Start by finding closest two • Combine closest pair A B C D Phylogenetics Part III
Review parsimony • Looks at each column of an MSA and attempts to find a tree that describes • Builds a consensus tree AGCT AACT AACT AACT A or a G 0 if A 0 if A A A or a G 0 0 if A 1 if A 0 A A A G Phylogenetics Part III
UPGMA and varying rates • Averages distances to combined pair • Doesn’t accurately reflect branch lengths • Can lead to inaccurate trees A B C D Phylogenetics Part III
Other distance methods • Given a tree can calculate path through all branches for any given pair • Know all pair-wise distances (from matrix) • Fitch and Margoliash came up with a way to determine each of the branch lengths (unrooted only) X D XY: A,C,E,G,I A C F E B H G I Y Phylogenetics Part III
Pseudo code • Find two closest taxa • Find average distance from each of these to all other organisms Simultaneous equations Combine closest, and continue A C c a D f d g b B e E Phylogenetics Part III
Maximum likelihood • Similar to parsimony in that performed on each column of MSA • Given the known mutation rates… • All possible trees considered • Examined one column at a time • Probability instead of count • Maximize the probability of a tree match A C c a D f d g b B e E Phylogenetics Part III
Maximum likelihood • Can determine substitution rates from base composition • What substitution rates would result in the base composition staying the same • Overall rate = N/L=rate for single nucleotide • Equilibrium: A↓= C A+G A+T A • N = (A A)fA+(A C)fA+…(T T)fT • Simplifying assumptions • A C = C A • Proportion of each nucleotide stays the same Once again, Simultaneous Equations Phylogenetics Part III
Another view • Better seen in a matrix form • All rates sum to N • Also, A↓= C A+G A+T A Phylogenetics Part III
Given rates… • Tree generated for a column • Each branch represents a substitution • Have a rate • Rate is similar to a probability • In this case rate is not mutations per unit time • Derived from number of mutations divided by number of nucleotides • Can be thought of as “probability of a mutation at any given nucleotide” Rate || Probability Phylogenetics Part III
Probability of a tree • The probability (likelihood) that a tree is the result of the given set of mutations • Product rule: • Multiply the rates of each of the branches A C c a D f d g b B e E Phylogenetics Part III
Combined probability • Generate probability for each column for each tree • Combined probability is the sum of these probabilities atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag Phylogenetics Part III
Adjusting distances • Might not a mutation occur and then revert? • Simple count would not catch • The higher the degree of mutation, the greater the probability • Distances would be slightly greater Must account for reversions Phylogenetics Part III
Jukes and Kantor • Increased distance based upon this probability • K: substitutions per site • p: fraction of nucleotides different between two sequences • .08→.0846 Phylogenetics Part III
Kimura two parameter • Jukes and Kantor assumed a single mutation rate • Transversions less likely than transitions • A and G are purines, T and C are pyrimidines • Mutations that stay within family are more likely • Crossing families called transversion • P: fraction of transitions • Q: fraction of transversions Phylogenetics Part III
When use which? • Page 247 Choose set of related sequences Is their strong sequence similarity? Max Parsimony Obtain MSA Yes Distance Medium similarity (clearly recognizable)? Yes Max Likelihood Phylogenetics Part III
Summary • Branch length related to but not equal to time • MSA’s central due to differing mutation rates • 3 approaches • Distance • Parsimony • Maximum likelihood Choose set of related sequences Is their strong sequence similarity? Max Parsimony Obtain MSA Yes Distance Medium similarity (clearly recognizable)? Yes Max Likelihood Phylogenetics Part III
For non-bioinformaticians • Learned a clustering technique • Hierarchical • Great for exploratory data analysis • Learned some tree growth properties • Exposed to some statistical analysis • Maximum likelihood • Relationship of rates to probabilities Phylogenetics Part III
DNA and proteins, • If do rna must take covariance into account Phylogenetics Part III
Gene duplication events • Paralogs can be exploited • Evolve in a coordinated way • Find an organism that is similar but split off before the gene duplication even took place Phylogenetics Part II
Can convert score to dist • D = -log(S) Phylogenetics Part II
Go back to distance • Did simple count divided by num • Can use alignment score (if normalized for length) • What if mutate and then mutate back • Jukes and Cantor • Kimura two param Phylogenetics Part III