Phylogenetic Prediction Lecture II by Clarke S. Arnold, March 19, 2002
Outline • Comments about Trees • UPGMA (Unweighted Pair Group Method with Arithmetic Mean) analysis • Other uses of phylogenetic trees • Conversion of Alignment Scores to distances • Maximum Likelihood Approach • Comments on Neighbor Joining Algorithm • Conclusion
Comments on Trees • Trees give insights into underlying data • Identical trees can appear different depending upon the method of display • Information may be lost when creating the tree. The tree is not the underlying data.
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) Algorithm • Create distance matrix • Build internal representation of the tree: (1) find the two closest members, (2) combine them into one node, (3) calculate new distances for the combined node, (4) repeat until only one node is left (the root node) • Draw tree, DrawTree(Tree): if Tree is a single node, draw the node and exit; else find the short and tall subtrees, recursively call DrawTree(Tall_Tree), recursively call DrawTree(Short_Tree), call DrawConnection(Tall_Tree, Short_Tree), and exit • Calculate loss of information (Cophenetic Correlation Coefficient): a number between -1 and 1, where 1 is perfect correlation, 0 is no correlation, and -1 is perfect inverse correlation
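The tree-building loop above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `upgma`, the nested-tuple tree format, and the size-weighted averaging of distances are assumptions chosen to match the usual textbook description of UPGMA.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` with UPGMA.

    `dist[i][j]` is the distance between names[i] and names[j].
    Returns a nested tuple (left_subtree, right_subtree, height);
    leaves are plain name strings. (Hypothetical output format.)
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by id pair.
    d = {frozenset((i, j)): float(dist[i][j])
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    while len(clusters) > 1:
        # Find the two closest members.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2.0
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # New distances for the combined node: average over all leaves,
        # which is a size-weighted mean of the two old distances.
        for c in clusters:
            d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, height), size_a + size_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))  # the root node
    return tree
```

For example, `upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])` first joins A and B at height 1.0 and then joins that pair with C at height 3.0.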
Distance Matrix of 16S rRNA gene • Global alignments were done between 6 species of bacteria • Sequences were 500-base-pair sequences from MIDI Labs. • The number of mismatches in each alignment was used as the entry in the distance matrix. • sequences.txt • http://genome.cs.mtu.edu/align/align.html • alignments.txt
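Counting mismatches as distances can be sketched as follows. The function names are hypothetical, and two simplifications are assumed: the sequences are already aligned to equal length, and a gap character opposite a base counts as a mismatch. The lecture used separate pairwise global alignments, so the real counts come from one alignment per pair.

```python
def mismatch_count(aligned_a, aligned_b):
    """Number of positions at which two aligned sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


def distance_matrix(aligned):
    """Symmetric matrix of mismatch counts for a list of aligned sequences."""
    n = len(aligned)
    return [[mismatch_count(aligned[i], aligned[j]) for j in range(n)]
            for i in range(n)]
```

With the toy input `["ACGT", "ACGA", "TCGA"]` the first and third sequences differ at two positions, so their distance is 2.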
UPGMA Analysis • UPGMA spreadsheet: UPGMAfinal.xls
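The cophenetic correlation coefficient used to measure loss of information is the ordinary Pearson correlation between the original pairwise distances and the distances implied by the tree. A minimal sketch, assuming both sets of distances are supplied as flat lists in the same pair order (the function name is hypothetical):

```python
from math import sqrt


def cophenetic_correlation(original, cophenetic):
    """Pearson correlation between original and tree-implied distances.

    `original[k]` and `cophenetic[k]` must refer to the same pair of
    sequences. The tree-implied (cophenetic) distance of a pair is the
    level at which the two sequences are first joined in the tree.
    """
    n = len(original)
    mean_o = sum(original) / n
    mean_c = sum(cophenetic) / n
    cov = sum((o - mean_o) * (c - mean_c)
              for o, c in zip(original, cophenetic))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_c = sum((c - mean_c) ** 2 for c in cophenetic)
    return cov / sqrt(var_o * var_c)
```

A value near 1 indicates that the tree preserves the ordering and spacing of the original distances well; lower values indicate that more information was lost when the tree was built.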
Other uses of phylogenetic trees • Verification of Taxonomy • Organisms were classified into various groups before gene sequencing existed. • Is there a relationship between genetic differences and existing taxonomy? • bacpseu.txt • http://clustalw.genome.ad.jp/ • CLUSTALW.doc • bacpseustaph.txt • Taxonomy.doc • Identification of Unknowns • The unknown is placed in the tree along with known samples • The relationship between the known and unknown samples allows for identification • unknown_id.txt • Unknown_Results.doc • Non-genetic analysis (Fatty Acids) • FattyAcid_PseuBaci.rtf
Conversion of Alignment Scores to Distances • Alignment scores are large for similar sequences. • Distance methods require that the distances between similar sequences are smaller than the distances between less similar sequences. • Large alignment scores need to be mapped to small distances and vice versa.
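One common way to perform this mapping is to normalize the score and take a negative logarithm, in the style of Feng and Doolittle. The sketch below is an illustration under assumptions rather than the lecture's formula: `s_max` stands for the score of a perfect match (for example the average self-alignment score) and `s_rand` for the score expected from unrelated sequences, and both names are hypothetical.

```python
from math import log


def score_to_distance(s_obs, s_max, s_rand=0.0):
    """Map an alignment score to a distance: high score -> small distance.

    s_obs  : observed alignment score for the pair
    s_max  : score of a perfect match (assumed known)
    s_rand : score expected for unrelated sequences (assumed known)
    """
    s_eff = (s_obs - s_rand) / (s_max - s_rand)  # normalized to (0, 1]
    if s_eff <= 0:
        raise ValueError("score is no better than random; distance undefined")
    return -log(s_eff)
```

A pair scoring `s_max` gets distance 0, and the distance grows without bound as the score approaches the random level.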
Maximum Likelihood Analysis • Similar to Maximum Parsimony, except that the different nucleotide substitutions are not assumed to be equally probable. • All possible unrooted trees are evaluated. (Same for Parsimony) • Each column of the alignment is processed. (Same for Parsimony) • The substitution A -> T will have a different probability than the substitution G -> C • Start with a table that specifies the probability of one base being substituted for another base. • See probabilities of nucleotide substitution. (Table 6.5 pg 275) • The probability that an unrooted tree predicts each column of the alignment is calculated. • The column likelihoods are multiplied together for each tree (equivalently, their logarithms are summed). • The unrooted tree with the highest likelihood is chosen.
Maximum Likelihood Example • Four sequences are compared (w, x, y and z) • All unrooted trees are shown • In this example we will examine the first unrooted tree.
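For four sequences there are only three unrooted trees, which is why all of them can be shown. The count for n sequences is the standard product 3 x 5 x ... x (2n - 5); a short sketch (the function name is hypothetical):

```python
def num_unrooted_trees(n):
    """Number of distinct unrooted bifurcating trees for n >= 3 sequences."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count
```

The count grows quickly (3 trees for 4 sequences, 15 for 5, 105 for 6), which is what makes evaluating every tree expensive for larger data sets.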
Maximum Likelihood Example Continued • L(Tree x) = L0 * L1 * L2 * L3 * L4 * L5 * L6 • L0: base probability of the nucleotide at node 0 (0.25) • L1: probability of the nucleotide changing from the value at 0 to the value at 1. • L2: probability of the nucleotide changing from the value at 0 to the value at 2. • L3: probability of the nucleotide changing from the value at 1 to the value at 3 (T). • L4, L5, L6: probability of the nucleotide changing to the value at the corresponding leaf.
Maximum Likelihood Example Continued • There are 64 combinations to evaluate: (number of bases) ^ (number of internal nodes), or 4^3. • We will show the evaluation of the combination TTG (the bases at nodes 0, 1 and 2) against the first unrooted tree for column TTAG • Determine values for L0, … L6. Values are determined by looking up probabilities in the substitution probability table. • L2 is the probability of T -> G • L5 is the probability of G -> A • L3 is the probability of T -> T • Determine the combined probability L0 * L1 * L2 * … * L6
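The TTG evaluation can be sketched as below. Two things are assumptions, not values from the lecture: the substitution probabilities (0.7 for no change and 0.1 for each change, a placeholder for Table 6.5) and the tree shape (node 0 joins nodes 1 and 2, node 1 leads to leaves 3 and 4, node 2 leads to leaves 5 and 6).

```python
def sub_prob(a, b):
    """Placeholder substitution probability (NOT the values of Table 6.5)."""
    return 0.7 if a == b else 0.1


def assignment_likelihood(internal, leaves, base_prob=0.25):
    """L0 * L1 * ... * L6 for one choice of bases at nodes 0, 1 and 2.

    internal : bases at nodes 0, 1, 2, e.g. "TTG"
    leaves   : bases at leaves 3, 4, 5, 6, e.g. the column "TTAG"
    """
    n0, n1, n2 = internal
    l3, l4, l5, l6 = leaves
    return (base_prob               # L0: base probability at node 0
            * sub_prob(n0, n1)      # L1: node 0 -> node 1
            * sub_prob(n0, n2)      # L2: node 0 -> node 2
            * sub_prob(n1, l3)      # L3: node 1 -> leaf 3
            * sub_prob(n1, l4)      # L4: node 1 -> leaf 4
            * sub_prob(n2, l5)      # L5: node 2 -> leaf 5
            * sub_prob(n2, l6))     # L6: node 2 -> leaf 6


likelihood_ttg = assignment_likelihood("TTG", "TTAG")
```

With these placeholder numbers the product has four "no change" factors and two "change" factors, so it equals 0.25 * 0.7^4 * 0.1^2.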
Maximum Likelihood Example Continued • Determine the probability for the combination TGG • Determine the probability for the other 62 combinations. • Sum all the combinations together. L(Tree) = L(Tree1) + L(Tree2) + … + L(Tree64) • Move to the next column and repeat the same procedure. • Once all columns are complete, multiply the column likelihoods together (equivalently, sum their logarithms). This is the likelihood of the first unrooted tree. • Continue this process for the other unrooted trees. • Pick the unrooted tree with the highest likelihood. This is the most likely unrooted tree.
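The whole procedure for one tree can be sketched as follows. As before, the substitution probabilities are placeholders rather than the values of Table 6.5, and the tree shape (node 0 joins nodes 1 and 2; leaves 3 and 4 hang off node 1, leaves 5 and 6 off node 2) is an assumption. The per-column values are combined across columns as a sum of logarithms, which is the usual way of multiplying likelihoods without numerical underflow.

```python
from itertools import product
from math import log

BASES = "ACGT"


def sub_prob(a, b):
    """Placeholder substitution probability (NOT the values of Table 6.5)."""
    return 0.7 if a == b else 0.1


def column_likelihood(leaves, base_prob=0.25):
    """Sum L0 * ... * L6 over all 4^3 = 64 choices of bases at nodes 0, 1, 2."""
    l3, l4, l5, l6 = leaves
    total = 0.0
    for n0, n1, n2 in product(BASES, repeat=3):
        total += (base_prob
                  * sub_prob(n0, n1) * sub_prob(n0, n2)
                  * sub_prob(n1, l3) * sub_prob(n1, l4)
                  * sub_prob(n2, l5) * sub_prob(n2, l6))
    return total


def tree_log_likelihood(w, x, y, z):
    """Log-likelihood of this tree for four aligned sequences w, x, y, z."""
    return sum(log(column_likelihood(col)) for col in zip(w, x, y, z))
```

Repeating `tree_log_likelihood` with the leaves arranged as in each of the other unrooted trees and keeping the largest value gives the most likely tree under the assumed probabilities.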
Comments on Neighbor Joining compared with Fitch • Neighbor Joining is not a nearest-neighbor method (the objective is to create the smallest tree). • Neighbor Joining is almost identical to Fitch except for the evaluation function. • Start with a star tree • Evaluate all possibilities by combining any two nodes and running Fitch. • Evaluate the size of each tree by summing the lengths of its branches • Pick the smallest and continue. • Evaluation in Fitch is done by calculating the predicted distance between each pair of sequences for each tree to find the tree that best fits the original data. • Question: Is summing the branches faster than calculating the predicted distances?
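In common implementations of neighbor joining, the pair whose joining gives the smallest total branch length is found without building each candidate tree, by minimizing the criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix. This formulation is not stated in the slides and is offered only as a sketch of the "pick the smallest" step; the function name is hypothetical.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q): the pair of nodes to join next and its Q value.

    `dist` is a symmetric distance matrix with zeros on the diagonal.
    The pair with the smallest Q is the one whose joining yields the
    smallest sum of branch lengths.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q
```

Note that the chosen pair need not be the closest pair in the matrix, which is one way to see that the method is not a nearest-neighbor method.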
Conclusion • Phylogenetic Prediction can be used for more than Evolutionary Distance • Verification of Taxonomy • Identification of unknowns • Techniques work for genetic and non-genetic data (Fatty Acids). • Use multiple methods for verification • Pick at least two different types of methods from Parsimony, Distance and Likelihood. • If the analyses are in agreement, there is a higher level of confidence that the result is correct.