UPGMA and FM are distance-based methods. UPGMA enforces the molecular clock assumption. FM (Fitch-Margoliash) relaxes that restriction, but still enforces the idea that clades should contain the entities that are closest to each other. Another numerical method, Neighbor Joining (NJ; see pp. 91-93 of K&R), involves a more complicated combinatorial procedure for locating neighbors. Both FM and NJ are extensions and modifications of UPGMA. Distance-based methods make sense in some contexts, but they reduce the relationships among all entities in a group to a collection of pairwise distances. We now look at an entirely different approach, called character-based methods. We consider only one example.
Method of Maximum Parsimony
The idea behind this method is to look at all possible trees that might relate a collection of sequences and to choose only those that require the fewest mutations to produce all of the sequences in the group. Let's start with a simple case. Suppose we have four sequences and, at a particular position (we will get to multiple positions later), two sequences have a T and two have a C. Consider two possible trees; the numbers indicate the sequence from which each nucleotide was taken.
[Figure: two candidate trees for the four sequences. Each leaf is labeled with its nucleotide (C or T) and its sequence number (1-4); *'s mark branches.]
Which tree is more likely to explain the evolution of the four sequences? Suppose the ancestral sequence contained a C; then the *'s indicate the branches where a mutation occurred. The tree with fewer *'s is the more parsimonious explanation. A similar result would occur if the ancestral sequence contained a T.
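The counting in the four-sequence example can be sketched in code. The sketch below uses Fitch's small-parsimony algorithm, which these slides do not name, to find the minimum number of mutations at one site for a given rooted binary tree. The nested-tuple tree encoding, the name fitch_score, and the two topologies tree_a and tree_b are assumptions made for illustration; they are not read off the slide figures.

```python
def fitch_score(tree, states):
    """Minimum number of mutations at one site for a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple (left, right) of subtrees.
    states: dict mapping each leaf name to its nucleotide at this site.
    """
    def visit(node):
        if isinstance(node, str):
            # Leaf: the only possible state is the observed base.
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can share a state, so no mutation is needed here.
            return common, left_cost + right_cost
        # The children disagree: one mutation on a branch below this vertex.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Sequences 1 and 2 carry a T, sequences 3 and 4 carry a C (as in the example).
site = {"1": "T", "2": "T", "3": "C", "4": "C"}

tree_a = (("4", "3"), ("1", "2"))  # hypothetical: groups C with C and T with T
tree_b = (("4", "1"), ("3", "2"))  # hypothetical: pairs each C with a T

print(fitch_score(tree_a, site))  # 1
print(fitch_score(tree_b, site))  # 2
```

The score does not depend on whether the ancestor is assumed to carry a C or a T, which matches the remark that a similar result occurs with either choice.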
Still sticking with two nucleotides, C and T, consider the five taxa above. Suppose figures (a) and (b) are from one position in an MSA and (c) and (d) are from another. Parsimony shows that (a) is more likely than (b). However, parsimony does not help us in determining whether (c) is more likely than (d). On the other hand, considering the two positions together, using the parsimony criterion, we would choose the relationships shown in (a) and (c).
We do not need to restrict ourselves to only two different nucleotides at a site. Consider five taxa and three nucleotides.
[Figure: two candidate trees for five taxa whose leaves carry the nucleotides A, A, T, T and G, with mutations marked by *'s. One tree has Score: 3, the other Score: 2.]
Actually, there are several trees for these five taxa with a parsimony score of 2.
Note: The number of *'s (denoting mutations) is called the parsimony score of a tree.
Note: The root in the constructions used as illustrations in these slides is merely for convenience. Parsimony, like FM, produces an unrooted tree, and an outgroup is generally used to find a root.
Consider the following four short sequences: GTA, GCA, ATC, ACC. Suppose the following tree is proposed.
[Figure: the proposed tree for the four sequences, with one * on a branch at the top.]
Start at the top of this tree. We observe an initial mutation in the middle nucleotide. We placed the * on the branch with the C, assuming that the ancestor had a T in that position. We could instead have assumed that the ancestor had a C in the middle position and put the * on the other branch. The choice does not affect the final parsimony score.
Continuing from the previous partial scoring:
[Figure: the same tree with three *'s marked.]
This step is trickier, because we have changes in two positions, the first and the third. Completing the marking of this tree:
[Figure: the same tree with four *'s marked.]
This tree has a parsimony score of 4. Can you find one with a lower score?
Generally, we deal with much longer sequences. However, not all sites will affect the number of mutations needed for a tree.
• If all sequences have the same nucleotide at a site, that site obviously does not contribute to the score.
• Less obvious is the case where all but one of the sequences have the same nucleotide at a site, say A, and the remaining sequence has some other nucleotide. In this case, regardless of the tree topology, putting an A at every interior vertex gives the minimum number of mutations, which is one, so the site cannot distinguish between trees.
• This leads to the following definition: a site is called informative if at least two different bases occur at least twice each among the sequences being considered.
• We generally choose between alternative trees that may be offered to explain an evolutionary event by computing the parsimony score only on the basis of informative sites.
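The definition of an informative site translates directly into a column test on an alignment. The sketch below is a minimal illustration, assuming the sequences are already aligned, have equal length, and contain no gap characters; the function name informative_sites and the five-column alignment named example are hypothetical and not from K&R.

```python
from collections import Counter


def informative_sites(sequences):
    """Return the 0-based indices of the informative columns of an alignment."""
    indices = []
    for i, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        # Informative: at least two different bases each occur at least twice.
        bases_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if bases_seen_twice >= 2:
            indices.append(i)
    return indices


# The four short sequences used earlier: every column is informative.
print(informative_sites(["GTA", "GCA", "ATC", "ACC"]))  # [0, 1, 2]

# Hypothetical alignment: column 0 has a single odd base and column 4 is
# constant, so neither is informative.
example = ["AGTCA", "AGTTA", "ACCCA", "TCCTA"]
print(informative_sites(example))  # [1, 2, 3]
```

Columns 0 and 4 of the hypothetical alignment correspond to the two uninformative cases in the bullets: a site where all but one sequence agree, and a site where all sequences agree.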
Maximum Parsimony is generally of little help in tree construction, in that:
• All possible trees must be considered (although some can be rejected out of hand for a variety of good reasons).
• Many times there are two or more trees that have the same parsimony score.
• It makes no use of the Jukes-Cantor or any other model of DNA mutation in its evaluation. Its only criterion is that simplicity is best: the best explanation of evolutionary history is the one with the fewest mutations.
• It is used in conjunction with distance methods when there are relatively few mutations obscuring previous mutations.
• Both parsimony methods and distance methods have their strong advocates, and a serious, sometimes acrimonious debate is still going on in the area of evolutionary tree reconstruction.
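To get a feel for why having to consider all possible trees is such a burden, the sketch below counts unrooted bifurcating tree topologies for n labeled taxa using the standard formula (2n-5)!! = 3 x 5 x ... x (2n-5). That formula is not stated in these slides and is brought in here as an assumption; the function name count_unrooted_trees is hypothetical.

```python
def count_unrooted_trees(n):
    """Number of unrooted bifurcating topologies for n >= 3 labeled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    total = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        total *= k
    return total


for n in (4, 5, 10):
    print(n, count_unrooted_trees(n))
# 4 3
# 5 15
# 10 2027025
```

Four taxa give only three candidate topologies, but ten taxa already give more than two million, which is why exhaustive scoring is practical only for small groups.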
Read:
• Krane and Raymer, Chapter 5, pp. 98-111
Homework:
• Question on slide #6 (Can you take the sequences in the tree having a parsimony score of 4 and find a tree with a lower score?)
• K&R pp. 115-116: 5.1, 5.2, 5.3