Chapter 5 Phylogeny inference based on distance methods

Chapter 5 Phylogeny inference based on distance methods In addition to maximum parsimony (MP) and likelihood methods, pairwise distance methods form the third large group of methods to infer evolutionary trees from sequence data.Evolutionary model: When all the pairwise distances have been computed for a set of sequences, a tree topology can then be inferred by a variety of methods.

5.2 The main distance-based tree-building methods are cluster analysis and minimum evolution.Ultrametricity is satisfied dAC≦max(dAB, dBC)unweighted-pair group method with arithmetic means (UPGMA)that are most similar to each other (that is for which the genetic distance is the smallest). When two OTUs are group, they are treated as a new single OTU. From the new group of OTUs, the pair for which the similarity is highest is again identified, and so on, until only two OTUs are left.

UPGMA are extremely sensitive to unequal rates in different lineages.

5.2.2 Minimum evolution and neighbor-joining Additive distances satisfy the following condition, known as the four-point metric condition(dAB+dCD) ≦ max (dAC + dBD, dAD+dBC)Only additive distances can be fitted precisely into an unrooted tree such that the genetic distance between a pair of OTUs equals the sum of the lengths of the branches connecting them, rather than an average,

Minimum evolution (ME) was first described by Kidd and Sgaramella-Zonta (1971); Rzhetsky and Nei(1992) described a method with only a minor difference. In ME, the tree that minimizes the lengths of the tree, which is the sum of the lengths of the branches, is regarded as the best estimate of the phylogeny: n is the number of taxa in the tree and viis the ith branch (remember that there are 2n-3 branches in an unrooted tree of n taxa).

Distance methods • Distances are rarely, exactly tree metrics, and hence one class of ‘goodness of fit’ methods seeks the metric tree that best accounts for the ‘observed’ distances. • The goodness of fit F between observed distance dij and tree distances pij for each pair of sequences i and j is given by. • In the example just given we were fitting an additive tree with (2n-3) branches to （）= n (n-1)/2 pairwise distances. n 2

Minimum evolution • Given an unrooted metric tree for n sequences there are (2n-3) branches, each with length ei. The sum of these branch lengths is the lengthL of the tree: The minimum evolution tree (ME) is the tree which minimizes L. • More commonly, the branch lengths of the minimum evolution tree are estimated using least-squares methods. The branch lengths are estimated in the same way as for goodness of fit measures; however, rather than compare the fit of the observed distances the least squares branch lengths are added together to give the length of the tree.

A drawback of the ME method is that, in principle, all different tree topologies have to be investigated to find the minimum tree. However, this is impossible in practice because of the explosive increase in the number of tree topologies as the number of OTUs increases; an exhaustive search can no longer be applied when more than ten sequences are being used. Page 19

A good heuristic method for estimating the ME tree is the neighbor-joining (NJ) method, developed by Saitou and Nei (1987) and modified by Studier and Keppler (1988). Because NJ is conceptually related to clustering, but without assuming a clock-like behavior, it combines computational speed with uniqueness of results.

However, NJ trees have proven to be the same or similar to the ME tree. Several methods have been proposed to find ME trees, staring from an NJ tree but evaluating alternative topologies close to the NJ tree by conducting local rearrangements.

3 - (21+24) / 3

(21-24) /2 * 3 3/2 U

5 19+17

SDW=dDE/2+(rD-rE)/2(N-2) SEW=dDE-SDW

However, NJ trees have proven to be the same or similar to the ME tree. Several methods have been proposed to find ME trees, staring from an NJ tree but evaluating alternative topologies close to the NJ tree by conducting local rearrangements.

Alternative versions of the NJ algorithm have been proposed, including BIONJ (Gascuel, 1997), weighted neighbor-joining (weighbor), and generalized neighbor-joining. BIONJ and weighbor both consider that long genetic distances present a higher variance than short ones when distances from a newly defined node to all other nodes are estimated.

The weighted neighbor-joining method of Bruno et al. (2000) uses a likelihood-based criterion rather than the ME criterion of Saitou and Nei (1987) to decide which pair of OTUs should be joined. The generalized neighbor-joining method of Pearson et al.(1999) keeps track of multiple, partial, and potentially good solutions during its execution, thus exploring a greater part of the tree space. As a result, the program is able to discover topologically distinct solutions that are close to the ME tree.

Consensus tree

Compare trees derived from different sequences, or from the same sequence using different methods.

Jackknifing

Chapter 5 Phylogeny inference based on distance methods

Chapter 5 Phylogeny inference based on distance methods

Presentation Transcript

Chapter 3: Methods of Inference

Chapter 5 Methods

Likelihood-Based Phylogeny

Chapter 3: Methods of Inference

Chapter 4 – Distance methods

Phylogenetics - Distance-Based Methods

Chap. 5, 6: Statistical inference methods

Distance-based methods

Distance methods

Distance-Based Genome Rearrangement Phylogeny

Chapter 7: Methods of Inference

Distance Methods

Inference Methods

Distance-based phylogeny estimation

Chapter 5 Methods

Chapter 5 Character–Based Methods of Phylogenetics

Distance Methods