Computational Genomics 5a: Distance Based Trees Reconstruction (cont.). Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT).
Phylogenetic Trees - Methods
• There are several methods with which we construct trees and estimate how well a tree describes the data (and thus the evolutionary process):
• Distance based methods
• Character based methods: parsimony, likelihood
• Whole genome/proteome methods
Additive Distances
We say that a distance metric D on L objects is additive if there is an unrooted binary tree T on L leaves, with positive edge weights, that realizes the distance D. Namely, for all i,j, $D(i,j) = D_T(i,j)$, where $D_T(i,j)$ is the length of the path between leaves i and j in T.
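Characterizing Additive Distances
An additive distance is fully characterized by the four point condition: any 4 points can be renamed i, j, k, l such that
$D(i,j) + D(k,l) \le D(i,k) + D(j,l) = D(i,l) + D(j,k)$.
In other words, of the three pairwise sums, the two largest are equal.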
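The four point condition can be tested directly by brute force over all quadruples. The following is a minimal sketch, not part of the original slides; the function name `is_additive`, the tolerance `tol`, and the example matrix are illustrative assumptions. It checks only the four point condition and assumes the input is already a symmetric matrix with a zero diagonal.

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Check the four point condition on a symmetric distance matrix D.

    For every quadruple of leaves, the two largest of the three
    pairwise sums must be equal (up to the tolerance tol).
    """
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Illustrative example (an assumption, not from the slides):
# tree ((A,B),(C,D)) with leaf edges 1, 2, 3, 4 and internal edge 5.
tree_like = [[0, 3, 9, 10],
             [3, 0, 10, 11],
             [9, 10, 0, 7],
             [10, 11, 7, 0]]

# Same matrix with D(A,C) changed from 9 to 8, which breaks the condition.
perturbed = [[0, 3, 8, 10],
             [3, 0, 10, 11],
             [8, 10, 0, 7],
             [10, 11, 7, 0]]
```

The check costs O(L^4) time, which is acceptable for a sanity test on small inputs but not as part of a fast reconstruction pipeline.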
Trees from Additive Distances: Algorithm
• Verify that the distance matrix constitutes an additive metric.
• Choose a pair of objects, which results in the first path in the tree.
• Choose a third object and establish the linear equations to let the object branch off the path.
• Choose a pair of leaves in the tree constructed so far and compute the point at which a newly chosen object is inserted.
• 1. If the new path branches off an existing branch in the tree: do the insertion step once more, replacing one of the two original leaves by another leaf along the branching path.
• 2. Once the new path branches off an edge in the tree: this insertion is finished.
[Figure: the first path, between leaves A and C, labelled 7.]
[Figure: the third leaf B joins the path between A and C at a new internal point X; edge labels 1, 6, 1.]
Trees from Additive Distances: Algorithm
The linear equations for the third object, with X the point where it branches off the path:
d(A,B) = d(A,X) + d(X,B)
d(A,C) = d(A,X) + d(X,C)
d(B,C) = d(B,X) + d(X,C)
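The three linear equations for the branching point X have a closed-form solution, obtained by adding two of the equations and subtracting the third. A minimal sketch follows; the function name `branch_point` and the numeric inputs are illustrative assumptions, not taken from the slides.

```python
def branch_point(d_ab, d_ac, d_bc):
    """Solve d(A,B)=d(A,X)+d(X,B), d(A,C)=d(A,X)+d(X,C), d(B,C)=d(B,X)+d(X,C).

    Returns the three edge lengths (d(A,X), d(B,X), d(C,X)).
    """
    d_ax = (d_ab + d_ac - d_bc) / 2
    d_bx = (d_ab + d_bc - d_ac) / 2
    d_cx = (d_ac + d_bc - d_ab) / 2
    return d_ax, d_bx, d_cx

# Assumed example distances: d(A,B)=2, d(A,C)=7, d(B,C)=7.
lengths = branch_point(2, 7, 7)
```

For an additive metric all three results are positive, so a zero or negative value signals that the input triple does not come from a tree with positive edge weights.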
[Figures: the tree on leaves A, B, C, D after inserting D; an attempted insertion of E that is rejected (marked "NO!"), so the insertion step is repeated with another leaf; the resulting tree on leaves A, B, C, D, E.]
Trees from Additive Distances: Algorithm
Is the first step, verifying that the distance matrix constitutes an additive metric, necessary?
Reconstructing a Tree from an Additive Distance
By the algorithm, given a distance matrix constituting an additive metric, the topology of the corresponding additive tree is unique.
Q: Given an additive metric on n leaves, what is the run time of the algorithm?
A: The number of phases is n and the work per phase is O(n), so the total is $O(n^2)$.
Approximating Additive Metrics
In practice, the distance matrix between molecular sequences will not be additive. In such a case we want to find a tree T whose distance matrix is "close" to the given one. The methods for exact tree reconstruction provide an inventory of heuristics for tree construction based on approximating additive metrics. These heuristics give exact results when operating on additive metrics, but the quality of their solutions is unclear when non-additive metrics are handled.
Neighbor Finding
How can we find, from distances alone, a pair of sisters (neighboring leaves)? The closest nodes are not necessarily neighboring leaves. Next, we show a way to find neighbors from distances.
[Figure: tree on leaves A, B, C, D.]
Neighbour Joining Algorithm: Outline
• Identify a pair of leaves u,v as neighbors.
• Combine u,v into a new node, w.
• Update the distance matrix: calculate w's distance from any other node x of the tree using $D(w,x) = \frac{1}{2}\left(D(u,x) + D(v,x) - D(u,v)\right)$. Notice that all 3 quantities on the right-hand side are known.
• When only 3 nodes are left, compute the 3 distances and finish.
Neighbour Joining Algorithm
• Identify a pair of neighbors i,j among n leaves.
• Combine i,j into a new node u.
• Update the distance matrix.
• When only 3 nodes are left, finish.
Let $r_i$ be the sum of distances from i to every other node. The measure between i and j we use in the algorithm is, in the standard neighbor joining form, $X_D(i,j) = (n-2)\,D(i,j) - r_i - r_j$.
[Figure: example tree on leaves i, j, k, l, m, n with edge lengths 0.1 and 0.4.]
Neighbor Finding: Saitou & Nei method
Theorem (Saitou & Nei): Assume D is additive, and all tree edge weights are positive. If $X_D(i,j)$ is minimal (among all pairs of leaves), then i and j are sister taxa in the tree.
The proof is rather involved, and will be skipped (no tears pls).
[Figure: subtrees T1 and T2 with leaves i, j, k, l, m.]
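To see the theorem at work on a small case, the sketch below compares the closest pair with the pair that minimizes the neighbor joining measure. It assumes the standard criterion $(n-2)\,D(i,j) - r_i - r_j$; the function names and the example matrix are illustrative and not from the slides. The matrix comes from an assumed tree ((A,B),(C,D)) with leaf edges 1, 4, 1, 4 and internal edge 1, so A and C are the closest leaves even though they are not sisters.

```python
from itertools import combinations

def closest_pair(D):
    """Pair of indices with the smallest raw distance."""
    return min(combinations(range(len(D)), 2), key=lambda p: D[p[0]][p[1]])

def nj_pair(D):
    """Pair minimizing the assumed criterion (n-2)*D(i,j) - r_i - r_j."""
    n = len(D)
    r = [sum(row) for row in D]  # r_i: sum of distances from i (diagonal is 0)
    return min(combinations(range(n), 2),
               key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])

# Leaves in order A, B, C, D (assumed example).
D_example = [[0, 5, 3, 6],
             [5, 0, 6, 9],
             [3, 6, 0, 5],
             [6, 9, 5, 0]]

closest = closest_pair(D_example)   # the pair A, C: not sisters
chosen = nj_pair(D_example)         # a true sister pair
```

Here the criterion ties between the two sister pairs (A,B) and (C,D); `min` returns the first one it meets, and either choice is consistent with the theorem.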
Complexity of Neighbor Joining Algorithm
Naive implementation:
Initialization: $\Theta(L^2)$ to compute the $X_D(i,j)$'s.
Each iteration:
• $O(L)$ to update $\{X_D(i,k) : i \in L\}$ for the new node k.
• $O(L^2)$ to find the minimal $X_D(i,j)$.
Total of $O(L^3)$.
• This can be improved using better data structures (e.g. a heap).
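The naive cubic-time procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the slides' own code: it uses the standard criterion $(n-2)\,D(i,j) - r_i - r_j$, the update $D(u,x) = \frac{1}{2}(D(i,x) + D(j,x) - D(i,j))$, and the usual neighbor joining branch length formula, which the slides do not state. The function name and the internal node names `U0`, `U1`, ... are hypothetical, and at least 3 leaves are assumed.

```python
def neighbor_joining(D, names):
    """Naive O(L^3) neighbor joining; returns a list of (node, node, length) edges."""
    dist = {a: {b: float(D[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes if b != a) for a in nodes}
        # O(n^2) scan for the pair minimizing the assumed criterion.
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        i, j = min(pairs, key=lambda p: (n - 2) * dist[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"U{next_id}"  # hypothetical name for the new internal node
        next_id += 1
        # Standard NJ branch lengths (an assumption; not given in the slides).
        d_iu = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, d_iu), (j, u, dist[i][j] - d_iu)]
        # O(n) update of the distances from the new node.
        dist[u] = {u: 0.0}
        for x in nodes:
            if x not in (i, j):
                dist[u][x] = dist[x][u] = 0.5 * (dist[i][x] + dist[j][x] - dist[i][j])
        nodes = [x for x in nodes if x not in (i, j)] + [u]
    # Three nodes left: compute the 3 distances to the central node and finish.
    a, b, c = nodes
    center = f"U{next_id}"
    edges.append((a, center, 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])))
    edges.append((b, center, 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])))
    edges.append((c, center, 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])))
    return edges

# Assumed additive example: tree ((A,B),(C,D)), leaf edges 1, 2, 3, 4, internal edge 5.
D_add = [[0, 3, 9, 10],
         [3, 0, 10, 11],
         [9, 10, 0, 7],
         [10, 11, 7, 0]]
nj_edges = neighbor_joining(D_add, ["A", "B", "C", "D"])
edge_len = {frozenset((a, b)): w for a, b, w in nj_edges}
```

Recomputing every $r_i$ and rescanning all pairs in each iteration keeps the code short; it is the rescan that makes the total cubic.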
Reconstructing Trees from Additive Matrices
• Q: Do we have to test additivity before running NJ?
• A: By Saitou-Nei, if the matrix is additive, NJ will construct the correct tree. The algorithm does not need to be told whether the matrix is additive, so no prior test is required.
Running NJ: Example on 4 Leaves
Remark: The $X_D$ values imply that the distances are not additive (why?).
Updated Distance Matrix, Choosing A,B as Neighbors
Notice that now we have only one choice: the neighbors are U and D.
Final Distance Matrix
Remark: The resulting tree is unrooted.
[Figures: distance matrices and trees on leaves A, B, C, D with new internal nodes U and V.]
Reconstructing Trees from non Additive Matrices
Q: What if the distance matrix is not additive?
A: We could still run NJ!
Q: But can anything be said about the resulting tree?
A: Not really. The resulting tree topology could even vary according to the way ties are resolved along the way.
Remark: This indeed was the case with the last example.
Almost Additive Matrix
A distance matrix d' is "almost additive" if there exists an additive matrix D such that $\max_{i,j} |d'(i,j) - D(i,j)| < \mu/2$, where $\mu$ is the length of the shortest edge of the tree T realizing D.
Atteson: If d' is almost additive with respect to a tree T, then the output of NJ is a tree T' with the same topology as T.
Unrooted Tree - NJ
[Figure: unrooted tree produced by NJ, with a "Root" label.]
Output - NJ Tree Branch length is proportional to distance