Introduction to Bioinformatics Phylogenetics Part II Distance-Based Methods
Distance Matrix
• (Evolutionary) Distance
  • Many possible measures
  • Fraction of sites that differ between two sequences
  • Number of changes needed to convert one sequence into another (count mismatches, substitution models, …)
• Distance Matrix
  • Matrix of pairwise distances between all sequences
  • Used to generate the tree
  • Varies with construction method and distance metric
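The simplest measure above, the fraction of differing sites, can be sketched in a few lines of Python. This is only an illustration: the three aligned sequences and the names `p_distance` and `distance_matrix` are hypothetical, and the sequences are assumed to be pre-aligned and of equal length.

```python
def p_distance(a, b):
    """Fraction of aligned sites at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Pairwise p-distances as a nested dict: matrix[i][j]."""
    return {i: {j: p_distance(si, sj) for j, sj in seqs.items()}
            for i, si in seqs.items()}


# Hypothetical aligned sequences, for illustration only.
seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTACTA"}
matrix = distance_matrix(seqs)
```

A substitution model (for example a Jukes-Cantor correction) would replace `p_distance` while leaving `distance_matrix` unchanged.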
Distance in Phylogenetic Tree
• Distances are ultrametric if
  • Same rate of change on all branches in tree (rare in practice)
  • All leaves equidistant from root
  • Also known as a “molecular clock”
• Distance matrix must satisfy the following 3-point condition
  • For any three leaves i, j, k with distances dij, dik, djk
  • Two of the three distances are equal and ≥ the third
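The 3-point condition can be checked mechanically. The sketch below is one possible way to do it; the helper names `symmetric` and `is_ultrametric` and the tolerance `tol` are assumptions. The four-sequence distances are the ones used in the transformed-distance example in these slides; the three-leaf "clock-like" matrix is made up for contrast.

```python
from itertools import combinations


def symmetric(pairs):
    """Expand {(x, y): d} into a symmetric nested dict d[x][y] == d[y][x]."""
    d = {}
    for (x, y), v in pairs.items():
        d.setdefault(x, {})[y] = v
        d.setdefault(y, {})[x] = v
    return d


def is_ultrametric(d, leaves, tol=1e-9):
    """3-point condition: in every triple the two largest distances are equal."""
    for i, j, k in combinations(leaves, 3):
        _, mid, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(largest - mid) > tol:
            return False
    return True


# Distances from the transformed-distance example (A, B, C, outgroup D).
slides = symmetric({("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
                    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10})
# Hypothetical clock-like distances, for contrast.
clock = symmetric({("x", "y"): 2, ("x", "z"): 6, ("y", "z"): 6})

slides_ok = is_ultrametric(slides, ["A", "B", "C", "D"])
clock_ok = is_ultrametric(clock, ["x", "y", "z"])
```

The example matrix fails the test (the triple A, B, C gives 8, 9, 11), which is why UPGMA is not guaranteed to recover the right tree from it.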
Distance in Phylogenetic Tree
• Distances are additive if
  • Distance between any two leaves i & j on tree = sum of lengths of edges connecting i & j
• Distance matrix must satisfy the following 4-point condition
  • For any four leaves i, j, k, m, two of the sums dij+dkm, dik+djm, dim+djk are equal and ≥ the third
  • In fact, the difference between the two equal sums and the smallest sum is 2 × the length of the “bridge” edge(s) separating the two pairs
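A sketch of the 4-point test, under the same assumptions as any small illustration: the function names and the tolerance are hypothetical, and the distances are the ones from the transformed-distance example in these slides.

```python
from itertools import combinations


def four_point_sums(d, i, j, k, m):
    """The three pairings of four leaves, sorted from smallest to largest sum."""
    return sorted([d[i][j] + d[k][m], d[i][k] + d[j][m], d[i][m] + d[j][k]])


def is_additive(d, leaves, tol=1e-9):
    """4-point condition: in every quartet the two largest sums are equal."""
    return all(abs(s[2] - s[1]) <= tol
               for s in (four_point_sums(d, *q) for q in combinations(leaves, 4)))


pairs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
         ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10}
d = {}
for (x, y), v in pairs.items():
    d.setdefault(x, {})[y] = v
    d.setdefault(y, {})[x] = v

additive = is_additive(d, ["A", "B", "C", "D"])
smallest, _, largest = four_point_sums(d, "A", "B", "C", "D")
bridge = (largest - smallest) / 2  # length of the internal edge
```

Here the sums are 19, 23, 23, so the matrix is additive with a bridge edge of length 2 separating {A, B} from {C, D}.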
UPGMA
• UPGMA (Unweighted Pair Group Method using Arithmetic Averages) [Sokal & Michener 1958]
• Algorithm
  1. Find pair of sequences (or clusters) A, B with smallest distance dAB
  2. Insert join for A, B at tree height = ½ dAB. A and B thus form a new cluster.
  3. Update distance of any other sequence/cluster X to new cluster as ½ (dAX + dBX) *
  4. Repeat until all sequences / clusters joined
• Produces rooted tree
• Assumptions
  • Distances for tree are ultrametric
  • Branch lengths for 2 leaves same after join
  • Distances for tree are additive
*: similar algorithms vary at this step. The simple average is exact when A and B contain the same number of sequences; in general UPGMA weights by cluster size, (nA dAX + nB dBX) / (nA + nB), while always using ½ (dAX + dBX) gives the related WPGMA method.
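The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it uses the size-weighted average in step 3 (which reduces to ½ (dAX + dBX) when both clusters have the same size), breaks ties by taking the first minimum, and labels internal clusters with made-up ids `#0`, `#1`, … that are assumed not to clash with leaf names. The input distances are those of the transformed-distance example in these slides.

```python
def upgma(dist, names):
    """Return (tree, root_height); tree is nested tuples of leaf names."""
    d = {a: {b: float(dist[a][b]) for b in names if b != a} for a in names}
    tree = {n: n for n in names}       # cluster id -> subtree
    size = {n: 1 for n in names}       # cluster id -> number of leaves
    height = {n: 0.0 for n in names}   # cluster id -> height of its root
    active, next_id = list(names), 0
    while len(active) > 1:
        # Step 1: closest pair of active clusters.
        a, b = min(((x, y) for i, x in enumerate(active) for y in active[i + 1:]),
                   key=lambda p: d[p[0]][p[1]])
        # Step 2: join them at height dAB / 2.
        new = f"#{next_id}"
        next_id += 1
        tree[new] = (tree[a], tree[b])
        size[new] = size[a] + size[b]
        height[new] = d[a][b] / 2
        # Step 3: size-weighted average distance to every other cluster.
        d[new] = {}
        for x in active:
            if x not in (a, b):
                dx = (size[a] * d[a][x] + size[b] * d[b][x]) / size[new]
                d[new][x] = d[x][new] = dx
        active = [x for x in active if x not in (a, b)] + [new]
    root = active[0]
    return tree[root], height[root]


pairs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
         ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10}
dist = {}
for (x, y), v in pairs.items():
    dist.setdefault(x, {})[y] = v
    dist.setdefault(y, {})[x] = v

tree, root_height = upgma(dist, ["A", "B", "C", "D"])
```

On these distances the first join is A with C (distance 8), then B, then D.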
UPGMA Example • Given sequences • Build distance matrix
UPGMA Example • Form clusters • Next step?
Transformed Distance Method
• Weakness of UPGMA
  • Assumes a constant evolution rate across lineages
  • Example: Consider sequences A, B, C, and D in Figure 4.5. UPGMA clusters A and C first.
• Transformed Distance Method [J. Farris, 1977]
  • Takes advantage of the power of an outgroup
  • Similar to UPGMA except for the distance matrix
• Algorithm
  1. Select an outgroup D
  2. Transformed distance between ingroups i and j: dij’ = (dij – diD – djD)/2 + (Σk dkD)/n, where the sum runs over all ingroups k and n is the number of ingroups
  3. Run UPGMA with the matrix of dij’
Transformed Distance Method
• Example
  • Select D as the outgroup
  • Calculate transformed distances
    (Σk dkD)/n = (dAD + dBD + dCD)/3 = (12 + 15 + 10)/3 = 37/3
    dAB’ = (dAB – dAD – dBD)/2 + 37/3 = (9 – 12 – 15)/2 + 37/3 = 10/3
    dAC’ = (dAC – dAD – dCD)/2 + 37/3 = (8 – 12 – 10)/2 + 37/3 = 16/3
    dBC’ = (dBC – dBD – dCD)/2 + 37/3 = (11 – 15 – 10)/2 + 37/3 = 16/3
  • Construct new distance matrix
  • Run UPGMA
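The arithmetic in this example can be reproduced exactly with rational numbers. The function name `transformed_distances` is an assumption for illustration; the input values are the six distances used in the example.

```python
from fractions import Fraction


def transformed_distances(d, ingroup, outgroup):
    """dij' = (dij - diD - djD)/2 + mean distance from the ingroups to outgroup D."""
    mean_out = Fraction(sum(d[k][outgroup] for k in ingroup), len(ingroup))
    out = {}
    for idx, i in enumerate(ingroup):
        for j in ingroup[idx + 1:]:
            out[(i, j)] = Fraction(d[i][j] - d[i][outgroup] - d[j][outgroup], 2) + mean_out
    return out


pairs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
         ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10}
d = {}
for (x, y), v in pairs.items():
    d.setdefault(x, {})[y] = v
    d.setdefault(y, {})[x] = v

t = transformed_distances(d, ["A", "B", "C"], "D")
closest = min(t, key=t.get)  # pair that UPGMA would join first
```

After the transformation the closest pair is A and B (10/3), not A and C as in the untransformed matrix.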
Transformed Distance Method • Example (cont’d) • How do you compute the length of a lineage?
Neighbor-Joining Method
• Goal
  • Join closest neighbors (nodes w/ same parent) in tree
  • Avoids problem with UPGMA when rates of change differ
• Example
  • Closest leaves not neighbors in correct tree, but joined first by UPGMA (see previous example)
• Assumptions
  • Rate of change can differ
  • Branch lengths may differ after join
  • Branch lengths for tree are additive
Neighbor-Joining Method
• Approach
  • To find closest pair of neighbors
    • Reduce branch length for a node by (approximately) the average distance of the node from all other nodes
    • Find smallest distance between nodes (after reduction)
• Definitions
  For all pairs of nodes A & B in the set of all nodes L, let
  • dA,B = distance between A, B
  • RX = Σ dX,N over all N ∈ L (total distance from X to all other nodes N)
  • rX = RX / (n – 2), where n = # of nodes in L (normalized divergence from X to all other nodes)
  • QA,B = (n – 2) dA,B – (RA + RB) (rate-corrected distance)
• Key property: for additive distances, the 2 nodes w/ minimum Q are always neighbors!
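A small sketch of these definitions, applied to the four-sequence distances from the transformed-distance example in these slides. The function name `q_matrix` is hypothetical.

```python
def q_matrix(d, nodes):
    """Return (R, Q): total distances RX and rate-corrected distances QA,B."""
    n = len(nodes)
    R = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
    Q = {(a, b): (n - 2) * d[a][b] - (R[a] + R[b])
         for i, a in enumerate(nodes) for b in nodes[i + 1:]}
    return R, Q


pairs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
         ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10}
d = {}
for (x, y), v in pairs.items():
    d.setdefault(x, {})[y] = v
    d.setdefault(y, {})[x] = v

R, Q = q_matrix(d, ["A", "B", "C", "D"])
```

The raw minimum distance is dA,C = 8, but the minimum Q is shared by (A, B) and (C, D), the two neighbor pairs of the additive tree.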
Neighbor-Joining Method
• Algorithm [Saitou & Nei 1987, Studier & Keppler 1988]
  1. Begin with star tree & all sequences as nodes in L
  2. Find pair of nodes A & B ∈ L with minimum QA,B
  3. Create & insert new join (node K) w/ branch lengths
     • dA,K = ½ (dA,B + rA – rB)
     • dB,K = ½ (dA,B + rB – rA)
  4. For remaining nodes C ∈ L, update distance to K as
     • dK,C = ½ (dA,C + dB,C – dA,B)
  5. Insert K and remove A, B from L
  6. Repeat steps 2-5 until only two nodes are left
  7. Connect the last two nodes with an edge whose length is their remaining distance
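The algorithm above can be sketched as follows. This is an illustrative version only: the internal node labels `K0`, `K1`, … are made up, ties in Q are broken by taking the first minimum, and the output is a plain list of (node, node, branch length) edges rather than a tree object. The input is the distance matrix from the transformed-distance example in these slides.

```python
def neighbor_joining(dist, names):
    """Return the unrooted tree as a list of (node, node, branch_length) edges."""
    d = {a: {b: float(dist[a][b]) for b in names if b != a} for a in names}
    nodes, edges, next_id = list(names), [], 0
    while len(nodes) > 2:
        n = len(nodes)
        R = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Step 2: pair with minimum Q = (n - 2) dA,B - (RA + RB).
        a, b = min(((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - R[p[0]] - R[p[1]])
        r_a, r_b = R[a] / (n - 2), R[b] / (n - 2)
        # Step 3: new internal node K and its two branch lengths.
        k = f"K{next_id}"
        next_id += 1
        edges.append((a, k, 0.5 * (d[a][b] + r_a - r_b)))
        edges.append((b, k, 0.5 * (d[a][b] + r_b - r_a)))
        # Step 4: distances from K to every remaining node.
        d[k] = {}
        for c in nodes:
            if c not in (a, b):
                d[k][c] = d[c][k] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        # Step 5: replace A and B by K.
        nodes = [c for c in nodes if c not in (a, b)] + [k]
    # Final step: connect the last two nodes.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


pairs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
         ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10}
dist = {}
for (x, y), v in pairs.items():
    dist.setdefault(x, {})[y] = v
    dist.setdefault(y, {})[x] = v

edges = neighbor_joining(dist, ["A", "B", "C", "D"])
```

Because these distances are additive, the branch lengths reproduce every entry of the matrix; for example dA,C = 3 + 2 + 3 = 8. Unlike UPGMA, the result pairs A with B rather than A with C.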
Neighbor-Joining Method • Example