
Building Phylogenies


bendek

Presentation Transcript


  1. Building Phylogenies Distance-Based Methods

  2. Methods • Distance-based • Parsimony • Maximum likelihood

  3. Distance Matrices • Distance matrix is additive if there is a tree that fits it exactly

          a    b    c    d
     a    0
     b    6    0
     c    7    3    0
     d   14   10    9    0

     [Figure: tree with leaves a, b, c, d drawn against a distance scale from 0 to 8]
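One way to check additivity without building the tree is the four-point condition: for every four taxa, the two largest of the three pairwise sums must be equal. The slide does not name this test, so the sketch below is an illustration rather than the presenter's method; the function name `is_additive` and the tolerance parameter are assumptions.

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Four-point condition on a symmetric distance matrix D (list of lists)."""
    for i, j, k, l in combinations(range(len(D)), 4):
        # The three ways of splitting four taxa into two pairs.
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        # Additive matrices have the two largest sums equal.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Matrix from slide 3, taxa in the order a, b, c, d.
D = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]

print(is_additive(D))
```

For the slide's matrix the three sums are 15, 17 and 17, so the condition holds for the single quartet.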

  4. Ultrametric Matrices • Additive + molecular clock assumption

          a    b    c    d
     a    0
     b    2    0
     c    6    6    0
     d   10   10   10    0

     [Figure: tree with leaves a, b, c, d drawn against a distance scale from 0 to 5]
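Ultrametricity can be tested with the three-point condition: in every triple of taxa, the two largest of the three distances are equal. This test is not stated on the slide; the sketch below assumes it, and the name `is_ultrametric` is hypothetical.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Three-point condition on a symmetric distance matrix D (list of lists)."""
    for i, j, k in combinations(range(len(D)), 3):
        small, mid, large = sorted([D[i][j], D[i][k], D[j][k]])
        # In an ultrametric, the two largest distances of any triple tie.
        if abs(large - mid) > tol:
            return False
    return True

# Matrix from slide 4 (ultrametric) and from slide 3 (additive only).
D_ultra = [[0, 2, 6, 10],
           [2, 0, 6, 10],
           [6, 6, 0, 10],
           [10, 10, 10, 0]]
D_additive = [[0, 6, 7, 14],
              [6, 0, 3, 10],
              [7, 3, 0, 9],
              [14, 10, 9, 0]]

print(is_ultrametric(D_ultra), is_ultrametric(D_additive))
```

The slide 3 matrix fails on the triple a, b, c (distances 3, 6, 7), which is consistent with it being additive but not clock-like.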

  5. Methods • Fitch - Margoliash • UPGMA • Neighbor-joining • Many others

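  6. Least squares trees • Minimize, over all trees, the weighted sum of squared differences between the observed distances Dij and the tree path lengths dij: Σ wij (Dij − dij)², summed over all pairs i < j • Choice of weights wij: • Uniform: wij = 1 • Fitch-Margoliash: wij = 1/Dij² • Others . . .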
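The criterion can be evaluated for any candidate tree once its leaf-to-leaf path lengths are known. The sketch below assumes the usual weighted least-squares form, the sum over pairs of w(D) times the squared difference between observed and tree distance; the function names are hypothetical. The matrix `d_ultra` is a hand-computed ultrametric fit, included only as an example of a poorly fitting tree.

```python
def least_squares_score(D, d, weight=lambda Dij: 1.0):
    """Weighted squared error between observed D and tree path lengths d."""
    n = len(D)
    return sum(weight(D[i][j]) * (D[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def fitch_margoliash_weight(Dij):
    """Fitch-Margoliash weighting: 1 / D^2 (requires nonzero distances)."""
    return 1.0 / Dij ** 2

# Observed distances from slide 3 (taxa a, b, c, d).
D = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]

# Path lengths in a hypothetical ultrametric tree ((b,c),a),d.
d_ultra = [[0, 6.5, 6.5, 11],
           [6.5, 0, 3, 11],
           [6.5, 3, 0, 11],
           [11, 11, 11, 0]]

print(least_squares_score(D, D))        # a tree that fits exactly scores 0
print(least_squares_score(D, d_ultra))  # uniform weights
print(least_squares_score(D, d_ultra, fitch_margoliash_weight))
```

Searching over tree topologies and branch lengths for the minimum is the expensive part and is not attempted here.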

  7. Sarich's (1969) immunological distances

  8. Least squares tree for Sarich’s data

  9. Clustering Methods • E.g., UPGMA and Neighbor-Joining • A cluster is a set of taxa • Interspecies distances translate into intercluster distances • Clusters are repeatedly merged • “Closest” clusters merged first • Distances are recomputed after merging

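  10. UPGMA • Unweighted pair group method using arithmetic averages • The distance between clusters Ci and Cj is the average distance between their members: Dij = (1 / (|Ci| |Cj|)) Σ dpq, summed over p in Ci and q in Cj • After merging Ci and Cj to create cluster Ck, define the distance from k to every other cluster r as Dkr = (|Ci| Dir + |Cj| Djr) / (|Ci| + |Cj|)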

  11. UPGMA: Initialization • Assign each sequence i to its own cluster Ci • Define one leaf (tip) of tree for each sequence and place it at height 0

  12. UPGMA: Iteration Repeat until only two clusters remain: • Choose the two clusters i and j with smallest Dij • Create a new cluster k, where Ck = Ci ∪ Cj • Compute Dkr for all r • Define a new node k with children i and j, and place it at height Dij/2 • Add k to the current clusters and delete i and j. Let i and j be the remaining clusters. Place the root at height Dij/2
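The initialization and iteration steps can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation. It assumes the size-weighted average update Dkr = (|Ci| Dir + |Cj| Djr) / (|Ci| + |Cj|), represents internal nodes as hypothetical `(left, right, height)` tuples, and loops until one cluster remains, which places the root at Dij/2 exactly as the final step on the slide does.

```python
def upgma(D, names):
    """Build a UPGMA tree from a symmetric distance matrix D."""
    n = len(names)
    # Each cluster id maps to (subtree, number of leaves); leaves sit at height 0.
    clusters = {i: (names[i], 1) for i in range(n)}
    dist = {(i, j): float(D[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Choose the two clusters with the smallest distance.
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        k = next_id
        next_id += 1
        # Distance from the merged cluster k to every remaining cluster r.
        new = {}
        for r in clusters:
            d_ir = dist[tuple(sorted((i, r)))]
            d_jr = dist[tuple(sorted((j, r)))]
            new[(r, k)] = (size_i * d_ir + size_j * d_jr) / (size_i + size_j)
        # Drop entries involving i or j, then add the entries for k.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new)
        # The new node is placed at height Dij / 2.
        clusters[k] = ((tree_i, tree_j, dij / 2), size_i + size_j)
    (tree, _), = clusters.values()
    return tree

# Ultrametric matrix from slide 4 (taxa a, b, c, d).
D = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
tree = upgma(D, ["a", "b", "c", "d"])
print(tree)
```

On the slide 4 matrix this joins a and b at height 1, adds c at height 3, and places the root at height 5.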

  13. UPGMA Example

  14. UPGMA tree for Sarich’s data

  15. A pitfall of UPGMA • The algorithm produces an ultrametric tree: the distance from the root to any leaf is the same • UPGMA assumes a constant molecular clock: all species accumulate mutations (evolve) at the same rate.

  16. UPGMA fails when molecular clock assumption doesn’t hold

  17. Neighbor Joining • Saitou and Nei, Molecular Biology and Evolution 4 (1987) • Idea: Find a pair of leaves that are close to each other but far from other leaves • Implicitly finds a pair of neighboring leaves • Advantages: • Works well for additive matrices and also for non-additive ones • Does not have the molecular clock assumption

  18. Long branches must be handled carefully! [Figure: four-leaf tree with branch lengths 0.1, 0.1, 0.1, 0.4, 0.4; leaf labels lost in extraction] Two of the leaves are closer to each other than to either of the other two. Obvious approach produces incorrect clusters!

  19. Compensating for long edges • Introduce “correction terms”: ui = Σ Dik / (n − 2), summed over k ≠ i, where n is the current number of leaves (average dist. to other taxa) • “Corrected” distances: Dij − ui − uj • Distances are reduced for pairs that are far away from all other species: they may be close to each other.

  20. Neighbor-joining Repeat the following until only two leaves remain: • Choose i, j such that Dij − ui − uj is minimum, where ui and uj are the correction terms • Define a new leaf k whose distances to i and j are dik = ½(Dij + ui − uj) and djk = ½(Dij + uj − ui) • Compute the distance from k to every other leaf r: Dkr = ½(Dir + Djr − Dij) • Delete i and j. Connect the 2 remaining leaves by a branch of length Dij
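The loop above can be sketched as follows. The formulas used are the standard neighbor-joining ones and are assumed here, since the slide's equations did not survive extraction: ui = Σ Dik / (n − 2), branch lengths ½(Dij + ui − uj) and ½(Dij + uj − ui), and Dkr = ½(Dir + Djr − Dij). Internal node names `n0`, `n1`, ... are hypothetical and must not clash with taxon names.

```python
def neighbor_joining(D, names):
    """Return the unrooted NJ tree as a list of (node, node, length) edges."""
    nodes = list(names)
    dist = {frozenset((names[i], names[j])): float(D[i][j])
            for i in range(len(names)) for j in range(i + 1, len(names))}

    def d(x, y):
        return dist[frozenset((x, y))]

    edges = []
    count = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Correction term for each leaf.
        u = {x: sum(d(x, y) for y in nodes if y != x) / (n - 2) for x in nodes}
        # Pair with the smallest corrected distance Dij - ui - uj.
        pairs = [(x, y) for a, x in enumerate(nodes) for y in nodes[a + 1:]]
        i, j = min(pairs, key=lambda p: d(*p) - u[p[0]] - u[p[1]])
        k = f"n{count}"
        count += 1
        dij = d(i, j)
        # Branch lengths from i and j to the new node k.
        edges.append((i, k, 0.5 * (dij + u[i] - u[j])))
        edges.append((j, k, 0.5 * (dij + u[j] - u[i])))
        # Distances from k to every other leaf r.
        for r in nodes:
            if r not in (i, j):
                dist[frozenset((r, k))] = 0.5 * (d(i, r) + d(j, r) - dij)
        nodes = [x for x in nodes if x not in (i, j)] + [k]
    # Connect the two remaining leaves.
    edges.append((nodes[0], nodes[1], d(nodes[0], nodes[1])))
    return edges

# Additive matrix from slide 3 (taxa a, b, c, d).
D = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]
edges = neighbor_joining(D, ["a", "b", "c", "d"])
for edge in edges:
    print(edge)
```

On this matrix the sketch pairs a with b and c with d, with branch lengths 5, 1, 1, 8 and an internal edge of length 1, which reproduces every entry of the matrix. The raw closest pair is b and c (distance 3), so merging the closest pair first would give a different grouping.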

  21. NJ tree for Sarich’s data

  22. Computing distance matrices • Based on sequence alignment • Various possibilities: • Distance = average number of differences • Try different PAM matrices; distance = index of matrix that gives highest score • Feng and Doolittle: Based on alignment scores – roughly ratio to max possible score (see text) • Read, e.g., PHYLIP documentation: http://evolution.genetics.washington.edu/phylip/general.html
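The simplest of these options, the average number of differences, can be sketched as the fraction of aligned positions at which two sequences differ. The sketch assumes pre-aligned sequences of equal length and skips any column containing a gap character; both the gap handling and the function names are assumptions, not taken from the slides.

```python
def p_distance(seq1, seq2, gap="-"):
    """Fraction of ungapped aligned positions where the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap (an assumption).
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs):
    """Pairwise p-distances for a list of aligned sequences."""
    return [[p_distance(a, b) for b in seqs] for a in seqs]

# Toy aligned sequences, invented for illustration only.
seqs = ["ACGTACGT", "ACGTTCGA", "AC-TACGT"]
for row in distance_matrix(seqs):
    print(row)
```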

  23. Distance correction • The amount of evolutionary change is not linearly related to time • Over a long period of time, a series of substitutions may bring us back to where we started • Percentage difference may underestimate evolutionary time

  24. Jukes-Cantor Model

  25. Correcting for multiple substitutions in the JC model
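The slide's own formula did not survive extraction. Under the standard Jukes-Cantor model, an observed fraction p of differing sites corresponds to an estimated d = −(3/4) ln(1 − (4/3) p) substitutions per site. A minimal sketch, assuming that formula and a hypothetical function name:

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from the observed fraction of differences p."""
    # The correction is undefined once p reaches 3/4 (saturation).
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.05, 0.25, 0.5):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected value always exceeds p for p > 0, and the gap widens as p grows, reflecting multiple substitutions at the same site.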

  26. Many other models!
