Learn about different methods and algorithms used in building phylogenetic trees using parsimony, including distance-based methods and maximum likelihood. Understand the concept of small and large parsimony, and explore algorithms like Fitch's and Sankoff's for small parsimony reconstruction. Discover the importance of homology, orthology, and paralogy in phylogenetic analysis.
Building Phylogenies: Parsimony 1
Methods • Distance-based • Parsimony • Maximum likelihood
Note • Some of the following figures come from: • [S05] Swofford http://www.csit.fsu.edu/~swofford/bioinformatics_spring05 • [F05] Felsenstein http://evolution.gs.washington.edu/gs541/2005/
Parsimony methods • Goal: Find the tree that allows evolution of the sequences with the fewest changes. • This is called a most parsimonious (MP) tree. • Parsimony is implemented in PAUP* http://paup.csit.fsu.edu/ • Compatibility methods are closely related to parsimony: • Goal: Find the tree that perfectly fits the most characters.
Evolutionary Steps • [Figure: tree with node states G, A, G, A, G] • Steps can have weights
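Parsimony • Character matrix (rows: characters a to f; columns: taxa A to D):
     A  B  C  D
  a  0  1  1  1
  b  0  1  1  1
  c  0  0  1  1
  d  0  1  1  0
  e  0  0  0  1
  f  1  0  0  0
• [Figure: tree with tips D, A, B, C; branches labeled with the character changes a, b, f, c, d, d, e] • Typically, each site is treated separately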
Some numbers • Number of unrooted trees on $n \ge 3$ species: $U_n = (2n-5)(2n-7)(2n-9)\cdots(3)(1)$ • Number of rooted trees on $n \ge 2$ species: $R_n = (2n-3)(2n-5)\cdots(3)(1)$, which equals $(2n-3)\,U_n$ for $n \ge 3$
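To get a feel for how fast these counts grow, here is a minimal Python sketch of the two products, taking $U_n$ as the product of the odd integers up to $2n-5$ and $R_n$ as the product of the odd integers up to $2n-3$. The function names `odd_product`, `unrooted_trees`, and `rooted_trees` are illustrative and do not come from the slides.

```python
def odd_product(m: int) -> int:
    """Product of the odd integers 1, 3, ..., m (returns 1 when m < 3)."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees on n >= 3 labeled species."""
    if n < 3:
        raise ValueError("need at least 3 species for an unrooted tree")
    return odd_product(2 * n - 5)


def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees on n >= 2 labeled species."""
    if n < 2:
        raise ValueError("need at least 2 species for a rooted tree")
    return odd_product(2 * n - 3)


for n in (4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 species this already gives 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive search is only practical for small data sets.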
Small versus Large Parsimony • Parsimony score of a tree: The smallest (weighted) number of steps required by the tree • (Large) Parsimony: Find the tree with the lowest parsimony score • Small Parsimony: Given a tree, find its parsimony score • Small parsimony is by far the easier problem. • It is used as a subroutine when solving large parsimony
A DNA data set [F05]
An example tree [F05]
Evolutionary steps on tree • Only one choice of reconstruction at each site is shown • 9 steps in all
Algorithms for Small Parsimony • Fitch’s algorithm: • Based on set operations • Evolutionary steps have same weight • Sankoff’s algorithm: • Based on dynamic programming • Allows steps to have different weights • Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site.
Fitch’s Algorithm • Each node v in the tree has a set X(v) • If v is a leaf (tip), X(v) is the nucleotide observed at v • If there is ambiguity, X(v) contains all possible nucleotides at v • If v is a node with descendants u and w: • Let $Y = X(u) \cap X(w)$ • If $Y \neq \emptyset$, set $X(v) = Y$ • If $Y = \emptyset$, set $X(v) = X(u) \cup X(w)$ and count one step.
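The set rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: trees are written as nested 2-tuples of leaf names, the function name `fitch` is made up for this example, and the data are the 0/1 character matrix from the earlier Parsimony slide, hard-coded here. The tree ((A,B),(C,D)) is an assumption chosen for illustration.

```python
def fitch(tree, states):
    """Return (X(v), steps) for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states -- dict mapping leaf name to a string of possible states
              (one character normally, several if the tip is ambiguous)
    """
    if isinstance(tree, str):
        return set(states[tree]), 0          # leaf: the observed state(s)
    left, right = tree
    x_u, steps_u = fitch(left, states)
    x_w, steps_w = fitch(right, states)
    y = x_u & x_w                            # Y = X(u) intersect X(w)
    if y:
        return y, steps_u + steps_w          # non-empty: no extra step
    return x_u | x_w, steps_u + steps_w + 1  # empty: take union, count a step


# Characters a..f for taxa A..D (one column per character).
matrix = {"A": "000001", "B": "110100", "C": "111100", "D": "111010"}
tree = (("A", "B"), ("C", "D"))              # assumed topology

# Each site is treated separately; the tree's score is the sum over sites.
per_site = [
    fitch(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
    for i in range(6)
]
total = sum(per_site)
print(per_site, total)
```

On this tree every character needs one step except d, which needs two, for 7 steps in total.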
Sankoff’s Algorithm • Let $c_{ij}$ be the cost of going from state i to state j. • E.g., transitions (A↔G or C↔T) are more probable than transversions, so give transitions a lower weight • Let $S_v(k)$ be the smallest (weighted) number of steps needed to evolve the subtree at or above node v, given that node v is in state k.
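Sankoff’s Algorithm • If v is a leaf (tip): $S_v(k) = 0$ if state k is observed at v (or is compatible with the observation), and $S_v(k) = \infty$ otherwise • If v is a node with descendants u and w: $S_v(k) = \min_i \left[ c_{ki} + S_u(i) \right] + \min_j \left[ c_{kj} + S_w(j) \right]$ • The minimum number of (weighted) steps is $\min_k S_r(k)$, where r is the root of the tree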
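A minimal Python sketch of the dynamic program may help. It assumes the standard Sankoff recurrence: a leaf costs 0 for its observed state and infinity otherwise, an internal node adds the cheapest way to reach each child, and the answer is the minimum at the root. The tree encoding (nested 2-tuples), the tip names `t1` to `t4`, and the weights in `cost` (1 for a transition, 2 for a transversion) are all assumptions made for this example.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def cost(i, j):
    """Hypothetical weights: 0 for no change, 1 for a transition, 2 for a transversion."""
    if i == j:
        return 0
    same_class = (i in PURINES) == (j in PURINES)
    return 1 if same_class else 2


def sankoff(tree, observed, cost):
    """Return {k: S_v(k)} for the root v of `tree` at one site."""
    if isinstance(tree, str):
        # Leaf: free if k matches the observation, impossible otherwise.
        return {k: (0 if k in observed[tree] else math.inf) for k in STATES}
    left, right = tree
    s_u = sankoff(left, observed, cost)
    s_w = sankoff(right, observed, cost)
    # Internal node: cheapest change to each child, summed over both children.
    return {
        k: min(cost(k, i) + s_u[i] for i in STATES)
        + min(cost(k, j) + s_w[j] for j in STATES)
        for k in STATES
    }


tree = (("t1", "t2"), ("t3", "t4"))
observed = {"t1": "A", "t2": "G", "t3": "A", "t4": "C"}
s_root = sankoff(tree, observed, cost)
best = min(s_root.values())
print(s_root, best)
```

With these weights the cheapest reconstruction puts A at the root and costs 3: one transition (A to G) plus one transversion (A to C). Passing a unit cost function instead gives the unweighted step count that Fitch's algorithm computes.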
Searching for an MP tree • Exhaustive search (exact) • Branch-and-bound search (exact) • Heuristic search methods • Stepwise addition • Branch swapping • Star decomposition
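As an illustration of the first option only, here is a hedged Python sketch of exhaustive search on a tiny data set. It enumerates every unrooted topology by fixing the first taxon as an outgroup and attaching each remaining taxon to every possible branch, then scores each tree with a Fitch-style count. The function names, the nested-tuple tree encoding, and the use of the 0/1 matrix from the earlier Parsimony slide are choices made for this example. Branch-and-bound and the heuristic methods are not shown.

```python
def fitch(tree, states):
    """Fitch-style pass for one site: return (state set, steps)."""
    if isinstance(tree, str):
        return set(states[tree]), 0
    (x_u, s_u), (x_w, s_w) = fitch(tree[0], states), fitch(tree[1], states)
    common = x_u & x_w
    return (common, s_u + s_w) if common else (x_u | x_w, s_u + s_w + 1)


def score(tree, matrix):
    """Parsimony score of a tree: sum of the per-site step counts."""
    n_sites = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
        for i in range(n_sites)
    )


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)                       # attach above the current root
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def unrooted_topologies(taxa):
    """All unrooted topologies, each rooted on the branch leading to taxa[0]."""
    subtrees = [taxa[1]]
    for leaf in taxa[2:]:
        subtrees = [t for s in subtrees for t in insertions(s, leaf)]
    return [(taxa[0], s) for s in subtrees]


# Characters a..f for taxa A..D (one column per character).
matrix = {"A": "000001", "B": "110100", "C": "111100", "D": "111010"}

scored = [(score(t, matrix), t) for t in unrooted_topologies(sorted(matrix))]
best_score = min(s for s, _ in scored)
best_trees = [t for s, t in scored if s == best_score]
print(best_score, best_trees)
```

For four taxa there are only three topologies to score. Here two of them tie at 7 steps, which shows that a most parsimonious tree need not be unique.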
Homology, orthology, and paralogy • Homology: Similarity attributed to descent from a common ancestor. • Orthologous sequences: Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. • Paralogous sequences: Homologous sequences within a single species that arose by gene duplication.
Orthology and Paralogy http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html