Phylogenetic Trees - Parsimony
Tutorial #12
Next semester: Project in advanced algorithms for phylogenetic reconstruction (236512)
Initial details at: http://www.cs.technion.ac.il/~moran/lab06.htm
- Come to me for more details -
Phylogenetic Reconstruction
We’d like to study the evolutionary history of species
• Distance-based approach:
  • Calculate (ML) pairwise (evolutionary) distances between species
  • Find the edge-weighted tree that best describes this metric
  • Major drawback: loss of information when reducing the data to pairwise distances
• Character-based approach:
  • Consider the character vector of each species:
    • morphological characters
    • bio-molecular characters
  • Optimization criteria:
    • parsimony
    • likelihood / posterior probability
Most Parsimonious Tree
• Parsimony score:
  • Number of character changes (mutations) along the evolutionary tree
  • (the tree includes labels on internal vertices)
• Most parsimonious tree: the tree with minimal parsimony score
• Motivation: the Minimal Evolution Principle
[Figure: two example trees with labeled internal vertices over three-letter sequences (AAA, AGA, GGA, AAG), with change counts on the edges; one has Score = 3, the other Score = 4]
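The parsimony score of a fully labeled tree can be computed by summing the number of differing characters over all edges. The sketch below is only an illustration: the topology, the node names (`r`, `x`, `y`, `l1`..`l4`) and the two internal labelings are hypothetical and are not the trees from the slide's figure.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(edges, labels) -> int:
    """Total number of character changes over all edges of a labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)


# Hypothetical rooted tree: root r, internal nodes x and y, leaves l1..l4.
EDGES = [("r", "x"), ("r", "y"), ("x", "l1"), ("x", "l2"), ("y", "l3"), ("y", "l4")]
LEAVES = {"l1": "AAA", "l2": "AAG", "l3": "AGA", "l4": "GGA"}

# Two different assignments to the internal vertices of the same topology.
labeling_1 = {**LEAVES, "r": "AAA", "x": "AAA", "y": "AGA"}
labeling_2 = {**LEAVES, "r": "AAA", "x": "AAA", "y": "GGA"}

score_1 = parsimony_score(EDGES, labeling_1)
score_2 = parsimony_score(EDGES, labeling_2)
print(score_1, score_2)  # prints "3 4"
```

The same leaves get different scores under different internal labelings, which is why the internal labels are part of what is being optimized.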
Small vs. Large Parsimony
• We break the problem into two:
  • Small parsimony: given the topology, find the best assignment to internal nodes
  • Large parsimony: find the topology that gives the best score
• Large parsimony is NP-hard
• We’ll show solutions to small parsimony (Fitch’s and Sankoff’s algorithms)
• Input to small parsimony: a tree with character-state assignments to the leaves
• Example:
  A: CAGGTA
  B: CAGACA
  C: CGGGTA
  D: TGCACT
  E: TGCGTA
[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
Fitch’s Algorithm
• Execute independently for each character:
  • Bottom-up phase: determine the set of possible states for each internal node
  • Top-down phase: pick a state for each internal node
• Dynamic programming framework
[Figure: tree over Aardvark, Bison, Chimp, Dog, Elephant with leaf sequences CAGGTA, CGGGTA, TGCGTA, CAGACA, TGCACT]
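Running "independently for each character" amounts to splitting the aligned sequences into columns and summing the per-column scores. A minimal sketch using the five example sequences A..E from the tutorial follows; the topology `TREE` is an assumption chosen for illustration (the figure's actual topology is not recoverable here), and `fitch_score` is a hypothetical helper name.

```python
def fitch_score(tree, column):
    """Fitch score of one character; `column` maps leaf name -> state.

    `tree` is a leaf name (str) or a pair (left subtree, right subtree).
    """
    def visit(node):
        if isinstance(node, str):            # leaf: singleton set, no mutations
            return {column[node]}, 0
        (r_j, c_j), (r_k, c_k) = visit(node[0]), visit(node[1])
        common = r_j & r_k
        if common:                           # children agree on some state
            return common, c_j + c_k
        return r_j | r_k, c_j + c_k + 1      # union costs one mutation

    return visit(tree)[1]


SEQS = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA", "D": "TGCACT", "E": "TGCGTA"}
TREE = ((("A", "C"), "E"), ("B", "D"))       # hypothetical topology

names = list(SEQS)
# One dict per alignment column: leaf name -> state at that position.
columns = [dict(zip(names, col)) for col in zip(*SEQS.values())]
per_site = [fitch_score(TREE, col) for col in columns]
total = sum(per_site)
print(per_site, total)  # prints "[2, 2, 2, 1, 1, 1] 9"
```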
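Fitch’s Algorithm: Bottom-up phase
• Determine the set of possible states for each internal node
• Initialization: $R_i = \{s_i\}$ for each leaf $i$ with observed state $s_i$
• Do a post-order (from leaves to root) traversal of the tree
• Determine $R_i$ of internal node $i$ with children $j, k$:
  $$R_i = \begin{cases} R_j \cap R_k & \text{if } R_j \cap R_k \neq \emptyset \\ R_j \cup R_k & \text{otherwise} \end{cases}$$
• Parsimony score = number of union operations
[Figure: example tree with leaf states C, T, G, T, A, T and internal sets CT, GT, AGT, T, T; score = 3]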
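The bottom-up rule (intersect the children's sets when they share a state, otherwise take their union and count one mutation) can be sketched as a short recursion. The nested-tuple tree encoding is an assumption made for this example, and the topology below is one arrangement consistent with the sets listed in the figure (CT, GT, AGT, T, T), not necessarily the one drawn on the slide.

```python
def fitch_bottom_up(tree):
    """Return (state set, number of unions) for a rooted binary tree.

    A leaf is a single-character string; an internal node is a pair
    (left subtree, right subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0                      # initialization: R_i = {s_i}
    left, right = tree
    r_j, score_j = fitch_bottom_up(left)      # post-order: children first
    r_k, score_k = fitch_bottom_up(right)
    common = r_j & r_k
    if common:
        return common, score_j + score_k
    return r_j | r_k, score_j + score_k + 1   # each union adds one mutation


# Assumed topology over the leaf states C, T, G, T, A, T.
tree = (("C", "T"), ((("G", "T"), "A"), "T"))
root_set, score = fitch_bottom_up(tree)
print(sorted(root_set), score)  # prints "['T'] 3"
```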
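Fitch’s Algorithm: Top-down phase
• Pick a state for each internal node
• Pick an arbitrary state in $R_{root}$ for the root
• Do a pre-order (from root to leaves) traversal of the tree
• Determine $s_j$ of internal node $j$ with parent $i$:
  $$s_j = \begin{cases} s_i & \text{if } s_i \in R_j \\ \text{an arbitrary state in } R_j & \text{otherwise} \end{cases}$$
• Complexity: $O(mnk)$, where $m$ = #characters, $n$ = #taxa/nodes, $k$ = #states
[Figure: example tree with leaf states C, T, G, T, A, T and internal sets CT, GT, AGT, T, T; score = 3]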
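Both phases for a single character fit in one small function. The following is a sketch, not a reference implementation: the dictionary-based tree encoding, the node names and the topology are assumptions, and "arbitrary state" is resolved deterministically with `min` purely to make the output reproducible.

```python
def fitch(children, leaf_state, root):
    """Fitch's algorithm for one character on a rooted binary tree.

    children:   internal node -> (left child, right child)
    leaf_state: leaf -> observed state
    Returns (state assigned to every node, parsimony score).
    """
    sets, state, score = {}, {}, 0

    def bottom_up(i):
        nonlocal score
        if i in leaf_state:
            sets[i] = {leaf_state[i]}
            return
        j, k = children[i]
        bottom_up(j)
        bottom_up(k)
        common = sets[j] & sets[k]
        if common:
            sets[i] = common
        else:
            sets[i] = sets[j] | sets[k]
            score += 1                        # one union = one mutation

    def top_down(j, parent_state):
        # Keep the parent's state when it is allowed; otherwise pick any
        # state in R_j (min() is just a deterministic "arbitrary" choice).
        state[j] = parent_state if parent_state in sets[j] else min(sets[j])
        for child in children.get(j, ()):
            top_down(child, state[j])

    bottom_up(root)
    top_down(root, None)                      # root: arbitrary state in R_root
    return state, score


# Assumed topology over the leaf states C, T, G, T, A, T.
children = {"root": ("u", "v"), "u": ("l1", "l2"), "v": ("w", "l6"),
            "w": ("x", "l5"), "x": ("l3", "l4")}
leaf_state = {"l1": "C", "l2": "T", "l3": "G", "l4": "T", "l5": "A", "l6": "T"}
state, score = fitch(children, leaf_state, "root")
print(score, [state[n] for n in ("root", "u", "v", "w", "x")])
```

Counting the edges whose endpoints received different states gives the same number as the union count from the bottom-up phase.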
Weighted Parsimony: Sankoff’s algorithm
• Each mutation $a \leftrightarrow b$ has its own cost, $S(a,b)$
• Bottom-up phase: determine $R_i(s)$, the cost of the optimal state assignment for the subtree of $i$ when $i$ is assigned state $s$
• Top-down phase: pick optimal states for each internal node
• Fitch’s algorithm as a special case:
  • $R_i$ is the set of states that yield a minimal-cost subtree of $i$
• Same as the algorithm for optimal lifted tree alignment (Tutorial #4)
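Sankoff’s Algorithm: Bottom-up phase
• Determine $R_i(s)$ for each internal node
• Initialization, for each leaf $i$ with observed state $s_i$:
  $$R_i(s) = \begin{cases} 0 & \text{if } s = s_i \\ \infty & \text{otherwise} \end{cases}$$
• Do a post-order (from leaves to root) traversal of the tree
• Determine $R_i$ of internal node $i$ with children $j, k$:
  $$R_i(s) = \min_{s'} \left[ R_j(s') + S(s,s') \right] + \min_{s''} \left[ R_k(s'') + S(s,s'') \right]$$
• Natural generalization for non-binary trees: add one such minimum term per child
• Remember pointers $s \to s'$ (the minimizing child state for each $s$)
[Figure: example tree with leaf states C, T, G, T, A, T]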
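The bottom-up recursion computes, for every node and every state, the cheapest cost of the subtree below it. A minimal sketch follows; the nested-tuple tree encoding, the topology and the transition/transversion cost matrix `COST` are all assumptions made for illustration. Because the recursion sums over however many children a node has, it also covers non-binary trees.

```python
import math

STATES = "ACGT"


def sankoff_bottom_up(tree, states, cost):
    """Return the table {s: R_i(s)} for the root of `tree`.

    tree: a leaf state (str) or a tuple of subtrees (any number of children)
    cost: cost[a][b] = S(a, b), the cost of a mutation between a and b
    """
    if isinstance(tree, str):
        # Leaf: zero cost for the observed state, infinite for all others.
        return {s: (0 if s == tree else math.inf) for s in states}
    child_tables = [sankoff_bottom_up(child, states, cost) for child in tree]
    return {
        s: sum(min(table[t] + cost[s][t] for t in states) for table in child_tables)
        for s in states
    }


# Hypothetical costs: transitions (A<->G, C<->T) cost 1, transversions cost 2.
PURINES = set("AG")
COST = {a: {b: 0 if a == b else (1 if (a in PURINES) == (b in PURINES) else 2)
            for b in STATES} for a in STATES}

tree = (("C", "T"), ((("G", "T"), "A"), "T"))   # assumed topology
root_table = sankoff_bottom_up(tree, STATES, COST)
best_cost = min(root_table.values())
print(root_table, best_cost)
```

This version only returns costs; recovering the actual states requires remembering which child state achieved each minimum.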
Sankoff’s Algorithm: Top-down phase
• Pick a state for each internal node
• Select the minimal-cost state for the root (the $s$ minimizing $R_{root}(s)$)
• Do a pre-order (from root to leaves) traversal of the tree:
  • For internal node $j$ with parent $i$, select the state that produced the minimal cost at $i$ (use the pointers kept in the first stage)
• Complexity: $O(mnk^2)$, where $m$ = #characters, $n$ = #taxa/nodes, $k$ = #states
[Figure: example tree with leaf states C, T, G, T, A, T; root assigned T]
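A sketch of both phases with explicit pointers is given below. The dictionary-based tree encoding, the node names, the topology and the cost matrix are assumptions for illustration; ties between equally good states are broken by the order of `STATES`.

```python
import math

STATES = "ACGT"


def sankoff(children, leaf_state, root, states, cost):
    """Sankoff's algorithm for one character; returns (assignment, total cost)."""
    table = {}       # table[i][s] = R_i(s)
    pointer = {}     # pointer[(i, j)][s] = best state of child j when i has state s
    assignment = {}

    def bottom_up(i):
        if i in leaf_state:
            table[i] = {s: (0 if s == leaf_state[i] else math.inf) for s in states}
            return
        table[i] = dict.fromkeys(states, 0)
        for j in children[i]:
            bottom_up(j)
            pointer[(i, j)] = {}
            for s in states:
                best = min(states, key=lambda t: table[j][t] + cost[s][t])
                pointer[(i, j)][s] = best             # remember the pointer s -> s'
                table[i][s] += table[j][best] + cost[s][best]

    def top_down(i, s):
        assignment[i] = s
        for j in children.get(i, ()):
            top_down(j, pointer[(i, j)][s])           # follow the stored pointers

    bottom_up(root)
    root_state = min(states, key=lambda s: table[root][s])
    top_down(root, root_state)
    return assignment, table[root][root_state]


# Hypothetical costs: transitions (A<->G, C<->T) cost 1, transversions cost 2.
PURINES = set("AG")
COST = {a: {b: 0 if a == b else (1 if (a in PURINES) == (b in PURINES) else 2)
            for b in STATES} for a in STATES}

# Assumed topology over the leaf states C, T, G, T, A, T.
children = {"root": ("u", "v"), "u": ("l1", "l2"), "v": ("w", "l6"),
            "w": ("x", "l5"), "x": ("l3", "l4")}
leaf_state = {"l1": "C", "l2": "T", "l3": "G", "l4": "T", "l5": "A", "l6": "T"}
assignment, total = sankoff(children, leaf_state, "root", STATES, COST)
print(assignment, total)
```

Each internal node does $k^2$ work per child (every parent state against every child state), which is where the $k^2$ factor in the complexity comes from.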
Fitch’s Algorithm as a special case of Sankoff’s algorithm
• Unweighted parsimony: $S(a,b) = 1$ if $a \neq b$, and $S(a,b) = 0$ otherwise
• Sankoff’s algorithm:
  • $R_i(s)$: cost of the optimal subtree of $i$ when it is assigned state $s$
• Fitch’s algorithm:
  • $Score(i) = \min_s R_i(s)$: cost of the optimal state assignment for the subtree of $i$
  • $R_i = \{ s : R_i(s) = Score(i) \}$: set of states of $i$ in optimal state assignments for the subtree of $i$
• We need to show that:
  • An optimal tree assigns node $i$ a state from $R_i$
  • Fitch’s bottom-up recursive formula for $R_i$ is correct (check for yourselves)
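One way to check the recursive formula numerically (this is a sanity check, not a proof) is to compare Fitch's sets with the sets of cost-minimizing states from Sankoff's table under unit costs, at every node, for every leaf labeling of a few small trees. The helper names and the two four-leaf shapes below are assumptions chosen for this sketch.

```python
import itertools
import math

STATES = "ACGT"


def fitch_set(tree):
    """Fitch's R_i: intersection of the children's sets if non-empty, else union."""
    if isinstance(tree, str):
        return {tree}
    r_j, r_k = fitch_set(tree[0]), fitch_set(tree[1])
    return (r_j & r_k) or (r_j | r_k)


def sankoff_table(tree):
    """Sankoff's R_i(s) with unit costs: S(a, b) = 1 if a != b, else 0."""
    if isinstance(tree, str):
        return {s: (0 if s == tree else math.inf) for s in STATES}
    tables = [sankoff_table(child) for child in tree]
    return {s: sum(min(t[x] + (x != s) for x in STATES) for t in tables)
            for s in STATES}


def optimal_states(tree):
    """States s with R_i(s) = Score(i) = min over s of R_i(s)."""
    table = sankoff_table(tree)
    best = min(table.values())
    return {s for s, c in table.items() if c == best}


def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


# Two hypothetical four-leaf shapes: balanced and caterpillar.
SHAPES = (lambda a, b, c, d: ((a, b), (c, d)),
          lambda a, b, c, d: (((a, b), c), d))

all_agree = all(
    fitch_set(t) == optimal_states(t)
    for leaves in itertools.product(STATES, repeat=4)
    for shape in SHAPES
    for t in subtrees(shape(*leaves))
)
print(all_agree)  # prints "True"
```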
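Fitch’s Algorithm as a special case of Sankoff’s algorithm
• Unweighted parsimony:
  • $Score(i)$: cost of the optimal state assignment for the subtree of $i$
  • $R_i$: set of states of $i$ in optimal state assignments for the subtree of $i$
• We need to show that an optimal tree assigns node $i$ a state from $R_i$:
  • Trivially true for the root
  • Assume (to the contrary) that in an optimal assignment some node $j$, with parent $i$, is assigned $s_j \notin R_j$
  • $s_j \notin R_j \Rightarrow R_j(s_j) \geq Score(j) + 1$, because the parsimony score is an integer
  • By switching from $s_j$ to some $s \in R_j$ (and reassigning the subtree of $j$ optimally) we do not raise the parsimony score: the cost inside the subtree of $j$ drops by at least 1, while the cost on the edge $(i, j)$ rises by at most 1
• Why is this not the case for the weighted version?
[Figure: tree showing the root and a node $i$ with child $j$]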
Exploring the Space of Trees
• We saw how to find an optimal state assignment for a given tree topology
• We need to explore the space of topologies
• Given n sequences there are (2n-3)!! possible rooted trees and (2n-5)!! possible unrooted trees

  taxa (n)   # rooted trees   # unrooted trees
  3          3                1
  4          15               3
  5          105              15
  6          945              105
  8          135,135          10,395
  10         34,459,425       2,027,025
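The counts in the table follow directly from the double-factorial formulas and can be reproduced in a few lines. The function names below are illustrative only.

```python
def double_factorial(n: int) -> int:
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2; taken as 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def rooted_trees(n_taxa: int) -> int:
    """Number of rooted trees on n_taxa labeled leaves: (2n - 3)!!"""
    return double_factorial(2 * n_taxa - 3)


def unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted trees on n_taxa labeled leaves: (2n - 5)!!"""
    return double_factorial(2 * n_taxa - 5)


for n in (3, 4, 5, 6, 8, 10):
    print(f"{n:>3} {rooted_trees(n):>12,} {unrooted_trees(n):>12,}")
```

The number of unrooted trees on n taxa equals the number of rooted trees on n - 1 taxa, which is visible as the diagonal shift in the table.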
Exploring the Space of Trees
• Possible solutions:
  • Heuristic solutions for “traveling” through “topology-space”
  • Find a (basic) topology using distance-based methods (NJ)
• Notice another problem:
  • We obtain state assignments to taxa using a multiple alignment
  • We obtain an optimal MA using the topology of the phylogenetic tree (e.g. CLUSTAL)
• Solution:
  • Again, use some initial topology (via NJ)
[Figure: multiple alignment with gaps; its columns are the characters C1, C2, …, Cm]