
Phylogenetic Trees - Parsimony Tutorial #12

Phylogenetic Trees - Parsimony Tutorial #12. Next semester: Project in advanced algorithms for phylogenetic reconstruction (236512) Initial details in: http://www.cs.technion.ac.il/~moran/lab06.htm - Come to me for more details -. We’d like to study the evolutionary history of species

sharne

Presentation Transcript


  1. Phylogenetic Trees - Parsimony
     Tutorial #12
     Next semester: Project in advanced algorithms for phylogenetic reconstruction (236512). Initial details at: http://www.cs.technion.ac.il/~moran/lab06.htm (come to me for more details).

  2. Phylogenetic Reconstruction
     We’d like to study the evolutionary history of species.
     • Distance-based approach:
       • Calculate (ML) pairwise (evolutionary) distances between species
       • Find the edge-weighted tree that best describes this metric
       • Major drawback: loss of information when reducing the data to pairwise distances
     • Character-based approach:
       • Consider the character vector of each species:
         • morphological characters
         • bio-molecular characters
       • Optimization criteria:
         • parsimony
         • likelihood / posterior probability

  3. Most Parsimonious Tree
     • Parsimony score: the number of character changes (mutations) along the evolutionary tree (a tree with labels on its internal vertices)
     • Example: [Figure: two example trees whose vertices are labeled with 3-letter sequences (AAA, AGA, GGA, AAG) and whose edges carry change counts; one has Score = 3, the other Score = 4]
     • Most parsimonious tree: the tree with minimal parsimony score (Minimal Evolution Principle)
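The parsimony score of a fully labeled tree is just the sum of per-edge character differences. The following is a minimal sketch of that computation; the function names, the edge-list encoding, and the small labeled tree are illustrative assumptions and are not taken from the slide's figure.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(labels, edges):
    """Sum of character changes over all edges of a fully labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)


# Hypothetical labeled tree: internal vertices x and y, leaves a, b, c, d.
labels = {"x": "AAA", "y": "AGA", "a": "AAA", "b": "AAG", "c": "AGA", "d": "GGA"}
edges = [("x", "a"), ("x", "b"), ("x", "y"), ("y", "c"), ("y", "d")]

print(parsimony_score(labels, edges))  # 3
```

Changing the label of an internal vertex changes the score, which is why the choice of internal labels matters when comparing trees.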

  4. Small vs. Large Parsimony
     • We break the problem into two:
       • Small parsimony: given the topology, find the best assignment to the internal nodes
       • Large parsimony: find the topology that gives the best score
     • Large parsimony is NP-hard
     • We’ll show a solution to small parsimony (Fitch’s and Sankoff’s algorithms)
     Input to small parsimony: a tree with character-state assignments to the leaves
     Example (leaves: Aardvark, Bison, Chimp, Dog, Elephant):
       A: CAGGTA
       B: CAGACA
       C: CGGGTA
       D: TGCACT
       E: TGCGTA

  5. Fitch’s Algorithm
     • Execute independently for each character:
       • Bottom-up phase: determine the set of possible states for each internal node
       • Top-down phase: pick a state for each internal node
     Dynamic programming framework
     [Figure: the example tree over Aardvark, Bison, Chimp, Dog, Elephant with leaf sequences CAGGTA, CGGGTA, TGCGTA, CAGACA, TGCACT, annotated with the two phases (1, 2)]

  6. Fitch’s Algorithm: Bottom-up phase
     • Determine the set of possible states for each internal node
     • Initialization: Ri = {si} for each leaf i
     • Do a post-order (from leaves to root) traversal of the tree
     • Determine Ri of internal node i with children j, k:
       Ri = Rj ∩ Rk if Rj ∩ Rk ≠ ∅; otherwise Ri = Rj ∪ Rk
     Parsimony score = number of union operations
     [Figure: example tree with leaves C, T, G, T, A, T and internal sets CT, GT, AGT, T, T; score = 3]
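The bottom-up phase can be sketched in a few lines of Python using the standard Fitch rule: intersect the children's sets when they overlap, otherwise take their union and count one mutation. This is a minimal sketch for a single character, assuming a binary tree encoded as nested 2-tuples with one-character strings at the leaves. The name `fitch_bottom_up` and the encoding are illustrative, and the example topology is only a guess at the slide's figure that happens to reproduce its sets (CT, GT, AGT, T, T) and its score.

```python
def fitch_bottom_up(tree):
    """Return (state set, number of unions) for one character.

    A leaf is a one-character string; an internal node is a 2-tuple.
    """
    if isinstance(tree, str):
        return frozenset(tree), 0
    r_j, score_j = fitch_bottom_up(tree[0])
    r_k, score_k = fitch_bottom_up(tree[1])
    common = r_j & r_k
    if common:
        # Children agree on at least one state: no mutation needed here.
        return common, score_j + score_k
    # Disjoint sets: take the union and count one mutation.
    return r_j | r_k, score_j + score_k + 1


# Assumed topology for leaves C, T, G, T, A, T.
tree = (("C", "T"), ((("G", "T"), "A"), "T"))
root_set, score = fitch_bottom_up(tree)
print(sorted(root_set), score)  # ['T'] 3
```

For sequences with several characters, the same function would be run once per alignment column and the scores summed.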

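  7. Fitch’s Algorithm: Top-down phase
     • Pick a state for each internal node
     • Pick an arbitrary state in Rroot for the root
     • Do a pre-order (from root to leaves) traversal of the tree
     • Determine sj of internal node j with parent i:
       sj = si if si ∊ Rj; otherwise sj is an arbitrary state in Rj
     Complexity: O(mnk), where m = #characters, n = #taxa/nodes, k = #states
     [Figure: example tree with leaves C, T, G, T, A, T and internal sets CT, GT, AGT, T, T; score = 3]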
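Both phases together can be sketched as follows: a node keeps its parent's state whenever that state is in its own set, and otherwise picks any state from its set. This is a minimal sketch under assumptions: the tree is a nested 2-tuple with one-character leaves, the helper names (`annotate`, `assign`, `count_changes`) are hypothetical, and `min()` stands in for the "arbitrary" pick only to make the output deterministic.

```python
def annotate(tree):
    """Bottom-up phase: return (state set, children) for every node."""
    if isinstance(tree, str):
        return (frozenset(tree), ())
    left, right = annotate(tree[0]), annotate(tree[1])
    common = left[0] & right[0]
    return (common if common else left[0] | right[0], (left, right))


def assign(node, parent_state=None):
    """Top-down phase: return (chosen state, labeled children)."""
    states, children = node
    if parent_state in states:
        state = parent_state      # keep the parent's state when allowed
    else:
        state = min(states)       # arbitrary choice, fixed for repeatability
    return (state, tuple(assign(child, state) for child in children))


def count_changes(labeled):
    """Number of edges whose endpoints carry different states."""
    state, children = labeled
    return sum((child[0] != state) + count_changes(child) for child in children)


# Assumed topology for leaves C, T, G, T, A, T.
tree = (("C", "T"), ((("G", "T"), "A"), "T"))
labeled = assign(annotate(tree))
print(labeled[0], count_changes(labeled))  # T 3
```

Each node is visited once per phase and does work proportional to the number of states, which is consistent with the O(mnk) bound when repeated over m characters.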

  8. Weighted Parsimony: Sankoff’s algorithm
     • Each mutation a↔b has its own cost, S(a,b)
     • Bottom-up phase: determine Ri(s), the cost of the optimal state assignment for the subtree of i when i is assigned state s
     • Top-down phase: pick optimal states for each internal node
     • Fitch’s algorithm as a special case:
       • Ri is the set of states that yield a minimal-cost subtree of i
     Same as the algorithm for optimal lifted tree alignment (Tutorial #4)

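  9. Sankoff’s Algorithm: Bottom-up phase
     • Determine Ri(s) for each internal node
     • Initialization: for each leaf i, Ri(s) = 0 if s = si, and Ri(s) = ∞ otherwise
     • Do a post-order (from leaves to root) traversal of the tree
     • Determine Ri of internal node i with children j, k:
       Ri(s) = min over s’ of [Rj(s’) + S(s,s’)] + min over s’ of [Rk(s’) + S(s,s’)]
     • Remember the pointers s → s’ (the minimizing child states)
     • Natural generalization for non-binary trees: add one such minimum per child
     [Figure: example tree with leaves C, T, G, T, A, T]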
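The bottom-up table Ri(s) can be sketched as below: a leaf costs 0 for its own state and infinity for any other, and an internal node adds, for each child, the cheapest combination of child state plus mutation cost. This is a minimal sketch under assumptions: the cost function (transitions cost 1, transversions cost 2) is a hypothetical choice of S(a,b), the names are illustrative, and the tree is encoded as nested tuples, so nodes with more than two children work unchanged.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def cost(a, b):
    """Hypothetical S(a,b): 0 if equal, 1 for transitions, 2 for transversions."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2


def sankoff_bottom_up(tree):
    """Return {s: minimal cost of the subtree when its root has state s}."""
    if isinstance(tree, str):
        return {s: (0 if s == tree else math.inf) for s in STATES}
    child_tables = [sankoff_bottom_up(child) for child in tree]
    return {
        s: sum(min(r[t] + cost(s, t) for t in STATES) for r in child_tables)
        for s in STATES
    }


# Assumed topology for leaves C, T, G, T, A, T.
tree = (("C", "T"), ((("G", "T"), "A"), "T"))
root_table = sankoff_bottom_up(tree)
print(min(root_table.values()))  # 5
```

With this cost function the best root state is T: one transition (T to C) and two transversions (T to G, T to A) give a total cost of 5.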

  10. Sankoff’s Algorithm: Top-down phase
     • Pick a state for each internal node
     • Select the minimal-cost character for the root (s minimizing Rroot(s))
     • Do a pre-order (from root to leaves) traversal of the tree:
       • For internal node j with parent i, select the state that produced the minimal cost at i (use the pointers kept in the first stage)
     Complexity: O(mnk²), where m = #characters, n = #taxa/nodes, k = #states
     [Figure: example tree with leaves T, C, T, G, T, A]
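The pointer bookkeeping can be sketched as follows: during the bottom-up pass each node records, for every state s, which state of each child achieved the minimum, and the top-down pass simply follows those records from the best root state. This is a minimal sketch under assumptions: the cost function is a hypothetical S(a,b), the dictionary layout and function names are illustrative, and ties are broken by the order of `STATES`.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def cost(a, b):
    """Hypothetical S(a,b): 0 if equal, 1 for transitions, 2 for transversions."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2


def bottom_up(tree):
    """Compute the cost table and, per state, the best state of each child."""
    if isinstance(tree, str):
        table = {s: (0 if s == tree else math.inf) for s in STATES}
        return {"table": table, "pointers": {}, "children": []}
    children = [bottom_up(child) for child in tree]
    table, pointers = {}, {}
    for s in STATES:
        total, chosen = 0, []
        for child in children:
            best = min(STATES, key=lambda t: child["table"][t] + cost(s, t))
            chosen.append(best)
            total += child["table"][best] + cost(s, best)
        table[s], pointers[s] = total, chosen
    return {"table": table, "pointers": pointers, "children": children}


def top_down(node, state):
    """Follow the stored pointers; return (state, labeled children)."""
    picks = node["pointers"].get(state, [])
    return (state, [top_down(c, t) for c, t in zip(node["children"], picks)])


def tree_cost(labeled):
    """Total mutation cost of a fully labeled tree."""
    state, children = labeled
    return sum(cost(state, c[0]) + tree_cost(c) for c in children)


# Assumed topology for leaves C, T, G, T, A, T.
tree = (("C", "T"), ((("G", "T"), "A"), "T"))
root = bottom_up(tree)
root_state = min(STATES, key=root["table"].get)
labeled = top_down(root, root_state)
print(root_state, tree_cost(labeled))  # T 5
```

The inner minimization over child states for every parent state is where the k² factor in O(mnk²) comes from.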

  11. Fitch’s Algorithm as a special case of Sankoff’s algorithm
     • Unweighted parsimony: every mutation costs 1, i.e. S(a,b) = 1 for a ≠ b and S(a,a) = 0
     • Sankoff’s algorithm:
       • Ri(s) is the cost of the optimal subtree of i when i is assigned state s
     • Fitch’s algorithm:
       • Score(i) is the cost of the optimal state assignment for the subtree of i
       • Ri is the set of optimal states for i, i.e. the states s with Ri(s) = Score(i)
     • We need to show that:
       • The optimal tree assigns node i a state from Ri
       • Fitch’s bottom-up recursive formula for Ri is correct (check for yourselves)

  12. Fitch’s Algorithm as a special case of Sankoff’s algorithm
     • Unweighted parsimony:
       • Score(i) is the cost of the optimal state assignment for the subtree of i
       • Ri is the set of optimal states for i in the subtree of i
     • We need to show that the optimal tree assigns node i a state from Ri:
       • Trivially true for the root
       • Assume (to the contrary) that in an optimal assignment some node j, with parent i, is assigned sj ∉ Rj
       • The parsimony score is an integer, so sj ∉ Rj ⇒ Rj(sj) ≥ Score(j) + 1
       • Hence, by switching from sj to some s ∊ Rj (and re-optimizing the subtree of j), we do not raise the parsimony score: the subtree of j becomes cheaper by at least 1, while the edge (i, j) becomes costlier by at most 1
     • Why is this not the case for the weighted version?
     [Figure: path from the root down to node i and its child j]
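The claim that Fitch's sets are exactly the minimizers of Sankoff's table under unit costs can be checked numerically on a small tree. The following is a minimal sketch, not a proof: it assumes a binary tree encoded as nested 2-tuples, uses illustrative function names, and compares the two algorithms at every subtree of one assumed example tree.

```python
import math

STATES = "ACGT"


def fitch(tree):
    """Fitch bottom-up: return (state set, score) for one character."""
    if isinstance(tree, str):
        return frozenset(tree), 0
    (r_j, c_j), (r_k, c_k) = fitch(tree[0]), fitch(tree[1])
    common = r_j & r_k
    return (common, c_j + c_k) if common else (r_j | r_k, c_j + c_k + 1)


def sankoff_unit(tree):
    """Sankoff bottom-up with unit costs: S(a,b) = 1 if a != b, else 0."""
    if isinstance(tree, str):
        return {s: (0 if s == tree else math.inf) for s in STATES}
    tables = [sankoff_unit(child) for child in tree]
    return {
        s: sum(min(r[t] + (s != t) for t in STATES) for r in tables)
        for s in STATES
    }


def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


def agrees(tree):
    """True if Fitch's set and score match Sankoff's minimizers and minimum."""
    states, score = fitch(tree)
    table = sankoff_unit(tree)
    best = min(table.values())
    return best == score and {s for s in STATES if table[s] == best} == states


# Assumed topology for leaves C, T, G, T, A, T.
tree = (("C", "T"), ((("G", "T"), "A"), "T"))
all_agree = all(agrees(sub) for sub in subtrees(tree))
print(all_agree)  # True
```

Replacing the unit cost with a non-uniform cost would generally break the agreement, which is the point of the question about the weighted version.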

  13. Exploring the Space of Trees
     • We saw how to find the optimal state assignment for a given tree topology
     • We need to explore the space of topologies
     • Given n sequences there are (2n-3)!! possible rooted trees and (2n-5)!! possible unrooted trees

     taxa (n)   # rooted trees   # unrooted trees
     3          3                1
     4          15               3
     5          105              15
     6          945              105
     8          135,135          10,395
     10         34,459,425       2,027,025
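The counts in the table follow directly from the double-factorial formulas and can be reproduced with a short script. This is a minimal sketch; the function names are illustrative.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2); 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n):
    """Number of rooted binary tree topologies on n labeled taxa."""
    return double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of unrooted binary tree topologies on n labeled taxa."""
    return double_factorial(2 * n - 5)


for n in (3, 4, 5, 6, 8, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
```

The rapid growth is already visible at n = 10, which is why exhaustive enumeration of topologies is impractical beyond small inputs.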

  14. Exploring the Space of Trees
     • Possible solutions:
       • Heuristic solutions for “traveling” through “topology space”
       • Find a (basic) topology using distance-based methods (NJ)
     • Notice another problem:
       • We obtain state assignments to taxa using multiple alignment
       • We obtain the optimal MA using the topology of the phylogenetic tree (e.g. CLUSTAL)
     • Solution:
       • Again, use some initial topology (via NJ)
     [Figure: a gapped multiple alignment whose columns are the characters C1, C2, …, Cm]
