
Phylogenetic Trees Lecture 2

Phylogenetic Trees Lecture 2. Based on: Durbin et al 7.4; Gusfield 17. Character-based methods for constructing phylogenies. In this approach, trees are constructed by comparing the characters of the corresponding species.


Presentation Transcript


  1. Phylogenetic Trees, Lecture 2. Based on: Durbin et al. 7.4; Gusfield 17.

  2. Character-based methods for constructing phylogenies. In this approach, trees are constructed by comparing the characters of the corresponding species. Characters may be morphological (e.g., tooth structure) or molecular (homologous DNA sequences). One common approach is Maximum Parsimony. Assumptions: • Independence of characters (no interactions) • The best tree is the one that requires the fewest changes

  3. 1. Maximum Parsimony. Input: four nucleotide sequences, AAG, AAA, GGA, AGA, taken from four species. Question: Which evolutionary tree best explains these sequences? One Answer (the parsimony principle): Pick a tree that has a minimum total number of substitutions of symbols between species and their originator in the phylogenetic tree. [Figure: a tree whose internal nodes are all labeled AAA, with leaves GGA, AGA, AAG, AAA; branch labels 2, 1, 1; total # of substitutions = 4.]

  4. Example Continued. There are many trees possible. For example: [Figure: two trees over the leaves AAA, AGA, GGA, AAG, with internal nodes labeled AAA or AGA and branches labeled with substitution counts. Left tree: total # of substitutions = 3. Right tree: total # of substitutions = 4.] The left tree is preferred over the right tree. The total number of changes is called the parsimony score.
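The parsimony score of a fully labeled tree can be sketched as a sum of Hamming distances along its edges. The two edge lists below are hypothetical labelings chosen to reproduce the totals 3 and 4 quoted on the slide; the exact topologies of the original figure are an assumption, and the function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(edges):
    """Total substitutions over (parent_sequence, child_sequence) edges."""
    return sum(hamming(parent, child) for parent, child in edges)


# Assumed labeling: root AAA with internal children AGA and AAA.
left_tree = [
    ("AAA", "AGA"), ("AGA", "AGA"), ("AGA", "GGA"),
    ("AAA", "AAA"), ("AAA", "AAG"), ("AAA", "AAA"),
]
# Assumed labeling: every internal node is AAA.
right_tree = [
    ("AAA", "AAA"), ("AAA", "AGA"), ("AAA", "GGA"),
    ("AAA", "AAA"), ("AAA", "AAG"), ("AAA", "AAA"),
]

print(parsimony_score(left_tree), parsimony_score(right_tree))  # 3 4
```

Labeling one internal node AGA lets the leaves AGA and GGA share a single A-to-G change, which is why the left tree is one step shorter.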

  5. Simple Example • Suppose we have five species, such that three have ‘C’ and two have ‘T’ at a specified position • A minimal tree has one evolutionary change [Figure: tree with leaves C, T, C, T, C in which a single T/C change on one branch accounts for all five leaves.]

  6. Extension to Many Letters • What is the parsimony score of: A (Aardvark): CAGGTA; B (Bison): CAGACA; C (Chimp): CGGGTA; D (Dog): TGCACT; E (Elephant): TGCGTA? We do it character by character; each score is computed independently of the others.

  7. Fitch’s Algorithm of Evaluating Trees • Assume that a tree is given. • Traverse tree from leaves to root determining set of possible states (e.g. nucleotides) for each internal node • Traverse tree from root to leaves picking ancestral states for internal nodes

  8. Fitch’s Algorithm – Step 1 • # of changes = # union operations [Figure: example tree with leaf states C, G, T, T, A, T and candidate state sets AGT, CT, GT, T, T at the internal nodes.]

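  9. Fitch’s Algorithm – Step 1 • Do a post-order (from leaves to root) traversal of the tree • Determine the possible states $R_i$ of internal node $i$ with children $j$ and $k$: $R_i = R_j \cap R_k$ if $R_j \cap R_k \neq \emptyset$, otherwise $R_i = R_j \cup R_k$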

  10. Fitch’s Algorithm – Step 2 [Figure: the example tree from Step 1 (leaf states C, G, T, T, A, T; internal sets AGT, CT, GT, T, T), repeated in successive stages as ancestral states are picked from root to leaves.]

  11. Fitch’s Algorithm – Step 2 • Do a pre-order (from root to leaves) traversal of the tree • Select state $r_j$ of internal node $j$ with parent $i$: $r_j = r_i$ if $r_i \in R_j$, otherwise $r_j$ is an arbitrary state in $R_j$
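The two passes of Fitch's algorithm can be sketched as follows for a single character. This is a minimal illustration, not the lecture's own code: the tree is assumed to be a rooted binary tree written as nested 2-tuples of leaf names, the leaf names `s1`..`s6` and the topology are hypothetical (picked so that the internal sets come out as CT, GT, AGT, T, T), and ties in the second pass are broken alphabetically.

```python
def fitch(tree, leaf_state):
    """Run both Fitch passes; return (number_of_changes, chosen_states).

    chosen_states maps each internal node (a tuple) to the state picked for it.
    """
    sets = {}
    changes = 0

    def up(node):  # Step 1, post-order: leaves to root
        nonlocal changes
        if not isinstance(node, tuple):
            return {leaf_state[node]}
        left, right = up(node[0]), up(node[1])
        common = left & right
        if not common:
            changes += 1  # each union operation counts as one change
        sets[node] = common or (left | right)
        return sets[node]

    chosen = {}

    def down(node, parent_state):  # Step 2, pre-order: root to leaves
        if not isinstance(node, tuple):
            return
        options = sets[node]
        # Keep the parent's state when allowed, otherwise pick any state.
        state = parent_state if parent_state in options else min(options)
        chosen[node] = state
        down(node[0], state)
        down(node[1], state)

    up(tree)
    down(tree, None)
    return changes, chosen


# Hypothetical tree and leaf states for one site.
tree = ((("s1", "s2"), ("s5", ("s3", "s4"))), "s6")
leaf_state = {"s1": "C", "s2": "T", "s3": "G", "s4": "T", "s5": "A", "s6": "T"}

changes, chosen = fitch(tree, leaf_state)
print(changes)                       # 3
print(sorted(set(chosen.values())))  # ['T']
```

Three unions occur (C with T, G with T, A with {G, T}), so the site needs three changes, and every internal node can be assigned T.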

  12. Weighted Version of Fitch’s Algorithm • Instead of assuming all state changes are equally likely, use different costs c(a, b) for different changes • 1st step of algorithm is to propagate costs up through tree

  13. Weighted Version of Fitch’s Algorithm • Want to determine the minimal cost $S(i, a)$ of assigning character $a$ to node $i$ • For leaf nodes $i$: $S(i, a) = 0$ if leaf $i$ is labeled by $a$, otherwise $S(i, a) = \infty$

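  14. Weighted Version of Fitch’s Algorithm • Want to determine the minimal cost $S(i, a)$ of assigning character $a$ to node $i$ • For internal nodes $i$ with children $j$ and $k$: $S(i, a) = \min_b \big(S(j, b) + c(a, b)\big) + \min_b \big(S(k, b) + c(a, b)\big)$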

  15. Weighted Version of Fitch’s Algorithm – Step 2 • Do a pre-order (from root to leaves) traversal of tree • Select minimal cost character for root • For each internal node j, select character that produced minimal cost at parent i

  16. Weighted Parsimony Scores. Weighted Parsimony score: • Each change is weighted by a score $c(a, b)$. • The weighted parsimony score reduces to the parsimony score when $c(a,a)=0$ and $c(a,b)=1$ for all $b \neq a$.

  17. Evaluating Weighted Parsimony Scores. Each position is independent and computed by itself. Use Dynamic Programming on a given tree. • If $i$ is a node with children $j$ and $k$, then $S(i, a) = \min_x (S(j, x) + c(a, x)) + \min_y (S(k, y) + c(a, y))$, where $S(i, a)$ is the minimum score of the subtree rooted at $i$ when $i$ has character $a$. [Figure: node $i$ with score $S(i,a)$ above its children $j$ and $k$ with scores $S(j,x)$ and $S(k,y)$.]

  18. Evaluating Parsimony Scores. Dynamic programming on a given tree. Initialization: • For each leaf $i$ set $S(i,a) = 0$ if $i$ is labeled by $a$, otherwise $S(i,a) = \infty$ Iteration: • If $i$ is a node with children $j$ and $k$, then $S(i,a) = \min_x (S(j,x) + c(a,x)) + \min_y (S(k,y) + c(a,y))$ Termination: • The cost of the tree is $\min_x S(r,x)$, where $r$ is the root. Comment: To reconstruct an optimal assignment, we need to keep in each node $i$ and for each character $a$ the two characters $x$, $y$ that bring about the minimum when $i$ has character $a$.
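The initialization, iteration, and termination steps above can be sketched for one character as follows. The nested-tuple tree, the leaf names, and the transition/transversion cost function `ts_tv_cost` are illustrative assumptions rather than part of the lecture; only the recurrence itself is taken from the slide.

```python
import math


def sankoff(tree, leaf_state, alphabet, cost):
    """Return {a: S(root, a)} for a rooted binary tree of nested 2-tuples."""
    def S(node):
        if not isinstance(node, tuple):
            # Leaf: cost 0 for its own label, infinity for anything else.
            return {a: 0 if a == leaf_state[node] else math.inf for a in alphabet}
        sj, sk = S(node[0]), S(node[1])
        return {
            a: min(sj[x] + cost(a, x) for x in alphabet)
            + min(sk[y] + cost(a, y) for y in alphabet)
            for a in alphabet
        }
    return S(tree)


def unit_cost(a, b):
    """c(a, a) = 0 and c(a, b) = 1 otherwise: plain parsimony."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Hypothetical weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    purines = {"A", "G"}
    return 1 if (a in purines) == (b in purines) else 2


# Hypothetical tree and leaf states for one site.
tree = ((("s1", "s2"), ("s5", ("s3", "s4"))), "s6")
leaf_state = {"s1": "C", "s2": "T", "s3": "G", "s4": "T", "s5": "A", "s6": "T"}

root_scores = sankoff(tree, leaf_state, "ACGT", unit_cost)
print(min(root_scores.values()))  # 3 with unit costs
print(min(sankoff(tree, leaf_state, "ACGT", ts_tv_cost).values()))
```

With unit costs the minimum over the root scores equals the unweighted parsimony count for the same tree. Storing the minimizing `x` and `y` alongside each score would allow the ancestral states to be traced back, as the comment on the slide suggests.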

  19. Cost of Evaluating Parsimony for binary trees • If there are $n$ nodes, $m$ characters, and $k$ possible values for each character, then the complexity is $O(nmk^2)$. Of course, we still need to search over ALL possible trees and find the best one. One usually resorts to heuristic search techniques.

  20. Exploring the Space of Trees. We’ve considered how to find the minimum number of changes for a given tree topology. We need some search procedure for exploring the space of tree topologies. Given $n$ sequences there are $(2n-3)!! = 1 \cdot 3 \cdot 5 \cdots (2n-3)$ possible rooted trees.

  21. Counting Trees. [Figure: for n = 3 there is one unrooted tree; for n = 4 there are 3 unrooted trees.] A rooted tree with n leaves has (2n-1) nodes and (2n-2) edges, discounting the edge to the root; hence an unrooted tree has (2n-3) edges. For each additional leaf we add two edges, and the new leaf can be attached to any edge of the current tree, so a tree with k leaves gives rise to (2k-3) trees with k+1 leaves. Therefore we have 1 · 3 · 5 · … · (2n-5) unrooted trees with n leaves. Each such tree has (2n-3) edges, any of which can be chosen as the position of the root of a rooted tree. Hence we have 1 · 3 · 5 · … · (2n-5) · (2n-3) rooted trees with n leaves.

  22. Exploring the Space of Trees
      taxa (n)    # of rooted trees
      4           15
      5           105
      6           945
      8           135,135
      10          34,459,425
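The counting argument can be checked with a short sketch that multiplies the odd numbers directly. The function names are illustrative.

```python
def num_unrooted_trees(n):
    """1 * 3 * 5 * ... * (2n - 5): unrooted binary trees with n >= 3 leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def num_rooted_trees(n):
    """Each unrooted tree has (2n - 3) edges on which the root can be placed."""
    return num_unrooted_trees(n) * (2 * n - 3)


for n in (4, 5, 6, 8):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
# 4 3 15
# 5 15 105
# 6 105 945
# 8 10395 135135
```

The product grows faster than exponentially in n, which is why exhaustive evaluation is feasible only for a handful of taxa.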

  23. Maximum Parsimony. How many possible unrooted trees?
      Site        1 2 3 4 5 6 7 8 9 10
      Species 1   A G G G T A A C T G
      Species 2   A C G A T T A T T A
      Species 3   A T A A T T G T C T
      Species 4   A A T G T T G T C G

  24. Maximum Parsimony. How many possible unrooted trees? (Same four-species alignment as on slide 23.)

  25. Maximum Parsimony. How many substitutions?

  26. Maximum Parsimony. (Same alignment as on slide 23.) Substitutions required at site 1 by each of the three trees: 0, 0, 0.

  27. Maximum Parsimony. (Same alignment as on slide 23.) Substitutions required at sites 1-2 by each of the three trees: 0 3; 0 3; 0 3.

  28. Maximum Parsimony. Site 2: species 1 - G, 2 - C, 3 - T, 4 - A. [Figure: the three possible trees with ancestral states filled in; each requires 3 substitutions.]

  29. Maximum Parsimony. (Same alignment as on slide 23.) Substitutions required at sites 1-3 by each of the three trees: 0 3 2; 0 3 2; 0 3 2.

  30. Maximum Parsimony. (Same alignment as on slide 23.) Substitutions required at sites 1-4 by each of the three trees: 0 3 2 2; 0 3 2 2; 0 3 2 1.

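  31. Maximum Parsimony. Site 4: species 1 - G, 2 - A, 3 - A, 4 - G. [Figure: the three possible trees with ancestral states G or A filled in; they require 2, 2, and 1 substitutions respectively.]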
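The per-site counts for a four-species column can be reproduced with a small sketch that applies the intersection/union rule to each of the three unrooted topologies. The ordering of the trees as (1,2 | 3,4), (1,3 | 2,4), (1,4 | 2,3) is an assumption, chosen because it matches the counts quoted for sites 2 and 4; the function names are illustrative.

```python
def join(a, b):
    """Fitch rule: intersection at no cost, otherwise union at cost 1."""
    common = a & b
    return (common, 0) if common else (a | b, 1)


def site_cost(pair1, pair2):
    """Minimum changes at one site on the four-taxon tree (pair1 | pair2)."""
    left, c1 = join({pair1[0]}, {pair1[1]})
    right, c2 = join({pair2[0]}, {pair2[1]})
    _, c3 = join(left, right)
    return c1 + c2 + c3


def three_trees(column):
    """Costs of one alignment column on the three unrooted topologies."""
    s1, s2, s3, s4 = column
    return (
        site_cost((s1, s2), (s3, s4)),
        site_cost((s1, s3), (s2, s4)),
        site_cost((s1, s4), (s2, s3)),
    )


print(three_trees("GCTA"))  # site 2: (3, 3, 3)
print(three_trees("GAAG"))  # site 4: (2, 2, 1)
```

A site distinguishes between the topologies only when at least two states each occur in at least two species, as at site 4; sites such as site 2 cost the same on every tree.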

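  32. Maximum Parsimony. Substitutions per site and in total for each of the three trees:
      Site      1 2 3 4 5 6 7 8 9 10   Total
      Tree 1    0 3 2 2 0 1 1 1 1 3    14
      Tree 2    0 3 2 2 0 1 2 1 2 3    16
      Tree 3    0 3 2 1 0 1 2 1 2 3    15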

  33. Maximum Parsimony. (Same alignment as on slide 23.) The most parsimonious tree requires 0 3 2 2 0 1 1 1 1 3 substitutions at sites 1-10, for a total of 14.

  34. Finding most parsimonious trees - exact solutions • Exact solutions can only be used for small numbers of taxa. • Exhaustive search examines all possible trees. • Typically used for problems with fewer than 10 taxa.

  35. Finding most parsimonious trees - exhaustive search • (1) Starting tree, any 3 taxa (A, B, C) • (2a), (2b), (2c) Add fourth taxon (D) in each of three possible positions: three trees • Add fifth taxon (E) in each of the five possible positions on each of the three trees -> 15 trees, and so on [Figure: the starting tree (1), the three four-taxon trees (2a)-(2c), and the positions at which E can be attached.]
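The stepwise enumeration can be sketched by storing an unrooted tree as a list of edges and attaching each new taxon to every edge in turn. The edge-list representation and the internal node names `n0`, `n1`, ... are assumptions made for this illustration.

```python
def add_taxon(edges, taxon, new_node):
    """Yield every tree obtained by attaching taxon to one edge of the tree."""
    for i, (u, v) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        # Split edge (u, v) with a new internal node and hang the taxon on it.
        yield rest + [(u, new_node), (new_node, v), (new_node, taxon)]


def all_unrooted_trees(taxa):
    """Enumerate all unrooted binary trees on taxa (at least 3) as edge lists."""
    trees = [[("n0", taxa[0]), ("n0", taxa[1]), ("n0", taxa[2])]]
    for idx, taxon in enumerate(taxa[3:], start=1):
        trees = [
            new_tree
            for tree in trees
            for new_tree in add_taxon(tree, taxon, f"n{idx}")
        ]
    return trees


print(len(all_unrooted_trees("ABCD")))   # 3
print(len(all_unrooted_trees("ABCDE")))  # 15
```

An exhaustive search would score each returned tree and keep the minimum; the list itself grows as 1 · 3 · 5 · …, so this is practical only for small inputs.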

  36. Finding most parsimonious trees - exact solutions • Branch and bound saves time by discarding families of trees during tree construction that cannot be smaller than the smallest tree found so far. (Here “smaller” means “smaller score”; i.e., more parsimonious.) • Can be enhanced by specifying an initial upper bound for tree length (total # of changes on the tree); e.g., from a distance method. • Typically used only for problems with fewer than 20 taxa.
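A minimal branch-and-bound sketch follows, under several assumptions: unit-cost (Fitch) scoring, trees stored as nested 2-tuples with the first taxon attached at the root so that each unrooted topology is generated once, and taxa added in sorted order. It relies on the fact that adding a taxon never lowers the parsimony score, so a partial tree already worse than the best complete tree can be discarded together with all of its extensions. All function names are illustrative.

```python
def site_changes(node, seqs, site):
    """Fitch pass for one site: return (state_set, changes) for the subtree."""
    if not isinstance(node, tuple):
        return {seqs[node][site]}, 0
    left, cl = site_changes(node[0], seqs, site)
    right, cr = site_changes(node[1], seqs, site)
    common = left & right
    if common:
        return common, cl + cr
    return left | right, cl + cr + 1


def fitch_score(tree, seqs):
    """Unit-cost parsimony score summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(site_changes(tree, seqs, s)[1] for s in range(n_sites))


def insertions(tree, leaf):
    """Yield trees with leaf attached on each edge, including above the root."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def branch_and_bound(seqs, upper_bound=float("inf")):
    """Return (best_score, list_of_optimal_trees) for at least 3 taxa."""
    taxa = sorted(seqs)
    best = {"score": upper_bound, "trees": []}

    def extend(tree, remaining):
        score = fitch_score(tree, seqs)
        if score > best["score"]:
            return  # bound: no extension of this partial tree can be better
        if not remaining:
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(tree)
            return
        for subtree in insertions(tree[1], remaining[0]):
            extend((tree[0], subtree), remaining[1:])

    extend((taxa[0], (taxa[1], taxa[2])), taxa[3:])
    return best["score"], best["trees"]


# The five sequences from the "Extension to Many Letters" slide.
seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA", "D": "TGCACT", "E": "TGCGTA"}
score, trees = branch_and_bound(seqs)
print(score, len(trees))
```

Passing a tree length obtained from a quick heuristic or distance method as `upper_bound` tightens the bound from the start, which is the enhancement mentioned on the slide.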

  37. Finding most parsimonious trees: branch and bound [Figure: search tree of partial trees. The three-taxon tree (A, B, C) is extended with D to give trees B1, B2, B3; each of these is extended with E to give trees C1.1-C1.5, C2.1-C2.5, and C3.1-C3.5.]

  38. Finding most parsimonious trees - heuristics • The number of possible trees grows faster than exponentially with the number of taxa, making exhaustive searches impractical for many data sets (finding a most parsimonious tree is an NP-hard problem) • Heuristic methods are used to search tree space for most parsimonious trees • The trees found are not guaranteed to be the most parsimonious - they are best guesses

  39. Finding most parsimonious trees - heuristics • Stepwise addition • As-is - the order in the distance matrix • Closest - starts with the shortest 3-taxon tree and adds taxa in the order that produces the least increase in tree length • Simple - the first taxon in the matrix is taken as a reference - taxa are added to it in order of decreasing similarity to the reference • Random - taxa are added in a random sequence; many different sequences can be used • Recommend random with as many (e.g. 10-100) addition sequences as practical

  40. Finding most parsimonious trees - heuristics. Branch Swapping: • Nearest neighbor interchange (NNI) • Subtree pruning and regrafting (SPR) • Tree bisection and reconnection (TBR)

  41. Finding most parsimonious trees - heuristics 1. Nearest neighbor interchange (NNI) [Figure: a tree on taxa A-G and two rearrangements of it obtained by exchanging subtrees across one internal branch.]

  42. Finding most parsimonious trees - heuristics 2. Subtree pruning and regrafting (SPR) [Figure: a tree on taxa A-G; a subtree is pruned and regrafted elsewhere on the remaining tree.]

  43. Finding most parsimonious trees - heuristics 3. Tree bisection and reconnection (TBR) [Figure: a tree on taxa A-G is bisected into two subtrees, which are then reconnected by a new branch.]

  44. Finding most parsimonious trees - heuristics - summary • Branch Swapping • Nearest neighbor interchange (NNI) • Subtree pruning and regrafting (SPR) • Tree bisection and reconnection (TBR) • The nature of heuristic searches means we cannot know which method will find the most parsimonious trees or all such trees. • However, TBR is the most extensive swapping routine and its use with multiple random addition sequences should work well.
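As an illustration of branch swapping, the sketch below generates NNI neighbors of a rooted binary tree written as nested 2-tuples and uses them in a greedy descent on the unit-cost parsimony score. It is a simplification under stated assumptions: only NNI is shown (SPR and TBR explore larger neighborhoods), the rooted representation can list the same unrooted topology more than once, and the starting tree is arbitrary. All names are illustrative.

```python
def site_changes(node, seqs, site):
    """Fitch pass for one site: return (state_set, changes) for the subtree."""
    if not isinstance(node, tuple):
        return {seqs[node][site]}, 0
    left, cl = site_changes(node[0], seqs, site)
    right, cr = site_changes(node[1], seqs, site)
    common = left & right
    return (common, cl + cr) if common else (left | right, cl + cr + 1)


def fitch_score(tree, seqs):
    """Unit-cost parsimony score summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(site_changes(tree, seqs, s)[1] for s in range(n_sites))


def nni_neighbors(tree):
    """Yield trees one nearest neighbor interchange away from tree."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    if isinstance(left, tuple):   # swap right with a child of left
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if isinstance(right, tuple):  # swap left with a child of right
        c, d = right
        yield ((left, c), d)
        yield ((left, d), c)
    for t in nni_neighbors(left):
        yield (t, right)
    for t in nni_neighbors(right):
        yield (left, t)


def hill_climb(tree, score):
    """Move to the best neighbor until no neighbor improves the score."""
    current, current_score = tree, score(tree)
    while True:
        best = min(nni_neighbors(current), key=score, default=None)
        if best is None or score(best) >= current_score:
            return current, current_score  # local (not necessarily global) minimum
        current, current_score = best, score(best)


# The five sequences from the "Extension to Many Letters" slide.
seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA", "D": "TGCACT", "E": "TGCGTA"}
start = (((("A", "D"), "B"), "E"), "C")  # arbitrary starting tree
final_tree, final_score = hill_climb(start, lambda t: fitch_score(t, seqs))
print(fitch_score(start, seqs), final_score)
```

Because the descent stops at the first tree with no better neighbor, repeating it from several different starting trees (for example, from random addition sequences) reduces the chance of being trapped in a local minimum.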

  45. Tree space may be populated by local minima and islands of most parsimonious trees [Figure: tree length plotted across tree space. Three random addition sequence replicates are each followed by branch swapping; two end in local minima (FAILURE) and one reaches the global minimum (SUCCESS).]

  46. Multiple most parsimonious trees • Many parsimony analyses yield multiple equally optimal trees • Multiple trees are due to either: • Alternative equally parsimonious optimizations of homoplastic characters • Missing data • Or both • We can further select among these trees with additional criteria, but • Most commonly relationships common to all the optimal trees are summarized with consensus trees

  47. Consensus methods • A consensus tree is a summary of the agreement among a set of fundamental trees • There are many different consensus methods that differ in: • 1. the kind of agreement • 2. the level of agreement • Consensus methods can be used with any type of tree - not just parsimony trees

  48. Strict consensus methods • Strict consensus methods require agreement across all the fundamental trees • They show only those relationships that are unambiguously supported by the parsimonious interpretation of the data • The commonest method (strict component consensus) focuses on clades • This method produces a consensus tree that includes all and only those clades found in all the fundamental trees • Other relationships (those in which the fundamental trees disagree) are shown as unresolved polytomies
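The strict component consensus can be sketched by collecting the clades of each fundamental tree and keeping only those present in all of them. Trees are assumed to be rooted and written as nested tuples; the two example trees are hypothetical and are not the ones from the lecture figure.

```python
def clades(tree):
    """Set of clades (frozensets of leaf names) defined by the internal nodes."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def strict_consensus_clades(trees):
    """Clades found in every fundamental tree."""
    return set.intersection(*(clades(t) for t in trees))


# Two hypothetical fundamental trees that disagree inside the clade D-G.
t1 = ((("A", "B"), "C"), (("D", "E"), ("F", "G")))
t2 = ((("A", "B"), "C"), (("D", "F"), ("E", "G")))

shared = strict_consensus_clades([t1, t2])
print(sorted("".join(sorted(c)) for c in shared))
# ['AB', 'ABC', 'ABCDEFG', 'DEFG']
```

The surviving clades describe the consensus tree (((A,B),C),(D,E,F,G)): the grouping of D, E, F, and G is retained, but its internal resolution collapses into a polytomy because the two trees disagree there.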

  49. Strict consensus methods [Figure: two fundamental trees on taxa A-G and their strict component consensus tree.]

  50. Majority-rule consensus methods • Majority-rule consensus methods require agreement across a majority of the fundamental trees • May include relationships that are not supported by the most parsimonious interpretation of the data • The commonest method focuses on clades • This method produces a consensus tree that includes all and only those clades found in a majority (>50%) of the fundamental trees • Other relationships are shown as unresolved polytomies • Of particular use in bootstrapping
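A majority-rule variant can be sketched by counting how many fundamental trees contain each clade and keeping those above 50%. As before, trees are assumed to be rooted nested tuples, and the three example trees are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Set of clades (frozensets of leaf names) defined by the internal nodes."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def majority_rule_clades(trees):
    """Map each clade found in more than half of the trees to its frequency."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c: n / len(trees) for c, n in counts.items() if n > len(trees) / 2}


# Three hypothetical fundamental trees.
t1 = ((("A", "B"), "C"), (("D", "E"), ("F", "G")))
t2 = ((("A", "B"), "C"), (("D", "F"), ("E", "G")))
t3 = ((("A", "C"), "B"), (("D", "E"), ("F", "G")))

majority = majority_rule_clades([t1, t2, t3])
print(sorted("".join(sorted(c)) for c in majority))
# ['AB', 'ABC', 'ABCDEFG', 'DE', 'DEFG', 'FG']
```

The clades AB, DE, and FG each appear in two of the three trees, so they are kept here although a strict consensus would drop them. In a bootstrap analysis the stored frequencies play the role of support values.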
