Phylogenies • Preliminaries • Distance-based methods • Parsimony methods
Phylogenetic Trees • A hypothesis about the evolutionary relationships among organisms • Can be rooted or unrooted [Figure: an unrooted tree and a rooted tree on taxa A to E; the rooted tree shows the root and a time axis] CS/BIO 271 - Introduction to Bioinformatics
Tree proliferation
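The slide's figure is not recoverable from the text. A standard combinatorial result makes the proliferation concrete: for n labeled species there are (2n - 5)!! distinct unrooted binary trees and (2n - 3)!! rooted ones, where k!! = k(k - 2)(k - 4)...1. A minimal Python sketch (the function names are illustrative, not from the course):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ...; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n: int) -> int:
    """Unrooted binary trees with n labeled leaves: (2n - 5)!! for n >= 3."""
    return double_factorial(2 * n - 5) if n >= 3 else 1


def count_rooted_trees(n: int) -> int:
    """Rooted binary trees with n labeled leaves: (2n - 3)!! for n >= 2."""
    return double_factorial(2 * n - 3) if n >= 2 else 1


for n in (4, 5, 10, 20):
    print(f"{n:>2} species: {count_unrooted_trees(n):,} unrooted, "
          f"{count_rooted_trees(n):,} rooted")
```

Already at 10 species there are 2,027,025 unrooted topologies, which is why exhaustive enumeration quickly becomes impractical.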
Molecular phylogenetics • Specific genomic sequence variations (alleles) are much more reliable characters for inferring relationships than phenotypic characteristics • More than one gene should be considered, since the history of a single gene need not match that of the species
An ongoing dialectic • Pheneticists tend to prefer distance-based methods, as these emphasize the relationships among data sets rather than the evolutionary paths taken to arrive at their current states. • Cladists are generally more interested in evolutionary pathways and tend to prefer evolution-based approaches such as maximum parsimony.
Distance matrix methods
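The slides do not say how the distances in their examples were computed. One simple, common choice for aligned sequences is the p-distance, the fraction of sites at which two sequences differ; the sketch below assumes it, and its function names are hypothetical. The p-distance does not correct for multiple substitutions at the same site (corrections such as Jukes-Cantor exist for that).

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned positions at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Return (names, matrix) where matrix[i][j] is the p-distance of names[i] and names[j]."""
    names = list(seqs)
    matrix = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix


# Sequences I and II from the parsimony slide of this deck (they differ at 1 of 7 sites).
names, matrix = distance_matrix({"I": "GCGGACG", "II": "GTGGACG"})
print(names, matrix)
```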
UPGMA (Unweighted Pair Group Method with Arithmetic mean) • Similar to average-link clustering • Merge the closest two groups • Set the distance from the new, merged group to every other group to the average of the two previous groups' distances, weighted by the number of species in each • Repeat until all species are joined
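A minimal Python sketch of the loop above, returning only the tree topology in Newick form (branch heights are omitted). It assumes the size-weighted average that makes UPGMA equivalent to average-link clustering. The distance matrix is hypothetical: the slides' own matrices were images, so these values were chosen only so that the merges happen in the same order as in Steps 1 to 4.

```python
from itertools import combinations


def upgma(labels: list[str], matrix: list[list[float]]) -> str:
    """Cluster taxa with UPGMA and return the topology as a Newick string (no branch lengths)."""
    # Active clusters, keyed by their Newick string, with the number of taxa each contains.
    size = {name: 1 for name in labels}
    # Distance between every pair of active clusters, keyed by an unordered pair.
    dist = {frozenset((labels[i], labels[j])): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}

    while len(size) > 1:
        # Merge the closest two groups.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance from the merged group to each other group: size-weighted average.
        for other in size:
            if other not in pair:
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / (size[a] + size[b])
        # Retire the two old groups and every distance that referred to them.
        size[merged] = size.pop(a) + size.pop(b)
        dist = {key: d for key, d in dist.items() if not key & pair}

    return next(iter(size))


# Hypothetical distances, chosen so the merges follow the slides:
# D & E, then A & C, then B joins AC, then ABC joins DE.
taxa = ["A", "B", "C", "D", "E"]
d = [[0, 8, 4, 12, 12],
     [8, 0, 8, 12, 12],
     [4, 8, 0, 12, 12],
     [12, 12, 12, 0, 2],
     [12, 12, 12, 2, 0]]
print(upgma(taxa, d))  # (((A,C),B),(D,E))
```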
UPGMA Step 1 • Merge D & E
UPGMA Step 2 • Merge A & C
UPGMA Steps 3 & 4 • Merge B & AC • Merge ABC & DE • Final tree (Newick): (((A,C),B),(D,E))
Parsimony approaches • Belong to the broader class of character-based methods of phylogenetics • Favor simpler evolutionary pathways (those invoking fewer substitutions), on the grounds that they are more likely • Example: I: GCGGACG and II: GTGGACG differ only at position 2, so the most parsimonious ancestral state there is C or T, and either choice requires one substitution
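The "C or T" ancestral state above is exactly what the bottom-up pass of Fitch's algorithm computes: at each internal node, take the intersection of the children's candidate states if it is non-empty, otherwise take their union and count one substitution. The slides do not name an algorithm; this is one standard technique, sketched here with a hypothetical tree representation in which a tree is either a leaf name or a `(left, right)` pair.

```python
def fitch(tree, states):
    """Bottom-up pass of Fitch's algorithm at one site.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees (rooted, binary)
    states -- maps each leaf name to its character at this site
    Returns (most-parsimonious candidate states at this node, substitutions below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree: no extra substitution is needed here.
        return common, left_cost + right_cost
    # The children disagree: either state works, at the price of one substitution.
    return left_set | right_set, left_cost + right_cost + 1


seq_i, seq_ii = "GCGGACG", "GTGGACG"
for pos, (x, y) in enumerate(zip(seq_i, seq_ii), start=1):
    root_states, cost = fitch(("I", "II"), {"I": x, "II": y})
    if cost:
        print(f"position {pos}: ancestor in {sorted(root_states)}, {cost} substitution")
```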
Informative and uninformative sites • For positions 5 & 6, it is possible to select more parsimonious trees, i.e., trees that invoke fewer substitutions • At uninformative sites every tree invokes the same number of substitutions, so those sites cannot discriminate between trees
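The slide's alignment is not in the text, so this sketch uses a hypothetical four-sequence alignment. It applies the usual rule: a site is parsimony-informative when at least two different characters each occur in at least two sequences; otherwise every tree needs the same number of substitutions there.

```python
from collections import Counter


def is_informative(column: list[str]) -> bool:
    """True if at least two different characters each occur in at least two sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment: list[str]) -> list[int]:
    """1-based positions of the parsimony-informative sites in an alignment."""
    return [pos + 1 for pos in range(len(alignment[0]))
            if is_informative([seq[pos] for seq in alignment])]


# Hypothetical alignment of four sequences (S1..S4).
alignment = ["ACGTACG",   # S1
             "ACGTTCG",   # S2
             "ATGTACA",   # S3
             "ATGTTGA"]   # S4
print(informative_sites(alignment))  # [2, 5, 7]; position 6 varies but is uninformative
```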
Parsimony methods • Enumerate all possible trees • Note the number of substitution events invoked by each possible tree • Substitutions can be weighted, e.g., by transition/transversion probabilities • Select the most parsimonious tree
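A sketch of exhaustive parsimony for a handful of taxa. Trees are built by stepwise addition (each new taxon is attached to every branch in turn) and written as nested pairs with the first taxon as an arbitrary root; under Fitch's unweighted counting the score does not depend on where the root is placed. The helper names and the four-sequence alignment are hypothetical, and substitution weighting (e.g. transitions vs. transversions) is not modeled; that would need a weighted method such as Sankoff's algorithm.

```python
def fitch_cost(tree, states):
    """Bottom-up Fitch pass at one site: (candidate states, substitutions) for `tree`."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_cost(tree[0], states), fitch_cost(tree[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def parsimony_score(tree, seqs):
    """Total substitutions over all aligned sites."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_cost(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))


def insertions(tree, leaf):
    """Every tree obtained by attaching `leaf` to one branch of `tree` (including above its root)."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def unrooted_trees(taxa):
    """All unrooted binary trees on `taxa` (at least two), each as (subtree on taxa[1:], taxa[0])."""
    def rooted(leaves):
        if len(leaves) == 1:
            yield leaves[0]
            return
        for t in rooted(leaves[:-1]):
            yield from insertions(t, leaves[-1])
    for subtree in rooted(taxa[1:]):
        yield (subtree, taxa[0])


def most_parsimonious(seqs):
    """Score every tree and return (best score, list of trees achieving it)."""
    scored = [(parsimony_score(t, seqs), t) for t in unrooted_trees(list(seqs))]
    best = min(score for score, _ in scored)
    return best, [t for score, t in scored if score == best]


seqs = {"S1": "ACGTACG", "S2": "ACGTTCG", "S3": "ATGTACA", "S4": "ATGTTGA"}
print(most_parsimonious(seqs))  # (5, [(('S2', ('S3', 'S4')), 'S1')])
```

The winning tree groups S1 with S2 and S3 with S4; the other two four-taxon topologies need 6 and 7 substitutions.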
Branch and Bound methods • Key problem: the number of possible trees grows enormously as the number of species increases • Branch and bound: a technique that allows large numbers of candidate trees to be discarded quickly, without evaluating each one • Requires a “good guess” at (an upper bound on) the cost of the best tree
Branch and Bound for TSP [Figure: weighted graph of cities A, B, C, D, E, F, G] • Find a minimum-cost round trip that visits every other city exactly once and returns to the start • NP-hard (its decision version is NP-complete) • Greedy approach: A, G, E, F, B, D, C, A = 251
Search all possible paths, abandoning any partial path whose cost already reaches the best estimate • Best estimate (greedy tour): 251
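The graph on the TSP slide cannot be reconstructed from the text, so this sketch uses a hypothetical 5-city distance matrix. It seeds the search with a greedy (nearest-neighbour) tour as the best estimate, then extends partial paths depth-first and abandons any whose cost already reaches the best tour found so far. Using the partial path's cost as the bound assumes non-negative distances; practical solvers use tighter lower bounds. `brute_force` is included only to check the result.

```python
from itertools import permutations


def tour_cost(dist, tour):
    """Total length of a tour given as a list of city indices."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))


def greedy_tour(dist, start=0):
    """Nearest-neighbour heuristic: always move to the closest unvisited city."""
    tour, unvisited = [start], set(range(len(dist))) - {start}
    while unvisited:
        here = tour[-1]
        nearest = min(sorted(unvisited), key=lambda c: dist[here][c])
        tour.append(nearest)
        unvisited.remove(nearest)
    tour.append(start)
    return tour_cost(dist, tour), tour


def branch_and_bound(dist, start=0):
    """Exact TSP by depth-first search, seeded with the greedy tour as the best estimate."""
    n = len(dist)
    best_cost, best_tour = greedy_tour(dist, start)

    def extend(path, cost):
        nonlocal best_cost, best_tour
        if cost >= best_cost:          # bound: cannot beat the best tour found so far
            return
        if len(path) == n:             # every city visited: close the loop
            total = cost + dist[path[-1]][start]
            if total < best_cost:
                best_cost, best_tour = total, path + [start]
            return
        for city in range(n):          # branch: try each unvisited city next
            if city not in path:
                extend(path + [city], cost + dist[path[-1]][city])

    extend([start], 0)
    return best_cost, best_tour


def brute_force(dist, start=0):
    """Check every tour explicitly (only practical for a handful of cities)."""
    others = [c for c in range(len(dist)) if c != start]
    tours = ([start, *p, start] for p in permutations(others))
    return min((tour_cost(dist, t), t) for t in tours)


# Hypothetical symmetric distances between 5 cities (not the graph on the slide).
D = [[0, 2, 9, 10, 7],
     [2, 0, 6, 4, 3],
     [9, 6, 0, 8, 5],
     [10, 4, 8, 0, 6],
     [7, 3, 5, 6, 0]]
print(greedy_tour(D))       # (28, [0, 1, 4, 2, 3, 0])
print(branch_and_bound(D))  # (26, [0, 1, 3, 2, 4, 0])
```

On this matrix the greedy tour costs 28 while the optimum is 26, so the initial estimate is replaced during the search.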
Parsimony: Branch and Bound • Use the parsimony score of the UPGMA tree as the initial best estimate of the minimum cost (most parsimonious) tree • Use branch and bound to explore all feasible trees, discarding any partial tree whose cost already exceeds the best estimate • Replace the best estimate as better trees are found • Choose the most parsimonious tree
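A sketch of the same idea for parsimony, under stated assumptions: taxa are added one at a time (stepwise addition), and a partial tree is abandoned once its Fitch score exceeds the best estimate, which is valid because adding a taxon never lowers the score. The slides seed the estimate with the UPGMA tree; here the caller simply passes that score as the hypothetical parameter `initial_bound` (the value 6 below is a stand-in, not a computed UPGMA score), and the alignment is a small hypothetical one. Ties with the bound are kept, so all equally parsimonious trees are returned.

```python
import math


def fitch_cost(tree, states):
    """Bottom-up Fitch pass at one site: (candidate states, substitutions) for `tree`."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_cost(tree[0], states), fitch_cost(tree[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def parsimony_score(tree, seqs):
    """Substitutions over all sites; taxa not yet placed in `tree` are simply ignored."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_cost(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))


def insertions(tree, leaf):
    """Every tree obtained by attaching `leaf` to one branch of `tree` (including above its root)."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def branch_and_bound(seqs, initial_bound=math.inf):
    """Most parsimonious unrooted trees; initial_bound is the score of some full tree, or inf."""
    taxa = list(seqs)
    best_score, best_trees = initial_bound, []

    def grow(subtree, k):
        # `subtree` holds taxa[1:k]; the unrooted tree is written as (subtree, taxa[0]).
        nonlocal best_score, best_trees
        tree = (subtree, taxa[0])
        score = parsimony_score(tree, seqs)
        if score > best_score:         # bound: no completion can match or beat the best
            return
        if k == len(taxa):
            if score < best_score:     # strictly better: restart the list of best trees
                best_score, best_trees = score, []
            best_trees.append(tree)
            return
        for bigger in insertions(subtree, taxa[k]):   # branch: place the next taxon
            grow(bigger, k + 1)

    grow(taxa[1], 2)
    return best_score, best_trees


seqs = {"S1": "ACGTACG", "S2": "ACGTTCG", "S3": "ATGTACA", "S4": "ATGTTGA"}
print(branch_and_bound(seqs))                   # (5, [(('S2', ('S3', 'S4')), 'S1')])
print(branch_and_bound(seqs, initial_bound=6))  # same result, with more pruning
```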
Parsimony example • Position 5: [Figure] • Etc.