CS 177 Phylogenetics II
• Tree building methods: some examples
• Assessing phylogenetic data
• Popular phylogenetic software packages
Phylogenetics II: Disclaimers
Before describing any theoretical or practical aspects of phylogenetics, some disclaimers are necessary. This area of computational biology is an intellectual minefield! Neither the theory nor the practical application of any of its algorithms is universally accepted throughout the scientific community. Applying different software packages to the same data set is very likely to give different answers, and minor changes to a data set can also change the result profoundly.
Phylogenetics II: Are there correct trees?
Despite all these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets.
Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid?
Unfortunately, it is impossible ever to state conclusively which tree is the "true" tree for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.
Phenetics versus cladistics
• Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity
• Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters)
Tree building methods
• Data type: genetic distance / character state
• Computational method: optimality criterion / clustering algorithm
Tree building (distance based): UPGMA
• The simplest of the distance methods is UPGMA (Unweighted Pair Group Method using Arithmetic averages)
• Many multiple alignment programs, such as PILEUP, use a variant of UPGMA to create a dendrogram of DNA sequences, which is then used to guide the multiple alignment algorithm
[Figure slides: UPGMA (six slides, images not reproduced); the final slide labels the Root of the tree]
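The UPGMA procedure can be sketched in a few lines of Python. This is a minimal illustration, not PILEUP's variant: it assumes a complete, symmetric distance matrix stored as a dict keyed by `frozenset` pairs (a layout chosen here for brevity), breaks ties by taking the first closest pair found, and reports node heights rather than individual branch lengths.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA clustering.

    names: taxon labels; dist: dict mapping frozenset({x, y}) -> distance.
    Returns (tree, root_height) where tree is a nested tuple of labels.
    """
    # Active clusters: key -> (subtree, number of taxa, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = dict(dist)  # working copy; distances to merged clusters are added as we go
    while len(clusters) > 1:
        # 1. Pick the pair of active clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        d_ab = d[frozenset((a, b))]
        tree_a, n_a, _ = clusters.pop(a)
        tree_b, n_b, _ = clusters.pop(b)
        key = f"({a},{b})"  # unique string key for the merged cluster
        # 2. Distance to every other cluster = average over all taxon pairs,
        #    i.e. the size-weighted mean of the two old distances.
        for c in clusters:
            d[frozenset((key, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        # 3. UPGMA assumes a molecular clock: the new node sits at height d_ab / 2.
        clusters[key] = ((tree_a, tree_b), n_a + n_b, d_ab / 2)
    tree, _, height = next(iter(clusters.values()))
    return tree, height


# Hypothetical distance matrix for four taxa.
pairs = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
         ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}
dist = {frozenset(p): v for p, v in pairs.items()}
tree, height = upgma(["A", "B", "C", "D"], dist)
print(tree, height)  # ('D', ('C', ('A', 'B'))) 3.0
```

Averaging over all taxon pairs (rather than simply averaging the two old distances) is what the "unweighted" in UPGMA refers to: every original taxon counts equally, regardless of how the clusters were built.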
Maximum Parsimony (MP)
• Parsimony involves evaluating all possible trees for each vertical column of sequence characters (nucleotide position)
• Only informative sites are considered
• Each tree is given a score based on the number of evolutionary changes needed to explain the observed data
• Finally, the trees that require the smallest number of changes (the shortest trees) over all sequence positions are identified
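Scoring a single tree under parsimony is commonly done with Fitch's algorithm; the slide does not name an algorithm, so this choice is an assumption. The sketch below scores a fixed, fully resolved tree written as nested tuples, counts only parsimony-informative sites (at least two states, each present in at least two taxa), and compares the three possible unrooted trees for four taxa. The alignment is made up for illustration.

```python
from collections import Counter


def fitch_site(tree, states):
    """Fitch's algorithm for one site.

    tree: nested 2-tuples of taxon names; states: dict taxon -> character state.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def is_informative(column):
    """Parsimony-informative: at least two states, each present in at least two taxa."""
    return sum(1 for count in Counter(column).values() if count >= 2) >= 2


def parsimony_length(tree, alignment):
    """Tree length summed over informative sites only."""
    taxa = list(alignment)
    length = 0
    for i in range(len(alignment[taxa[0]])):
        column = [alignment[t][i] for t in taxa]
        if is_informative(column):
            length += fitch_site(tree, dict(zip(taxa, column)))[1]
    return length


# Hypothetical four-taxon alignment; sites 3 and 4 are uninformative.
aln = {"A": "ACGTA", "B": "ACGTT", "C": "GTGTA", "D": "GTGCT"}
# The three unrooted four-taxon trees, each written with an arbitrary root.
trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
for tree in trees:
    print(tree, parsimony_length(tree, aln))  # lengths 4, 5, 6: AB|CD is shortest
```

The Fitch count does not depend on where the tree is rooted, so writing an unrooted tree as nested tuples is harmless. Dropping uninformative sites changes the absolute lengths but not which tree is shortest, because such sites require the same number of changes on every tree.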
Maximum Likelihood (ML)
• Maximum likelihood uses probability calculations based on a specific model of sequence evolution to find the tree that best accounts for the variation in a set of sequences
• All possible trees are considered for each nucleotide position
• The fewer mutations needed to fit a tree to the data, the more likely the tree
• ML resembles MP in that the tree with the fewest changes will often be the most likely
• However, ML evaluates trees using explicit evolutionary models
• Thus, the method can be used to explore relationships among more diverse taxa
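The likelihood of one tree can be illustrated with Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model, the simplest explicit model of sequence evolution. This is a deliberately small sketch: branch lengths (in expected substitutions per site) are supplied by hand rather than optimized, and the tree layout (a taxon name, or a tuple of `(child, branch_length)` pairs) is an assumption made for brevity. A real ML program would also optimize branch lengths and search over topologies.

```python
import math

BASES = "ACGT"


def jc69_prob(same, t):
    """JC69 probability of ending in the same base (same=True) or in one
    specific different base (same=False) after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, column):
    """Felsenstein pruning: P(data below node | node has base b) for each b.

    node is a taxon name (leaf) or a tuple of (child, branch_length) pairs;
    column maps taxon name -> base at this site.
    """
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihoods(child, column)
        for i in range(4):
            # Sum over the unknown base j at the child, then multiply across children.
            result[i] *= sum(jc69_prob(i == j, t) * child_l[j] for j in range(4))
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods; JC69 assumes equal base frequencies (1/4)."""
    taxa = list(alignment)
    total = 0.0
    for k in range(len(alignment[taxa[0]])):
        column = {name: alignment[name][k] for name in taxa}
        total += math.log(sum(0.25 * x for x in partial_likelihoods(tree, column)))
    return total


# Hypothetical alignment and branch lengths.
aln = {"A": "ACGTA", "B": "ACGTT", "C": "GTGTA", "D": "GTGCT"}
tree4 = (((("A", 0.1), ("B", 0.1)), 0.05), ((("C", 0.1), ("D", 0.1)), 0.05))
print(log_likelihood(tree4, aln))
```

Because JC69 is time-reversible, the position of the root does not affect the likelihood, so an unrooted tree can be evaluated from any convenient root.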
Computational methods for finding optimal trees: possible evolutionary trees
Number of possible unrooted trees for n taxa (n ≥ 3): $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

Taxa (n)    Unrooted trees
2           1
3           1
4           3
5           15
6           105
7           945
8           10,395
9           135,135
10          2,027,025
...         ...
30          8.69 x 10^36
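The table can be reproduced directly from the formula. Python's arbitrary-precision integers keep the count exact even for 30 taxa; returning 1 for n = 2 follows the table's convention, since the formula itself applies only for n ≥ 3.

```python
from math import factorial


def unrooted_trees(n):
    """Number of distinct unrooted, fully resolved trees for n labelled taxa."""
    if n < 3:
        return 1  # convention used in the table for n = 2
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in list(range(2, 11)) + [30]:
    print(n, f"{unrooted_trees(n):,}")  # e.g. the row for n = 7 prints 945
```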
Computational methods for finding optimal trees
• Exact algorithms
  • “Guarantee” to find the optimal or “best” tree for the method of choice
  • Two types are used in tree building:
    • Exhaustive search: evaluates all possible unrooted trees, choosing the one with the best score for the method
    • Branch-and-bound search: eliminates parts of the search tree that contain only suboptimal solutions
• Heuristic algorithms
  • Approximate or “quick-and-dirty” methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so
  • Often operate by “hill-climbing” methods
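Exhaustive search needs a way to generate every unrooted tree. One standard construction, stepwise addition (assumed here rather than taken from the slides), starts from the single three-taxon tree and inserts each further taxon on every existing edge. Since an n-taxon tree has 2n - 3 edges, adding the k-th taxon multiplies the number of trees by 2k - 5, which reproduces the (2n - 5)!/(2^(n-3)(n-3)!) count. The `score` argument of `exhaustive_search` is a hypothetical callable (for example a parsimony length computed on the edge set).

```python
def enumerate_unrooted(taxa):
    """Return every unrooted binary tree on `taxa` (at least 3) as a frozenset of edges.

    Internal nodes are named "n0", "n1", ...; taxon names must not clash with these.
    """
    start = frozenset(frozenset(("n0", t)) for t in taxa[:3])  # the only 3-taxon tree
    trees = [start]
    for k, taxon in enumerate(taxa[3:], start=1):
        node = f"n{k}"  # new internal node created when this taxon is inserted
        grown = []
        for tree in trees:
            for edge in tree:
                x, y = tuple(edge)
                # Split edge x-y with the new node and attach the new taxon to it.
                grown.append((tree - {edge}) | {frozenset((x, node)),
                                                frozenset((node, y)),
                                                frozenset((node, taxon))})
        trees = grown
    return trees


def exhaustive_search(taxa, score):
    """Score every tree and keep the best (lowest); feasible only for few taxa."""
    return min(enumerate_unrooted(taxa), key=score)


all_trees = enumerate_unrooted(list("ABCDEF"))
print(len(all_trees))  # 105
```

The explosive growth of this list is exactly why branch-and-bound and heuristic searches are needed beyond roughly a dozen taxa.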
Heuristic algorithms
• Heuristic search algorithms are input-order dependent and can get stuck in local minima or maxima
• Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima
[Figure: two search landscapes, "Search for global maximum" (GLOBAL MAXIMUM vs. local maximum) and "Search for global minimum" (GLOBAL MINIMUM vs. local minimum)]
From NHGRI lecture, C.-B. Stewart
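The local-optimum problem is easy to see on a toy landscape. The sketch below is only an analogy and its setup is assumed: positions in a list stand in for trees, adjacent positions stand in for trees one rearrangement apart, and a random starting point stands in for a different taxon input order. Greedy ascent stops at whichever peak is nearest; restarting from several random points and keeping the best end point is what rescues the search.

```python
import random


def hill_climb(scores, start):
    """Greedy ascent: move to the better neighbour until no neighbour improves."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        best = max(neighbours, key=lambda j: scores[j])
        if scores[best] <= scores[i]:
            return i  # a peak, but possibly only a local one
        i = best


def restarted_search(scores, n_starts, seed=0):
    """Run hill_climb from several random starts and keep the best end point."""
    rng = random.Random(seed)
    ends = [hill_climb(scores, rng.randrange(len(scores))) for _ in range(n_starts)]
    return max(ends, key=lambda i: scores[i])


# Hypothetical landscape: local maximum (score 5) at index 2, global maximum (9) at index 6.
scores = [1, 3, 5, 4, 2, 6, 9, 7, 3]
print(hill_climb(scores, 0))  # 2: stuck on the local maximum
print(restarted_search(scores, 20))
```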
Assessing Phylogenetic Data
Most data include potentially misleading evidence of relationships. One should not only construct phylogenetic hypotheses but also assess what ‘confidence’ can be placed in them.
Questions:
• How much support is there for a particular clade?
• Is there signal in the data?
Assessing Phylogenetic Data: how much support is there for a particular clade?
Bootstrapping/jackknifing: many randomized data sets are produced by sampling the real data with replacement (or, in jackknifing, by removing some random proportion of the data); the frequencies with which groups occur are a measure of support for those groups.
Problems:
- Bootstrap proportions aren’t easily interpretable
- They give no indication of how good the data are, only of how well the tree fits the data
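Bootstrap and jackknife resampling of alignment columns can be sketched as follows. The tree-building step is left as a hypothetical callable, `clades_of_tree`, standing in for any method (UPGMA, parsimony, ML) that returns the clades of its tree as frozensets of taxon names. The 100 replicates and the 50% jackknife retention rate are illustrative choices, not values from the slides.

```python
import random
from collections import Counter


def bootstrap_replicate(alignment, rng):
    """Sample columns with replacement, keeping the original alignment length."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(alignment[t][c] for c in cols) for t in taxa}


def jackknife_replicate(alignment, rng, keep_fraction=0.5):
    """Delete a random proportion of columns (sampling without replacement)."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    keep = max(1, round(keep_fraction * n_sites))
    cols = sorted(rng.sample(range(n_sites), keep))
    return {t: "".join(alignment[t][c] for c in cols) for t in taxa}


def clade_support(alignment, clades_of_tree, replicates=100,
                  resample=bootstrap_replicate, seed=1):
    """Fraction of replicate trees that contain each clade."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(clades_of_tree(resample(alignment, rng)))
    return {clade: k / replicates for clade, k in counts.items()}


aln = {"A": "ACGTA", "B": "ACGTT", "C": "GTGTA", "D": "GTGCT"}
# A stand-in "tree builder" that always reports the clade {A, B}.
print(clade_support(aln, lambda a: {frozenset("AB")}))
```

Note that a clade can score 100% support simply because the method always recovers it from these data, which is the point of the second "problem" on the slide: support measures fit of tree to data, not data quality.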
Assessing Phylogenetic Data: is there signal in the data?
Possible approach: random permutations
- Random permutation reduces any correlation among characters to that expected by chance alone
- It preserves the number of taxa, characters and character states in each character (and the theoretical maximum and minimum tree lengths)
[Figure: original structured data with strong correlations among characters vs. randomly permuted data with any correlation among characters due to chance]
Assessing Phylogenetic Data: matrix randomization tests
Compare some measure of data quality/hierarchical structure for the real data and for many randomly permuted data sets. This allows us to define a test statistic for the null hypothesis that the real data are no better structured than randomly permuted, phylogenetically uninformative data.
PTP (permutation tail probability) test
Null hypothesis: the length of the shortest tree is what you would see given random data
How it works: reject the null hypothesis if the real data yield a shorter tree than the permuted data (the real data are more internally consistent than random data)
Comments: even a little bit of signal can lead you to reject the null hypothesis; rejecting it does not by itself demonstrate phylogenetic signal
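A PTP test can be sketched end to end for four taxa, where the shortest tree is found by scoring all three unrooted topologies with Fitch parsimony. Each permutation shuffles the states of every character independently among the taxa, which keeps every character's state frequencies (and hence the minimum and maximum possible tree lengths) while destroying correlations between characters. The 999 permutations, the made-up data set, and the (count + 1)/(N + 1) convention, which counts the real data as one of the data sets, are assumptions for illustration.

```python
import random


def fitch_length(tree, alignment):
    """Fitch parsimony length (all sites) of a binary tree of nested tuples."""
    def visit(node, i):
        if isinstance(node, str):
            return {alignment[node][i]}, 0
        (s1, c1), (s2, c2) = visit(node[0], i), visit(node[1], i)
        return (s1 & s2, c1 + c2) if s1 & s2 else (s1 | s2, c1 + c2 + 1)
    n_sites = len(next(iter(alignment.values())))
    return sum(visit(tree, i)[1] for i in range(n_sites))


def shortest_length(alignment):
    """Exact search for exactly four taxa: score the three unrooted topologies."""
    a, b, c, d = alignment
    trees = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(fitch_length(t, alignment) for t in trees)


def permute_columns(alignment, rng):
    """Shuffle each character's states among the taxa, independently per column."""
    taxa = list(alignment)
    cols = [[alignment[t][i] for t in taxa] for i in range(len(alignment[taxa[0]]))]
    for col in cols:
        rng.shuffle(col)
    return {t: "".join(col[k] for col in cols) for k, t in enumerate(taxa)}


def ptp_test(alignment, n_perm=999, seed=0):
    """Return (real shortest length, PTP p-value)."""
    rng = random.Random(seed)
    real = shortest_length(alignment)
    as_short = sum(shortest_length(permute_columns(alignment, rng)) <= real
                   for _ in range(n_perm))
    return real, (as_short + 1) / (n_perm + 1)


aln = {"A": "ACGTA", "B": "ACGTT", "C": "GTGTA", "D": "GTGCT"}
real_length, p = ptp_test(aln)
print(real_length, p)  # real shortest length is 5; p depends on the permutations
```

With only five characters the test has little power; the sketch is meant to show the mechanics, not to make a claim about these data.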
Popular phylogenetic software packages
Review available at: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: PHYLIP version 3.6, Joe Felsenstein
It is available free from its Web site, as C source code or as executables for pre-386 DOS, 386/486/Pentium DOS, Windows 3.1, Windows95/98/NT, 68k Macintosh, or PowerMac. The C source code is easily compiled on Unix systems, and VMS compilation support is also available in the package. It includes programs to carry out parsimony, distance matrix methods, maximum likelihood, and other methods on a variety of types of data, including DNA and RNA sequences, protein sequences, restriction sites, 0/1 discrete character data, gene frequencies, continuous characters and distance matrices. It is the most widely distributed phylogeny package, with over 7,000 registered users, some of them satisfied. It competes with PAUP* to be the program responsible for the most published trees. It has been distributed since October 1980. PHYLIP is distributed at the PHYLIP web site at http://evolution.genetics.washington.edu, or by anonymous ftp from evolution.genetics.washington.edu in directory pub/phylip.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: PAUP* (Phylogenetic Analysis Using Parsimony and other Methods) version 4.0beta, David Swofford
PAUP* has been released as a provisional version by Sinauer Associates, of Sunderland, Massachusetts. It has Macintosh, PowerMac, Windows, and Unix/OpenVMS versions. PAUP* is the most sophisticated parsimony program, with many options and close compatibility with MacClade. It has become much broader with the inclusion of more methods. It includes parsimony, distance matrix, invariants, and maximum likelihood methods and many indices and statistical tests. It is described in a web page at http://www.sinauer.com/Titles/frswofford.htm, and in more detail at its web site at the LMS at http://www.lms.si.edu/PAUP/about.html. The price is $100 US for the Macintosh and PowerMac executable versions, $85 for the Windows executable version, and $150 for the Unix source code version, plus $20 for shipment.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: MrBayes: Bayesian Inference of Phylogeny
MrBayes is a program for Bayesian inference of phylogeny using Markov chain Monte Carlo methods. Available for Mac, PC, and Unix.
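MrBayes samples trees, branch lengths and model parameters by Markov chain Monte Carlo. The core idea, a Metropolis random walk that accepts or rejects proposed parameter values according to their posterior probability, can be shown on a single parameter: the JC69 distance d between two sequences that differ at k of n sites. The exponential prior, step size, chain length, 10% burn-in and the k = 20, n = 100 example are illustrative assumptions; nothing here reflects MrBayes's actual proposal mechanisms.

```python
import math
import random


def jc_log_likelihood(d, k, n):
    """Log-likelihood (up to a constant) of k differing sites out of n at JC69 distance d."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # expected proportion of differing sites
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_distance(k, n, steps=20000, prior_mean=1.0, step_size=0.05, seed=0):
    """Random-walk Metropolis sampler for d with an exponential prior of the given mean.

    Returns the sampled d values after discarding the first 10% as burn-in.
    """
    rng = random.Random(seed)
    d = 0.1
    log_post = jc_log_likelihood(d, k, n) - d / prior_mean
    samples = []
    for _ in range(steps):
        proposal = d + rng.uniform(-step_size, step_size)  # symmetric proposal
        if proposal > 0:  # the prior is zero for d <= 0, so such proposals are rejected
            new_log_post = jc_log_likelihood(proposal, k, n) - proposal / prior_mean
            # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
            if math.log(1.0 - rng.random()) < new_log_post - log_post:
                d, log_post = proposal, new_log_post
        samples.append(d)
    return samples[steps // 10:]


samples = metropolis_distance(k=20, n=100)
print(sum(samples) / len(samples))  # posterior mean of d
```

The posterior mean should land close to the Jukes-Cantor point estimate for p = 0.2 (about 0.23), since the weak prior contributes little; the chain's spread of sampled values is what a Bayesian analysis reports as uncertainty.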
Popular phylogenetic software packages: MacClade, Wayne Maddison and David Maddison
MacClade is described on its Web page at http://phylogeny.arizona.edu/macclade/macclade.html. A demonstration version of MacClade 3 is also available there. MacClade lets you use the mouse-window interface to specify and rearrange phylogenies by hand, and watch the number of character steps and the distribution of states of a given character on the tree change as you do so. Available for Macintosh only. All distribution is by Sinauer Associates, 23 Plumtree Road, Sunderland, Massachusetts 01375-0407, USA. A disk with program, help file, and example data files, plus the book (about 100 pages of introduction to phylogenetic theory and 250 pages of program instructions), costs $100 U.S. ($40 for the book alone).
All information from: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: RASA, version 2.5, James Lyons-Weiler
Software for Macintoshes that will perform "Relative Apparent Synapomorphy Analysis", a test for the presence of phylogenetic signal in any type of discrete character data matrix (morphological or molecular). The RASA program carries out the test and plots the results. RASA is menu-driven. The test compares the observed and null rates of increase in cladistic similarity among pairs of taxa predicted by an increase in the phenetic similarity among taxon pairs. The programs are available as Macintosh executables from their web page at http://bio.uml.edu/LW/RASA.html.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: TCS version 1.06, Mark Clement and David Posada
A program for estimating gene genealogies within a population. It does so using the method introduced in the paper: Templeton, A. R., K. A. Crandall and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132: 619-633. This method connects existing haplotypes in a minimum spanning tree, which is essentially a parsimony method. It can also infer networks with loops in them. TCS is written in Java and has a graphical user interface for displaying the resulting networks. It may be run on any system that has the Java runtime environment. The program is described in the paper: Clement M., D. Posada, and K. Crandall. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657-1660. TCS is available as Java executables, with documentation, at its web site at http://bioag.byu.edu/zoology/crandall_lab/tcs.htm.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
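A minimum spanning tree over haplotypes can be sketched with Prim's algorithm, using the number of differing sites as the edge weight. This is a plain MST, not the TCS procedure: the method of Templeton et al. has additional rules and can produce networks with loops, which an MST never contains. The four haplotypes are made up.

```python
def hamming(a, b):
    """Number of sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm; returns edges as (connected_haplotype, new_haplotype, steps)."""
    names = list(haplotypes)
    connected = [names[0]]  # grow the tree from the first haplotype
    edges = []
    while len(connected) < len(names):
        # Cheapest edge from the growing tree to any haplotype not yet connected.
        u, v = min(
            ((u, v) for u in connected for v in names if v not in connected),
            key=lambda e: hamming(haplotypes[e[0]], haplotypes[e[1]]),
        )
        edges.append((u, v, hamming(haplotypes[u], haplotypes[v])))
        connected.append(v)
    return edges


haps = {"H1": "AAAA", "H2": "AAAT", "H3": "AATT", "H4": "TAAA"}
for a, b, steps in minimum_spanning_tree(haps):
    print(a, b, steps)  # H1-H2, H1-H4 and H2-H3, each one mutational step
```

Using a list for `connected` keeps the edge order deterministic when several edges tie, which a set would not guarantee.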
Popular phylogenetic software packages: BioEdit, version 4.8.4, Tom Hall
This is a sequence editor with many kinds of general molecular biology functions (alignment, BLAST searches, plasmid drawing, restriction mapping, sequence machine trace viewing, etc.). For our purposes, the feature worth mentioning is that it comes with a number of existing phylogeny programs that can be run automatically from within BioEdit: TreeView, fastDNAml, and six DNA and protein programs from PHYLIP. BioEdit is available as Windows95/98/NT executables from its web site at http://www.mbio.ncsu.edu/RNaseP/info/programs/BIOEDIT/bioedit.html.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: TreeView, Rod Page
A program for displaying trees on Apple Macs and Windows PCs. It can draw rooted and unrooted trees, display bootstrap values, and supports the native font and graphics file formats of both Macs and PCs. The program reads NEXUS, PHYLIP, and Hennig86 style tree files (including files produced by fastDNAml and CLUSTALW), and can save trees in the same formats so that it can convert trees among these formats. TreeView can read up to 100 trees with up to 500 taxa. The program is free, and can be obtained by World Wide Web from http://taxonomy.zoology.gla.ac.uk/rod/treeview.html. It comes in 68K Mac, PowerMac, and Windows 95/NT executable versions (and in a Windows 3.1 executable for version 1.4). There is also online help including an online manual.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
Popular phylogenetic software packages: DnaSP version 3.53, Julio Rozas and Ricardo Rozas
A software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters. It can also carry out several tests of neutrality. Additionally, it can estimate the confidence intervals of some test statistics by coalescent simulation. The results of the analyses are displayed in tabular and graphical form. For phylogenetic purposes, the relevant features are the measures of population divergence, which include the Jukes-Cantor distance, usable as a distance in phylogeny reconstruction. It is distributed as a Windows95/98/NT executable from its web site at http://www.bio.ub.es/~julio/DnaSP.html.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
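The Jukes-Cantor correction converts the observed proportion of differing sites, p, into an estimate of the number of substitutions per site, d = -(3/4) ln(1 - 4p/3), which accounts for multiple hits at the same site. A minimal sketch, assuming gap-free, equal-length aligned sequences; it is not DnaSP's code.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences (no gap handling)."""
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(seq1, seq2):
    """JC69 distance d = -3/4 ln(1 - 4p/3); undefined when p >= 0.75."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jukes_cantor("ACGT", "ACGA"), 4))  # 0.3041 (p = 0.25)
```

The corrected distance is always at least p, and the gap widens as sequences diverge; a matrix of such distances is the input that UPGMA and other distance methods expect.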
Popular phylogenetic software packages: Arlequin version 2.0, Laurent Excoffier
A program for population genetics analysis. It can perform many kinds of population genetic tasks, including estimation of gene frequencies, testing of linkage disequilibrium, and analysis of diversity between populations. For the purposes of this list, the relevant feature is its ability to compute a variety of genetic distance measures, including the Jukes-Cantor distance, the Kimura 2-parameter distance, and the Tamura-Nei distance, each with or without correction for gamma-distributed rates of evolution. It can also compute a Minimum Spanning Tree network. Arlequin has an interactive "front end" written in Java and requires the Java Runtime Environment (available from the Arlequin site for those who do not already have it). The core routines are available as binaries for Windows95/98/NT/2000, for MacOS on the PowerPC processor, and for Linux on Intel-compatible x86 processors. The binaries, Java code, Java Runtime Environment, and a PDF documentation file are available at its web site at http://acasun1.unige.ch/arlequin/.
All information from: http://evolution.genetics.washington.edu/phylip/software.html
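The Kimura 2-parameter distance refines the Jukes-Cantor correction by treating transitions (A<->G, C<->T) and transversions separately. With P and Q the proportions of sites showing a transition or a transversion, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch assumes gap-free aligned DNA and omits the gamma correction mentioned on the slide; it is not Arlequin's implementation.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences (no gaps)."""
    n = len(seq1)
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


s1 = "A" * 12
s2 = "GCT" + "A" * 9  # one transition (A->G), two transversions (A->C, A->T)
print(round(k2p_distance(s1, s2), 4))  # 0.3041
```

In this example transitions make up one third of the differences, which is exactly what the Jukes-Cantor model assumes, so the K2P distance coincides with the Jukes-Cantor distance for p = 0.25. Real data usually show an excess of transitions, and then the two measures diverge.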