The Genome Access Course: Phylogenetic Analysis
Phylogenetics • Developed by Willi Hennig (Grundzüge einer Theorie der phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)
What is the ancestral sequence? • pfeffer • pepper • (pf/p)e(ff/pp)er
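The slash notation above lists the alternative states at each ambiguous position. A small sketch can expand it into every candidate ancestral sequence. It assumes `(a/b)` groups are the only syntax to handle, and the function name `expand` is hypothetical:

```python
import re
from itertools import product

def expand(pattern):
    """Expand '(x/y)' alternation groups into all candidate sequences."""
    # Split into "(…)" groups and the literal text between them.
    parts = [p for p in re.split(r"(\([^)]*\))", pattern) if p]
    options = [p[1:-1].split("/") if p.startswith("(") else [p] for p in parts]
    return ["".join(choice) for choice in product(*options)]

print(expand("(pf/p)e(ff/pp)er"))
# ['pfeffer', 'pfepper', 'peffer', 'pepper']
```

Two ambiguous positions with two states each already give four candidate ancestors. This is why the ancestral state cannot be read directly off the modern sequences.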
Evolutionary Trees • A tree is a connected, acyclic graph • Leaf: Taxon • Node: Vertex • Branch: Edge • Tree length = sum of all branch lengths • Phylogenetic trees are usually binary (bifurcating) trees
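These terms map directly onto a graph data structure. The minimal sketch below assumes branches are stored as `(node, node, length)` tuples and uses hypothetical internal node names X and Y. It checks the tree property and computes tree length. For an unrooted binary tree, every leaf has degree 1 and every internal node has degree 3.

```python
def tree_length(edges):
    """Tree length: the sum of all branch lengths."""
    return sum(length for _, _, length in edges)

def is_binary_unrooted_tree(edges):
    """True if the edges form a connected, acyclic graph whose leaves have
    degree 1 and whose internal nodes have degree 3."""
    adjacency = {}
    for u, v, _ in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    # A connected graph with n vertices and n - 1 edges is acyclic (a tree).
    if len(edges) != len(adjacency) - 1:
        return False
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != len(adjacency):
        return False
    return all(len(nbrs) in (1, 3) for nbrs in adjacency.values())

# Four taxa A-D joined through two hypothetical internal nodes X and Y.
edges = [("A", "X", 0.5), ("B", "X", 1.0), ("X", "Y", 1.5),
         ("Y", "C", 2.0), ("Y", "D", 2.5)]
print(is_binary_unrooted_tree(edges), tree_length(edges))  # True 7.5
```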
Evolutionary Trees • Rooted • Has a common ancestor (the root) • Unique path from the root to any leaf • Directed • Unrooted • Root could be placed on any branch • Fewer possible trees than rooted
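The statement that there are fewer unrooted than rooted trees can be made exact. The standard counts for n labelled taxa are (2n-5)!! unrooted and (2n-3)!! rooted binary topologies. Each unrooted tree can be rooted on any of its 2n-3 branches. A short sketch with hypothetical helper names:

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Distinct unrooted binary topologies for n >= 3 labelled taxa."""
    return double_factorial(2 * n - 5)

def rooted_trees(n):
    """Distinct rooted binary topologies for n >= 2 labelled taxa."""
    return double_factorial(2 * n - 3)

for n in (4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

The rapid growth also shows why "build and score all possible trees" is only practical for a handful of taxa.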
Rooted Tree generated by DRAWGRAM (PHYLIP)
Unrooted Tree generated by DRAWTREE (PHYLIP)
Genes vs. Species • Sequences show gene relationships, but the phylogenetic history of a gene may differ from that of the species carrying it • Genes evolve at different speeds • Horizontal gene transfer
Methods for Phylogenetic Analysis • Character-State: Maximum Parsimony, Maximum Likelihood • Genetic Distance: Fitch & Margoliash, Neighbor-Joining, Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
Phylogenetic Software • PHYLIP • PAUP (Available in GCG) • TREE-PUZZLE • PhyloBLAST • Felsenstein maintains an extensive list of programs on the PHYLIP site
PHYLIP Programs • dnapars/protpars • dnadist/protdist • dnaml (use fastDNAml instead) • neighbor • fitch/kitsch • drawtree/drawgram
Maximum Parsimony • Most common method • Allows use of all evolutionary information • Build and score all possible trees (an exhaustive search is feasible only for small numbers of taxa) • Each change in a character state counts as one step • Minimize tree length • Best tree requires the fewest changes to derive all sequences
[Figure: two candidate trees (3 nodes each), one with 9 node crossings and one with 8.] Which is the more parsimonious tree?
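Scoring one candidate tree can be sketched with Fitch's small-parsimony algorithm. It is named here as one standard way to count changes, not necessarily the procedure behind the figure. The sketch assumes a rooted binary tree written as nested tuples and uses a toy alignment. Each site is scored independently, and the tree length is the sum over sites.

```python
def fitch_site(tree, states):
    """Fitch's algorithm at one site.
    tree: a taxon name (leaf) or a (left, right) tuple; states: taxon -> character.
    Returns (candidate state set at this node, number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Tree length in steps: Fitch changes summed over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GGG"}  # toy data
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 4
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 6
```

The topology with the lower score, ((A,B),(C,D)), is the more parsimonious of the two.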
Maximum Likelihood • Reconstruction using an explicit evolutionary model • The likelihood of a tree is calculated separately for each nucleotide site; the product of the per-site likelihoods gives the overall likelihood of the observed data • Demanding computationally • Slowest method • Use to test (or improve) an existing tree
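The per-site product can be illustrated on the smallest possible "tree": two sequences joined by one branch. The sketch assumes the Jukes-Cantor (JC69) model, which the slide does not name, and uses a crude grid search in place of a real optimizer. The function names are hypothetical. The log of the product is computed as a sum of per-site logs to avoid numerical underflow.

```python
import math

def jc69_site_likelihood(x, y, d):
    """Likelihood of one aligned site (bases x, y) for two sequences separated
    by a branch of length d (expected substitutions per site) under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    p_x_to_y = 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay
    return 0.25 * p_x_to_y  # 0.25 = equilibrium frequency of base x

def log_likelihood(seq1, seq2, d):
    """Product of per-site likelihoods, computed as a sum of logs."""
    return sum(math.log(jc69_site_likelihood(x, y, d)) for x, y in zip(seq1, seq2))

def ml_branch_length(seq1, seq2, step=0.001, max_d=2.0):
    """Grid search for the branch length that maximizes the likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(seq1, seq2, d))

seq1 = "A" * 100
seq2 = "C" * 20 + "A" * 80  # 20 of 100 sites differ
print(ml_branch_length(seq1, seq2))  # close to 0.233
```

Real programs optimize all branch lengths and the topology together. That is why the method is so demanding computationally.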
Clustering Algorithms • Use distances to calculate phylogenetic trees • Trees are based on the relative numbers of similarities and differences between sequences • A distance matrix is constructed by computing pairwise distances for all sequences • Clustering links successively more distant taxa
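Building the distance matrix is the shared first step of every clustering method. The sketch below assumes equal-length, gap-free aligned sequences. It uses the uncorrected proportion of differing sites (p-distance) and hypothetical function names.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances for a dict name -> sequence."""
    names = list(seqs)
    D = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        D[i][j] = D[j][i] = p_distance(seqs[names[i]], seqs[names[j]])
    return names, D

names, D = distance_matrix({"A": "ACGT", "B": "ACGA", "C": "TCGA"})
print(names, D)
# ['A', 'B', 'C'] [[0.0, 0.25, 0.5], [0.25, 0.0, 0.25], [0.5, 0.25, 0.0]]
```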
DNA Distances • The distance between a pair of aligned DNA sequences is relatively simple to compute as the number of positions at which their bases differ • Only works for pairs of sequences that are similar enough to be aligned • All base changes are considered equal • Insertions/deletions are generally given a larger weight than replacements (gap penalties) • Possible to correct for multiple substitutions at a single site, which are common in distant relationships and at rapidly evolving sites
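The multiple-substitution correction can be sketched with the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The choice of Jukes-Cantor is an assumption, since the slide does not name a model. The sketch treats all base changes equally, as the slide describes, and ignores gaps entirely.

```python
import math

def count_differences(a, b):
    """Number of aligned positions at which two gap-free sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def jukes_cantor(p):
    """Correct an observed proportion of differences p for multiple hits."""
    if p >= 0.75:
        raise ValueError("JC69 distance undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

a, b = "ACGTACGTAC", "ACGTTCGTAA"
p = count_differences(a, b) / len(a)  # 2 differences in 10 sites -> 0.2
print(p, jukes_cantor(p))  # corrected distance is about 0.233
```

The corrected distance is always larger than p. The gap between the two grows as sequences diverge, because more sites have changed more than once.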
Amino Acid Distances • More difficult to compute • Substitutions have differing effects on structure • Some substitutions require more than one DNA mutation • Use replacement frequencies (PAM, BLOSUM)
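The point that some amino acid substitutions require more than one DNA mutation can be checked against the standard genetic code (NCBI table 1, codons in TCAG order). The sketch counts the fewest point mutations between any codon of one amino acid and any codon of another. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codons_for(aa):
    """All codons encoding a one-letter amino acid code."""
    return [codon for codon, a in CODE.items() if a == aa]

def min_nucleotide_changes(aa1, aa2):
    """Fewest point mutations turning some codon of aa1 into some codon of aa2
    (intermediate codons are not checked, so they may be stop codons)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )

print(min_nucleotide_changes("L", "I"))  # 1 (e.g. CTT -> ATT)
print(min_nucleotide_changes("W", "F"))  # 2 (TGG -> TTT or TTC)
```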
Fitch & Margoliash • Sequences are combined three at a time to define branches and calculate their lengths • Additive branch lengths • Accurate for short branches
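The three-at-a-time step can be written as a worked calculation. For taxa A, B and C on a star tree with branch lengths a, b and c, additivity requires d_AB = a + b, d_AC = a + c and d_BC = b + c. Solving these gives the expressions below. This sketch covers only that step. The complete method also joins clusters and fits the whole tree to the matrix by weighted least squares, which is omitted here.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Branch lengths a, b, c from a central node to taxa A, B, C such that
    every pairwise distance equals the sum of the two branches between them."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c

print(three_point_branches(5, 7, 6))  # (3.0, 2.0, 4.0)
```

Check: 3 + 2 = 5, 3 + 4 = 7 and 2 + 4 = 6, so the branch lengths are additive.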
Neighbor Joining • Most common distance-based method of tree construction • Distance matrix adjusted for each taxon according to its net divergence from all other taxa, which reflects its rate of evolution • Performs well in simulation studies • Computationally efficient
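The adjustment for each taxon's rate can be seen in a compact neighbor-joining sketch. Each taxon's net divergence r_i (its summed distance to all others) is subtracted when choosing which pair to join. The sketch takes a full symmetric matrix as a list of lists and returns an unrooted Newick string. The function name, the toy five-taxon matrix and the tie-breaking by input order are assumptions of this illustration.

```python
def neighbor_joining(dist, names):
    """Neighbor joining on a full symmetric distance matrix (needs >= 3 taxa).
    Returns an unrooted tree as a Newick string; ties go to the earlier pair."""
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r_i: total distance from node i to all other nodes.
        r = {i: sum(D[i][k] for k in nodes) for i in nodes}
        # Pick the pair minimizing the rate-adjusted distance Q(i, j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to their new parent node u.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: join them at one central node.
    i, j, k = nodes
    li = (D[i][j] + D[i][k] - D[j][k]) / 2
    lj = (D[i][j] + D[j][k] - D[i][k]) / 2
    lk = (D[i][k] + D[j][k] - D[i][j]) / 2
    return f"({i}:{li:g},{j}:{lj:g},{k}:{lk:g});"

names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(D, names))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Each of the n-3 joining rounds scans all pairs, so this naive version runs in O(n^3) time overall.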
UPGMA: Unweighted Pair Group Method with Arithmetic Mean • Simplest method • Joins the most closely related sequences first and calculates the branch lengths between them • Averages the distances to the next sequence or cluster • Predicts a position for the root, because it assumes a constant rate of evolution (molecular clock)
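The UPGMA steps can be sketched in a few lines. The closest pair is joined at half their distance. Distances to the new cluster are size-weighted averages of its members' distances, which makes it an arithmetic mean over the original sequences. The root falls out of the last join. The function name, toy matrix and Newick output format are assumptions of this sketch.

```python
def upgma(dist, names):
    """UPGMA on a full symmetric distance matrix; returns a rooted Newick string."""
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    size = {a: 1 for a in names}      # number of sequences in each cluster
    height = {a: 0.0 for a in names}  # distance from a cluster's node to its tips
    clusters = list(names)
    while len(clusters) > 1:
        # Closest pair of clusters (ties go to the earlier pair).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                i, j = clusters[x], clusters[y]
                if best is None or D[i][j] < best[0]:
                    best = (D[i][j], i, j)
        d, i, j = best
        h = d / 2  # constant-rate assumption: the new node sits halfway
        new = f"({i}:{h - height[i]:g},{j}:{h - height[j]:g})"
        size[new], height[new], D[new] = size[i] + size[j], h, {}
        for k in clusters:
            if k not in (i, j):
                # Size-weighted average distance to the merged cluster.
                D[new][k] = D[k][new] = (
                    size[i] * D[i][k] + size[j] * D[j][k]) / size[new]
        clusters = [k for k in clusters if k not in (i, j)] + [new]
    return clusters[0] + ";"

names = ["A", "B", "C", "D"]
D = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(D, names))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Every tip ends up the same distance (5) from the root. This is the constant-rate assumption that lets UPGMA place the root.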
Phylogenetic Complications • Errors • Loss of function • Convergent evolution • Lateral gene transfer
Validation • Use several different algorithms and data sets • NJ methods generate a single tree, which may support a tree built by parsimony or maximum likelihood • Bootstrapping • Resample the alignment columns with replacement, rebuild the tree, and note the effect on it • Repeat many times • If a grouping is recovered in about 90% or more of the replicates, it is considered well supported
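The resampling loop behind bootstrapping can be sketched as follows. To stay short, the sketch swaps in a deliberate simplification: instead of rebuilding a full tree for each replicate, it only asks whether a chosen pair of taxa is still the closest pair by p-distance. A real analysis would rebuild the tree with the chosen method and count how often each clade reappears. All function names and the toy alignment are hypothetical.

```python
import random
from itertools import combinations

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (same length as the original)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def p_distance(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(alignment):
    """Simplified stand-in for tree building: the pair with the smallest distance."""
    return min(combinations(sorted(alignment), 2),
               key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))

def pair_support(alignment, pair, replicates=100, seed=1):
    """Fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(closest_pair(bootstrap_replicate(alignment, rng)) == target
               for _ in range(replicates))
    return hits / replicates

aln = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "CCCCCCCC", "D": "GGGGGGGG"}
print(pair_support(aln, ("A", "B")))  # 1.0
```

With real data the support value falls between 0 and 1. Values of about 0.9 or higher correspond to the "well supported" threshold on the slide.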
Are there bugs in our genome? N-acetylneuraminate lyase