
Introduction to Phylogenetic Systematics




  1. Introduction to Phylogenetic Systematics. Mark Fishbein, Dept. Biological Sciences, Mississippi State University, 13 October 2003

  2. Which of these critters are most closely related?

  3. [Figure: alligator, gila monster, purple gallinule, ?, gopher tortoise, kingsnake]

  4. Phylogeny • Branching history of evolutionary lineages • New branches arise via speciation • Speciation occurs when gene flow is severed between populations • Phylogenetic relationships depicted as a tree

  5. © W. S. Judd, et al., Plant Systematics

  6. © W. S. Judd, et al., Plant Systematics

  7. Phylogenetic data • Morphology • Secondary chemistry • Cytology • “Molecular” data: • Allele frequencies • Protein sequences • Restriction sites • DNA sequences

  8. Molecular (genetic) data • Proteins • Serology (immunoassay) • Isozymes (electrophoretic variants) • Amino acid sequences • DNA • Structural (translocations, inversions, duplications) • Restriction sites • DNA sequences • Substitutions • Insertions/Deletions

  9. What are genes? From Raven et al. (1999), Biology of Plants

  10. Genomes • All of the genes within a cell are the genome • Genes located in the nucleus are the nuclear genome • Other genomes (organellar) • Mitochondrion: mitochondrial genome • Chloroplast: plastid genome

  11. [Figure labels: nucleus, chloroplast, mitochondrion] From Raven et al. (1999), Biology of Plants

  12. Comparison of Genomes

  13. Structural rearrangements [Figure panels: Inversion; Crossing over, duplication, and loss] From Freeman and Herron (1998), Evolutionary Analysis

  14. Chemistry of Genes • DNA • Two antiparallel strands linked together • Each strand is a linear array of units called nucleotides • Phosphate • Sugar: deoxyribose • One of four bases • Adenine (“A”) • Cytosine (“C”) • Guanine (“G”) • Thymine (“T”)

  15. From Raven et al. (1999), Biology of Plants

  16. DNA structure • Paired strands are linked by hydrogen bonds between their bases • A pairs with T • G pairs with C • Each base pair joins a purine and a pyrimidine • A & G are purines • C & T are pyrimidines
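The pairing rules can be expressed as a small lookup table. This Python sketch (function names are illustrative, not from the slides) builds the partner strand and checks that every pair joins a purine with a pyrimidine; it assumes the input contains only A, C, G, and T.

```python
# Watson-Crick pairing rules: A-T and G-C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}  # C and T are pyrimidines


def complement(strand: str) -> str:
    """Partner base at each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5'->3' direction (the strands run antiparallel)."""
    return complement(strand)[::-1]


def pairs_are_purine_pyrimidine(strand: str) -> bool:
    """True if every base pair joins exactly one purine with one pyrimidine."""
    return all((a in PURINES) != (b in PURINES)
               for a, b in zip(strand.upper(), complement(strand)))


seq = "GTACGTTG"
print(complement(seq))                   # CATGCAAC
print(reverse_complement(seq))           # CAACGTAC
print(pairs_are_purine_pyrimidine(seq))  # True
```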

  17. DNA function • DNA is the code for making proteins (and a few other molecules) • Proteins form cellular structures and act as enzymes that catalyze the biochemical reactions essential to an organism's function • The DNA code is read and converted to protein in two steps • Transcription: DNA is copied into messenger RNA • Translation: messenger RNA is the template for protein

  18. DNA code • A gene is a code composed of a string of nucleotide bases (A’s, C’s, G’s, T’s) • A protein is composed of a string of amino acids (there are 20) • How does the DNA code get translated into protein?

  19. DNA code • Each amino acid is coded for by at least one triplet of nucleotide bases in DNA • Each triplet is called a codon • There are 64 possible codons (4 bases at each of 3 positions: 4³ = 64)
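The two-step reading of DNA, three bases at a time, can be sketched in a few lines of Python. This sketch assumes the standard genetic code, that the input is the coding (sense) strand, and that the reading frame starts at the first base; the function names are illustrative. It also confirms the 4³ = 64 codon count and that those codons specify 20 amino acids plus stop signals.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in U, C, A, G order at each position: UUU, UUC, UUA, UUG, UCU, ... GGG.
# '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: the mRNA matches the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons in frame from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))                        # 64 codons
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 amino acids
print(translate(transcribe("ATGTTTGGCTAA")))   # MFG
```

Because several codons map to the same amino acid, the table is many-to-one; translation stops at the first stop codon (UAA in the example).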

  20. From Raven et al. (1999), Biology of Plants

  21. DNA functional classes • Coding • Proteins (exons) • Ribosomal RNA • Transfer RNA • “Non-coding” • Introns • Spacers

  22. From Raven et al. (1999), Biology of Plants

  23. Homology in Molecular Systematics • Assess orthology • Align sequences • Homology is often implicit (is this a good thing?)

  24. DNA Sequences and Homology • Homology: similarity due to common descent • How do we assess homology of DNA sequences? • Levels of homology • Locus • Allele • Nucleotide position

  25. From W. P. Maddison (1997), Systematic Biology 46:527

  26. Orthology vs. Paralogy • DNA sequences that are at homologous loci are orthologous • DNA sequences that are similar due to duplication but are at different loci are paralogous • Orthology may be best detected with a phylogenetic analysis of all sequences

  27. From Martin & Burg (2002), Systematic Biology 51:578

  28. Multiple Sequence Alignment • Goal: create a data matrix in which columns are homologous positions • Problem: sequences vary in length • Why? • Insertions • Deletions

  29. Simple Sequence Alignment
      Taxon 1  GTACGTTG
      Taxon 2  GTACGTTG
      Taxon 3  GTACGTTG
      Taxon 4  GTACATTG
      Taxon 5  GTACATTG
      Taxon 6  GTACATTG

  30. Simple Sequence Alignment
      Taxon 1  GTACGTTG
      Taxon 2  GTACGTTG
      Taxon 3  GTACGTTG
      Taxon 4  GTACATTG
      Taxon 5  GTACATTG
      Taxon 6  GTACATTG

  31. DNA Sequence Data Matrix

  32. Slightly Less Simple Sequence Alignment
      Taxon 1  AGAGTGAC
      Taxon 2  AGAGTGAC
      Taxon 3  AGAGTGAC
      Taxon 4  AGAGGAC
      Taxon 5  AGAGGAC
      Taxon 6  AGAGGAC

  33. Slightly Less Simple Sequence Alignment
      Taxon 1  AGAGTGAC
      Taxon 2  AGAGTGAC
      Taxon 3  AGAGTGAC
      Taxon 4  AGAG-GAC
      Taxon 5  AGAG-GAC
      Taxon 6  AGAG-GAC

  34. Alignment Gaps • Gaps are inserted to maximize homology across nucleotide positions • Gaps are hypothesized indels • Inserting a gap assumes that an indel event is a better explanation of the differences among sequences than nucleotide substitution

  35.
      Taxon 1  AGAGTGAC    AGAGTGAC
      Taxon 2  AGAGTGAC    AGAGTGAC
      Taxon 3  AGAGTGAC    AGAGTGAC
      Taxon 4  AGAGGAC     AGAG-GAC
      Taxon 5  AGAGGAC     AGAG-GAC
      Taxon 6  AGAGGAC     AGAG-GAC
      Left: 3 substitutions, 0 indels. Right: 0 substitutions, 1 indel.

  36. Ambiguous Alignment with a Single-Base Indel
      Taxon 1  GGTCAG
      Taxon 2  GGCCAA
      Taxon 3  AGCTAA
      Taxon 4  AGCAA
      Taxon 5  AGCAA
      Taxon 6  AGCAA

  37. Ambiguous Alignment with a Single-Base Indel
      Taxon 1  GGTCAG    GGTCAG
      Taxon 2  GGCCAA    GGCCAA
      Taxon 3  AGCTAA    AGCTAA
      Taxon 4  AG-CAA    AGC-AA
      Taxon 5  AG-CAA    AGC-AA
      Taxon 6  AG-CAA    AGC-AA
      Left: 4 substitutions, 1 indel. Right: 4 substitutions, 1 indel.

  38. Gap Number and Length • All else being equal, is it better to assume fewer longer gaps, or more shorter gaps? • In other words, what is more likely: • For a new indel to occur? • For an existing indel to lengthen? • There is no general answer! • Alternate alignments are explored algorithmically
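One way to see why there is no general answer is to score gaps with separate opening and extension costs (an affine gap cost, the kind of scheme described by Clustal's gap opening and gap extension penalties). In this Python sketch, a run of k gap characters costs gap_open + gap_extend × (k − 1); that convention, the function name, and the cost values are assumptions for illustration. The same two gapped versions of ACGT swap rank when the costs change.

```python
import re


def gap_cost(aligned: str, gap_open: float, gap_extend: float) -> float:
    """Affine gap cost: each run of k '-' characters costs gap_open + gap_extend * (k - 1)."""
    return sum(gap_open + gap_extend * (len(run) - 1)
               for run in re.findall(r"-+", aligned))


one_long_gap = "AC--GT"    # one indel, two bases long
two_short_gaps = "A-CG-T"  # two indels, one base each

for gap_open, gap_extend in [(10, 1), (1, 5)]:
    print(gap_open, gap_extend,
          gap_cost(one_long_gap, gap_open, gap_extend),
          gap_cost(two_short_gaps, gap_open, gap_extend))
# costs (10, 1): 11 vs 20, so the single longer gap is cheaper
# costs (1, 5):   6 vs  2, so the two shorter gaps are cheaper
```

Which alignment is "best" is therefore a consequence of the chosen costs, not of the sequences alone.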

  39. Alignment Algorithms • Typically built up from pairwise alignments, using assumed gap costs • Problem: most algorithms require an initial tree to define alignment order, which can bias the result • Solution: simultaneous tree estimation and alignment optimization • Problems: computationally costly; parameters are hard to justify
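A pairwise alignment under assumed costs can be computed by dynamic programming. This is a minimal Needleman–Wunsch global alignment in Python, sketched under simplifying assumptions: a linear gap cost (every gapped position costs the same) rather than separate opening and extension costs, and arbitrary illustrative scores (match = 1, mismatch = −1, gap = −2). Changing these parameters can change which alignment is optimal.

```python
def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of x and y; returns (aligned_x, aligned_y, score)."""
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning the prefixes x[:i] and y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                              score[i - 1][j] + gap,    # gap in y
                              score[i][j - 1] + gap)    # gap in x
    # Trace back from the bottom-right cell, following any move that produced the optimum.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if x[i - 1] == y[j - 1] else mismatch):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay)), score[n][m]


print(needleman_wunsch("AGAGTGAC", "AGAGGAC"))  # ('AGAGTGAC', 'AGAG-GAC', 5)
```

With these scores, the taxon 1 and taxon 4 sequences from the "slightly less simple" alignment receive a single gap opposite the T, matching the gapped alignment shown on the slides.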

  40. Clustal Alignment Algorithm • Creates alignment based on penalties for gap opening (number of gaps) and gap extension (gap length) • Multiple alignment built according to guide tree determined by pairwise alignments • Order of adding sequences determined by a guide tree

  41. Clustal Alignment Algorithm • Distance matrix calculated from pairwise comparisons • Dendrogram calculated from distance matrix • Alignment calculated for most similar pair of sequences, based on alignment parameters • Additional sequences are added according to dendrogram, until all sequences are added
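The dendrogram step can be sketched in Python. This hedged sketch covers only the guide tree: it assumes the pairwise distance matrix has already been computed (the values below are made up for illustration) and clusters it with UPGMA (average linkage) as a simple stand-in; Clustal implementations may build their guide tree differently (Clustal W, for example, uses neighbor-joining). The function name upgma_guide_tree is hypothetical. The returned merge order is the order in which a progressive aligner would combine sequences and sub-alignments.

```python
from itertools import combinations


def upgma_guide_tree(names, matrix):
    """Cluster taxa by UPGMA; return (nested-tuple tree, list of merges in order)."""
    size = {name: 1 for name in names}  # active cluster -> number of taxa in it
    dist = {frozenset((names[i], names[j])): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    merges = []
    while len(size) > 1:
        closest = min(dist, key=dist.get)  # most similar pair of clusters
        a, b = sorted(closest, key=str)    # fixed order for readable output
        new = (a, b)
        merges.append(new)
        na, nb = size.pop(a), size.pop(b)
        del dist[closest]
        for c in size:                     # average-linkage distance update
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((new, c))] = (na * d_ac + nb * d_bc) / (na + nb)
        size[new] = na + nb
    return next(iter(size)), merges


# Hypothetical pairwise distances (e.g. proportion of differing sites).
names = ["T1", "T2", "T3", "T4"]
matrix = [[0.0, 0.1, 0.4, 0.5],
          [0.1, 0.0, 0.4, 0.5],
          [0.4, 0.4, 0.0, 0.3],
          [0.5, 0.5, 0.3, 0.0]]
tree, merges = upgma_guide_tree(names, matrix)
print(merges)  # [('T1', 'T2'), ('T3', 'T4'), (('T1', 'T2'), ('T3', 'T4'))]
```

Here T1 and T2 would be aligned first, then T3 and T4, and finally the two sub-alignments would be merged. Because this order comes from the guide tree, errors in that tree carry into the alignment, which is the bias noted under alignment algorithms.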

  42. Tree-Based Alignment • Simultaneous tree and alignment estimation using parsimony • TreeAlign • MALIGN • Implement similar gap opening/extension costs • These applications are very slow!

  43. Alignment in the Future? • Incorporate a more sophisticated understanding of molecular evolution in parameterization • For example, what are realistic values of gap costs? Are they universal? • Can phylogeny estimation proceed without optimizing alignments? • Likelihood-based methods can sum over all alignments • Will require major contributions from biologists

  44. Methods of tree estimation • Character based • Maximum parsimony (MP) • Fewest character changes • Maximum likelihood (ML) • Highest probability of observing data, given a model • Bayesian • Similar to ML, but incorporates prior knowledge • Distance based • Minimum distance • Shortest summed branch lengths

  45. Major classes of data [Figure panels: Character-based; Distance-based]

  46. Minimum Distance

  47. Maximum Parsimony [Figure labels: 1: A, 2: C, 4: G; note: “3, 5 are slightly more complicated...”]

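  48. Parsimony Criterion
  $$L(\tau) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \, \mathrm{diff}(x_{k'j}, x_{k''j})$$
  • L = tree length • τ = topology • k = branch; B = number of branches • j = character; N = number of characters • w_j = weight of character j • diff(x_{k'j}, x_{k''j}) = number of steps for character j along branch k, whose end nodes are k' and k''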

  49. Parsimonious Character Reconstruction • To evaluate the parsimony of a tree, each character is optimized (then the sum is computed) • Several parsimony algorithms have been developed that optimize character reconstructions • Algorithms differ in assumptions about permissible transformations between character states
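The slide does not name a specific algorithm. As one example, here is a hedged Python sketch of Fitch's (1971) algorithm, which assumes unordered characters (any state can change to any other at a cost of one step) on a rooted, fully bifurcating tree. It is combined with per-character weights w_j to compute the tree length L from the parsimony criterion. A gap, if present, would be treated as a fifth state, which is only one of several possible choices. Function names and example trees are illustrative; the sequences are the six taxa from the simple alignment in slide 29.

```python
def fitch(tree):
    """Fitch pass for one character on a rooted binary tree.

    Leaves are state strings (e.g. "A"); internal nodes are 2-tuples.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch(tree[0])
    right_states, right_changes = fitch(tree[1])
    shared = left_states & right_states
    if shared:  # children can agree: no extra step needed here
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def states_at(tree, seqs, j):
    """Replace each taxon name in the tree with that taxon's state at character j."""
    if isinstance(tree, str):
        return seqs[tree][j]
    return tuple(states_at(child, seqs, j) for child in tree)


def tree_length(tree, seqs, weights=None):
    """Weighted parsimony length: sum over characters j of w_j * (changes for j)."""
    n_chars = len(next(iter(seqs.values())))
    weights = weights if weights is not None else [1] * n_chars
    return sum(weights[j] * fitch(states_at(tree, seqs, j))[1]
               for j in range(n_chars))


seqs = {"T1": "GTACGTTG", "T2": "GTACGTTG", "T3": "GTACGTTG",
        "T4": "GTACATTG", "T5": "GTACATTG", "T6": "GTACATTG"}
grouped = ((("T1", "T2"), "T3"), (("T4", "T5"), "T6"))
mixed = (("T1", "T4"), (("T2", "T5"), ("T3", "T6")))
print(tree_length(grouped, seqs))  # 1
print(tree_length(mixed, seqs))    # 3
```

Under the parsimony criterion the tree grouping taxa 1–3 apart from taxa 4–6 is preferred, because it needs one change at the variable site instead of three.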

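  50. Likelihood Criterion
  $$L(\tau) = \prod_{j=1}^{N} l_j, \qquad \ln L(\tau) = \sum_{j=1}^{N} \ln l_j$$
  • L = tree likelihood • τ = topology • j = character (site); N = number of sites • l_j = likelihood of site j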
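As a concrete instance of multiplying site likelihoods, this Python sketch computes L for the simplest possible tree: two sequences joined by one branch, under the Jukes–Cantor (JC69) substitution model. The model, the example sequences, and the grid search are assumptions for illustration; real programs handle many taxa, richer models, and proper numerical optimization. Maximizing ln L over the branch length recovers the closed-form Jukes–Cantor distance.

```python
import math


def jc_site_likelihood(x: str, y: str, t: float) -> float:
    """Likelihood l_j of one site for two taxa joined by a branch of length t
    (expected substitutions per site) under the Jukes-Cantor (JC69) model."""
    e = math.exp(-4.0 * t / 3.0)
    p_xy = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e  # P(x becomes y along t)
    return 0.25 * p_xy  # 1/4 = equilibrium frequency of the starting base


def log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """ln L = sum of ln l_j; sites are assumed to evolve independently."""
    return sum(math.log(jc_site_likelihood(a, b, t)) for a, b in zip(seq1, seq2))


seq1, seq2 = "GTACGTTG", "GTACATTA"          # hypothetical pair: 2 of 8 sites differ
grid = [i / 10000 for i in range(1, 30001)]  # candidate t from 0.0001 to 3.0
t_best = max(grid, key=lambda t: log_likelihood(seq1, seq2, t))
p = 2 / 8
t_formula = -0.75 * math.log(1 - 4 * p / 3)  # closed-form JC69 estimate, ~0.304
print(round(t_best, 3), round(t_formula, 3))
```

Working with ln L instead of L avoids numerical underflow, since the product of many site likelihoods quickly becomes too small to represent.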
