Plant Molecular Systematics

Plant Molecular Systematics Spring 2014

“Problems” with morphological data…
• Convergence and parallelisms
• Reduction and character loss
• Phenotypic vs. genotypic differences
• Evaluation of homology
• Misinterpretation of change or polarity
• Limitation on number of characters
• Phenotypic plasticity

Always searching for new types of characters… Is molecular data intrinsically better than morphological data?

Central Dogma

Central Dogma: Secondary Metabolites
• Lipid pigments: chlorophyll, lycopenes, xanthophylls, carotene
• Iridoid compounds
• Terpenes
• Alkaloids (N-containing), e.g. nicotine, caffeine, morphine, betalains
• Phenolics: flavonols, flavones, tannins, anthocyanins

Development of Molecular (Chemical) Systematic Methods: “Chemosystematics”
• Early methods relied on chromatography to separate complex mixtures of secondary metabolites, detect them, and then compare them between taxa (“spot botanists”); very phenetic
• Better separation and identification methods were developed; pathway stages were used as cladistic characters (phytochemistry)
• Move away from secondary metabolites to proteins
• Early protein studies used immunological reactions
• Development of improved electrophoretic methods permitted direct protein comparisons between taxa
• Comparison of seed storage proteins
• Development of direct estimates of genetic relationships based on allele frequencies of enzyme variants
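The last bullet (genetic relationships estimated from allele frequencies of enzyme variants) can be illustrated with a short calculation. The slides do not name a specific measure; the sketch below assumes Nei's (1972) standard genetic distance, D = -ln(I), as one widely used option. The locus names, allele labels, and frequencies are hypothetical.

```python
import math

def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(I).

    pop_x and pop_y map locus name -> {allele: frequency}.
    I = Jxy / sqrt(Jx * Jy), with sums taken over all loci and alleles.
    """
    jx = jy = jxy = 0.0
    for locus in pop_x:
        fx, fy = pop_x[locus], pop_y[locus]
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)

# Hypothetical allozyme frequencies for two taxa at two enzyme loci.
taxon_a = {"Pgi": {"fast": 0.8, "slow": 0.2}, "Adh": {"1": 1.0}}
taxon_b = {"Pgi": {"fast": 0.3, "slow": 0.7}, "Adh": {"1": 1.0}}

print(nei_distance(taxon_a, taxon_b))
```

Identical frequency profiles give a distance of zero, and the distance grows as the two taxa share fewer alleles at similar frequencies.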

Molecular (DNA) Systematics
• Next step was to examine DNA directly through examination and comparison of restriction fragments (RFLP bands)
• Technology evolved to make it feasible to sequence DNA directly
• Initially limited to single genes or non-coding regions
• Now feasible to sequence large numbers of genes or regions, or increasingly even whole genomes, relatively quickly
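The idea behind comparing restriction fragments can be sketched in a few lines: a point mutation that destroys a recognition site changes the set of fragment lengths, which is what shows up as a different band pattern. The example assumes the enzyme EcoRI (recognition site GAATTC, cutting after the G) and two made-up linear sequences; it is an illustration, not a model of real gel data.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return sorted fragment lengths after cutting a linear sequence
    at every occurrence of the recognition site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))

# Hypothetical sequences: taxon_b has a point mutation (A -> G) in the second site.
taxon_a = "ATGC" * 5 + "GAATTC" + "TTAA" * 10 + "GAATTC" + "CCGG" * 3
taxon_b = "ATGC" * 5 + "GAATTC" + "TTAA" * 10 + "GAGTTC" + "CCGG" * 3

print(digest(taxon_a))  # [17, 21, 46]
print(digest(taxon_b))  # [21, 63]
```

The lost site in the second sequence merges two fragments into one, so the two taxa would differ by the presence or absence of specific bands.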

Molecular Systematics
• Can obtain phylogenetically informative characters from any genome of the organism
  - Assumes that genomes accumulate molecular changes by lineage, as morphological characters do
  - Possibly greater assurance of homology with molecular data (less likely to misinterpret characters), but homoplasy happens!
  - Principal advantages are the much greater number of molecular characters available and greater comparability across lineages

How big are genomes of organisms?

Genomes of the Plant Cell: Plastid, Nuclear, Mitochondrial

Three genomes in plant cells
• Mitochondrion: 200,000-2,500,000 bp; generally maternally inherited (seed parent)
• Chloroplast: 135,000-160,000 bp; generally maternally inherited (seed parent)
• Nucleus: 1.1 x 10^6 to 1.1 x 10^11 kilobase pairs; biparentally inherited

Selection of DNA region to compare:
• Should be present in all taxa to be compared
• Must have some knowledge of the gene or other genomic region to develop primers, etc.
• Evolutionary rate of sequence changes must be appropriate to the taxonomic level(s) being investigated; “slow” genes versus “fast” genes
• Sequences should be readily alignable
• The biology of the gene (or other DNA sequence) must be understood to assure homology

Genes frequently used for phylogenetic studies of plants:
• Mitochondrial genome – uniparentally (maternally) inherited, but genes evolve very slowly and structural rearrangements happen very frequently, so it is generally not useful for studying relationships, with some exceptions
• Plastid genome – uniparentally (maternally) inherited
  - rbcL – ribulose-bisphosphate carboxylase large subunit
  - ndhF – NADH dehydrogenase subunit F
  - atpB – ATP synthase beta subunit
  - matK – maturase K
  - rpl16 intron – ribosomal protein L16 intron
• Nuclear genome – biparentally inherited
  - ITS region – internal transcribed spacers ITS1 and ITS2
  - 18S, 26S nuclear ribosomal DNA repeat
  - adh – alcohol dehydrogenase
  - many other genes now with next-generation sequencing

Plastid Genome (Fig. 14.4)
• Circular, derived from endosymbiosis of cyanobacteria
• Three zones:
  - LSC (large single copy region)
  - SSC (small single copy region)
  - IR (inverted repeats)
• Genes related to photosynthesis and protein synthesis

The Polymerase Chain Reaction (PCR) (Fig. 14.2)
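As a rough illustration of what PCR selects, the sketch below performs an "in silico" amplification: it finds the forward primer on the template strand, finds the site where the reverse primer anneals (the reverse complement of the primer), and returns the region bounded by the two. The template and primer sequences are invented, and exact matching is assumed; real primers tolerate mismatches and are far longer.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))

def amplicon(template, fwd_primer, rev_primer):
    """Return the template region bounded by the two primers, or None
    if either primer site is missing (exact matches only)."""
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

# Hypothetical template and primers.
template = "TTTT" + "ATGGCA" + "CCCCGGGG" + "TTCAGC" + "AAAA"
product = amplicon(template, "ATGGCA", "GCTGAA")
print(product)  # ATGGCACCCCGGGGTTCAGC
```

Flanking sequence outside the primer sites is excluded from the product, which is why primer choice determines exactly which region is compared across taxa.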

Automated Sequencing
Scanning of gel to detect fluorescently labeled DNAs; data fed directly to computer.

Fig. 14.3

How do we analyze molecular variation?
- DNA nucleotide sequences (point mutations)
- Structural rearrangements
  - insertions and deletions (indels)
  - inversions

Aligned DNA sequences showing substitutions
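A small sketch of how substitutions in an alignment can be tallied: each column is checked for variation, and a variable column is counted as parsimony-informative when at least two different bases each occur in at least two taxa. The taxon names and sequences are hypothetical, and gaps or ambiguous bases are simply ignored here.

```python
from collections import Counter

def classify_sites(alignment):
    """Return (variable, informative) lists of 0-based column indices."""
    variable, informative = [], []
    for i, column in enumerate(zip(*alignment.values())):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:
            variable.append(i)
            # Informative: at least two states, each shared by two or more taxa.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative.append(i)
    return variable, informative

# Hypothetical aligned sequences of equal length.
alignment = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTCCGA",
    "taxonC": "ACTTACGA",
    "taxonD": "ACTTACGT",
}

variable, informative = classify_sites(alignment)
print(variable)     # [2, 4, 7]
print(informative)  # [2, 7]
```

Column 4 varies in only one taxon, so it is variable but cannot by itself group taxa together under parsimony.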

Insertion-Deletion Events
• Can occur as single nucleotide gains or losses or as lengths of 2 to many base pairs
• Can also be “chunks” of DNA (e.g., losses of introns)
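Indels can be turned into characters in their own right. The sketch below uses a simplified form of gap coding, assumed here for illustration: every distinct gap (same start and end in the alignment) becomes one binary character, scored 1 where a taxon has that gap and 0 otherwise. Overlapping gaps are not treated as missing data, which fuller coding schemes do. Sequences are hypothetical.

```python
import re

def gap_characters(alignment):
    """Code each distinct gap (start, end) as a presence/absence character."""
    gaps = {
        name: {m.span() for m in re.finditer(r"-+", seq)}
        for name, seq in alignment.items()
    }
    all_gaps = sorted(set().union(*gaps.values()))
    matrix = {
        name: [1 if g in gaps[name] else 0 for g in all_gaps]
        for name in alignment
    }
    return all_gaps, matrix

# Hypothetical alignment: B and C share a 3 bp deletion, D has a 1 bp deletion.
alignment_with_gaps = {
    "taxonA": "ACGTTTGCA",
    "taxonB": "ACG---GCA",
    "taxonC": "ACG---GCA",
    "taxonD": "A-GTTTGCA",
}

gap_list, gap_matrix = gap_characters(alignment_with_gaps)
print(gap_list)    # [(1, 2), (3, 6)]
print(gap_matrix)
```

The shared 3 bp gap scores identically in the two taxa that carry it, so it behaves like a potential synapomorphy, while the 1 bp gap is unique to a single taxon.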

A molecular synapomorphy for Subfamily Cactoideae (Cactaceae): deletion of the plastid rpoC1 intron (Wallace & Cota, Current Genetics, 1995)
[Figure: ancestral vs. derived state]

Cactaceae: trnL Intron Deletions

trnL intron deletions: Columnar Cacti
[Figure: cladogram of columnar cacti with North American and South American clades. Terminal labels: Pachycereeae, Leptocereeae, Hylocereeae, Corryocactus, “Browningieae I”*, “Browningieae II”*, Cereeae, Trichocereeae. Character labels: “- 268 bp”, “Shared Deletion 2”.]
(*Tribe Browningieae polyphyletic)

Chloroplast DNA Inversion
23 kb inversion in all Asteraceae except for members of Tribe Barnadesieae (now Subfamily Barnadesioideae)
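An inversion can be pictured as reversing a block of genes and flipping their orientation. The sketch below represents a genome as a signed gene order (a negative sign meaning the gene lies on the opposite strand); the five numbered genes are placeholders, not the actual genes involved in the Asteraceae inversion.

```python
def invert(order, start, end):
    """Invert the block order[start:end]: reverse it and flip each sign."""
    block = [-gene for gene in reversed(order[start:end])]
    return order[:start] + block + order[end:]

# Hypothetical ancestral gene order; genes 2-4 lie inside the inverted block.
ancestral = [1, 2, 3, 4, 5]
derived = invert(ancestral, 1, 4)
print(derived)  # [1, -4, -3, -2, 5]
```

Applying the same inversion again restores the original order, so presence of the inverted arrangement can be scored as a single two-state character across taxa.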

Fig. 14.6

Comparative DNA Sequencing
• Obtain DNA samples from representative organisms (try to represent morphological diversity) and outgroups
• Identify DNA region(s) for comparison
• Extract DNA and use PCR to amplify targeted region
• Carry out sequencing reactions
• Run sequencing procedures (automated)
• Align sequences
• Use aligned sequences for phylogenetic analysis (various programs using various algorithms)
• Evaluate data in context of taxonomy and morphology
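Once sequences are aligned, one of the simplest summaries that can feed a phylogenetic analysis is the pairwise p-distance: the proportion of compared sites at which two sequences differ. The sketch below is only one of many possible starting points (the slides mention various programs and algorithms without naming one); the sequences are hypothetical, and positions with a gap in either sequence are skipped.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring positions with a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair of taxa."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }

# Hypothetical aligned sequences.
aligned = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACTTAC-A",
}

distances = distance_matrix(aligned)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Distance methods such as neighbor joining start from a matrix like this, whereas parsimony and likelihood methods work directly on the aligned characters.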

Partial sequence of rbcL (plastid gene coding for Rubisco) in Poaceae

[Figure: phylogeny of Poaceae subfamilies. Anomochlooideae, Pharoideae, Puelioideae. BEP Clade: Bambusoideae (bamboos), Pooideae (bluegrasses, wheat), Ehrhartoideae (rices and allies). PACMAD Clade: Aristidoideae (wiregrasses), Panicoideae (maize, panicgrasses), Chloridoideae (love grasses), Danthonioideae (pampas grasses), Micrairoideae, Arundinoideae (reeds). Annotations: “Stamens reduced to 3”; “+ 55 mya”. Citation on figure: Crepet & Feldman 1991.]

Genetic Databases
International Nucleotide Sequence Database Collaboration
• GenBank: National Institutes of Health (NIH) Genetic Sequence Database, http://www.ncbi.nlm.nih.gov/genbank/
• EMBL: European Bioinformatics Institute Nucleotide Sequence Database
• DDBJ: DNA Data Bank of Japan

Data mining
Climatic Data
- Global Biodiversity Information Facility (GBIF)
- 1,584,351 independent collection sites
- 10,469 taxa
(Edwards et al., Science 2010, Fig. 4)
Genetic Data
- 2,684 taxa
- 8 regions (plastid and nuclear)
- phylogenetic analysis
(Edwards & Smith, PNAS 2010, Fig. 1)
