
Geoff Morris*, Paul Grabowski, Justin Borevitz Dept. of Ecology and Evolution

Genomic diversity and population structure in switchgrass, Panicum virgatum: Genotyping-by-sequencing and population genomics. Geoff Morris*, Paul Grabowski, Justin Borevitz, Dept. of Ecology and Evolution, University of Chicago.


Presentation Transcript


  1. Genomic diversity and population structure in switchgrass, Panicum virgatum: Genotyping-by-sequencing and population genomics Geoff Morris*, Paul Grabowski, Justin Borevitz Dept. of Ecology and Evolution University of Chicago

  2. Genomic diversity and population structure • Geographic patterns of genomic diversity reflect: drift, migration, and adaptation • Genomic diversity: nucleotide variation and insertions/deletions across many loci in the nuclear and organellar genomes. • Leads to design of mapping populations for quantitative genetics and molecular breeding

  3. Genomic diversity and natural history Example: Pitcher plant mosquito (Wyeomyia smithii) Emerson et al. PNAS 2010

  4. Ecotypic diversity in switchgrass • Switchgrass and other wide-ranging grassland species have many ecotypes • Great variability in size, shape, color, and habitat preference • Example: Upland/lowland divergence • Lowland (Oklahoma): adapted to long growing season, wet climates • Upland (Michigan): adapted to shorter growing season, drier climates

  5. Effects of ecotype diversity on productivity • Three-year plot (6 m²) experiment at Fermilab • ~20% overyield in switchgrass mixtures compared to monocultures

  6. “Genomic diversity and population structure in switchgrass, Panicum virgatum: from the continental scale to a dune landscape” Morris, Grabowski, and Borevitz Accepted, Molecular Ecology

  7. Biogeography of Indiana Dunes flora • Boreal flora: e.g. Jack Pine, Bearberry • Great Plains flora: e.g. Sandreed, Little Bluestem • Coastal Plain flora: e.g. Seaside spurge, Marramgrass • Eastern deciduous flora: e.g. Tulip tree • Recolonized post-glaciation: ~10,000 years ago

  8. Switchgrass gene pools ? Zhang et al. 2011

  9. Landscapes in Indiana Dunes • Landscape features are dynamic and can be dated: • 100s – 1000s of years for dunes • 10s – 100s of years for blowouts Big blowout ~ 150 years old

  10. Study questions • Can switchgrass population structure be confirmed with a genome-wide sample of non-ascertained markers? • In a hierarchical sample of switchgrass, how much diversity is there on a landscape, regional, and continental scale? • Did multiple switchgrass gene pools contribute to the Indiana Dunes populations? • Is there genomic diversity in a single landscape feature (blowout)? • Is there local (private) genetic diversity in the Indiana Dunes?

  11. Switchgrass plant samples • Switchgrass cultivated varieties (cultivars) • Kanlow (Oklahoma - lowland) • Blackwell (Oklahoma - upland) • High Tide (Maryland - Coastal) • Forestburg and Sunburst (South Dakota) • Dacotah (North Dakota) • Cave-in-Rock (Illinois) • Southlow (Southern Michigan “ecopool”) • Indiana Dunes switchgrass • Big Blowout • Jack pine savanna • Interdune

  12. Problems with traditional marker systems • Locus sampling: • Typically only a few kb are sequenced in a few loci (rDNA, cp introns) • Large stochastic error and locus-specific bias • e.g. Plant chloroplast has 100X lower rate of evolution than animal mitochondria • Ascertainment bias: • Occurs whenever markers are discovered and typed separately • Worst when the ascertainment panel is a geographically restricted subpopulation • e.g. Inferred genetic diversity in Africans is spuriously low when European markers are used

  13. Genomic diversity from de novo sequencing 1) PstI digest of genomic DNA 2) End-polish, blunt-end ligation; Illumina barcodes 3) PCR amplify and pool fragments from multiple samples 4) Assemble and map reads to “stacks” and call SNPs • Reduced representation + multiplexing = more samples • 10,000+ candidate SNPs • No reference genome needed • Data here from 76 or 100 bp paired-end reads • 40 billion base pair data set
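Step 1 of the workflow above can be illustrated with a minimal in silico digest. This is only a sketch under stated assumptions: it uses the standard PstI recognition sequence (CTGCAG, cut after CTGCA), ignores the star activity and random shearing mentioned elsewhere in the talk, and the function names `psti_cut_positions` and `digest` are hypothetical, not part of the authors' pipeline.

```python
# Sketch: find PstI sites in a DNA string and split it into fragments.
# Assumption: ideal enzyme behaviour (no star activity, no shearing).

PSTI_SITE = "CTGCAG"   # PstI recognition sequence (palindromic)
CUT_OFFSET = 5         # PstI cuts after CTGCA, i.e. 5 bases into the site


def psti_cut_positions(seq):
    """Return 0-based cut positions (index of first base after each cut)."""
    seq = seq.upper()
    positions = []
    start = seq.find(PSTI_SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(PSTI_SITE, start + 1)
    return positions


def digest(seq):
    """Split seq at every PstI cut position and return the fragments."""
    seq = seq.upper()
    bounds = [0] + psti_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAACTGCAGTTTTCTGCAGGG"  # made-up sequence with two sites
    print(psti_cut_positions(toy))
    print(digest(toy))
```

Because the site is its own reverse complement, scanning one strand is enough to locate every site.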

  14. Plastome sequence in RRLs 1) PstI digest of genomic DNA, with star activity and random shearing • Nuclear whole genome shotgun sequence is too light (<<1X) for assembly • Plastome WGS is very high (>>1X) 2) End-polish, blunt-end ligation

  15. Analysis of chloroplast data • Chloroplast genome sequence (plastome) included in data • Random (shotgun) sequence + 20 PstI sites • Switchgrass chloroplast reference available (Upland and Lowland) • Mapped reads to both ~140,000 base pair chloroplast genomes • Coverage (# of times each position is read): 1X – 786X
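Coverage as defined above (the number of times each position is read) can be computed from mapped read intervals. The sketch below assumes reads are already mapped and given as half-open `(start, end)` intervals on a reference of known length; the function name `coverage` and the toy intervals are illustrative, not taken from the talk.

```python
# Sketch: per-position read depth from mapped read intervals.
# Assumption: each read is a half-open interval (start, end) in 0-based
# reference coordinates; reads are clipped to the reference bounds.

def coverage(ref_len, reads):
    """Return a list with the read depth at each reference position."""
    diff = [0] * (ref_len + 1)          # difference array
    for start, end in reads:
        start = max(0, start)
        end = min(ref_len, end)
        if start < end:
            diff[start] += 1            # depth rises where a read begins
            diff[end] -= 1              # and falls just past its end
    depth = []
    running = 0
    for delta in diff[:ref_len]:
        running += delta
        depth.append(running)
    return depth


if __name__ == "__main__":
    depth = coverage(10, [(0, 5), (3, 8), (4, 6)])
    print(depth)
    print("min/max depth:", min(depth), max(depth))
```

The difference-array approach keeps the cost proportional to the number of reads plus the reference length, which matters little for a ~140 kb plastome but avoids a nested loop.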

  16. Chloroplast coverage and polymorphisms (plot: chloroplast genome coverage by position, in kb)

  17. Chloroplast phylogeny • Neighbor joining tree based on 140kb • Named haplogroups have >50% bootstrap • Unfilled lines indicate low-coverage sample

  18. Chloroplast phylogeny

  19. Chloroplast phylogeny

  20. Population analysis of nuclear loci • Create “pseudoreference” of RRL loci with de novo assembly • Map reads to pseudoreference to create stacks (150-1500 reads) • Map reads to switchgrass chloroplast and sorghum mitochondria, and drop stacks that match organelles • Select single-nucleotide variants that: • Have high sequence quality (Phred-scaled error probability < 0.001, i.e. Q > 30, for both alleles) • Vary in frequency across samples (chi-square test, p < 0.01) • Are nearest to restriction site, closest to beginning of read • Randomly select one allele per sample (weighted by observed frequency)
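The variant-selection criteria above can be sketched in a few functions. This is an illustration under assumptions, not the authors' pipeline: it reads the quality filter as an error probability below 0.001 (Phred Q above 30), computes a plain Pearson chi-square statistic on a samples-by-alleles count table (the caller compares it to a critical value for the chosen p threshold), and draws one allele per sample with probability proportional to its read count. All function names are hypothetical.

```python
# Sketch: quality filter, frequency-heterogeneity test, and random
# allele sampling for one candidate SNP. Assumes biallelic read counts.
import random


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def chi_square_stat(counts):
    """Pearson chi-square on [(ref_count, alt_count), ...] per sample.

    Returns (statistic, degrees_of_freedom).
    """
    total_ref = sum(r for r, _ in counts)
    total_alt = sum(a for _, a in counts)
    total = total_ref + total_alt
    if total == 0:
        raise ValueError("no reads at this site")
    stat = 0.0
    for r, a in counts:
        n = r + a
        for observed, col_total in ((r, total_ref), (a, total_alt)):
            expected = n * col_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1


def sample_allele(ref_count, alt_count, rng):
    """Pick 'ref' or 'alt' with probability proportional to read counts."""
    if ref_count + alt_count == 0:
        raise ValueError("no reads for this sample")
    return rng.choices(["ref", "alt"], weights=[ref_count, alt_count])[0]


if __name__ == "__main__":
    rng = random.Random(0)              # fixed seed for repeatability
    counts = [(18, 2), (3, 17), (10, 10)]   # made-up read counts
    print(phred_to_error_prob(30))
    print(chi_square_stat(counts))
    print([sample_allele(r, a, rng) for r, a in counts])
```

Sampling a single allele per individual avoids calling heterozygous genotypes, which is hard to do reliably from read counts in a polyploid.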

  21. Coding sequence variation in the chloroplast • 77 coding genes in chloroplast (including Rubisco, ribosome, etc.) • 60 kb of coding sequence • Constraints in non-synonymous (NS) vs. synonymous (S) variation provide biological validation for SNPs • Upland vs. Lowland (~1 million years): • 23 NS : 16 S (ratio = 1.4) • Within upland (< 0.5 million years): • 16 NS : 3 S (ratio = 5.3)
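The two ratios on this slide are simple count ratios, reproduced here as a quick check. The sketch compares raw counts only; it does not normalize by the number of synonymous and non-synonymous sites, and the helper name `ns_s_ratio` is hypothetical.

```python
# Sketch: non-synonymous to synonymous count ratio from raw SNP counts.
# Assumption: raw counts only, no per-site normalization (not dN/dS).

def ns_s_ratio(nonsyn, syn):
    """Return the NS:S ratio; syn must be positive."""
    if syn <= 0:
        raise ValueError("synonymous count must be positive")
    return nonsyn / syn


if __name__ == "__main__":
    print(round(ns_s_ratio(23, 16), 1))  # upland vs. lowland counts
    print(round(ns_s_ratio(16, 3), 1))   # within-upland counts
```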

  22. Nuclear genome: Multidimensional scaling ~11000 nuclear loci, mean of 100 random allele samples

  23. Nuclear loci: Structure analysis Bayesian clustering algorithm ~11000 nuclear loci, random allele sample, Burn-in 10K, Run 10K

  24. Conclusions • Confirmed upland vs. lowland differentiation and differentiated a local population using non-ascertained markers • Lake Michigan switchgrass is distinct from the broader upland population in the Midwest and Great Plains • Post-glacial gene flow into the Indiana Dunes included genotypes from across the Great Plains and Midwest • The chloroplast diversity in the Indiana Dunes did not evolve in the current midwestern population, but originated one or more glacial cycles ago • A single blowout in the dunes can have as much chloroplast diversity as the Midwest

  25. New GBS methods for population genomics • For true population analysis we need 10+ individuals in multiple populations • Illumina multiplexing is too expensive: separate prep cost for each library adds $100s/sample • Read count overdispersion (up to ~200X more than Poisson) requires technical replicates to even out counts • Sticky-end ligation increases specificity and removes random sequence (including plastome)
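One common way to quantify read count overdispersion is the variance-to-mean ratio (index of dispersion), which is about 1 for Poisson-distributed counts. The sketch below assumes that reading of the "~200X" figure, which the slide does not spell out; the function name and the toy counts are illustrative.

```python
# Sketch: index of dispersion (sample variance / mean) for read counts.
# Assumption: "overdispersion relative to Poisson" is measured this way.
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance divided by mean (about 1 under Poisson)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean read count is zero")
    return variance(counts) / m


if __name__ == "__main__":
    even = [98, 103, 101, 97, 101]     # made-up, fairly even counts
    skewed = [5, 400, 12, 950, 33]     # made-up, highly uneven counts
    print(dispersion_index(even))
    print(dispersion_index(skewed))
```

A ratio far above 1 means a few loci or samples absorb most of the reads, which is why replicate libraries help even out the counts.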

  26. Genotyping-by-sequencing (GBS) Based on Elshire et al. 2011, PLoS ONE

  27. GBS on continental + dunes switchgrass

  28. New population genomic studies with GBS • Continental population structure (126 individuals) • 50/50 deep diversity and shallow diversity based on chloroplast markers and SSRs • Tetraploid cultivars (24 each for TX, OK, NE, ND cultivars) • Ploidy differences may be confounded with genetic diversity • High sample size should allow traditional pop gen analyses (Fst, etc.) • Dune half-sibs (4 mothers and 10 offspring each) • True SNPs will segregate in the offspring while homeologous substitutions will not
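As an example of the traditional analyses mentioned above, Fst for a single biallelic locus can be written as (Ht - Hs) / Ht, where Hs is the mean expected heterozygosity within populations and Ht is the expected heterozygosity at the pooled allele frequency. The sketch assumes equal population weights and known allele frequencies; it is a textbook estimator chosen for illustration, not necessarily the one the authors would use, and the name `fst` is hypothetical.

```python
# Sketch: basic Fst for one biallelic locus from per-population
# allele frequencies. Assumption: equal weighting of populations and
# no sample-size correction.

def fst(freqs):
    """Return (Ht - Hs) / Ht given a list of allele frequencies."""
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)              # total heterozygosity
    if h_t == 0:
        return 0.0                             # locus fixed everywhere
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst([0.5, 0.5]))   # identical populations
    print(fst([0.2, 0.8]))   # moderately differentiated
    print(fst([0.0, 1.0]))   # fixed for alternative alleles
```

With roughly 24 individuals per cultivar, per-population frequencies become stable enough for this kind of summary, which is the point the slide makes about sample size.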

  29. Bioinformatics overview • No software package for population genomic analysis on GBS • Stacks (U. Oregon) comes closest, but its multinomial sampling model expects high-frequency SNPs (e.g. mapping population) • Buckler lab TASSEL package (Java) may be appropriate • We’ve been using a custom pipeline (CLC, MySQL, R) for analysis • http://create.ly/gefxsub43
