
Sequence data are being used to address specific questions.

Application of Next Generation Sequencing to Horticulture: Next Generation Sequencing of the Tomato Transcriptome. David Francis, Allen Van Deynze, John Hamilton, Matt Robbins, Sung-Chur Sim, Walter De Jong, David Douches, and C. Robin Buell.


Presentation Transcript


  1. Application of Next Generation Sequencing to Horticulture: Next Generation Sequencing of the Tomato Transcriptome. David Francis, Allen Van Deynze, John Hamilton, Matt Robbins, Sung-Chur Sim, Walter De Jong, David Douches, and C. Robin Buell. Supported by the AFRI Plant Breeding, Genetics, and Genomics Program of USDA’s National Institute of Food and Agriculture.

  2. Sequence data are being used to address specific questions. International Sol Project: How does a common set of genes give rise to such a wide range of morphologically and ecologically distinct organisms? SolCAP: How can variation be harnessed to improve varieties that benefit the consumer, processors, and the environment? Resources: draft genomes for doubled monoploid DM1-3 516R44 (S. tuberosum L. Phureja group), Heinz 1706 (S. lycopersicum), and LA1589 (S. pimpinellifolium). Technology: next generation sequencing; SNP genotyping.

  3. SolCAP’s contribution: GAII sequencing of transcribed sequences (transcriptomes). S. tuberosum (3 varieties): Premier, Snowden, Atlantic. S. lycopersicum (6 varieties and accessions): OH9242, FL7600, NC84173, OH08-6405, PI 114490, PI 128216.

  4. SolCAP’s contribution: GAII sequencing of transcribed sequences (transcriptomes). S. tuberosum (3 varieties): Premier, Snowden, Atlantic. S. lycopersicum (6 varieties and accessions): OH9242 (processing); FL7600, NC84173, OH08-6405 (fresh market, FM); PI 114490 (cherry); PI 128216 (S. pimpinellifolium). Also listed: TA496 (E6203; processing), Heinz 1706 (processing), LA 1589 (S. pimpinellifolium).

  5. Workflow: library creation/QC; GAII sequencing (single and paired end); data collection; assembly; analysis (transcriptome complexity, SNP calling/validation, identification of genes under selection).

  6. Tomato Illumina GA II Output

  7. Velvet Assemblies of Tomato Illumina Sequences • Purity Filtered (PF) and a contig length of >150 bp
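
The contig length cutoff on this slide can be illustrated with a minimal sketch. It assumes the assembled contigs are in a FASTA file; the function names, the in-memory example, and the strict ">" comparison are illustrative assumptions, not the SolCAP pipeline.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(handle: TextIO, min_length: int = 150) -> Dict[str, str]:
    """Keep contigs strictly longer than min_length (assumed reading of '>150 bp')."""
    return {name: seq for name, seq in read_fasta(handle) if len(seq) > min_length}


# Hypothetical contigs: 200 bp, exactly 150 bp, and 160 bp split over two lines.
example = io.StringIO(
    ">contig_1\n" + "A" * 200 + "\n"
    ">contig_2\n" + "C" * 150 + "\n"
    ">contig_3\n" + "G" * 80 + "\n" + "T" * 80 + "\n"
)
kept = filter_contigs(example)
print(sorted(kept))  # ['contig_1', 'contig_3']
```

A contig of exactly 150 bp is dropped here because the slide states ">150bp"; an inclusive cutoff would need `>=` instead.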

  8. Sequence quality: Viewing FL7600 potato contig from the Velvet assembly

  9. Alignment of contigs relative to DM1-3 516R44: FL7600 (93.7% identity; 94.4% coverage); Snowden (97.9% identity; 94.7% coverage)

  10. Identify SNPs (example shown: A/C SNP)

  11. Filtered SNP counts. Filtering on SNP quality and 1 SNP per 150 bp window. Table columns: number of SNPs, validation rate, depth of coverage.
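
The two filters named on this slide could be sketched as follows. This is an assumption-laden illustration: "1 SNP/150bp window" is read here as "keep a SNP only if no other quality-passing SNP on the same contig lies within 150 bp of it", and the `Snp` record, the quality threshold of 20, and the example data are all hypothetical.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    contig: str
    position: int   # 1-based position on the contig
    quality: float  # caller-reported SNP quality score (hypothetical scale)


def filter_snps(snps: List[Snp], min_quality: float = 20.0, window: int = 150) -> List[Snp]:
    """Apply a quality filter, then keep only SNPs with no neighbor within `window` bp."""
    passed = sorted(
        (s for s in snps if s.quality >= min_quality),
        key=lambda s: (s.contig, s.position),
    )
    kept = []
    for i, snp in enumerate(passed):
        # After sorting, only the immediate neighbors can be the closest SNPs.
        prev_close = (
            i > 0
            and passed[i - 1].contig == snp.contig
            and snp.position - passed[i - 1].position < window
        )
        next_close = (
            i + 1 < len(passed)
            and passed[i + 1].contig == snp.contig
            and passed[i + 1].position - snp.position < window
        )
        if not (prev_close or next_close):
            kept.append(snp)
    return kept


# Hypothetical calls: two crowded SNPs, one isolated, one low quality, one on another contig.
calls = [
    Snp("c1", 100, 40.0),
    Snp("c1", 180, 35.0),
    Snp("c1", 500, 50.0),
    Snp("c1", 900, 10.0),
    Snp("c2", 120, 60.0),
]
kept_snps = filter_snps(calls)
print([(s.contig, s.position) for s in kept_snps])  # [('c1', 500), ('c2', 120)]
```

Whether crowding is judged before or after the quality filter is a design choice; this sketch filters on quality first.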

  12. Validation and SNP summary by germplasm class: 28,380 SNPs, overall validation rate 97%; 9,215 SNPs with 10,000 probes for Infinium; 6,500 SNPs within cultivated; 5,600 among processing; 4,000 among fresh market; 3,500 among vintage; 3,750 PM within S. pimpinellifolium; 2,760 PM between LA716 and M82

  13. Analyses. Direct comparison of sequence: Ka/Ks, alternative splicing, etc. Analysis of SNPs across populations: FST outlier analysis. Genotyping platforms: BeadXpress (48-384 SNPs); Infinium (7,600 SNPs)

  14. What patterns do we expect to see for genes “under selection”? • Low Variation (fixed) • High Ka/Ks (mutations affect protein, possible diversifying selection) • Mutations (loss of function) • FST (genes that distinguish populations)
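
To make the FST bullet concrete, here is a minimal sketch of Wright's FST for a single biallelic SNP, computed as (HT - HS) / HT from per-population allele frequencies. The slides do not say which FST estimator was used, so this textbook form, the equal weighting of populations, and the example frequencies are assumptions for illustration only.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright's FST = (HT - HS) / HT, weighting every population equally."""
    if not freqs:
        raise ValueError("need at least one population allele frequency")
    # HS: mean expected heterozygosity within populations.
    hs = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # HT: expected heterozygosity at the mean allele frequency.
    p_bar = sum(freqs) / len(freqs)
    ht = expected_heterozygosity(p_bar)
    if ht == 0.0:
        # Locus fixed for the same allele everywhere: FST is undefined, reported here as 0.
        return 0.0
    return (ht - hs) / ht


# Hypothetical allele frequencies in two populations.
print(fst_biallelic([1.0, 0.0]))  # 1.0, alternate alleles fixed
print(fst_biallelic([0.5, 0.5]))  # 0.0, identical frequencies
```

A value near 1 marks a SNP that distinguishes the populations, while a value near 0 marks one whose frequencies are shared, which is the contrast the last bullet draws.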

  15. Population structure: coding vs. non-coding. Classes: processing, fresh-market, vintage, landrace. Panels: all 173 markers (K=6; cluster labels CA & OH, OH, CN); 89 coding markers (K=5); 84 non-coding markers (K=6; cluster labels CA, OH, OH, CN). Run settings: 500K burn-in / 750K MCMC reps, 20 runs for each K from 3 to 8

  16. Distribution of FST for genes. Values shown in the six panels: (ovate: 0; fw2.2: 0; sp6: 0.14), (ovate: 0.26; fw2.2: 0; sp6: 0.73), (ovate: 0.14; fw2.2: 0.46; sp6: 0.05), (ovate: 0; fw2.2: 0.5; sp6: 1), (ovate: 0; fw2.2: 0.42; sp6: 0.74), (ovate: 0.31; fw2.2: 0; sp6: 0.47)

  17. Genes that have high FST between S. lycopersicum populations

  18. Genes that have high FST between S. lycopersicum populations

  19. Distribution of PM genes across populations is not random. Populations: processing, fresh market, vintage, wild

  20. Next Generation Sequencing requires data management and “in house” pipelines for analysis and storage. Diagram labels: NCBI, SGN, Cyc, BLAST, BioPerl, Perl, DOS, CygWin (Unix emulator), UNIX, in-house database

  21. (1) Markers linked to the resistance gene. (2) BLAST against the WGS scaffold (H1706 whole genome sequence draft) and BLAST against GAII (next generation) sequence of PI 128216. Result: 1.8 Mb, 172 SNPs, 132 pass Illumina design criteria; 60 unique genes

  22. Conclusions: ~5.7 Gb PF potato transcriptome sequence (3 varieties). ~14.3 Gb PF tomato transcriptome sequence (6 varieties). Draft genomes currently available are excellent scaffolds for potato and tomato GAII transcriptome alignments. SNPs are not evenly distributed in genes/genomes. Genes with signatures of selection (Ka/Ks; high FST) tend to be genes associated with response to abiotic and biotic stress and plant growth habit. Co-adapted complexes result from selection during plant breeding. Accelerated marker discovery. Lessons learned: control GAII sequence of the varieties used for the draft genome sequences would permit bioinformatic optimization of pipelines rather than relying on empirical validation.

  23. Visit us at http://solcap.msu.edu/ Tools, Downloads

  24. Acknowledgments. Collaborators, CAU: Wencai Yang. Collaborators, Beijing Genomics Inst.: Sanwen Huang. Collaborators, OSU: Matt Robbins, Sung-Chur Sim, Troy Aldrich, Hui Wang. Collaborators, Cornell: Walter de Jong, Lucas Mueller, Joyce van Eck, Naama Menda. Collaborators, UCD: Allen Van Deynze, Kevin Stoffel, Alex Kozic, UC Davis Genome Center. Collaborators, MSU: David Douches, C Robin Buell, John Hamilton, Kelly Zarka. Funding: USDA/AFRI. This project is supported by the Agriculture and Food Research Initiative of USDA’s National Institute of Food and Agriculture.
