Genome-Wide SNP Discovery from de novo Assemblies of Pepper ( Capsicum annuum ) Transcriptomes




Background and Significance

• Molecular breeding of pepper (Capsicum spp.) has been hampered by the paucity of molecular markers.
• This is primarily due to the lack of a pepper genome sequence and the limited sequence resources available.
• In recent years, with more cost-effective sequencing technologies such as Illumina, sequencing of expressed genes (transcriptomes), gene discovery and allele mining are no longer insurmountable.
• To exploit the speed and scale of data from new sequencing technologies, and to enrich the sequence resources of pepper, we sequenced the transcriptomes (RNA-seq) of three pepper lines: Maor, Early Jalapeño (EJ) and Criollo de Morelos 334 (CM334).
• We selected a wide range of tissues to represent as many expressed genes as possible.
• The reference sequence was constructed from >200 million Illumina reads (80-120 nt) using a combination of the Velvet, CLC and CAP3 software packages.
• BWA (Li and Durbin, 2009), SAMtools (Li et al., 2009b) and in-house Perl scripts were used to identify SNPs among the three pepper lines.
• The SNPs were filtered to be at least 100 bp from any putative intron-exon junction and from adjacent SNPs. After filtering, >22,000 high-quality putative SNPs were identified and bioinformatically mapped to pepper genetic maps.
• The reference sequence was annotated with Blast2GO (Conesa et al., 2005).

Results

Assembly Statistics

Assembly                               No. of contigs   Total nt      N50
CM334                                  83,113           84,792,180    1,488
Early Jalapeño                         82,614           84,973,865    1,488
Maor                                   76,375           79,383,673    1,526
Pepper assembly (CM334, EJ and Maor)   123,261          135,019,787   1,647

• Contig length in the final pepper assembly: mean = 1,095 nt; max = 19,089 nt.

SNP Discovery

• A total of 22,863 putative SNPs within 11,869 contigs were identified by our SNP discovery pipeline.
• The contigs with identified putative SNPs comprised 23,794 kb (17.6%) of the pepper transcriptome assembly.
• On average, 1 SNP was identified per 1,040 bp of exonic sequence.
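The summary figures above can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code. The function name `n50` and the toy contig lengths are illustrative assumptions, and only the three totals are taken from the poster. N50 is computed here under its usual definition: the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of all assembled bases.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (usual definition)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running total covers half the assembly.
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Toy lengths (hypothetical, not from the poster) to show the calculation.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])

# Totals reported on the poster.
total_assembly_nt = 135_019_787   # final pepper assembly
snp_contig_nt = 23_794_000        # contigs containing putative SNPs (23,794 kb)
n_snps = 22_863                   # putative SNPs after filtering

fraction_with_snps = snp_contig_nt / total_assembly_nt
bp_per_snp = snp_contig_nt / n_snps

print(f"toy N50: {toy_n50}")
print(f"SNP-containing contigs: {fraction_with_snps:.1%} of the assembly")
print(f"average spacing: one SNP per {bp_per_snp:.1f} bp")
```

Dividing the 23,794 kb of SNP-containing contigs by the 22,863 SNPs gives a spacing between 1,040 and 1,041 bp, which suggests that the reported density was calculated over the SNP-containing contigs.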
[Assembly workflow diagram labels: trimmed reads (min 40 nt to max 85 nt; min 25 nt to max 60 nt); Velvet assembler with k-mers 31, 35 and 41; CLC assembler; all k-mer assemblies assembled with CAP3]

Fig. 2 Distribution of contig length in pepper transcriptome assembly (N50 = 1,647; min = 265)

Annotation

• A total of 63,202 contigs (51.3%), with an average length of 1,495 nucleotides, had at least one hit in the GenBank non-redundant database.
• Contigs with a hit covered 94.5 M bases (70%) of the total assembly.
• The 60,055 contigs (48.7%) without any GenBank hit were on average 674 nucleotides long and covered 40.5 M bases (30%) of the total assembly.
• Based on all BLASTX results, Vitis vinifera, Arabidopsis thaliana and Oryza sativa were the top three species among the BLAST hits (Fig. 3).
• The mapping step of Blast2GO assigned Gene Ontology (GO) terms to 37,918 contigs (30.7%).
• Biological Processes (BP) at different GO levels were generated. Fig. 4 shows the BP at level 2; Fig. 5 shows the number of annotated sequences for each BP.
• KEGG maps were generated for 150 biological pathways, and the contigs within each pathway were determined. For instance, Fig. 6 depicts the KEGG map of the pyrimidine metabolism pathway.
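The annotation percentages can be cross-checked from the contig counts and average lengths alone. The sketch below is illustrative only: `summarize` is a hypothetical helper, and the totals of 123,261 contigs and 135,019,787 nt are the final-assembly figures reported on the poster. Bases covered are approximated as contig count times average length, so small rounding differences are expected.

```python
TOTAL_CONTIGS = 123_261      # final pepper assembly, from the poster
TOTAL_NT = 135_019_787       # final pepper assembly, from the poster


def summarize(n_contigs, mean_length):
    """Return (% of contigs, M bases covered, % of bases) for one contig group."""
    bases = n_contigs * mean_length          # approximate bases covered
    return (
        round(100 * n_contigs / TOTAL_CONTIGS, 1),
        round(bases / 1e6, 1),
        round(100 * bases / TOTAL_NT, 1),
    )


with_hit = summarize(63_202, 1_495)   # contigs with at least one GenBank hit
no_hit = summarize(60_055, 674)       # contigs without any hit

print("with hit:", with_hit)
print("no hit:  ", no_hit)
print("contigs accounted for:", 63_202 + 60_055, "of", TOTAL_CONTIGS)
```

The two groups together account for 123,257 contigs, four fewer than the assembly total. The poster does not explain the difference.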
Objectives

• To obtain as many transcribed genes as possible by sampling peppers from different cultivars and tissues at multiple stages of growth and development.
• To discover putative SNPs among the three sampled pepper cultivars by sequencing transcriptomes on the Illumina Genome Analyzer.
• To annotate the transcriptome sequence to gain insight into pepper biological processes.
• To use annotated genes for QTL analysis and candidate gene discovery.

Materials and Methods

Plant Materials and cDNA Library Preparation

• Seeds of three pepper (C. annuum) lines, 'CM334', 'Maor' and 'Early Jalapeño', were planted.
• Three cDNA libraries (one from each pepper variety) were prepared from pooled RNA extracted from four tissues (root, young leaf, flower and fruit) using the Qiagen RNeasy Mini Kit (Qiagen, Valencia, CA, USA). Fruit tissues were collected at different developmental stages: developing fruit at 5, 10 and 20 days post pollination, breaker fruit and ripe fruit.
• The libraries were constructed by shearing cDNAs and selecting 300-350 bp fragments on gels.
• The libraries were normalized using a double-stranded nuclease protocol.
• The cDNA libraries were sequenced on an Illumina Genome Analyzer IIx (GAIIx) (Illumina Inc., San Diego, CA) for 80-120 cycles at the UC Davis Genome Center core facility.

De Novo Assembly of NGS Sequences

• The NGS data (GAIIx) went through our standard preprocessing pipeline, developed at UC Davis (Kozik, 2010).
• Velvet (Zerbino and Birney, 2008) and CLC (CLCBIO, 2010) software packages were used to assemble the sequences.
• CAP3 was used to make the final assembly of the three assemblies.

Fig. 1 De novo assembly of pepper transcriptomes [diagram labels: all k-mer assemblies assembled with CAP3; one iteration of CLC assembly with all reads; CM334, Early Jalapeño and Maor assemblies made with CAP3; pepper final assembly made with CAP3 (reference sequence)]

SNP Discovery Pipeline

• BWA was used to map the reads of each of the three genotypes individually to the final pepper transcriptome assembly.
• SAMtools was used to make pileups for each cultivar and to identify differences between each cultivar and the reference sequence.
• Indels were screened out of the pileup files.
• Intron-exon junction positions were inferred in the reference sequence based on Arabidopsis gene models, using the intron finder of the Solanaceae Genome Network (SGN) website.
• In-house Perl scripts were used to create an allele call table for all three genotypes; the SNPs were then filtered against adjacent SNPs and the identified intronic regions.
• Sequences surrounding the SNPs (100 bases on each side) were extracted from the reference sequence to design assays.

Annotation of Reference Sequence

• Blast2GO was used to annotate the reference sequence, obtain the statistics and generate KEGG maps (http://www.genome.jp/kegg/pathway.html).

Fig. 6 KEGG map of pyrimidine metabolism

Conclusions

• Assembling the transcriptomes of three pepper cultivars increased the total assembled bases by 50%.
• The present pepper transcriptome assembly represents ~4% of the pepper genome (3,500 Mb).
• We demonstrated that, for plants whose genome sequences are not yet available, transcriptome assembly is an alternative approach for SNP calling.
• Annotation of 51% of contigs, or 70% of total assembled bases, indicates that the remaining ~49% of contigs are small contigs covering the 30% of sequence that is unannotated.

References

• Conesa, A., S. Götz, et al. (2005). "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research." Bioinformatics 21(18): 3674-3676.
• Kozik, A. (2010). Tool to process and manipulate Illumina sequences. http://code.google.com/p/atgc-illumina/downloads/list
• Li, H. and R. Durbin (2009). "Fast and accurate short read alignment with Burrows-Wheeler transform." Bioinformatics 25(14): 1754-1760.
• Li, H., B. Handsaker, et al. (2009). "The Sequence Alignment/Map format and SAMtools." Bioinformatics 25(16): 2078-2079.
• Zerbino, D. and E. Birney (2008). "Velvet: algorithms for de novo short read assembly using de Bruijn graphs." Genome Res 18: 821-829.

Authors and Affiliations

Hamid Ashrafi (1), Jiqiang Yao (2), Kevin Stoffel (1), Sebastian R. Chin-Wo (3), Theresa Hill (1), Alexander Kozik (3) and Allen Van Deynze (1)
(1) Department of Plant Sciences, Seed Biotechnology Center, University of California, Davis, CA 95616
(2) Interdisciplinary Center for Biotechnology Research (ICBR), University of Florida, Gainesville, FL 32610
(3) Genome Center, University of California, Davis, CA 95616

Acknowledgments

• The authors thank Enza Zaden, Nunhems, Rijk Zwaan, Syngenta, Vilmorin and the UC Discovery program for financial support. We also thank the sequencing facility of the UC Davis Genome Center and the Bioinformatics core facility for providing servers and computational power. The annotation would not have been possible without collaboration with Dr. R. Michelmore's laboratory.
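The 100 bp spacing filter and the flank extraction described in the SNP discovery pipeline can be sketched in Python. This is not the authors' in-house Perl code, which is not shown on the poster. The function names, the 1-based position convention, and the choice to compare each SNP against every called SNP on the contig (including SNPs that are themselves rejected) are all assumptions made for illustration.

```python
from bisect import bisect_left

MIN_DISTANCE = 100  # bp; the spacing stated on the poster


def filter_snps(snp_positions, junction_positions, min_distance=MIN_DISTANCE):
    """Keep SNPs at least `min_distance` bp from every junction and every other SNP.

    Positions are coordinates on a single contig (assumed convention).
    """
    snps = sorted(set(snp_positions))
    junctions = sorted(junction_positions)
    kept = []
    for i, pos in enumerate(snps):
        # In a sorted list the nearest SNPs are the immediate neighbours.
        near_snp = (
            (i > 0 and pos - snps[i - 1] < min_distance)
            or (i + 1 < len(snps) and snps[i + 1] - pos < min_distance)
        )
        # Binary search finds the junctions just right and just left of the SNP.
        j = bisect_left(junctions, pos)
        near_junction = (
            (j < len(junctions) and junctions[j] - pos < min_distance)
            or (j > 0 and pos - junctions[j - 1] < min_distance)
        )
        if not (near_snp or near_junction):
            kept.append(pos)
    return kept


def flanking_sequence(contig, pos, flank=100):
    """Return (left flank, reference base, right flank) for a 1-based SNP position.

    Returns None when the contig is too short for a full flank on either side.
    """
    idx = pos - 1
    if idx - flank < 0 or idx + flank >= len(contig):
        return None
    return contig[idx - flank:idx], contig[idx], contig[idx + 1:idx + 1 + flank]


# Hypothetical positions on one contig, with a single inferred junction at 650.
kept = filter_snps([50, 120, 400, 700, 1000], junction_positions=[650])
print(kept)

# Hypothetical 201 nt contig with the SNP exactly in the middle.
toy_contig = "A" * 100 + "G" + "C" * 100
left, ref_base, right = flanking_sequence(toy_contig, 101)
print(len(left), ref_base, len(right))
```

In this toy example the SNPs at 50 and 120 reject each other because they are 70 bp apart, and the SNP at 700 is rejected for lying 50 bp from the junction. A filter that compared each SNP only against SNPs already kept would be less conservative, and the poster does not say which behaviour the original scripts used.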
