Annotation of Genome Scaffold no. 2 for Gram Positive Obligate Anaerobic Bacterium, Clostridium taeniosporum

Kalf, Nick§, Florit D.§, Hunicke-Smith, S.¥, Satterwhite E.¥, Blinkova A.¥, Walker J.R.¥, Tucker H.¥, Leon A.J.§, Cambridge J.M.¥ and Ginés-Candelaria E.§

§Miami Dade College-Wolfson Campus, Department of Natural Sciences, Health & Wellness, Miami, FL 33132
¥University of Texas at Austin, School of Biological Sciences, Section of Molecular Genetics, Austin, TX 78712

INTRODUCTION

Clostridium taeniosporum is a Gram-positive, obligate anaerobic, nonpathogenic, rod-shaped bacterium that is closely related to Clostridium botulinum Group II strains. C. botulinum Group II strains produce the most deadly bacterial neurotoxins. This collaboration generated the annotation of scaffold 2 of the C. taeniosporum genome. The annotation has been crucial in assembling the complete annotated genome, which will give insights into the biochemical functions of the protein-encoding genes and into the regulation of those genes.

Annotation of scaffold 2 of C. taeniosporum revealed multiple transferase enzymes, such as the metabolic phosphotransferase system EIIC (Figure 1), S-adenosyl-L-methionine-dependent methyltransferase, cell wall biosynthesis glycosyltransferase and sn-glycerol-3-phosphate acyltransferase. The bioinformatics software Geneious Pro v6.1.5 and the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information (NCBI) were used to provide a complete annotation of scaffold 2. Having successfully annotated the eighteen scaffolds, we will organize an assembly of the complete genome. Further genome studies with C. taeniosporum will clarify its relationship to C. botulinum Group II strains and/or elucidate novel functions particular to the organism and to the group.

Figure 2. Percent completion of the annotation of Clostridium taeniosporum with respect to time.

MATERIALS & METHODS

Bioinformatics tools were used to predict and annotate possible coding sequences and additional features such as components of the transfer RNA (tRNA) and ribosomal RNA (rRNA) machineries. The software platform Geneious Pro v6.1.5 was used to predict possible Open Reading Frames (ORFs) and annotate the C. taeniosporum genome. A total of eighteen scaffolds of various sizes were obtained by pyrosequencing chemistry using high-throughput 454 FLX-Titanium sequencing technology. Possible ORFs were determined using the start codons ATG, CTG and TTG and a minimum length of 100 base pairs.

Once ORFs were predicted, the BLAST tool from NCBI was used to predict possible coding sequences by aligning the predicted ORFs with known sequences of complete microbial genomes in GenBank. These predictions were used to create annotations of the corresponding features in Geneious Pro v6.1.5. Predictions were categorized as coding sequences (CDS), ribosomal RNA (rRNA), transfer RNA (tRNA), Gene (Gen), Nucleotide Deletions and miscellaneous features (Misc-f), which include noncoding sequences that extend from the 5' end to the 3' end of specific coding sequences. Non-sequenced stretches, which contain no called nucleotides and are displayed as N's, were annotated as conflict.

DISCUSSION

We have successfully completed the annotation of scaffold 2 (100%) of the nonpathogenic bacterium C. taeniosporum with the aid of the software program Geneious Pro v6.1.5 and the NCBI BLAST tool. It is important to note, however, that not all the ORFs predicted by Geneious represent actual genes. Consequently, when such sequences are analyzed by BLAST, no features are predicted. This happens either because Geneious may predict a series of ORFs corresponding to overlapping coding regions of a larger gene, or because there is no corresponding annotation in the microbial GenBank.

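The ORF criteria stated in MATERIALS & METHODS (start codons ATG, CTG and TTG; minimum length of 100 base pairs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm Geneious Pro uses: the stop codons (TAA, TAG, TGA), the requirement that an ORF end at a stop codon, and the names `Orf`, `scan_strand` and `find_orfs` are all hypothetical choices made for this example.

```python
from typing import List, NamedTuple

START_CODONS = {"ATG", "CTG", "TTG"}  # start codons listed in MATERIALS & METHODS
STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumption: standard bacterial stop codons
MIN_LENGTH = 100                      # minimum ORF length in bp, from MATERIALS & METHODS


class Orf(NamedTuple):
    strand: str  # "+" for the given strand, "-" for its reverse complement
    start: int   # 0-based offset on the scanned strand
    end: int     # exclusive end on the scanned strand, stop codon included


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; N (uncalled base) stays N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def scan_strand(seq: str, strand: str, min_length: int = MIN_LENGTH) -> List[Orf]:
    """Report every start-to-stop ORF in the three reading frames of one strand."""
    orfs = []
    for frame in range(3):
        open_starts = []  # start codons seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START_CODONS:
                open_starts.append(i)
            elif codon in STOP_CODONS:
                # Every pending start shares this stop, so nested ORFs are reported.
                for start in open_starts:
                    if i + 3 - start >= min_length:
                        orfs.append(Orf(strand, start, i + 3))
                open_starts = []
    return orfs


def find_orfs(seq: str, min_length: int = MIN_LENGTH) -> List[Orf]:
    """Scan both strands (six reading frames) for candidate ORFs."""
    seq = seq.upper()
    return (scan_strand(seq, "+", min_length)
            + scan_strand(reverse_complement(seq), "-", min_length))


if __name__ == "__main__":
    # Toy sequence with two in-frame start codons upstream of a single stop codon.
    demo = "ATG" + "GCA" * 5 + "TTG" + "GCA" * 40 + "TAA"
    for orf in find_orfs(demo):
        print(orf)
```

On the toy sequence, both in-frame start codons yield an ORF ending at the same stop codon. A scanner of this kind therefore reports several nested ORFs for what may be a single gene, which is the situation the discussion describes.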
Overlapping ORFs that matched a single gene were annotated as one feature (see Figure 1). Our data also contain non-sequenced nucleotide sections within predicted ORFs. These nucleotides are displayed as "N" instead of A, T, G or C, and these sections are annotated as "conflict areas."

Table 1 shows the number of annotated features in each scaffold. This number does not include repeated features, miscellaneous features or conflict sequences; it includes only non-repeated annotations of coding sequences and components of the tRNA and rRNA machineries. The annotated coding sequences nevertheless include a high number of predicted hypothetical proteins. These are proteins predicted from nucleic acid sequences and protein sequences with unknown function (Lubec et al., 2005). Annotated hypothetical proteins represent approximately 20% of the total number of annotated features from all eighteen scaffolds. This number shows that, although this process has provided a complete annotation of Clostridium taeniosporum, more information is still needed for this bacterial genome.

Figure 2 shows the progression of the C. taeniosporum genome annotation over time, indicating the time span over which the annotation process occurred. We have successfully completed the annotation of scaffolds 1-18, which represents 100% of the 97.7% genome coverage. Next, translations of the annotated coding sequences will be validated using the NCBI BLASTp tool (more commonly known as protein BLAST), thus applying protein homology searches to coding sequences identified via nucleotide homology.

RESULTS

High-throughput 454 pyrosequencing reactions yielded 3,452.850 kb representing the C. taeniosporum genome, divided among 18 scaffolds of different sizes. Scaffolds 1-18, encompassing 3,452,850 bp and containing 12,173 ORFs, have already been analyzed.
This represents 97.7% of the genome. A total of 2,837 features have been annotated in these 18 scaffolds. The annotation of the 528,830 bp Scaffold 2, reported here, contains 1,796 ORFs. Table 1 shows the distribution of the eighteen scaffolds, their nucleotide lengths, the percentage of the total genome that each represents, the number of predicted ORFs, and the number of annotated features in each of them. Using Geneious Pro v6.1.5 to predict possible Open Reading Frames (ORFs), we annotated 832 features (Table 1) such as the metabolic phosphotransferase system EIIC, S-adenosyl-L-methionine-dependent methyltransferase, cell wall biosynthesis glycosyltransferase and sn-glycerol-3-phosphate acyltransferase (Figure 1). To date, 100% of the 97.7% genome coverage has been annotated, which in turn completes the annotation of the C. taeniosporum genome.

Figure 1. Annotated view of a portion of scaffold 2. Coding sequences (yellow), miscellaneous features (grey), conflict (red), phosphotransferase system EIIC (green).

REFERENCES/ACKNOWLEDGMENTS

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol. 215: 403-410.

Geneious Pro v6.1.5 created by Biomatters.

Lubec, G., Afjehi-Sadat, L., Yang, J.W., Paul, J. and Pradeep, J. (2005) Searching for hypothetical proteins: Theory and practice based upon original data and literature. Elsevier Waehringer Guertel 18-20, A-1090, Vienna, Austria.

I would like to thank the National Science Foundation-Advanced Technological Education Program "The Biotechnology Research Learning Collaborative" NSF ATE DUE 0802508, and the US Department of Education HIS-STEM Program STEM-TRAC PO31C110190, for their support. Many thanks to my colleagues Alfredo Leon, Edwin Gines-Candelaria, Jose Thompson, Brittany Gonzalez and Kevin Vidal for their support and encouragement.
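As a quick consistency check on the figures reported in RESULTS, the share of the sequenced genome contributed by Scaffold 2 can be computed directly. The sketch below reads the combined scaffold length as 3,452,850 bp (an assumption, chosen because it matches the 528,830 bp length given for Scaffold 2) and treats 97.7% as the fraction of the whole genome covered by the 18 scaffolds. The variable names are hypothetical, and the implied genome size is an inference from those two numbers, not a value reported on the poster.

```python
# Values as reported in RESULTS; the bp unit for the total is an assumption.
TOTAL_BP = 3_452_850      # combined length of scaffolds 1-18
SCAFFOLD2_BP = 528_830    # length of Scaffold 2
TOTAL_ORFS = 12_173       # ORFs predicted across scaffolds 1-18
SCAFFOLD2_ORFS = 1_796    # ORFs predicted in Scaffold 2
COVERAGE = 0.977          # fraction of the genome covered by the 18 scaffolds


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


length_share = percent(SCAFFOLD2_BP, TOTAL_BP)
orf_share = percent(SCAFFOLD2_ORFS, TOTAL_ORFS)
implied_genome_bp = TOTAL_BP / COVERAGE  # inference, not a reported value

print(f"Scaffold 2 share of sequenced bases: {length_share:.1f}%")
print(f"Scaffold 2 share of predicted ORFs:  {orf_share:.1f}%")
print(f"Implied full genome size:            {implied_genome_bp / 1e6:.2f} Mb")
```

Under these assumptions, Scaffold 2 accounts for about 15.3% of the sequenced bases and about 14.8% of the predicted ORFs, and the implied full genome size is roughly 3.53 Mb.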