1 / 1

A modified Smith-Waterman alignment 8 was performed on all the proteins in the mini-database

BioHealthBase: Complete Genome Annotation of Francisella tularensis subsp . tularensis A.II Wyoming (FTW) and Francisella tularensis subsp . holarctica (FTA)

tevin
Download Presentation

A modified Smith-Waterman alignment 8 was performed on all the proteins in the mini-database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BioHealthBase: Complete Genome Annotation of Francisella tularensis subsp. tularensis A.II Wyoming(FTW) and Francisella tularensis subsp. holarctica (FTA) Jyothi M Noronha1, Shubhada Godbole1, Liwei Zhou3, Zuoming Deng3,9, Raymond K. Auerbach2,10, Stephen M. Beckstrom-Sternberg2, James S. Beckstrom-Sternberg2, Paul Gilna4, Paul S. Keim2,7, Kristy Kubota5, Yan Zhou5, Jeannine Petersen5, John V. Pearson7, Aihui Wang3, Jianjun Wang3, Jicheng Hao7, David M. Wagner2, Nancy Brown4,6, Christine Munk4,6, Xianying Wei3, David Bruce4,6,Thomas S. Brettin4, Michael P. Dempsey8, Edward B. Klem3, and Richard H. Scheuermann1 1BioHealthBase/University of Texas Southwestern Medical Center, Dallas TX, 2Northern Arizona University, Flagstaff AZ, 3BioHealthBase/Northrop Grumman Health Solutions, Rockville MD, 4Los Alamos National Laboratory, Los Alamos NM, 5Centers for Disease Control and Prevention, Fort Collins CO, 6The Joint Genome Institute, Department of Energy, Walnut Creek CA, 7Translational Genomics Research Institute, Phoenix AZ, 8Air Force Surgeon General Modernization Directorate, Falls Church VA, 9Agricultural Research Service, U.S. Department of Agriculture, Philadelphia PA, 10Yale University, New Haven CT ABSTRACT ANALYSIS & RESULTS • Fig 2: Annotation and Curation Pipeline • Predicted proteins were searched against a non-redundant amino acid database (nraa) that consists of all proteins available from GenBank, PIR and SWISS-PROT using the BLAST-Extend-Repraze (BER) algorithm and all significant matches were stored in a mini-database. • Using our completed genome annotation results, further analysis was performed at the Translational Genomics Research Institute, AZ,and a paper has been published on the ‘Complete Genomic Characterization of a Pathogenic A.II Strain of Francisella tularensis subspecies tularensis’ in PLos ONE. (PMID: 17895988) • The analysis showed that FTW strain has some of the following features: • Presence of 4,382 potential polymorphisms (3,367 SNPs and 1,015 indels) • 31 large chromosomal rearrangements (inversions and translocations) • One unique gene (FTW0888, hypothetical protein) that is absent in Schu 4 • FTW is found to be 5,657 bp longer than Schu S4 The BioHealthBase (BHB) (www.biohealthbase.org) is one of eight Bioinformatics Resource Centers (BRC) funded by the National Institute of Allergy and Infectious Diseases (NIAID). The BHB serves as an integrated information resource for access to both biological data and analysis tools for five priority pathogens, including the bacterial pathogen Francisella tularensis. Curators at BHB have been actively involved in the expert annotation of several newly sequenced bacterial genomes including the manual curation of the automated predictions of gene structure, location and function. Our existing curation process involves gene prediction using the Glimmer3 algorithm based on the Interpolated Markov Model, automated annotation of predicted genes with the AutoAnnotate pipeline and manual curation of genes using the Manatee interface developed at the J. Craig Venter Institute (formerly TIGR). Manual curation includes editing of start and stop sites, assignment of functional features such as gene symbols, EC number and GO annotations followed by the quality checks for data consistency and accuracy. An overview of Francisella genome annotation pipeline and manual curation process used by BHB curators is described at http://www.biohealthbase.org/gsearch/documents/Annotation_SOP.pdf. The complete genome sequences of Francisella tularensis Wyoming (FTW) strain and Francisella tularensis subsp. holarctica (FTA) are available at: http://eutils.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=134048946 and http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=CP000803, respectively. Results from curated genome annotation are being used for comparative analyses between F.t. tularensis, F.t. novicida and F.t. holarctica subspecies. Supported by NIH N01AI40041 • A modified Smith-Waterman alignment8 was performed on all the proteins in the mini-database • The gene was extended 300 nucleotides upstream and downstream of the predicted coding region to identify potential frameshifts, if any. • All protein sequences were searched against the Pfam9, HMMs and TIGRFAMs10 built from well-curated multiple alignments of proteins considered to share the same function or known to be members of the same family. • Prediction of transmembrane helices using TMHMM11, Signal peptide using signalP12 and lipoprotein motif and COG13 relationshipsof proteins based on phylogenetic classification of proteins encoded in complete genomes were also performed • The resultant data was sent to the AutoAnnotate pipeline at BHB to make functional predictions for proteins and were made available in the Manatee interface for manual evaluation of the AutoAnnotate- predicted function. INTRODUCTION The annotation of two complete genomes from the subtypes A and B has been completed by curators at BHB. The sequencing of F. tularensis subspecies tularensis WY96-3418 and F. tularensis subspecies holarctica (FTA) was initially performed at the JGI production genomics (The Joint Genome Institute) facility in Walnut Creek, CA. Genome finishing was performed by the JGI at Los Alamos National Laboratory, Los Alamos, NM. The complete genomes were then auto-annotated at BHB for gene descriptions and functional predictions of proteins and all predictions were manually evaluated to provide reliable structural and functional assignments. Quality checks and data integrity evaluations were performed prior to submission of the annotated genomes to GenBank. • Manual Curation Process • JCVI’s guidelines for functional assignments were used for assignment of Gene symbol, EC number and GO annotations. Pseudogenes were termed ‘transposon-disrupted ORFs’ if they lacked experimental evidence. • TIGRfam and Pfam HMM alignments along with Blast-Extend Repraze (BER) alignments were used as primary evidence for functional annotations, and TMHMM, SignalP, COG relationships as well as literature data was used as supporting evidence. Enzyme Commission14 (EC) numbers were assigned to predicted enzymes based on their homology to an enzyme family • ORFs other than programmed frameshift or stops codons were reviewed for correct translation and when required, fixed by editing the start sites. Additionally, BioHealthBase also provides researchers with pre-computed Ortholog predictions for proteins based on OrthoMCL15 calculations, a list of all the IEDB curated epitopes as well as Protein 3-D structures for viewing and analysis, obtained from Protein Data Bank (PDB). Comparative Genomics tools and Synteny viewers will soon be available to perform whole genome analysis of the genomes in BHB. METHODS Fig.1 BioHealthBase Annotation Pipeline for Francisella tularensis Gene Prediction • Genome annotation was performed at BHB using J. Craig Venter Institute’s (JCVI) Prokaryotic genome analysis pipeline for gene prediction (Glimmer31), automated annotation (AutoAnnotate2) and manual curation (Manatee2) • The genomic sequence was first reformatted to ensure that dnaA was the initial gene and co-ordinates were reassigned • The Glimmer3 (version 3.02) gene prediction algorithm which is based on Interpolated Markov Model was used with criteria imposed for presence of an initiation codon and specified length of ORF • The program tRNAscanSE3 (version 1.23) was employed to find tRNAs. Exonerate4 (version 1.0.0) was used to identify 16S and 23S ribosomal RNA genes BHB GENOMEANNOTATION STATISTICS REFERENCES Delcher et al. Nucleic Acids Research, 27(23):4636-4641 (1999) http://manatee.sourceforge.net/ Lowe, T. M. & Eddy, S. R. Nucleic Acids Research, 25:955-964 (1997) 4. Slater, G. S. & Birney, E. BMC Bioinformatics. 15: 6-31 (2005) 5. Griffiths-Jones, S., et al Nucleic Acids Research, 33:D121-D124 (2005) 6. http://www.tigr.org/software/genefinding.shtml 7. Altschul S., et al. J. Mol. Biol., 215: 403-410 (1990) 8. Smith T.F., et al. J. Mol. Biol.147(1): 195-197 (1981). 9. Bateman A., et al. Nucleic Acids Res. 28(1): 263-266 (2000). 10. Haft D., et al. Nucleic Acids Res. 29(1): 41-3 (2001). Krogh, A., et al. Journal of Molecular Biology, 305(3):567-580 (2001) http://www.cbs.dtu.dk/services/SignalP/ Tatusov, R.L., et al.Nucleic Acids Res. 28(1):33-6 (2000) http://www.chem.qmul.ac.uk/iubmb/enzyme/ OrthoMCL: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi • RFam5 (release 7.0), a comprehensive database of non-coding RNA (ncRNA) families was used to identify genes coding for other non-coding RNAs, (such as 5S ribosomal RNAs) • Prediction of ribosome binding sites (RBS) was done using RBSfinder6 algorithm developed by JCVI (former TIGR)

More Related