Whole Genome (predicted) ORF Display:
Genome ORFs are displayed to allow interesting regions (rich in mobility genes, abnormal %G+C, close to structural RNAs) to be viewed in a genome context.

E.g. H. pylori 26695 Genome
Several low %G+C regions can be seen in the graphic display:
• CAG island
• Region containing virB homologues; not present in strain J99
• Plasticity zone (contains different genes in J99 and 26695)

IslandPath Graphical Display:
• Each dot in a graphic corresponds to a predicted protein-coding ORF in the genome.
• Dot colours indicate whether an ORF has a higher or lower %G+C than the cutoffs you set (default settings are +/- 3.48* of the mean %G+C).
• You may click on a dot to view a portion of an annotation table presented below the graphic.
* 3.48 = 1.5 S.D. of the mean for Chlamydia genomes, which are proposed to have undergone no recent horizontal gene transfer (data not shown).

%G+C Analysis General Observations:
• High %G+C variance is associated with species with evidence of recent horizontal gene transfers (e.g. N. meningitidis).
• Low %G+C variance is associated with highly clonal species and species with no evidence of horizontal gene transfers (e.g. Chlamydia species, which are obligate intracellular microbes thought to have been ecologically isolated from other bacteria for a longer period than other obligate intracellular bacteria).
• %G+C variance is similar among genomes of a single species, with the exception of the two V. cholerae chromosomes and the two E. coli strains. Chromosome II of V. cholerae, however, appears to have originated from a megaplasmid captured by Vibrio [5]. For E. coli, pathogenic strain O157:H7 has the higher %G+C variance. This might be due to the presence of PAIs and other potentially horizontally transferred genetic elements.

Detection of Known Pathogenicity Islands:

Yersinia pestis strain CO92: High Pathogenicity Island core (in red rectangle)
Mean: 47.9   Std. Dev.: 4.9

%GC    S.D.  Location            Orientation  Product
56.48  +1    2140840..2142861    -            pesticin/yersiniabactin receptor protein
58.81  +2    2142992..2144569    -            yersiniabactin siderophore biosynthetic protein
58.33  +2    2144573..2145376    -            yersiniabactin biosynthetic protein YbtT
60.40  +2    2145373..2146473    -            yersiniabactin biosynthetic protein YbtU
60.79  +2    2146470..2155961    -            yersiniabactin biosynthetic protein
60.15  +2    2156049..2162156    -            yersiniabactin biosynthetic protein
56.35  +1    2162347..2163306    -            transcriptional regulator YbtA
57.29  +1    2163473..2165275    +            lipoprotein inner membrane ABC-transporter
58.62  +2    2165262..2167064    +            inner membrane ABC-transporter YbtQ
59.48  +2    2167057..2168337    +            putative signal transducer
55.25  +1    2168365..2169669    +            putative salicylate synthetase
52.65        2169863..2171125    -            integrase

Vibrio cholerae chromosome I: VPI (toxin regulated pili)
The VPI is delineated as a stretch of low %G+C flanked by mobility genes.

Horizontal Gene Transfer and Bacterial Pathogenicity:
Several types of mobile elements have been shown to carry virulence factors:
• Transposons: ST enterotoxin genes in E. coli
• Prophages: Shiga-like toxins in EHEC; diphtheria toxin gene; cholera toxin; botulinum toxins
• Plasmids: Shigella, Salmonella, Yersinia
• Pathogenicity Islands: Uro/Entero-pathogenic E. coli, Salmonella typhimurium, Yersinia spp., Helicobacter pylori, Vibrio cholerae

Detection of Proposed or Potential Genomic Islands:

Escherichia coli O157:H7: The area displayed in the white rectangle is ~28 kb in size (from 3708 kbp to 3736 kbp) and contains Type III Secretion proteins (Epr’s, Epa’s, and Eiv’s) and numerous hypothetical proteins with unknown functions.

Vibrio cholerae chromosome I: The area displayed in the red rectangle is ~34 kb in size (from 1896 kbp to 1930 kbp) and contains a tRNA-ser in the same orientation as the phage integrase downstream of it.
The ORFs contain one putative helicase, one chemotaxis protein MotB-related protein, one putative type I restriction enzyme HsdR, one putative DNA methylase, one putative N-acetylneuraminate lyase, one C4-dicarboxylate-binding periplasmic protein, and numerous hypothetical proteins and conserved hypothetical proteins.

When a tRNA gene is adjacent to an abnormal %G+C region, it is often observed to be in the same orientation as that region. This might be an artefact of phage insertion and excision events, as the 3’ ends of tRNA genes are common phage attachment (att) sites.

IslandPath: A computational aid for identifying genomic islands that may play a role in microbial pathogenicity

William Hsiao1*, Nancy Price2, Ivan Wan3, Steven J. Jones3, and Fiona S. L. Brinkman1
1 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby; 2 Department of Medical Genetics, University of British Columbia, Vancouver; and 3 Genome Sequence Centre, B.C. Cancer Agency, British Columbia, Canada
www.pathogenomics.bc.ca/brinkman

Abstract
As more genomes from bacterial pathogens are sequenced, it is becoming apparent that a significant proportion of virulence factors are encoded in clusters of genes, termed Pathogenicity Islands (reviewed in [1]). These islands, and other genomic islands, tend to have atypical guanine and cytosine content (%G+C), contain mobility genes (e.g. transposases and integrases), and are associated with tRNA sequences. We have developed a web-based computational tool, IslandPath, to aid the visualization of these features in a full genome display in order to facilitate the identification of genes in new genome sequences that may be involved in virulence or have horizontal origins. The ability to visualize these features within the genomic context can facilitate better detection of the genomic island borders and neighbouring genes.
Atypical %G+C by itself is not indicative of the horizontal origin of the sequence involved; however, the predictive power increases when such regions are associated with mobile elements or direct repeats, or contain genes with similarity to known virulence factors. Therefore, we are incorporating into IslandPath algorithms to detect partial tRNAs in new genomic sequences that are likely to be remnants of phage insertion events, and are also comparing the genomic sequences to a custom-built database of a subset of known virulence factors. Preliminary results from our investigation of the ability of IslandPath to visualize known Pathogenicity Islands as distinct regions within the genomes are encouraging. This computational tool also permitted us to perform a more in-depth analysis of %G+C variance in genomes and enabled us to detect correlations not previously reported. As more and more genome data become available, tools like IslandPath, which can be updated in an automated fashion, will become valuable for genomic research.

Frequencies of ORF %G+C in Genomes:

%G+C Analysis for Complete Genome Sequences:
Histograms of frequencies of %G+C were plotted for several organisms.

Observations:
• Lowest kurtosis occurs most commonly with a mode of 33.33% for %G+C values of ORFs in a genome (e.g. M. jannaschii DSM2661). This G+C value corresponds to maximum A/T in synonymous sites for the standard codon usage table.
• Long tails in the frequency plots occur more frequently downward (e.g. H. pylori J99 and N. meningitidis) than upward.
• These observations likely reflect either a bias in gene identification in high G+C genomes or selection toward higher A+T content.

Discussion:
IslandPath appears to be an effective automated tool to visualize and detect genomic islands. Previous reports have expressed concern about the use of %G+C to detect HGT; however, these reports were examining %G+C for individual genes.
We propose that %G+C analysis is effective if clusters of genes containing motifs associated with mobility elements are considered. Foreign genes with similar %G+C to the organism’s genome are not detected, and due to gene amelioration, only “recent” HGT can be detected. This tool represents one approach that can be complemented with others to prioritize particular genomic islands that merit further research.

Future developments:
• Virulence factor homology search (based on comparison to our VGS dataset)
• Alternative DNA signatures (e.g. codon usage)
• Allow users to input their own sequences for analysis

Methods:
• Core scripts written in Perl and CGI/Perl
• Sequence data: NCBI Genome FTP site
• Potential mobility elements: COG analysis [2,3] plus keyword scan
• RNA locations: NCBI data plus tRNAscan-SE [4]
• %G+C calculated for each ORF
• Mean and Std. Dev. for all ORFs in genome calculated
• File containing all ORF information used to generate a graphical representation
• Virulence Gene Subset (VGS) database developed through literature analysis of genes identified as virulence factors using the “Molecular Koch’s Postulates” (i.e. gene knockout affects virulence)

References
1. Hacker J and Kaper JB, 2000, Annu Rev Microbiol. 54:641-79
2. Tatusov RL, et al., 1997, Science 278(5338):631-7
3. Tatusov RL, et al., 2001, Nucleic Acids Res. 29(1):22-8
4. Lowe TM and Eddy SR, 1997, Nucleic Acids Res. 25(5):955-64
5. Heidelberg JF, et al., 2000, Nature 406:477-84

Acknowledgements
This project is funded by the Peter Wall Institute for Advanced Studies. We wish to thank Tatiana Tatusov of NCBI for providing helpful files for IslandPath and acknowledge the efforts of the many genome projects that have made our analysis possible.
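
Illustrative sketch of the %G+C steps:
The %G+C steps listed under Methods (per-ORF %G+C, then the genome-wide mean and Std. Dev.) could be sketched roughly as below. This is a minimal Python illustration, not the IslandPath Perl code. All function names are hypothetical, the use of the sample standard deviation is an assumption, and truncating the deviation to whole S.D. units is inferred from the S.D. column of the Yersinia pestis table rather than stated on the poster.

```python
import math
from statistics import mean, stdev


def gc_percent(seq: str) -> float:
    """Return the %G+C of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def genome_stats(orf_seqs):
    """Mean and sample standard deviation of %G+C over all ORFs.

    Assumption: sample (n - 1) standard deviation; needs at least two ORFs.
    """
    values = [gc_percent(s) for s in orf_seqs]
    return mean(values), stdev(values)


def sd_units(gc: float, mean_gc: float, sd_gc: float) -> int:
    """Whole S.D. units between an ORF's %G+C and the mean, truncated toward zero.

    Assumption: truncation is inferred from the poster's table, not documented.
    """
    return math.trunc((gc - mean_gc) / sd_gc)


def classify(gc: float, mean_gc: float, cutoff: float = 3.48) -> str:
    """Label an ORF relative to mean +/- cutoff (3.48 is the stated default)."""
    if gc > mean_gc + cutoff:
        return "high"
    if gc < mean_gc - cutoff:
        return "low"
    return "typical"
```

With the mean (47.9) and Std. Dev. (4.9) reported for Y. pestis CO92, sd_units(56.48, 47.9, 4.9) gives 1 and sd_units(60.79, 47.9, 4.9) gives 2, matching the +1 and +2 entries in the table, while the integrase at 52.65 gives 0, matching its blank S.D. entry.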