Bioinformatics
Computational methods to discover ncRNA in bacteria
Ulf Schmitz
ulf.schmitz@informatik.uni-rostock.de
Bioinformatics and Systems Biology Group
www.sbi.informatik.uni-rostock.de

Outline
• Problem description
• Streptococcus pyogenes
• The RNome, transcriptome
• Characteristics of bacterial ncRNA
• Approaches to find fRNA
• Conclusion / Outlook
Ulf Schmitz, Computational methods to discover ncRNA

Streptococcus pyogenes
[Figures: pyoderma (source: DermNet NZ); pharyngitis (source: UCSD)]
• important human pathogen (group A streptococcus, or GAS)
• causes the following diseases:
  • pyoderma (111 million cases/year)
  • pharyngitis (616 million cases/year and 517,000 deaths/year)
• completely adapted to humans as its only natural host
• causes purulent infections of the skin and mucous membranes and, rarely, life-threatening systemic diseases

Streptococcus pyogenes
• varies in multiplication rate -> associated with type of infection
• to understand this regulation, researchers studied growth-phase regulatory factors and gene expression in response to specific environmental differences within the host
• a novel growth-phase-associated two-component-type regulator was identified
• fasBCA operon, present in all 12 tested M serotypes
• contains two potential HPK genes (FasB, FasC) and one RR gene (FasA)
• shows its maximum expression and activity at the transition phase
• potentially supports the aggressive spreading of the bacteria in the host
HPK = histidine protein kinase, RR = response regulator

Streptococcus pyogenes
• downstream of the fas operon, a ~300-nucleotide transcript (fasX) was identified
• does not encode a peptide/protein
• but is also growth-phase related
• main effector molecule of the fas regulon
• a non-coding RNA (ncRNA), also called functional RNA (fRNA)

[Figure: map of the fas locus (scale bar: 1 kb). Labels: gltX-L, fasB, fasC, fasA, ncRNA fasX, rnpA-L; promoters pfas, pfasX, prnpA; tt (two sites)]

RNome or transcriptome
• putative gene expression regulators (protein-interaction and housekeeping ncRNAs were also found)

RNome or transcriptome
Types of RNA: non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins, and here are the nominees:

Functions of ncRNA
… target mRNAs via imperfect sequence complementarity
• binding may result in:
  • blockage of ribosome entry (translation repression)
  • melting of inhibitory secondary structures (translation activation)
[Figure labels: dissolving the fold-back structure; loop-loop kissing complex]

Streptococcus pyogenes genomes
Genome Info & Features:

Intergenic sequence inspector (ISI)
[Figure: ISI workflow. Data: Bacterial genomes database, IGR databank, Filtered IGR databank, BLAST results, Sequence features, Annotated genome, Aligned features, Final results. Processing steps: IGR extractor, IGR filtering, BLAST, BLAST Analyser, Genview]

Characteristics of bacterial ncRNA
• intergenic sequence/structure conservation between related genomes
• encoded by free-standing genes, oriented opposite to both flanking genes
• 50 to 400 nt long (average > 200 nt)
• higher G+C content than the average intergenic space
• σ70 promoter
• ρ-independent terminator
• imperfect sequence complementarity with the target mRNA
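Two of the characteristics above, length and G+C content, are easy to check directly on a candidate sequence. The following is a minimal sketch, not part of the original slides: the function names, the default length window and the background G+C value passed in are assumptions chosen for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return gc / total if total else 0.0


def passes_basic_filters(candidate: str, background_gc: float,
                         min_len: int = 50, max_len: int = 400) -> bool:
    """Keep candidates that are 50 to 400 nt long and richer in G+C than
    the average intergenic space (background_gc is supplied by the caller)."""
    if not (min_len <= len(candidate) <= max_len):
        return False
    return gc_content(candidate) > background_gc
```

Such a filter only narrows the candidate list; conservation, promoter and terminator evidence still have to be checked separately.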

Characteristics of bacterial ncRNA
[Figure: promoter and intrinsic terminator. Promoter consensus (subscripts: percent conservation of each base): -35 box T82 T84 G78 A65 C54 A45; 16-19 bp spacer; -10 box T80 A95 T45 A60 A50 T96; 5-9 bp; start point C A90 T]
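The promoter layout in the figure (a -35 box, a 16-19 bp spacer, a -10 box) can be searched for with a simple consensus scan. This is only a sketch under stated assumptions: the consensus hexamers TTGACA and TATAAT are read off the most frequent bases in the figure, and the match threshold and function names are hypothetical. Real promoter predictors use weighted position matrices rather than plain match counts.

```python
MINUS_35 = "TTGACA"  # most frequent base at each -35 position
MINUS_10 = "TATAAT"  # most frequent base at each -10 position


def count_matches(hexamer: str, consensus: str) -> int:
    """Number of positions at which the hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))


def find_promoter_candidates(seq: str, min_matches: int = 9):
    """Return (start of -35 box, spacer length, matches out of 12) for every
    -35/-10 pair separated by 16-19 bp that reaches min_matches."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        for spacer in range(16, 20):
            j = i + 6 + spacer  # start of the -10 box
            if j + 6 > len(seq):
                break
            score = (count_matches(seq[i:i + 6], MINUS_35)
                     + count_matches(seq[j:j + 6], MINUS_10))
            if score >= min_matches:
                hits.append((i, spacer, score))
    return hits
```

Only the forward strand is scanned here; a candidate on the opposite strand would require scanning the reverse complement as well.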

The structure approach with RNAz
The function of many ncRNAs depends on a defined secondary structure
• multiple sequence alignment
• measure of thermodynamic stability (z-score)
• measure of RNA secondary structure conservation

The structure approach
Thermodynamic stability
• calculation of the MFE (minimum free energy) as a measure of thermodynamic stability
• the MFE depends on the length and the base composition of the sequence
• and is therefore difficult to interpret in absolute terms
• RNAz calculates a normalized measure of thermodynamic stability by
  • comparing the MFE m of a given (native) sequence
  • with the MFEs of a large number of random sequences of similar length and base composition
• a z-score is calculated as z = (m - µ)/σ, where µ and σ are the mean and standard deviation, respectively, of the MFEs of the random samples
• a negative z-score indicates that a sequence is more stable than expected by chance
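The z-score described above can be sketched in a few lines. The assumptions are: random sequences are produced by plain mononucleotide shuffling (one simple way to keep length and base composition), the sample standard deviation is used, and `fold_energy` is a hypothetical caller-supplied function returning the MFE of a sequence, for example a wrapper around an RNA folding program. This illustrates the sampling idea only and is not RNAz's own implementation.

```python
import random
from statistics import mean, stdev


def z_score(native_mfe: float, random_mfes) -> float:
    """z = (m - mu) / sigma over the MFEs of the random samples."""
    mu, sigma = mean(random_mfes), stdev(random_mfes)
    if sigma == 0:
        raise ValueError("random MFEs do not vary; z-score is undefined")
    return (native_mfe - mu) / sigma


def shuffled(seq: str, rng: random.Random) -> str:
    """Random permutation of seq: same length, same base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def mfe_z_score(seq: str, fold_energy, n_random: int = 100, seed: int = 0) -> float:
    """Compare the native MFE with the MFEs of n_random shuffled sequences.
    fold_energy is a hypothetical callable: sequence -> MFE."""
    rng = random.Random(seed)
    random_mfes = [fold_energy(shuffled(seq, rng)) for _ in range(n_random)]
    return z_score(fold_energy(seq), random_mfes)
```

For example, a native MFE of -30 against random MFEs of -20, -22 and -18 gives µ = -20, σ = 2 and z = -5, i.e. the native sequence is considerably more stable than the random background.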

The structure approach
Structural conservation
• RNAz predicts a consensus secondary structure for an alignment
• this results in a consensus MFE E_A
• RNAz compares this consensus MFE to the average MFE Ē of the individual sequences and calculates a structure conservation index: SCI = E_A / Ē
• the SCI will be low if no consensus fold can be found
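The structure conservation index is a plain ratio, so a worked example is short. The sketch below assumes the consensus MFE and the individual MFEs have already been computed by some folding tool; the function name is hypothetical.

```python
def structure_conservation_index(consensus_mfe: float, individual_mfes) -> float:
    """SCI = E_A / mean(E_i): consensus MFE divided by the average MFE
    of the individual sequences in the alignment."""
    mean_mfe = sum(individual_mfes) / len(individual_mfes)
    if mean_mfe == 0:
        raise ValueError("average MFE is zero; SCI is undefined")
    return consensus_mfe / mean_mfe
```

With a consensus MFE of -18 and individual MFEs of -20, -22 and -18 (average -20), the SCI is 0.9, which points to a well-conserved fold. A consensus MFE near 0 gives an SCI near 0, the "no consensus fold" case.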

The structure approach
• the z-score and the SCI are used to classify an alignment as “structural RNA” or “other”
• RNAz uses a support vector machine (SVM) learning algorithm that is trained on a set of known ncRNAs

Analysis pipeline of the Freiburg group
• extraction of intergenic regions (IGRs) ≥ 50 nt
• BLASTN: local alignment of IGRs with BLASTN
• E-value ≤ 10^-8? if no: discard
• reverse complement of candidate sequences
• unify overlapping: to reduce redundancy
• clustering: using ClustalW
• scoring: using RNAz
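The first and third steps of this pipeline (IGR extraction with a 50 nt minimum, E-value cutoff) can be sketched as follows. The assumptions are: gene coordinates are 0-based half-open intervals, the genome is treated as linear (the wrap-around region of a circular chromosome is ignored), and BLAST hits are represented as dictionaries with an "evalue" key. The names and data layout are illustrative only and are not taken from the pipeline itself.

```python
def extract_igrs(genome: str, genes, min_len: int = 50):
    """Return (start, end, sequence) for every gap between annotated genes
    that is at least min_len nt long. genes: iterable of (start, end)."""
    igrs = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - prev_end >= min_len:
            igrs.append((prev_end, start, genome[prev_end:start]))
        prev_end = max(prev_end, end)  # handles overlapping genes
    if len(genome) - prev_end >= min_len:
        igrs.append((prev_end, len(genome), genome[prev_end:]))
    return igrs


def keep_significant(hits, max_evalue: float = 1e-8):
    """Discard BLASTN hits whose E-value exceeds the cutoff."""
    return [hit for hit in hits if hit["evalue"] <= max_evalue]
```

Tracking `prev_end` with `max` keeps overlapping or nested gene annotations from producing spurious intergenic regions.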

Summary / Conclusion
• there are ‘reliable’ computational methods to find ncRNA genes in bacteria
• key methods involve:
  • IGR extraction and filtering
  • observing sequence conservation in related genomes (BLAST search, ClustalW alignment)
  • checking for structure conservation and thermodynamic stability
• the next step is to prove their existence experimentally via microarrays or Northern blots

Outlook
• might it be possible to predict target mRNAs?
Thanks for your attention!