New Tools for Comparative Structural RNAomics
Michal Ziv-Ukelson
Bioinformatic Structural Witnesses for RNA Functionality
Witness 1: Structure stability.
Witness 2: Sequence conservation (within the structural context).
Witness 3: Structure conservation.
Structural Cis-Elements: Purine Riboswitch
[Mandal et al., 2003] predicted a potential pseudoknot between the two arms of the purine riboswitch aptamer, involving the segments “GGUAU” and “CCGUA”.
Witness 1: Stability of Structure (2D, predicted)
AUCCCCGUAUCGAUC AAAAUCCAUGGGUACCCUAGUGAAAGUGUA UAUACGUGCUCUGAU UCUUUACUGAGGAGU CAGUGAACGAACUGA
RNA secondary structure prediction: O(N^3) [Nussinov-Jacobson, 1980; Zuker-Stiegler, 1981]
MFOLD: http://www.rpi.edu/~zukerm
Vienna RNA Package: http://www.tbi.univie.ac.at/~ivo/RNA
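The O(N^3) dynamic programming behind these predictors can be illustrated with its simplest form, Nussinov-style base-pair maximization. This is a sketch under stated assumptions: it maximizes the number of nested pairs (Watson-Crick plus G-U wobble, hairpin loops of at least three bases) instead of minimizing free energy as MFOLD and the Vienna RNA Package do, and the name `nussinov` is hypothetical.

```python
# Base pairs allowed in this sketch: Watson-Crick plus G-U wobble (assumed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def nussinov(seq: str) -> tuple[int, str]:
    """Maximize the number of nested base pairs in seq (O(N^3) time).

    Returns (number of pairs, dot-bracket structure). This is base-pair
    maximization, not the energy-based MFE folding of MFOLD or Vienna.
    """
    n = len(seq)
    # best[i][j] = max pairs within seq[i..j]; stays 0 when j - i <= MIN_LOOP.
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, best[i + 1][k - 1] + right + 1)
            best[i][j] = score

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == best[i + 1][k - 1] + right + 1:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`: the three G's pair with C, C and (by wobble) U around an AAA hairpin loop. Energy-based predictors replace the "+1 per pair" score with stacking and loop energies, but keep the same cubic recursion over subintervals.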
Witness 2: Sequence Conservation (e.g., in binding sites)
Lactobacillus acidophilus: GGUAU, CCGUA
Lactobacillus delbrueckii: GGUAU, CCGUA
Witness 3: Compensatory Mutations (in stems)
Lactobacillus acidophilus: G-U
Lactobacillus delbrueckii: U-A
Witness 3: Compensatory Mutations (in stems)
Lactobacillus acidophilus: G-C
Lactobacillus delbrueckii: C-G
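A compensatory mutation, as in the two Lactobacillus examples (G-U to U-A, G-C to C-G), changes both bases of a stem pair while the pair stays valid. A minimal sketch of that check follows; the function name `is_compensatory` and the pairing rules (Watson-Crick plus G-U wobble) are assumptions for illustration, not part of the tools discussed here.

```python
# Valid pairs assumed by this sketch: Watson-Crick plus G-U wobble.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def is_compensatory(pair_a: str, pair_b: str) -> bool:
    """True if both bases of a stem pair differ between two homologs
    while each homolog still forms a valid pair.

    pair_a, pair_b: two-letter strings such as "GU" (5' base, 3' base).
    """
    both_pair = pair_a in CANONICAL and pair_b in CANONICAL
    both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
    return both_pair and both_changed
```

A change of only one base that keeps pairing (e.g., G-C to G-U) returns `False` here; some analyses still count such "consistent" substitutions as weaker structural evidence.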
Three Approaches to Structural RNA Comparative Analysis
All three start from homologous RNA sequences and end with aligned structures:
A. Sequence alignment (T-coffee, ClustalW, Prm) → aligned sequences → fold alignment (RNAalifold, Pfold, ILM) → aligned structures.
B. Simultaneous fold and alignment (Sankoff; locaRNA, foldAlign, dynAlign, Carnac, pmcomp) → aligned structures.
C. Fold sequences (crystallography/NMR, MFE prediction) → homologous RNA secondary structures → structure alignment (RNAforester, MARNA) → aligned structures.
Approach A to Structural RNA Comparative Analysis [Giegerich, 2004]
Homologous RNA sequences → sequence alignment (T-coffee, ClustalW, Prm) → aligned sequences → fold alignment (RNAalifold, Pfold, ILM) → aligned structures.
Witness 2 (sequence conservation) is visible in the aligned sequences, but without the structural context!
Witness 3 (structural conservation) and Witness 1 (structure stability) are assessed at the fold-alignment step.
[figure: multiple sequence alignment of the homologous sequences]
Approach A to Structural RNA Comparative Analysis [Giegerich, 2004]
Homologous RNA sequences → sequence alignment (T-coffee, ClustalW, Prm) → aligned sequences → fold alignment (RNAalifold, Pfold, ILM) → aligned structures.
The sequences need to be similar enough to be aligned initially, yet dissimilar enough for co-varying substitutions to be detected!
[figure: multiple sequence alignment of the homologous sequences]
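Co-varying substitutions show up as different pair types in the same two alignment columns. The sketch below counts, for a pair of columns, how many aligned sequences can pair there and how many distinct pair types they use. The function name, the pairing rules (Watson-Crick plus G-U), treating gaps as non-pairing, and the use of RNA letters (U rather than T) are all assumptions of this illustration; tools such as RNAalifold combine this kind of evidence with folding energies.

```python
# Valid pairs assumed by this sketch: Watson-Crick plus G-U wobble.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def column_pair_support(alignment: list[str], i: int, j: int) -> tuple[int, int]:
    """For alignment columns i < j, return
    (number of sequences whose bases at i and j can pair,
     number of distinct pair types among those sequences).

    More than one pair type among pairing sequences indicates covariation.
    Gap characters never form a canonical pair, so gapped rows do not count.
    """
    pairing = 0
    pair_types = set()
    for row in alignment:
        pair = row[i] + row[j]
        if pair in CANONICAL:
            pairing += 1
            pair_types.add(pair)
    return pairing, len(pair_types)
```

For example, columns 0 and 4 of `["GAAAC", "CAAAG", "GAAAU"]` give `(3, 3)`: every sequence pairs there, each with a different pair type, which is strong covariation evidence. If the three sequences were identical, the count would be `(3, 1)` and the columns could not distinguish a conserved stem from plain sequence conservation.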
Approach C to Structural RNA Comparative Analysis [Giegerich, 2004]
Homologous RNA sequences → fold sequences (crystallography/NMR, MFE prediction, machine learning) → homologous RNA secondary structures → structure alignment (RNAforester, MARNA) → aligned structures.
Approach C to Structural RNA Comparative Analysis [Giegerich, 2004]
Homologous RNA sequences → fold sequences (crystallography/NMR, MFE prediction, machine learning) → homologous RNA secondary structures → structure alignment (RNAforester, MARNA) → aligned structures.
Witness 1 (structure stability) is assessed at the folding stage.
Witness 2 (sequence conservation within the structural context) and Witness 3 (structural conservation) are assessed at the structure-alignment stage.
The witnesses are separated into two stages and cannot consult each other!
[figure: folded structures drawn as trees with node labels R, M, H, B, I, and their alignment]
The Problem
Query RNA: sequence/structure known.
Target RNA sequence: structure not known; consider its top-ranking suboptimal folding predictions.
Outline
• Previously: RNA folding. Now: RNA search
• RNA’s structure representations
• Approaches to tree comparison
• Algorithm for approximate labeled subtree isomorphism/homeomorphism
• Results
Non-Coding RNA Families
• They are only partially conserved in sequence, but they are conserved in structure.
• Many have a role in regulating gene expression.
• Examples: tRNA, rRNA, snoRNA, microRNA, siRNA, riboswitches.
Structure → Function
Our Goal
Query: an RNA structure. Target: a genome sequence of millions of nucleotides, e.g.
ACGCUGACGUAGUCAGUAGACGAC AGACAGAUACGUCACCGCAGAUAC GCAUAGUAGCAGUAGCAGAUGACG …
Are there any appearances of this structure in the genome?
Discover ncRNA templates in a sequence database.
Query Example: Purine riboswitch family consensus from the Rfam database (seed: 133 sequences; full: 2,427 sequences)
The Tool: STRMS (Structural RNA Motif Search)
Input: (1) the secondary structure of the query, including local sequence and structure constraints, and (2) a target sequence database.
Output: all occurrences of the query in the target, ranked by their similarity to the query (in an HTML file).
The tool is flexible and takes into account a large number of sequence options.
Our approach combines:
• pre-folding with MFOLD (Zuker, 2003);
• an RNA pattern-matching algorithm [O(mn)] based on subtree homeomorphism for ordered, rooted trees.
Isana Veksler-Lublinsky
Veksler-Lublinsky, I., Ziv-Ukelson, M., Barash, D., & Kedem, K. (2007). A structure-based flexible search method for motifs in RNA. Journal of Computational Biology, 14(7), 908-926.
RNA’s Secondary Structure
(((((((..((((.......)))).(((((.......))))).....(((((.......))))))))))))
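A dot-bracket string encodes each base pair as a matching "(" and ")" and each unpaired base as ".". A minimal sketch (the name `parse_dot_bracket` is hypothetical) recovers the pair list with a stack. It handles nested structures only; pseudoknots need extra bracket types and are out of scope here.

```python
def parse_dot_bracket(db: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), 0-based with i < j, encoded by a
    dot-bracket string. Raises ValueError on unbalanced brackets or
    characters other than '(', ')' and '.'."""
    stack: list[int] = []          # positions of still-open '('
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))   # innermost open bracket
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For example, `parse_dot_bracket("((..))")` returns `[(0, 5), (1, 4)]`. The pair list is the starting point for the tree representations discussed next.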
Comparison of Ordered Rooted Trees
• Trees are among the most common and well-studied combinatorial structures in computer science. In particular, the problem of comparing trees occurs in several diverse areas, such as:
  • computational biology
  • structured text databases
  • image analysis
  • automatic theorem proving
  • compiler optimization.
What Is a Labeled Tree?
Tree: a connected acyclic graph.
Each node in a labeled tree is assigned a label from a given alphabet.
[figure: example labeled tree with nodes a, b, c, d, f, g]
Tree Matching: Grammar
A simple parse tree:
[figure: example parse tree]
RNA’s Secondary Structure
[figure: secondary-structure elements: stem, hairpin loop, bulge loop, interior loop, junction (multiloop), single-stranded region, pseudoknot. Image: Wuchty]
Ordered Rooted Tree
Shapiro, 1988:
• The nodes correspond to elements of secondary structure (hairpin loop, bulge, internal loop or multiloop).
• The edges correspond to base-paired (stem) regions.
Zhang, 1998:
• The nodes of the tree represent either unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases, respectively.
• There are two kinds of edges: those connecting consecutive base pairs in a stem, and those connecting a leaf base to the last base pair of the corresponding stem.
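Zhang's representation can be built directly from a sequence and its dot-bracket structure. The sketch below assumes a nested (pseudoknot-free) structure, adds an artificial root labeled "R" so that unpaired external bases have a parent, and uses hypothetical names (`Node`, `zhang_tree`). It is an illustration of the Zhang-style tree only, not of the compressed representation used by STRMS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # "G-C" for a pair, "A" for an unpaired base
    children: list["Node"] = field(default_factory=list)


def zhang_tree(seq: str, db: str) -> Node:
    """Ordered tree with one internal node per base pair and one leaf per
    unpaired base, under an artificial root labeled "R" (assumed)."""
    partner: dict[int, int] = {}                 # opening position -> closing position
    stack: list[int] = []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            partner[stack.pop()] = pos

    def build(start: int, end: int) -> list[Node]:
        # Children for the interval seq[start:end], in left-to-right order.
        nodes, pos = [], start
        while pos < end:
            if pos in partner:                   # a base pair: recurse inside it
                j = partner[pos]
                nodes.append(Node(f"{seq[pos]}-{seq[j]}", build(pos + 1, j)))
                pos = j + 1
            else:                                # an unpaired base: a leaf
                nodes.append(Node(seq[pos]))
                pos += 1
        return nodes

    return Node("R", build(0, len(seq)))
```

For `zhang_tree("GGAAACC", "((...))")`, the root has one child "G-C", whose only child is the inner "G-C", whose children are the three "A" leaves of the hairpin loop. Consecutive stem pairs therefore form a chain of single-child nodes, which is exactly what Shapiro's representation compresses into a single edge.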
Our Tree Representation
• Compressed as in [Shapiro, 1988], plus a node for every single-stranded component in multiloops.
• Includes additional information on nodes and on edges for the purpose of sequence analysis.
• It is more informative than Shapiro’s tree representation and more compact than Zhang’s.
• This leads to a precise screening of the target text: first select candidates whose structural tree representation is similar to that of the query, then filter these candidates further by applying sequence considerations.
Alignment (Mapping) Properties: Preservation of Ancestors
[figure: mapping between two labeled trees with nodes a, b, d, e]
Mapping Aspects: Rooting
[figure: mapping between two labeled trees with nodes a, d, e, f, g]
Ordered Rooted Tree Comparison
The following operations are defined on ordered trees:
• relabel: change the label of a node v in T.
• delete: delete a non-root node v in T with parent v′, making the children of v become the children of v′. The children are inserted in the place of v as a subsequence in the left-to-right order of the children of v′.
• insert: the complement of delete. Insert a node v as a child of v′ in T, making v the parent of a consecutive subsequence of the children of v′.
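The three operations can be sketched on a minimal ordered-tree class. The names are hypothetical; nodes are addressed through their parent and a child index, which also guarantees that the root is never deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def relabel(v: Node, new_label: str) -> None:
    """Change the label of node v."""
    v.label = new_label


def delete(parent: Node, idx: int) -> None:
    """Delete parent.children[idx]; its children take its place,
    keeping their left-to-right order."""
    v = parent.children[idx]
    parent.children[idx:idx + 1] = v.children


def insert(parent: Node, start: int, stop: int, label: str) -> Node:
    """Insert a new child v of parent that adopts the consecutive
    children parent.children[start:stop] (the complement of delete)."""
    v = Node(label, parent.children[start:stop])
    parent.children[start:stop] = [v]
    return v
```

Starting from a(b(c, d), e), `delete(root, 0)` yields a(c, d, e), and `insert(root, 0, 2, "b")` restores a(b(c, d), e), illustrating that insert is the complement of delete.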
1. Edit Distance
• An edit script S between T1 and T2 is a sequence of edit operations turning T1 into T2.
• The edit distance between T1 and T2 is the minimum cost of such an edit script; the tree edit distance problem is to compute the edit distance and a corresponding edit script.
• (An edit script in tree comparison corresponds to generating the actual alignment in sequence comparison.)
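A compact way to see what the tree edit distance computes is the classical forest recursion that repeatedly decomposes on the rightmost root, the recurrence underlying Zhang-Shasha-style algorithms. This sketch assumes unit costs for relabel, delete and insert, and trees encoded as nested `(label, children)` tuples (names hypothetical). It memoizes on whole subforests instead of the index-based tables of dedicated algorithms, so it is simpler but slower.

```python
from functools import lru_cache

# A tree is (label, children) with children a tuple of trees;
# a forest is a tuple of trees. All operations cost 1 (assumed).


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    """Edit distance between forests f and g."""
    if not f and not g:
        return 0
    if not g:                                   # delete rightmost root of f
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1
    if not f:                                   # insert rightmost root of g
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + kv, g) + 1,        # delete v; its children take its place
        forest_dist(f, g[:-1] + kw) + 1,        # insert w
        forest_dist(kv, kw) + forest_dist(f[:-1], g[:-1])
        + int(lv != lw),                        # map v to w, relabel if labels differ
    )


def tree_edit_distance(t1: tuple, t2: tuple) -> int:
    return forest_dist((t1,), (t2,))
```

For example, a(b, c) and a(d(b, c)) are at distance 1 (one insert of d), and a(b, c) and x(b, c) are at distance 1 (one relabel).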
2. Tree Inclusion
T1 is included in T2 if there is a sequence of delete operations performed on T2 that makes T2 isomorphic to T1. The tree inclusion problem is to decide whether T1 is included in T2.
Polynomial-time algorithms exist for these problems. They are all based on the classical technique of dynamic programming, and most of them are simple combinatorial algorithms.
Comparison of Ordered Rooted Trees
• Ordered tree comparison is generally computed by tree edit distance, which allows various forms of deletions and insertions in both query and target.
• The search for small non-coding RNAs naturally yields a more specific tree-search formulation, since we do not allow deletions in the query.
• In our method we apply a weighted pattern-matching algorithm for finding the best homeomorphic mapping between two rooted ordered trees.
• Specific constraints on the searched structure can be defined in the input to the search: structural constraints (lengths), allowing or forbidding element deletion in the target, and sequence constraints (local conserved sequence segments, etc.).
The Algorithm
• The subtree isomorphism problem [Matula, 1968, 1978]: given a pattern tree P and a text tree T, find a subtree of T that is isomorphic to P, i.e., find whether some subtree of T that is identical in structure to P can be obtained by removing entire subtrees of T, or decide that there is no such tree.
• The subtree homeomorphism problem [Chung, 1987; Reyner, 1977; Pinter et al., 2004] is a variant of the former problem in which degree-2 nodes can be deleted from the text tree.
[figure: homeomorphism example]
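The homeomorphism condition can be made concrete with a small, unweighted decision sketch: labels must match exactly, the pattern's children are mapped in order onto a subsequence of the text node's children (unmatched text subtrees are removed), and a text node with exactly one child (a degree-2 node) may be deleted so that the pattern continues below it. This is an illustration under those assumptions, with hypothetical names; it is not the weighted O(mn) algorithm used by STRMS, and it only reports where a match is rooted.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality/hash, so nodes can key a memo table
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def find_homeomorphic(p: Node, t: Node) -> list[Node]:
    """Return, in preorder, every node of text tree t at which pattern tree p
    can be rooted under subtree homeomorphism (exact labels, unit weights)."""
    memo: dict = {}

    def maps_to(pn: Node, tn: Node) -> bool:
        # pn sits exactly on tn: equal labels, children embedded in order.
        key = ("map", pn, tn)
        if key not in memo:
            ok, pos = pn.label == tn.label, 0
            if ok:
                for pc in pn.children:
                    # Greedy: the leftmost remaining text child able to host pc.
                    while pos < len(tn.children) and not hosts(pc, tn.children[pos]):
                        pos += 1
                    if pos == len(tn.children):
                        ok = False
                        break
                    pos += 1
            memo[key] = ok
        return memo[key]

    def hosts(pn: Node, tn: Node) -> bool:
        # tn hosts pn directly, or tn is a degree-2 node (one child) that is
        # deleted so that pn can be mapped further down the chain.
        key = ("host", pn, tn)
        if key not in memo:
            memo[key] = maps_to(pn, tn) or (
                len(tn.children) == 1 and hosts(pn, tn.children[0]))
        return memo[key]

    found, stack = [], [t]
    while stack:                      # preorder traversal of the text tree
        v = stack.pop()
        if maps_to(p, v):
            found.append(v)
        stack.extend(reversed(v.children))
    return found
```

With illustrative labels (M for multiloop, B for bulge, I for interior loop, H for hairpin), the pattern M(H, H) is found at the root of the text M(B(H), I, H): the bulge node B has a single child and is skipped, and the unmatched I subtree is removed. Greedy leftmost assignment of pattern children is sufficient here because it only decides whether an order-preserving assignment exists; a weighted search for the best mapping needs dynamic programming instead.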
The Algorithm: Motivation
• Point-mutation events could easily result in an extra bulge in an RNA structure.
• However, in some cases the functional homology to the original, non-mutated structure is still preserved.
• The suggested alignment should therefore be flexible enough to allow the deletion of degree-2 nodes from the target tree.
[figure: a riboswitch and its functional homologue with an extra bulge]