Modeling RNA motifs by graph-grammars
François.Major@UMontreal.CA
www.iric.ca
MC-Tools: Functions
• ( MC-Annotate 3-D ) -> graph
• ( MC-Cycles graph ) -> [ NCM ]
• ( MC-Seq graph ) -> [ sequence ]
• ( MC-Fold sequence ) -> [ graph ]
• ( MC-Cons [ ( sequence, [ graph ] ) ] ) -> [ graph ]
• ( MC-Search ( graph, [ 3-D ] ) ) -> [ 3-D ]
• ( MC-Sym graph ) -> [ 3-D ]
MC-Tools: Objects (rat 28S rRNA sarcin/ricin stem-loop)
Sequence: GGGUGCUCAGUACGAGAGGAACCGCACCC
[Figure panels: Graph; Nucleotide cyclic motifs; 3-D structure]
Functions shown: ( MC-Fold sequence ) -> [ graph ]; ( MC-Sym graph ) -> [ 3-D ]
Szewczak et al. PNAS (USA) 1993; Lemieux & Major NAR 2006; Parisien, Thibault & Major (in prep.)
Graph
( MC-Annotate 3-D ) -> graph
Gendron, Lemieux & Major JMB 2001; Lemieux & Major NAR 2002; Leontis & Westhof RNA 2001
Shortest Cycle Basis
( MC-Cycles graph ) -> [ NCM ]
[Figure: graph with cycles C1 to C5 over nucleotides X1 to X4 and Y1 to Y3, with 5' and 3' ends marked]
Horton SIAM J Comp 1987; St-Onge et al. NAR 2007
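A shortest cycle basis can be illustrated on a toy graph. The sketch below is in the spirit of Horton's approach (candidate cycles built from shortest-path trees, then a greedy selection of independent cycles over GF(2)); it is not the MC-Cycles implementation. The function names, the connected unweighted graph, and the toy edge list (two stacked "base pairs" forming two 4-cycles) are all assumptions made for illustration.

```python
from collections import deque


def bfs_parents(adj, root):
    """Parent map of a breadth-first (shortest-path) tree rooted at root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent


def path_to_root(parent, v):
    """Nodes on the tree path from v up to the root, v first."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def shortest_cycle_basis(edges):
    """Return a minimum-length cycle basis as lists of edges.

    Assumes a connected, unweighted, simple graph (illustrative only).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    index = {frozenset(e): i for i, e in enumerate(edges)}

    def mask_of(path):
        # Encode the edges along a path as a bitmask over edge indices.
        m = 0
        for a, b in zip(path, path[1:]):
            m |= 1 << index[frozenset((a, b))]
        return m

    # Candidate cycles: tree path root..u, edge (u, v), tree path v..root.
    candidates = set()
    for root in adj:
        parent = bfs_parents(adj, root)
        for u, v in edges:
            if parent[u] == v or parent[v] == u:
                continue  # tree edges close no cycle
            pu, pv = path_to_root(parent, u), path_to_root(parent, v)
            if set(pu) & set(pv) != {root}:
                continue  # the two paths must meet only at the root
            mask = mask_of(pu) | mask_of(pv) | (1 << index[frozenset((u, v))])
            candidates.add((len(pu) + len(pv) - 1, mask))

    # Greedy selection: shortest first, keep a cycle only if it is
    # linearly independent (over GF(2)) of those already kept.
    pivots, chosen = {}, []
    for _, mask in sorted(candidates):
        reduced = mask
        while reduced:
            top = reduced.bit_length() - 1
            if top not in pivots:
                pivots[top] = reduced
                chosen.append(mask)
                break
            reduced ^= pivots[top]
    return [[edges[i] for i in range(len(edges)) if (m >> i) & 1] for m in chosen]


# Hypothetical toy graph: backbone 1-2-3 and 4-5-6, "pairs" 1-6, 2-5, 3-4.
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (2, 5)]
cycles = shortest_cycle_basis(toy_edges)
for cycle in cycles:
    print(cycle)
```

On this toy graph the basis consists of the two 4-cycles that share the edge (2, 5); the outer 6-cycle is their sum and is therefore left out, which mirrors how adjacent NCMs share a common base pair.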
The Nucleotide Cyclic Motifs (NCMs)
• Cover all base-pairing types without distinction (Watson-Crick and others)
• Precisely designate how each nucleotide in the sequence relates to the others
• Are joined through a common base pair (context). This helps us predict coherent chains of NCMs and project them in 3-D. Tentative definition of a motif: an "ordered" chain of NCMs.
• Recur within and across all RNAs
• Are short (< 10 nts; most have 3 to 5 nts)
• Compose the classical motifs (cf. GNRA tetraloop, sarcin/ricin motif, etc.). There are exceptions (cf. AA platform).
Lemieux & Major (2006) NAR 34:2340; Parisien, Thibault & Major (in prep.)
Aim
We want a computational model that can encode the valid sequences and structural features of RNA motifs.
Hypothesis: A relation between the sequence and the structure of RNA motifs exists.
Graph Grammars
• A graph grammar is to a set of graphs what a formal generative grammar is to a set of strings, i.e. a precise and formal description of that set.
• A graph grammar consists of a set of rules, or productions, for transforming graphs.
• Formally, a graph grammar, H = {N, Σ, P}, consists of a set of non-terminal symbols, N, a set of terminal symbols, Σ, and a set of production rules, P.
Hypothesis: NCMs are "independent" building blocks.
Nagl Computing 1976; Nagl, in H. Ehrig et al., eds 1987; St-Onge et al. NAR 2007
Sarcin/Ricin Graph Grammar
[Figure: yeast tRNA, H. marismortui 23S rRNA, E. coli 16S rRNA ⇒ sarcin/ricin graph grammar]
• N = {C1, C2, …, C5}, the set of NCMs
• Σ = {S1, S2, …, S5}, the sets of sequences for each NCM
• P is a set of consistent assignments of the sequences in Σ to the NCMs in N (production rules)
St-Onge et al. NAR 2007
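The idea of a "consistent assignment" can be sketched in a few lines: each NCM covers some positions of the motif, each NCM has its own set of allowed sequences, and an assignment is kept only when neighbouring NCMs agree on the positions they share (their common base pair). Everything below is a hypothetical toy, not the sarcin/ricin grammar: the NCM names `Ca` and `Cb`, the positions, and the allowed sequences are invented for illustration, and the function is not MC-Seq.

```python
from itertools import product


def consistent_sequences(ncm_positions, ncm_sequences, order):
    """Enumerate full sequences whose restriction to every NCM is allowed.

    ncm_positions: NCM name -> tuple of positions it covers
    ncm_sequences: NCM name -> set of allowed strings, one letter per position
    order:         positions in 5'->3' reading order for the output string
    """
    names = list(ncm_positions)
    results = set()
    # Try every combination of one allowed sequence per NCM.
    for choice in product(*(sorted(ncm_sequences[n]) for n in names)):
        assignment = {}
        consistent = True
        for name, seq in zip(names, choice):
            for pos, nt in zip(ncm_positions[name], seq):
                # setdefault records the first letter seen at a position;
                # a later NCM must agree with it on shared positions.
                if assignment.setdefault(pos, nt) != nt:
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            results.add("".join(assignment[p] for p in order))
    return results


# Hypothetical toy motif: pairs 1-6, 2-5, 3-4; Ca and Cb share pair (2, 5).
ncm_positions = {"Ca": (1, 2, 5, 6), "Cb": (2, 3, 4, 5)}
ncm_sequences = {
    "Ca": {"GCGC", "GGCC"},  # letters at positions 1, 2, 5, 6
    "Cb": {"CGCG", "GAUC"},  # letters at positions 2, 3, 4, 5
}
sequences = consistent_sequences(ncm_positions, ncm_sequences, order=(1, 2, 3, 4, 5, 6))
print(sorted(sequences))
```

Of the four possible combinations, only the two in which `Ca` and `Cb` agree on positions 2 and 5 survive. This shows how the shared base pair (context) prunes the sequence space compared with treating each NCM independently.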
Sarcin/Ricin Building Blocks
[Figure: the sarcin/ricin NCMs with their nucleotide labels]
Number of sequences per NCM: theoretical, allowed by the isostericity matrices (IMs), and observed in the PDB
NCM          Theoretical      IMs              PDB
C1           256 (16 x 16)    120 (10 x 12)    7
C2           64 (16 x 4)      40 (10 x 4)      5
C3           64 (16 x 4)      56 (14 x 4)      2
C4           256 (16 x 16)    160 (16 x 10)    3
C5           64 (16 x 4)      40 (10 x 4)      8
(no label)   16               10               15
St-Onge et al. NAR 2007
( MC-Seq sarcin-ricin-graph ) -> [ sequence ]
Sequences supported by the NCMs in the PDB: AGUA-GAA, AGUA-AAA, GGUA-GAA, GGUA-AAA
If we remove the instances of the sarcin/ricin motif, ( MC-Search ( sarcin-ricin-graph, [ PDB ] ) ) -> [ 3-D ], then the same four sequences are still supported => the NCMs are found outside the sarcin/ricin context.
Larose et al. (in prep.); St-Onge et al. NAR 2007
Graph Grammar Parsing
806 sequences aligned according to the E. coli 23S rRNA structure; site 204-207 / 189-191.
Westhof (personal comm.); St-Onge et al. NAR 2007
Validation (MC-Seq vs. PDB vs. Alignment)
[Figure labels: Isostericity matrices; MC-Seq; PDB]
GGUA-AAA, AGUA-AAA, AGUA-GAA, GGUA-GAA
10 000 sequences
AAUA-AAA, AAUA-GAA, ACUA-AAA, ACUA-GAA, ACUA-GAC, AGUA-AAC, AGUA-CAA, AGUA-GAC, AGUA-GAU, AGUA-GCC, AGUA-GGG, AGUA-GUG, AGUC-GAA, AUUA-GAA, CGUA-GAA, GAUA-GAA, GGUA-GAU, GUUA-GAA, UGUA-GAA, UGUA-GAC
Alignment: 5S, 16S, 23S
St-Onge et al. NAR 2007
Perspectives
• We want to develop a version of MC-Seq that would be useful during the alignment process.
• The PDB does not yet seem to contain enough structural information.
• The NCMs (context) are necessary to avoid generating too many sequences.
• Two more things need to be considered…
Sarcin/Ricin (Sequence/Structure Space Is Not Simple)
St-Onge et al. (in prep.)
Modeling In 3-D Might Be Necessary
MC-Fold: CAUU-AAG (2.1 Å)
Alignment: AUUA-GAA (0.9 Å)
St-Onge et al. NAR 2007
Acknowledgments
Martin Larose (Res. assistant), Philippe Thibault (Res. assistant), Patrick Gendron (Res. assistant), Romain Rivière (Postdoc, CS), Véronique Lisi (Ph.D. Molecular Biology), Marc Parisien (Ph.D. Computer Science), Emmanuelle Permal (Ph.D. Bioinformatics), Karine St-Onge (Ph.D. Computer Science), Louis-Philippe Lavoie (M.Sc. Bioinformatics), Maxime Caron (M.Sc. Bioinformatics), Caroline Louis-Jeune (M.Sc. Bioinformatics)
Montréal: Pascal Chartrand, Gerardo Ferbeyre, Sylvie Hamel, Sébastien Lemieux, Pascale Legault, Luc Desgroseillers, Kathy Borden, Daniel Lamarre
Éric Westhof (Strasbourg), Alain Denise (Paris), Dave Mathews (Rochester)