Predicting the 3D Structure of RNA motifs Ali Mokdad – UCSF May 28, 2007
Predicting RNA structure
• Existing RNA folding algorithms (mfold, sfold, pfold, Dynalign) determine the locations of cWW (cis Watson-Crick/Watson-Crick) helices.
• Internal loops, hairpin loops, and junctions are represented as “bulges” or unstructured areas between these helices.
• Many of these “bulges” have stable 3D structures that in many cases allow the whole molecule to carry out its function.
• Many long-range interactions within the same RNA molecule, and interactions between RNA and other molecules, occur at these locations.
• If we can determine the structures of these areas, we can target them with drugs and better understand their mechanisms and functions.
Predicting 3D structure of RNA loops
• RNA loops are mostly made of non-WC BPs.
• These non-WC BPs are less common than helical WC BPs, but they still make up a good portion (ca. 1/3).
• To complicate things, the non-WC BPs are not homogeneous; they belong to any of a dozen or so geometric types*.
• As a result, their 3D structures have long eluded computational prediction.
*Leontis & Westhof, RNA 2001, v7: 499-512
Isostericity-based structure prediction
• Comparative Sequence Analysis (CSA) has been very successful in predicting cis WC BPs.
• The problem with applying it to non-WC BPs is that their allowed substitution patterns are more diverse and less obvious than cis WC substitutions.
• To some extent, these patterns were not even known until recently*.
*Leontis, Stombaugh, & Westhof, NAR 2002, v30: 3497-531
ISFOLD: a small first step to solve a big problem
• ISFOLD looks in sequence alignments for patterns similar to the known isosteric substitution patterns of base pairs.
• This similarity can be scored and ranked, and predictions of individual BP occurrences (their types and locations) can be made from it.
• Such structural predictions are, of course, only as good as the sequence alignments.
• For the best results, the alignments should be highly accurate and large, but also divergent enough to show substitutions in places of interest.
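The slides do not spell out ISFOLD's scoring function, so the following is only a minimal sketch of the general idea: for a pair of alignment columns, measure how many sequences fall inside each allowed substitution pattern and rank the patterns. The function names and the `toy_type_X` pattern are hypothetical; only the `cWW` set (the canonical Watson-Crick family) reflects real pairing rules.

```python
from collections import Counter

# Illustrative pattern table. The cWW set is the canonical Watson-Crick family;
# "toy_type_X" is an invented placeholder, NOT a published isosteric family.
PATTERNS = {
    "cWW": {"AU", "UA", "GC", "CG"},
    "toy_type_X": {"AG", "AA", "CA"},
}

def pair_counts(alignment, i, j):
    """Count the two-letter combinations seen at columns i and j (0-based)."""
    return Counter(seq[i] + seq[j] for seq in alignment)

def score_patterns(alignment, i, j, patterns=PATTERNS):
    """Return (type, fraction of sequences matching that pattern), best first."""
    counts = pair_counts(alignment, i, j)
    total = sum(counts.values())
    scores = {
        name: sum(n for pair, n in counts.items() if pair in allowed) / total
        for name, allowed in patterns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up toy alignment (equal-length, gap-free sequences are assumed).
alignment = ["GAAC", "GAGC", "GCAC", "GAAC"]
ranking_inner = score_patterns(alignment, 1, 2)
ranking_outer = score_patterns(alignment, 0, 3)
```

In this toy alignment, columns 1 and 2 only ever show combinations from the placeholder pattern, while columns 0 and 3 are always G and C, so each column pair ranks a different type first. A matching fraction is the simplest possible score; it ignores how much substitution is actually observed, which the slide singles out as important.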
CSA example from 5S rRNA
• CSA looks in sequence alignments for canonical “mutual compensating mutations” (C=G, G=C, A–U, U–A, and G/U & U/G) that covary in two alignment positions (or columns).
[Figure: 5S rRNA example; base-pair labels: cWW, cWW, cWW, ?, ?; tWS, cWS]
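As a rough illustration of the covariation check described above, the sketch below tests whether two alignment columns hold canonical pairs in most sequences and show more than one kind of pair. The function names and both thresholds are assumptions made for this example, not values from the talk.

```python
# Canonical pairs named on the slide: C=G, G=C, A-U, U-A, plus G/U and U/G.
CANONICAL = {"CG", "GC", "AU", "UA", "GU", "UG"}

def covariation_support(alignment, i, j):
    """Return (fraction of canonical pairs, number of distinct canonical pairs)."""
    pairs = [seq[i] + seq[j] for seq in alignment]
    canonical = [p for p in pairs if p in CANONICAL]
    return len(canonical) / len(pairs), len(set(canonical))

def looks_like_cww(alignment, i, j, min_fraction=0.9, min_distinct=2):
    """Heuristic: mostly canonical pairs AND at least two pair types (covariation).

    Both thresholds are arbitrary illustrative defaults.
    """
    fraction, distinct = covariation_support(alignment, i, j)
    return fraction >= min_fraction and distinct >= min_distinct

# Made-up toy alignment: columns 0 and 3 covary, columns 1 and 2 never pair.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
outer = looks_like_cww(toy_alignment, 0, 3)
inner = looks_like_cww(toy_alignment, 1, 2)
```

Columns 0 and 3 pass because every sequence holds a canonical pair and the pair changes between sequences. Columns 1 and 2 are A/A throughout, which is exactly the kind of position this check cannot interpret and where the "?" labels appear in the figure.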
ISFOLD predictions for the 5S rRNA BPs
All predictions correct!
ISFOLD predictions for the whole motif
Note: #26 is 75% A & 25% C
Summary of all 5S ISFOLD predictions
Also: discovered 2 mistakes in the original classification of BPs from the crystal structure
ISFOLD can also use mutation data
(Viable and lethal mutations determined experimentally)
(An example from viroids)
Predicting Loop E motif in Viroids
[Figure panels: Published model*; Without mutation data; With mutation data]
(My viroid alignment is low quality)
*Zhong et al., J Virol 2006, v80: 8566-81
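The slides do not say how mutation data enters the prediction. One plausible way, sketched here purely as an assumption, is to check each candidate substitution pattern against the experiments: viable mutants should stay inside the pattern and lethal mutants should fall outside it. The function name, the pattern sets, and the mutation lists are all invented for illustration, and treating every lethal mutant as a disrupted base pair is a simplification.

```python
def mutation_consistency(allowed, viable, lethal):
    """Fraction of experimental observations that a candidate pattern explains.

    allowed: set of two-letter combinations permitted by the candidate BP type
    viable:  combinations observed in viable mutants (expected inside `allowed`)
    lethal:  combinations observed in lethal mutants (expected outside `allowed`)
    Returns None when there is no mutation data at all.
    """
    observations = len(viable) + len(lethal)
    if observations == 0:
        return None
    explained = sum(p in allowed for p in viable)
    explained += sum(p not in allowed for p in lethal)
    return explained / observations

# Invented example data, not taken from the viroid study.
cww_like = {"AU", "UA", "GC", "CG"}
placeholder_type = {"AG", "AA", "CA"}  # hypothetical pattern
viable = ["GC", "AU"]
lethal = ["AA"]

cww_score = mutation_consistency(cww_like, viable, lethal)
other_score = mutation_consistency(placeholder_type, viable, lethal)
```

A score like this could be combined with the alignment-based score, which would matter most when the alignment itself is small or low quality.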
Conclusions
• This software predicts not just cWW BPs, but all types of BPs from sequence alignments.
• ISFOLD does 2 tasks:
  • Predicts which 2 nucleotides interact to form a BP (location).
  • Predicts which specific type of interaction is most probably formed.
• Good results when based on good alignments (5S rRNA).
• Results improve dramatically when mutation data is used.
• The higher the quality of the alignment, the better the predictions.
• By good alignment quality I mean 3 things:
  • A large number of sequences
  • Enough variability between sequences
  • But not so much variability that the 3D motif may have changed completely (motif swap).
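The three quality criteria can be checked with very simple statistics before running any prediction. The sketch below reports the number of sequences and the mean pairwise identity; the function names are hypothetical, sequences are assumed to be aligned to equal length, and the slides give no numeric cut-offs, so none are applied here.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

def alignment_summary(alignment):
    """Return the sequence count and the mean pairwise identity of an alignment."""
    identities = [pairwise_identity(a, b) for a, b in combinations(alignment, 2)]
    # With fewer than two sequences there is nothing to compare.
    mean_identity = sum(identities) / len(identities) if identities else 1.0
    return {"n_sequences": len(alignment), "mean_identity": mean_identity}

# Made-up toy alignment.
summary = alignment_summary(["GAAC", "GAAC", "GAGC"])
```

A mean identity close to 1 signals too little variability to reveal substitution patterns, while a very low value may hint at the motif-swap situation from the last bullet.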
ISFOLD: Mission is NOT accomplished
• ISFOLD as it is now is only the beginning; it provides a framework that can be built upon in the future.
• One thing to consider is that recurrent RNA motifs (such as internal and hairpin loops) occur as whole units: groups of individual BPs tend to occur together.
• When a good structural library of observed recurrent motifs becomes available (like the SCOR database), ISFOLD could be modified to study whole motifs at once (specific stacks of BPs can be scored together).
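To make the proposed extension concrete, here is a minimal sketch of scoring a whole motif at once, assuming a motif can be written as a list of (column, column, BP type) entries and that the motif score is the mean of the per-pair matching fractions. This is a guess at one possible design, not an existing ISFOLD feature; the names and the `toy_type_X` pattern are hypothetical.

```python
# Illustrative pattern table; "toy_type_X" is an invented placeholder,
# NOT a published isosteric family.
PATTERNS = {
    "cWW": {"AU", "UA", "GC", "CG"},
    "toy_type_X": {"AG", "AA", "CA"},
}

def pair_fraction(alignment, i, j, allowed):
    """Fraction of sequences whose bases at columns i and j fit `allowed`."""
    hits = sum((seq[i] + seq[j]) in allowed for seq in alignment)
    return hits / len(alignment)

def motif_score(alignment, motif, patterns=PATTERNS):
    """Mean matching fraction over all base pairs that make up a motif.

    motif: list of (i, j, bp_type) tuples describing the stacked BPs.
    """
    fractions = [
        pair_fraction(alignment, i, j, patterns[bp_type])
        for i, j, bp_type in motif
    ]
    return sum(fractions) / len(fractions)

# Made-up toy alignment and two competing motif hypotheses.
alignment = ["GAAC", "GAGC", "GCAC", "GAAC"]
motif_a = [(0, 3, "cWW"), (1, 2, "toy_type_X")]
motif_b = [(0, 3, "toy_type_X"), (1, 2, "cWW")]

score_a = motif_score(alignment, motif_a)
score_b = motif_score(alignment, motif_b)
```

Scoring the stack as one unit lets a strongly supported pair lend weight to a weakly supported neighbor, which is the benefit the slide anticipates from a motif library.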