“Challenging” internal loop motifs. Ali Mokdad, M.D., Ph.D. Systematically finding internal loops. Problem with current automatic alignment methods.
Problem with current automatic alignment methods
• State-of-the-art automatic RNA alignment methods are based on stochastic context-free grammars (SCFGs, in the form of covariance models) and do not systematically use all the available 3D structural information.
• The advantage of SCFGs is their ability to describe nested interactions (RNA 2D structures).
• As currently applied, these methods work best for helical Watson-Crick segments but do not produce accurate alignments in non-helical segments or in regions where tertiary interactions occur.
• With the ever-growing library of accurate RNA 3D structures, it is now possible to use 3D information to build better alignments.
[Figure: Generation and Parsing with a model built from a Known Structure; example sequences UUAUCCAUGGCGUCGCACAAAGGC and CAACAAAAAUAGUUCUGGGAGCAG]
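To make "generation" and "parsing" concrete, here is a minimal toy sketch of a grammar tied to one fixed dot-bracket structure, using the productions S → a S (single-base emission), S → a S b S (base-pair emission) and S → ε. The emission probabilities, the function names (`pair_table`, `generate`, `log_prob`) and the use of dot-bracket input are illustrative assumptions, not the authors' model. The sketch has no insert or delete states; because the structure is fixed, "parsing" here reduces to scoring the single derivation of a sequence, whereas real covariance models use dynamic programming (CYK or inside) over all possible parses.

```python
import math
import random

# Hypothetical emission probabilities (illustrative assumptions, not fitted values).
PAIR_PROBS = {"GC": 0.25, "CG": 0.25, "AU": 0.2, "UA": 0.2, "GU": 0.05, "UG": 0.05}
SINGLE_PROBS = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}


def pair_table(structure):
    """Map each position of a dot-bracket string to its partner index, or None if unpaired."""
    stack, partner = [], [None] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def generate(structure, rng):
    """Generation: derive one sequence from the grammar fixed to `structure`."""
    partner = pair_table(structure)
    seq = [None] * len(structure)

    def derive(i, j):  # expand S over the half-open interval [i, j)
        while i < j:
            k = partner[i]
            if k is None:  # S -> a S  (single-base emission)
                seq[i] = rng.choices(list(SINGLE_PROBS), weights=list(SINGLE_PROBS.values()))[0]
                i += 1
            else:  # S -> a S b S  (base-pair emission, inner S, then the rest)
                pair = rng.choices(list(PAIR_PROBS), weights=list(PAIR_PROBS.values()))[0]
                seq[i], seq[k] = pair[0], pair[1]
                derive(i + 1, k)
                i = k + 1

    derive(0, len(structure))
    return "".join(seq)


def log_prob(seq, structure):
    """Parsing with the structure fixed: log-probability of the single derivation of `seq`."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = pair_table(structure)
    total = 0.0
    for i, base in enumerate(seq):
        j = partner[i]
        if j is None:
            total += math.log(SINGLE_PROBS[base])
        elif i < j:  # score each pair once, at its 5' position
            p = PAIR_PROBS.get(base + seq[j], 0.0)
            if p == 0.0:
                return float("-inf")  # pair not allowed by this toy grammar
            total += math.log(p)
    return total


print(generate("((...))", random.Random(1)))
print(log_prob("GCAAAGC", "((...))"))
```

Writing the pair production as S → a S b S keeps every derivation nested, which is exactly the class of structures an SCFG can express.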
SCFG/MRF models
• We use SCFG models that can describe not only Watson-Crick interactions but also all the other families of edge-to-edge interactions observed in 3D structures.
• We program all isosteric subfamilies (figure below) into the SCFG so that isosteric substitutions are allowed when aligning sequences.
• We also combine SCFGs with Markov random field (MRF) models, which allows alignment of regions where local crossing interactions occur or where one nucleotide takes part in several interactions.
• SCFG/MRF models can therefore generate clusters of bases at once (triples, quadruples, etc.) and are not limited to base pairs.
• The hybrid SCFG/MRF model can detect motif swaps in alignments from sequence data alone.
• Eventually, it may be possible to detect structural features of small motifs directly from sequence data.
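The division of labor described above can be illustrated with a small sketch that flags which interactions in a 3D-derived list fall outside what a purely nested SCFG can represent: those that cross another interaction, or that share a nucleotide with another one (as in a base triple). Only this classification step is shown, not the MRF model itself; the function names and the representation of interactions as `(i, j)` position pairs are hypothetical choices for illustration.

```python
from itertools import combinations


def crosses(a, b):
    """True if interactions a and b cross (the non-nested pattern seen in pseudoknots)."""
    i, j = sorted(a)
    k, l = sorted(b)
    return i < k < j < l or k < i < l < j


def shares_nucleotide(a, b):
    """True if the two interactions involve a common nucleotide (e.g. part of a base triple)."""
    return bool(set(a) & set(b))


def split_nested_and_mrf(interactions):
    """Split (i, j) interactions into those a nested SCFG can represent and the rest."""
    flagged = set()
    for a, b in combinations(interactions, 2):
        if crosses(a, b) or shares_nucleotide(a, b):
            flagged.update((a, b))
    nested = [x for x in interactions if x not in flagged]
    needs_mrf = [x for x in interactions if x in flagged]
    return nested, needs_mrf


# Hypothetical interactions (nucleotide positions), not taken from a real structure.
pairs = [(1, 20), (2, 19), (5, 12), (8, 15), (3, 17), (3, 18)]
nested, needs_mrf = split_nested_and_mrf(pairs)
print(nested)     # [(1, 20), (2, 19)]
print(needs_mrf)  # [(5, 12), (8, 15), (3, 17), (3, 18)]
```

Here (5, 12) and (8, 15) cross, and (3, 17) and (3, 18) share nucleotide 3, so all four would be left to the MRF part in this sketch.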
Programs
http://rna.bgsu.edu/FR3D
• GUI ready; will be posted online within days
• User manual coming soon
• Appearing soon in J. Math. Biol.
http://rna.bgsu.edu/ribostral
• MATLAB and compiled (PC) versions available
• Full manual available
• Appearing soon in Bioinformatics
Ribostral
• Full manual available …
• Inputs:
  • FASTA alignment file
  • A list of interactions taken from a 3D structure
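As an illustration of how the two inputs fit together, the sketch below reads a FASTA alignment and, for each interaction, collects the bases found at the two interacting alignment columns in every sequence, which is the raw material for a per-base-pair score. The interaction-list format (1-based alignment columns plus a family label) and the function names are assumptions made for this example; Ribostral's actual file formats are described in its manual.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {header: aligned_sequence}, preserving order."""
    records = {}
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    if len({len(seq) for seq in records.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records


def bases_at(alignment, i, j):
    """Return {header: base_i + base_j} for 1-based alignment columns i and j."""
    return {name: seq[i - 1] + seq[j - 1] for name, seq in alignment.items()}


# Hypothetical interaction list: (column_i, column_j, family), 1-based alignment columns.
interactions = [(2, 6, "tWS")]

example = """>seq1
GCAA
AGC
>seq2
GUAA-AC
"""
aln = read_fasta(example)
for i, j, family in interactions:
    print(family, i, j, bases_at(aln, i, j))
    # prints: tWS 2 6 {'seq1': 'CG', 'seq2': 'UA'}
```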
Score calculation
BP 26/22 in Bacteria: 26/22 is a tWS CG pair in the crystal structure. There are:
• 312 sequences with isosteric (I) substitutions
• 25 with heterosteric (H) substitutions
• 13 with forbidden (F) substitutions
Correction coefficient: c = 100 / (3 × 351) ≈ 0.095
Score = 0.095 × (3 × 312 − 25 − 2 × 13) = 0.095 × 885 ≈ 84
Individual BP score = c × (3I + 2NI − H − 2F − 2G1 − 3G2)
where c is the correction coefficient, c = 100 / (3 × number of sequences). It normalizes the score so that a base pair with isosteric substitutions in every sequence reaches the maximum score of 100.
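The scoring formula above translates directly into a few lines of code. This is a sketch under stated assumptions: the per-category counts (I, NI, H, F, G1, G2) are taken as already known, since assigning each sequence to a category requires the isostericity data and gap rules, which are not reproduced here. The function name `bp_score` is hypothetical; categories not passed in count as zero.

```python
# Weights from the slide's formula: 3I + 2NI - H - 2F - 2G1 - 3G2.
WEIGHTS = {"I": 3, "NI": 2, "H": -1, "F": -2, "G1": -2, "G2": -3}


def bp_score(n_sequences, **counts):
    """Individual base-pair score c * (3I + 2NI - H - 2F - 2G1 - 3G2), c = 100 / (3 * n_sequences)."""
    if n_sequences <= 0:
        raise ValueError("n_sequences must be positive")
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if sum(counts.values()) > n_sequences:
        raise ValueError("category counts exceed the number of sequences")
    raw = sum(WEIGHTS[cat] * n for cat, n in counts.items())
    # Multiplying by 100 / (3 * n_sequences) caps the score at 100,
    # reached when every sequence has an isosteric substitution.
    return 100.0 * raw / (3 * n_sequences)


print(round(bp_score(351, I=312, H=25, F=13), 2))  # 84.05
print(bp_score(10, I=10))                          # 100.0
```

Computing `100 * raw / (3 * n)` rather than first rounding c to 0.095 avoids compounding the rounding error.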