RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010
RNA primary sequences • Laboratory techniques make it possible to extract specific RNA molecules and determine the sequence of nucleotides. Here are the sequences of the 5S ribosomal RNA molecule from different organisms:
UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC A H.m. (structure)
GCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGC B E.coli (structure)
UCCCCCGUGCCCAUAGCGGCGUGGAACCACCCGUUCCCAUUCCGAACACGGAAGUGAAACGCGCCAGCGCCGAUGGUACUGGGCGGGCGACCGCCUGGGAGAGUAGGUCGGUGCGGGG B T.th. (structure)
AGUGGUGGCCAUAUCGGCGGGGUUCCUCCCCGUACCCAUCCUGAACACGGAAGAUAAGCCCGCCAGCGUCCGGCAAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCAC A L27170.1/1-120
GUAGCGGCCACAGCGGUGGGGUUCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGACCCUCUGGGAAACCGGGUUCGCCGCUAC A L27163.1/1-119
GCGGCCAGGGCGGAGGGGAAACACCCGUACCCAUUCCGAACACGGAAGUGAAGCCCUCCAGCGAACCAGCUAGUACUAGAGUGGGAGACCCUCUGGGAGCGCUGGUUCGCCGCC A L27343.1/3-116
UUUGGCGGUCAUGGCGUGGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC A M36187.1/5-126
GUUGGCGGUCAUGGCGUGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC A X62857.1/1-121
UUUGGCGGUCAUGGCGUGGGGGUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUGUUUUGCUGUGGGAAGCCCAUUUCACUGCCAGCC A X15364.1/6601-6721
GUCGGUGGUGUUAGCGGUGGGGUCACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCUGCCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGACCCCGCCGGCA B M16176.1/4-120
GUCGGUGGUUAUAGCGGUGGGGUCACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCCACCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCACCGCCGGCC B M16177.1/4-120
GUUGGUGGUUAUUGUGUCGGGGGUACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCCGAUUGCGCUGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCGCUGCCAACC B X55255.1/4-120
UACGGCGGUCAAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCAAUGAUACUGCCCUCACCGGGUGGAAAAGUAGGACACCGCCGAAC B X55259.1/3-117
UACGGCGGUCCAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUACCCAUCCGGGUGGAAAAGUAGGACACCGCCGAAC B X55251.1/3-116
UACGGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUGCCCCUCCGGGUGGAAAAGUAGGACACCGCCGAAC B X75601.1/91-203
UAAGGCGGCCAUAGCGGUGGGGUUACUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCGCCUGCGUUCCGGUCAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCUACU A X03407.1/5927-6048
UUGGCGACCAUAGCGGCGAGUGACCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCUCGCCUGCGUUUCGGUCAGUACUGGAUUGGGCGACCCUCUGGGAAAUCUGAUUCGCCGCCACC A L27168.1/1-120
GGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCACCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCC A X02128.1/24-139
GGGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCGCCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCCCU A X14441.1/5-123
RNA can make double helices • RNA chains are flexible enough to fold back on themselves and make the same types of basepairs as are found in DNA. These are called “Watson-Crick” basepairs.
Watson-Crick basepairs • The main Watson-Crick basepairs are AU and GC. (GU also occurs sometimes.) They can substitute for one another freely without changing the structure of the RNA molecule. They are said to be isosteric, and changes between these basepairs are an example of neutral variability. They are held together by hydrogen bonds (dotted lines). Superposition
Comparative sequence analysis • By manually aligning similar RNA sequences and noting the pairs of columns where AU, CG, GC, and UA pairs replace one another, one can infer the locations of Watson-Crick basepairs (called the secondary structure) of an RNA molecule. • This is the inferred secondary structure of the 5S RNA, with bases labeled as found in E. coli. There are five helical regions, with three “internal loops” and two “hairpin loops” separating them. Fox & Woese 1975; Peattie et al. 1981; Noller 1984; Cannone et al. 2002; http://www.rna.ccbb.utexas.edu
UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU
((((((((((-----((((((((----(((((((-------------)))))))))---(((((((((-(((((((--((((((((---))))))))--)))))))---))))))))))-
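The column-pair test used in comparative sequence analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy alignment is made up for the example (it is not taken from the 5S sequences), the function name `covarying_columns` is hypothetical, and only AU, UA, CG, and GC count as Watson-Crick pairs.

```python
from itertools import combinations

# Only the four Watson-Crick combinations named on the slide are counted.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def covarying_columns(alignment):
    """Return column pairs (i, j), 0-based, where every sequence shows a
    Watson-Crick pair and at least two different pairs occur."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    hits = []
    for i, j in combinations(range(length), 2):
        pairs = {(seq[i], seq[j]) for seq in alignment}
        # All observed pairs are Watson-Crick, and the pair type changes
        # between sequences (one pair replaces another).
        if pairs <= WATSON_CRICK and len(pairs) >= 2:
            hits.append((i, j))
    return hits

# Hypothetical toy alignment: a two-pair helix closing a four-base loop.
toy_alignment = [
    "GCAAAAGC",
    "ACAAAAGU",
    "GUAAAAAC",
]
print(covarying_columns(toy_alignment))
```

Real alignments contain gaps and occasional mismatches, so a practical version would need a tolerance for exceptions rather than requiring every sequence to agree.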
RNA 3D structure • Starting late in the year 2000, high-resolution atomic structures of entire ribosomes have been published. These show the bases, the backbone, the Watson-Crick basepairs, and several new types of basepairs. • The 3D structures confirm the predicted secondary structure and show the importance of Watson-Crick basepairs. E. coli 5S RNA
RNA secondary structure prediction • Now that we understand the basics of RNA 3D structure and Watson-Crick basepairs, we can pose the problem of predicting the secondary and 3D structure from an RNA sequence. • Comparative sequence analysis requires multiple RNA sequences. • For now, we will talk about predicting RNA secondary structure from a single sequence.
Three methods to predict secondary structure • Dot plots – a way to visualize the possible helices in a sequence. Somewhat primitive, but a good technique to know. • Nussinov algorithm – a technique to find the set of nested basepairs which maximizes the number of basepairs in a sequence. • Energy methods – find the set of basepairs which results in the lowest-energy structure, the one most likely to be preferred in nature (e.g. mfold).
Dot plots • Dot plot – make a grid with the RNA sequence (here CGUUUGGGUUCACAAACG) down the rows and across the columns. • Put a dot at the location of each CG or AU pair.
Dot Plots • [Figure: dot plot for CGUUUGGGUUCACAAACG, with a + at each CG or AU pair.] • The helix is written in “dot-bracket notation”:
CGUUUGGGUUCACAAACG
((((((------))))))
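The dot plot and the dot-bracket string for this sequence can be reproduced with a short Python sketch. The function names `dot_plot` and `pairs_from_dot_bracket` are hypothetical, and the sketch follows the slide in marking only CG and AU pairs (no GU).

```python
# Pairs marked in the plot: CG and AU in either orientation.
PAIRS = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}

def dot_plot(seq):
    """Return one string per row: '+' where the row base can pair with
    the column base, '.' elsewhere."""
    seq = seq.upper()
    return ["".join("+" if (r, c) in PAIRS else "." for c in seq) for r in seq]

def pairs_from_dot_bracket(structure):
    """Convert dot-bracket notation to sorted 0-based (i, j) pairs.
    Any character other than '(' or ')' is treated as unpaired."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)

seq = "CGUUUGGGUUCACAAACG"
grid = dot_plot(seq)

# Print the grid with the sequence along the top and down the side.
print("  " + " ".join(seq))
for base, row in zip(seq, grid):
    print(base + " " + " ".join(row))

# List the pairs encoded by the dot-bracket string (1-based positions).
for i, j in pairs_from_dot_bracket("((((((------))))))"):
    print(i + 1, seq[i], j + 1, seq[j])
```

In the printed grid, a helix shows up as a run of + marks along an anti-diagonal, which is what makes the plot useful for spotting candidate helices by eye.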
Nussinov algorithm • Finds the largest number of nested Watson-Crick pairs in an RNA sequence. • Similar to a dot plot, but we keep track of the cumulative number of nested Watson-Crick basepairs in each subsequence as we go.
Put ones above the diagonal where there is a CG, GC, AU, or UA pair.
Continue filling in cells away from the diagonal. Each cell takes the maximum of the cell to its left, the cell below it, and the cell diagonally below and to the left plus one if the row and column bases form a Watson-Crick pair.
Each cell we fill in tells the maximum number of Watson-Crick pairs in the subsequence down and to the left of the cell.
This subsequence only has one nested Watson-Crick basepair, either a GC or a UA, but not both, since they would cross each other.
Finally we come to a subsequence that has two nested Watson-Crick pairs. The cell with the 2 is 1 for the GC pair plus 1 from the cell down and left of it, a UA.
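The table-filling procedure can be sketched in Python as follows. This is a minimal version of the standard Nussinov recurrence, not necessarily the exact procedure behind the slides: besides the three neighbouring cells (left, below, diagonal), it includes the usual bifurcation term, which lets two side-by-side helices be counted together. The `min_loop` parameter and the function names are assumptions added for illustration.

```python
WC = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def can_pair(seq, i, j, min_loop):
    """True if bases i and j are Watson-Crick and far enough apart."""
    return (seq[i], seq[j]) in WC and j - i > min_loop

def nussinov(seq, min_loop=0):
    """Fill N so that N[i][j] (i <= j) is the maximum number of nested
    Watson-Crick pairs in seq[i..j]. Row i, column j: 'left' is
    N[i][j-1], 'below' is N[i+1][j], 'diagonal' is N[i+1][j-1]."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):                # distance from the diagonal
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])       # below, left
            if can_pair(seq, i, j, min_loop):
                best = max(best, N[i + 1][j - 1] + 1)  # diagonal + new pair
            for k in range(i + 1, j):                  # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N

def traceback(seq, N, min_loop=0):
    """Recover one optimal structure in dot-bracket notation ('-' = unpaired)."""
    structure = ["-"] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq, i, j, min_loop) and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)

seq = "CGUUUGGGUUCACAAACG"
table = nussinov(seq)
print(table[0][-1], traceback(seq, table))
```

With `min_loop=0`, adjacent bases may pair, matching the step that puts ones just above the diagonal. Setting `min_loop=3` would forbid hairpin loops shorter than three bases, a common practical restriction.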
Thermodynamic methods • Assumptions: • Only nearest-neighbor interactions need to be considered. • Nearest-neighbor interactions can be summed to give the total free energy. • Pseudoknots and tertiary interactions can be ignored. • The most stable structure is also the kinetically favored structure. • Idea: find the secondary structure with the most favorable (lowest) energy. • Zuker method (mfold): uses dynamic programming to calculate the structure with the lowest free energy. • McCaskill method (sfold): uses dynamic programming to calculate the most probable structure (more theoretically rigorous). • Used by these programs: Mfold, Sfold, Pfold
Nearest neighbor parameters • This is more sophisticated than simply counting the number of AU, UA, CG, GC basepairs in each subsequence. You also tally up the strength of each pair and the energy of one pair stacking on another pair. Source: Bioinformatics: Sequence and Genome Analysis, by David W. Mount.
Determining parameters • Heat a cuvette containing the duplex 5’-GCCAUCCG-3’ / 3’-CGGUAGGC-5’ and measure the absorbance at 260 nm (UV).
Determining parameters • Reaction: Strand1 + Strand2 = Duplex • Equilibrium constant at each temperature T: $K_{eq}(T) = \dfrac{[\mathrm{Duplex}(T)]}{[S_1(T)]\,[S_2(T)]}$ • Free energy change: $\Delta G(T) = -RT \ln K_{eq}(T)$
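The relation between the equilibrium constant and the free energy change can be checked numerically. The sketch below assumes that Keq is written for duplex formation (duplex concentration over the product of the strand concentrations), uses the gas constant in kcal/(mol·K), and picks 37 °C (310.15 K) as an example temperature. The function names and the example Keq values are illustrative, not measured.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g(keq, temp_kelvin):
    """Free energy change in kcal/mol from an equilibrium constant:
    dG = -R * T * ln(Keq)."""
    return -R_KCAL * temp_kelvin * math.log(keq)

def keq_from_delta_g(dg, temp_kelvin):
    """Inverse relation: Keq = exp(-dG / (R * T))."""
    return math.exp(-dg / (R_KCAL * temp_kelvin))

T = 310.15  # 37 degrees C, an assumed example temperature

# Illustrative (not measured) equilibrium constants.
for keq in (1.0, 1e3, 1e6):
    print(f"Keq = {keq:g}  ->  dG = {delta_g(keq, T):.2f} kcal/mol")
```

A Keq above 1 gives a negative ∆G, meaning duplex formation is favorable, and each tenfold increase in Keq lowers ∆G by the same fixed amount at a given temperature.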
Determining parameters • Compare two duplexes that differ by one basepair: 5’-GCCAUCCG-3’ / 3’-CGGUAGGC-5’ versus 5’-GCCAACCG-3’ / 3’-CGGUUGGC-5’. • ∆∆G is the energy change due to substituting one basepair for another. • Repeat this for many related sequences and do statistical analysis to get pairwise parameters.
Nearest neighbor parameters • Parameters for most of the “loop” regions are unknown: there are too many possible loops to do experiments for all of them. • Usually, unpaired regions are penalized, but certain “loops” are known to be very thermodynamically stable, and they are scored with low free energies (e.g. the UNCG hairpin). • Hard to extrapolate: a small change in sequence can produce a large change in free energy.
Using thermodynamic parameters • ∆G° = -RT ln(Keq) = -2.4 - 2.2 - 0.9 - 0.9 - 2.1 - 2.4 + 5.4 = -5.5 kcal/mol • Dynamic Programming
Things to keep in mind • Calculated free energies are always approximate. • The most stable calculated structure is not necessarily the most stable real structure. • Must consider “sub-optimal” calculated structures. • Must use additional information, if available, to pick the correct structure.
MFOLD • One of the best thermodynamic methods. Developed by Michael Zuker. • Web server: http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi • Submit a sequence, ignore the various parameters you can set, and look through the output. • Look for: dot plots, multiple possible structures, and minimum free energies for the structures. • Look at output in png format unless you prefer another image format. • M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-15 (2003)