RNA Secondary Structure Prediction: Dynamic Programming Approaches. Sarah Aerni. http://www.tbi.univie.ac.at/
Outline • RNA folding • Dynamic programming for RNA secondary structure prediction
RNA Basics • RNA bases: A, C, G, U • Canonical base pairs: • A-U • G-C • Each base can pair with at most one other base. Image: http://www.bioalgorithms.info/
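The two pairing rules on this slide (only canonical pairs, and each base used at most once) can be checked mechanically. A minimal sketch, assuming a structure is given as a list of index pairs `(i, j)` into the sequence; the names `can_pair` and `is_valid_pairing` are hypothetical, and only the two canonical pairs listed on the slide are accepted:

```python
# Canonical pairs from the slide (A-U and G-C), accepted in either orientation.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if bases a and b form a canonical pair."""
    return (a, b) in CANONICAL_PAIRS


def is_valid_pairing(seq: str, pairs: list[tuple[int, int]]) -> bool:
    """Check that every pair is canonical and that no base appears in two pairs."""
    used: set[int] = set()
    for i, j in pairs:
        if i == j or not can_pair(seq[i], seq[j]):
            return False
        if i in used or j in used:  # a base may pair with only one other base
            return False
        used.update((i, j))
    return True


print(is_valid_pairing("GGGAAACCC", [(0, 8), (1, 7), (2, 6)]))  # True
print(is_valid_pairing("GGGAAACCC", [(0, 8), (0, 7)]))          # False: base 0 reused
```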
RNA Secondary Structure • Figure (image: Wuchty) labeling the structural elements: pseudoknot, stem, interior loop, single-stranded region, bulge loop, junction (multiloop), hairpin loop
Circle Plot • The linear RNA strand folds back on itself to create secondary structure • The circularized representation joins the two ends of the strand so that the sequence lies on a circle • Arcs represent base pairs Images – David Mount • Every hairpin loop must contain at least 3 bases • Equivalently, every arc must enclose at least 3 bases: a pair (i, j) with i < j requires j − i − 1 ≥ 3 • Exception: the location where the beginning and end of the RNA meet in the circularized representation is not subject to this minimum
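A small sketch of the circle-plot constraints, assuming bases are placed at equal angles on a unit circle and that `MIN_LOOP = 3` encodes the "at least 3 bases" rule. Only the inner side of each arc is checked; the outer side contains the point where the two ends of the strand meet, which the slide lists as the exception. The function names are hypothetical:

```python
import math

MIN_LOOP = 3  # assumption from the slide: every arc encloses at least 3 bases


def circle_positions(n: int) -> list[tuple[float, float]]:
    """Place n bases at equal angles on a unit circle (base k at angle 2*pi*k/n)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]


def arcs_enclose_enough(pairs: list[tuple[int, int]], min_loop: int = MIN_LOOP) -> bool:
    """True if every arc (i, j) has at least min_loop bases strictly between i and j."""
    return all(max(a, b) - min(a, b) - 1 >= min_loop for a, b in pairs)


print(arcs_enclose_enough([(0, 8), (1, 7), (2, 6)]))  # True: innermost arc encloses 3 bases
print(arcs_enclose_enough([(0, 3)]))                  # False: only 2 enclosed bases
```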
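Trouble with Pseudoknots • Pseudoknots break the dynamic programming algorithm. • The recurrences assume that pairs are nested: once bases i and j pair, the segment inside the pair and the segments outside it fold independently. • A pseudoknot contains crossing pairs (pairs (i, j) and (k, l) with i < k < j < l), so forming one requires checking whether a base is already paired elsewhere; this breaks the recurrence relations. Images – David Mount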
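The crossing condition i < k < j < l from this slide can be tested directly. A minimal sketch, assuming pairs are index tuples; `crossing_pairs` is a hypothetical helper that reports every pseudoknotted (crossing) combination of arcs, so an empty result means the structure is nested and therefore within reach of the dynamic programming recurrences:

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every combination of two arcs that cross (a pseudoknot)."""
    # Normalize each pair to (smaller index, larger index), then sort by left end.
    arcs = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(arcs, 2):
        # After sorting, i <= k; the arcs cross when k is inside (i, j) but l is outside.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


print(crossing_pairs([(0, 10), (2, 8)]))  # [] (nested)
print(crossing_pairs([(0, 5), (3, 9)]))   # [((0, 5), (3, 9))] (pseudoknot)
```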
Base Pair Maximization • Problem: Find the RNA structure with the maximum (weighted) number of nested pairings • Example sequence (figure: its folded structure): ACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCG
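To make the objective concrete before any dynamic programming, here is an exhaustive search sketch for short sequences. It assumes the canonical pairs and the 3-base minimum loop from the earlier slides; `nested_structures`, `max_pairing`, and the `weight` parameter are hypothetical names. Each structure is generated exactly once by deciding whether the first base is unpaired or paired with some partner k. The running time is exponential, so this is only useful for cross-checking tiny inputs:

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
MIN_LOOP = 3  # assumption: a pair (i, k) must enclose at least 3 bases


def nested_structures(seq, i, j):
    """Yield every nested set of canonical pairs within seq[i..j] (inclusive)."""
    if i >= j:  # empty segment or a single base: no pairs possible
        yield []
        return
    # Case 1: base i stays unpaired.
    yield from nested_structures(seq, i + 1, j)
    # Case 2: base i pairs with some k; inside and outside then fold independently.
    for k in range(i + MIN_LOOP + 1, j + 1):
        if (seq[i], seq[k]) in CANONICAL:
            for inside in nested_structures(seq, i + 1, k - 1):
                for outside in nested_structures(seq, k + 1, j):
                    yield [(i, k)] + inside + outside


def max_pairing(seq, weight=lambda a, b: 1):
    """Return (best score, pairs); by default every pair scores 1 (unweighted)."""
    best_score, best_pairs = 0, []
    for pairs in nested_structures(seq, 0, len(seq) - 1):
        score = sum(weight(seq[i], seq[j]) for i, j in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs


print(max_pairing("GGGAAACCC"))  # (3, [(0, 8), (1, 7), (2, 6)])
```

For the weighted variant, one illustrative (hypothetical) choice is to score each pair by its number of hydrogen bonds, 3 for G-C and 2 for A-U: `max_pairing(seq, weight=lambda a, b: 3 if {a, b} == {"G", "C"} else 2)`.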
Base Pair Maximization – Dynamic Programming Algorithm • S(i, j) is the maximum number of base pairs over all foldings of the subsequence of the RNA strand from index i to index j • Simple example: maximizing base pairing. The best structure on i..j comes from one of four cases: i unpaired; j unpaired; i and j paired with each other; bifurcation into two independent substructures. Images – Sean Eddy
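The four cases can be written as a single recurrence. This is the standard base-pair maximization recurrence (often attributed to Nussinov; the name is not from the slides). Here δ(i, j) is assumed to be 1 when bases i and j form a canonical pair that encloses at least 3 bases, and 0 otherwise; a weighted version replaces δ with a pair weight:

$$
S(i,j) = \max \begin{cases}
S(i+1,\, j) & \text{$i$ unpaired} \\
S(i,\, j-1) & \text{$j$ unpaired} \\
S(i+1,\, j-1) + \delta(i,j) & \text{$i$ and $j$ paired} \\
\max_{i \le k < j} \bigl[\, S(i,k) + S(k+1,\, j) \,\bigr] & \text{bifurcation}
\end{cases}
\qquad S(i,i) = S(i,\, i-1) = 0
$$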
Base Pair Maximization – Dynamic Programming Algorithm • Alignment method • Align the RNA strand to itself • The score increases for each feasible base pair • Each pair's score is independent of the overall structure • Bifurcation adds an extra dimension • Initialize the first two diagonals, S(i, i) and S(i, i – 1), to 0 • Fill in the squares sweeping diagonally • Possible paths: S(i + 1, j) and S(i, j – 1) leave a base unpaired, similar to an unmatched position in an alignment; S(i + 1, j – 1) + 1 applies when bases i and j can pair, similar to a matched position in an alignment • S(i, j) takes the maximum over these paths Images – Sean Eddy
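The diagonal sweep with only the three alignment-like paths (no bifurcation yet) can be sketched as follows. Assumptions: 0-based indices, canonical pairs only, and a 3-base minimum loop; `fill_without_bifurcation` is a hypothetical name. Filling one diagonal d = j − i at a time guarantees that S(i + 1, j), S(i, j − 1) and S(i + 1, j − 1) are already computed when cell (i, j) is reached:

```python
MIN_LOOP = 3  # assumption: a pair (i, j) must enclose at least 3 bases
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def fill_without_bifurcation(seq: str) -> list[list[int]]:
    """Fill S(i, j) using only the unmatched-i, unmatched-j and i-j-pair paths."""
    n = len(seq)
    # Every cell starts at 0, which covers the initialized diagonals S(i, i) and S(i, i-1).
    S = [[0] * n for _ in range(n)]
    for d in range(1, n):          # d = j - i: which diagonal is being filled
        for i in range(n - d):
            j = i + d
            best = max(S[i + 1][j], S[i][j - 1])      # i or j left unmatched
            if (seq[i], seq[j]) in PAIRS and j - i - 1 >= MIN_LOOP:
                best = max(best, S[i + 1][j - 1] + 1)  # i and j matched
            S[i][j] = best
    return S


print(fill_without_bifurcation("GGGAAACCC")[0][8])  # 3
```

Without the bifurcation path every structure is a single chain of nested pairs, so two side-by-side hairpins such as `GGGAAACCCGGGAAACCC` score fewer than the six pairs they could form together.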
Base Pair Maximization – Dynamic Programming Algorithm (continued) • Bifurcation: for every split point k with i ≤ k < j, compute S(i, k) + S(k + 1, j), and include the best of these values in the maximum for S(i, j) • In the illustrated case, the bifurcation term (annotated k = 0) gives the maximum Images – Sean Eddy
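Adding the bifurcation path gives the full base-pair maximization algorithm. The sketch below fills the matrix diagonally and then traces back one optimal structure in dot-bracket notation ("(" and ")" mark paired bases, "." unpaired). It assumes canonical pairs only and a 3-base minimum loop; `nussinov_fill` and `nussinov_traceback` are hypothetical names, and ties are broken by checking the paths in a fixed order:

```python
MIN_LOOP = 3  # assumption: a pair (i, j) must enclose at least 3 bases
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(seq: str, i: int, j: int) -> bool:
    """Canonical pair that also respects the minimum loop length."""
    return (seq[i], seq[j]) in PAIRS and j - i - 1 >= MIN_LOOP


def nussinov_fill(seq: str) -> list[list[int]]:
    """Fill S(i, j) with all four paths, including bifurcation."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]  # lower diagonals stay 0 (initialization)
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            best = max(S[i + 1][j], S[i][j - 1])       # i or j unpaired
            if can_pair(seq, i, j):
                best = max(best, S[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i, j):                      # bifurcation at every k
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S


def nussinov_traceback(seq: str, S: list[list[int]]) -> str:
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)] if seq else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq, i, j) and S[i][j] == S[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            # Splits at k = i or k = j - 1 were ruled out above, so search the rest.
            for k in range(i + 1, j - 1):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


seq = "GGGAAACCCGGGAAACCC"
S = nussinov_fill(seq)
print(S[0][len(seq) - 1], nussinov_traceback(seq, S))  # 6 (((...)))(((...)))
```

With the bifurcation path, the two hairpins that the three-path fill could not combine are scored independently and summed.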
Base Pair Maximization - Drawbacks • Maximizing the number of base pairs does not necessarily yield the most stable structure • It may create structures with many interior loops or hairpins, which are energetically unfavorable • The predicted structure may therefore not be biologically reasonable
References • How Do RNA Folding Algorithms Work? S.R. Eddy. Nature Biotechnology, 22:1457-1458, 2004.