Improving Free Energy Functions for RNA Folding: RNA Secondary Structure Prediction
Why RNA is Important • Machinery of protein construction • Catalytic role in cells • May be possible to destroy specific sequences of RNA (to interrupt protein production) • RNase P (Cech/Altman c.1981)
RNA Structural Levels • Primary: AAUCG...CUUCUUCCA • Secondary (figure source: http://anx12.bio.uci.edu/~hudel/bs99a/lecture21/lecture2_2.html) • Tertiary (figure source: http://www.leeds.ac.uk/bmb/courses/teachers/trnballs.html)
Abstracting the problem [Figure: example structure over the bases A G C G C A U C] Zuker (1981) Nucleic Acids Research 9(1) 133-149
Why it is hard • Large search space (hard to enumerate) Hofacker et al. (1994) Monat. Chem. 125 167-188
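To give a feel for the size of the search space, here is a minimal sketch that counts secondary structures with a standard counting recurrence. The recurrence, the function name `count_structures`, and the `min_loop` parameter are assumptions for illustration and are not taken from the slides. The count ignores the actual sequence (any two bases are allowed to pair), so it is an upper bound.

```python
def count_structures(n, min_loop=3):
    """Count pseudoknot-free secondary structures on n bases.

    Assumes any two bases may pair as long as at least `min_loop`
    unpaired bases lie between them, so this is an upper bound that
    does not depend on the sequence.
    """
    counts = [1] * (n + 1)  # counts[0] = 1: the empty structure
    for length in range(1, n + 1):
        # Case 1: the last base is unpaired.
        total = counts[length - 1]
        # Case 2: the last base pairs with base j (1-based), which
        # splits the chain into a left part and an enclosed part.
        for j in range(1, length - min_loop):
            total += counts[j - 1] * counts[length - j - 1]
        counts[length] = total
    return counts[n]


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, count_structures(n))
```

The counts grow exponentially with sequence length, which is why explicit enumeration is impractical for realistic RNAs.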
Why it is hard • Secondary structure does not exist. • Unlike proteins • Putative structures (prone to revision) • Quality of Energy Functions • Discussed later
Current Algorithms • Single-Strand • Minimum Free Energy (Zuker et al. 1981) • Partition Functions (McCaskill 1990) • Comparative Sequence Analysis • Max. Weighted Matching (Nussinov et al. 1978) • Stochastic CFG (Sakakibara et al. 1994) • Phylogenetic Trees (Gulko et al. 1995) • Statistical Significance (Noller & Woese, early 1980s) See proposal for references
MFE / Tinoco Hypothesis The free energy of a secondary structure equals the sum of the free energies of the loops and stacked pairs Tinoco et al. (1971) Nature 230 362-367.
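Written as a formula, the hypothesis could be stated as follows. The notation is an assumption chosen for illustration: E(S) is the free energy of structure S, loops(S) and stacks(S) are its loops and stacked pairs, and e(·) is the tabulated energy contribution of a single element.

```latex
E(S) \;=\; \sum_{L \,\in\, \mathrm{loops}(S)} e(L) \;+\; \sum_{P \,\in\, \mathrm{stacks}(S)} e(P)
```

Because every term depends only on one local element, the minimum over all structures can be found by dynamic programming.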
Proposed System [Figure: flow diagram with numbered arrows 1-3 linking the sequence AAUCG...CUUCUUCCA, MFE (E), Secondary Structures, and GA (E’)]
Step I - Calc MFE Structure • Given a sequence apply the MFE algorithm • Generates secondary structure S
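The shape of this step can be sketched with a much simpler stand-in. The code below is not the Zuker MFE algorithm: it maximizes the number of base pairs (Nussinov-style dynamic programming) instead of minimizing a loop-based free energy. The function name `max_pairs_fold`, the allowed pairs, and the `min_loop` default are assumptions for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs_fold(seq, min_loop=3):
    """Return a dot-bracket structure S with the maximum number of pairs."""
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # base j left unpaired
            for k in range(i, j - min_loop):  # base j paired with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal set of pairs.
    dots = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:  # too short to contain a pair
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    dots[k], dots[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(dots)


if __name__ == "__main__":
    print(max_pairs_fold("GGGAAAUCC"))
```

A real MFE implementation replaces the "+1 per pair" score with loop and stacking energies, but the recursion over subsequences has the same structure.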
Step II - Structural Similarity • Given a database of experimentally verified RNA structures • Let Q be the database structure most similar to S • Based on RNase P Database (Brown 1999)
Step III - Construct E’ • Create a new energy function:
Discussion on E’ • E’ has global information • Global information precludes the use of dynamic programming (MFE, Partition) • Leaves (stochastic) combinatorial optimization • Gradient Descent (no E/S) • Genetic Algorithms / Simulated Annealing
Step IV - Genetic Algorithm • RNA Structural Prediction by GA • Input: sequence • Output: structure that minimizes E’ for the input sequence • Steady State Genetic Algorithm • Pseudoknots forbidden (conflicts) • Fitness = -E’ • Effect of Similarity(Q, S) diminishes with each generation (pseudo-SA).
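For readers unfamiliar with the term, a steady-state genetic algorithm replaces one individual at a time instead of a whole generation. The sketch below is a generic illustration under assumptions: the function name, the parent selection, and the replace-the-worst policy are hypothetical choices, not details from the slides, and the decaying Similarity(Q, S) term is not modelled. The caller supplies `fitness` (here it would be -E’), `crossover`, and `mutate`.

```python
import random


def steady_state_ga(population, fitness, crossover, mutate, steps, rng):
    """Generic steady-state GA loop (hypothetical sketch).

    Each step breeds a single child from two random parents and lets it
    replace the current worst individual if the child is fitter.
    """
    pop = list(population)
    for _ in range(steps):
        parent1, parent2 = rng.sample(pop, 2)
        child = mutate(crossover(parent1, parent2, rng), rng)
        # Index of the least fit individual in the current population.
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)
```

Because an individual is only ever replaced by a fitter one, the best fitness in the population never decreases from step to step.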
Genetic Algorithm - Representation • Stem-loop representation (Chen et al. 2000) • Each stem is a tuple (start, end, length, weight), e.g. (23 52 3 3.2) • Window method (EMBOSS Palindrome)
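A minimal sketch of how such a stem tuple might be held in code. The class name `Stem`, the reading that base `start + i` pairs with base `end - i`, and the conflict rule (shared base or crossing pairs, i.e. a pseudoknot) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    start: int     # 5' position of the outermost pair
    end: int       # 3' position of the outermost pair
    length: int    # number of stacked base pairs
    weight: float  # score attached to the stem

    def pairs(self):
        """Base pairs of the stem, assuming start+i pairs with end-i."""
        return [(self.start + i, self.end - i) for i in range(self.length)]


def conflicts(a, b):
    """True if two stems share a base or cross each other (pseudoknot)."""
    for i, j in a.pairs():
        for k, l in b.pairs():
            if len({i, j, k, l}) < 4:              # a base is used twice
                return True
            if i < k < j < l or k < i < l < j:     # crossing pairs
                return True
    return False


if __name__ == "__main__":
    stem = Stem(23, 52, 3, 3.2)
    print(stem.pairs())
    print(conflicts(stem, Stem(30, 45, 2, 1.0)))   # nested inside stem
    print(conflicts(stem, Stem(40, 60, 2, 1.0)))   # crosses stem
```

A structure is then a set of mutually non-conflicting stems drawn from the stem pool.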
Genetic Algorithm - Operators • Mutation • Add stem from stem pool to a child • Crossover • Fit stems of P2 into C1 or C2 randomly. Placement must be conflict free. [Figure: parents P1 and P2, children C1 and C2]
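The two operators could be sketched as follows. Several details are assumptions, since the slide figure is not recoverable: stems are plain (start, end, length, weight) tuples, P1's stems are dealt randomly to C1 or C2 before P2's stems are fitted, and a stem that fits in neither child is dropped. All function names are hypothetical.

```python
import random


def stem_pairs(stem):
    start, end, length, _weight = stem
    return [(start + i, end - i) for i in range(length)]


def conflicts(a, b):
    """True if two stems share a base or cross each other (pseudoknot)."""
    for i, j in stem_pairs(a):
        for k, l in stem_pairs(b):
            if len({i, j, k, l}) < 4 or i < k < j < l or k < i < l < j:
                return True
    return False


def fits(stem, structure):
    """A stem fits if it conflicts with no stem already in the structure."""
    return all(not conflicts(stem, other) for other in structure)


def mutate(child, stem_pool, rng):
    """Mutation: add one randomly chosen, conflict-free stem from the pool."""
    candidates = [s for s in stem_pool if fits(s, child)]
    if candidates:
        child.append(rng.choice(candidates))
    return child


def crossover(p1, p2, rng):
    """Crossover: fit the stems of P2 into C1 or C2 at random."""
    c1, c2 = [], []
    for stem in p1:  # assumption: P1's stems are split between the children
        (c1 if rng.random() < 0.5 else c2).append(stem)
    for stem in p2:
        first, second = (c1, c2) if rng.random() < 0.5 else (c2, c1)
        for child in (first, second):  # try the other child on conflict
            if fits(stem, child):
                child.append(stem)
                break
    return c1, c2
```

Both operators preserve the invariant that every child is a conflict-free set of stems, so pseudoknots never enter the population.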
Preliminary Results • E’ does not lead to a drastic speedup • Genetic algorithm is very slow if the initial population is generated randomly from the stem pool • Use suboptimal foldings for the initial population instead
Preliminary Results Explained • The real structure is usually very similar to the Tinoco optimal structure. • View E’ as a way of choosing among the suboptimal structures.
Future Work • More testing on the entire RNase P Database (> 400 structures) • Tune E’ • Accuracy comparison to MFE and Partition Function Algorithms • Parallelize genetic algorithm