[Figure: typical transfer RNA structure.]
Roles of RNA • mRNA (messenger) • rRNA (ribosomal) • tRNA (transfer) • other ribonucleoproteins (e.g. spliceosome, signal recognition particle, ribonuclease P) • viral genomes • artificial ribozymes
Loop types: bulges, internal loops, hairpin loops, multi-branched loops.
[Figure: example hairpin. Stacked pairs G-C, U-A, U-A close a loop of unpaired bases (C, U, C, U). The two stacking steps contribute ΔG = -2.1 kcal/mol and ΔG = -1.2 kcal/mol; the loop contributes ΔG = +4.5 kcal/mol.]
Thermodynamic parameters are measured on real molecules.
• Helix formation = hydrogen bonds + stacking.
• Entropic penalty for loop formation.
• Sum up the contributions of helices and loops over the whole structure.
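As a minimal sketch of the additive rule above, the three contributions quoted for the example hairpin can simply be summed. The variable and function names are illustrative assumptions, not part of the original slides.

```python
# Free-energy contributions quoted for the example hairpin (kcal/mol).
stacking_terms = [-2.1, -1.2]   # two stacked base-pair steps
loop_terms = [4.5]              # entropic penalty of the hairpin loop


def total_free_energy(stacks, loops):
    """Sum helix (stacking) and loop contributions over the structure."""
    return sum(stacks) + sum(loops)


dg = total_free_energy(stacking_terms, loop_terms)
print(f"Total dG = {dg:+.1f} kcal/mol")
```

With these numbers the total is about +1.2 kcal/mol, so in this small example the loop penalty outweighs the two stacking terms.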
[Figure: three arrangements of two base pairs i-j and k-l, panels (a), (b) and (c).]
Pairs i-j and k-l are compatible if (a) i < j < k < l, or (b) i < k < l < j. Case (c), i < k < j < l, is called a pseudoknot and is usually not counted as secondary structure.
Bracket notation is used to represent structure:
a: ((((....))))..((((....))))
b: ((.((((....)))).))
Basic problem: we want an algorithm that considers every allowed secondary structure for a given sequence and finds the lowest energy state.
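The compatibility rules and the bracket notation can be sketched in a few lines. The function names `classify` and `pairs_from_brackets` are hypothetical, and positions are assumed to be numbered from 1.

```python
def classify(pair1, pair2):
    """Classify two base pairs (i, j) and (k, l), each given with i < j."""
    (i, j), (k, l) = sorted([pair1, pair2])
    if j < k:
        return "side-by-side"   # case (a): i < j < k < l
    if i < k and l < j:
        return "nested"         # case (b): i < k < l < j
    if i < k < j < l:
        return "pseudoknot"     # case (c): i < k < j < l
    return "conflict"           # the two pairs share a base


def pairs_from_brackets(structure):
    """Convert bracket notation into a sorted list of (i, j) pairs."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


structure_a = "((((....))))..((((....))))"
pairs_a = pairs_from_brackets(structure_a)
print(len(pairs_a), "pairs in structure a")
print(classify((1, 12), (2, 11)), classify((1, 12), (15, 26)))
```

Any structure that can be written with a single kind of bracket contains only side-by-side and nested pairs, which is why pseudoknots fall outside this notation.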
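Simplest case: find the structure that maximizes the number of base pairs. Let ε_ij = -1 if bases i and j can pair and +∞ if not. Ignore loop contributions.
E(i,j) = energy of the minimum energy structure for the chain segment from i to j. We want E(1,N).
[Figure: recursion diagram. The segment i..j is either the segment i..j-1 with j unpaired, or j is paired with some base k, which splits the chain into the segments i..k-1 and k+1..j-1.]
$$E(i,j) = \min\Big\{ E(i,j-1),\ \min_{i \le k < j} \big[ E(i,k-1) + \epsilon_{kj} + E(k+1,j-1) \big] \Big\}$$
Algorithms that work by recursion relations like this are called dynamic programming. The algorithm is O(N^3), although the number of structures increases exponentially with N.
We also need backtracking to work out the minimum energy structure: set B(i,j) = k if j is paired with k, or 0 if j is unpaired.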
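A minimal sketch of this dynamic programming scheme follows. It assumes the allowed pairs are A-U, G-C and G-U, and the names `fold`, `traceback` and `eps` are hypothetical. The `min_loop` argument is an added assumption: the slide ignores loops, so it defaults to 0.

```python
import math

# Assumption: Watson-Crick pairs plus G-U wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def eps(a, b):
    """Pair energy: -1 if the bases can pair, +infinity otherwise."""
    return -1 if (a, b) in PAIRS else math.inf


def fold(seq, min_loop=0):
    """Fill E(i,j) and the backtracking table B(i,j), with 1-based indices."""
    n = len(seq)
    E = [[0] * (n + 2) for _ in range(n + 2)]   # empty segments have energy 0
    B = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best, partner = E[i][j - 1], 0       # option 1: j is unpaired
            for k in range(i, j - min_loop):     # option 2: j pairs with k
                cand = E[i][k - 1] + eps(seq[k - 1], seq[j - 1]) + E[k + 1][j - 1]
                if cand < best:
                    best, partner = cand, k
            E[i][j], B[i][j] = best, partner
    return E, B


def traceback(B, n):
    """Rebuild one minimum energy structure in bracket notation."""
    struct = ["."] * n
    stack = [(1, n)]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        k = B[i][j]
        if k == 0:
            stack.append((i, j - 1))
        else:
            struct[k - 1], struct[j - 1] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return "".join(struct)


seq = "GGGAAAUCC"
E, B = fold(seq)
print(E[1][len(seq)], traceback(B, len(seq)))
```

The three nested loops over i, j and k give the O(N^3) cost. Because several structures can share the minimum energy, the traceback returns only one of them.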
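Partition Function Algorithm (for simplest energy rules)
[Figure: the same recursion diagram as for the minimum energy algorithm. The segment i..j is either the segment i..j-1 with j unpaired, or j is paired with some base k.]
Real energy rules: need to consider many special cases. What type of loop are you closing? The algorithm is more complex but is still O(N^3).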
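For the simplest energy rules, the partition function can be sketched by replacing the minimum in the energy recursion with a sum over Boltzmann weights. The sketch below assumes the recursion Z(i,j) = Z(i,j-1) + Σ_k Z(i,k-1) exp(-ε_kj/kT) Z(k+1,j-1), with ε = -1 for an allowed pair, and that A-U, G-C and G-U can pair. The function name and the `kT` argument are hypothetical.

```python
import math

# Assumption: Watson-Crick pairs plus G-U wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, kT=1.0):
    """Sum of Boltzmann weights over all structures without pseudoknots."""
    n = len(seq)
    # Z[i][j] = 1 for empty and single-base segments (only the open chain).
    Z = [[1.0] * (n + 2) for _ in range(n + 2)]
    weight = math.exp(1.0 / kT)          # exp(-eps/kT) with eps = -1
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            total = Z[i][j - 1]          # j unpaired
            for k in range(i, j):        # j paired with k
                if (seq[k - 1], seq[j - 1]) in PAIRS:
                    total += Z[i][k - 1] * weight * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[1][n]


print(partition_function("GCGC", kT=1.0))
```

At very high temperature every weight tends to 1, so the same routine counts the allowed structures, which is a convenient check on the recursion.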
[Figure: chain running from base 1 to base N with bases i and j marked.]
Equilibrium probability that base i is paired with j.
Equilibrium probability that base i is unpaired.
Example of pairing probabilities taken from the Vienna package web-site.
[Figure: secondary structure diagrams (panels i, ii, iii) with helices labelled A to H.]
Is folding kinetics important?
• RNA folding kinetics involves reorganisation of secondary structure.
• Native structures may not be global minimum free energy states.
Morgan & Higgs (1996) J. Chem. Phys.
Energy Landscapes in RNA Folding. Morgan & Higgs (1998). Groundstates are degenerate in this model because energies are integers. Generate many random groundstates. How far apart are these groundstates? How high are the barriers between groundstates?
We found frozen pairs (present in every groundstate). This figure shows the frozen pairs only. The molecule is divided into independent unfrozen loops. Define Neff as the length of the longest loop. [Figure: two groundstates for the same sequence.]
Minimum Free Energy Prediction
• Deterministic. Always gets the MFE structure for a given set of energy rules.
• If the MFE structure is not the same as the biological structure, this could be because (i) the energy rules are inaccurate or insufficient, or (ii) kinetics is important and the molecule is trapped in a metastable state.
Monte Carlo simulations of folding kinetics
• Store a current structure.
• Estimate rates of removal of existing helices and rates of addition of other compatible helices.
• Choose one helix to be added or removed with probability proportional to its rate.
• Repeat this many times.
• Can simulate structure formation from an unfolded state.
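The selection step of the Monte Carlo scheme (choose one helix move with probability proportional to its rate) can be sketched as below. The move labels and rate values are made up for illustration, and `choose_move` is a hypothetical name. Estimating real rates from the energy rules is outside this sketch.

```python
import random


def choose_move(moves, rng):
    """Pick one (label, rate) move with probability rate / sum of rates."""
    total = sum(rate for _, rate in moves)
    r = rng.random() * total
    acc = 0.0
    for label, rate in moves:
        acc += rate
        if r < acc:
            return label
    return moves[-1][0]   # guard against floating-point round-off


# Hypothetical candidate moves for the current structure.
moves = [("add helix X", 1.0), ("remove helix Y", 3.0)]
rng = random.Random(0)
counts = {label: 0 for label, _ in moves}
for _ in range(10000):
    counts[choose_move(moves, rng)] += 1
print(counts)
```

In a full simulation the chosen move would be applied to the stored structure and the list of candidate moves and rates recomputed before the next step.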
Qβ is a bacteriophage RNA virus with approx. 4000 nucleotides. The viral RNA has complex secondary structure. The replicase gene codes for the replicase protein. This is an RNA-dependent RNA polymerase, which synthesizes the complementary strand. Viral replication needs two steps: plus to minus to plus.
In vitro RNA evolution in the Qβ system. Begin with replicase + nucleotides + viral RNA. Later tubes contain replicase + nucleotides only. Transfer a small quantity to each successive tube. Sequence the RNA after many transfers.
Barrier heights between alternative groundstates. Observation: the mean barrier height between groundstates scales as <h> ~ Neff^0.5, with Neff ~ 0.3 N. Therefore barriers become significant for large enough sequences.
An example where kinetics is important to control biological function: the 5’ region of the MS2 phage. [Figure labels: 130, 3500, maturation protein.]
Time to formation of the 5’ structure influences expression of the maturation protein more than the stability of this structure. Simulations compare with experiments on mutant sequences.
RNA in comparison to Proteins
• Both have well defined 3D structures.
• The RNA folding problem is easier because secondary structure separates from tertiary structure more easily, but it is still a complex problem.
• The RNA model has real parameters, therefore you can say something about real molecules.
• The RNA folding algorithm is simple enough to be able to do statistical physics (cf. 27-mer lattice protein models).
Part of a sequence alignment of mitochondrial small sub-unit rRNA. The full gene has length ~950. 11 primate species with mouse as outgroup.
Murphy et al. Nature (2001) uses 15 nuclear plus 3 mitochondrial proteins
Afrotheria / Laurasiatheria Striking examples of convergent evolution
Cao et al. (2000) Gene uses 12 mitochondrial proteins
RNA pairs model (GR7). 53 complete mammalian mitochondrial genomes. Complete set of rRNAs + tRNAs = 973 pairs. Jow et al. (2002). [Figure: phylogenetic tree with node labels 100, 100, 86, 100, 97, 100, 100.]
MCMC searches the rugged landscape in tree space using the Metropolis algorithm. It obtains a set of possible trees weighted according to their likelihood.
1. Rate parameter changes = continuous
2. Branch length changes = continuous
3. Topology changes = discrete
[Figure: example trees on taxa A to E showing two kinds of topology change: nearest-neighbour interchange and long-range move.]
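The Metropolis acceptance rule behind such a search can be sketched on a single continuous parameter. Everything specific here is an assumption made for illustration: the toy log-likelihood (a bell-shaped curve peaked at a branch length of 2.0), the symmetric uniform proposal, and the function names. A real phylogenetic sampler would evaluate the tree likelihood and also propose topology changes.

```python
import math
import random


def toy_log_likelihood(x):
    """Hypothetical log-likelihood of a branch length x (peak at 2.0)."""
    if x <= 0:
        return -math.inf            # branch lengths must be positive
    return -((x - 2.0) ** 2) / (2 * 0.5 ** 2)


def propose(x, rng):
    """Symmetric proposal: a small uniform change to the branch length."""
    return x + rng.uniform(-0.5, 0.5)


def metropolis_step(state, log_like, rng):
    """Accept the candidate with probability min(1, L_new / L_old)."""
    candidate = propose(state, rng)
    delta = log_like(candidate) - log_like(state)
    if delta >= 0 or rng.random() < math.exp(delta):
        return candidate
    return state


def run_chain(n_steps, seed=0, start=2.0):
    rng = random.Random(seed)
    state, samples = start, []
    for _ in range(n_steps):
        state = metropolis_step(state, toy_log_likelihood, rng)
        samples.append(state)
    return samples


samples = run_chain(20000)
print(sum(samples) / len(samples))
```

Because moves that lower the likelihood are sometimes accepted, the chain can cross valleys in a rugged landscape and visits states in proportion to their likelihood.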
Models of Sequence Evolution. r_ij is the rate of substitution from state i to state j. States label bases A, C, G & T. P_ij(t) = probability of being in state j at time t given that the ancestor was in state i at time 0. [Figure: branch of length t from ancestral state i to descendant state j.]
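The HKY model describes the rate of evolution of single sites. Rows give the state substituted from and columns the state substituted to (shown here in the order A, C, G, T):
$$R = \begin{pmatrix} * & \pi_C & \kappa\pi_G & \pi_T \\ \pi_A & * & \pi_G & \kappa\pi_T \\ \kappa\pi_A & \pi_C & * & \pi_T \\ \pi_A & \kappa\pi_C & \pi_G & * \end{pmatrix}$$
The frequencies of the four bases are π_A, π_C, π_G and π_T. κ is the transition-transversion rate parameter. * means minus the sum of the other elements on the row.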
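A minimal sketch of how the HKY rate matrix leads to the substitution probabilities P_ij(t) is given below. It builds the matrix from the base frequencies and κ, then approximates exp(Rt) by repeated squaring of (I + Rt/2^s). The base order A, C, G, T, the example frequencies and all function names are assumptions. A production code would normally use a library matrix exponential.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(a, b):
    """Transitions stay within purines (A, G) or within pyrimidines (C, T)."""
    return (a in PURINES) == (b in PURINES)


def hky_rate_matrix(freqs, kappa):
    """r_ij = kappa * pi_j for transitions, pi_j for transversions."""
    R = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                R[i][j] = (kappa if is_transition(a, b) else 1.0) * freqs[b]
        R[i][i] = -sum(R[i])        # the '*' entry: minus the row sum
    return R


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transition_probabilities(R, t, squarings=20):
    """Approximate P(t) = exp(R t) as (I + R t / 2^s)^(2^s)."""
    scale = t / 2 ** squarings
    P = [[(1.0 if i == j else 0.0) + R[i][j] * scale for j in range(4)]
         for i in range(4)]
    for _ in range(squarings):
        P = matmul(P, P)
    return P


freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}   # hypothetical values
R = hky_rate_matrix(freqs, kappa=4.0)
P = transition_probabilities(R, t=0.1)
for base, row in zip(BASES, P):
    print(base, [round(p, 4) for p in row])
```

Each row of P(t) sums to one, and for long times every row approaches the base frequencies, as expected for a reversible model.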
Compensatory Substitutions
Two sides of the acceptor stem from a tRNA are shown. Due to structure conservation, alignment is possible in widely different species.
                          1234567 7654321
                          ((((((( )))))))
Bacillus subtilis         GGCUCGG CCGAGCC
Escherichia coli          GCCCGGA UCCGGGC
Saccharomyces cerevisiae  GCGGAUU AAUUCGC
Drosophila melanogaster   GCCGAAA UUUCGGC
Homo sapiens              GCCGAAA UUUCGGC
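The conservation of pairing in this alignment can be checked directly: position n on the 5' side pairs with position n on the 3' side, so the 3' strand is read in reverse. The sketch below assumes that A-U, G-C and G-U count as allowed pairs; the names `stems` and `stem_pairs` are hypothetical.

```python
# Assumption: Watson-Crick pairs plus G-U wobble pairs are allowed.
ALLOWED = {"AU", "UA", "GC", "CG", "GU", "UG"}

# The two sides of the acceptor stem, copied from the alignment.
stems = {
    "Bacillus subtilis": ("GGCUCGG", "CCGAGCC"),
    "Escherichia coli": ("GCCCGGA", "UCCGGGC"),
    "Saccharomyces cerevisiae": ("GCGGAUU", "AAUUCGC"),
    "Drosophila melanogaster": ("GCCGAAA", "UUUCGGC"),
    "Homo sapiens": ("GCCGAAA", "UUUCGGC"),
}


def stem_pairs(five_prime, three_prime):
    """Pair position n on the 5' side with position n on the 3' side."""
    return [a + b for a, b in zip(five_prime, reversed(three_prime))]


for species, (five, three) in stems.items():
    pairs = stem_pairs(five, three)
    ok = all(p in ALLOWED for p in pairs)
    print(f"{species:26s} {' '.join(pairs)}  all paired: {ok}")
```

The sequences differ between species at most positions, yet every column still forms an allowed pair, which is the signature of compensatory substitutions.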
Model 7A is a General Reversible 7-state Model. 7 frequencies π_i + 21 rate parameters α_ij - 2 constraints = 26 free parameters.
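The parameter count can be reproduced for any number of states. In the sketch below the two constraints are assumed to be the usual ones for a reversible model (the frequencies sum to one and the overall rate scale is fixed); the slide itself does not say which constraints are meant, and the function name is hypothetical.

```python
def general_reversible_free_parameters(n_states):
    """Free parameters of a general reversible model with n_states states."""
    frequencies = n_states                      # one frequency per state
    rates = n_states * (n_states - 1) // 2      # one symmetric rate per pair of states
    constraints = 2                             # assumed: normalisation and rate scale
    return frequencies + rates - constraints


print(general_reversible_free_parameters(7))    # 7 + 21 - 2 = 26
```

The same formula gives 8 free parameters for a 4-state general reversible model of single bases.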
Probability of remaining in the same state, P_ii. SSU rRNA sequences from Eubacteria.
Probability P_ij of changes from CG to other pairs. SSU rRNA from Eubacteria.
[Figure: the six pair states AU, UA, GU, UG, GC and CG, with substitution routes between them labelled fast or slow.]
What is going on?
• Selection against GU and UG is weaker than against mismatches.
• Double transitions are faster than double transversions.
• Double transitions are faster than single transitions to GU and UG states.
This is explained by the theory of compensatory substitutions.
Analysis of RNA sequence databases Selection for thermodynamically stable structures Higgs (2000) Quart. Rev. Biophysics
Analysis of RNA Substitution Rates Thermodynamic properties influence Evolutionary properties