
Using RNA secondary structure to guide sequence motif finding toward single-stranded regions





  1. Using RNA secondary structure to guide sequence motif finding toward single-stranded regions. Michael Hiller, Rainer Pudimat, Anke Busch and Rolf Backofen. Rachel Brower-Sinning

  2. Motivation: RNA-binding proteins are an integral part of pre-mRNA processing (splicing, etc.) and regulate mRNA processes (transport, stability, etc.). An mRNA molecule may contain multiple candidate binding sites that match the protein's sequence specificity, but the structure of the strand determines which sites are accessible to the binding protein. Current programs such as MEME search for motifs using a position-specific probability matrix (PSPM), which gives the probability of each letter at each position in the pattern, but they do not consider the secondary structure of the sequence.

  3. Approach: MEMERIS first precomputes the “single-strandedness” of each substring of the RNA sequence from position a to b, offering a choice of two measures. The first is the probability that all bases in the substring are unpaired (PUab). The second is the expected fraction of bases in the substring [a,b] that do not form base pairs (EFab).
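
The two definitions can be illustrated on a toy example. The sketch below assumes the structure ensemble is available as a short list of dot-bracket structures with probabilities, which is a hypothetical stand-in; a real tool would derive these quantities from the folding partition function rather than from an explicit list. The function name `single_strandedness` and the ensemble values are made up for illustration.

```python
def single_strandedness(ensemble, a, b):
    """Return (PU, EF) for the 1-based inclusive window [a, b].

    ensemble: list of (dot_bracket_structure, probability) pairs.
    This explicit list is a toy stand-in for the full structure ensemble.
    """
    width = b - a + 1
    pu = 0.0  # probability that every base in the window is unpaired
    ef = 0.0  # expected fraction of unpaired bases in the window
    for structure, prob in ensemble:
        window = structure[a - 1:b]
        unpaired = window.count(".")
        if unpaired == width:
            pu += prob
        ef += prob * unpaired / width
    return pu, ef


# Hypothetical three-structure ensemble for a 10-nt sequence.
toy_ensemble = [
    ("((....))..", 0.5),
    (".((...))..", 0.3),
    ("..........", 0.2),
]

pu, ef = single_strandedness(toy_ensemble, 3, 6)
print(f"PU = {pu:.3f}, EF = {ef:.3f}")
```

In this toy case the second structure pairs one base inside the window, so it contributes nothing to PU but still contributes three quarters of its probability to EF. EF can therefore never be smaller than PU for the same window.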

  4. Approach con’t: Next, this secondary structure information is integrated into MEME. MEME is a program for finding motifs in a set of unaligned sequences (X = X1, X2, … , Xn), where the motif is defined as a PSPM, Θ1 = (P1, P2, … , PW); W is the length of the motif and the vector Pi is the probability distribution of the letters at position i. A given sequence Xi is modeled as consisting of two different parts: (1) a non-negative number of non-overlapping motif occurrences sampled from the matrix, and (2) random samples from a background probability distribution for the remaining sequence positions.
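
The two-part sequence model can be sketched as a scoring function. The example below assumes a single motif occurrence and an order-0 (independent letters) background; the PSPM values, the sequence, and the function name `log_likelihood_with_motif` are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood_with_motif(seq, start, pspm, background):
    """Log-probability of seq with one motif occurrence at 0-based `start`.

    Positions inside the motif window are scored with the PSPM columns;
    all other positions are scored with the background distribution.
    """
    width = len(pspm)
    total = 0.0
    for pos, letter in enumerate(seq):
        if start <= pos < start + width:
            total += math.log(pspm[pos - start][letter])
        else:
            total += math.log(background[letter])
    return total


background = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}

# Hypothetical PSPM of width W = 3 that prefers "CAU".
pspm = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "U": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
]

seq = "GGCAUGG"
scores = [
    log_likelihood_with_motif(seq, j, pspm, background)
    for j in range(len(seq) - len(pspm) + 1)
]
best_start = max(range(len(scores)), key=scores.__getitem__)
print("best 0-based start:", best_start)
```

Working in log space avoids numerical underflow when sequences are long, which is why the sketch sums logarithms instead of multiplying probabilities.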

  5. Approach con’t: MEME considers three different models: exactly one motif occurrence per sequence (OOPS model), zero or one motif occurrence per sequence (ZOOPS model), and zero or more motif occurrences per sequence (TCM model). To find the motif(s), an expectation-maximization algorithm performs a maximum likelihood estimation of the model given the data.
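
The maximization half of such an EM procedure can be sketched for the OOPS case. This is a minimal illustration, not MEME's actual implementation: it assumes the posterior weights over motif start positions are already given, and it adds a small pseudocount (an assumption made here to avoid zero probabilities). All names and numbers are hypothetical.

```python
def m_step(sequences, posteriors, width, alphabet="ACGU", pseudocount=0.01):
    """Re-estimate a PSPM from posterior-weighted letter counts.

    posteriors[i][j] is the weight of a motif starting at 0-based
    position j of sequences[i].
    """
    counts = [{letter: pseudocount for letter in alphabet} for _ in range(width)]
    for seq, z in zip(sequences, posteriors):
        for start, weight in enumerate(z):
            for k in range(width):
                counts[k][seq[start + k]] += weight
    pspm = []
    for column in counts:
        total = sum(column.values())
        pspm.append({letter: c / total for letter, c in column.items()})
    return pspm


sequences = ["GCAUG", "CAUGG"]
# Hypothetical posteriors over the 3 possible start positions per sequence.
posteriors = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]

pspm = m_step(sequences, posteriors, width=3)
consensus = "".join(max(col, key=col.get) for col in pspm)
print("consensus:", consensus)
```

Each column of the updated matrix is a normalized, weighted letter count, so start positions with high posterior weight dominate the new motif estimate.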

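  6. Approach con’t: OOPS model. MEME uses [equation not preserved in the transcript], where [definition not preserved in the transcript]. In MEMERIS this changes to [equation not preserved in the transcript], where [definition not preserved in the transcript].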
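
The equations on this slide did not survive in the transcript, so the following is only a sketch of the general idea of replacing a uniform prior over motif start positions with one weighted by single-strandedness. The pseudocount form used here is an assumption, not a reconstruction of the slide, and the function names and values are hypothetical.

```python
def uniform_prior(seq_len, width):
    """Every valid start position is equally likely a priori."""
    n_starts = seq_len - width + 1
    return [1.0 / n_starts] * n_starts


def structure_prior(single_strandedness, pseudocount):
    """Prior proportional to (single-strandedness + pseudocount).

    single_strandedness[j] is the PU or EF value of the window starting
    at position j. A large pseudocount flattens the prior toward
    uniform; a small one lets the structure values dominate.
    """
    shifted = [s + pseudocount for s in single_strandedness]
    total = sum(shifted)
    return [s / total for s in shifted]


# Hypothetical PU values for the four possible start positions.
pu_values = [0.05, 0.60, 0.30, 0.05]

print(uniform_prior(seq_len=7, width=4))
print(structure_prior(pu_values, pseudocount=0.1))
```

Under this assumed form, the pseudocount acts as a dial for how strongly the structure information influences the motif search.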

  7. Approach con’t: The expectation of the hidden variables is computed as [equation not preserved in the transcript]. The ML estimation remains unchanged.
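
As the slide's equation is missing, the sketch below shows only the generic Bayes-rule form that an expectation step takes under an OOPS model: the posterior of each start position is proportional to the sequence likelihood at that position times its prior. The log-likelihoods and priors are hypothetical numbers, and `e_step_oops` is an illustrative name.

```python
import math


def e_step_oops(log_likelihoods, prior):
    """Posterior over start positions for one sequence (OOPS model).

    log_likelihoods[j]: log P(sequence | motif starts at j)
    prior[j]:           prior probability that the motif starts at j
    """
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, prior)]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Two start positions match the motif equally well by sequence alone.
log_likelihoods = [-9.0, -7.0, -7.0]

uniform = [1 / 3, 1 / 3, 1 / 3]
structured = [0.1, 0.8, 0.1]  # hypothetical structure-weighted prior

z_uniform = e_step_oops(log_likelihoods, uniform)
z_structured = e_step_oops(log_likelihoods, structured)
print(z_uniform)
print(z_structured)
```

With the uniform prior the two equally scoring positions receive equal posterior weight; the structure-weighted prior breaks the tie in favour of the more accessible position.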

  8. Effects

  9. Results: PU values are stricter than EF values, so using PU favours single-strandedness more strongly than using EF. Trade-off: PU values depend on motif length (increasing the motif length decreases PU), while EF values are independent of motif length.
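
The length dependence can be seen in a small numeric sketch. It assumes, purely for illustration, that each base is unpaired independently with the same probability; real folding correlates neighbouring positions, so actual PU values will differ. EF is the mean of the per-base unpaired probabilities by linearity of expectation, which does not need the independence assumption.

```python
import math


def pu_independent(unpaired_probs):
    """PU under the toy assumption that bases are unpaired independently."""
    return math.prod(unpaired_probs)


def ef(unpaired_probs):
    """EF is the mean per-base unpaired probability."""
    return sum(unpaired_probs) / len(unpaired_probs)


for width in (4, 6, 8):
    probs = [0.8] * width  # hypothetical per-base unpaired probability
    print(width, round(pu_independent(probs), 4), round(ef(probs), 4))
```

In this toy setting PU shrinks geometrically as the window grows, while EF stays at the per-base value, which matches the trade-off described on the slide.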

  10. Results con’t

  11. Results con’t: The SELEX data contain 33 TCAT or ACAT repeats in hairpin loops, which are binding sites of the neuron-specific splicing factor Nova-1. MEMERIS correctly identifies these 33 described TCAT and ACAT hits. MEME identifies the correct motifs, but also motifs outside the hairpins (not binding sites).

  12. Results con’t (testing on SELEX data)

  13. Results con’t (PIE Rfam data)

  14. Results con’t (TAR Rfam data)

  15. Discussion: RNA-binding proteins bind in a sequence-specific manner but also prefer particular structural characteristics at the binding site (placing the motif in dsRNA has been found to eliminate protein binding). MEMERIS can search for a sequence motif while incorporating secondary structure information. MEMERIS has been shown to identify ssRNA motifs that are often the protein-binding motif.
