MicroRNA identification based on sequence and structure alignment

Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li. Presented by Neeta Jain.



  1. MicroRNA identification based on sequence and structure alignment. Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li. Presented by Neeta Jain

  2. Outline • Introduction • Motivation • Experiment • Materials • Methods • Results • Conclusion

  3. Introduction • What are miRNAs and why are they important? • miRNAs are ~22 nt long non-coding RNAs • They are derived from their ~70 nt precursors, which typically have a hairpin structure Importance of miRNAs: • They are found to regulate the expression of target genes via complementary base pair interactions.

  4. Motivation • Since miRNAs are short (~22 nt), conventional sequence alignment methods can only find relatively close homologues • It has been reported that miRNA genes are more conserved in their secondary structure than in their primary sequence • This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment • The authors devised a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN

  5. Experiment • Materials • Reference sets • The reference set consists of 1298 miRNAs from 12 species, of which 1054 are animal miRNAs. • The 1054 animal miRNAs and their 1104 precursors make up the raw training set, Train_All. • Train_Sub_1: all animal miRNAs except those from C. briggsae • Train_Sub_2: all animal miRNAs except those from C. briggsae and C. elegans • Genomic sequences • Genomic sequences of 6 species were used.

  6. Methods • Preprocessing • Known precursors from the training set are BLASTed against the genome • Potential regions are cut from the genome with 70 nt of flanking sequence at each end • These regions are scanned using a 100 nt window with a 10 nt step • Windows that overlap repeat sequences are discarded.
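The window scan described on this slide can be sketched in a few lines of Python. This is only an illustration of a 100 nt window moved in 10 nt steps; the function name `sliding_windows` and the toy region are assumptions, not part of miRAlign.

```python
def sliding_windows(sequence, window=100, step=10):
    """Yield (start, subsequence) pairs for fixed-size windows over a region.

    Regions no longer than one window are returned whole. Any tail shorter
    than `step` past the last full window is not covered by this sketch.
    """
    if len(sequence) <= window:
        yield 0, sequence
        return
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


# Toy 240 nt region (made up for illustration only).
region = "ACGU" * 60
windows = list(sliding_windows(region))
print(len(windows))  # 15 windows, starting at 0, 10, ..., 140
```

Each window would then be passed on to the hairpin prediction step; repeat filtering is not shown here.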

  7. Methods (contd) • miRAlign • Secondary structure prediction • Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins. • Only hairpins with an MFE (minimum free energy) lower than -20 kcal/mol are retained. • Pairwise sequence alignment • Sequences retained from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set • The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW. • If the score exceeds a user-defined threshold, the candidate/known-miRNA pair is kept for further analysis
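A minimal sketch of the two mechanical parts of the structure-prediction step: building the reverse complement and applying the -20 kcal/mol cutoff. It assumes the MFE values have already been computed by an external folding tool such as RNAfold and does not call that tool itself; the function names and the example values are hypothetical.

```python
# Assumes an RNA alphabet (A, C, G, U); DNA input would need T handled too.
COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def retain_stable_hairpins(candidates, mfe_cutoff=-20.0):
    """Keep (name, mfe) pairs whose MFE is strictly lower than the cutoff.

    `candidates` is an iterable of (name, mfe_in_kcal_per_mol) pairs whose
    MFE values were computed beforehand by a folding program.
    """
    return [(name, mfe) for name, mfe in candidates if mfe < mfe_cutoff]


print(reverse_complement("AACGU"))  # ACGUU

# Made-up MFE values, for illustration only.
folded = [("hairpin_1", -25.3), ("hairpin_2", -20.0), ("hairpin_3", -12.1)]
print(retain_stable_hairpins(folded))  # [('hairpin_1', -25.3)]
```

The comparison is strict because the slide says "lower than" the cutoff, so a hairpin at exactly -20 kcal/mol is dropped in this sketch.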

  8. Methods (contd) • Checking the miRNA’s position on the stem-loop • Three properties of the potential miRNA’s position are considered: • It should not be located on the terminal loop of the hairpin • It should be located on the same arm of the hairpin as its known homologue • Its position on the hairpin should not differ too much from that of its known homologues Position difference of miRNA on precursors A and B:
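The three position checks can be illustrated with a small Python sketch. Several things here are assumptions: spans are 0-based half-open `(start, end)` tuples, the terminal loop is given as a single span, and because the position-difference formula is not preserved in this transcript, the sketch substitutes a stand-in measure (the difference in distance between the miRNA and the terminal loop). The `max_shift` default is also hypothetical.

```python
def arm_of(mirna, loop):
    """Return '5p' or '3p' for the arm holding the miRNA, or None if it
    overlaps the terminal loop. Spans are 0-based, half-open (start, end)."""
    start, end = mirna
    loop_start, loop_end = loop
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return None


def distance_to_loop(mirna, loop):
    """Number of nucleotides between the miRNA and the terminal loop."""
    start, end = mirna
    loop_start, loop_end = loop
    return loop_start - end if end <= loop_start else start - loop_end


def position_ok(cand_mirna, cand_loop, known_mirna, known_loop, max_shift=15):
    """Apply the three checks; `max_shift` is a hypothetical tolerance."""
    cand_arm = arm_of(cand_mirna, cand_loop)
    known_arm = arm_of(known_mirna, known_loop)
    if cand_arm is None or known_arm is None:
        return False  # check 1: not on the terminal loop
    if cand_arm != known_arm:
        return False  # check 2: same arm as the known homologue
    # Check 3 uses a stand-in measure, NOT the formula from the slide.
    shift = abs(distance_to_loop(cand_mirna, cand_loop)
                - distance_to_loop(known_mirna, known_loop))
    return shift <= max_shift


print(position_ok((10, 32), (40, 50), (12, 34), (42, 52)))  # True
print(position_ok((35, 57), (40, 50), (12, 34), (42, 52)))  # False
```

The second call fails because the candidate span overlaps the terminal loop.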

  9. Methods (contd) • RNA secondary structure alignment • RNAforester computes a pairwise structure alignment and gives a similarity score • The score is the sum of the scores of all base (or base-pair) match, insertion and deletion operations. • The normalized similarity score of structures C and m is given as: where C is the candidate sequence, m is a known pre-miRNA, sigma_local(C,m) is the raw local alignment score between C and m, and sigma(m,m) is the self-alignment score of m
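The equation for the normalized score is not preserved in this transcript. Given only the two quantities the slide defines, one plausible reading is a ratio of the raw local score to the self-alignment score; the symbol S and the absence of any scaling factor are assumptions, not taken from the slide.

```latex
\[
  S(C, m) \;=\; \frac{\sigma_{\mathrm{local}}(C, m)}{\sigma(m, m)}
\]
```

Under this reading, a candidate whose structure aligns to m as well as m aligns to itself would score 1, which makes scores comparable across known pre-miRNAs of different lengths.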

  10. Methods (contd) • Total similarity score • After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence, where C is the candidate sequence and R is the set composed of all C’s

  11. Methods (contd) Summary -

  12. Results • Application to C. briggsae • Detection of miRNA homologues: miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded. • Identification of miRNAs in distantly related species: miRAlign was applied to the C. briggsae data with training set Train_Sub_2 (which also excludes C. elegans miRNAs), and sensitivity and specificity were recorded.
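For reference, sensitivity and specificity can be computed from confusion-matrix counts as sketched below. The slide does not say which definitions the authors used, so the standard ones are assumed here (some gene-finding papers instead report TP / (TP + FP) as "specificity"). The counts in the example are made up and are not results from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real miRNAs that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-miRNAs correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
print(sensitivity(true_pos=45, false_neg=5))   # 0.9
print(specificity(true_neg=90, false_pos=10))  # 0.9
```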

  13. Results (contd) Graph 1 -

  14. Results (contd) Graph 2 -

  15. Results (contd) Comparison of miRAlign with BLAST -

  16. Results (contd) Comparison of miRAlign with ERPIN -

  17. Results (contd) Other results: • miRAlign was applied to A. gambiae, and 59 putative miRNAs with tss > 35 were detected. This was validated when 38 A. gambiae miRNAs were reported in MicroRNA Registry 6.0: 37 of them (about 97%) were covered by miRAlign • miRAlign was also applied to a plant, Zea mays, and detected 28 of the 40 known Zea mays miRNAs (70%).

  18. Conclusion • By combining sequence and structure alignment, miRAlign performs better than previously reported homologue search methods • Although miRAlign was based on animal data, the miRNAs predicted in Zea mays indicate that it can also be applied to plants. Further investigation of this is underway.

  19. THANK YOU Questions ??
