MicroRNA identification based on sequence and structure alignment
Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li
Presented by Neeta Jain
Outline
• Introduction
• Motivation
• Experiment
• Materials
• Methods
• Results
• Conclusion
Introduction
• What are miRNAs and why are they important?
  • miRNAs are ~22 nt long non-coding RNAs
  • They are derived from ~70 nt precursors, which typically have a hairpin structure
• Importance of miRNAs:
  • They regulate the expression of target genes via complementary base-pair interactions
Motivation
• Since miRNAs are short (~22 nt), conventional sequence alignment methods can only find relatively close homologues
• It has been reported that miRNA genes are more conserved in their secondary structure than in their primary sequence
• This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment
• The authors developed a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN
Experiment
• Materials
  • Reference sets
    • 1298 miRNAs from 12 species, of which 1054 are animal miRNAs
    • The 1054 animal miRNAs and their 1104 precursors make up the raw training set, Train_All
    • Train_Sub_1: all animal miRNAs except those from C. briggsae
    • Train_Sub_2: all animal miRNAs except those from C. briggsae and C. elegans
  • Genomic sequences
    • Sequences of 6 species were used
Methods
• Preprocessing
  • Known precursors from the training set are searched against the genome with BLAST
  • Potential regions are cut from the genome with 70 nt of flanking sequence on each end
  • These regions are scanned with a 100 nt window and a 10 nt step
  • Sequences that overlap repeat sequences are discarded
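The window scan in the preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sliding_windows` is hypothetical, and the handling of region ends (a short region is returned whole, a trailing partial window is dropped) is an assumption the slides do not specify.

```python
def sliding_windows(region, window=100, step=10):
    """Yield (start, subsequence) pairs for a fixed-size window moved along `region`.

    Assumption: regions no longer than `window` are returned as a single
    candidate, and a trailing partial window is dropped.
    """
    if len(region) <= window:
        yield 0, region
        return
    for start in range(0, len(region) - window + 1, step):
        yield start, region[start:start + window]


# Example: a 130 nt region gives windows starting at 0, 10, 20 and 30.
if __name__ == "__main__":
    for start, sub in sliding_windows("A" * 130):
        print(start, len(sub))
```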
Methods (contd)
• miRAlign
  • Secondary structure prediction
    • Both the candidate sequence and its reverse complement are analyzed with RNAfold to predict hairpins
    • Only hairpins with an MFE (minimum free energy) lower than -20 kcal/mol are retained
  • Pairwise sequence alignment
    • Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set
    • The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW
    • If the score exceeds a user-defined threshold, the candidate and known miRNA pair is kept for further analysis
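The two mechanical parts of the structure-prediction step, taking the reverse complement and applying the MFE cutoff, might look like this. It is a sketch under stated assumptions: the helper names are hypothetical, and the MFE values are assumed to have been computed separately by a folding tool such as RNAfold (the code does not call it).

```python
# DNA complement table; N is kept as N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def keep_stable_hairpins(folded, mfe_cutoff=-20.0):
    """Keep (name, mfe) pairs whose MFE is strictly below the cutoff (kcal/mol).

    `folded` is assumed to hold MFE values produced elsewhere by a
    folding program; the values used below are made up for illustration.
    """
    return [(name, mfe) for name, mfe in folded if mfe < mfe_cutoff]


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(keep_stable_hairpins([("cand_1", -25.3), ("cand_2", -12.1)]))
```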
Methods (contd)
• Checking the miRNA's position on the stem-loop
  • Three properties of the miRNA's position are considered:
    • It should not lie on the terminal loop of the hairpin
    • It should lie on the same arm of the hairpin as its known homologue
    • Its position on the hairpin should not differ too much from that of its known homologues
  • Position difference of miRNA on precursors A and B:
Methods (contd)
• RNA secondary structure alignment
  • RNAforester computes a pairwise structure alignment and gives a similarity score
  • The score is the sum over all base (or base-pair) matches, insertions and deletions
  • Normalized similarity score of structures C and m is given as:
    where
    • C: candidate sequence
    • m: known pre-miRNA
    • sigma_local(C, m): raw local alignment score between C and m
    • sigma(m, m): self-alignment score of m
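The equation itself did not survive on this slide. One plausible reading of the definitions above, assuming the normalization is a simple ratio of the raw local score to the self-alignment score, is shown here; the label nss is a hypothetical name, not taken from the slides.

```latex
\mathrm{nss}(C, m) = \frac{\sigma_{\mathrm{local}}(C, m)}{\sigma(m, m)}
```

Under this reading, a candidate whose structure aligns to m as well as m aligns to itself would score 1.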
Methods (contd)
• Total similarity score
  • After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence
    where
    • C: candidate sequence
    • R: set composed of all C's
Methods (contd) Summary -
Results
• Application to C. briggsae
  • Detection of miRNA homologues: miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded
  • Identification of miRNAs in distantly related species: miRAlign was applied to the C. briggsae data with training set Train_Sub_2 (which also excludes C. elegans), and sensitivity and specificity were recorded
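For reference, sensitivity and specificity can be computed from prediction counts as below. This sketch assumes the standard definitions (the slides do not state which definition of specificity the paper uses), and the counts in the example are made up.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Assumes the standard definitions:
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sens, spec = sensitivity_specificity(tp=8, fp=1, tn=9, fn=2)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```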
Results (contd) Graph 1 -
Results (contd) Graph 2 -
Results (contd) Comparison of miRAlign with BLAST -
Results (contd) Comparison of miRAlign with ERPIN -
Results (contd)
Other results:
• miRAlign was applied to A. gambiae and detected 59 putative miRNAs with tss > 35. These predictions were later supported when MicroRNA Registry 6.0 reported 38 A. gambiae miRNAs, 37 of which were covered by miRAlign
• miRAlign was also applied to a plant, Zea mays, and detected 28 of the 40 known Zea mays miRNAs (70%)
Conclusion
• By combining sequence and structure alignment, miRAlign performs better than previously reported homologue search methods
• Although miRAlign was built on animal data, the miRNAs it predicted in Zea mays indicate that it can also be applied to plants. Further investigation of this is underway.
THANK YOU Questions ??