Improving miRNA Target Genes Prediction Rikky Wenang Purbojati
miRNA • MicroRNA (miRNA) is a class of RNA believed to play important roles in gene regulation. • miRNAs are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.
miRNA Functions • miRNA plays a major role in the RNA-Induced Silencing Complex (RISC). • miRNAs control the expression of large numbers of genes by: • mRNA degradation • Translational repression • Recent studies indicate that miRNA plays a role in cancer development: • A surplus of miRNA might inhibit apoptosis • A deficit of miRNA might lead to excess expression of certain oncogenes
RNA-Induced Silencing Complex • mRNA degradation • Breaks the structural integrity of an mRNA. • Translational repression • Prevents the mRNA from being translated.
Characteristics of miRNA • Short (22-25 nt) • Transcribed from a miRNA gene • Intragenic: the miRNA gene is located inside a host gene (usually in an intron) • Intergenic: the miRNA gene is located outside gene bodies • Consistent 5′ and 3′ boundaries: • Transcription start site • 5′ cap • Poly(A) tail
miRNA General Research Question • Much attention has been directed at miRNA processing and targeting. • Computationally, one basic challenge is: given a miRNA sequence, what are its target genes?
miRNA sequence target prediction • Predict target genes by matching the complement of the miRNA sequence. • Two types of complement: • Perfect complement • Imperfect complement • Find a perfect match for the seed (nt 2-8)
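The seed-matching step can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the algorithm of any particular prediction tool: the function names `reverse_complement` and `seed_match_sites` are hypothetical, only perfect Watson-Crick matches to nt 2-8 are reported, and the UTR string in the usage note is a made-up toy sequence.

```python
from typing import List

# Watson-Crick complements for RNA bases.
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_match_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions in `utr` that perfectly pair with the
    miRNA seed (nucleotides 2-8, counted from the miRNA 5' end).

    Both sequences are read 5'->3'; DNA-style 'T' is converted to 'U'.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                 # nt 2-8 (7 nucleotides)
    site = reverse_complement(seed)   # what the mRNA must contain
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]
```

For example, with the miRNA sequence `UGAGGUAGUAGGUUGUAUAGUU` the seed is `GAGGUAG`, so the function looks for `CUACCUC` in the UTR; in the toy UTR `AAACUACCUCAAA` it reports a single site starting at index 3. Imperfect complements and conservation across species are not handled here.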
miRNA sequence target prediction • Several requirements for matching: • Strong Watson-Crick base pairing of the 5′ seed (nt 2-8) • Conservation of the miRNA binding site across species • Another approach: thermodynamic rule • Local miRNA-mRNA interaction with a favorable minimum free energy (a stable duplex)
Problems and Opportunities • Problem: purely computational target gene prediction produces many candidates • No unifying theory for target gene prediction yet • Most candidates are not validated yet • The common assumption is that most of them are false positives • Can we shorten the list to include only the strong candidates?
Problems and Opportunities • Opportunity: many publicly available experimental datasets, e.g. cDNA microarray, miRNA microarray, etc. • Use these datasets to computationally validate some of the target genes • Current research: preliminary work tries to utilize the abundance of publicly available microarray data.
Assumptions • miRNA works by silencing target genes, so the expression of a miRNA gene and its target genes should be anti-correlated • Intragenic miRNAs are expressed along with their host gene. • A host gene should therefore be anti-correlated with a target gene • Intergenic miRNAs do not have a host gene, but we might be able to use available composite (miRNA microarray + cDNA microarray) datasets • If a miRNA is up-regulated in the miRNA microarray, then its target genes should be down-regulated in the cDNA microarray
Current Work • There has been some work related to this idea (e.g. HOCTAR) • However, we can improve on it by: • Using stricter criteria across the microarray data • Using more diverse data • We expect to get much better specificity than the previous method
HOCTAR Method • Gets a list of target genes from 3 different tools (PicTar, TargetScan, miRanda) • Uses Pearson correlation to determine the correlation coefficient between 2 genes (host gene and candidate target gene) • Includes target genes whose correlation is below some negative threshold • Only works for intragenic miRNA
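The correlation filter described above can be sketched as follows. This is a minimal illustration of the idea, not the HOCTAR implementation: the function names, the data layout (one expression vector per gene across the same experiments), and the threshold value of -0.5 are all assumptions chosen for the example.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors.
    Returns 0.0 when either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def anticorrelated_targets(
    host_expr: List[float],
    candidates: Dict[str, List[float]],
    threshold: float = -0.5,  # hypothetical cut-off, not taken from the slides
) -> Dict[str, float]:
    """Keep predicted targets whose expression is anti-correlated with the
    host gene of an intragenic miRNA (correlation below `threshold`)."""
    kept = {}
    for gene, expr in candidates.items():
        r = pearson(host_expr, expr)
        if r < threshold:
            kept[gene] = r
    return kept
```

Given a host gene profile `[1, 2, 3, 4, 5]`, a candidate with profile `[5, 4, 3, 2, 1]` is kept (r = -1), while candidates that rise with the host gene or show no linear relationship are dropped.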
Shortcomings of HOCTAR • Uses all probe data even when the probes are not consistent • Uses only one target gene prediction approach • Depends on Pearson correlation, which is sensitive to outliers
Improvement Idea (1) • Use only the subset of data whose probes are all consistent • Treat each probe as a different experiment
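The slides do not define "consistent", so the sketch below assumes one plausible reading: a gene is kept only if all of its probes change in the same direction (same sign of log2 fold change) between two conditions. The function name `consistent_genes` and the input layout are hypothetical.

```python
from typing import Dict, List


def consistent_genes(probe_log_fc: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep genes whose probes all agree on the direction of change.

    `probe_log_fc` maps a gene name to the log2 fold changes measured by
    each of its probes. Genes with no probes, or with probes that disagree
    in sign (or show no change at all), are dropped. This sign-agreement
    rule is an assumption, not a definition taken from the slides.
    """
    kept = {}
    for gene, values in probe_log_fc.items():
        if not values:
            continue
        all_up = all(v > 0 for v in values)
        all_down = all(v < 0 for v in values)
        if all_up or all_down:
            kept[gene] = values
    return kept
```

A gene with probe fold changes `[1.2, 0.8]` or `[-1.0, -2.0]` passes this filter, while a gene with `[0.5, -0.3]` is discarded because its probes disagree.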
Improvement Idea (2) • Pearson correlation is very sensitive to outliers; alternative solutions: • Use rank correlation coefficients instead of Pearson correlation coefficients • Normalize the dataset to a normal distribution • Ignore outliers
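To illustrate why a rank correlation may be preferable, the sketch below computes Spearman's rank correlation as the Pearson correlation of the ranks and compares it with plain Pearson on a toy profile containing one extreme value. The helper names and the toy numbers are illustrative assumptions, not data from the study.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With `x = [1, 2, 3, 4, 5, 6]` and `y = [60, 5, 4, 3, 2, 1]`, the relationship is perfectly monotone decreasing, so `spearman(x, y)` is -1, whereas the single large value pulls `pearson(x, y)` to about -0.70. Under a fixed negative threshold, the rank-based measure would retain this pair more reliably.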
Improvement Idea (3) • In addition to probe consistency and rank correlation, we might use an entropy rule to eliminate candidate target genes • Assumptions: • Transcript level can be approximated from expression level data • One miRNA transcript can only degrade one mRNA transcript • Thus miRNA expression changes should not differ much from mRNA expression changes
Improvement Idea (4) • Use a larger amount of microarray data • We might be able to include miRNA microarray data to further refine the target gene lists for several miRNAs
Preliminary Result • GSE9234 dataset (hypoxia/normoxia) • Using only the consistency criterion
Refining Intergenic miRNA prediction • Refining intergenic miRNA prediction using microarray datasets is not a trivial task • A cDNA microarray can only measure the expression of target genes, not of the miRNA gene • We might have to rely on additional data: • Proxy measurement • miRNA microarray
Intergenic miRNA proxy measurement • Putative target gene approximation • Use the expression level of known target genes for that specific intergenic miRNA • If its target genes are consistently down-regulated, then we can assume that the intergenic miRNA gene is up-regulated • Cluster miRNA approximation • Some intergenic miRNAs are clustered with each other; according to Saini et al. (2007), most of these clusters share the same pri-miRNA transcript • Use method 1 (putative target gene approximation) on a neighboring miRNA to approximate the intergenic miRNA expression
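The putative target gene approximation can be sketched as a simple vote over known targets. How "consistently down-regulated" should be quantified is not stated on the slide; the sketch assumes it means that at least a given fraction of known targets have a negative log2 fold change. The function name `mirna_likely_up` and the default fraction of 0.8 are hypothetical.

```python
from typing import List


def mirna_likely_up(target_log_fc: List[float], min_fraction: float = 0.8) -> bool:
    """Proxy for an intergenic miRNA that is not measured directly.

    `target_log_fc` holds the log2 fold changes of the miRNA's known target
    genes. Returns True when at least `min_fraction` of them are
    down-regulated, which is taken here as indirect evidence that the miRNA
    itself is up-regulated. The 0.8 default is an assumed cut-off.
    """
    if not target_log_fc:
        return False  # no known targets, so no evidence either way
    down = sum(1 for value in target_log_fc if value < 0)
    return down / len(target_log_fc) >= min_fraction
```

For target fold changes `[-1.2, -0.8, -0.5, -2.0, -0.1]` the function returns `True`; for a mixed profile such as `[0.5, -0.3, 0.2, -0.9]` it returns `False`. For the cluster approximation, the same call could be applied to the known targets of a neighboring miRNA in the same cluster.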
Further Work • Implementation and evaluation • Standardizing composite dataset repository