Improving Intergenic miRNA Target Genes Prediction • Rikky Wenang Purbojati
miRNA • MicroRNA (miRNA) is a class of RNA that is believed to play important roles in gene regulation. • miRNAs are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.
miRNA Characteristics • Short (22-25 nt) • miRNA plays a major role in the RNA-induced silencing complex (RISC) • miRNAs control the expression of large numbers of genes by: • mRNA degradation • Translational repression • Expression of a miRNA reduces the expression of its target genes • Intergenic miRNA genes are located outside gene bodies
Basic miRNA problem • Finding the true target genes of a miRNA is not a trivial task • One approach is to make a computational prediction before validating it in wet-lab experiments • One basic challenge of miRNA: given a miRNA sequence, what are its target genes?
miRNA sequence target prediction • Several requirements for matching: • Strong Watson-Crick base pairing of the 5′ seed (nucleotides 2-8) • Conservation of the miRNA binding site across species • Local miRNA-mRNA interaction with a favorable minimum free energy • Available tools for target gene prediction: PicTar, TargetScan, miRanda, microT, etc. • Predictions from different tools often disagree with each other, because each tool uses different criteria
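The seed-pairing requirement can be illustrated with a minimal sketch. It covers only the first criterion (perfect Watson-Crick pairing of miRNA nucleotides 2-8) and ignores conservation and free energy. The function names and the example sequences are illustrative assumptions, not taken from any of the listed tools.

```python
# Watson-Crick complements for RNA (uppercase A/C/G/U assumed).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA motif that pairs perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # positions 2..8 (1-based), read 5' -> 3'
    # The matching site on the mRNA is the reverse complement of the seed.
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example sequence, for illustration only
    utr = "AAACUACCUCAAGG"            # hypothetical 3' UTR fragment
    print(seed_site(mirna), find_seed_matches(mirna, utr))
```

A seed match alone typically yields many candidate sites, which is why the real tools combine it with further criteria.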
Problem and Opportunity • Problem: Purely computational target gene prediction produces a lot of candidates • Most of them are not validated • The common assumption is that most of them are false positives • Can we shorten the list to include only the strong candidates? • Opportunity: Lots of publicly available experimental datasets, e.g. cDNA microarray, miRNA microarray, etc. • Use these datasets to computationally invalidate some of the target genes
Assumptions • miRNA works by silencing target genes, thus the miRNA gene and its target genes should be anti-correlated • Intragenic miRNAs are expressed along with their host gene • A host gene should therefore be anti-correlated with a target gene • An intergenic miRNA does not have a host gene, but its real target genes should be correlated with each other • The real target genes should be down-regulated whenever the intergenic miRNA is expressed
How to invalidate a target gene prediction • A target gene prediction can be invalidated by using a set of microarray datasets • For an intragenic miRNA target gene: • If the target gene's expression has no correlation with the host gene's expression, we assume that the target gene is not influenced by the host gene • For an intergenic miRNA target gene: • If a target gene behaves inconsistently compared to the other target genes, we assume that it might not be affected by the miRNA gene
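For the intragenic case, the check could be sketched with a Pearson correlation between host and target expression across samples. This is only one possible reading of the slide: the choice of Pearson correlation, the function names, and the cutoff of -0.5 are assumptions made for illustration.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def is_invalidated(host_expr: list[float], target_expr: list[float],
                   cutoff: float = -0.5) -> bool:
    """True when the target is not clearly anti-correlated with the host gene.

    The cutoff is a hypothetical value, not one given in the slides.
    """
    return pearson(host_expr, target_expr) > cutoff


if __name__ == "__main__":
    host = [1.0, 2.0, 3.0, 4.0]
    print(is_invalidated(host, [4.0, 3.0, 2.0, 1.0]))  # anti-correlated target
    print(is_invalidated(host, [1.0, 2.0, 3.0, 4.0]))  # co-expressed target
```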
Filtering Intergenic miRNA Target Gene Prediction • Use a combination of 8 prediction tools to produce the initial predictions (union & intersection) • Use a collection of 190 microarray datasets to invalidate some of the predictions • Use a greedy method to approximate the final subset of high-confidence target genes
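Combining the tools' outputs by union and intersection can be sketched by counting how many tools support each gene. The data layout (a mapping from tool name to a set of predicted genes) and the gene names are assumptions for illustration; the slides do not describe the actual format.

```python
from collections import Counter


def support_counts(predictions: dict[str, set[str]]) -> Counter:
    """Count, for every gene, how many tools predict it as a target."""
    counts: Counter = Counter()
    for genes in predictions.values():
        counts.update(genes)
    return counts


def with_min_support(predictions: dict[str, set[str]], min_tools: int) -> set[str]:
    """Genes predicted by at least `min_tools` tools.

    min_tools=1 gives the union; min_tools=len(predictions) gives the intersection.
    """
    counts = support_counts(predictions)
    return {gene for gene, n in counts.items() if n >= min_tools}


if __name__ == "__main__":
    # Toy predictions with made-up gene names.
    toy = {
        "PicTar": {"geneA", "geneB"},
        "TargetScan": {"geneA", "geneC"},
        "miRanda": {"geneA", "geneB", "geneD"},
    }
    print(sorted(with_min_support(toy, 1)))  # union
    print(sorted(with_min_support(toy, 3)))  # intersection
```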
Consistent Target Genes • We need to establish the meaning of consistent target genes • In this context, target gene A and target gene B are consistent if: • In every microarray dataset in which gene A is down-regulated, gene B is also down-regulated
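The definition above can be expressed with sets: if each gene is represented by the set of datasets in which it is down-regulated, the condition becomes a subset test. The sketch below assumes log-ratio expression values and a hypothetical cutoff of -1.0 for calling a gene down-regulated; neither is specified in the slides.

```python
def down_regulated_in(log_ratios: dict[str, float], cutoff: float = -1.0) -> frozenset[str]:
    """Datasets in which a gene counts as down-regulated (cutoff is an assumption)."""
    return frozenset(ds for ds, value in log_ratios.items() if value <= cutoff)


def is_consistent(down_a: frozenset[str], down_b: frozenset[str]) -> bool:
    """Gene B is down-regulated in every dataset where gene A is down-regulated."""
    return down_a <= down_b


def same_signature(down_a: frozenset[str], down_b: frozenset[str]) -> bool:
    """Symmetric variant: both genes are down-regulated in exactly the same datasets."""
    return down_a == down_b


if __name__ == "__main__":
    gene_a = down_regulated_in({"d1": -2.0, "d2": 0.3, "d3": -1.5})
    gene_b = down_regulated_in({"d1": -1.2, "d2": -1.1, "d3": -3.0})
    print(is_consistent(gene_a, gene_b), is_consistent(gene_b, gene_a))
```

As stated, the definition is directional (A implies B). Comparing whole signatures for equality is the symmetric form.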
Greedy Method • Given a set of target gene predictions and a collection of microarray datasets, we want to find: • The longest subset of consistent target genes • The highest number of down-regulated target genes in the subset
Reasoning • Why do we want to find: • The longest subset of consistent target genes? • Target genes that are consistent across a large number of microarray datasets from different experiments might be affected by a common factor, which may be a microRNA • The longest subset gives a high probability of including the true target genes • The highest number of down-regulated target genes in the subset? • Since miRNA works by down-regulating its target genes, it is desirable to find the largest subset of consistently down-regulated target genes
Current Algorithm
Subset <- {}
downexpr_cnt <- 0
for i = 0 to K
    A <- G[i]
    SigA <- signature(A)
    Temp_Subset <- {A}
    down <- countDownExpressedMicroarray(A)
    for j = 0 to K
        B <- G[j]
        SigB <- signature(B)
        if SigA == SigB
            Temp_Subset <- Temp_Subset U {B}
        end if
    end for
    if (length(Temp_Subset) > length(Subset)) && (down > downexpr_cnt)
        Subset <- Temp_Subset
        downexpr_cnt <- down
    end if
end for
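The pseudocode above could be written in Python roughly as follows. This is a sketch under assumptions: a gene's signature is taken to be the set of datasets in which it is down-regulated, the subset collects genes whose signature equals the pivot's, and the function and variable names are illustrative.

```python
def best_consistent_subset(
    genes: list[str],
    down_sets: dict[str, frozenset[str]],
) -> tuple[list[str], int]:
    """Greedy search for a large subset of genes sharing one down-regulation signature.

    `down_sets[g]` is the set of datasets in which gene g is down-regulated
    (an assumed representation of the slide's `signature`).
    """
    best: list[str] = []
    best_down = 0
    for pivot in genes:
        sig = down_sets[pivot]
        # Collect every gene (the pivot included) with the same signature.
        candidate = [g for g in genes if down_sets[g] == sig]
        down = len(sig)  # number of datasets in which the pivot is down-regulated
        # As in the pseudocode, the candidate must be both longer and more down-regulated.
        if len(candidate) > len(best) and down > best_down:
            best, best_down = candidate, down
    return best, best_down


if __name__ == "__main__":
    # Toy data with made-up gene and dataset names.
    down_sets = {
        "g1": frozenset({"d1", "d2"}),
        "g2": frozenset({"d1", "d2"}),
        "g3": frozenset({"d1", "d2"}),
        "g4": frozenset({"d3"}),
        "g5": frozenset(),
    }
    print(best_consistent_subset(list(down_sets), down_sets))
```

Because the acceptance test requires both quantities to increase at once, the result depends on the order in which pivot genes are visited.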
Algorithm Limitations • The result might be biased by the expression signature of the first pivot gene: • It might get stuck on a local maximum • This can be addressed by prioritizing pivot genes, sorting target genes by down-expression value, or selecting the pivot gene at random • The subset is an approximation of the high-confidence target genes, but it doesn’t necessarily include all real target genes (because the supporting data are limited)
Benchmarking • Compare the performance with other prediction tools, based on: • Number of correct predictions (based on validated target genes) • Number of predictions • The algorithm uses initial target predictions supported by: • 2, 3, and 4 prediction tools
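The two benchmark quantities, plus a sensitivity figure derived from them, could be computed as in this sketch. The definition of sensitivity as correct predictions divided by validated targets is the standard one, but the slides do not state which formula was used; the names and toy gene sets are assumptions.

```python
def benchmark(predicted: set[str], validated: set[str]) -> dict[str, float]:
    """Summarize a prediction list against experimentally validated target genes."""
    correct = predicted & validated
    return {
        "n_predictions": len(predicted),
        "n_correct": len(correct),
        # Fraction of validated targets that the prediction recovers.
        "sensitivity": len(correct) / len(validated) if validated else 0.0,
    }


if __name__ == "__main__":
    # Toy gene sets with made-up names.
    print(benchmark({"geneA", "geneB", "geneC"}, {"geneB", "geneC", "geneD", "geneE"}))
```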
Conclusion • In general, the approximation method shows better sensitivity than the other prediction tools • Specificity can be improved by including only target genes that are supported by more than 2 prediction tools
Further Work • Adjusting the scoring function to find the optimum balance between the length of the subset and the number of down-regulated target genes • Implementing a threshold on target gene signaturing to further improve the specificity