
In silico study of cancer-related genes and microRNAs (Using microarrays to screen for cancer genes and investigate their upstream regulatory microRNAs)



Presentation Transcript


  1. In silico study of cancer-related genes and microRNAs (Using microarrays to screen for cancer genes and investigate their upstream regulatory microRNAs). Ka-Lok Ng (吳家樂), Department of Biomedical Informatics, Asia University

  2. Contents
     Motivation
     • Predict cancer genes based on microarray mRNA expression levels
     • A microRNA (miRNA) can act as an oncogene (OCG) or a tumor suppressor gene (TSG)
     • Identify cancer-related miRNAs, their target genes, and downstream protein-protein interactions (to predict novel cancer-related proteins)
     (1) Introduction – microarray, cancer, microRNA
     (2) Methods – input data
     (3) Results
         (a) cancer gene prediction (Bioconductor), e.g. prostate/breast cancer
         (b) correlation study of miRNA and mRNA expression levels
         (c) ncRNAppi – a platform for studying microRNAs and their target genes' protein-protein interactions
     (4) Summary

  3. Central dogma of molecular biology. Post-transcriptional regulation: microRNAs target mRNAs in the transcriptome

  4. Introduction – types of RNAs

  5. Cancer formation, and a summary of the ten leading causes of cancer death in Taiwan in 2008 (ROC year 97)

  6. Microarray – overview
     [Figure: probe genes on the array; target cDNA labeled with Cy5 (red) and cDNA labeled with Cy3 (green). Figure by Hanne Jarmer, BioCentrum-DTU, Technical University of Denmark]

  7. cDNA microarrays
     • Microarrays are used to measure gene expression levels under two different conditions: a green label for the control sample and a red label for the experimental sample.
     • Hybridization is DNA-cDNA or DNA-mRNA.
     • The hybridized microarray is excited by a laser and scanned at the appropriate wavelengths for the red and green dyes.
     • The amount of fluorescence emitted (intensity) upon laser excitation is proportional to the amount of mRNA bound to each spot.
     • If the transcript is more abundant in the control/experimental sample → green/red, which indicates the relative amount of transcript for the mRNA (EST) in the two samples.
     • If both are equal → yellow.
     • If neither is present → black.
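The colour rule on this slide can be sketched as a log2 ratio of the red and green intensities for one spot. This is only an illustration: the function names, the background level of 50 and the two-fold cutoff are assumptions for the example and do not come from the presentation.

```python
import math


def log_ratio(cy5_red: float, cy3_green: float, floor: float = 1.0) -> float:
    """Return log2(red / green) for one spot.

    Intensities are floored (assumed floor of 1.0) so the ratio is defined
    even when a channel reads zero.
    """
    red = max(cy5_red, floor)
    green = max(cy3_green, floor)
    return math.log2(red / green)


def spot_colour(cy5_red: float, cy3_green: float,
                background: float = 50.0, fold: float = 1.0) -> str:
    """Classify a spot as 'red', 'green', 'yellow' or 'black'.

    background and fold are hypothetical cutoffs: both channels below
    background means no transcript in either sample; |log2 ratio| >= fold
    (two-fold by default) means one sample dominates.
    """
    if cy5_red < background and cy3_green < background:
        return "black"
    m = log_ratio(cy5_red, cy3_green)
    if m >= fold:
        return "red"      # more transcript in the experimental sample
    if m <= -fold:
        return "green"    # more transcript in the control sample
    return "yellow"       # roughly equal amounts
```

A spot with red intensity 4000 and green intensity 1000 has a log2 ratio of 2 and is classified as red under these assumed cutoffs.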

  8. Microarray data generation, processing and analysis
     • Image quantitation – locating the spots and measuring their fluorescence intensities
     • Data normalization and integration – construction of the gene expression matrix from sets of spots
     • Gene expression data analysis and mining – finding differentially expressed genes (DEGs) or clusters of similarly expressed genes
     • These analyses generate new hypotheses about the underlying biological processes, which in turn should be tested in follow-up experiments
     [Figure labels: image analysis; information processing; data analysis; clustering. http://www.mathworks.com/company/pressroom/image_library/biotech.html]

  9. Introduction – biogenesis of microRNA: miRNA gene → pri-miRNA (stem-loop structure), processed by Drosha → pre-miRNA (65~90 nt), carried by Exportin 5 to the cytoplasm → mature miRNA (20~25 nt), generated by the RNase III-type enzyme Dicer → directed by RISC to the miRNA target → mRNA cleavage or inhibition of its translation into protein

  10. Introduction – miRNAs can play the role of an OCG or a TSG
      • When a miRNA plays an oncogenic role, it targets TSGs or genes that control cell differentiation or apoptosis, which leads to tumor formation.
      • When a miRNA plays a tumor suppressor role, it targets OCGs or genes that control cell differentiation or apoptosis, so it can suppress tumor formation.
      • In both cases, a negative correlation between the miRNA and mRNA expression profiles is expected.
      • Integrate human miRNA-targeted (or siRNA-targeted) mRNA data, protein-protein interaction (PPI) records, tissues, pathways, and disease information to establish a disease-related miRNA (or siRNA) pathway database.

  11. Introduction – cancer-related miRNAs

  12. A platform for studying miRNAs and cancerous target genes
      miRNA annotation:
      • miR2Disease – disease-related miRNAs
      • Chromosomal fragile sites
      • miRNA cluster information
      • CpG island proximal miRNAs
      miRNA-mRNA pairs:
      • TarBase data → experimentally verified miRNA-mRNA pairs
      • NCI-60 cancer data (expression profiles of miRNA and mRNA) → miRNA-mRNA anti-correlation pairs
      mRNA annotation:
      • TAG → known OCG, TSG or CRG
      • OMIM → disease genes
      • KEGG → cancer pathways
      [Table: number of cell lines for the nine cancer types in the NCI-60 data sets]

  13. miRNA, target gene, protein-protein interaction (PPI)
      [Figure: miRNA or siRNA → target gene TG (mRNA is suppressed) → PPI partners L1 and L2; labels: protein, protein (TF), BP/MF annotation counts x, y, z and overlapping BP/MF counts n1, n2]
      • Tissue-specific miRNA or siRNA targets, and their PPI partners up to the second level
      • If the upstream miRNA (or siRNA) is defective, its effect could be amplified downstream.
      • As an illustration, suppose a miRNA (or siRNA) targets gene TG, which has two successive PPI partners, proteins L1 and L2. If genes TG and L2 are involved in the same disease, then it is highly probable that gene L1 is also related to that disease → quantified by enrichment analysis

  14. Input data and methods
      Databases:
      • ArrayExpress – raw data files (U95Av2) for 64 prostate cancer tissue samples and 18 normal prostate tissue samples
      • TAG (Tumor Associated Gene)
      • NCI-60 – miRNA and mRNA gene expression profiles for 9 cancer types
      • TarBase – miRNA targets (experimentally verified)
      • miR2Disease – a comprehensive resource of miRNA deregulation in various human diseases
      • OMIM – human disease information
      • KEGG – cancer pathway information
      • ncRNAppi – a useful tool for identifying ncRNA target pathways
      • PPI data (BioIR) – seven integrated databases: HPRD, DIP, BIND, IntAct, MIPS, MINT and BioGRID
      • Gene Ontology (GO) – biological process and molecular function annotations
      Tool: Bioconductor

  15. Research protocol

  16. Predict DEGs using R and Bioconductor commands

  17. Results – DEGs predicted by Bioconductor
      • The top 100 DEGs (either up- or down-regulated) are selected.
      • After eliminating duplicated genes, the predicted total number of DEGs is 85, and the adjusted p-values of all DEGs are less than 1.9 × 10^-5.
      • TAG ∩ DEGs: 14 known cancer genes among the 85 predicted DEGs (16.5%)
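The bookkeeping on this slide (remove duplicated genes, intersect with the TAG list, report the fraction) can be sketched as follows. The function name and the placeholder gene labels used in any call are hypothetical; only the counts 14 and 85 come from the slide.

```python
def overlap_summary(degs, known_cancer_genes):
    """Deduplicate a DEG list and intersect it with a known cancer gene list.

    Returns (sorted overlapping genes, number of unique DEGs,
    fraction of unique DEGs that are known cancer genes).
    """
    unique_degs = set(degs)
    hits = unique_degs & set(known_cancer_genes)
    fraction = len(hits) / len(unique_degs) if unique_degs else 0.0
    return sorted(hits), len(unique_degs), fraction


# Check of the slide's arithmetic: 14 known cancer genes among 85 unique DEGs.
slide_percentage = round(100 * 14 / 85, 1)
```

With the slide's counts, `slide_percentage` evaluates to 16.5, which matches the reported 16.5%.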

  18. Results – miRNAs, DEGs and cancer types Other DEGs

  19. Results – the relationship among miR-20a, TGFBR2 and human prostate cancer (PMID: 16461460). http://ppi.bioinfo.asia.edu.tw/R_cancer/

  20. A platform for studying miRNAs and cancerous target genes

  21. A platform for studying miRNAs and cancerous target genes
      miRNA annotation:
      • miR2Disease – disease-related miRNAs
      • Chromosomal fragile sites
      • miRNA cluster information
      • CpG island proximal miRNAs
      miRNA-mRNA pairs:
      • TarBase data → experimentally verified miRNA-mRNA pairs
      • NCI-60 cancer data (expression profiles of miRNA and mRNA) → miRNA-mRNA anti-correlation pairs
      mRNA annotation:
      • TAG → known OCG, TSG or CRG
      • OMIM → disease genes
      • KEGG → cancer pathways
      [Table: number of cell lines for the nine cancer types in the NCI-60 data sets]

  22. A platform for studying miRNAs and cancerous target genes
      For a given cancer tissue type, we calculated both the Pearson correlation coefficient (PCC) and the Spearman rank correlation coefficient (SRC). The PCC, r, between the expression profiles of a miRNA and its target gene is given by
      $$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
      where x_i and y_i denote the expression intensity of the miRNA and of the miRNA's target gene respectively, and \bar{x} and \bar{y} are their means over the n samples.
      One problem with quantifying the strength of correlation by PCC is that it is easily skewed by outliers. A single outlying data point can make two genes appear to be correlated even when all the other data points are not. SRC is a non-parametric statistical method that is robust to outliers.
      The PCC and SRC are calculated for:
      • Three Affymetrix chips: U95(A-E), U133A, U133B
      • Normalization methods: GCRMA, MAS5, RMA
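The outlier argument on slide 22 can be made concrete with a minimal standard-library sketch of both coefficients. The function names and the toy expression vectors are assumptions for illustration and are not the author's data or code; SRC is computed here as the PCC of average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either vector is constant


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy vectors (hypothetical): five anti-correlated points plus one outlier.
mirna = [1, 2, 3, 4, 5, 100]
target = [5, 4, 3, 2, 1, 100]
pcc = pearson(mirna, target)    # dominated by the single outlier
src = spearman(mirna, target)   # far less affected by it
```

In this toy case the first five points are perfectly anti-correlated, yet the single outlier pushes the PCC close to +1, while the SRC stays slightly negative.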

  23. Test of hypothesis for PCC and SRC
      The Pearson product-moment table is used to test the significance of a PCC result. The hypothesis test is one-tailed. A different test is applied to the SRC results.
      [Table: critical values for the one-tailed test using Pearson and Spearman correlation at significance levels α = 0.05 and 0.10]
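For orientation, critical values in Pearson product-moment tables are conventionally derived from a t statistic with n − 2 degrees of freedom, assuming bivariate normal data and a null hypothesis of zero correlation. The slide does not state how its table was produced, so the relation below is a standard-textbook sketch, not the author's stated procedure; n is the number of samples and t_{α, n−2} is the one-tailed critical value of Student's t distribution.

```latex
t = r\,\sqrt{\frac{n-2}{1-r^{2}}},
\qquad
r_{\mathrm{crit}} = \frac{t_{\alpha,\,n-2}}{\sqrt{\,n-2+t_{\alpha,\,n-2}^{2}\,}}
```

For a one-tailed test of negative correlation, the null hypothesis would be rejected when r ≤ −r_crit.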

  24. Results – hsa-miR-1:AXL, PCC and SRC calculations Cases where both PCC and SRC are less than or equal to -0.5.
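The selection rule on this slide (keep a miRNA:mRNA pair only when both PCC and SRC are at most -0.5) can be sketched as a simple filter. The function name, the dictionary layout and any pair labels passed to it are hypothetical placeholders, not results from the presentation.

```python
def anticorrelated_pairs(results, cutoff=-0.5):
    """Return the pairs whose PCC and SRC are both <= cutoff.

    results maps a pair label to a (pcc, src) tuple; the order of the
    input dictionary is preserved in the output list.
    """
    return [
        pair
        for pair, (pcc, src) in results.items()
        if pcc <= cutoff and src <= cutoff
    ]
```

Requiring both coefficients to pass guards against pairs whose apparent anti-correlation is driven by a single outlier in the PCC.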

  25. Results – hsa-miR-10b:HOXD10
      Other examples:
      • hsa-miR-21:PTEN (TSG)
      • hsa-miR-15b:BCL2 (TSG)
      • hsa-miR-16:BCL2 (TSG)
      miR2Disease – diseases associated with hsa-miR-10b: leukemia and breast, colon, ovarian and prostate cancers.

  26. Extension – work in progress
      • Validate how good the correlation-based prediction is
      • Add further information:
        – CpG islands: miRNAs located around CpG islands (e.g. miR-34b, miR-137, miR-193a, and miR-203) are silenced by DNA hypermethylation in oral cancer
        – miRNA clusters, fragile sites
      • Positively correlated miRNA:mRNA pairs may involve TFs

  27. ncRNAppi – miRNA, target genes, PPI, and the protocol of enrichment analysis
      [Figure: miRNA or siRNA → target protein (mRNA is suppressed) → interacting proteins, including a TF; segments labeled JC(TG,L1) and JC(L1,L2)]
      Two directly interacting proteins tend to participate in the same biological process or share the same molecular function. Let a miRNA targeting pathway be denoted by miRNA – TG – L1 – L2. We propose to rank the pathways according to the number of overlapping biological processes (or molecular functions) between TG and L1, and between L1 and L2. The Jaccard coefficient (JC) is used to rank the significance of a pathway. The JC of sets A and B is defined by
      $$ JC(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
      where |A ∩ B| and |A ∪ B| denote the cardinality of A ∩ B and A ∪ B respectively.
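The Jaccard coefficient named on this slide is the size of the intersection of two sets divided by the size of their union. A minimal sketch for one segment of a pathway, with hypothetical annotation labels, could look like this; how the two segment scores are combined into a pathway score is not shown here.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections of terms.

    Returns 0.0 when both collections are empty (an assumed convention).
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical BP annotations for a target gene TG and its partner L1.
tg_terms = {"bp1", "bp2", "bp3"}
l1_terms = {"bp2", "bp3", "bp4"}
jc_tg_l1 = jaccard(tg_terms, l1_terms)  # 2 shared terms out of 4 distinct terms
```

Here TG and L1 share two of four distinct terms, so the segment score is 0.5.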

  28. ncRNAppi – The protocol of enrichment analysis
      The biological process (BP) and molecular function (MF) annotations are taken from Gene Ontology and are used to characterize the path TG – L1 – L2. The JC for the pathway is given by [equation missing from the transcript], whose terms denote the JC score of the biological process for the segment TG – L1 and for the TG – L1 – L2 pathway respectively.

  29. ncRNAppi – The protocol of enrichment analysis, p-value
      We assign a p-value to every JC calculation, which provides a measure of its statistical significance. The p-value is estimated as follows. Let N be the total number of BP terms found in GO. Assume that TG, L1 and L2 have x, y and z BP annotations respectively. Also, let n1 and n2 be the numbers of identical BP terms for TG – L1 and L1 – L2 respectively. Let p1 and p2 be the probabilities that TG – L1 and L1 – L2 have n1 and n2 common BP (or MF) terms respectively [equations missing from the transcript].
      [Figure: Venn diagram of the TG and L1 annotations within the N terms, with regions x − n1, n1 and y − n1]
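The formulas for p1 and p2 did not survive in this transcript. One common way to obtain such a probability, assuming the x and y annotations are drawn independently and uniformly at random from the N terms, is the hypergeometric distribution. The sketch below uses that assumption; it is not necessarily the author's formula, and the function names are hypothetical.

```python
from math import comb


def overlap_probability(N, x, y, n):
    """P(exactly n shared terms) under the assumed hypergeometric model.

    N: total number of terms; x, y: annotation counts of the two genes;
    n: number of shared terms.
    """
    if not (0 <= n <= min(x, y)) or x > N or y > N:
        raise ValueError("need 0 <= n <= min(x, y) and x, y <= N")
    return comb(x, n) * comb(N - x, y - n) / comb(N, y)


def overlap_tail(N, x, y, n):
    """P(at least n shared terms), a tail-style alternative to the above."""
    return sum(overlap_probability(N, x, y, k) for k in range(n, min(x, y) + 1))


# Toy numbers (hypothetical): N = 10 terms, x = 4, y = 3, all 3 shared.
p1_example = overlap_probability(10, 4, 3, 3)
```

The slide speaks of the probability of having exactly n1 common terms, which corresponds to `overlap_probability`; `overlap_tail` is included only as the usual enrichment-style variant.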

  30. ncRNAppi – Extension of TarBase targets
      Limitations of miRNA target prediction tools
      Many tools are available for miRNA target gene prediction, such as miRanda, TargetScan, and RNAhybrid. A major problem is that the prediction accuracy remains uncertain: one report indicated that the false positive rate could be as high as 24-39% for miRanda and 22-31% for TargetScan. If the miRNA:mRNA targeting step is uncertain, then the 'Level 1' and 'Level 2' protein-protein interaction pathways derived from the PPI databases are also doubtful.

  31. ncRNAppi – Extension of TarBase targets
      • miRNA target prediction tool – miRanda
      • Mature human miRNA FASTA sequences are downloaded from miRBase (the latest version is 13).
      • We then predict the possibility of each miRNA binding to OCGs and TSGs.
      • The target prediction tool, miRanda, allows fine-tuning of certain parameters, i.e. the MFE threshold, score, shuffle statistics, and gap-open and gap-extension scores.
      • We set the MFE threshold to -25 kcal/mol and the shuffle statistics to ON.
      • The rest of the parameters are set to their default values.
      • Once the binding lists of OCGs and TSGs are obtained, their PPI pathways can be retrieved from the BioIR database.
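The effect of the -25 kcal/mol MFE threshold can be illustrated as a post-hoc filter over predicted sites. miRanda applies its threshold internally, so this sketch only mimics the criterion on already-parsed results; the record layout, names and any values passed in are hypothetical.

```python
from typing import List, NamedTuple


class PredictedSite(NamedTuple):
    """One predicted miRNA binding site (hypothetical record layout)."""
    mirna: str
    gene: str
    energy_kcal_mol: float  # minimum free energy of the duplex


def filter_sites(sites: List[PredictedSite],
                 mfe_threshold: float = -25.0) -> List[PredictedSite]:
    """Keep sites whose free energy is at or below the threshold.

    More negative energies mean more stable duplexes, so a site passes
    when energy_kcal_mol <= mfe_threshold.
    """
    return [s for s in sites if s.energy_kcal_mol <= mfe_threshold]
```

A stricter (more negative) threshold trades sensitivity for fewer false positives, which is relevant to the accuracy concerns raised on the previous slide.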

  32. Results – ncRNAppi
      • ncRNAppi provides web-based data access and allows disease assignment for a specific node along miRNA (siRNA) targeting pathways. For example:
        – Select the miRNA ID hsa-let-7
        – Check the 'OMIM Disease type for individual node' boxes labeled 'Target' and 'Level-2'
        – Choose the item 'lung tumor' under the 'TUMOR TYPE' pull-down menu (OMIM)
        – Select 'Yes' under 'Common expression of target, Level-1 and level-2 nodes in KEGG'
      • Pathways are ranked according to the Jaccard index and p-value for BP or MF
      [Screenshot: example query with hsa-let-7; UniGene: liver; Target, L1 and L2 are OCG]

  33. Summary
      • R and Bioconductor are used to predict DEGs from prostate cancer microarray data. By integrating the Tumor Associated Gene (TAG), ncRNAppi and miR2Disease databases, we find that certain DEGs are regulated by microRNAs.
      A platform for studying miRNAs and cancer target genes
      (1) PCC and SRC results are used to quantify the correlation between the expression profiles of a miRNA and its target. The predicted results are annotated with reference to the TAG, OMIM, miR2Disease and KEGG data sets.
      (2) The main advantage of the two platforms' miRNA-mRNA targeting information is that all the target gene information and disease records are experimentally verified.
      ncRNAppi platform
      • ncRNAppi provides a powerful tool for identifying cancer-related miRNAs or siRNAs. For instance, the tool makes it possible to predict novel cancer genes through tissue- or disease-specific searches.
      • This platform is useful for investigating the regulatory role of miRNAs and siRNAs in cancer studies.

  34. Acknowledgement
      • National Science Foundation
      • Professor S.C. Lee (李尚熾) – Chung Shan Medical University
      • Mr. Liu Hsueh-Chuan (劉學銓) – former graduate student at Asia University
      • Mr. C.W. Weng (翁嘉偉) – former graduate student at Asia University
      • Mr. Kevin Lo (羅琮傑) – MSc graduate student at Asia University

  35. Thank you for your attention.
