Construction of Molecular Networks and Pathways using OMICs and Literature Data
Mathew Palakal and Meeta Pradhan
School of Informatics, IUPUI

From Bibliomics to Target Discovery for Colorectal Cancer
• CRC-related keywords
• BioSIFTER: literature harvesting and personalization
• BioMAP: mining and identification of novel biomarkers

BioMAP: BioMedical Literature Mining
• “A major challenge faced by biologists is to identify the most significant genes in a disease that can be targeted.”
• Experimental data: nodes/links
• Augmented with literature data: new nodes/links
• Our hypothesis: augmenting the experimental data with literature data can help to identify novel molecules that may be of significant relevance to the study under consideration.

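The hypothesis on this slide can be illustrated with a minimal sketch, assuming that both the experimental network and the literature-derived network are available as undirected edge lists. The function name `augment_network` and the toy edges are hypothetical and are not taken from BioMAP; they only show how "new nodes/links" can be read off after merging.

```python
def augment_network(experimental_edges, literature_edges):
    """Merge two undirected edge lists and report what the literature added."""
    # frozenset makes (A, B) and (B, A) the same undirected edge.
    exp = {frozenset(edge) for edge in experimental_edges}
    lit = {frozenset(edge) for edge in literature_edges}
    merged = exp | lit
    exp_nodes = set().union(*exp) if exp else set()
    merged_nodes = set().union(*merged) if merged else set()
    return {
        "edges": merged,                       # full augmented network
        "new_edges": lit - exp,                # links known only from literature
        "new_nodes": merged_nodes - exp_nodes, # molecules known only from literature
    }

# Toy input: hypothetical edges for illustration, not real interaction data.
experimental = [("MLH1", "MSH2"), ("CDK8", "GENE_X")]
literature = [("MSH2", "MLH1"), ("P53", "EP300"), ("MLH1", "P53")]

result = augment_network(experimental, literature)
print(sorted(result["new_nodes"]))
```

The candidate "novel molecules" in this reading are simply the nodes that appear only after augmentation; any ranking of them is a separate step.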
Regulatory Network Construction and Analysis
• Experiment data: hMLH1 (DNA repair), MSH2 (DNA repair), CDK8 (Wnt signaling)
• Literature-augmented data: SMAD4, P53, NF-kB, AKT1, PAK1, SOS
• Protein interaction prediction (example pair: P53, EP300)
  • Gene Ontology annotation similarity association
  • Structural interactions
  • Pfam domain interactions
  • Sequence potential analysis
• Interaction scoring: (i) first-principle methods, (ii) machine learning
• CRC TF network and CRC miRNA network
• Multi-scale, multi-level analysis: topological analysis, sub-graph analysis, hypergeometric associations
• Annotating the interaction network with miRNA and miRNA expression data
• Validation of the significant genes

Experiments on TF Networks
• Set of 48 keywords: myh, mlh1, cdk8, crcs7, dcc, crcs6, tgfbr1, tpx2, crcs, apc, hnpcc7, msh2, mlh1, braf, hnpcc, msh6, pten, fus1, cxcl2, rad18, hgf, axin2, casp3, prl3, nat1, gstm1, gstt1, cyp2c9, bcl2, prmt1, sn38, cpt11, proxy, smad3, igfbp1, pdgfb, capg, plk1, ifim1, csnk2a2, mbl2, pms2, cxcl2, igfir, cyp27b1, cyp24, mucins, colorectal cancer

Literature Mining
• Retrieved 133,923 articles.
• Obtained 2724 unique Swiss-Prot entry names.

Protein Interaction Prediction
• Input: the 2724 proteins obtained from literature mining
• Protein-protein interaction prediction is based on:
  • Gene Ontology annotation similarity association
  • Structural interaction
  • Pfam domain interaction
  • Sequence potential analysis

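The slides do not say how Gene Ontology annotation similarity is measured. As a minimal sketch, assuming a plain Jaccard index over the two proteins' GO term sets (an assumption, not the authors' stated method), one evidence score could look like this. The function name `go_jaccard`, the protein labels, and the GO identifiers are placeholders.

```python
def go_jaccard(terms_a, terms_b):
    """Jaccard similarity between two collections of GO term identifiers."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # no annotations at all: treat as no evidence
    return len(a & b) / len(a | b)

# Placeholder annotations, for illustration only.
annotations = {
    "PROT_A": {"GO:0000001", "GO:0000002", "GO:0000003"},
    "PROT_B": {"GO:0000002", "GO:0000003", "GO:0000004"},
}

score = go_jaccard(annotations["PROT_A"], annotations["PROT_B"])
print(score)
```

Semantic-similarity measures that use the GO hierarchy would be a natural refinement; the set overlap here is only the simplest stand-in.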
Sliding Window Algorithm for PPI Prediction
• Physico-chemical parameters for identifying the probable interacting interface:
  • Hydrophobicity
  • Accessibility
  • Residue interface propensity
• P53 : EP300 = total interacting score (number of interface residues and number of structures interacting)

Protein | % structure interacting | Protein | % structure interacting
P53_HUMAN | 70 | MDM2_HUMAN | 93
P53_HUMAN | 59 | EP300_HUMAN | 100
P53_HUMAN | 67 | MDM4_HUMAN | 100
UBP7_HUMAN | 100 | P53_HUMAN | 74

• Structures shown for P53 and EP300 (PDB ID and chain): 2F1Y A, 1C26 A, 1Z1M A, 1L3E B, 3BIY A

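The slide names the parameters but not the scoring details. A minimal sketch of the sliding-window idea, restricted to hydrophobicity and assuming the Kyte-Doolittle scale, a window of 5 residues, and an arbitrary threshold (all three are assumptions, as are the function names), could look like this. Accessibility and interface propensity would need structural data and are not modelled here.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def sliding_window_scores(sequence, window=5):
    """Mean hydropathy of every window of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def candidate_windows(sequence, window=5, threshold=1.0):
    """Start positions (0-based) of windows whose mean score meets the threshold."""
    scores = sliding_window_scores(sequence, window)
    return [(i, round(s, 2)) for i, s in enumerate(scores) if s >= threshold]

# Arbitrary toy sequence, not a real protein.
hits = candidate_windows("IIIIIDDDDD", window=5, threshold=1.0)
print(hits)
```

In a fuller version each parameter would contribute its own per-residue track, and the tracks would be combined before thresholding.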
Transcription Factor Network Generation for CRC
• 117 transcription factors
• 277 non-transcription factors
• 700 interactions

Multi-level Multi-parametric Approach to Identify Significant Transcription Factors in CRC Network
• Topological analysis
  • Node strength = function(protein interaction propensity score, topological features)
• Sub-graph analysis
• Hypergeometric associations
• A multi-parametric approach is used to identify significant transcription factors.

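The slide gives node strength only as a function of the interaction propensity score and topological features, without the functional form. One possible instance, assuming a weighted sum of normalized degree and mean edge propensity (the form, the weight `alpha`, and the function name are all assumptions), is sketched below.

```python
def node_strength(edges, alpha=0.5):
    """Combine a topological feature (degree) with edge propensity scores.

    edges: iterable of (u, v, propensity) with propensity in [0, 1].
    alpha: weight of the topological term; 1 - alpha weights the propensity term.
    """
    degree, total = {}, {}
    for u, v, propensity in edges:
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1
            total[node] = total.get(node, 0.0) + propensity
    if not degree:
        return {}
    max_degree = max(degree.values())
    return {
        node: alpha * degree[node] / max_degree
        + (1 - alpha) * total[node] / degree[node]
        for node in degree
    }

# Toy network with made-up propensity scores.
toy_edges = [("A", "B", 0.9), ("A", "C", 0.7), ("A", "D", 0.5)]
strengths = node_strength(toy_edges)
print(max(strengths, key=strengths.get))
```

Other topological features (betweenness, clustering coefficient) could replace or join the degree term without changing the overall shape of the score.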
Results: Significant Transcription Factors in CRC Network
• Highly scored common transcription factors:
  • c-Jun, NF-kB, P53, STAT3, SP1, STAT1, c-MYC, E2F1, SMAD3, MEF2A
• Highly scored unique transcription factors:
  • Topological: LEF1, MEF2C, SMAD2, SMAD4, ELK-1, PPARA
  • Hypergeometric: DAND5, RXRA, ESR1, ATF-2, SP3, RARA, PPARD
  • Module: P73, ETS1, ETS2, GATA-1, FOXA1, FOXA2, SLUG, HAND1, SNAIL, VDR, TF7L2, ITF-2, REST, SRF, IRF1

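The hypergeometric association behind the second group is not specified in the slides. A common reading is an over-representation test, sketched here under that assumption: given a network of 394 nodes (117 transcription factors plus 277 non-transcription factors, as reported for the CRC network), how surprising is it to see at least a given number of transcription factors in a sub-group? The sub-group size and count in the example are made up, and `hypergeom_pvalue` is a hypothetical helper.

```python
from math import comb

def hypergeom_pvalue(population, successes, draws, observed):
    """Upper-tail probability P(X >= observed) for a hypergeometric variable.

    population: total number of nodes
    successes:  nodes carrying the label of interest (e.g. transcription factors)
    draws:      size of the sub-group being tested
    observed:   labelled nodes seen in the sub-group
    """
    lower = max(observed, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lower, upper + 1)
    )
    return tail / total

# Hypothetical sub-group: 10 nodes, 8 of them transcription factors.
p = hypergeom_pvalue(population=394, successes=117, draws=10, observed=8)
print(p)
```

A small p-value would flag the sub-group as enriched; with many sub-groups tested, a multiple-testing correction would normally follow.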
Result: A Highly-scored Module
• Module members shown: PIAS1, C-JUN, ATF-2, ESR1, MAPK14, MAPK1, JNK1, ELK-1, MK09, MK10

Validation of the Significant Genes

Global Transcription Factor Association Network showing Functional Groups

Annotation of miRNA with Transcription Factors in CRC
• Expression dataset: GSE14985
• 3 normal samples, 3 colon samples
• Number of miRNAs: 723
• The top 100 differentially expressed miRNAs are identified.
• 26 upregulated and 74 downregulated miRNAs are further analyzed.

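The slide does not state which statistic was used to pick the top 100 miRNAs. As a minimal sketch, assuming a ranking by absolute log2 fold change between the mean of the normal samples and the mean of the colon samples (an assumption, as are the pseudocount, the function name, and the toy values), the split into upregulated and downregulated sets could be computed like this.

```python
from math import log2
from statistics import mean

def rank_by_fold_change(normal, tumor, top_n=100, pseudocount=1.0):
    """Rank miRNAs by |log2 fold change| and split the top ones by direction.

    normal, tumor: dicts mapping miRNA name -> list of expression values.
    """
    changes = {
        name: log2((mean(tumor[name]) + pseudocount)
                   / (mean(normal[name]) + pseudocount))
        for name in normal
        if name in tumor
    }
    ranked = sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True)
    top = ranked[:top_n]
    up = [name for name, lfc in top if lfc > 0]
    down = [name for name, lfc in top if lfc < 0]
    return up, down

# Made-up expression values, three samples per group.
normal = {"miR-x": [10, 12, 11], "miR-y": [100, 90, 110], "miR-z": [50, 50, 50]}
tumor = {"miR-x": [40, 44, 48], "miR-y": [20, 25, 15], "miR-z": [50, 51, 49]}

up, down = rank_by_fold_change(normal, tumor, top_n=2)
print(up, down)
```

With only three samples per group, a moderated test from a dedicated expression-analysis package would usually be preferred over a bare fold change.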
Novel miRNA Identified: Up-regulated
Novel miRNA | Target of miRNA | Relevance to cancer
hsa-miR-663 | CCND1, FOS, PTEN, TGFBR1 | Not reported*
hsa-miR-630 | ATM, BAX, BCL2, BCL2L2, CASP3, p53, TP73 | Not reported*
hsa-miR-424 | ATF2, BCR, CCND1, CDK6, CHEK1, E2F1, EGFR, ESR1, ETS1, FLT3, HIF1A, MUC1, MYB, RARA, RUNX1, SMAD3, SP2, WEE1 | Kidney, pancreatic cancer
* The target genes were identified by literature mining, and many of these genes are important in CRC.

Novel miRNA Identified: Down-regulated
Novel miRNA | Target of miRNA | Disease
hsa-let-7c | BBC3, BCL2, MCL1, MEF2C, MYC, NGF, PPARA, ADAM9 | Lung, hepatocellular cancer
hsa-let-7d | BDNF, CCND1, EGFR, SMAD3 | Epithelial ovarian cancer
hsa-let-7i | BCL2, HIF1A, NFKB1, TLR4 | Breast cancer
hsa-miR-103 | BMP7, CDK6, PPARA | Pancreatic cancer
hsa-miR-100 | AKT1, CCND1, ESR1, FGFR3, JUN, P53, MYC | Oral squamous cell carcinoma
hsa-miR-99a | AKT1, BDNF, CCND1, JUN, IGF1, JUN, MYC, p53 | Bladder cancer
hsa-miR-30e | BCL2L2, ERBB2 | Lung cancer
hsa-miR-425 | SMAD3 | Glioblastoma
hsa-miR-361-5p | AKT1, IRS1 | Ovarian cancer
hsa-miR-494 | AKT1, CDK6, JUN, PTEN | Cardiac hypertrophy
hsa-miR-331-3p | AKT1, EGFR, ERBB2 | Epithelial ovarian cancer

miRNA-gene Network

Number of miRNAs Associated with CRC-Related Pathways

Validation of the Significant Genes
• Module: Brca1:p53:c-Myc
• Pathway: Brca1 as a transcription regulator
• Domain: DNA damage

Protein-Protein Interaction Prediction Tool
• Algorithm for interacting proteins
• Validation of the significant nodes

Publications
• M. Pradhan, P. Gandra, M. Palakal, Predicting Protein-Protein Interactions using First Principle Methods and Statistical Scoring, ACM International Symposium on Biocomputing, Calicut, 2010.
• M. Pradhan and M. Palakal, Global analysis of transcription factors and functional domains in CRC (Manuscript under preparation).
• M. Pradhan and M. Palakal, Identifying CRC specific pathways and biomarkers from literature augmented proteomics data, BIOCOMP 2010.
• M. Pradhan and M. Palakal, Global analysis of miRNA target genes in colon rectal cancer, IEEE BIBM, Hong Kong, 2010.
• M. Pradhan and M. Palakal, Global analysis of transcription factors in CRC using protein interaction networks (Manuscript in final stages).
• M. Pradhan and M. Palakal, Identifying candidate pathways and genes in CRC: meta-analysis of gene expression data (Manuscript in preparation).
• M. Pradhan and M. Palakal, Machine Learning for Predicting Protein Interactions (Manuscript in preparation).
• M. Pradhan, P. Sanders and M. Palakal, Algorithm for Protein-drug binding predictions (Manuscript in preparation).
• Y. Pandit, M. Pradhan and M. Palakal, Database for Protein-Protein Interaction Predictions (Manuscript in preparation).

Acknowledgements
• The TiMAP team: Meeta Pradhan, Shielly Hartanto, Premchand Gandra, Deepali Jhamb, Rini Pauly, Gokul Kilaru, Philip Sanders, Yogesh Pandit, Sijin C. A., Tulip Nadu, Kshithija Nagulapalli
http://regen.informatics.iupui.edu/research/