Molecular Networks in Mammals: Extraction from Literature and Microarray Analysis. by Ilya Mazo, Ph.D. It’s All About Pathways. Promise of Systems Biology. Understanding: Drug specificity Chemotherapy response Biomarker panels New target mechanisms. Building Models.
Molecular Networks in Mammals: Extraction from Literature and Microarray Analysis by Ilya Mazo, Ph.D.
Promise of Systems Biology
Understanding:
• Drug specificity
• Chemotherapy response
• Biomarker panels
• New target mechanisms
Building Models
• Identify the elements of the system
• Describe the interactions/regulations between such elements
• Simplify the system by identifying components (functional modules or pathways)
• Integrate/validate with experimental data
Available Pathway Information
[Chart: PubMed abstract count by year, 1965 to 2004; vertical axis from 0 to 14 mln abstracts]
MedScan Information Extractor
• Reads >1000 abstracts per minute
How does MedScan extract facts from text?
• Sentence in PubMed: “Axin binds beta-catenin and inhibits GSK-3beta.”
• Identify proteins in the dictionary: Axin, beta-catenin, GSK-3beta
• Identify interaction types: binds, inhibits
• Extracted facts:
  - Axin - beta-catenin; relation: Binding
  - Axin -> GSK-3beta; relation: Regulation, effect: Negative
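The dictionary-and-verb idea on this slide can be illustrated with a toy sketch. It is not MedScan's method (the real system uses a proprietary grammar and lexicon); the dictionaries, the function name `extract_facts`, and the heuristic "the first protein is the subject of every verb" are assumptions made only for this example.

```python
import re

# Hypothetical mini-dictionaries for illustration only.
PROTEINS = ["Axin", "beta-catenin", "GSK-3beta"]
VERBS = {
    "binds": {"relation": "Binding"},
    "inhibits": {"relation": "Regulation", "effect": "Negative"},
}

def extract_facts(sentence):
    """Naive extraction: tag proteins and verbs, then pair each verb
    with the sentence's first protein (subject) and the next protein (object)."""
    protein_re = "|".join(
        re.escape(p) for p in sorted(PROTEINS, key=len, reverse=True)
    )
    verb_re = "|".join(VERBS)
    pattern = rf"(?P<p>{protein_re})|(?P<v>\b(?:{verb_re})\b)"
    tokens = [(m.start(), m.lastgroup, m.group())
              for m in re.finditer(pattern, sentence)]

    proteins = [t for t in tokens if t[1] == "p"]
    if not proteins:
        return []
    subject = proteins[0][2]  # assumption: coordinated verbs share one subject

    facts = []
    for pos, kind, text in tokens:
        if kind != "v":
            continue
        # Object = first protein mentioned after the verb, if any.
        target = next((p[2] for p in proteins if p[0] > pos), None)
        if target is not None:
            facts.append({"source": subject, "target": target, **VERBS[text]})
    return facts

facts = extract_facts("Axin binds beta-catenin and inhibits GSK-3beta.")
for fact in facts:
    print(fact)
```

A heuristic this simple fails on passive voice, negation, and nested clauses, which is presumably why the full pipeline parses sentence structure before interpreting it.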
Overview of MedScan Architecture
[Diagram: processing pipeline from input text to extracted facts]
• Components: Preprocessor, Tokenizer, Pattern Matcher, Syntactic Parser, Semantic Interpreter, Ontological interpreter, Converter
• Resources: protein names dictionary, lexicon, grammar, extraction rules, extraction patterns
• Data: Input Text, Tagged Sentences, Sequence of Words, Sentence Structure, Semantic tree, Database of relations, Extracted facts
• Dictionary-based step: identifies proteins and small molecules
• Parsing uses a context-free grammar. Grammar and lexicon are proprietary. They are domain-independent by design but focused on the biomedical field.
• Rule-based step: rules are equivalent to ontology
• Tagged sentence: [Transcription] [factor] {7157=p53} [activates] [apoptosis] [in] [hepatocytes]
• PubMed (7 mln abstracts) -> MedScan -> 1,000,000 facts -> ResNet Database, Database of Pathways
• >94% precision, >70% recovery
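For readers unfamiliar with the two quality figures, here is a minimal sketch of how precision and recovery are usually computed against a manually curated reference set. Reading "recovery" as recall is an assumption, and the toy fact sets are hypothetical, not data from the talk.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted facts against a curated reference."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical toy data: four curated facts, three of them extracted.
reference = {
    ("Axin", "beta-catenin", "Binding"),
    ("Axin", "GSK-3beta", "Regulation"),
    ("p53", "apoptosis", "Regulation"),
    ("MEK-1", "ERK1/2", "Protein Modification"),
}
extracted = {
    ("Axin", "beta-catenin", "Binding"),
    ("Axin", "GSK-3beta", "Regulation"),
    ("p53", "apoptosis", "Regulation"),
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with lower recovery means that most extracted facts are correct, while some facts present in the text are missed.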
Extracted Information
1,002,807 relations (3.7 mil. findings extracted from 2005 Medline and 43 FTJ)

Relation Type               Count
Expression Control         99,361
Binding                    50,812
Protein Modification       25,368
Mol. Synthesis             99,643
Mol. Transport             48,423
Regulation                675,539
Promoter Binding            3,661
Total: protein relations 1,002,807
Build Pathway (Find Neighbors)
[Figure: panels labeled 2003, 2004, 2005, 2006]
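A "find neighbors" operation on a relation network can be sketched as a breadth-first search. The function name `find_neighbors`, the `depth` parameter, and the small relation list (assembled from example sentences elsewhere in this deck) are assumptions for illustration, not the PathwayStudio implementation.

```python
from collections import deque

def find_neighbors(relations, seed, depth=1):
    """Return all entities within `depth` relation steps of `seed`."""
    adjacency = {}
    for a, b in relations:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    distance.pop(seed)
    return set(distance)

# Toy relation list for illustration.
relations = [
    ("Axin", "beta-catenin"),
    ("Axin", "GSK-3beta"),
    ("MEK-1", "ERK"),
    ("ERK", "ELK-1"),
]
print(find_neighbors(relations, "ERK"))
print(find_neighbors(relations, "MEK-1", depth=2))
```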
Mechanistic Model of Disease
+ genes harboring DAVs associated with Type 2 Diabetes Mellitus (from Mol Cell Proteomics, Sharma et al 2005):

ADCYAP1   LEPR
ADRB2     LECAM-1
ADRB3     NOS3
AGT       NPY
APM1      NR3C1
CD38      NR3C1
FABP2     PC-1
GCGR      PGC 1
GFPT      PLA2G4A
GYS1      PON 1
HFE       PON 2
HFE       PON2
HNF1a     PPAR γ2
HNF4a     PPP1R3
ICAM1     PTPN1
INSR      RAGE
IRS 1     SOD2
IRS 2     TGF β
KCNJ11    UCP 1
KCNJ11    UCP2
Building Models
• Identify the elements of the system
• Describe the interactions/regulations between such elements
• Identify functional modules (pathways)
• Integrate/validate with experimental data
Signaling Paths/Cascades
[Figure labels: Logical relations, Physical relations]
• EGFR signaling including activation of Erk2 and the ELK-1 transcription factor
• The MAP and ERK kinase (MEK-1) is a dual-specificity kinase that phosphorylates ERK1/2 on T-E-Y.
• ERK can phosphorylate and activate transcription factors such as TCF/ELK-1.
Inferring Cascades
• A simple protein classification schema and the membrane-to-nucleus signaling paradigm can be applied:
  - Receptor
  - Ligand
  - Extracellular
  - Transcription factor
  - Nuclear receptor
  - Effector
• It allows the network to be partitioned into several hundred “signaling cascades”.
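One way to picture the membrane-to-nucleus idea is a path search that starts at a receptor, passes only through effectors, and stops at a transcription factor or nuclear receptor. The sketch below is an assumption about how such a partitioning could work, not the algorithm used in the talk; the function name, the `max_len` cutoff, and the direct EGFR -> MEK-1 edge (a deliberate shortcut) are hypothetical.

```python
START_CLASSES = {"Receptor"}
MIDDLE_CLASSES = {"Effector"}
END_CLASSES = {"Transcription factor", "Nuclear receptor"}

def infer_cascades(edges, protein_class, max_len=6):
    """Enumerate receptor -> effector(s) -> transcription factor paths."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    cascades = []

    def extend(path):
        for nxt in graph.get(path[-1], []):
            if nxt in path:
                continue  # avoid cycles
            cls = protein_class.get(nxt)
            if cls in END_CLASSES:
                cascades.append(path + [nxt])
            elif cls in MIDDLE_CLASSES and len(path) + 1 < max_len:
                extend(path + [nxt])

    for protein, cls in protein_class.items():
        if cls in START_CLASSES:
            extend([protein])
    return cascades

# Toy network; the EGFR -> MEK-1 edge is a hypothetical shortcut.
protein_class = {
    "EGFR": "Receptor",
    "MEK-1": "Effector",
    "ERK": "Effector",
    "ELK-1": "Transcription factor",
}
edges = [("EGFR", "MEK-1"), ("MEK-1", "ERK"), ("ERK", "ELK-1")]
cascades = infer_cascades(edges, protein_class)
print(cascades)
```

Each receptor then defines its own cascade, and the union of paths sharing a start or end point gives one partition of the global network.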
Regulomes as Canonical Pathways
• 200 textbook pathways
• 700 inferred regulomes
• 60% average overlap, P < 10^-4
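The slide does not say how overlap or its significance was computed. A common choice, shown here only as an assumption, is the fraction of a textbook pathway's members found in a regulome, with a one-sided hypergeometric test for the chance of seeing at least that many shared proteins. All names and numbers in the example are hypothetical.

```python
from math import comb

def overlap_fraction(pathway, regulome):
    """Fraction of pathway members that also appear in the regulome."""
    pathway, regulome = set(pathway), set(regulome)
    return len(pathway & regulome) / len(pathway)

def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from
    `population` items, of which `successes` are marked."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)

# Hypothetical example: a 5-protein pathway and a 5-protein regulome
# sharing 3 proteins, in a network of 1000 proteins.
pathway = {"P1", "P2", "P3", "P4", "P5"}
regulome = {"P1", "P2", "P3", "X1", "X2"}
shared = len(pathway & regulome)
fraction = overlap_fraction(pathway, regulome)
p_value = hypergeom_tail(shared, 1000, len(pathway), len(regulome))
print(f"overlap={fraction:.0%} p={p_value:.2e}")
```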
Regulomes as Logical Models
• Use dependency relations to determine the “area of influence” for target proteins (receptors, kinases)
  1) Both PP1 and expression of dominant negative c-Src inhibited PDGF-induced PI 3 kinase.
  2) A pharmacologic inhibitor of c-Src, PP1
• Logical models: “what if?”
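A "what if?" query on a logical model can be sketched as sign propagation over a graph of positive and negative regulation. The edge list below is one reading of the two example sentences (PP1 inhibits c-Src; c-Src is required for PI 3 kinase activation), and the function `what_if` is a hypothetical illustration rather than the model used in the talk.

```python
from collections import deque

def what_if(edges, node, change):
    """Propagate a perturbation (+1 = up, -1 = down) along signed edges.
    The first sign to reach a node wins; conflicting paths are not resolved."""
    effects = {node: change}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for src, dst, sign in edges:
            if src == current and dst not in effects:
                effects[dst] = effects[current] * sign
                queue.append(dst)
    return effects

# Signed dependency relations: (regulator, target, +1 activates / -1 inhibits).
edges = [
    ("PP1", "c-Src", -1),
    ("c-Src", "PI 3 kinase", +1),
]

# What if PP1 is added? c-Src goes down, and PI 3 kinase follows.
effects = what_if(edges, "PP1", +1)
print(effects)
```

The set of nodes reached from a perturbed protein corresponds to its "area of influence" in this simplified picture.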
Building Models
• Identify the elements of the system
• Describe the interactions/regulations between such elements
• Identify functional modules (pathways)
• Integrate/validate with experimental data
Find significant regulators
• Experimental dataset: melanoma, aggressive vs. non-aggressive cell lines, flat vs. 3D growth conditions (Folberg and Arbieva, UIC).
[Figure annotations: p=1e-5, p=0.24, p=0.0004]
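The slide does not name the statistical test behind these p-values. As one possible illustration, the sketch below scores a regulator by the mean absolute expression change of its known targets and estimates a p-value by comparing against random gene sets of the same size. The function name, the scoring rule, and the synthetic data are all assumptions.

```python
import random

def regulator_p_value(targets, changes, n_perm=1000, seed=0):
    """Permutation p-value for a regulator whose targets are `targets`.
    `changes` maps every measured gene to its log expression ratio."""
    rng = random.Random(seed)
    genes = list(changes)

    def score(gene_set):
        return sum(abs(changes[g]) for g in gene_set) / len(gene_set)

    observed = score(targets)
    hits = sum(
        score(rng.sample(genes, len(targets))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Synthetic data: 100 unchanged genes plus 5 strongly changed targets.
changes = {f"g{i}": 0.25 for i in range(100)}
changes.update({f"t{i}": 3.0 for i in range(5)})

p_strong = regulator_p_value([f"t{i}" for i in range(5)], changes)
p_weak = regulator_p_value([f"g{i}" for i in range(5)], changes)
print(p_strong, p_weak)
```

A regulator whose targets change no more than random genes gets a p-value near 1, while one whose targets are among the most changed genes gets a small p-value.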
Prediction of Activity Profiles
• Activity as a function of expression level and the ability to induce changes in the targets
• Markov random field formalism
Combining the Approaches
• Start with the global network of interactions
• Add expert knowledge
• Infer subnetworks (individual pathways)
  - Signaling cascades and regulomes
  - Phenotype or disease association
  - Regulators and downstream targets
  - Advanced models
• Use available data (microarrays, proteomics) to screen for relevant pathways
• Add validated pathway libraries to the software package
Integrated Systems Biology Platform
[Diagram labels]
• Web Client
• Client PC: PathwayStudio, Tools, Local DB
• Linux Server: PathwayExpert, Tools, Tomcat, Java, MedScan, Central DB (Oracle/PostgreSQL)