Systems Biology for Drug Discovery. Building and using protein interaction networks: industry perspective. Andrej Bugrim GeneGo, Inc. Topics. Annotation process and collecting network content for industrial-type applications
Systems Biology for Drug Discovery Building and using protein interaction networks: industry perspective Andrej Bugrim GeneGo, Inc.
Topics • Annotation process and collecting network content for industrial-type applications • Biological and disease ontologies: how to improve and use them in functional analysis • Tools: utilizing network data in pharmaceutical R&D
Multi-level understanding of human biology • Level of phenotype • Level of cell process/network • Level of protein • Links between levels: causative relations, mechanistic relations
Disease-centered knowledge base in MetaMiner (Oncology example) • GG annotation team: Disease group, Network group, Specialty group, Chemistry group • Annotated content: causative disease associations (DNA, RNA, protein levels); protein-protein, protein-DNA, protein-RNA interactions; biomarkers; ligand-receptor interactions (drugs, leads, hits) • Compare: general BC schema; causative BC models; BC-perturbed cell processes; other cancers chosen by Consortium
Three interaction domains in MetaCore • 1,600 drugs w/targets • 4,100 endogenous metabolites • >21,000 ligand-receptor interactions • 850 GPCRs and other membrane receptors • 110 Nuclear hormone receptors Ligands: metabolites, peptides, xenobiotics Membrane receptors Signal transduction: G proteins, Secondary messengers Kinases Phosphatases 172K manually curated physical signaling interactions 538 canonical maps 42,000 13-step canonical signal transduction pathways 924 Human transcription factors 6,000 target genes Transcription factors 11,300 metabolic reactions 116 Fine metabolic maps Core effect: metabolic pathways Metabolites 4,100 endogenous metabolites
MetaBase Content Overview • Database • Chemical compounds 580,000 • Drugs 8,590 • Chemical Reactions 35,600 • Metabolic networks 251 • Network • Proteins + genes 13,402 • Transcription factors 924 • Chemical compounds 26,000 • Drugs 2,740 • Endogenous compounds 4,100 • Proteins linked to drugs 2,711 • Reactions 5,330 • Small molecule ligands for human receptors 3,510 • Blockers for ion channels 629 • PubMed journals 3,100 • PubMed articles 81,400 • Total number of interactions 177,000 • Content • GeneGo regulatory networks 120 • GeneGo disease networks 88 • Maps 538 • Regulatory maps 325 • Metabolic maps 116 • Traditional metabolic maps (EC) 97 • Diseases 4,920
MetaBase content by type • Database: chemical compounds 580,000 • Metabolic reactions 35,600 • Human proteins 14,570 • Genes (human: 38,700) total: 137,500
Network interactions: manually curated interactions (172,787) • Logical relations: 1,934 (1%) • Signalling interactions: 137,297 (79%) • Protein-protein: 87,675 (51%) • Small molecule-protein: 42,383 (26%) • Metabolic reactions: 35,490 (21%) • Y2H "Interactome": 2,370 (1%) • With virus proteins: 335 (0%) • ChIP-chip: 980 (1%) • With microRNA: 1,620 (1%) • All interactions taken from articles indexed in PubMed • PubMed journals 3,100 • PubMed articles 81,400
Network objects Total number of nodes: 40,229
Endogenous compounds (4,100 total) • 3,070 endogenous compounds involved in metabolic reactions: 6,819 reactions with endogenous compounds only • 751 endogenous ligands for 498 receptors with 2,455 interactions • 4,000 (98%) of endogenous compounds in network • 15,962 network interactions with endogenous metabolites • 3,600 compounds with structures and brutto-formulas (the other 700 are “generic”: they contain acyl-, alkyl- and other variable groups)
Network and pathway statistics in GeneGo (diagram labels: Enzyme1, Enzyme2, reaction1, metabolite, reaction2) • >40,000 nodes; • ~177,000 edges; • Average node degree: 3.77; • 241 million shortest paths; • Average shortest path length: 5.3811; • 42,000 13-step canonical signal transduction pathways; • 200 canonical metabolic pathways: major metabolic fluxes like glycolysis or TCA; • 72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)
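The statistics on this slide (node count, edge count, average degree, average shortest path length) can be illustrated on a toy network. The following is a minimal sketch, not GeneGo's implementation: the slide does not say whether degree is counted per direction or how unreachable pairs are handled, so this version assumes a directed graph, reports average out-degree, and averages path length over reachable ordered pairs only. The node names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build a directed adjacency map; duplicate edges collapse in the sets."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
    return adjacency


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: hop counts from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def network_stats(edges):
    """Node/edge counts, average out-degree, and mean shortest path length."""
    adjacency = build_adjacency(edges)
    n_nodes = len(adjacency)
    n_edges = sum(len(targets) for targets in adjacency.values())
    total_length = 0
    reachable_pairs = 0
    for start in adjacency:
        for end, hops in shortest_path_lengths(adjacency, start).items():
            if end != start:
                total_length += hops
                reachable_pairs += 1
    average_length = total_length / reachable_pairs if reachable_pairs else 0.0
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "average_out_degree": n_edges / n_nodes if n_nodes else 0.0,
        "reachable_pairs": reachable_pairs,
        "average_shortest_path": average_length,
    }


# Hypothetical five-node signalling chain with one shortcut edge.
toy_edges = [
    ("ligand", "receptor"),
    ("receptor", "kinase"),
    ("kinase", "TF"),
    ("TF", "gene"),
    ("receptor", "TF"),
]
stats = network_stats(toy_edges)
print(stats)
```

Running one breadth-first search per node is quadratic in the worst case, which is acceptable for a toy graph but would need batching or sampling on a network with tens of thousands of nodes.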
Pathways in regulatory network • Start: TMR (transmembrane receptor) • TF (transcription factor) • End: target genes
Knowledge base (ontologies) • By genre: Drama, Action, Romance, Horror, Foreign • By director: Lynch, Tarantino, Leone, Stone, Antonioni • By actor: Pitt, Nicholson, Depp, Redford, Damon • By year: 2007, 2006, 2005, 2004, 2003 • Mixed ontologies: molecular pathway, cellular process, disease, metabolic process • How do you compare “action” movies vs. Tarantino movies vs. 2003 movies? • These are incomparable because they belong to different categories
Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology
Enrichment in GO and GeneGo processes • GO processes: resolution is a list of proteins; no connections between proteins; no signaling/effect within process • GeneGo process networks: resolution is interactions between proteins; connections between all proteins in folder; clear signaling path, effect within process • Data: 4 samples from 4 patients; disease/norm from same patients; Affy U133A arrays
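Enrichment of a gene list in a process is commonly scored with a hypergeometric tail probability. The slides do not state which test MetaCore uses, so the following is a sketch of that standard approach under this assumption; the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, category_size, sample_size, overlap):
    """Probability of seeing at least `overlap` category genes in a random
    sample of `sample_size` genes drawn from `population` genes, of which
    `category_size` belong to the category (hypergeometric upper tail)."""
    largest_overlap = min(category_size, sample_size)
    favourable = sum(
        comb(category_size, k) * comb(population - category_size, sample_size - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return favourable / comb(population, sample_size)


# Hypothetical example: 10 genes measured, 4 belong to the process,
# 3 are differentially expressed and 2 of those are in the process.
p_value = enrichment_p_value(population=10, category_size=4,
                             sample_size=3, overlap=2)
print(p_value)
```

The same calculation applies whether the category is a GO process (a plain gene list) or a GeneGo process network; the difference described on the slide is in what the category contains, not in how the overlap is scored.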
Inflammation • Genes from GO process “Inflammatory response”: 231 (in networks 152; not in networks 79) • Genes from GO process “Immune response”: 446 (in networks 247; not in networks 199) • Genes from GO processes “Inflammatory response” and “Immune response” combined: 613 (in networks 345; not in networks 268) • Genes in 15 process networks: 1,642 • Genes added to networks: 1,297
Diseases • 4,881 diseases, based on MeSH • 38,709 human genes total • Human genes linked to diseases: 6,318 • Diseases linked to genes: 1,630 • Human genes not linked to diseases: 32,391 • Diseases with no gene links: 3,251 • 6,318 genes are linked to 1,630 diseases • 21,264 unique articles, indexed in PubMed
Drug toxicity tree 38 Drug-induced pathological processes Folders from MeSH Folders created at GeneGo based on reviews
Gene-Disease connections in public domain and GeneGo • OMIM: only genetic info (mutations, SNPs); no expression; no protein activity, localization • GENE: only citation with disease name; low trust • MeSH: only hierarchical structure of the disease tree • Public domain does not have structured information about disease connectivity (by clinical classification) and causative relations with genes and proteins • GeneGo: hierarchical structure; disease classification, 4,888 diseases; genes associated with diseases: 6,429; cited articles: 33,792
Content. Cancer maps and networks. Breast Cancer: general scheme
Fine metabolic differences between rodents, human (Human vs. Mouse, Rat) • Unique genes and orthologs catalyse one reaction: 141 mouse genes, 74 rat genes • There are no human orthologs for Protein A • Unique genes catalyse unique reactions: 9 mouse genes, 2 rat genes • Orthologs catalyse different reactions: 1 mouse gene, 1 rat gene
Data analysis workflow in MetaDiscovery suite • Custom interactions data: • Y2H • Pull-down • Co-expression • annotation Custom maps, networks, pathways Molecular bio data ISIS DB MetaLink PathwayEditor MapEditor • P-value scoring • Ontologies: • GO processes • GeneGo processes • Canonical pathways • Metabolic networks • Diseases • Toxicities • Cross-experiment comparison • Time series • Multi-patient cohorts • Multiple logical operations • Complete report • Signature networks • Diseases • Drug response • Network alignment • Multiple algorithms • Sub-network queries SBML, BioPax • Modeling software: • CellDesigner • Virtual Cell • Med. chemistry: • Indications • Toxicities • Off-site effects Structures sdf, MOL Metabolites HTS, HCS HTS, HCS MetaCore/MetaDrug platform Biology: • Biomarkers • Pathway-based targets
MetaCore™ Platform • Network building tools • Statistics for pathways, processes, networks • Pathway editor • Visualization tools • Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions • Logical operations module • Curated interactions from the literature • Oracle-based database
Integration • Networks of protein interactions: dynamic, built “on-the-fly”; exploratory tool; build new pathways for genes of interest • Pathways: interactive, static maps; 550 maps; signaling, regulation, metabolism, diseases; backbone of formalized “state of the art” in the field
Choose direction and checkpoints within network building page From – histamine through – histamine H1 receptor to – Actin
False discovery rate filter • Threshold: 0.01 • Apply • Non-significant bars become semi-transparent
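A false discovery rate filter with a threshold such as 0.01 can be sketched with the Benjamini-Hochberg step-up procedure. The slide does not name the FDR method behind the filter, so this choice is an assumption, and the function name and p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, threshold=0.01):
    """Return one flag per input p-value: True if it passes the
    Benjamini-Hochberg step-up procedure at the given FDR threshold."""
    count = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(count), key=lambda index: p_values[index])
    # Find the largest rank k whose p-value is <= (k / count) * threshold.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / count * threshold:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared significant.
    significant = [False] * count
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


# Hypothetical p-values for five enrichment bars.
flags = benjamini_hochberg([0.001, 0.2, 0.004, 0.03, 0.0001], threshold=0.01)
print(flags)
```

In a display like the one described, bars whose flag is False would be the ones drawn semi-transparent.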
New customization modules • MapEditor: custom maps synchronized with MC/MD database • Draw pathways maps from scratch • Transform gene lists into networks into pathway maps • Edit MetaCore’s canonical maps • View and score your maps within the context of canonical maps • Map experimental data on custom maps • MetaLink: overlaying custom interactions • Import custom interactions (Y2H, co-expression, pull-down, etc.) • Visualize using GeneGo network building algorithms • Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions • PathwayEditor: annotation technology transfer, at the database level • Custom annotation of interactions, compounds, diseases, metabolism in the framework of internal annotation system at GeneGo • Use the annotation forms, workflows and QC system developed at GeneGo • Novel objects are imported and integrated with pre-existing data in MetaCore
Adding Localizations Additional Localizations can be added
Your NEW map is now an interactive part of MetaCore Users can visualize their experimental data on the new map
Mapping interaction sets on networks • Resulting Direct Interactions network • Pink interactions are from the uploaded links file • Blue interactions are in both the links file and the MetaCore database • Mouse over an interaction to see the uploaded weight value
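The colour coding of overlaid interactions amounts to a set membership check. The sketch below assumes that links are directed (source, target) pairs and that pink means an uploaded link absent from the reference set; the function name, data layout, and protein names are hypothetical and do not describe MetaLink's file format.

```python
def classify_links(uploaded_links, database_links):
    """Label each uploaded link 'blue' if the reference set also contains it,
    otherwise 'pink', and keep the uploaded weight alongside the label."""
    labelled = {}
    for (source, target), weight in uploaded_links.items():
        colour = "blue" if (source, target) in database_links else "pink"
        labelled[(source, target)] = {"colour": colour, "weight": weight}
    return labelled


# Hypothetical uploaded links (with weights) and reference interactions.
uploaded = {("ProtA", "ProtB"): 0.9, ("ProtB", "ProtC"): 0.4}
reference = {("ProtA", "ProtB"), ("ProtC", "ProtD")}
overlay = classify_links(uploaded, reference)
print(overlay)
```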
Old and new ways to analyze data • Current way of analysis: all significance calculations done before mapping onto network. Full data tables → statistical procedures, thresholds of fold, p-value either in MC or 3rd party tools → sets of genes → connect them on network by one way or another (too many choices, no clear way to choose) • New way of analysis: significance calculations follow the mapping onto network. Full data tables → apply to global network → statistical procedures in MC based on concurrent analysis of expression profiles and connectivity → sets of network modules
Network signatures for compound effects Mestranol Phenobarbital Tamoxifen Phenobarbital
Finding topologically significant nodes • Node B, topologically significant: 4 out of 6 nodes regulated by B are differentially expressed, more than a random share = significant • Node C, not topologically significant: only 1 out of 6 nodes regulated by C is differentially expressed, which could be due to a random event = not significant • In reality the algorithm also considers nodes beyond first-degree neighbors • Legend: differentially expressed genes; non-differentially expressed genes • Diagram nodes: A, B, C
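The "more than a random share" argument for nodes B and C can be made concrete with a binomial tail probability. This is a simplified first-degree-neighbor sketch, not the algorithm from the slides, which also looks beyond first-degree neighbors. The background fraction of differentially expressed genes (0.1) is a hypothetical value that the slides do not give.

```python
from math import comb


def binomial_tail(neighbours, hits, background_fraction):
    """Probability that at least `hits` of `neighbours` regulated nodes are
    differentially expressed if each is hit independently with probability
    `background_fraction`."""
    return sum(
        comb(neighbours, k)
        * background_fraction ** k
        * (1.0 - background_fraction) ** (neighbours - k)
        for k in range(hits, neighbours + 1)
    )


# Assumed background: 10% of all genes are differentially expressed.
background = 0.1
p_node_b = binomial_tail(neighbours=6, hits=4, background_fraction=background)
p_node_c = binomial_tail(neighbours=6, hits=1, background_fraction=background)
print(p_node_b, p_node_c)
```

Under this assumed background, seeing 4 of 6 targets hit is far less likely by chance than seeing 1 of 6, which matches the qualitative distinction drawn between B and C.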
Why is JAK1 significant in this dataset? Regulation via JAK1 Feedback loops • JAK1 provides essential network conduit between PLAUR and many differentially expressed targets of STAT1 • Topological significance helps to find important links in pathways that do not come up on HT screens
Regulation of lipid Metabolism Topologically significant nodes revealed by the new algorithm Differentially expressed genes identified by microarray and confirmed by proteomic screen
Putting it all together: network activity inference • Identifying causal relation between putative input and output signals • Tracking effects of molecular perturbation through activation/inhibition cascades Predicted input Scoring intermediary nodes Experimental data Experimental data: terminate cascade Predicted target Experimental data: start cascade Inferred activity
Work in progress • Finding patterns of significance (based on one experiment): • Significant neighborhoods • Significant receptors (by underlying cascade) • Significant transcription factors (by upstream cascade) • Significant interaction types (by distribution of expression at terminals) • Finding common and different pathway modules (based on multiple samples): • Looking for “differential pathways”: modules that distinguish one group of samples from another • Finding common motifs in a group of pathway modules • Inferring patterns of network activity • Identifying causal relation between putative input and output signals • Tracking effects of molecular perturbation through activation/inhibition cascades • Looking into mutual gene-process information and Bayesian inference of significance • If gene G occurs only in process P, its up-/down-regulation is significant evidence with respect to inferring P’s status • If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence with respect to inferring P’s status
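The last two bullets can be illustrated with a small application of Bayes' rule. The sketch below is a deliberately simple model and not the method under development in the slides: it assumes that exactly one process is perturbed, that a regulated gene must belong to that process, and that all processes are equally likely a priori unless a prior is supplied. All gene and process names are hypothetical.

```python
def posterior_over_processes(gene, process_genes, prior=None):
    """Posterior probability of each process being the perturbed one, given
    that `gene` is up- or down-regulated. `process_genes` maps a process
    name to the set of genes annotated to it."""
    names = list(process_genes)
    if prior is None:
        # Assumption: uniform prior over processes.
        prior = {name: 1.0 / len(names) for name in names}
    # Likelihood is 1 if the process contains the gene, else 0.
    likelihood = {
        name: 1.0 if gene in process_genes[name] else 0.0 for name in names
    }
    evidence = sum(likelihood[name] * prior[name] for name in names)
    if evidence == 0.0:
        return {name: 0.0 for name in names}
    return {name: likelihood[name] * prior[name] / evidence for name in names}


# Hypothetical annotation: G1 is specific to P, G2 is shared by all three.
annotation = {"P": {"G1", "G2"}, "Q": {"G2"}, "R": {"G2", "G3"}}
specific = posterior_over_processes("G1", annotation)
shared = posterior_over_processes("G2", annotation)
print(specific["P"], shared["P"])
```

In this toy model a gene found only in P pins the posterior on P, while a gene shared by k processes spreads it evenly across them, which is the intuition stated in the bullets.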
MetaMiner Consortiums for 2007 • Oncology (breast cancer, 4 other cancers) • Metabolic diseases (diabetes II, obesity, metabolic syndrome) • CNS and neurodegenerative diseases • Immunological and autoimmune diseases