Unweaving regulatory networks: Automated extraction from literature and statistical analysis
Overview of the talk • Introduction: project participants & jigsaw puzzle analogy • Project motivation • Duality of signal transduction language • How the whole system works • Good and ugly graphs • -------------------------------------------- • Scale-free networks in biology and outside; mechanism of stochastic birth of scale-free networks
Our project is set up as a collaboration of three departments of Columbia University
Interdisciplinary collaboration: • Department of Medical Informatics, Columbia University (Carol Friedman, Pauline Kra, Michael Krauthammer, Yu Hong, Andrey Rzhetsky) • Department of Computer Science, Columbia University (Vasileios Hatzivassiloglou, Pablo Ariel Duboue, Wubin Weng) • Columbia Genome Center, Columbia University (Pavel Morozov, Tomohiro Koike, Shawn Gomez, Sabina Kaplan, Sergey Kalachikov, Jim Russo, Andrey Rzhetsky)
Studying living organisms is not unlike playing with a jigsaw puzzle…
Our long-term objective: develop computational tools for automated compilation and analysis of complex cell regulation cascades in vertebrates
Problem/Motivation: At the time of this talk, a PubMed search with the keywords “cell cycle” and “apoptosis” returned 169,293 and 29,961 articles, respectively. Clearly, it is not feasible to scan all these papers “manually”...
We decided (i) to develop tools for automatic retrieval of binary regulatory relationships between molecules from the research literature, using natural language processing techniques, and (ii) to use the extracted knowledge for editing, visualizing, and superimposing/comparing homologous networks.
We call the system GENIES (GENomics Information Extraction System).
Application of Artificial Intelligence techniques: Natural Language Processing. Goal: to identify binary relationships of the form “protein A activates protein B” or “protein B inactivates gene C”.
The language of regulatory pathways differs significantly from the language of metabolic pathways.
Representation: We represent a pathway as a series of overlapping “links”, i.e., substance/action/substance triplets. [Figure: a chain of links connecting Substance A, Substance B, Substance C, and Substance D]
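The triplet representation can be sketched as a small data structure. This is a minimal illustration, assuming hypothetical field names (`upstream`, `action`, `downstream`) and placeholder actions for the A-to-D chain; it is not the GENIES data model. Two links overlap when the downstream substance of one is the upstream substance of the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One substance/action/substance triplet (hypothetical field names)."""
    upstream: str
    action: str
    downstream: str


def is_chain(links: list[Link]) -> bool:
    """True if each link's downstream substance is the next link's upstream substance."""
    return all(a.downstream == b.upstream for a, b in zip(links, links[1:]))


def substances(links: list[Link]) -> list[str]:
    """Substances along a chain, in order, with each shared substance listed once."""
    if not links:
        return []
    return [links[0].upstream] + [link.downstream for link in links]


# Placeholder actions; the figure only shows the substances.
pathway = [
    Link("A", "activates", "B"),
    Link("B", "inactivates", "C"),
    Link("C", "activates", "D"),
]
print(is_chain(pathway))    # True
print(substances(pathway))  # ['A', 'B', 'C', 'D']
```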
We realized that current research literature in molecular biology describes pathways at two different levels: logical and biochemical.
Logical: A activates B; A inactivates B. Biochemical: A phosphorylates B; A methylates B; ...
Dualism: in the biochemical representation, substance A is not a participant in the action, while it is in the logical representation. [Figure panels: Logical, Biochemical]
Both logical and biochemical descriptions can be combined in the same sentence: “Activated raf-1 phosphorylates (biochemical) and activates (logical) mek-1.”
The paper describing the “knowledge model” (i.e., the ontology) will appear in Bioinformatics.
“Actions” are relatively few: one can provide an exhaustive list of them.
Each action comes with a mechanism (biochemical representation) and a result (logical representation).
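A minimal sketch of an action that carries both levels, assuming hypothetical class and field names and only the example verbs mentioned in this talk (not the exhaustive action list of the knowledge model). Either level may be missing when a sentence states only one of them; the hypothetical `merge` helper combines two partial descriptions of the same agent/target pair, as in the raf-1/mek-1 sentence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    """Biochemical level (illustrative subset, not an exhaustive list)."""
    PHOSPHORYLATION = "phosphorylation"
    METHYLATION = "methylation"


class Result(Enum):
    """Logical level (illustrative subset)."""
    ACTIVATION = "activation"
    INACTIVATION = "inactivation"


@dataclass(frozen=True)
class Action:
    agent: str
    target: str
    mechanism: Optional[Mechanism] = None  # None if the text states no mechanism
    result: Optional[Result] = None        # None if the text states no result


def merge(x: Action, y: Action) -> Action:
    """Combine two partial descriptions of the same agent/target pair."""
    if (x.agent, x.target) != (y.agent, y.target):
        raise ValueError("descriptions refer to different agent/target pairs")
    if x.mechanism is not None and y.mechanism is not None and x.mechanism != y.mechanism:
        raise ValueError("conflicting mechanisms")
    if x.result is not None and y.result is not None and x.result != y.result:
        raise ValueError("conflicting results")
    return Action(
        x.agent,
        x.target,
        x.mechanism if x.mechanism is not None else y.mechanism,
        x.result if x.result is not None else y.result,
    )


# "Activated raf-1 phosphorylates and activates mek-1."
biochemical = Action("raf-1", "mek-1", mechanism=Mechanism.PHOSPHORYLATION)
logical = Action("raf-1", "mek-1", result=Result.ACTIVATION)
combined = merge(biochemical, logical)
print(combined.mechanism.value, combined.result.value)  # phosphorylation activation
```

Keeping the two levels as separate optional fields mirrors the duality described above: a purely logical statement such as “A inactivates B” simply leaves `mechanism` empty.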
Gene and protein names are numerous (currently >80,000) and the number is growing
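Because this name space is open-ended, recognizing names is typically a lookup against a large lexicon rather than a grammar rule. The sketch below shows one generic way to do it (greedy longest-match tagging with Python's `re` module). It is an assumption for illustration, not necessarily how GENIES's term markup works, and the five-entry `LEXICON` is a hypothetical stand-in for a real dictionary of tens of thousands of names.

```python
import re

# Hypothetical mini-lexicon: surface name -> semantic type.
LEXICON = {
    "raf-1": "protein",
    "mek-1": "protein",
    "IL-3": "protein",
    "IL-3R": "complex",
    "IL-3R alpha": "protein",
}


def tag_terms(text: str, lexicon: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Return (start, end, matched text, type) for non-overlapping lexicon matches.

    Names are tried longest first, so "IL-3R alpha" is preferred over "IL-3R",
    and the lookarounds stop "IL-3" from matching inside "IL-3R".
    """
    if not lexicon:
        return []
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(?:" + "|".join(re.escape(n) for n in names) + r")(?![\w-])",
        re.IGNORECASE,
    )
    types = {name.lower(): kind for name, kind in lexicon.items()}
    return [
        (m.start(), m.end(), m.group(0), types[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


for start, end, name, kind in tag_terms("IL-3 binds the IL-3R alpha chain", LEXICON):
    print(start, end, name, kind)
```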
MedLEE (developed by Carol Friedman and colleagues) implements various grammatical patterns associated with the same verb: “A activates B…”, “A is an activator of B…”, “A appeared to activate B…”, “A is activating B…”
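The idea of many surface forms mapping to one normalized action can be illustrated with regular expressions. This is a deliberately simplified stand-in: MedLEE performs a semantic parse of the whole sentence rather than regex matching, and the patterns and the function name `extract_activation` are hypothetical. Entity names are assumed to be single tokens already identified by term markup.

```python
import re
from typing import Optional

# Hypothetical patterns: several grammatical forms of the same verb "activate".
# Entity names are assumed to be single tokens (letters, digits, '_' or '-').
_A = r"(?P<a>[\w-]+)"
_B = r"(?P<b>[\w-]+)"
ACTIVATE_PATTERNS = [
    re.compile(rf"{_A} activates {_B}"),
    re.compile(rf"{_A} is an activator of {_B}"),
    re.compile(rf"{_A} appeared to activate {_B}"),
    re.compile(rf"{_A} is activating {_B}"),
]


def extract_activation(sentence: str) -> Optional[tuple[str, str, str]]:
    """Normalize any matching surface form to ('activate', A, B); None if no pattern matches."""
    for pattern in ACTIVATE_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return ("activate", match.group("a"), match.group("b"))
    return None


for s in ["raf-1 activates mek-1.", "raf-1 is an activator of mek-1.",
          "raf-1 appeared to activate mek-1.", "raf-1 is activating mek-1."]:
    print(extract_activation(s))  # ('activate', 'raf-1', 'mek-1') each time
```

A list of patterns per verb is easy to extend, but it breaks down on nested or coordinated sentences, which is where a full parser earns its keep.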
MedLEE = Medical Language Extraction and Encoding System. It is an integral part of the Clinical Information Service at Columbia-Presbyterian Medical Center, where it routinely processes thousands of patient records a day. MedLEE performs semantic analysis of the complete sentence. If a complete sentence cannot be parsed successfully, MedLEE re-analyzes it, trying to extract parts.
For details see, e.g., Friedman, C., G. Hripcsak, W. DuMouchel, S.B. Johnson, and P.D. Clayton. 1995. Natural language processing in an operational clinical system. Natural Language Engineering. 1 (1): 83-108.
To give you a feel for how the complete conveyor line works…
NLP module (term markup + MedLEE) produces:
[action, inactivate, [protein, rap1],
  [action, activate, [complex, T-cell receptor],
    [action, transcribe, [gene, gene encoding interleukin-2]]],
  [parsemode, mode1]]
This is then converted into “shorthand” notation. [Figure legend: Substance (gene), Action, Action on action]
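To make the idea of “actions on actions” concrete, the sketch below transcribes the frame produced by the NLP module into nested Python lists and renders it as a nested text shorthand. The text rendering and the helper names (`shorthand`, `actions_on_actions`) are assumptions for illustration; the shorthand used in the talk is graphical.

```python
# The NLP-module frame, transcribed as nested lists: [kind, value, *arguments].
frame = [
    "action", "inactivate", ["protein", "rap1"],
    ["action", "activate", ["complex", "T-cell receptor"],
     ["action", "transcribe", ["gene", "gene encoding interleukin-2"]]],
    ["parsemode", "mode1"],
]


def shorthand(node: list) -> str:
    """Render a frame as nested text of the form verb(argument, ...)."""
    kind = node[0]
    if kind != "action":
        # Substance leaf such as ["protein", "rap1"] -> "rap1 [protein]".
        return f"{node[1]} [{kind}]"
    # Skip bookkeeping arguments such as ["parsemode", ...].
    args = [shorthand(arg) for arg in node[2:] if arg[0] != "parsemode"]
    return f"{node[1]}({', '.join(args)})"


def actions_on_actions(node: list) -> list[tuple[str, str]]:
    """List (outer verb, inner verb) pairs where an action acts on another action."""
    pairs = []
    for arg in node[2:]:
        if arg[0] == "action":
            pairs.append((node[1], arg[1]))
            pairs.extend(actions_on_actions(arg))
    return pairs


print(shorthand(frame))
# inactivate(rap1 [protein], activate(T-cell receptor [complex], transcribe(gene encoding interleukin-2 [gene])))
print(actions_on_actions(frame))
# [('inactivate', 'activate'), ('activate', 'transcribe')]
```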
This is then further converted into a format readable by our pathway visualization program:
LogicalAction{
  {
    UpstreamActionAgent { Protein{ Name{ "IL-3", } } },
    DownstreamActionAgent { Complex{ Name{ "IL-3R" } } },
    Result{ activation }
  }
}
Complex{
  Name{ "IL-3R" }
  Composition{
    Protein{ Name{ "IL-3R alpha" } }
    Protein{ Name{ "IL-3R beta" } }
  }
}
Protein{ Name{ "IL-3", } }
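A hypothetical emitter for this format is sketched below. The grammar of the visualization format is known here only from the single IL-3/IL-3R example, so the function names are invented and the output simply mimics that example on one line (it omits the trailing comma after "IL-3", whose role is unclear). Names are assumed not to contain double quotes.

```python
def entity(kind: str, name: str) -> str:
    # Render one entity, e.g. Protein{ Name{ "IL-3" } }
    return f'{kind}{{ Name{{ "{name}" }} }}'


def logical_action(upstream: str, downstream: str, result: str) -> str:
    # Emit one LogicalAction record from two already-rendered entities.
    return (
        "LogicalAction{ { "
        f"UpstreamActionAgent {{ {upstream} }}, "
        f"DownstreamActionAgent {{ {downstream} }}, "
        f"Result{{ {result} }} "
        "} }"
    )


def complex_with_subunits(name: str, subunits: list[str]) -> str:
    # Emit a Complex record whose Composition lists its protein subunits.
    parts = " ".join(entity("Protein", s) for s in subunits)
    return f'Complex{{ Name{{ "{name}" }} Composition{{ {parts} }} }}'


print(logical_action(entity("Protein", "IL-3"), entity("Complex", "IL-3R"), "activation"))
print(complex_with_subunits("IL-3R", ["IL-3R alpha", "IL-3R beta"]))
```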
Example of an actual human regulatory network visualized
Drawing a complex graph is a separate computer-science problem. We use simulated annealing to search for a near-optimal graph layout.
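A generic simulated-annealing layout loop is sketched below. The energy function (springs along edges plus pairwise repulsion), the move size, and the cooling schedule are assumptions chosen for illustration, not the ones used in the talk. The Metropolis rule occasionally accepts a worse layout, which lets the search escape local minima while the temperature is still high; the best layout seen is returned.

```python
import math
import random


def layout_energy(pos, edges, ideal=1.0, repulsion=1.0):
    """Toy energy: springs pull linked nodes toward distance `ideal`;
    every pair of nodes repels with strength repulsion / distance**2."""
    e = 0.0
    for u, v in edges:
        e += (math.dist(pos[u], pos[v]) - ideal) ** 2
    nodes = list(pos)
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(pos[nodes[i]], pos[nodes[j]])
            e += repulsion / (d * d + 1e-9)  # small constant avoids division by zero
    return e


def anneal_layout(nodes, edges, steps=5000, t0=1.0, cooling=0.999, step_size=0.3, seed=0):
    rng = random.Random(seed)
    pos = {n: (rng.uniform(0, 5), rng.uniform(0, 5)) for n in nodes}  # random start
    energy = layout_energy(pos, edges)
    best_pos, best_energy = dict(pos), energy
    t = t0
    for _ in range(steps):
        n = rng.choice(nodes)
        old = pos[n]
        pos[n] = (old[0] + rng.gauss(0, step_size), old[1] + rng.gauss(0, step_size))
        new_energy = layout_energy(pos, edges)
        delta = new_energy - energy
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy
            if energy < best_energy:
                best_pos, best_energy = dict(pos), energy
        else:
            pos[n] = old  # reject the move
        t *= cooling  # geometric cooling schedule
    return best_pos, best_energy


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
pos, e = anneal_layout(nodes, edges)
print(round(e, 3))
```

Recomputing the full energy after every move keeps the sketch short; a real layout engine would update only the terms involving the moved node.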
What is a good pathway graph? • Every gene/protein name is easy to read • Connections between pairs of molecules are easy to trace • The mechanism and result of each action are easy to read • Compact • Shows tissue/stage/species/cell-line specificity • Beautiful
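Several of these criteria are qualitative, but two can be approximated numerically: “easy to trace connections” by the number of edge crossings, and “compact” by the area of the layout's bounding box. These metrics and the function names below are an illustrative assumption, not measures stated in the talk; such scores could serve as penalty terms in a layout energy function.

```python
from itertools import combinations


def _orient(p, q, r) -> float:
    """Cross product of (q - p) and (r - p): positive if p, q, r turn counterclockwise."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d) -> bool:
    """True if segments ab and cd cross at an interior point (touching or collinear overlap not counted)."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0
            and _orient(a, b, c) * _orient(a, b, d) < 0)


def crossing_count(pos, edges) -> int:
    """Number of pairs of straight-line edges that cross."""
    count = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # skip edge pairs that share a node
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            count += 1
    return count


def bounding_box_area(pos) -> float:
    """Area of the axis-aligned box enclosing all node positions (smaller = more compact)."""
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


square = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
print(crossing_count(square, [("A", "B"), ("C", "D")]))  # 1: the two diagonals cross
print(crossing_count(square, [("A", "C"), ("B", "D")]))  # 0: two parallel sides
print(bounding_box_area(square))                         # 1
```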