CGAP/CMAP Pathway Database Carl Schaefer February 26, 2003
Why Spend Effort on Pathways? • Target as process vs. target as molecule • In the end, what matters is a hyperactive process (e.g. mitosis), not just an over-expressed protein • Phenotype classification • Higher-level feature than transcript abundance
Why Spend Effort on a Pathway Database? • A picture may be worth a thousand words ... • but a computable representation is even better • Make assumptions explicit • Combine sources of data • KEGG, BioCarta, ... • Merge data from separate pathways • E.g. BioCarta’s “Cyclins and Cell Cycle Regulation” and “Cyclin E Destruction Pathway” • Causal framework for quantitative simulation/analysis • ... when the data becomes available
Basics • Model a causal network • Be composable (novel pathways) • Cope with lack of knowledge • Promote understanding
Model A Causal Network • Graph (nodes & edges) • Distinguish two kinds of nodes (molecules & processes) • Allow labels on nodes and edges • molecule-type (compound, protein, complex, rna) • molecule-id (...) • process-type (reaction, binding, modification, translocation, transcription, cell process) • edge-type (input, output, agent, inhibitor) • activity-state (active, inactive) • location (extracellular, transmembrane, cytoplasm, nucleus) • reversible (yes, no)
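The node and edge labels listed on this slide can be sketched as a small data model. This is only an illustration, not the CGAP/CMAP schema: the class names `Molecule`, `Edge` and `Process` are hypothetical, and attaching `activity_state` and `location` to the edge (rather than to the molecule node) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Label vocabularies copied from the slide.
MOLECULE_TYPES = {"compound", "protein", "complex", "rna"}
PROCESS_TYPES = {"reaction", "binding", "modification",
                 "translocation", "transcription", "cell process"}
EDGE_TYPES = {"input", "output", "agent", "inhibitor"}


@dataclass(frozen=True)
class Molecule:
    """A molecule node (hypothetical class name)."""
    molecule_id: str
    molecule_type: str

    def __post_init__(self):
        if self.molecule_type not in MOLECULE_TYPES:
            raise ValueError(f"unknown molecule-type: {self.molecule_type}")


@dataclass(frozen=True)
class Edge:
    """Connects one molecule to one process; None means 'unspecified'."""
    edge_type: str
    molecule: Molecule
    activity_state: Optional[str] = None  # assumption: label lives on the edge
    location: Optional[str] = None        # assumption: label lives on the edge


@dataclass
class Process:
    """A process node together with its adjacent edges."""
    atom_id: str
    process_type: str
    reversible: bool = False
    edges: List[Edge] = field(default_factory=list)

    def __post_init__(self):
        if self.process_type not in PROCESS_TYPES:
            raise ValueError(f"unknown process-type: {self.process_type}")

    def add_edge(self, edge_type, molecule, **labels):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge-type: {edge_type}")
        self.edges.append(Edge(edge_type, molecule, **labels))
        return self


# Reaction 411 from the example data later in the deck.
atom_411 = Process("411", "reaction", reversible=True)
atom_411.add_edge("agent", Molecule("8423", "protein"))
atom_411.add_edge("input", Molecule("4872", "compound"))
atom_411.add_edge("output", Molecule("4873", "compound"))
```

Keeping molecules immutable (`frozen=True`) makes them hashable, which is convenient when atomic pathways are later joined on identical molecules.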
Composable • “Atomic pathway” • a process node • immediately adjacent molecules • the connecting edges • Join atomic pathways on identical molecules • ... and maybe on molecule subtype relation
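Joining atomic pathways on identical molecules can be sketched as follows. The function `join` and the tuple encoding of nodes are hypothetical; the sketch joins only on equal molecule-ids and ignores the subtype relation mentioned in the last bullet. The two atoms are reactions 411 and 412 from the example data later in the deck.

```python
def join(atoms):
    """Merge atomic pathways into one graph.

    atoms maps atom-id -> list of (edge-type, molecule-id).
    Molecule nodes are keyed by molecule-id, so two atoms that mention
    the same id automatically share that node.
    """
    nodes, edges = set(), set()
    for atom_id, uses in atoms.items():
        process = ("process", atom_id)
        nodes.add(process)
        for edge_type, molecule_id in uses:
            molecule = ("molecule", molecule_id)
            nodes.add(molecule)
            if edge_type == "output":
                edges.add((process, molecule, edge_type))
            else:  # input, agent and inhibitor all point into the process
                edges.add((molecule, process, edge_type))
    return nodes, edges


atoms = {
    "411": [("agent", "8423"), ("input", "4872"), ("output", "4873")],
    "412": [("agent", "8423"), ("input", "4873"), ("output", "4874")],
}
nodes, edges = join(atoms)
```

Molecule 4873 is the output of 411 and the input of 412, so the joined graph contains a path from process 411 to process 412 through that shared node.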
Digression on Identifying Molecules • p16 and p53 are clearly different, but ... • How about NP_000068 and NP_478103 (variants of p16)? • How about AKT inactive and AKT active? • How about C5, C5a and C5b? • How about p53 in cytoplasm and p53 in nucleus? • What if you know ... • there exist two different things, but • you don’t know which one participates in the interaction
Identifying Molecules: Uneasy Compromise • Can distinguish molecules by • basic molecule-id • instance-specific labels (location, activity-state, ...) [like states] • Same molecule-ids but different instance-specific labels: • location • modifications like phosphorylation • Different molecule-ids: • splice variants • modifications like C5 → C5a, C5b • molecule-id families and unspecified label values allow for deliberate ambiguity
Identifying Molecules: Complexes • Two complexes have the same molecule-id only if their components are identical (in molecule-id and other labels) • makes the computation for joins easier, but ... • obscures relationships • ksr:mek:erk completely distinct from ksr:mek+:erk+ • Unresolved: showing relations within a complex • Within tnf:tnfr:fad, tnf binds to tnfr
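The identity rules for molecules and complexes can be illustrated with hashable keys. This is a sketch under assumptions: `instance_key` and `complex_key` are hypothetical helpers, and "mek+" is taken to mean mek with activity-state active.

```python
def instance_key(molecule_id, **labels):
    """Identity of one molecule use: basic id plus instance-specific labels."""
    return (molecule_id, tuple(sorted(labels.items())))


def complex_key(components):
    """Identity of a complex: its component keys, order-independent."""
    return ("complex", tuple(sorted(components)))


ksr = instance_key("ksr")
mek = instance_key("mek")
erk = instance_key("erk")
mek_active = instance_key("mek", activity_state="active")
erk_active = instance_key("erk", activity_state="active")

plain = complex_key([ksr, mek, erk])
activated = complex_key([ksr, mek_active, erk_active])
```

Because the keys differ as soon as any component label differs, `plain` and `activated` never compare equal. That keeps joins simple (plain equality), at the cost the slide notes: nothing in the keys records that the two complexes are closely related.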
Lack of Knowledge • Hierarchy of label values • e.g., edge-type → incoming-edge → agent • Hierarchy of molecule ids • GO id • Gene product • Specific protein • Families of molecules • “Handbook” • E.g.: “for Raf-1, ‘active-1’ means phosphorylation at S259”
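A hierarchy of label values lets a vague label match a more specific one. The sketch below assumes a simple parent map; only the chain from agent up through incoming-edge to edge-type comes from the slide, and every other entry (including the name "outgoing-edge") is an assumption added for illustration.

```python
# child -> parent; only the agent / incoming-edge / edge-type chain is from the slide.
PARENT = {
    "incoming-edge": "edge-type",
    "agent": "incoming-edge",
    "input": "incoming-edge",      # assumed
    "inhibitor": "incoming-edge",  # assumed
    "outgoing-edge": "edge-type",  # assumed
    "output": "outgoing-edge",     # assumed
}


def ancestors(value):
    """Return the chain of increasingly general values above `value`."""
    chain = []
    while value in PARENT:
        value = PARENT[value]
        chain.append(value)
    return chain


def subsumes(general, specific):
    """True if `general` equals `specific` or lies above it in the hierarchy."""
    return general == specific or general in ancestors(specific)
```

With this, an edge recorded only as "incoming-edge" (the role is unknown) is compatible with a query for "agent" in one direction: `subsumes("incoming-edge", "agent")` holds, while `subsumes("agent", "incoming-edge")` does not.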
Promote Understanding • Hide unwanted detail • prune common molecules • encapsulate sub-pathways • Query by connectedness (cause & effect) • Find patterns
Query by Connectedness: Predecessors/Successors • atom-id = 411 • direction = forward • degree = 3 • prune common compounds
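A predecessor/successor query of this kind can be sketched as a bounded breadth-first search. Several things here are assumptions: the function name `reachable`, the reading of "degree" as the maximum number of graph edges traversed, and pruning implemented as refusing to visit listed nodes. The water node and process 999 are hypothetical additions to show pruning; the other edges come from reactions 411 and 412 in the example data.

```python
from collections import deque


def reachable(edges, start, degree, direction="forward", prune=frozenset()):
    """Return {node: distance} for nodes within `degree` edges of `start`."""
    adjacency = {}
    for src, dst in edges:
        if direction == "backward":
            src, dst = dst, src
        adjacency.setdefault(src, set()).add(dst)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == degree:
            continue  # do not expand beyond the requested degree
        for nxt in adjacency.get(node, ()):
            if nxt in distance or nxt in prune:
                continue
            distance[nxt] = distance[node] + 1
            queue.append(nxt)
    del distance[start]
    return distance


edges = [
    ("m:8423", "p:411"), ("m:4872", "p:411"), ("p:411", "m:4873"),
    ("m:8423", "p:412"), ("m:4873", "p:412"), ("p:412", "m:4874"),
    ("p:411", "m:H2O"), ("m:H2O", "p:999"),  # hypothetical common compound
]
successors = reachable(edges, "p:411", degree=3, prune={"m:H2O"})
predecessors = reachable(edges, "p:412", degree=1, direction="backward")
```

Pruning matters because ubiquitous compounds connect almost every reaction to every other; without it, a forward query from 411 would also reach the unrelated process 999 through water.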
Patterns • Templates for atomic pathways: process-type=modification:: molecule-type=protein[1]:edge-type=agent:: molecule-type=protein[2]:edge-type=input:activity-state=inactive:: molecule-type=protein[2]:edge-type=output:activity-state=active • Maybe multi-process templates (e.g., a cascade)
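Matching the activation template above against an atomic pathway can be sketched as a small search over edge assignments. The dictionary encoding, the function `match`, and the example atom (with placeholder ids "KINASE" and "SUBSTRATE") are all hypothetical. The bracketed numbers in the template are treated as variables that must bind consistently; the sketch does not require different variables to bind to different molecules.

```python
from itertools import permutations

# The single-process template from the slide, re-encoded as a dict.
ACTIVATION = {
    "process_type": "modification",
    "slots": [
        {"var": 1, "molecule_type": "protein", "edge_type": "agent"},
        {"var": 2, "molecule_type": "protein", "edge_type": "input",
         "activity_state": "inactive"},
        {"var": 2, "molecule_type": "protein", "edge_type": "output",
         "activity_state": "active"},
    ],
}


def match(template, atom):
    """Return {var: molecule_id} if the atom fits the template, else None."""
    if atom["process_type"] != template["process_type"]:
        return None
    slots = template["slots"]
    for candidate in permutations(atom["edges"], len(slots)):
        binding = {}
        for slot, edge in zip(slots, candidate):
            labels_ok = all(edge.get(k) == v
                            for k, v in slot.items() if k != "var")
            if not labels_ok:
                break
            # The same variable must always name the same molecule.
            if binding.setdefault(slot["var"], edge["molecule_id"]) != edge["molecule_id"]:
                break
        else:
            return binding
    return None


atom = {
    "process_type": "modification",
    "edges": [
        {"molecule_id": "SUBSTRATE", "molecule_type": "protein",
         "edge_type": "output", "activity_state": "active"},
        {"molecule_id": "KINASE", "molecule_type": "protein",
         "edge_type": "agent"},
        {"molecule_id": "SUBSTRATE", "molecule_type": "protein",
         "edge_type": "input", "activity_state": "inactive"},
    ],
}
binding = match(ACTIVATION, atom)
```

Trying every ordering of edges is factorial in the worst case, which is acceptable only because an atomic pathway has a handful of edges; a multi-process template such as a cascade would need a proper subgraph matcher.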
What Do We Need? • Computation model of pathway interactions • Persistent data model • Tools: • data input • query and analysis • visualization • Data, data, data, ...
What Do We Have? • Computation model: mostly worked out • Persistent data model: mostly worked out • Tools: • working on data input • have a query/analysis tool • joins, prunes, finds predecessors/successors • produces graph output • extracts first-order patterns • using GraphViz to produce SVG diagrams • Data, data, data ... • Loaded KEGG into database • Next: ~30 BioCarta pathways related to apoptosis, cell-cycle regulation and histone deacetylase activity
( reaction
  ( atom-id "411" )
  ( reversible "yes" )
  ( agent
    ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
  ( input
    ( edge-seq-id "2" )
    ( compound ( molecule-id "4872" ) ( KG "C05746" ) ( AS "3-oxohexanoyl-[acp]" ) ) )
  ( output
    ( edge-seq-id "4" )
    ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) ) )
( reaction
  ( atom-id "412" )
  ( reversible "yes" )
  ( agent
    ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
  ( input
    ( edge-seq-id "2" )
    ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) )
  ( output
    ( edge-seq-id "3" )
    ( compound ( molecule-id "4874" ) ( KG "C05748" ) ( AS "trans-hex-2-enoyl-[acp]" ) ) ) )
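A parenthesized record like the ones above can be read with a few lines of code. This is a minimal sketch, not the project's loader: `parse` and `child` are hypothetical names, the input is assumed to be balanced, and only straight double quotes are recognized as string delimiters.

```python
import re

# Tokens: an open paren, a close paren, a quoted string, or a bare symbol.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse a sequence of S-expressions into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing paren
            return items
        if token == ")":
            raise ValueError("unexpected ')'")
        # Strip the quotes from string tokens; leave symbols as they are.
        return token[1:-1] if token.startswith('"') else token

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def child(form, name):
    """Return the first sub-list of `form` whose head symbol is `name`."""
    for item in form[1:]:
        if isinstance(item, list) and item and item[0] == name:
            return item
    return None


sample = '''( reaction ( atom-id "411" ) ( reversible "yes" )
  ( agent ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( AS "FASN" ) ) ) )'''
reaction = parse(sample)[0]
agent_protein = child(child(reaction, "agent"), "protein")
```

Here `child(reaction, "atom-id")` yields `["atom-id", "411"]`, and `agent_protein` is the nested protein record for FASN.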
digraph G {
  1 [shape="box", height="0.2", width="0.2", fontsize="10", style="filled", color="black", label=""];
  2 -> 1 [color="green" ];
  2 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="EC:2.3.1.85"];
  3 -> 1 [color="black" ];
  3 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="3-oxohexanoyl-[acyl-carrier protein]"];
  1 -> 4 [color="black" ];
  4 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="d-3-hydroxyhexanoyl-[acyl-carrier protein]"];
  5 [shape="box", height="0.2", width="0.2", fontsize="10", style="filled", color="black", label=""];
  2 -> 5 [color="green" ];
  4 -> 5 [color="black" ];
  5 -> 6 [color="black" ];
  6 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="trans-hex-2-enoyl-[acp]"];
}
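Graph text in this style can be generated from the atomic pathways with simple string formatting. The sketch below follows the visual conventions visible in the DOT output above (small filled boxes for processes, plain text for molecules, green for agent edges); the function `to_dot`, the `p`/`m` id prefixes, and the input encoding are hypothetical, and labels are assumed not to contain double quotes.

```python
def to_dot(atoms, names):
    """Render atoms {atom-id: [(edge-type, molecule-id), ...]} as DOT text."""
    lines = ["digraph G {"]
    declared = set()
    for atom_id, uses in atoms.items():
        process = f"p{atom_id}"
        lines.append(
            f'  "{process}" [shape="box", height="0.2", width="0.2", '
            f'style="filled", color="black", label=""];'
        )
        for edge_type, molecule_id in uses:
            molecule = f"m{molecule_id}"
            if molecule not in declared:  # declare each shared molecule once
                declared.add(molecule)
                label = names.get(molecule_id, molecule_id)
                lines.append(
                    f'  "{molecule}" [shape="plaintext", fontsize="14", '
                    f'label="{label}"];'
                )
            color = "green" if edge_type == "agent" else "black"
            if edge_type == "output":
                src, dst = process, molecule
            else:
                src, dst = molecule, process
            lines.append(f'  "{src}" -> "{dst}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)


atoms = {
    "411": [("agent", "8423"), ("input", "4872"), ("output", "4873")],
    "412": [("agent", "8423"), ("input", "4873"), ("output", "4874")],
}
names = {"8423": "EC:2.3.1.85", "4874": "trans-hex-2-enoyl-[acp]"}
dot_text = to_dot(atoms, names)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` program, for example with its `-Tsvg` output option.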