
CGAP/CMAP Pathway Database

Carl Schaefer, February 26, 2003


Presentation Transcript


  1. CGAP/CMAP Pathway Database • Carl Schaefer • February 26, 2003

  2. Why Spend Effort on Pathways? • Target as process vs. target as molecule • In the end, what matters is a hyperactive process (e.g. mitosis), not just an over-expressed protein • Phenotype classification • Higher-level feature than transcript abundance

  3. Why Spend Effort on a Pathway Database? • A picture may be worth a thousand words ... • but a computable representation is even better • Make assumptions explicit • Combine sources of data • KEGG, BioCarta, ... • Merge data from separate pathways • E.g. BioCarta’s “Cyclins and Cell Cycle Regulation” and “Cyclin E Destruction Pathway” • Causal framework for quantitative simulation/analysis • ... when the data becomes available

  4. Basics • Model a causal network • Be composable (novel pathways) • Cope with lack of knowledge • Promote understanding

  5. Model A Causal Network • Graph (nodes & edges) • Distinguish two kinds of nodes (molecules & processes) • Allow labels on nodes and edges • molecule-type (compound, protein, complex, rna) • molecule-id (...) • process-type (reaction, binding, modification, translocation, transcription, cell process) • edge-type (input, output, agent, inhibitor) • activity-state (active, inactive) • location (extracellular, transmembrane, cytoplasm, nucleus) • reversible (yes, no)
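The two node kinds and their labels can be sketched as plain Python data classes. This is only an illustration of the structure listed on the slide: the class names, field names, and the `connect` helper are assumptions, not the database's actual schema. The label vocabularies are copied from the slide, and the sample values come from atom 411 on slide 21.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Label vocabularies copied from the slide.
MOLECULE_TYPES = {"compound", "protein", "complex", "rna"}
PROCESS_TYPES = {"reaction", "binding", "modification",
                 "translocation", "transcription", "cell process"}
EDGE_TYPES = {"input", "output", "agent", "inhibitor"}


@dataclass(frozen=True)
class Molecule:
    """A molecule node; a label left as None means 'unspecified'."""
    molecule_id: str
    molecule_type: str
    activity_state: Optional[str] = None   # "active" / "inactive"
    location: Optional[str] = None         # e.g. "nucleus"

    def __post_init__(self):
        if self.molecule_type not in MOLECULE_TYPES:
            raise ValueError(f"unknown molecule-type: {self.molecule_type}")


@dataclass(frozen=True)
class Edge:
    """A labelled edge between a process node and a molecule node."""
    edge_type: str
    molecule: Molecule


@dataclass
class Process:
    """A process node together with its labelled edges."""
    atom_id: str
    process_type: str
    reversible: bool = False
    edges: List[Edge] = field(default_factory=list)

    def __post_init__(self):
        if self.process_type not in PROCESS_TYPES:
            raise ValueError(f"unknown process-type: {self.process_type}")

    def connect(self, edge_type: str, molecule: Molecule) -> "Process":
        # Hypothetical helper: attach a molecule with a validated edge-type.
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge-type: {edge_type}")
        self.edges.append(Edge(edge_type, molecule))
        return self


# Atom 411 from slide 21, reduced to ids and types.
fasn = Molecule("8423", "protein")
atom_411 = (Process("411", "reaction", reversible=True)
            .connect("agent", fasn)
            .connect("input", Molecule("4872", "compound"))
            .connect("output", Molecule("4873", "compound")))
```

Making `Molecule` frozen (and therefore hashable) is a design choice of this sketch: it lets molecule nodes be compared and used as dictionary keys when pathways are joined.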

  6. Composable • “Atomic pathway” • a process node • immediately adjacent molecules • the connecting edges • Join atomic pathways on identical molecules • ... and maybe on molecule subtype relation
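Joining on identical molecules can be sketched as follows. The dictionary layout for an atomic pathway and the function name `join_atoms` are assumptions made for illustration; molecules are treated as identical when their keys are equal, and the subtype relation mentioned on the slide is not handled. The sample data are atoms 411 and 412 from slide 21, reduced to molecule ids.

```python
def join_atoms(atoms):
    """Merge atomic pathways into one graph, sharing identical molecules.

    Returns (node_of, edges): node_of maps each molecule key to a single
    node name, and edges is a list of (source, target, edge_type) triples.
    """
    node_of = {}
    edges = []
    for atom in atoms:
        process = "process:" + atom["atom_id"]
        for edge_type, molecule in atom["edges"]:
            # The same molecule key always resolves to the same node.
            if molecule not in node_of:
                node_of[molecule] = "molecule:" + molecule
            node = node_of[molecule]
            if edge_type == "output":
                edges.append((process, node, edge_type))
            else:  # input, agent and inhibitor all point into the process
                edges.append((node, process, edge_type))
    return node_of, edges


atom_411 = {"atom_id": "411",
            "edges": [("agent", "8423"), ("input", "4872"), ("output", "4873")]}
atom_412 = {"atom_id": "412",
            "edges": [("agent", "8423"), ("input", "4873"), ("output", "4874")]}

node_of, edges = join_atoms([atom_411, atom_412])
```

In the joined graph, molecule 4873 is both the output of process 411 and the input of process 412, which is what chains the two atoms into one pathway.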

  7. Pathway Construction: Joining Atomic Pathways

  8. Digression on Identifying Molecules • p16 and p53 are clearly different, but ... • How about NP_000068 and NP_478103 (variants of p16)? • How about AKT inactive and AKT active? • How about C5, C5a and C5b? • How about p53 in cytoplasm and p53 in nucleus? • What if you know ... • there exist two different things, but • you don’t know which one participates in the interaction

  9. Identifying Molecules: Uneasy Compromise • Can distinguish molecules by • basic molecule-id • instance-specific labels (location, activity-state, ...) [like states] • Same molecule-ids but different instance-specific labels: • location • modifications like phosphorylation • Different molecule-ids: • splice variants • modifications like C5 → C5a, C5b • molecule-id families and unspecified label values allow for deliberate ambiguity

  10. Identifying Molecules: Complexes • Two complexes have the same molecule-id only if their components are identical (in molecule-id and other labels) • makes the computation for joins easier, but ... • obscures relationships • ksr:mek:erk completely distinct from ksr:mek+:erk+ • Unresolved: showing relations within a complex • Within tnf:tnfr:fad, tnf binds to tnfr
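The identity rules from the last two slides can be sketched as hashable keys: a molecule is identified by its basic molecule-id plus its instance-specific labels, and a complex by the collection of its components' keys. The function names are hypothetical, and reading the "+" in ksr:mek+:erk+ as activity-state=active is an assumption of this sketch.

```python
from collections import Counter


def molecule_key(molecule_id, **labels):
    """Identity = basic molecule-id plus instance-specific labels."""
    return (molecule_id, frozenset(labels.items()))


def complex_key(components):
    """Identity of a complex = its components, counted, in any order.

    Two complexes compare equal only if every component matches in
    molecule-id and in all labels.
    """
    return ("complex", frozenset(Counter(components).items()))


ksr = molecule_key("ksr")
mek = molecule_key("mek")
erk = molecule_key("erk")
mek_active = molecule_key("mek", activity_state="active")
erk_active = molecule_key("erk", activity_state="active")

plain = complex_key([ksr, mek, erk])
activated = complex_key([ksr, mek_active, erk_active])
```

Equality on these keys makes joins a simple dictionary lookup, at the cost noted on the slide: `plain` and `activated` are unrelated keys even though they share all three underlying molecules.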

  11. Complex Example

  12. Lack of Knowledge • Hierarchy of label values • e.g., edge-type → incoming-edge → agent • Hierarchy of molecule ids • GO id • Gene product • Specific protein • Families of molecules • “Handbook” • E.g.: “for Raf-1, ‘active-1’ means phosphorylation at S259”
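A hierarchy of label values can be sketched as a child-to-parent map with a subsumption test, so that a query for a general value also matches more specific ones. Only the chain from edge-type through incoming-edge to agent comes from the slide; the placement of the other edge-types below is an assumption, and `subsumes` is a hypothetical helper name.

```python
# Child -> parent links in the label hierarchy.
PARENT = {
    "incoming-edge": "edge-type",   # from the slide
    "agent": "incoming-edge",       # from the slide
    "input": "incoming-edge",       # assumed placement
    "inhibitor": "incoming-edge",   # assumed placement
    "output": "edge-type",          # assumed placement
}


def subsumes(general, specific):
    """True if `specific` equals `general` or lies below it in the hierarchy."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)  # None once the root has been passed
    return False
```

With this, an edge whose type is only known to be "incoming-edge" can be stored as such, and a query for "incoming-edge" still finds edges recorded as "agent" or "input".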

  13. Promote Understanding • Hide unwanted detail • prune common molecules • encapsulate sub-pathways • Query by connectedness (cause & effect) • Find patterns

  14. Omission of Don’t Care Detail: Pruning Common Compounds

  15. Query by Connectedness: Predecessors/Successors • atom-id = 411 • direction = forward • degree = 3 • prune common compounds
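A predecessor/successor query with pruning can be sketched as a depth-limited breadth-first search. This sketch assumes that "degree" means the maximum number of edges to follow, which the slide does not state, and the function name and edge-list layout are hypothetical. The graph is atoms 411 and 412 from slide 21 plus an "H2O" node that is not in the source data and is added only to show pruning.

```python
from collections import deque


def connected(edges, start, degree, direction="forward", pruned=frozenset()):
    """Return {node: distance} for nodes within `degree` steps of `start`.

    Edges touching a pruned node (a "common compound") are ignored.
    """
    adjacency = {}
    for source, target in edges:
        if source in pruned or target in pruned:
            continue
        if direction == "backward":
            source, target = target, source
        adjacency.setdefault(source, []).append(target)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == degree:
            continue  # do not expand beyond the requested degree
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


EDGES = [
    ("8423", "411"), ("4872", "411"), ("411", "4873"),
    ("8423", "412"), ("4873", "412"), ("412", "4874"),
    ("412", "H2O"),  # hypothetical common compound, for illustration only
]

forward = connected(EDGES, "411", degree=3, pruned={"H2O"})
backward = connected(EDGES, "412", degree=1, direction="backward")
```

Pruning happens before the search, so a common compound can never act as a shortcut that links otherwise unrelated processes.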

  16. Patterns • Templates for atomic pathways:
        process-type=modification::
        molecule-type=protein[1]:edge-type=agent::
        molecule-type=protein[2]:edge-type=input:activity-state=inactive::
        molecule-type=protein[2]:edge-type=output:activity-state=active
      • Maybe multi-process templates (e.g., a cascade)
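Matching the template above against an atomic pathway can be sketched as a small search over edge assignments. The sketch reads the bracketed numbers as variables that must bind to the same molecule-id wherever they recur, which is an interpretation rather than something the slide spells out. It also accepts atoms with extra edges beyond those in the template. The data layout, the `match` function, and the sample atom (with its "kinase-X" agent) are hypothetical.

```python
from itertools import permutations

TEMPLATE = {
    "process_type": "modification",
    "edges": [
        (1, {"molecule_type": "protein", "edge_type": "agent"}),
        (2, {"molecule_type": "protein", "edge_type": "input",
             "activity_state": "inactive"}),
        (2, {"molecule_type": "protein", "edge_type": "output",
             "activity_state": "active"}),
    ],
}


def match(template, atom):
    """Return {variable: molecule_id} if the atom fits the template, else None."""
    if atom["process_type"] != template["process_type"]:
        return None
    slots = template["edges"]
    # Try every way of assigning distinct atom edges to the template slots.
    for candidate in permutations(atom["edges"], len(slots)):
        binding = {}
        for (variable, constraints), edge in zip(slots, candidate):
            if any(edge.get(k) != v for k, v in constraints.items()):
                break
            # A variable must refer to the same molecule-id every time.
            if binding.setdefault(variable, edge["molecule_id"]) != edge["molecule_id"]:
                break
        else:
            return binding
    return None


# Hypothetical atom: some kinase activates AKT.
activation = {
    "process_type": "modification",
    "edges": [
        {"molecule_id": "AKT", "molecule_type": "protein",
         "edge_type": "output", "activity_state": "active"},
        {"molecule_id": "kinase-X", "molecule_type": "protein",
         "edge_type": "agent"},
        {"molecule_id": "AKT", "molecule_type": "protein",
         "edge_type": "input", "activity_state": "inactive"},
    ],
}

binding = match(TEMPLATE, activation)
```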

  17. What Do We Need? • Computation model of pathway interactions • Persistent data model • Tools: • data input • query and analysis • visualization • Data, data, data, ...

  18. What Do We Have? • Computation model: mostly worked out • Persistent data model: mostly worked out • Tools: • working on data input • have a query/analysis tool • joins, prunes, finds predecessors/successors • produces graph output • extracts first-order patterns • using GraphViz to produce SVG diagrams • Data, data, data ... • Loaded KEGG into database • Next: ~30 BioCarta pathways related to apoptosis, cell-cycle regulation and histone deacetylase activity

  19. Analysis Tool Input Screen

  20. Sample Pathway (Atoms 411, 412)

  21.
      ( reaction
        ( atom-id "411" )
        ( reversible "yes" )
        ( agent ( edge-seq-id "1" )
          ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
        ( input ( edge-seq-id "2" )
          ( compound ( molecule-id "4872" ) ( KG "C05746" ) ( AS "3-oxohexanoyl-[acp]" ) ) )
        ( output ( edge-seq-id "4" )
          ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) ) )
      ( reaction
        ( atom-id "412" )
        ( reversible "yes" )
        ( agent ( edge-seq-id "1" )
          ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
        ( input ( edge-seq-id "2" )
          ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) )
        ( output ( edge-seq-id "3" )
          ( compound ( molecule-id "4874" ) ( KG "C05748" ) ( AS "trans-hex-2-enoyl-[acp]" ) ) ) )
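A listing in this parenthesized format can be read into nested Python lists with a small recursive parser. This is a generic s-expression reader written for illustration, not the project's own loader. It assumes straight double quotes and no escaped quotes inside strings, and the sample text is an abridged copy of atom 411.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a bare symbol.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse s-expression text into a list of nested lists of strings."""
    tokens = TOKEN.findall(text)
    position = 0

    def read():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            items = []
            while tokens[position] != ")":
                items.append(read())
            position += 1  # consume the closing ")"
            return items
        if token == ")":
            raise ValueError("unexpected ')'")
        # Strip the quotes from string tokens; keep symbols as they are.
        return token[1:-1] if token.startswith('"') else token

    forms = []
    while position < len(tokens):
        forms.append(read())
    return forms


SAMPLE = '''
( reaction ( atom-id "411" ) ( reversible "yes" )
  ( agent ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( AS "FASN" ) ) ) )
'''

forms = parse(SAMPLE)
```

An unbalanced input (a missing closing parenthesis) surfaces as an `IndexError` in this sketch; a production reader would want a clearer error message.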

  22.
      digraph G {
        1 [shape="box", height="0.2", width="0.2", fontsize="10", style="filled", color="black", label=""];
        2 -> 1 [color="green" ];
        2 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="EC:2.3.1.85"];
        3 -> 1 [color="black" ];
        3 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="3-oxohexanoyl-[acyl-carrier protein]"];
        1 -> 4 [color="black" ];
        4 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="d-3-hydroxyhexanoyl-[acyl-carrier protein]"];
        5 [shape="box", height="0.2", width="0.2", fontsize="10", style="filled", color="black", label=""];
        2 -> 5 [color="green" ];
        4 -> 5 [color="black" ];
        5 -> 6 [color="black" ];
        6 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="trans-hex-2-enoyl-[acp]"];
      }
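Graph output of this kind can be generated from joined atomic pathways with a few lines of string formatting. The sketch below is not the tool's actual emitter: the function name and input layout are assumptions, the node attributes are reduced to shape, style and label, and labels are assumed to contain no double quotes. It does follow the conventions visible in the listing above: processes are filled boxes, molecules are plain text, and agent edges are green.

```python
def to_dot(atoms):
    """Render atomic pathways as Graphviz DOT text, sharing molecule nodes."""
    ids = {}
    lines = ["digraph G {"]

    def node_id(key):
        # Number nodes in order of first appearance.
        if key not in ids:
            ids[key] = len(ids) + 1
        return ids[key]

    for atom in atoms:
        process = node_id(("process", atom["atom_id"]))
        lines.append(f'  {process} [shape="box", style="filled", label=""];')
        for edge_type, label in atom["edges"]:
            key = ("molecule", label)
            is_new = key not in ids
            molecule = node_id(key)
            if is_new:
                lines.append(f'  {molecule} [shape="plaintext", label="{label}"];')
            color = "green" if edge_type == "agent" else "black"
            if edge_type == "output":
                lines.append(f'  {process} -> {molecule} [color="{color}"];')
            else:
                lines.append(f'  {molecule} -> {process} [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)


ATOMS = [
    {"atom_id": "411", "edges": [("agent", "EC:2.3.1.85"),
                                 ("input", "3-oxohexanoyl-[acp]"),
                                 ("output", "d-3-hydroxyhexanoyl-[acp]")]},
    {"atom_id": "412", "edges": [("agent", "EC:2.3.1.85"),
                                 ("input", "d-3-hydroxyhexanoyl-[acp]"),
                                 ("output", "trans-hex-2-enoyl-[acp]")]},
]

dot = to_dot(ATOMS)
```

The resulting text can be passed to Graphviz, for example `dot -Tsvg`, to produce an SVG diagram like the ones mentioned on slide 18.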
