Storytelling and Clustering for Cellular Signaling Pathways

Storytelling and Clustering for Cellular Signaling Pathways M. Shahriar Hossain, Monika Akbar, Nicholas F. Polys Department of Computer Science, Virginia Tech, Blacksburg, VA 24061.

Objective • STKE Dataset • Cell interactions through chemical signals • Discover relationships between the pathways • Graph structure • Subgraph discovery problem • Pathways relationships • Clustering • Storytelling

Myocyte Adrenergic Pathway (CMP_9043)

Dataset properties

Design Pipeline Pathway Graphs STKE Dataset Preprocessor Frequent Subgraphs Frequent Subgraph Discovery Clustering NN Storytelling

Subsequent Candidate Generation l q q l p p p m o m o m o n n n • Apriori – incremental approach [17] • FSG [2] • Generate a (k+1)-edge candidate subgraph by combining two k-edge subgraphs where these two k-edge subgraphs have a common core subgraphof (k-1)-edges. • Cost of comparison between subgraphs (and core subgraphs) is reduced using hash-code of each subgraph object.

Subsequent Candidate Generation q l p q l p p o m o o m n m n n z t r o l l l m p p p n p o o o m m m o m n n n r n l p o m n l p p s s o o m m n n • Instance: • Number of 5-edge subgraphs: 21 • Core subgraph comparisons for s1: 20 Not generated …………………………………………. ………………………………................ ………………………………………….

Master Pathway Graph (MPG) Total Unique Nodes:1205 Total Relations:1376

SEG - Subgraph Extension Generation q l p n m m o s o n p l l q r p r l m p o m o n n l p s m o n • Neighborhood Extension • Neighborhood list : {q, r, s} • Comparison is not required. • Subgraph is extended from physical evidence

Subgraph Discovery • What so novel about pruning edges? min_sup=2%

‘Importance Factor’ of a subgraph: sfipf • For i-th subgraph j-th pathway: Subgraph frequency, Inverse pathway frequency,

Dataset Properties (sfipf) Number of edges in MPG=1376 Total pathways=50

Subgraph Discovery

Subgraph Discovery Overall attempts saved = 89.52% Overall time saved = 99.39%

Clustering • Hierarchical Agglomerative Clustering (HAC) • k-means • Unsupervised measure of clusters’ validity • Average Silhouette Coefficient (ASC) [19]

Clustering

Pathway Relations (StoryTelling) p1 p7 p2 p8 S T p3 p9 • Bidirectional Search • Cover tree for NN

Day-to-day life example From Roman Holiday From Terminator 3 From: Roman Holiday To: Terminator 3

Examples in STKE • http://people.cs.vt.edu/msh/infoviz/3/

Pathway Relations (StoryTelling)

Future Directions • Compare our SEG graph methods with text based clustering and storytelling • Examine costs and benefits for combining text and graph mining techniques

References [1] Science Signaling, The signal Transduction Knowledge Environment (STKE), "The Database of Cell Signaling", http://stke.sciencemag.org/cm/ [2] Kuramochi, M. and Karypis, G., "An efficient algorithm for discovering frequent subgraphs", IEEE Transactions on KDE, Vol. 16(9), September 2004, pp. 1038-1051. [3] Breslin, T., Krogh, M., Peterson, C., and Troein, C., "Signal transduction pathway profiling of individual tumor samples", BMC Bioinformatics, June 29, 2005. [4] Kumar, D., Ramakrishnan, N., Helm, R. F., and Potts, M., "Algorithms for Storytelling", IEEE Transactions on KDE, Vol. 20(6), June 2008, pp. 736-751. [5] Ratprasartporn, N., Cakmak, A., and Ozsoyoglu, G., "On Data and Visualization Models for Signaling Pathways", 18th SSDBM, 2006, pp. 133-142. [6] Xu, X., and Yu, Y., "Modeling and Verifying WNT Signaling Pathway", 3rd Intl. Conf. on ICNC. 2007, Vol. 2, pp. 319 - 323. [7] Schreiber, F., "Comparison of metabolic pathways using constraint graph drawing", 1st Asia-Pacific bioinformatics Conf. on Bioinfo., Australia, Vol. 19, 2003, pp. 105 - 110. [8] Abello, J., van Ham, F., and Krishnan, N., "ASKGraphView: A Large Scale Graph Visualization System", IEEE Transactions on Visualization and Computer Graphics, Vol. 12(5), 2006, pp. 669 - 676. [9] Miyake, S., Tohsato, A., Takenaka, Y., and Matsuda, H. "A clustering method for comparative analysis between genomes and pathways", 8th Intl. Conf. on Database Systems for Advanced Applications, March 2003 pp. 327 - 334.

References [10] Yan, X., and Han, J. "gSpan: graph-based substructure pattern mining", IEEE ICDM, 2002, pp. 721-724. [11] Moti, C., and Ehud, G. "Diagonally Subgraphs Pattern Mining", 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2004, pp. 51-58. [12] Ketkar, N., Holder, L., Cook, D., Shah, R., and Coble, J. "Subdue: Compression-based Frequent Pattern Discovery in Graph Data", ACM KDD Workshop on Open-Source Data Mining, August 2005, pp. 71-76. [13] Zhang, T., Ramakrishnan, R., and Livny, M., "BIRCH: An Efficient Data Clustering Method for Very Large Databases", ACM SIGMOD Intl. Conf. on Management of Data, Canada, 1996, pp. 103-114. [14] Wagsta, K., Cardie, C., Rogers, S., and Schroedl, S., "Constrained K-means Clustering with Background Knowledge", ICML 2001, pp. 577-584. [15] Lin, F., and Hsueh, C. M., "Knowledge map creation and maintenance for virtual communities of practice", Intl. Journal of Information Processing and Management, ACM, Vol. 42(2), 2006, pp. 551-568. [16] Beygelzimer, A., Kakade, S., Langford, J., "Cover trees for nearest neighbor", ICML 2006, pp. 97-104. [17] Agrawal, R., and Srikant, R. "Fast Algorithms for Mining Association Rules", Intl. Conf. on Very Large Data Bases, Santiago, Chile, September 1994, pp. 487-499. [18] Agrawal, R., Mehta, M., Shafer, J., Srikant, R., Arning, A. and Bollinger, T. "The Quest Data Mining System", KDD'96, USA, 1996, pp. 244-249. [19] Tan, P. N., Steinbachm, M., and Kumar, V., "Introduction to Data Mining", Addison-Wesley, ISBN: 0321321367, April 2005, pp. 539-547. [20] http://people.cs.vt.edu/amonika/infoviz/

Thank You

Storytelling and Clustering for Cellular Signaling Pathways

Storytelling and Clustering for Cellular Signaling Pathways

Presentation Transcript

Journal Club: Liver Signaling Pathways

Liver Signaling Pathways

Signaling Pathways

Cellular Automata Modeling of Signaling and Metabolic Pathways

Signaling pathways in hepatocellular carcinoma

Cytokines and signaling pathways in healthy and disease

Protein Kinase Signaling Pathways

Alliance for Cellular Signaling (AfCS)

Modeling of Calcium Signaling Pathways

Cell Signaling II Signal Transduction pathways

Cellular Signaling

2014 NSF-CMACS Workshop on Cellular Signaling Pathways

Intracellular Signaling Pathways Activated by Endothelins

Figure 2 TLR signaling pathways

Energy Releasing Pathways ( Cellular Respiration )

Figure 3 Stromal-epithelial signaling pathways

Cellular Pathways

Signaling Mechanisms, Cellular Adhesion, and More diseases

Chapter 47: Endocrine System and Cellular Signaling

SIGNALING PATHWAYS IN CANCER .

MAP Kinase Signaling Pathways