DISCOVERING LARGER NETWORK MOTIFS • Wooyoung Kim and Li Chen • 4/24/2009 • CSC 8910 Analysis of Biological Network, Spring 2009 • Dr. Yi Pan
OUTLINE • Project Topic • Related Works • Proposed Ideas • Unsolved Problems
PROJECT TOPIC • Discovering Larger Network Motifs • Given a biological network (PPI network, transcriptional regulatory network, gene network, etc.), find network motifs of large size (more than 15 nodes)
RELATED WORKS (1) • Network Motif Discovery using Subgraph Enumeration and Symmetry Breaking [10] • Motif size <= 15 • Given a candidate subgraph, find all of its instances in the network, using symmetry-breaking conditions so that each instance is found only once, then evaluate the candidate by its frequency. • Problem: how do we find candidate subgraphs? Proposed solution: cluster the subgraphs of the whole network and take the representative of each cluster as a candidate subgraph.
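To see why symmetry matters when counting a candidate's frequency, the brute-force Python sketch below finds every edge-preserving embedding of a small pattern and then divides by the pattern's automorphism count, because without symmetry breaking each occurrence is found once per automorphism. This is a swapped-in stand-in, not the method of [10], which instead adds symmetry-breaking conditions so each occurrence is matched only once. The function names are hypothetical; the sketch counts non-induced occurrences in undirected graphs and is only practical for tiny inputs.

```python
from itertools import permutations


def count_embeddings(p_nodes, p_edges, g_nodes, g_edges):
    """Count injective maps f: pattern -> graph that send every pattern edge
    onto a graph edge (undirected, non-induced matching)."""
    g_edge_set = {frozenset(e) for e in g_edges}
    count = 0
    # Brute force: try every ordered choice of distinct graph nodes.
    for image in permutations(g_nodes, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if all(frozenset((f[a], f[b])) in g_edge_set for a, b in p_edges):
            count += 1
    return count


def count_occurrences(p_nodes, p_edges, g_nodes, g_edges):
    """Each occurrence is hit once per automorphism of the pattern,
    so divide the raw embedding count by |Aut(pattern)|."""
    embeddings = count_embeddings(p_nodes, p_edges, g_nodes, g_edges)
    automorphisms = count_embeddings(p_nodes, p_edges, p_nodes, p_edges)
    return embeddings // automorphisms


# Usage: a triangle occurs 4 times in the complete graph K4
# (24 embeddings / 6 automorphisms).
k4_nodes = [0, 1, 2, 3]
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tri_nodes = ["a", "b", "c"]
tri_edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(count_occurrences(tri_nodes, tri_edges, k4_nodes, k4_edges))  # 4
```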
RELATED WORKS (2) • Motif Discovery Algorithms • Exact algorithms for motifs with a small number of nodes: 1. Exhaustive Recursive Search (ERS) (motif size <= 4) 2. ESU: starts from individual nodes and adds one node at a time until the required size k is reached (motif size <= 14) 3. Compact Topological Motifs [6]
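A minimal Python sketch of the ESU idea follows. Each vertex v seeds a subgraph; when a vertex w is added, the extension set only gains neighbors of w that are larger than the root v and are exclusive, i.e. not already in or adjacent to the current subgraph. This rule is what lets ESU report every connected k-vertex set exactly once. The adjacency-dict input and the name `esu` are assumptions for illustration, and the sketch only enumerates vertex sets; classifying them into isomorphism classes is left out.

```python
def esu(adj, k):
    """Enumerate every connected k-node vertex set exactly once (ESU).

    adj: dict mapping each node (comparable, e.g. int) to the set of its
    neighbours in an undirected graph.
    """
    found = []

    def closed_neighbourhood(nodes):
        # The nodes themselves plus every node adjacent to them.
        closed = set(nodes)
        for x in nodes:
            closed |= adj[x]
        return closed

    def extend(sub, ext, v):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        blocked = closed_neighbourhood(sub)
        while ext:
            w = ext.pop()
            # Exclusive neighbours of w: larger than the root v and not
            # already in, or adjacent to, the current subgraph.
            exclusive = {u for u in adj[w] if u > v and u not in blocked}
            extend(sub | {w}, ext | exclusive, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found


# Usage: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(sorted(s) for s in esu(adj, 3)))  # [[0, 1, 2], [0, 2, 3], [1, 2, 3]]
```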
RELATED WORKS (3) • Approximate Algorithms • Search algorithm based on sampling (MFINDER) • RAND-ESU • NeMoFINDER • Subgraph counting by scalar computation • Apriori-based motif detection
RELATED WORKS (4) • Network Clustering • Compact representation of the network • Type I: minimum number of clusters • Type II: maximum cohesiveness • Aggregation of topological motifs [7] (combining smaller network motifs to observe the overall structure) However, in our proposed solution, clustering groups similar network patterns together, not similar nodes (or sequences). Nor is it used for aggregating motifs.
PROPOSED IDEAS Given a graph G = (V, E), the desired motif size t, and the number of candidate motifs k, find network motifs of size t. • List all graph patterns with t (or more) nodes. • Represent the network as an adjacency matrix A with entries 1, -1, or 0. • Scan A for all t x t principal sub-matrices (one per set of t nodes). • Cluster these subgraphs into k clusters. • Use any numerical clustering algorithm, e.g., K-means or NMF. • Find a representative subgraph for each cluster. • Use the symmetry-breaking technique to find the representative. • Each representative is a candidate network motif.
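The pipeline above can be sketched in Python under several assumptions: the "subgraphs" are the t x t principal submatrices (one per t-node subset), each is flattened into a feature vector, plain k-means stands in for "any numerical clustering algorithm", and the member closest to each centroid is taken as the cluster representative in place of the symmetry-breaking step. All function names are hypothetical. Two caveats: the flattened vector depends on node order, so isomorphic subgraphs can fall into different clusters (one concrete form of the open "how to cluster" question), and enumerating all C(n, t) subsets is impractical for t > 15 on real networks, so any real implementation would need sampling or pruning.

```python
from itertools import combinations


def submatrices(A, t):
    """Yield (node subset, flattened t x t principal submatrix) per t-node subset."""
    for nodes in combinations(range(len(A)), t):
        yield nodes, [A[i][j] for i in nodes for j in nodes]


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means, initialised with the first k distinct vectors
    (an arbitrary, deterministic choice made for this sketch)."""
    centroids = []
    for vec in vectors:
        if vec not in centroids:
            centroids.append(list(vec))
        if len(centroids) == k:
            break
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each vector.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
                  for vec in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def candidate_motifs(A, t, k):
    """Cluster all t-node submatrices; return, per cluster, the node subset
    closest to the centroid as that cluster's representative."""
    subsets, vectors = zip(*submatrices(A, t))
    labels, centroids = kmeans(list(vectors), k)
    reps = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(subsets[min(members, key=lambda i: sq_dist(vectors[i], centroid))])
    return reps


# Usage: a triangle on nodes 0-2 plus isolated nodes 3-5 (0/1 entries only).
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
print(candidate_motifs(A, 3, 3))  # [(0, 1, 2), (0, 1, 3), (0, 3, 4)]
```

Here the three clusters are the triangle, the single-edge subgraphs, and the empty subgraphs; the -1 entries of the slide's encoding would pass through the same numerical code unchanged.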
UNSOLVED PROBLEMS • How should the subgraphs be clustered? • The choice of clustering algorithm depends on which features we use to represent the data. • Which type of clustering algorithm: Type I or Type II? • How do we find the representative subgraph of each cluster? • Should we perform network alignment first? • Should we also consider sequence similarities? • Is there any relationship between sequence motifs and network motifs? • Could sequence motifs be encoded in a vertex-attribute matrix, as in compact topological motifs [6]? • Large network motifs vs. small network motifs
COMPACT NOTATION • Main Idea [6]: A topological motif can be represented either explicitly as a graph or as a collection of location lists, one per motif vertex, each listing the network vertices at which that motif vertex occurs. The method works in the space of location lists to discover motifs.
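As a small illustration of the representation only (not of how the algorithm in [6] computes it), the hypothetical Python helper below turns a list of known motif occurrences into one location list per motif vertex. The input format, a dict per occurrence mapping motif vertex to network vertex, is an assumption.

```python
def location_lists(occurrences):
    """Build one location list per motif vertex.

    occurrences: list of dicts, each mapping motif vertex -> network vertex
    for one occurrence of the motif.
    """
    lists = {}
    for occ in occurrences:
        for motif_vertex, network_vertex in occ.items():
            lists.setdefault(motif_vertex, set()).add(network_vertex)
    return lists


# Usage: an edge motif x-y occurring on network edges 1-2 and 1-3.
print(location_lists([{"x": 1, "y": 2}, {"x": 1, "y": 3}]))
# {'x': {1}, 'y': {2, 3}}
```

Two occurrences collapse into two short lists, which is the compactness the notation exploits.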
COMPACT NOTATION • Method • Step 1: compute an exhaustive collection of potential location lists of motif vertices, stored as compact location lists • Step 2: enlarge the collection computed in Step 1 by adding all non-empty intersections of the lists, along with their differences.
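Step 2 can be sketched in Python as a closure computation. This is a simplified reading: it treats location lists as plain sets of vertices and ignores the conjugacy relationships and vertex attributes that the full method tracks; the name `enlarge` is hypothetical. The loop terminates because every new list is a subset of the finite vertex set.

```python
def enlarge(lists):
    """Close a collection of location lists under non-empty pairwise
    intersections and differences (simplified reading of Step 2)."""
    collection = {frozenset(lst) for lst in lists if lst}
    changed = True
    while changed:
        changed = False
        snapshot = list(collection)
        for a in snapshot:
            for b in snapshot:
                if a == b:
                    continue
                # Add a ∩ b and a \ b whenever they are new and non-empty.
                for new in (a & b, a - b):
                    if new and new not in collection:
                        collection.add(new)
                        changed = True
    return collection


# Usage: {1, 2, 3} and {2, 3, 4} yield {2, 3}, {1} and {4} as new lists.
print(sorted(sorted(s) for s in enlarge([{1, 2, 3}, {2, 3, 4}])))
# [[1], [1, 2, 3], [2, 3], [2, 3, 4], [4]]
```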
COMPACT NOTATION • An Example (different colors indicate different vertex attributes)
COMPACT NOTATION • G1’s adjacency matrices
COMPACT NOTATION • Adjacency Matrix B1 (the conjugacy relationship of two lists is shown by “”) • L = {ℓ1, ℓ2, ℓ3, ℓ4}
COMPACT NOTATION • Initialization Step
COMPACT NOTATION • Iterative Step
REFERENCES • [1] Bill Andreopoulos, Aijun An, Xiaogang Wang, and Michael Schroeder. A roadmap of clustering algorithms: finding a match for a biomedical application. Brief Bioinform, bbn058, February 2009. • [2] Alberto Apostolico, Matteo Comin, and Laxmi Parida. Bridging lossy and lossless compression by motif pattern discovery. Electronic Notes in Discrete Mathematics, 21:219-225, 2005 (General Theory of Information Transfer and Combinatorics). • [3] Giovanni Ciriello and Concettina Guerra. A review on models and algorithms for motif discovery in protein-protein interaction networks. Brief Funct Genomic Proteomic, 7(2):147-156, 2008. • [4] Jun Huan, Wei Wang, and Jan Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In Proceedings of the IEEE International Conference on Data Mining (ICDM), p. 549, 2003. • [5] Michihiro Kuramochi and George Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243-271, November 2005. • [6] Laxmi Parida. Discovering topological motifs using a compact notation. Journal of Computational Biology, 14(3):300-323, 2007.
REFERENCES • [7] Radu Dobrin, Qasim K. Beg, Albert-Laszlo Barabasi, and Zoltan N. Oltvai. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics, 5:10, 2004. • [8] Brendan D. McKay. Isomorph-free exhaustive generation. J. Algorithms, 26:306-324, 1998. • [9] Manuel Middendorf, Etay Ziv, and Chris H. Wiggins. Inferring network mechanisms: the Drosophila melanogaster protein interaction network. PNAS, 102(9):3192-3197, March 2005. • [10] Joshua A. Grochow and Manolis Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. In RECOMB 2007, Lecture Notes in Computer Science 4453, pp. 92-106. Springer-Verlag, 2007.