This research paper focuses on efficient triangle motif counting in large scale complex networks using GPUs. It explores the importance of triangle counting and its applications in various fields. The paper also presents a methodology for indexing and counting star structures in the network.
Efficient Triangle Motif Counting in Large Scale Complex Networks with GPUs
Hakan Kardeş
CS 791v
Introduction
• Many systems are modeled as complex networks in order to understand their local and global characteristics.
• Studying network models of these systems offers a new way to better understand biological, chemical, technological, and social systems.
Complex Networks Everywhere
[Figures: Aspirin; yeast protein interaction network; an Internet; Web; co-author network]
Why Graph Mining and Searching?
• In many cases, the systems under investigation are very large and the corresponding graphs have a large number of nodes and edges, so graph mining techniques are required to derive information from them.
• Several graph mining techniques have been developed to extract useful information from graph representations and to analyze various features of complex networks.
Why is Triangle Counting Important?
• Clustering coefficient
• Transitivity ratio
• Social network analysis fact: “Friends of friends are friends” [WF94] (figure: triangle on nodes A, B, C)
• Hidden thematic structure of the Web (Eckmann et al., PNAS [EM02])
• Motif detection (e.g., [YPSB05])
• Web spam detection (Becchetti et al., KDD ’08 [BBCG08])
Related Work
• Hakan Kardes and M. H. Gunes. Structural Graph Indexing for Mining Complex Networks. IEEE ICDCS 2010 Workshop on Simplifying Complex Networks for Practitioners, Genoa, Italy, June 21, 2010.
  • Our paper, in which we count all star, triangle, complete bipartite, and clique structures.
• Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1-3 (November 2008), 458-473.
  • Survey paper, focused on space complexity.
• Charalampos Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification. 2009 International Conference on Advances in Social Network Analysis and Mining, pp. 66-71, 2009.
  • Relies on the spectral properties of power-law networks; focused on power-law networks.
Related Work
• Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2010. Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data 4, 3, Article 13 (October 2010), 28 pages.
  • They count the number of triangles for a given node.
• Charalampos E. Tsourakakis, U Kang, Gary L. Miller, and Christos Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In Knowledge Discovery and Data Mining (KDD '09).
• Belkacem Serrour, Alex Arenas, and Sergio Gomez. 2010. Detecting communities of triangles in complex networks using spectral optimization.
• Bill Andreopoulos, Christof Winter, Dirk Labudde, and Michael Schroeder. Triangle network motifs predict complexes by complementing high-error interactomes with structural information. BMC Bioinformatics, 2009.
Star
• We first index the star structure, in which a node has multiple neighbors, as shown in the figures below.
• All star structures within a graph G = (V, E) are represented as s(vi, nsi), where vi ∈ V and nsi is the set of all neighbors of vi.
• We index maximal star structures for each node.
[Figures: star structures centered at v1 and vi, with neighbors ns1, ns2, ns3, ..., nsn]
Star
Algorithm:
• First, build a star structure s(v, ø) with no neighbors for each node v ∈ V.
• Then, for each edge e(a, b) ∈ E, add b to the neighbor set of a and a to the neighbor set of b.
• Finally, remove the star structures s(v, ns) that have fewer than two neighbors.
Example:
• Nodes: a, b, c, d, e, f
• Edges: (a,b), (a,d), (a,f), (b,e), (c,f), (d,f)
• Star structures (figure): a: {b, d, f}; b: {a, e}; c: {f}; d: {a, f}; e: {b}; f: {a, c, d}. The stars of c and e have a single neighbor each, so the final step removes them.
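The three steps of the star algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the function name `index_stars` and the dictionary-of-sets representation are assumptions made for the example.

```python
def index_stars(nodes, edges):
    """Return {v: ns} for every node v that keeps at least two neighbors."""
    # Step 1: an empty star s(v, ø) for every node.
    stars = {v: set() for v in nodes}
    # Step 2: every edge (a, b) adds b to ns(a) and a to ns(b).
    for a, b in edges:
        stars[a].add(b)
        stars[b].add(a)
    # Step 3: drop stars that have fewer than two neighbors.
    return {v: ns for v, ns in stars.items() if len(ns) >= 2}


# Example graph from the slide.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
stars = index_stars(nodes, edges)
for v in sorted(stars):
    print(v, sorted(stars[v]))
```

On the slide's example graph, the stars of a, b, d, and f survive, while c and e are dropped because each has only one neighbor.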
Triangle
Algorithm:
• Find the second-hop neighbors of node a by iterating over its neighbor set ns.
• Then take the intersection, is, of the second-hop neighbors of a and the ns set.
• Grow the triangle set for each isi ∈ is.
[Figure: node a, its neighbor set ns1, ns2, ..., nsn, and the second-hop neighbor set ls1, ls2, ..., lsn]
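The triangle steps can be sketched sequentially in Python as follows. This is an illustrative CPU version under assumptions, not the planned GPU code: `find_triangles` is a hypothetical name, the graph is an adjacency dictionary of sets, and each triangle is stored as a `frozenset` so that the same triangle reached from different nodes is recorded once.

```python
def find_triangles(adj):
    """Return every triangle in the graph as a frozenset of three nodes."""
    triangles = set()
    for a, ns in adj.items():
        for n in ns:
            # adj[n] holds the second-hop neighbors of a reached through n;
            # those that are also direct neighbors of a close a triangle.
            for c in adj.get(n, set()) & ns:
                triangles.add(frozenset((a, n, c)))
    return triangles


# Adjacency sets of the example graph from the star slide.
adj = {
    "a": {"b", "d", "f"},
    "b": {"a", "e"},
    "c": {"f"},
    "d": {"a", "f"},
    "e": {"b"},
    "f": {"a", "c", "d"},
}
triangles = find_triangles(adj)
print(len(triangles), [sorted(t) for t in triangles])
```

The work done for each node a only reads the adjacency sets; the shared `triangles` set is the only state written by every iteration, which is the part a parallel version would have to handle differently.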
CUDA
• For the parallel algorithm, I will use CUDA.
Possible Datasets for Experiments
• Router-level Internet topology (around 2.3M nodes and 4M edges)
  • http://cheleby.cse.unr.edu/data.html
• Routing data on the Internet network (around 124K nodes and 207K edges)
  • http://vlado.fmf.uni-lj.si/pub/networks/data/web/web.zip
• A mobile phone graph (around 2.7M nodes and 6M edges)
  • Will be requested from the authors of “Structure of neighborhoods in a large social network”
• Biological data
  • http://www.biomedcentral.com/1471-2105/10/196
• Wikipedia graph (around 1.6M nodes and 18.5M edges)
  • I haven’t decided how to do it yet.
• I will generate sample graphs with different numbers of triangles.
  • I haven’t decided how to do it yet.
Results
• Triangle counting, CPU vs. GPU (plot: execution time vs. number of nodes)
Results
• Triangle counting, CPU vs. GPU (plot: execution time vs. number of edges, with the number of nodes held constant)
Results
• Triangle counting with different numbers of triangles (plot: execution time vs. number of triangles)
Results
• Triangle counting with different block sizes (plot: execution time vs. block size)
Structural Graph Indexing (SGI)
• We propose an alternative structural indexing approach to search and process queries efficiently even in very large graphs.
• As indexing features, we use commonly observed graph structures: star, complete bipartite, triangle, and clique.
• These structures are ubiquitous in biological, chemical, technological, social, and many other complex networks.
Structural Models
[Figure panels: 3-Star (K1,3); n-Star (K1,n); 2*3-Complete Bipartite (K2,3); 3*4-Complete Bipartite (K3,4); m*n-Complete Bipartite (Km,n); Triangle (K3); 4-Clique (K4); n-Clique (Kn)]
Structural Graph Indexing
• An important feature of these structures is that each one is built from the previous one: a clique contains complete bipartite structures, and a complete bipartite structure contains star structures.
• So, we can index these structures within the original graph consecutively. We first identify the star structures, and then derive the complete bipartite and clique structures from the preceding ones.
Structural Graph Indexing
• An important difference between our approach and previous studies is that we do not limit the size of the subgraphs considered in indexing. We index all maximal graphs that match the structure formulation. For instance, a maximal clique is a clique that cannot be extended by adding one more vertex from the graph. However, the substructure size in indexing may be limited when needed, since the clique problem is known to be NP-complete.
Complete Bipartite
• The second structure we index is the complete bipartite structure, shown in the figures below.
• A complete bipartite graph G = (V1 ∪ V2, E) is a bipartite graph in which V1 and V2 are two disjoint sets and every pair of vertices vi ∈ V1 and vj ∈ V2 is joined by an edge (i.e., ∃ e(vi, vj) ∈ E).
[Figures: complete bipartite structures on node sets v1, v2, ..., vm and d1, d2, ..., dn]
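The definition can be turned into a small membership check. The sketch below is illustrative only: `is_complete_bipartite` and `build_adj` are hypothetical helper names, and it assumes the structure is searched for inside a larger graph, so it tests the cross edges between the two sides and does not forbid extra edges within a side.

```python
def build_adj(edges):
    """Build an undirected adjacency dictionary of sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_complete_bipartite(v1, v2, adj):
    """True if V1 and V2 are non-empty, disjoint, and fully cross-connected."""
    v1, v2 = set(v1), set(v2)
    if not v1 or not v2 or (v1 & v2):
        return False
    # Every node on one side must have the whole other side as neighbors.
    return all(v2 <= adj.get(u, set()) for u in v1)


left, right = {"v1", "v2"}, {"d1", "d2", "d3"}
k23 = build_adj([(v, d) for v in sorted(left) for d in sorted(right)])
missing_edge = build_adj(
    [("v1", "d1"), ("v1", "d2"), ("v1", "d3"), ("v2", "d1"), ("v2", "d2")]
)
ok = is_complete_bipartite(left, right, k23)
not_ok = is_complete_bipartite(left, right, missing_edge)
print(ok, not_ok)
```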
Complete Bipartite
• The complete bipartite structure is ubiquitous in many complex networks.
  • protein-protein interaction networks (Thomas et al.)
  • the Internet (Fay et al.)
• We index all complete bipartite structures in the graph G using the indexed star structures.
• For each star structure s(a, ns), where a ∈ V and ns is the neighbor set of node a, we identify the maximal complete bipartite structure involving node a.
Complete Bipartite
Algorithm:
• Find the second-hop neighbors of node a by iterating over the ns set, and unify them in the set Lcan, which holds the candidates for the left side of the complete bipartite structure; the ns set is the candidate set for the right side.
• Then find a K2,n and grow it to a Km,n. To find the K2,n, iterate over each candidate node in Lcan and determine the intersection of its neighbors with those of a. If the intersection set contains more than two nodes, these nodes belong to the right side.
• Grow the K2,n by finding all nodes on the left side (i.e., in Lcan) that have all of the right-side nodes (i.e., Rnew) as neighbors.
[Figure: node a, the right candidate set ns1, ns2, ..., nsn, and the left candidate set ls1, ls2, ..., lsn]
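A compact Python sketch of these steps is given below. It is one possible reading of the slide, not the paper's implementation: `bipartites_for` is a hypothetical name, and the `min_right=3` default encodes the "more than two" threshold on the intersection size as an assumption that can be changed.

```python
def build_adj(edges):
    """Build an undirected adjacency dictionary of sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bipartites_for(a, adj, min_right=3):
    """Return the (left, right) complete bipartite structures that involve a."""
    ns = adj[a]
    # Lcan: second-hop neighbors of a, i.e. neighbors of a's neighbors.
    lcan = set().union(*(adj[n] for n in ns)) - {a}
    found = set()
    for cand in lcan:
        # K2,n: the right side is what a and the candidate have in common.
        right = ns & adj[cand]
        if len(right) < min_right:
            continue
        # Grow to Km,n: every Lcan node adjacent to all right-side nodes.
        left = {a} | {u for u in lcan if right <= adj[u]}
        found.add((frozenset(left), frozenset(right)))
    return found


# K3,3 on {a, l1, l2} x {d1, d2, d3}, plus a node x attached only to d1.
edges = [(l, d) for l in ("a", "l1", "l2") for d in ("d1", "d2", "d3")]
edges.append(("x", "d1"))
result = bipartites_for("a", build_adj(edges))
for left, right in result:
    print(sorted(left), sorted(right))
```

Node x enters Lcan because it is a second-hop neighbor of a, but it shares only d1 with a, so it neither starts a K2,n nor joins the left side.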
Clique
• Finally, we index the clique structures shown in the figures below.
• A clique in a graph G = (V, E) is a subset of the vertex set (i.e., C ⊆ V) such that there is an edge between every pair of its nodes (i.e., ∀ ci, cj ∈ C with i ≠ j, ∃ e(ci, cj) ∈ E).
• We index all maximal clique structures in the graph using the complete bipartite structures.
[Figures: cliques on nodes v1, v2, v3, v4, ..., vn]
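The clique definition, together with the notion of a maximal clique (one that cannot be extended by adding one more vertex), can be checked directly. The helpers below are an illustrative sketch; the names `is_clique` and `is_maximal_clique` are hypothetical, and the adjacency dictionary is assumed to have an entry for every node of the graph.

```python
from itertools import combinations


def is_clique(c, adj):
    """True if every pair of distinct nodes in c is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(c, 2))


def is_maximal_clique(c, adj):
    """True if c is a clique and no outside node is adjacent to all of c."""
    c = set(c)
    if not is_clique(c, adj):
        return False
    return not any(c <= adj[w] for w in adj if w not in c)


# A 4-clique on v1..v4, plus v5 attached to v1 and v2.
adj = {
    "v1": {"v2", "v3", "v4", "v5"},
    "v2": {"v1", "v3", "v4", "v5"},
    "v3": {"v1", "v2", "v4"},
    "v4": {"v1", "v2", "v3"},
    "v5": {"v1", "v2"},
}
print(is_clique({"v1", "v2", "v3"}, adj))
print(is_maximal_clique({"v1", "v2", "v3"}, adj))
print(is_maximal_clique({"v1", "v2", "v3", "v4"}, adj))
```

Here {v1, v2, v3} is a clique but not a maximal one, because v4 is adjacent to all three of its nodes.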
Clique
• This structure has been observed and utilized in many fields.
• computational biology
• protein structure prediction (Samudrala et al.)
• electronic circuits (Cong et al.)
• chemicals in a chemical database (Rhodes et al.)
Clique
Algorithm:
• First, get the set of nodes from each complete bipartite structure k(m,n) and look for cliques formed by those nodes.
• The clique search algorithm works recursively, taking each node from the k(m,n) as the pivot node in the L1 set and treating the other nodes as candidates in the L2 set.
• The function moves a node from the L2 set to the L1 set if it is connected to all nodes in L1, and then recursively tries to grow the structure with the remaining nodes as candidates.
• When there are no more candidates to consider in the L2 set, a clique has been identified.
[Figure: a complete bipartite structure with Set1 = {v1, v2, v3} and Set2 = {d1, d2, d3, d4}]
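The recursive L1/L2 search can be sketched as below. This is a simplified illustration under assumptions, not the paper's code: `cliques_among` is a hypothetical name, the node set passed in stands in for the nodes of one complete bipartite structure, and no pruning is done, so the same clique is reached through several orderings and deduplicated by the result set.

```python
def cliques_among(nodes, adj):
    """Return the cliques among `nodes` that no other node in `nodes` extends."""
    found = set()

    def grow(l1, l2):
        # Keep only the candidates connected to every node already in L1.
        l2 = {v for v in l2 if l1 <= adj[v]}
        if not l2:
            # No candidates left: L1 cannot be extended any further.
            found.add(frozenset(l1))
            return
        for v in l2:
            # Move v from L2 to L1 and keep growing with the rest.
            grow(l1 | {v}, l2 - {v})

    nodes = set(nodes)
    for pivot in nodes:
        grow({pivot}, nodes - {pivot})
    return found


# A 4-clique on v1..v4, plus v5 attached to v1 and v2.
adj = {
    "v1": {"v2", "v3", "v4", "v5"},
    "v2": {"v1", "v3", "v4", "v5"},
    "v3": {"v1", "v2", "v4"},
    "v4": {"v1", "v2", "v3"},
    "v5": {"v1", "v2"},
}
cliques = cliques_among(adj.keys(), adj)
for c in sorted(cliques, key=len):
    print(sorted(c))
```

Because a clique of size k is rediscovered along many paths, this version is only suitable for small node sets such as the ones taken from a single indexed structure.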
Where to Submit
• Advances in Social Network Analysis and Mining (ASONAM 2011)
  • Full paper submission deadline: March 1, 2011
  • Maximum length: 8 pages (IEEE two-column template)
  • Kaohsiung, Taiwan, 7/25-7/27
• Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2011)
• Workshop on Mining and Learning with Graphs (MLG 2011)
• Workshop on Social Network Mining and Analysis (SNAKDD 2011)
  • Full paper submission deadline: May 4-10, 2011
  • Maximum length: 10 pages (ACM two-column template)
  • San Diego, CA, 8/21-8/24
• Simplifying Network Science for Practitioners (SIMPLEX 2011)
  • Full paper submission deadline: Jan 31, 2011 – Feb 19, 2011
  • Maximum length: 6-10 pages (IEEE two-column template)
  • Minneapolis, Minnesota, USA, 6/20-6/24
Questions