
A scalable multilevel algorithm for community structure detection

A scalable multilevel algorithm for community structure detection. Melih Onus Hristo Djidjev Arizona State University Los Alamos National Laboratory. Models and Algorithms for the Web Graph (WAW 2006) November 29 – December 2, 2006. Community Structure Detection Problem.


Presentation Transcript


  1. A scalable multilevel algorithm for community structure detection Melih Onus Hristo Djidjev Arizona State University Los Alamos National Laboratory Models and Algorithms for the Web Graph (WAW 2006) November 29 – December 2, 2006

  2. Community Structure Detection Problem • The problem of identifying communities in a network is usually modeled as a graph clustering problem • Vertices correspond to individual items • Edges describe relationships • The communities correspond to subgraphs • Dense connections between vertices from the same subgraph • Fewer connections between vertices in different subgraphs

  3. Motivation: Why detect communities? • Analyze and understand the information contained in the huge amount of data available on the WWW • Finding related commercial items • Recommendation systems • Important for • Social networks • Ad-hoc networks • Protein interaction networks • Genetic networks

  4. Motivation: Why detect communities? • Predict how much someone is going to love a movie based on their movie preferences • Grand Prize: $1,000,000

  5. Outline of the talk • Previous work • Graph partitioning problem • Our approach • Modularity • Reduction • Multilevel graph partitioning • Experimental results • Conclusions

  6. Previous Work • Two main classes • Agglomerative Methods (addition of edges) • Divisive Methods (removal of edges) • Algorithms based on • Laplacian Matrix • Centrality measures • Flow models • Random walks • Resistor networks • Optimization • Not fast enough or inaccurate

  7. Graph Partitioning Problem • Given a graph G(V, E), find a partition such that • The partition is balanced (i.e., the numbers of vertices in the subsets are roughly equal) • Cut size is minimized (i.e., the number of edges with endpoints in different subsets is minimized) • Previous Work: • Kernighan-Lin algorithm • Spectral partitioning • Multilevel algorithms
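The two criteria on this slide can be made concrete with a small sketch. The helper names `cut_size` and `part_sizes` and the toy graph are illustrative assumptions, not part of the talk.

```python
def cut_size(edges, part):
    """Count edges whose endpoints lie in different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])


def part_sizes(part):
    """Return the number of vertices assigned to each subset label."""
    sizes = {}
    for label in part.values():
        sizes[label] = sizes.get(label, 0) + 1
    return sizes


# Toy graph (assumed): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(cut_size(edges, part), part_sizes(part))
```

Splitting the toy graph at the bridge gives a balanced partition whose cut contains only the bridge edge.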

  8. Kernighan-Lin Algorithm • Find an initial random partition • Improve by a greedy procedure that swaps pairs of vertices (u, v) from different partitions • Minimize the size of the cut set
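A simplified sketch of the swap step follows. It uses the standard Kernighan-Lin gain for swapping u and v, D(u) + D(v) - 2c(u, v), where D(x) is the number of external minus internal edges of x. The full algorithm also locks vertices and keeps the best prefix of a swap sequence; this sketch only applies the best positive-gain swap repeatedly. Function names and the part labels 0 and 1 are assumptions.

```python
from itertools import product


def swap_gain(adj, part, u, v):
    """Reduction in cut size if u and v (in different parts) are swapped."""
    def d(x):
        external = sum(1 for y in adj[x] if part[y] != part[x])
        internal = len(adj[x]) - external
        return external - internal
    return d(u) + d(v) - 2 * (1 if v in adj[u] else 0)


def greedy_swap_pass(adj, part):
    """Apply the best positive-gain swap until none remains (labels 0 and 1)."""
    part = dict(part)  # work on a copy
    while True:
        a = [x for x in part if part[x] == 0]
        b = [x for x in part if part[x] == 1]
        best = max(product(a, b),
                   key=lambda p: swap_gain(adj, part, *p), default=None)
        if best is None or swap_gain(adj, part, *best) <= 0:
            return part
        u, v = best
        part[u], part[v] = 1, 0  # each swap strictly lowers the cut size


# Toy graph (assumed): two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
print(greedy_swap_pass(adj, start))
```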

  9. Graph Partitioning vs Graph Clustering • Graph clustering: • Find clusters • Community sizes may differ • Number of subsets varies • Graph partitioning: • Minimize cut size • Equal number of vertices in each subset • Number of subsets is an input • Algorithms for graph partitioning cannot be used directly to produce good-quality clusterings

  10. Our approach • Convert original graph G into a complete graph G’ • Find min-cut of G’ using modified graph partitioning method • This will produce a good quality (high modularity) clustering for G

  11. Modularity • A useful measure of clustering quality • Introduced by Newman [6] • Modularity of a partitioning = (number of edges within communities) – (expected number of such edges) • We are trying to find a division of graph with high modularity
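As a rough illustration, the sketch below computes modularity in Newman's commonly used normalized form: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The slide states the unnormalized version; dividing by the number of edges m is an assumption made here for readability, and the function name is hypothetical.

```python
def modularity(edges, community):
    """Normalized modularity: sum over communities of e_c/m - (d_c/(2m))**2."""
    m = len(edges)
    internal = {}    # e_c: number of edges with both endpoints in community c
    degree_sum = {}  # d_c: sum of degrees of the vertices in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (assumed): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))
```

Putting every vertex in a single community gives modularity 0 under this definition, which is why a high value indicates structure beyond what the degrees alone would explain.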

  12. Reduction Min-Cut Problem: The problem of finding a minimum cut in a complete edge-weighted graph G' Graph Clustering Problem: The problem of finding a clustering of maximum modularity in G

  13. Reduction • Graph Clustering Problem: maximize modularity • Modularity of a partitioning = (number of edges within communities) – (expected number of such edges) • Equivalently, minimize (- modularity) = (cut size) – (expected cut size) • Min-Cut Problem: minimize cut size

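  14. Random Graph Models • pij: the probability that there is an edge between vertices i and j in a random graph from a given distribution • Erdos-Renyi Model: pij = p, the same for every pair of vertices • Chung-Lu Model: pij = w(i)w(j)/2m, where w(i) is the degree of vertex i and m is the number of edges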
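The two null models can be sketched as follows, using the forms implied by the expected cut sizes n1n2p and w1w2/2m given later in the talk. Estimating the Erdos-Renyi p from the observed edge density is an assumption of this sketch, as are the function names.

```python
def erdos_renyi_p(n, m):
    """Uniform edge probability; here estimated as m / C(n, 2) (assumption)."""
    return m / (n * (n - 1) / 2)


def chung_lu_p(deg, i, j):
    """Edge probability proportional to the degree product: d_i * d_j / (2m)."""
    two_m = sum(deg.values())  # the degrees of a graph sum to 2m
    return deg[i] * deg[j] / two_m


# Toy degree sequence (assumed): a star with center 0 and three leaves.
deg = {0: 3, 1: 1, 2: 1, 3: 1}
print(erdos_renyi_p(4, 3), chung_lu_p(deg, 0, 1), chung_lu_p(deg, 1, 2))
```

In the star example the uniform model treats all pairs alike, while the degree-based model assigns a higher probability to pairs that involve the high-degree center.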

  15. Multilevel graph partitioning • A fast and accurate method for producing high-quality partitions • Consists of three phases: • Coarsening phase • Partitioning phase • Uncoarsening and refinement phase

  16. Coarsening Phase • Find a maximal matching and collapse edges to a vertex • Recursive coarsening: < G = G1, G2, …, Gk >
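One coarsening step might look like the sketch below: a greedy maximal matching, followed by collapsing each matched pair into a single coarse vertex and summing parallel edges into weights. The matching order and the function names are assumptions; production packages typically use more refined matching heuristics.

```python
def maximal_matching(adj):
    """Greedily pair each unmatched vertex with its first unmatched neighbor."""
    mate = {}
    for u in adj:
        if u in mate:
            continue
        for v in adj[u]:
            if v != u and v not in mate:
                mate[u], mate[v] = v, u
                break
    return mate


def coarsen(adj):
    """Collapse matched pairs; return (weighted coarse adjacency, vertex map)."""
    mate = maximal_matching(adj)
    cmap, next_id = {}, 0
    for u in adj:
        if u in cmap:
            continue
        cmap[u] = next_id
        if u in mate:
            cmap[mate[u]] = next_id  # both endpoints map to one coarse vertex
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u in adj:
        for v in adj[u]:
            cu, cv = cmap[u], cmap[v]
            if cu != cv:  # edges inside a collapsed pair disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + 1
    return coarse, cmap


# Toy graph (assumed): the path 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(coarsen(adj))
```

Applying `coarsen` repeatedly to its own output (after adapting it to weighted input) would yield the sequence G1, G2, …, Gk.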

  17. Partitioning Phase • Greedy graph growing partitioning • Partition Gk
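A minimal sketch of greedy graph growing on the coarsest graph is shown below: start from a seed vertex and repeatedly absorb the frontier vertex with the most edges into the region relative to edges out of it. The `target_size` stopping rule comes from the standard balanced-partitioning setting and is an assumption here, as are the names and the tie-breaking by vertex order.

```python
def greedy_graph_growing(adj, seed, target_size):
    """Grow a region from `seed` until it holds `target_size` vertices."""
    region = {seed}
    while len(region) < target_size:
        frontier = {v for u in region for v in adj[u]} - region
        if not frontier:
            break  # the connected component is exhausted

        def gain(v):
            inside = sum(1 for w in adj[v] if w in region)
            return inside - (len(adj[v]) - inside)

        # sorted() makes tie-breaking deterministic (smallest vertex wins)
        region.add(max(sorted(frontier), key=gain))
    return region


# Toy graph (assumed): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(greedy_graph_growing(adj, seed=0, target_size=3))
```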

  18. Uncoarsening and Refinement Phase • Project the partitioning Pi of Gi to Pi-1 of Gi-1 • More degrees of freedom at Gi-1 than Gi • Improve Pi-1 using KL algorithm
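The projection step is simple to sketch: every fine vertex inherits the part of the coarse vertex it was collapsed into. The vertex map `cmap` (fine vertex to coarse vertex) and the function name are assumed for illustration.

```python
def project(coarse_part, cmap):
    """Assign each fine vertex the part of its coarse representative."""
    return {u: coarse_part[c] for u, c in cmap.items()}


# Assumed example: fine vertices 0, 1 were collapsed into coarse vertex 0,
# and fine vertices 2, 3 into coarse vertex 1.
cmap = {0: 0, 1: 0, 2: 1, 3: 1}
print(project({0: "A", 1: "B"}, cmap))
```

The projected partition has the same cut size as the coarse one; refinement can then move individual fine vertices, which was not possible while they were merged.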

  19. Implementation • Our implementation is based on the graph partitioning package METIS [3] that employs a multilevel strategy • Convert the graph partitioning algorithm into a clustering one • The optimal clustering might not be balanced. We ignore the restrictions that control the sizes of the parts. • The number of the parts in the optimal clustering is not known. We employ a recursive bisection procedure. • The original graph G might be sparse, while the transformed one G' is complete. Our algorithm does not explicitly generate G’.

  20. Modularity: Erdos-Renyi Model • (- Modularity) = cut size – n1n2p • After moving one vertex from part 2 to part 1: (- Modularity)’ = cut size’ – (n1+1)(n2-1)p • n1, n2: number of vertices in parts 1 and 2

  21. Modularity: Chung-Lu Model • (- Modularity) = cut size – w1w2/2m • After moving a vertex v from part 2 to part 1: (- Modularity)’ = cut size’ – (w1 + w(v))(w2 - w(v))/2m • wi: sum of degrees in partition i

  22. Analysis • Time Complexity: O(n+m) • Experiments • Random Graphs • k-community graphs • nd.edu

  23. Experiment I: Random Graphs • We generated random graphs with 128 vertices and 4 communities of size 32 each • The expected degree of any vertex is 16 • Out degree varies
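A generator for graphs of this kind could be sketched as follows. The way the edge probabilities are derived from the expected degrees (p_in over the 31 other vertices in the same community, p_out over the 96 vertices outside it) is an assumption of this sketch, as are the function name, parameter names, and the fixed seed.

```python
import random


def planted_graph(z_out, n=128, k=4, z_total=16, seed=0):
    """Random graph with k equal communities and expected degree z_total.

    z_out is the expected number of edges from a vertex to other communities.
    """
    rng = random.Random(seed)
    size = n // k
    p_in = (z_total - z_out) / (size - 1)  # 31 possible in-community neighbors
    p_out = z_out / (n - size)             # 96 possible outside neighbors
    community = {v: v // size for v in range(n)}
    edges = [(u, v)
             for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p_in if community[u] == community[v] else p_out)]
    return edges, community


edges, community = planted_graph(z_out=4)
print(len(edges), 2 * len(edges) / len(community))
```

Raising `z_out` toward 8 makes the communities progressively harder to recover, since each vertex then has as many expected neighbors outside its community as inside.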

  24. Experiment II: k-community graphs • We generated graphs with k communities • Size of each community is 100 • The expected number of edges inside a community equals the expected number of edges leaving it • The probability of an edge inside a community varies between 0.1 and 0.5 • Results show that the graphs are clustered 99% correctly

  25. Experiment III: nd.edu • Data consists of the complete map of the nd.edu domain, which contains 325,729 documents and 1,090,108 links • Our algorithm clusters this graph into 280 clusters with modularity 0.925579 • This high modularity indicates strong community structure in the graph • We show the dendrogram generated by our algorithm • The sizes of the rectangles are proportional to the sizes of the communities

  26. Conclusions • Community structure detection problem • A scalable algorithm • Based on multilevel graph partitioning • Uses modularity as a quality measure
