Dense Subgraph Extraction with Application to Community Detection

Dense Subgraph Extraction with Application to Community Detection Author: Jiechen and YousefSaad IEEE transactions of knowledge and data engineering

Outline • Introduction • Assumption of proposed method • The types of graph • The method • Undirected graph • Directed graph • Bipartite graph • Experiment

Introduction • A challenging problem in the analysis of graph structures is the dense subgraph problem, where given a sparse graph, the objective is to identify a set of meaningful dense subgraphs. • The dense subgraphs are often interpreted as “communities”, based on the basic assumption that a network system consists of a number of communities, among with the connections are much fewer than those inside the same community.

The drawbacks of general partition methods • The number k of partitions is mandatory input parameter, and the partitioning result is sensitive to the change of k. • Most of partition methods yield a complete clustering of the data. • Many graph partitioning techniques favor balancing, i.e., sizes of different partitions should not vary too much.

The assumption of the proposed method by author • The adjacency matrix A is a sparse matrix. • The entries of A are either 0 or 1, since the weights of the edges are not taken into account for the density of a graph. • The diagonal of A is empty, since it does not allow self-loops.

The types of graphs • Undirected graph-G(V,E) • V is the vertex set and E is the edge set. Adjacency matrix-symmetric 1 4 2 5 3 • The definition of the undirected graph density is

The types of graphs • Bipartite graph-G(V,E) • Undirected graph • V is the vertex set and E is the edge set. Adjacency matrix 1 4 B 2 5 BT 3 • The definition of the bipartite graph density is

The types of graphs • Directed graph-G(V,E) • V is the vertex set and E is the edge set. Adjacency matrix-nonsymmetric 1 4 2 5 3 • The definition of the directed graph density is

Undirected graph • We construct a adjacency matrix A with G(V,E). • We use to build matrix M that stores the cosines between any two columns of the adjacency matrix A. • Then we construct a weight graph G’(V,E’) whose weighted adjacency matrix M is defined as M(i,j).

Undirected graph • A top-down hierarchical clustering of the vertex set V is performed by successively deleting the edges e’ ∈ E’, in ascending order of the edge weights. When G’ first becomes disconnected, V is partitioned in two subsets, each of which corresponds to a connected component of G’. • The termination will take place when the density of the partition passes a certain density threshold dmin.

1 2 3 5 4 8 6 7 9

1 6 9 7 2 4 8 d(Gt)=0.6 d(Gs)=1 d(Gt)=0.8 dmin=0.75 3 5

7 8 9 1 2 6 3 5 4

Directed graph • The adjacency matrix A of a directed graph is square but not symmetric. When Algorithm is applied to a nonsymmetric adjacency matrix, it will result in two different dendrograms, depending on whether M is computed as the cosines of the columns of A, or the rows of A. • We symmetrize the matrix A (i.e., replacing A by the pattern matrix of A+AT ) and use the resulting symmetric adjacency matrix to compute the similarity matrix M.

Directed graph transform into undirected graph • Remove the direction of the edges and combine the duplicated resulting edges, then it yields an undirected graph • Or use the adjacency matrix AA+AT 1 1 4 4 2 2 5 5 3 3

Bipartite graphs • Without any edge removal of the graph G’ (using M as the weighted adjacency matrix), the vertex set is already partitioned into two subsets: V1 and V2. • Any subsequent hierarchical partitioning will only further subdivide these two subsets separately.

Augment the bipartite graph • A reasonable strategy for this purpose is to augment the original bipartite graph by adding edges between some of the vertices that are connected by a path of length 2. is obtained by erasing the diagonal of M1.

dmin=0.5

Experiments-accuracy

Experiments-Polblogs • The graph contain 1490 vertices, among which the first 758 are liberal blogs, and remaining 732 are conservative. • The edge in the graph indicates the existence of citation between the two blogs.

Experiments • Comparisons with the Clauset, Newman, and Moore(CNM) approach. • CNM approach: bottom-up hierarchical clustering. • Dataset: foldoc-G(13356, 120238) • It extracted from the online dictionary of computing

Dense Subgraph Extraction with Application to Community Detection

Dense Subgraph Extraction with Application to Community Detection

Presentation Transcript

Relevant Subgraph Extraction

SPARQ2L : Towards Supporting Subgraph Extraction Queries in RDF Databases

FACE DETECTION APPLICATION

Application Intrusion Detection

Amplifying Community Content Creation with Mixed-Initiative Information Extraction

Quench Detection and Energy Extraction Systems

Community Detection

Overlapping community detection

Application 2: Misstatement detection

Network Theory: Community Detection

Edge Detection and Geometric Primitive Extraction

Approximate subgraph matching for detection of topic variations

Frequent Subgraph Mining

Informaton Extraction (Retrieval) in Application to e-Business

Translingual Information Detection, Extraction And Summarization

Statistical Color Models with Application to Skin Detection

Information Extraction for New Event Detection

Application Intrusion Detection

Efficient Activity Detection with Max- Subgraph Search

Feature extraction for change detection

Application Intrusion Detection

Measuring Energy Linkages with the Hypothetical Extraction Method: An application to Spain