Modularity and Community Structure in Networks*

Final project
*Based on a paper by M. E. J. Newman, PNAS 2006

Introduction

Networks
• A network is represented by a graph G(V, E), where V is the set of nodes and E is the set of edges (links between node pairs).
• Examples of real-life networks:
  • social networks (V = people)
  • the World Wide Web (V = webpages)
  • protein-protein interaction networks (V = proteins)
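To make the G(V, E) representation concrete, here is a minimal sketch that stores an undirected network as a symmetric 0/1 adjacency matrix. The helper name `adjacency_matrix` and the toy edge list (two triangles joined by one edge) are assumptions made for illustration; they do not come from the slides.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph
    with nodes 0..n-1 (hypothetical helper, for illustration only)."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A

# Toy network (assumed example): triangles {0,1,2} and {3,4,5}, bridged by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = adjacency_matrix(6, edges)

# The degree of a node is the sum of its row.
degrees = [sum(row) for row in A]
```

The row sums give the node degrees, and their total is twice the number of edges.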

Protein-protein Interaction Networks
• Nodes are proteins (6K); edges are interactions (15K).
• The network reflects the cell’s machinery and signaling pathways.

Communities (clusters) in a network
• A community (cluster) is a densely connected group of vertices, with only sparser connections to other groups.

Searching for communities in a network
• There are numerous algorithms with different "target functions":
  • "Homogeneity": densely connected clusters
  • "Separation": graph partitioning, min-cut approach
• Clustering is important for understanding the structure of the network.
• It provides an overview of the network.

Distilling Modules from Networks
• Motivation: identifying protein complexes responsible for certain functions in the cell.

Newman's network division algorithm http://www.pnas.org/content/103/23/8577.full

Important features of Newman's clustering algorithm
• The number and size of the clusters are determined by the algorithm.
• It attempts to find a division that maximizes a modularity score Q.
• It is a heuristic algorithm.
• It reports when the network is non-modular.
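As a rough illustration of the modularity score Q, the sketch below evaluates a given division using the normalized definition from the cited paper, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j). The function name `modularity`, the `labels` argument and the toy network are assumptions for illustration, not part of the slides.

```python
def modularity(A, labels):
    """Modularity Q of a division; labels[i] is the group of vertex i.
    Assumes an undirected network with at least one edge."""
    n = len(A)
    k = [sum(row) for row in A]      # vertex degrees
    two_m = sum(k)                   # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed edge minus the expected value under random placement
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m

# Toy network (assumed example): two triangles joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one group per triangle
q_single = modularity(A, [0] * 6)             # everything in one group
```

Splitting along the bridge gives Q = 5/14, while putting all vertices in one group gives Q = 0, which is why a division is only worth keeping when it yields a positive score.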

Overview of the algorithm

Spectral 2-division algorithm
• Input: adjacency matrix A (n vertices)
• Output: a (±1)-vector s of size n representing the 2-division: the "-1" cluster (vertices whose corresponding entry is -1) and the "+1" cluster (vertices whose corresponding entry is +1)
• Build a modularity matrix B from A
• Compute the leading eigen-pair (u1, β1) of B
  • u1 is the eigenvector (size n) and β1 is the eigenvalue: B u1 = β1 u1, where β1 is the maximal eigenvalue
• If β1 == 0, the network is indivisible
• Else (heuristic):
  • Transform u1 into a (±1)-vector s (s_i = +1 if the i-th entry of u1 is positive, -1 otherwise)
  • Q = s^T B s
  • If Q > 0, return s; else return (+1, ..., +1)
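The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the modularity matrix is taken from the cited paper as B_ij = A_ij − k_i k_j / 2m, the leading eigen-pair is approximated by power iteration on a shifted matrix (a substitute for a proper eigen-solver), and the function names, the tolerance `eps` and the toy networks are hypothetical. Q is kept in the unnormalized form s^T B s used on the slide; the paper's 1/(4m) factor does not change its sign.

```python
import math

def modularity_matrix(A):
    """B[i][j] = A[i][j] - k_i * k_j / (2m); assumes at least one edge."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def leading_eigenpair(B, iters=500):
    """Approximate the largest eigenvalue of symmetric B and its eigenvector.
    Power iteration runs on B + shift*I so that the largest eigenvalue of B
    becomes the dominant one; shift is the maximal absolute row sum."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)
    u = [float(i + 1) for i in range(n)]   # deterministic, non-uniform start
    for _ in range(iters):
        w = [sum(B[i][j] * u[j] for j in range(n)) + shift * u[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        u = [x / norm for x in w]
    Bu = [sum(B[i][j] * u[j] for j in range(n)) for i in range(n)]
    beta = sum(u[i] * Bu[i] for i in range(n)) / sum(x * x for x in u)  # Rayleigh quotient
    return beta, u

def spectral_two_division(A, eps=1e-9):
    """Return a (+1/-1)-vector s, or all +1 if the network is indivisible."""
    n = len(A)
    B = modularity_matrix(A)
    beta1, u1 = leading_eigenpair(B)
    if beta1 <= eps:                       # beta1 == 0 up to rounding error
        return [1] * n
    s = [1 if x > 0 else -1 for x in u1]   # split by the signs of u1
    q = sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n))
    return s if q > eps else [1] * n

# Toy networks (assumed examples).
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
s = spectral_two_division(two_triangles)
t = spectral_two_division(triangle)
```

On the two-triangle network the sign pattern of u1 separates the triangles, while a single triangle has β1 = 0 and is reported as indivisible. Power iteration can converge slowly, or to the wrong vector when the leading eigenvalue is degenerate, so a library eigen-solver would be the more robust choice in practice.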

Dividing into more than 2
• How do we divide a network into more than 2 groups?
• Idea: apply the algorithm recursively* on every group.
• The algorithm should be generalized to a 2-division of a group within the network.
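For reference, the generalization in the cited paper replaces B with a group-specific modularity matrix when dividing a group g. The notation below (B^{(g)}, δ_ij, m for the number of edges, ΔQ for the change in modularity) follows that paper and is an assumption relative to these slides.

```latex
% Generalized modularity matrix for a group g (indices i, j range over g)
B^{(g)}_{ij} = B_{ij} - \delta_{ij} \sum_{k \in g} B_{ik}

% Change in modularity when g is split according to the (\pm 1)-vector s
\Delta Q = \frac{1}{4m}\, s^{T} B^{(g)} s
```

Each row of B^{(g)} sums to zero, so leaving g undivided (s = (+1, ..., +1)) gives ΔQ = 0, and a split is accepted only when ΔQ > 0.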

Newman's clustering algorithm
• P* = {{1, ..., n}} (*singleton nodes should be removed)
• For each group g in P:
  • Remove g from P
  • Perform a spectral 2-division on g
  • If g is divisible, improve the 2-division with an additional heuristic
  • Add* each subgroup in the 2-division to P (*if the subgroup has more than one element and is different from g)
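The loop above can be sketched as follows. This is a simplified illustration under several assumptions: the additional improvement heuristic is omitted, indivisible groups are collected in a separate `done` list, singleton nodes are assumed to be absent from the input, the leading eigen-pair is approximated by shifted power iteration, and each group is divided using the generalized matrix B^(g)_ij = B_ij − δ_ij Σ_{k∈g} B_ik from the cited paper. All function names are hypothetical.

```python
import math

def modularity_matrix(A):
    """B[i][j] = A[i][j] - k_i * k_j / (2m); assumes at least one edge."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def group_matrix(B, g):
    """Generalized modularity matrix B^(g) for the vertex list g."""
    Bg = [[B[i][j] for j in g] for i in g]
    for a in range(len(g)):
        row_sum = sum(Bg[a])
        Bg[a][a] -= row_sum            # makes every row sum to zero
    return Bg

def leading_eigenpair(B, iters=500):
    """Shifted power iteration (a stand-in for a proper eigen-solver)."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)
    u = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(B[i][j] * u[j] for j in range(n)) + shift * u[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        u = [x / norm for x in w]
    Bu = [sum(B[i][j] * u[j] for j in range(n)) for i in range(n)]
    return sum(u[i] * Bu[i] for i in range(n)) / sum(x * x for x in u), u

def two_division(Bg, eps=1e-9):
    """Return a (+1/-1)-vector for the group, or None if it is indivisible."""
    n = len(Bg)
    beta, u = leading_eigenpair(Bg)
    if beta <= eps:
        return None
    s = [1 if x > 0 else -1 for x in u]
    dq = sum(s[i] * Bg[i][j] * s[j] for i in range(n) for j in range(n))
    return s if dq > eps else None     # dq is 0 when the subgroup equals g

def newman_communities(A):
    B = modularity_matrix(A)
    pending = [list(range(len(A)))]    # P: groups still to be examined
    done = []                          # groups found to be indivisible
    while pending:
        g = pending.pop()
        s = two_division(group_matrix(B, g))
        if s is None:
            done.append(g)
            continue
        for sign in (1, -1):
            sub = [v for v, sv in zip(g, s) if sv == sign]
            if len(sub) > 1:
                pending.append(sub)    # larger subgroups are divided again
            elif sub:
                done.append(sub)       # single-vertex subgroups are final
    return sorted(done)

# Toy network (assumed example): two triangles joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
communities = newman_communities(A)
```

On this toy network the first division separates the two triangles, and each triangle is then found to be indivisible, so the sketch stops with two communities without the number of clusters being specified in advance.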
