Clustering Social Networks

Clustering Social Networks • Isabelle Stanton, University of Virginia • Master of Science Thesis Defense

Outline • Motivation • Previous Work • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Group Recommendations • Future Work

Motivation • Many large social networks exist • A fundamental problem is finding communities automatically • Viral and Targeted Marketing • Recommendation Engines

Previous Work – Spectral Methods • Cuts the graph based on an eigenvector • Spectral Methods: • Kannan, Vempala, Vetta 2000, Spielman and Teng 1996, Shi and Malik 2000, Kempe and McSherry 2004, Karypis and Kumar 1998 and many others • cut = partitioning of all elements

Communities in Social Networks • Disjoint partitionings are not good for social networks

Objective: Internal Density, β • Each vertex in C is adjacent to at least a β fraction of (the rest of) C • Examples: β = 1/2, β = 3/4, β = 1

Objective: External Sparsity, α • Each vertex outside of C is adjacent to at most an α fraction of C • α < β • Example: α = 1/5, β = 1

(α, β)-Clusters • C is an (α, β)-cluster if: • Internally Dense: Every vertex in the cluster neighbors at least a β fraction of the cluster • Externally Sparse: Every vertex outside the cluster neighbors at most an α fraction of the cluster • Examples (figures): a (1/4, 2/3)-cluster and a (1/4, 1)-cluster
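The two conditions can be checked directly on a candidate vertex set. The following is a minimal sketch, not code from the thesis: the adjacency-dict representation, the function names, and the convention that a vertex counts as its own neighbor (so that a clique is a β = 1 cluster) are all assumptions made for illustration.

```python
def closed_neighbors(adj, v):
    """Neighbors of v plus v itself (assumed convention: v is its own neighbor)."""
    return adj[v] | {v}


def is_alpha_beta_cluster(adj, cluster, alpha, beta):
    """Return True if `cluster` meets both conditions of the definition."""
    cluster = set(cluster)
    size = len(cluster)
    if size == 0:
        return False
    for v in adj:
        inside = len(closed_neighbors(adj, v) & cluster)
        if v in cluster:
            if inside < beta * size:   # internally dense fails
                return False
        elif inside > alpha * size:    # externally sparse fails
            return False
    return True


# Toy graph: triangle {0, 1, 2} with an extra vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(is_alpha_beta_cluster(adj, {0, 1, 2}, alpha=0.5, beta=1))
print(is_alpha_beta_cluster(adj, {0, 1, 2}, alpha=0.25, beta=1))
```

In the toy graph, vertex 3 touches one of the three triangle vertices, so the triangle passes with α = 0.5 and fails with α = 0.25.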

Contributions of this work • Definition of criterion • Combinatorial results • 3 overlap results • Bound on number of (α,1) clusters • Three algorithms for varying cases • Experiments validating assumptions on real social networks • Novel formulation of group recommendation problem with experiments

Previous Work – (α, β)-clusters • Solved Areas: • (1 – ε, 1): Tsukiyama et al., Johnson et al. • α = 0: connected components • Our Contributions: • β > ½ + α/2: Algorithms 1 and 2 • α < β3: Algorithm 3 • [Figure: these regions drawn in the (β, α) plane, both axes running from 0 to 1]

Outline • Motivation • Previous Work • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Group Recommendations • Future Work

Too Many Clusters • [Figure: n vertices x1, y1, x2, y2, ..., xn/2, yn/2; the MISSING edges are drawn] • Problem: Every vertex in every cluster has as many neighbors outside the cluster as in it

ρ-Champions • [Figure: graph whose vertices are Ben Stiller, Gwyneth Paltrow, Will Ferrell, Vince Vaughn, Wes Anderson, Owen Wilson, Steve Martin, Bill Murray and Anjelica Huston, with one vertex labeled as the ρ-champion]

ρ-Champions • Def: A vertex is a ρ-champion of C if it has at most ρ|C| neighbors outside C • Claim: If ρ < 2β – 1 – α, every vertex can ρ-champion at most one cluster
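The definition translates into a one-line test. This is a small sketch under assumed conventions (adjacency dict of sets, hypothetical function name), not code from the thesis.

```python
def is_rho_champion(adj, v, cluster, rho):
    """v champions `cluster` if it has at most rho * |C| neighbors outside C."""
    cluster = set(cluster)
    outside = len(adj[v] - cluster)
    return v in cluster and outside <= rho * len(cluster)


# Toy graph: triangle {0, 1, 2} with an extra vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(is_rho_champion(adj, 0, {0, 1, 2}, rho=0.25))  # 0 has no outside neighbors
print(is_rho_champion(adj, 2, {0, 1, 2}, rho=0.25))  # 2 has one outside neighbor
```

Requiring the champion to be a member of the cluster is an assumption of this sketch; the slide states only the bound on outside neighbors.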

Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Deterministic Algorithm • Finding Loosely Knit Clusters • Group Recommendations • Future Work

Intuition behind the Algorithm • Let c be a ρ-champion of C • If v is in C, then v and c share at least (2β – 1)|C| neighbors • If v is outside C, then v and c share at most (ρ + α)|C| neighbors • [Figure labels: α|C|, β|C|, ρ|C|, (2β – 1)|C|]
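Both bounds follow from counting. A sketch of the argument, writing N(x) for the neighborhood of x (notation assumed here, not taken from the slides) and ignoring the "rest of C" off-by-one:

```latex
% Case v \in C: both v and c have at least \beta|C| neighbors inside C,
% so by inclusion-exclusion within C
|N(v) \cap N(c)| \;\ge\; |N(v) \cap C| + |N(c) \cap C| - |C|
                 \;\ge\; 2\beta|C| - |C| \;=\; (2\beta - 1)|C| .

% Case v \notin C: a shared neighbor lies either inside C (at most \alpha|C|
% of those are adjacent to v) or outside C (at most \rho|C| are adjacent to c)
|N(v) \cap N(c)| \;\le\; |N(v) \cap C| + |N(c) \setminus C|
                 \;\le\; (\alpha + \rho)|C| .

% The two cases separate exactly when
(2\beta - 1) > \rho + \alpha
\quad\Longleftrightarrow\quad
\beta > \tfrac{1}{2} + \tfrac{\rho + \alpha}{2} .
```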

Deterministic Algorithm • To find all clusters of size s: • For each c in V do: • C ← ∅ • For each v within two steps of c do: • If v and c share at least (2β – 1)s neighbors then add v to C • If C is an (α, β)-cluster then output C
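The steps above can be sketched in a few lines. This is an illustrative version, not the thesis implementation: the adjacency-dict input, the function names, the closed-neighborhood convention (a vertex counts as its own neighbor), and the explicit `len(C) == s` check are assumptions.

```python
def closed(adj, v):
    # Assumed convention: a vertex counts as its own neighbor.
    return adj[v] | {v}


def is_cluster(adj, C, alpha, beta):
    """Check internal density and external sparsity for vertex set C."""
    for v in adj:
        inside = len(closed(adj, v) & C)
        if v in C and inside < beta * len(C):
            return False
        if v not in C and inside > alpha * len(C):
            return False
    return True


def find_clusters(adj, s, alpha, beta):
    """Try every vertex c as a champion and collect the clusters it yields."""
    found = set()
    for c in adj:
        # All vertices within two steps of c (c itself included).
        two_step = set().union(*(closed(adj, u) for u in closed(adj, c)))
        # Keep the vertices that share enough neighbors with c.
        C = {v for v in two_step
             if len(closed(adj, v) & closed(adj, c)) >= (2 * beta - 1) * s}
        if len(C) == s and is_cluster(adj, C, alpha, beta):
            found.add(frozenset(C))
    return found


# Two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(find_clusters(adj, s=3, alpha=0.5, beta=1))
```

On this toy graph every vertex recovers its own triangle, so the output contains exactly the two triangles.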

Algorithmic Guarantees • Claim: Our algorithm will find all clusters of size s where β > ½ + (ρ + α)/2 • Runs in O(d^0.7 n^1.9 + n^(2+o(1))) time where d is the average degree • d is a small constant for social networks, so roughly O(n^2)

Evaluation • Do ρ-champions exist in real graphs? • Tsukiyama’s algorithm finds all maximal cliques ((1-ε, 1)-clusters) in a graph • We compare our algorithm’s output with Tsukiyama’s ground truth

Theory Co-Author Dataset Results • Found 797 of 854 clusters ~ 93%

Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Technical Challenges • Randomized Algorithm • Group Recommendations • Future Work

Loosely Knit Clusters • β≤ ½ • Technical Problem: (0, 1/2)

Connectivity Assumption • Every subset of a cluster has a vertex that is in the cluster but outside the subset and that neighbors more than a β-fraction • [Figure: one example that does satisfy the assumption (β = 2/7) and one that doesn't]

Loosely Knit Randomized Algorithm • α < β3 • Two phases • Phase 1: • Draw a sample of the ρ-champion’s neighbors • Sample neighbors to add to the seed • Stop when the seed is “big enough” • Phase 2: • Exploit connectivity assumption to deterministically grow the seed into the cluster

Example • Phase 1: • Sample of the ρ-champion’s neighbors • Sample neighbors to add to the sample • Stop when the sample is “big enough” • Phase 2 • Deterministically grow cluster

Why does this work? • Random sampling guarantees the expected number of neighbors an outside vertex has with the seed is small • The connectivity assumption guarantees we'll always make progress • Guarantees: Finds all clusters where α < β3 with probability 1 – δ • Runs in time O((n^3/δ) log(n/δ) |C|^2)

Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Group Recommendations • Future Work

Group Recommendations • Clustering isn’t the end goal • What can we do with (α,β)-clusters? • We built a group recommendation engine powered by our clusters • Recommended groups to users of Orkut and LiveJournal

Recommendation Model • Hofmann and Puzicha '99 • [Figure: worked example with sets of 60 people and 20 people, annotated with counts (5, 10, 15, 20, 25) and fractions (5/60, 25/60, 2/3, 3/4, 1/3, 1/2)]

Previous Work • Kleinberg and Sandler: Given the Groups × Users matrix, use matrix decomposition • Their code handles at most ~100K variables • No one uses the friendship graph or clusters!

Experimental Setup • Hold out 10% of users with group memberships • Cluster the rest • Create recommendations for held out users based on clusters

Results – LiveJournal Dataset Held out: 355,495 users – Succeeded on: 210,455

Results – LiveJournal Dataset

Conclusions • Defined (α, β)-clusters • Focus: Overlapping clusters • Introduced ρ-champions • Developed algorithms for a subset of the problem • Ran experiments to validate assumptions and show utility of the clusters • Introduced new interpretation of the recommendation model

Future Work • Algorithms that reduce the necessary α-β gap • Relaxing ρ-champion restriction • Weighted and directed graphs • Decentralized algorithms • Streaming algorithms • Expanding work on group recommendations

Citations • N. Mishra, R. Schreiber, I. Stanton and R. E. Tarjan, Clustering Social Networks, The 5th Workshop on Algorithms and Models for the Web-Graph (WAW 2007), LNCS, vol. 4863, pp. 56-67. • N. Mishra, R. Schreiber, I. Stanton and R. E. Tarjan, Clustering Social Networks, Journal of Internet Mathematics (under submission)

NewKid Algorithm • Input: Graph, Groups, (α, β)-clusters • For each group g and cluster c: • P(g|c) = |members of c in g| / |members of c| • For each new kid, u: • P(c|u) = |friends of u in c| / |friends of u| • Recommend the g that maximizes Σ_c P(g|c) P(c|u)
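The scoring rule above can be sketched as follows. The data layout (dicts of sets), the function name, and the toy data are assumptions for illustration only; they are not the experimental code or data.

```python
from collections import defaultdict


def newkid_recommend(friends, groups, clusters, u):
    """Return the group g maximizing sum_c P(g|c) * P(c|u), or None.

    friends:  dict user -> set of friends
    groups:   dict group name -> set of members
    clusters: list of non-empty sets of users
    """
    friends_of_u = friends[u]
    if not friends_of_u:
        return None
    scores = defaultdict(float)
    for c in clusters:
        p_c_given_u = len(friends_of_u & c) / len(friends_of_u)
        if p_c_given_u == 0:
            continue  # this cluster contributes nothing for u
        for g, members in groups.items():
            p_g_given_c = len(members & c) / len(c)
            scores[g] += p_g_given_c * p_c_given_u
    return max(scores, key=scores.get) if scores else None


# Hypothetical toy data: user 9 has two friends in the first cluster
# and one friend in the second.
clusters = [{1, 2, 3, 4}, {5, 6, 7, 8}]
groups = {"chess": {1, 2, 3}, "hiking": {5, 6}}
friends = {9: {1, 2, 5}}
print(newkid_recommend(friends, groups, clusters, 9))
```

Here "chess" scores 3/4 · 2/3 = 1/2 and "hiking" scores 2/4 · 1/3 = 1/6, so "chess" is recommended.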

Results – Orkut Dataset

HEP Co-Author Dataset Results • Found 115 of 126 clusters ~ 91%

LiveJournal Dataset Results • Too big to run Tsukiyama. Found 4289 clusters, 876 have large ρ-champions

Datasets • High Energy Physics Co-Authorship Graph • Theory Co-authorship graph • A subset of LiveJournal.com • τ(v) = the neighbors and neighbors' neighbors of v

Randomized Algorithm • To find all (α, β)-clusters of size s: • For each c in V do: • Repeat k times: • Draw a random sample S of size t from c's neighbors • C ← S ∪ {c} • For each v within two steps of c do: • If v has at least ((2β – 1)/β)·t neighbors in S then add v to C • If C is an (α, β)-cluster then output C
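A sketch of the sampling loop above, under the same illustrative assumptions as before: adjacency dict of sets, a vertex counting as its own neighbor, an explicit size check, and hypothetical function names. The `seed` parameter is added here only to make runs repeatable.

```python
import random


def closed(adj, v):
    # Assumed convention: a vertex counts as its own neighbor.
    return adj[v] | {v}


def is_cluster(adj, C, alpha, beta):
    for v in adj:
        inside = len(closed(adj, v) & C)
        if v in C and inside < beta * len(C):
            return False
        if v not in C and inside > alpha * len(C):
            return False
    return True


def randomized_clusters(adj, s, alpha, beta, t, k, seed=0):
    rng = random.Random(seed)
    found = set()
    for c in adj:
        nbrs = sorted(adj[c])  # random.sample needs a sequence
        if not nbrs:
            continue
        two_step = set().union(*(closed(adj, u) for u in closed(adj, c)))
        for _ in range(k):
            # Sample t of c's neighbors (fewer if c has fewer than t).
            S = set(rng.sample(nbrs, min(t, len(nbrs))))
            threshold = (2 * beta - 1) / beta * len(S)
            C = S | {c}
            C |= {v for v in two_step
                  if len(closed(adj, v) & S) >= threshold}
            if len(C) == s and is_cluster(adj, C, alpha, beta):
                found.add(frozenset(C))
    return found


# Two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(randomized_clusters(adj, s=3, alpha=0.5, beta=1, t=2, k=5))
```

Every candidate is verified before it is output, so sampling can miss a cluster but cannot report a false one.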

Randomized Algorithm • t = O(log(n/δ)), k = O(n/δ) • Guarantees: Finds all clusters where α < 2β – 1 with probability 1 – δ • Runs in time O((n^3/δ) log(n/δ) (log(n/δ) + |C|)) • Worst case: O((n^4/δ) log(n/δ)) • Average case: O((n^2/δ) log^2(n/δ) d^2)

Combinatorial Properties - Overlaps • Let A and B be (α, β)-clusters with |A| = |B| • Theorem: A and B overlap by at most (1 – (β – α))|A| vertices
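To get a feel for the bound, here it is evaluated for one choice of parameters. The numbers are illustrative only and do not come from the experiments.

```latex
% Illustrative values: \alpha = 1/4, \beta = 1, |A| = |B| = 12
|A \cap B| \;\le\; \bigl(1 - (\beta - \alpha)\bigr)\,|A|
           \;=\; \bigl(1 - \tfrac{3}{4}\bigr)\cdot 12 \;=\; 3 .
% As the gap \beta - \alpha shrinks toward 0, the bound approaches |A|
% and no longer restricts the overlap.
```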

Combinatorial Properties - |Clusters| • Claim: There are at most (α,1)-clusters of size s in a graph • Bound is tight as α→ 1 and α = 0. Seems loose elsewhere • Proof is from Steiner Systems • 7 points, block size = 3, restriction = 2 • {1,2,4},{2,3,5},{3,4,6},{4,5,7},{1,5,6},{2,6,7},{1,3,7}
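The seven listed blocks can be checked mechanically. The sketch below reads "restriction = 2" as "every pair of points lies in exactly one block", which is an interpretation on my part; the helper name is hypothetical.

```python
from itertools import combinations

blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7},
          {1, 5, 6}, {2, 6, 7}, {1, 3, 7}]


def pair_coverage(blocks):
    """Count how many blocks contain each unordered pair of points."""
    counts = {}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


counts = pair_coverage(blocks)
max_overlap = max(len(a & b) for a, b in combinations(blocks, 2))

print(len(counts))                           # distinct pairs covered
print(all(n == 1 for n in counts.values()))  # each pair in exactly one block
print(max_overlap)                           # largest intersection of two blocks
```

All 21 pairs of the 7 points are covered once, so two distinct blocks share at most one point.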

Outline • Motivation • Previous Work • Combinatorial properties • Finding Tightly Knit Clusters • Finding Loosely Knit Clusters • Experiments and Group Recommendations • Are ρ-champions valid? • What are these clusters good for? • Future Work
