Programming Languages
Clustering Social Networks (with groups!)
Isabelle Stanton, University of Virginia
Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan
Outline
• Motivation
• Group Recommendations
• ρ-champions
• A Clustering algorithm
• The NEWKID Algorithm
• Evaluation of the NEWKID algorithm
Motivation
• There are many large social networks
• A fundamental problem is finding communities automatically
• Social networks have millions of groups
• Which ones should you join?
Group Recommendations
• Model by Kleinberg and Puzicha
• Assumes a latent clustering
• Recommend group, g, that maximizes:
• I guess we'd better figure out how to find these clusters
Communities in Social Networks
• Disjoint partitionings are not good for social networks
(α, β)-Clusters
• C is an (α, β)-cluster if:
• Internally Dense: Every vertex in the cluster neighbors at least a β fraction of the cluster
• Externally Sparse: Every vertex outside the cluster neighbors at most an α fraction of the cluster
• Examples shown: a (1/4, 3/4)-cluster and a (1/4, 1)-cluster
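The two conditions above can be checked directly on a small graph. The following is a minimal Python sketch, not code from the talk: the function name, the adjacency-set representation, and the `count_self` flag are assumptions made for illustration. In particular, counting a vertex as its own neighbor is an assumption, chosen so that a clique satisfies the internal-density condition with β = 1.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # hypothetical representation: vertex -> set of neighbors


def is_alpha_beta_cluster(graph: Graph, cluster: Set[str],
                          alpha: float, beta: float,
                          count_self: bool = True) -> bool:
    """Check the internal-density and external-sparsity conditions for `cluster`."""
    size = len(cluster)
    if size == 0:
        return False
    for v, nbrs in graph.items():
        inside = len(nbrs & cluster)  # neighbors of v that lie in the cluster
        if v in cluster:
            # Assumption: a vertex counts as adjacent to itself.
            if count_self and v not in nbrs:
                inside += 1
            if inside < beta * size:   # internally dense fails
                return False
        elif inside > alpha * size:    # externally sparse fails
            return False
    return True


# Two triangles joined by the single edge c-d.
two_triangles: Graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
```

On this toy graph, `{"a", "b", "c"}` passes with α = 0.5 and β = 1.0, because the only outside vertex touching it (d) neighbors one of its three members. Tightening α to 0.25 makes the same set fail.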
ρ-Champions
• Example: Wes Anderson
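Intuition behind the Algorithm
• Let c be a ρ-champion of the cluster C
• If we sample c's neighbors, we're likely to be in the cluster
• If v is inside C, then v and c each neighbor at least β|C| vertices of C, so they share at least (2β − 1)|C| neighbors
• If v is outside C, then v neighbors at most α|C| vertices of C and c neighbors at most ρ|C| vertices outside C, so v and c share at most (ρ + α)|C| neighbors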
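The two bounds labeled on this slide can be worked out in a few lines. The derivation below is a sketch under stated assumptions: N(x) denotes the neighbor set of x (notation introduced here, not on the slide), and c is taken to have at most ρ|C| neighbors outside C, as the ρ|C| label suggests.

```latex
\begin{align*}
v \in C:\quad
  |N(v) \cap N(c)|
  &\ge |N(v) \cap C| + |N(c) \cap C| - |C|
   \ge \beta|C| + \beta|C| - |C| = (2\beta - 1)|C| \\
v \notin C:\quad
  |N(v) \cap N(c)|
  &\le |N(v) \cap C| + |N(c) \setminus C|
   \le \alpha|C| + \rho|C| = (\rho + \alpha)|C|
\end{align*}
```

The first line is inclusion-exclusion inside C. The second splits the shared neighbors into those inside C and those outside it. Counting shared neighbors with c therefore separates members from non-members whenever 2β − 1 > ρ + α.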
Algorithm
• Input: α, β, G, k, t
• Output: All (α, β)-clusters with ρ-champions and
• For each c in V do:
  • Draw a sample of size t from c's neighbors, k times
  • For each sample, add vertices that have 'a lot' of neighbors in the cluster
  • 'A lot' depends on how close our sample is to being the right size
  • When no more vertices can be added, check if we have an (α, β)-cluster
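The loop above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the slide does not say how 'a lot' is computed, so the sketch swaps in a fixed, hypothetical fraction `grow_frac` of the current candidate size. The function names, the `seed` parameter, sampling from c's closed neighborhood, and counting a vertex as its own neighbor are all assumptions too.

```python
import random
from typing import Dict, FrozenSet, Set

Graph = Dict[str, Set[str]]  # hypothetical: vertex -> neighbor set, no self-loops


def is_cluster(graph: Graph, cand: Set[str], alpha: float, beta: float) -> bool:
    """(alpha, beta) check; a vertex counts as its own neighbor (assumption)."""
    size = len(cand)
    for v, nbrs in graph.items():
        inside = len(nbrs & cand)
        if v in cand:
            if inside + 1 < beta * size:
                return False
        elif inside > alpha * size:
            return False
    return size > 0


def grow(graph: Graph, sample: Set[str], grow_frac: float) -> Set[str]:
    """Add outside vertices with 'a lot' of neighbors in the candidate set."""
    cand = set(sample)
    changed = True
    while changed:
        changed = False
        for v, nbrs in graph.items():
            # 'A lot' is a fixed fraction here, which is a simplifying assumption.
            if v not in cand and len(nbrs & cand) >= grow_frac * len(cand):
                cand.add(v)
                changed = True
    return cand


def find_clusters(graph: Graph, alpha: float, beta: float, k: int, t: int,
                  grow_frac: float, seed: int = 0) -> Set[FrozenSet[str]]:
    rng = random.Random(seed)
    found: Set[FrozenSet[str]] = set()
    for c in graph:                       # try every vertex as a champion
        pool = sorted(graph[c] | {c})     # c together with its neighbors
        for _ in range(k):                # k samples of size (at most) t
            sample = set(rng.sample(pool, min(t, len(pool))))
            cand = grow(graph, sample, grow_frac)
            if is_cluster(graph, cand, alpha, beta):
                found.add(frozenset(cand))
    return found


# Two triangles joined by the single edge c-d.
two_triangles: Graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
clusters = find_clusters(two_triangles, alpha=0.5, beta=1.0,
                         k=1, t=4, grow_frac=0.6)
```

With t at least as large as every closed neighborhood, each sample is the whole neighborhood, so the run is deterministic and recovers the two triangles. Smaller t makes the result depend on the random draws, which is why the slide's version repeats the sampling k times.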
The NEWKID Algorithm
• Input: G, groups H, members of V in each h
• C is the set of (α, β)-clusters in G
• For each group,
• While there is a new kid joining the network do:
  • For each c in C,
  • Output a ranking of groups scored by:
Experimental Results
• LJ (LiveJournal): 5 M users, 7.5 M groups; random: 2 right per rec
• Orkut: 3.1 M users, 8.7 M groups; random: 4 right per rec