260 likes | 412 Views
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research. Statistical properties of network community structure. Network communities. Communities: Sets of nodes with lots of connections inside and few to outside (the rest of the network) Assumption:
E N D
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research Statistical properties of network community structure
Network communities • Communities: • Sets of nodes with lots of connections inside and few to outside (the rest of the network) • Assumption: • Networks are (hierarchically) composed of communities (modules) Communities, clusters, groups, modules
Community score (quality) • How community like is a set of nodes? • Want a measure that corresponds to intuition • Conductance (normalized cut): • Φ(S) = # edges cut / # edges inside • SmallΦ(S) corresponds to more community-like sets of nodes
Community score (quality) • Score: Φ(S) = # edges cut / # edges inside
Community score (quality) Bad community Φ=5/7 = 0.7 • Score: Φ(S) = # edges cut / # edges inside
Community score (quality) Bad community Φ=5/7 = 0.7 Better community Φ=2/5 = 0.4 • Score: Φ(S) = # edges cut / # edges inside
Community score (quality) Bad community Φ=5/7 = 0.7 Best community Φ=2/8 = 0.25 Better community Φ=2/5 = 0.4 • Score: Φ(S) = # edges cut / # edges inside
Network Community Profile Plot • We define: Network community profile (NCP) plot Plot the score of best community of size k • Search over all subsets of size k and find best: Φ(k=5) = 0.25 • NCP plot is intractable to compute • Use approximation algorithm
NCP plot: Small Social Network • Dolphin social network • Two communities of dolphins NCP plot Network
NCP plot: Zachary’s karate club • Zachary’s university karate club social network • During the study club split into 2 • The split (squares vs. circles) corresponds to cut B Network NCP plot
NCP plot: Network Science • Collaborations between scientists in Networks Network NCP plot
Geometric and Hierarchical graphs Geometric (grid-like) network – Small social networks – Geometric and – Hierarchical network have downward NCP plot Hierarchical network
Our work: Large networks • Previously researchers examined community structure of smallnetworks (~100 nodes) • We examined more than 70 different large social and information networks Large real-world networks look completely different!
Example of our findings • Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
NCP: LiveJournal (N=5M, E=42M) Better and better communities Worse and worse communities Community score Best community has 100 nodes Community size
Explanation: Downward part • Whiskersare responsible for downward slope of NCP plot NCP plot Largest whisker Whiskeris a set of nodes connected to the network by a single edge
Explanation: Upward part • Each new edge inside the community costs more Φ=2/1 = 2 NCP plot Φ=8/3 = 2.6 Φ=64/11 = 5.8 Each node has twice as many children
Suggested Network Structure Denser and denser core of the network Core contains 60% node and 80% edges Whiskers are responsible for good communities Network structure: Core-periphery, jellyfish, octopus
Caveat: Bag of whiskers What if we allow cuts that give disconnected communities? Cut all whiskers and compose communities out of them
Caveat: Bag of whiskers We get better community scores when composing disconnected sets of whiskers Community score Connected communities Bag of whiskers Community size
Comparison to a rewired network Rewired network: random network with same degree distribution
What is a good model? • What is a good model that explains such network structure? • None of the existing models work Flat and Down Flat Down and Flat Geometric Pref. Attachment Small World Pref. attachment
Forest Fire model works • Forest Fire: connections spread like a fire • New node joins the network • Selects a seed node • Connects to some of its neighbors • Continue recursively As community grows it blends into the core of the network
Forest Fire NCP plot rewired network Bag of whiskers
Conclusion and connections • Whiskers: • Largest whisker has ~100 nodes • Independent of network size • Dunbar number: a person can maintain social relationship to 150 people • Bond vs. identity communites • Core: • Newman et al. analyzed 400k node product network • Largest community has 50% nodes • Community was labeled “miscelaneous”
Conclusion and connections • NCP plot is a way to analyze network community structure • Our results agree with previous work on small networks • Butlarge networks are fundamentally different • Large networks have core-periphery structure • Small well isolated communities blend into the core of the networks as they grow