Fast algorithm for detecting community structure in networks
M. E. J. Newman
Department of Physics and Center for the Study of Complex Systems, University of Michigan
Advanced Topics in on-line Social Network Analysis - 2014\Spring
Sub-topics for today
• A little step back..
• Background and motivation
• The Algorithm presented
• The good, the bad, the ugly (advantages and drawbacks discussion)
• Applications
• Summary
A little step back...
• The edge-betweenness of an edge is the number of shortest paths between pairs of nodes that run along it.
[Figure: example graph with nodes 0-5]
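The definition above can be made concrete with a short sketch. It assumes an undirected, unweighted graph stored as a dict of neighbour sets (a representation chosen here, not taken from the slides), and it splits credit equally when a pair of nodes has several shortest paths, which is one common convention. The function name `edge_betweenness` is illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Return {frozenset({u, v}): betweenness} for an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember each node's predecessors on those paths.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing path counts
        # down to the edges that lead towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair (s, t) was counted once from each endpoint.
    return {edge: b / 2.0 for edge, b in score.items()}

# Two triangles joined by the single edge 2-3 (a made-up example graph).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
betweenness = edge_betweenness(graph)
bridge = betweenness[frozenset((2, 3))]  # all 3 x 3 cross-triangle pairs use it
```

In this example the bridge between the two triangles carries every shortest path that crosses from one triangle to the other, which is why edges between communities tend to have high betweenness.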
A little step back...
• Quality function Q: the fraction of within-community edges minus the expected value of the same quantity for a randomized network (one in which edges fall at random with no regard to community structure).
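The formula itself did not survive on this slide. A sketch of the usual way to write this definition, using the symbols e_ij and a_i that appear later in the deck, is given below. Here e_ii is the fraction of edges that fall inside community i, e_ij (for i ≠ j) is taken as half the fraction of edges joining communities i and j, and a_i is the fraction of edge ends attached to community i.

```latex
Q = \sum_i \left( e_{ii} - a_i^2 \right),
\qquad
a_i = \sum_j e_{ij}
```

The term a_i^2 is the expected fraction of edges inside community i if edge ends were paired at random, so Q near 0 means no more structure than chance and larger Q means stronger community structure.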
Background and motivation
• Community structure in networks is of increasing interest.
• Networks tend to divide into tightly-knit groups:
  • Edges within a group? Many.
  • Edges between groups? Far fewer.
• Enter the Girvan and Newman algorithm.
The Girvan and Newman algorithm
1. The betweenness of all existing edges in the network is calculated.
2. The edge with the highest betweenness is removed.
3. The betweenness of all edges affected by the removal is recalculated.
4. Steps 2 and 3 are repeated until no edges remain.
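The four steps above can be sketched in a few dozen lines. This is a minimal illustration, not an optimized implementation: it assumes an undirected, unweighted graph stored as a dict of neighbour sets, and for simplicity it recomputes the betweenness of every edge after each removal instead of only the affected ones. All function names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Betweenness of every edge, with ties between shortest paths split equally."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds, order, queue = {v: [] for v in adj}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return {edge: b / 2.0 for edge, b in score.items()}

def components(adj):
    """Connected components as a list of frozensets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(frozenset(comp))
    return comps

def girvan_newman(adj):
    """Return the successive partitions (levels of the dendrogram, top down)."""
    adj = {v: set(ns) for v, ns in adj.items()}    # work on a copy
    levels = [set(components(adj))]
    while any(adj.values()):                       # step 4: until no edges remain
        scores = edge_betweenness(adj)             # steps 1 and 3
        u, v = max(scores, key=scores.get)         # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        current = set(components(adj))
        if current != levels[-1]:                  # record a level only on a split
            levels.append(current)
    return levels

# Two triangles joined by the edge 2-3 (a made-up example graph).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
levels = girvan_newman(graph)
```

Each entry of `levels` is one horizontal cut of the dendrogram, from the whole network down to single nodes. Recomputing betweenness after every removal is what makes the method expensive on large graphs.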
The Girvan and Newman algorithm
[Figure: example network with nodes 0-9 and its DENDROGRAM over leaves 0-9]
As we move down the tree, we see the partitioning of groups.
Background and motivation
• The G&N algorithm runs in worst-case time O(m^2 n), or O(n^3) on a sparse graph.
• This limits us to networks with only thousands of nodes.
  • Skype: 300 million users.
  • WhatsApp: 450 million users.
  • Twitter: 243 million active users (monthly).
  • Facebook: 1.23 billion (!!!) users.
• So obviously, we need to find a better solution.
The Algorithm presented
• The quality function "Q" presented earlier indicates whether a division is meaningful.
• Why not use it? Optimize Q over all possible divisions and find the best one!
• The problem is that doing this in a straightforward manner takes an exponential amount of time.
• A possible solution is a greedy implementation.
The Algorithm presented
• Initially, each of the n nodes is the sole member of its own community.
• We join communities together in pairs, iteratively.
• At each step, we choose the join that gives the largest increase (or smallest decrease) in Q.
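A minimal sketch of this greedy procedure follows. It is a plain, unoptimized illustration under stated assumptions: the graph is undirected with no self-loops or duplicate edges, node labels are sortable, and the change in Q for joining communities i and j is taken as 2(e_ij − a_i a_j), where e_ij is half the fraction of edges between i and j and a_i is the fraction of edge ends attached to i. The function name `greedy_modularity` is illustrative.

```python
def greedy_modularity(edges):
    """Greedy agglomeration on an undirected graph given as (u, v) pairs.

    Returns (best_partition, best_q): the partition with the highest Q
    seen while merging all the way down to a single community.
    """
    m = len(edges)
    nodes = sorted({v for edge in edges for v in edge})
    members = {v: {v} for v in nodes}     # community id -> nodes it contains
    e = {v: {} for v in nodes}            # e[i][j]: half-fraction of i-j edges
    a = {v: 0.0 for v in nodes}           # a[i]: fraction of edge ends in i
    for u, v in edges:
        e[u][v] = e[v][u] = 0.5 / m
        a[u] += 0.5 / m
        a[v] += 0.5 / m
    q = -sum(x * x for x in a.values())   # Q when every node is alone
    best_q, best = q, [set(c) for c in members.values()]
    while len(members) > 1:
        # Only communities that share an edge are considered:
        # joining unconnected ones can never increase Q.
        candidates = [(2 * (e[i][j] - a[i] * a[j]), i, j)
                      for i in e for j in e[i] if i < j]
        if not candidates:
            break                         # what remains is mutually disconnected
        dq, i, j = max(candidates)        # largest increase or smallest decrease
        for k, w in e[j].items():         # fold j's outside links into i
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[i][k]
                del e[k][j]
        del e[i][j], e[j]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in members.values()]
    return best, best_q

# Two triangles joined by the edge 2-3 (a made-up example graph).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
partition, q = greedy_modularity(edges)
```

The loop keeps merging even after Q starts to fall, mirroring the "or smallest decrease" wording above, and the best partition seen along the way is what gets returned. A production implementation would use better data structures for finding the best pair.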
The Algorithm presented
$\Delta Q = e_{ij} + e_{ji} - 2 a_i a_j = 2(e_{ij} - a_i a_j)$
• Start from singleton communities (A=1, B=2, C=3, D=4).
• First join: 4 choose 2 = 6 options; the best is 1∪2, giving (A,B=1, C=2, D=3).
• Second join: 3 choose 2 = 3 options; the best is 2∪3, giving (A,B=1, C,D=2).
• Any further join has a negative ΔQ.
[Figure: example graph on nodes A, B, C, D]
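The ΔQ expression on this slide follows in two lines if one assumes the usual form of the quality function, Q = Σ_k (e_kk − a_k²), which the surviving slide text does not show explicitly. Joining communities i and j into a new community c pools their internal edges, the edges between them, and their edge ends:

```latex
\begin{aligned}
e_{cc} &= e_{ii} + e_{jj} + e_{ij} + e_{ji}, \qquad a_c = a_i + a_j, \\
\Delta Q &= \left( e_{cc} - a_c^2 \right)
          - \left( e_{ii} - a_i^2 \right)
          - \left( e_{jj} - a_j^2 \right) \\
         &= e_{ij} + e_{ji} - 2 a_i a_j
          = 2 \left( e_{ij} - a_i a_j \right)
          \quad \text{since } e_{ij} = e_{ji}.
\end{aligned}
```

Only the two communities being joined enter the expression, which is why each candidate join can be evaluated in constant time.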
The Algorithm presented
Adjacency matrix of the example network:
    0 1 2 3 4 5 6 7 8 9
0   0 1 1 0 0 0 0 0 0 0
1   1 0 1 0 0 0 0 0 0 0
2   1 1 0 0 1 0 1 0 0 0
3   0 0 0 0 1 1 0 0 0 0
4   0 0 1 1 0 1 0 0 0 0
5   0 0 0 1 1 0 0 0 0 0
6   0 0 1 0 0 0 0 1 0 0
7   0 0 0 0 0 0 1 0 1 1
8   0 0 0 0 0 0 0 1 0 1
9   0 0 0 0 0 0 0 1 1 0
[Figure: the example network (nodes 0-9) and its DENDROGRAM over leaves 0-9]
As the algorithm* iterates, we get a partition of the graph.
* Algorithm implementation from: http://www.elemartelot.org/ (Erwan Le Martelot)
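As a hedged illustration, the sketch below evaluates Q for one candidate partition of the ten-node network whose adjacency matrix appears on this slide. The partition {0,1,2}, {3,4,5}, {6,7,8,9} is an assumption read off the matrix's block structure, since the dendrogram itself did not survive extraction. Q is taken as Σ_i (e_ii − a_i²), and the function name `modularity` is illustrative.

```python
# Adjacency matrix of the ten-node example network from the slide.
A = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
]

def modularity(adj_matrix, communities):
    """Q = sum_i (e_ii - a_i^2) for a symmetric 0/1 adjacency matrix."""
    n = len(adj_matrix)
    two_m = sum(sum(row) for row in adj_matrix)    # each edge appears twice
    label = {v: c for c, group in enumerate(communities) for v in group}
    k = len(communities)
    # e[i][j]: fraction of edge ends running from community i to community j.
    e = [[0.0] * k for _ in range(k)]
    for u in range(n):
        for v in range(n):
            e[label[u]][label[v]] += adj_matrix[u][v] / two_m
    a = [sum(row) for row in e]                    # fraction of edge ends in i
    return sum(e[i][i] - a[i] ** 2 for i in range(k))

# Hypothetical partition suggested by the matrix's block structure.
candidate = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8, 9}]
q_candidate = modularity(A, candidate)             # 143/288, about 0.4965
q_single = modularity(A, [set(range(10))])         # one big community gives Q = 0
```

Ten of the twelve edges fall inside the three groups, so Q comes out well above zero, while lumping everything into one community gives exactly the chance level.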
The Algorithm presented
• Operates on completely different principles than the G&N algorithm.
• Agglomerative.
• Runs in worst-case time O((m+n)n), or O(n^2) on sparse graphs.
• Completes in a reasonable time on a network with a million vertices.
Advantages and Drawbacks
• Gives generally good divisions.
• In practice, it typically runs a lot faster than G&N.
• THOUSANDS OF TIMES FASTER THAN G&N.
• Usually not better than G&N at correctly identifying communities.
• Why? Because our algorithm makes decisions based on local information, whereas G&N actively analyzes the entire network.
Applications
• Random graphs of n = 128 vertices divided into 4 groups of 32, with varying average Z_in and Z_out values per vertex, where Z_in + Z_out = 16.
• G&N generally performs better, although usually by only about 1% in identification accuracy. For high Z_in, the new algorithm wins.
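One way to build test graphs of this kind is sketched below. The slide gives only the parameters (128 vertices, 4 groups of 32, Z_in + Z_out = 16), so the construction is an assumption: each pair of vertices is linked independently, with probabilities chosen so that a vertex has on average Z_in neighbours inside its group and Z_out outside it. The function name `planted_partition` and its parameters are illustrative.

```python
import random

def planted_partition(z_in, z_total=16.0, n=128, groups=4, seed=0):
    """Random graph with equal-sized planted groups.

    Returns (edges, group_of), where group_of[v] is the planted group of v.
    """
    rng = random.Random(seed)
    size = n // groups
    z_out = z_total - z_in
    p_in = z_in / (size - 1)      # each vertex has size - 1 possible in-group partners
    p_out = z_out / (n - size)    # and n - size possible out-group partners
    group_of = [v // size for v in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if group_of[u] == group_of[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, group_of

# Strong community structure: on average 12 in-group and 4 out-group edges per vertex.
edges, group_of = planted_partition(z_in=12.0)
mean_degree = 2 * len(edges) / len(group_of)
```

A detection algorithm can then be scored by the fraction of vertices it places in the same community as the rest of their planted group. Lowering `z_in` towards 8 makes the groups progressively harder to recover.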
Applications
• Real-world networks
• Zachary Karate Club.
  • Similar performance to G&N.
• American college football teams.
  • G&N wins on points for accuracy.
  • New algorithm is faster.
• Collaboration between physicists.
  • New algorithm wins by knockout on speed.
  • 42 minutes vs. an estimated 3-5 years.
  • Results correlate with human observation.
Summary
• The new algorithm is faster and pretty accurate, although not as accurate as G&N.
• It allows us to study much larger systems than previously possible.
• For smaller networks, use G&N. For larger networks, use the new algorithm.
• As you'll see in the next presentation, there is always room for improvement.
THANK YOU!