CUT: Community Update and Tracking in Dynamic Social Networks Hao-Shang Ma and Jen-Wei Huang Knowledge and Information Discovery Lab, Dept. of Electrical Engineering, National Cheng Kung University The 7th Workshop on Social Network Mining and Analysis (SNA-KDD'13) joint with the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'13)
About Me • Jen-Wei Huang (黃仁暐) • Knowledge and Information Discovery Lab • Dept. of Electrical Engineering, National Cheng Kung University • Email: jwhuang@mail.ncku.edu.tw • http://kid.ee.ncku.edu.tw KID Lab, National Cheng Kung University
Research • Data Mining and Database • Time Series Mining • Social Network Analysis • Multimedia Information Retrieval • Ubiquitous Computing • Mobile Computing • Cloud Computing • Bioinformatics
Outline • Introduction • CUT Algorithm • Experiments • Conclusions • References
Introduction • Social networking websites allow users to establish their own personal communities or social networks based on relationships of friends. http://www.facebook.com/ http://twitter.com/
Introduction • Based on the relationships between users, social networks exhibit a community structure.
Introduction • The detection of communities in a network usually puts network nodes into groups in such a way that nodes in the same group are densely connected to one another. • An objective function is chosen to determine the quality of a community. • Modularity [1] is a measure of the quality of a partition in terms of the number of intra-community and inter-community edges.
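To make the modularity measure concrete, here is a minimal Python sketch. It assumes the standard Newman-Girvan form for an undirected, unweighted graph, Q = sum over communities c of (e_c / m - (d_c / 2m)^2), where m is the number of edges, e_c the number of intra-community edges, and d_c the total degree of the nodes in c. The function name `modularity` and the `community_of` mapping are illustrative choices, not names from the slides.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition (undirected, unweighted graph).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # number of edges with both endpoints in community c
    degree = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Fraction of intra-community edges minus the fraction expected at random.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge and split into one community per triangle, this gives Q = 5/14; putting all six nodes into one community gives Q = 0.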
Introduction • Social networks change constantly over time. • We want to identify the community structure of a network quickly and efficiently at every timestamp. • We therefore update the network structure by tracking previously known information instead of recalculating the relationships of all nodes and edges in the network.
Introduction • In this work, we define the seed of a community as a collection of 3-cliques in which any two 3-cliques are connected through shared edges. • By tracking seeds of communities, we can efficiently update and track the dynamics of communities in a social network.
Example Network and 3-clique
CUT Algorithm • We propose the CUT (Community Update and Tracking) algorithm to update and track seeds of communities. • The CUT algorithm has two phases. • Initial phase (executed only once): find the seeds of communities, then extend the seeds to communities. • Update and tracking phase: maintain and update the CAB graph.
Find Seed of Communities • 1. Find all 3-cliques in the network • 2. Build the CAB (Clique Adjacent Bipartite) graph • 3. Determine the seeds of communities in the network
Find All 3-cliques • Backtracking algorithm
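The slide names backtracking but does not give the procedure, so the following Python sketch is only one plausible reading: grow a partial clique one node at a time, keeping as candidates only nodes that are adjacent to every node chosen so far and that have a larger id (so each 3-clique is reported once). The function name `find_3_cliques` is illustrative.

```python
from collections import defaultdict


def find_3_cliques(edges):
    """Enumerate all 3-cliques (triangles) of an undirected graph by backtracking."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    cliques = []

    def extend(partial, candidates):
        if len(partial) == 3:
            cliques.append(tuple(partial))
            return
        for v in sorted(candidates):
            # Keep only candidates adjacent to v with a larger id, so every
            # clique is generated exactly once, in increasing node order.
            extend(partial + [v], {w for w in candidates & adj[v] if w > v})

    extend([], set(adj))
    return cliques
```

Node ids are assumed to be mutually comparable (for example, all integers).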
All 3-cliques in the Network
Clique Adjacent Bipartite Graph
All 3-cliques in CAB
Determine Seed of Community • DFS-like algorithm to find connected components
CAB Graph • The complexity of tracking the CAB graph is lower than that of tracking the original graph. • The complexity of building the CAB graph is O(3|C|) = O(|C|), where |C| is the number of 3-cliques. • The complexity of determining the connected components is O(3|C|) = O(|C|). • Seeds of communities are easy to combine or split.
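The slides do not spell out the CAB graph's structure. The sketch below assumes it is a bipartite graph with 3-cliques on one side and edges of the original network on the other, each clique linked to its three edges, which is consistent with the O(3|C|) build cost. Seeds are then the connected components found by an iterative DFS. The names `build_cab` and `find_seeds` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cab(cliques):
    """Assumed CAB layout: map each edge to the indices of the 3-cliques containing it.

    Every clique contributes exactly three links, so the build is O(3|C|).
    """
    edge_to_cliques = defaultdict(list)
    for idx, clique in enumerate(cliques):
        for edge in combinations(sorted(clique), 2):
            edge_to_cliques[edge].append(idx)
    return edge_to_cliques


def find_seeds(cliques):
    """Return seeds as sorted lists of clique indices (connected components of the CAB)."""
    cab = build_cab(cliques)
    seen = set()
    seeds = []
    for start in range(len(cliques)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative DFS over cliques that share an edge
            idx = stack.pop()
            component.append(idx)
            for edge in combinations(sorted(cliques[idx]), 2):
                for other in cab[edge]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        seeds.append(sorted(component))
    return seeds
```

In this sketch, cliques that share only a node, not an edge, end up in different seeds.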
Extend to Communities • Ignore sparse nodes, whose degree is smaller than 2. • Assign the remaining nodes to the closest seed of community. • Closest: the seed of community that has the most links to the node.
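The extension rule above can be sketched in Python as follows. Several details are assumptions, since the slides do not state them: links are counted against the original seed nodes only (a single pass), ties go to the seed listed first, and a node with no link to any seed stays unassigned. The function name `extend_to_communities` is illustrative.

```python
from collections import defaultdict


def extend_to_communities(edges, seeds):
    """Grow each seed (a set of nodes) into a community.

    Nodes with degree < 2 are ignored; every other non-seed node joins the
    seed it has the most links to.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seed_nodes = set().union(*seeds)
    communities = [set(s) for s in seeds]
    for node in sorted(adj):
        if node in seed_nodes or len(adj[node]) < 2:
            continue  # already in a seed, or a sparse node
        links = [len(adj[node] & set(s)) for s in seeds]
        if links and max(links) > 0:
            # Assumption: ties are broken in favor of the first seed.
            communities[links.index(max(links))].add(node)
    return communities
```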
Update and Tracking Phase • Maintain and update the CAB graph • When the network changes, handle the following cases • Case 1: New nodes and new edges are added • Case 2: Old nodes and edges are removed • Extend to communities
Case 1: Merge and Join • New nodes: 20, 21 • New edges: (2,8), (5,20), (9,20), (11,21) • New 3-cliques: (2,6,8) and (5,9,20)
Case 1: Merge and Join • For each new 3-clique Ck: • If any two of its edges link to different seeds of communities, Si and Sj, we Merge(Si, Sj). • Else if any edge of Ck links to a seed Si, we Join(Si, Ck). • Complexity is O(3 * |new C|) = O(|new C|).
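One way to get the constant work per new 3-clique that the complexity bound suggests is a union-find (disjoint-set) structure over cliques. This is a swapped-in technique for illustration, not the slides' stated data structure; the class `SeedTracker` and its methods are hypothetical names. A new clique that touches no existing seed starts a seed of its own, which is also an assumption.

```python
from itertools import combinations


class SeedTracker:
    """Tracks seeds of communities as 3-cliques are added (union-find sketch)."""

    def __init__(self):
        self.parent = {}      # clique -> parent clique in the union-find forest
        self.edge_owner = {}  # edge -> one clique known to contain it

    def find(self, clique):
        # Path halving keeps the trees shallow.
        while self.parent[clique] != clique:
            self.parent[clique] = self.parent[self.parent[clique]]
            clique = self.parent[clique]
        return clique

    def add_clique(self, clique):
        """Join the clique to the seed(s) its three edges touch, merging them if needed."""
        clique = tuple(sorted(clique))
        self.parent.setdefault(clique, clique)
        for edge in combinations(clique, 2):
            owner = self.edge_owner.setdefault(edge, clique)
            root_a, root_b = self.find(owner), self.find(clique)
            if root_a != root_b:
                self.parent[root_b] = root_a  # Join, or Merge of two seeds

    def seeds(self):
        groups = {}
        for clique in self.parent:
            groups.setdefault(self.find(clique), set()).add(clique)
        return sorted(sorted(group) for group in groups.values())
```

Each call to `add_clique` inspects exactly three edges, so adding a batch of new cliques costs roughly three near-constant operations per clique.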
Case 2: Split and Removal • If nodes are removed, we find all edges connected to the removed nodes. • Example: node 10 is removed, so edges (4,10), (6,10), (8,10), (10,12), and (10,11) are removed.
Node Removed Case - Split • Remove the corresponding edges and cliques. • Rerun the FindSeedofCommunity algorithm to obtain the new seeds of communities. • Complexity is O(3|C| + |removed C|).
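The removal case can be sketched in Python as below, under the assumption that seeds are groups of 3-cliques connected through shared edges. The helper `find_seeds` stands in for the FindSeedofCommunity step, and `remove_node` is a hypothetical name; both are illustrations rather than the authors' code.

```python
from collections import defaultdict
from itertools import combinations


def find_seeds(cliques):
    """Group 3-cliques into seeds: connected components under edge sharing."""
    cliques = [tuple(sorted(c)) for c in cliques]
    by_edge = defaultdict(list)
    for c in cliques:
        for edge in combinations(c, 2):
            by_edge[edge].append(c)

    seen, seeds = set(), []
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, seed = [start], []
        while stack:
            c = stack.pop()
            seed.append(c)
            for edge in combinations(c, 2):
                for other in by_edge[edge]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        seeds.append(sorted(seed))
    return seeds


def remove_node(cliques, node):
    """Drop every 3-clique containing the removed node, then recompute the seeds."""
    remaining = [c for c in cliques if node not in c]
    return remaining, find_seeds(remaining)
```

In the test case below, removing node 10 deletes the two cliques that bridged the others, so one seed splits into two.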
Joint Case • New nodes are added and edges are removed at the same time.
Joint Case • We handle Case 1 first and then Case 2, which reduces unnecessary splits. • Finally, we extend the seeds of communities to communities.
Related Works - Update the Community Structure • Nam P. Nguyen et al. propose the QCA algorithm [9]. • QCA uses the already known community structure and handles four kinds of change (new nodes, new edges, removed nodes, and removed edges) based on modularity. • QCA keeps the whole community structure at each timestamp. • It runs the original CPM in every removal case, which is time-consuming. • It must also identify which type of case each changed node or edge belongs to, which is time-consuming as well.
Experiments • Coauthor network (2002-2010) • 1. About 20,000 authors in one network • 2. Densely connected graph • 3. Each time period spans five years; t1 is 2002-2006 (first update) • 4. Variations of the network between timestamps are small
Experiments
Experiments • p2p-Gnutella network • 1. t1 to t4 are snapshots from August 4 to 7, 2002, with about 6,000 nodes • 2. Sparsely connected graph • 3. Variations of the network between timestamps are large
Experiments
Conclusions • We design the CUT algorithm to update community structures in dynamic social networks instead of recalculating the relationships of all nodes and edges. • Keeping only the seeds of communities in memory at each timestamp is more efficient than keeping all communities. • Using the Clique Adjacent Bipartite (CAB) graph to update and track seeds of communities leads to lower complexity.
References • M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E 69, 2004. • Bowen Yan and Steve Gregory, “Detecting Communities in Networks by Merging Cliques,” ICIS, 2009. • A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E 70, 066111, 2004. • Zhengzhang Chen, Kevin A. Wilson, Ye Jin, William Hendrix, and Nagiza F. Samatova, “Detecting and Tracking Community Dynamics in Evolutionary Networks,” ICDMW, 2010.
References • Yi Wang, Bin Wu, and Xin Pei, “CommTracker: A Core-Based Algorithm of Tracking Community Evolution,” ADMA, 2008. • Nam P. Nguyen, Thang N. Dinh, Ying Xuan, and My T. Thai, “Adaptive Algorithms for Detecting Community Structure in Dynamic Social Networks,” INFOCOM, 2011. • Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre, “Fast unfolding of communities in large networks,” JSTAT, 2008. • Nan Du, Bin Wu, Xin Pei, Bai Wang, and Liutong Xu, “Community Detection in Large-Scale Social Networks,” SNA-KDD, 2007.