Discovering Overlapping Groups in Social Media

Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu • xufei.wang@asu.edu • Arizona State University

Social Media • Facebook: 500 million active users; 50% of users log on every day • Twitter: 100 million users; 300,000 new users every day; 55 million tweets every day • Flickr: 12 million members; 5 billion photos

Activities in Social Media • Connect with others to form “Friends” • Interact with others (comment, discussion, messaging) • Bookmark websites/URLs (StumbleUpon, Delicious) • Join groups if they explicitly exist (Flickr, YouTube) • Write blogs (WordPress, Myspace) • Update status (Twitter, Facebook) • Share content (Flickr, YouTube, Delicious)

Community Structure • Studying behavior • Individual level? Too many users • Site level? Loses too much detail • Community level? Yes, it provides information at varying granularity

Overlapping Communities • Example groups (figure): Neighbors, Colleagues, Family

Related Work • Disjoint community detection: modularity maximization; based on link structure only (how can the groups be understood?) • Overlapping community detection: soft clustering (the clustering is dense); CFinder (efficiency and scalability issues) • Co-clustering: disjoint; groups can be understood through words (tags)

Problem Statement • Given a user-tag subscription matrix M and the number of clusters k, find k overlapping communities that consist of both users and tags (Figure: bipartite graph linking users u1 to u5 with tags t1 to t4)

Our Contributions • Extracting overlapping communities that better reflect reality • Clustering on a user-tag graph. Tags are informative in identifying user interests • Understanding groups by looking at tags within each group

Edge-centric View • Cluster edges instead of nodes into disjoint groups • One node can belong to multiple groups • One edge belongs to one group (Figure: the user-tag graph over users u1 to u5 and tags t1 to t4, shown with its edges grouped)
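
A minimal sketch of the edge-centric idea, using hypothetical toy edges and an assumed edge-to-group assignment (neither comes from the slides): once every edge carries exactly one group label, a node inherits the groups of all its edges, so it can land in several groups.

```python
from collections import defaultdict

# Hypothetical toy data: each (user, tag) edge has exactly one assumed group label.
edge_groups = {
    ("u1", "t1"): 0,
    ("u2", "t1"): 0,
    ("u2", "t2"): 0,
    ("u3", "t2"): 0,
    ("u3", "t3"): 1,
    ("u4", "t3"): 1,
}


def node_memberships(edge_groups):
    """Map every user and tag to the set of groups its edges fall into."""
    members = defaultdict(set)
    for (user, tag), group in edge_groups.items():
        members[user].add(group)
        members[tag].add(group)
    return dict(members)


memberships = node_memberships(edge_groups)
for node in sorted(memberships):
    print(node, sorted(memberships[node]))
```

In this toy assignment u3 has edges in both groups, so it is the one overlapping node even though the edge partition itself is disjoint.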

Edge-centric View • In an edge-centric view (illustration not preserved)

Clustering Edges • We can use any clustering algorithm (e.g., k-means) to group similar edges together • Different edge similarity schemes can be used
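
As a rough illustration of clustering edges, the sketch below represents each edge as an indicator vector over users and tags and runs plain Lloyd's k-means on those vectors. The toy edges, the indicator representation, and the fixed initial centroids are assumptions made for the example, not the authors' setup.

```python
def edge_vector(edge, users, tags):
    """Indicator vector with a 1 at the edge's user and a 1 at its tag."""
    user, tag = edge
    vec = [0.0] * (len(users) + len(tags))
    vec[users.index(user)] = 1.0
    vec[len(users) + tags.index(tag)] = 1.0
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iters=20):
    """Plain Lloyd's k-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of assigned points; an empty cluster keeps its centroid.
        for j in range(len(centroids)):
            assigned = [p for p, lab in zip(points, labels) if lab == j]
            if assigned:
                centroids[j] = [sum(col) / len(assigned) for col in zip(*assigned)]
    return labels


# Hypothetical toy graph: u2 bridges two tag neighborhoods.
users = ["u1", "u2", "u3"]
tags = ["t1", "t2", "t3", "t4"]
edges = [
    ("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"),
    ("u2", "t3"), ("u2", "t4"), ("u3", "t3"), ("u3", "t4"),
]
points = [edge_vector(e, users, tags) for e in edges]
labels = kmeans(points, init_centroids=[points[0], points[-1]])
for edge, label in zip(edges, labels):
    print(edge, label)
```

With this initialization the bridging user u2 ends up with edges in both clusters, so it belongs to both groups. A different initialization or similarity scheme can give a different partition.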

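Defining Edge Similarity • The similarity between two edges e = (ui, tp) and e' = (uj, tq) can be defined by, but is not limited to, a weighted combination of user-user and tag-tag similarity (formula not preserved) • α is set to 0.5, which gives equal importance to users and tags • It remains to define the user-user and tag-tag similarity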
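
The formula itself did not survive in this transcript. One form consistent with the bullets, stated here as an assumption rather than the authors' exact definition, is a convex combination of a user-user similarity S_u and a tag-tag similarity S_t for edges e = (u_i, t_p) and e' = (u_j, t_q):

```latex
S_e(e, e') = \alpha \, S_u(u_i, u_j) + (1 - \alpha) \, S_t(t_p, t_q), \qquad \alpha = 0.5
```

With α = 0.5 the user term and the tag term contribute equally, which matches the stated reason for that setting.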

Independent Learning • Assume users are independent of one another and tags are independent of one another
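
A small sketch of what independence could mean for edge similarity, assuming (this is not stated on the slide) that two users or two tags are similar only when they are identical, and that the user and tag parts are mixed with weight α = 0.5:

```python
def delta(a, b):
    """1.0 if the two nodes are the same node, else 0.0 (assumed independence)."""
    return 1.0 if a == b else 0.0


def edge_similarity(e1, e2, alpha=0.5):
    """Assumed form: alpha * user match + (1 - alpha) * tag match."""
    (ui, tp), (uj, tq) = e1, e2
    return alpha * delta(ui, uj) + (1 - alpha) * delta(tp, tq)


print(edge_similarity(("u1", "t1"), ("u1", "t2")))  # edges sharing only a user
print(edge_similarity(("u1", "t1"), ("u2", "t2")))  # edges sharing nothing
```

Under this reading, two edges are similar only through a shared endpoint.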

Normalized Learning • Differentiate nodes with varying degrees by normalizing each node by its nodal degree
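
The slide does not preserve the exact normalization. The sketch below shows one possible reading, labeled as an assumption: a shared node contributes the inverse of the product of the two degrees involved, so sharing a very popular tag counts for less than sharing a rare one. The toy edges and the function names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each user and each tag."""
    user_deg = Counter(u for u, _ in edges)
    tag_deg = Counter(t for _, t in edges)
    return user_deg, tag_deg


def normalized_edge_similarity(e1, e2, user_deg, tag_deg, alpha=0.5):
    """Assumed scheme: a shared node is down-weighted by its degree, squared."""
    (ui, tp), (uj, tq) = e1, e2
    s_user = 1.0 / (user_deg[ui] * user_deg[uj]) if ui == uj else 0.0
    s_tag = 1.0 / (tag_deg[tp] * tag_deg[tq]) if tp == tq else 0.0
    return alpha * s_user + (1 - alpha) * s_tag


# Hypothetical toy edges: t1 is a popular tag (degree 3), u1 has degree 2.
edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u3", "t1")]
user_deg, tag_deg = degrees(edges)
share_user = normalized_edge_similarity(edges[0], edges[1], user_deg, tag_deg)
share_tag = normalized_edge_similarity(edges[2], edges[3], user_deg, tag_deg)
print(share_user, share_tag)
```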

Correlational Learning • Map the u × t user-tag matrix to a u × k latent space • Compute user-user and tag-tag cosine similarity in the latent space • Tags can be semantically close: the tags cars, automobile, autos, and car reviews are used to describe a blog written by sid0722 on BlogCatalog
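
A hedged sketch of computing similarities in a latent space, assuming a truncated SVD is used for the projection (the slide does not name the method) and assuming NumPy is available. The matrix M below is hypothetical toy data.

```python
import numpy as np

# Hypothetical user-tag matrix M (rows: users, columns: tags).
M = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    Xn = X / norms
    return Xn @ Xn.T


def latent_cosine(M, k):
    """Project users and tags onto the top-k singular directions, then compare."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    user_latent = U[:, :k] * s[:k]    # shape: users x k
    tag_latent = Vt[:k, :].T * s[:k]  # shape: tags x k
    return cosine_matrix(user_latent), cosine_matrix(tag_latent)


user_sim, tag_sim = latent_cosine(M, k=2)
print(np.round(tag_sim, 2))
```

Tags that are used by the same users, such as the first two columns here, come out with cosine similarity close to 1 even though they are distinct tags, which is the effect the cars/automobile example points at.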

Spectral Clustering Perspective • Graph partitioning can be solved as a generalized eigenvalue problem
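
The equation is not preserved in this transcript. As a sketch, assuming the bipartite formulation of the co-clustering reference by Dhillon listed under References, the generalized eigenvalue problem for the user-tag graph can be written as follows, where D_u and D_t are assumed to be the diagonal degree matrices of users and tags:

```latex
L z = \lambda W z, \qquad
L = W - A, \qquad
A = \begin{bmatrix} 0 & M \\ M^{\top} & 0 \end{bmatrix}, \qquad
W = \begin{bmatrix} D_u & 0 \\ 0 & D_t \end{bmatrix}
```

The block layout of A and W is an assumption made for illustration; only the symbols L, W, and Z appear on the slides.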

Spectral Clustering Perspective • Plugging in L, W, and Z, we obtain a solution in terms of U and V (equation not preserved) • U and V are the right and left singular vectors corresponding to the k largest singular values of the user-tag matrix M

Synthetic Data Sets • Generation parameters: number of clusters, users, and tags; inner-cluster density and inter-cluster density (1% of total user-tag links) • Evaluation by normalized mutual information (NMI): between 0 and 1; the higher, the better
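
For reference, a minimal sketch of normalized mutual information between two hard labelings. Two assumptions apply: the normalization by the geometric mean of the entropies is one common choice (the slides do not say which variant is used), and comparing overlapping communities needs an extension that this sketch does not cover.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the geometric mean of the two entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels renamed
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent partitions
```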

Synthetic Performance We fix the number of users, tags, and density, but vary the number of clusters

Synthetic Performance We fix the number of users, tags, and clusters, but vary the inner-cluster density

Social Media Data Sets • BlogCatalog: tags describing each blog; a category predefined by BlogCatalog for each blog • Delicious: tags describing each bookmark; the top 10 most frequently used tags are selected for each person

Inferring Personal Interests • Category information reveals personal interests • Group affiliations are used as features to infer personal interests, evaluated via cross-validation

Connectivity Study • We study the correlation between the number of affiliations two users share (their co-occurrence) and their connectivity in the real network • The more often two users co-occur, the more likely they are connected
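
The study above can be sketched as counting, for every pair of users, how many groups contain both, and then measuring what fraction of pairs at each count are actually linked. The groups, the friendship links, and the function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(groups):
    """Count, for each user pair, the number of groups containing both users."""
    counts = Counter()
    for members in groups:
        for pair in combinations(sorted(set(members)), 2):
            counts[pair] += 1
    return counts


def connection_rate(counts, friendships, users):
    """For each co-occurrence count, the fraction of user pairs that are linked."""
    totals, linked = Counter(), Counter()
    for pair in combinations(sorted(users), 2):
        c = counts[pair]  # Counter returns 0 for pairs that never co-occur
        totals[c] += 1
        if pair in friendships:
            linked[c] += 1
    return {c: linked[c] / totals[c] for c in totals}


# Hypothetical toy data: overlapping groups and a small friendship network.
groups = [{"u1", "u2", "u3"}, {"u2", "u3"}, {"u3", "u4"}]
friendships = {("u1", "u2"), ("u2", "u3")}
users = {"u1", "u2", "u3", "u4"}

counts = co_occurrence(groups)
rates = connection_rate(counts, friendships, users)
for c in sorted(rates):
    print(c, rates[c])
```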

Understanding Groups via Tag Cloud Tag cloud for Category Health

Understanding Groups via Tag Cloud Tag cloud for Cluster Health

Understanding Groups via Tag Cloud Tag cloud for Cluster Nutrition

Conclusions and Future Work • Overlapping communities on a user-tag graph • Propose an edge-centric view and define edge similarity: Independent Learning, Normalized Learning, Correlational Learning • Evaluate results on synthetic and real data sets • Future work: applications such as link prediction; scalability

References
I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in KDD ’01, NY, USA.
L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in CIKM ’09, NY, USA.
L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Morgan & Claypool Publishers, Synthesis Lectures on Data Mining and Knowledge Discovery, 2010.
G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, p. 814, 2005.
K. Yu, S. Yu, and V. Tresp, “Soft clustering on graphs,” in NIPS, p. 05, 2005.
U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, Feb. 2004.
S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.

Contact the Authors • Xufei Wang, xufei.wang@asu.edu, Arizona State University • Lei Tang, ltang@yahoo-inc.com, Yahoo! Labs
