
Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java

Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies. Akshay Java, Anupam Joshi, Tim Finin. University of Maryland, Baltimore County. KDD 2008 Workshop on Web Mining and Web Usage Analysis.


Presentation Transcript


  1. Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies. Akshay Java, Anupam Joshi, Tim Finin. University of Maryland, Baltimore County. KDD 2008 Workshop on Web Mining and Web Usage Analysis

  2. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  3. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  4. Social Media Describes the online technologies and practices that people use to share opinions, insights, experiences, and perspectives and engage with each other. ~Wikipedia

  5. Social Media Graphs • G = (V,E) describes the relationships between different entities (people, documents, etc.) • G’ = <V,T,R> is a tripartite graph that expresses how entities ‘tag’ some resource [Figure: tripartite graph linking Users 1-4, Tags 1-2, and URLs 1-4]

  6. What is a Community • A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it. [Figures: Political Blogs, Twitter Network, Facebook Network]

  7. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  8. Community Detection: Clustering Approach • Agglomerative/Hierarchical • Topological Overlap: similarity is measured in terms of the number of nodes that both i and j link to. (Ravasz et al.)
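The overlap measure on this slide can be sketched in a few lines. This is only an illustration: the function name, the dictionary-of-sets graph layout, the extra count for a direct i-j link, and the normalisation by the smaller degree are assumptions of the sketch rather than details stated on the slide.

```python
def topological_overlap(adj, i, j):
    """Overlap between nodes i and j in an undirected graph.

    adj maps each node to the set of its neighbours. Dividing by the
    smaller degree is an assumption of this sketch.
    """
    common = len(adj[i] & adj[j])        # nodes that both i and j link to
    direct = 1 if j in adj[i] else 0     # also count a direct i-j link
    smaller_degree = min(len(adj[i]), len(adj[j]))
    if smaller_degree == 0:
        return 0.0
    return (common + direct) / smaller_degree


# Hypothetical toy graph: two triangles joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
same_triangle = topological_overlap(toy_graph, "a", "b")
across_bridge = topological_overlap(toy_graph, "c", "d")
unrelated = topological_overlap(toy_graph, "a", "e")
```

An agglomerative method would then repeatedly merge the pair of nodes or clusters with the highest overlap; here nodes in the same triangle score higher than the two bridge endpoints.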

  9. Community Detection: Clustering Approach • Agglomerative/Hierarchical • Divisive/Partition based: remove the edges that have the highest edge betweenness centrality (Girvan-Newman algorithm) [Figure: Political Books]
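A minimal sketch of the divisive idea, assuming an unweighted, undirected graph without self-loops stored as a dictionary of neighbour sets. The function names are hypothetical, and edge betweenness is computed with a Brandes-style breadth-first accumulation, which is one common way to obtain it rather than anything specified on the slide.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (each path counted from both ends)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Connected components found by depth-first search."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        score = edge_betweenness(adj)
        if not score:
            break                                  # no edges left to remove
        u, v = tuple(max(score, key=score.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: two triangles joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
toy_scores = edge_betweenness(toy_graph)
toy_parts = girvan_newman_split(toy_graph)
```

Betweenness is recomputed after every removal, which is what makes the method expensive on large graphs.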

  10. Community Detection: Spectral Approach • Graph Laplacian • The graph can be partitioned using the eigenspectrum of the Laplacian. (Shi and Malik) • The eigenvector of the graph Laplacian with the second smallest eigenvalue is the Fiedler vector. • The graph can be recursively partitioned using the sign of the values in its Fiedler vector. • Normalized Cuts: Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), where cut(A,B) is the cost of edges deleted to disconnect the graph and assoc(B,V) is the total cost of all edges that start from B
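The spectral split can be sketched with NumPy as follows. This assumes a symmetric, non-negative weight matrix with no isolated nodes and uses the degree-normalised form of the Laplacian (the generalised problem (D - W) y = λ D y used for normalized cuts); the function names are hypothetical.

```python
import numpy as np


def fiedler_bipartition(W):
    """Split a graph in two using the sign of its Fiedler vector."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]   # second smallest eigenvalue
    return fiedler >= 0


def ncut(W, in_a):
    """Normalized cut value of the partition (A, B), A given as a mask."""
    W = np.asarray(W, dtype=float)
    in_a = np.asarray(in_a, dtype=bool)
    cut = W[in_a][:, ~in_a].sum()          # weight of edges crossing the cut
    return cut / W[in_a].sum() + cut / W[~in_a].sum()


# Hypothetical toy graph: two triangles joined by the edge 2-3.
W_toy = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
side = fiedler_bipartition(W_toy)
toy_ncut = ncut(W_toy, side)
```

Applying the same split again to each side gives the recursive partitioning mentioned above.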

  11. Community Detection: Co-Clustering • Spectral graph bipartitioning • Compute the graph Laplacian using the bipartite adjacency matrix M = [0 A; A^T 0], where A is the document-by-term matrix (Dhillon et al.)
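A rough sketch of spectral co-clustering on a document-by-term matrix A. It assumes every document and every term has non-zero total weight, obtains the partition vector from the second singular vectors of the degree-normalised matrix (an equivalent route to the second eigenvector of the bipartite graph's normalised Laplacian), and splits by sign as a simplification; the function name and example matrix are hypothetical.

```python
import numpy as np


def cocluster_bipartition(A):
    """Jointly split documents (rows) and terms (columns) into two groups."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)                     # document degrees
    d2 = A.sum(axis=0)                     # term degrees
    # Normalised matrix D1^{-1/2} A D2^{-1/2}
    A_n = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(A_n)          # singular values in descending order
    z_docs = U[:, 1] / np.sqrt(d1)         # second left singular vector, rescaled
    z_terms = Vt[1, :] / np.sqrt(d2)       # second right singular vector, rescaled
    return z_docs >= 0, z_terms >= 0


# Hypothetical example: documents 0-1 mostly use terms 0-1,
# documents 2-3 use terms 2-3, with one weak cross link.
A_toy = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.1, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
doc_side, term_side = cocluster_bipartition(A_toy)
```

Documents and terms that land on the same side form one co-cluster, so each group of documents comes with the terms that characterise it.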

  12. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  13. Social Media Graphs • Links Between Nodes and Tags • Links Between Nodes • Simultaneous Cuts

  14. Communities in Social Media A community in the real world is identified in a graph as a set of nodes that have more links within the set than outside it and share similar tags.

  15. Clustering Tags and Graphs [Figure: block matrix with rows and columns labeled Nodes and Tags; Fiedler Vector Polarity] • β = 0 is like co-clustering • β = 1: equal importance to blog-blog and blog-tag links • β >> 1: NCut

  16. Clustering Tags and Graphs [Figure panels: Clustering Only Links; Clustering Links + Tags] • β = 0 is like co-clustering • β = 1: equal importance to blog-blog and blog-tag links • β >> 1: NCut
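One way to read the role of β is as a weight on the node-node block of a joint node-and-tag matrix. The sketch below assumes the layout W = [βC A; A^T 0], with C the node-by-node link matrix and A the node-by-tag matrix. That layout is inferred from the β behaviour described on these slides and is not confirmed by them, and the function name is hypothetical.

```python
import numpy as np


def simcut_bipartition(C, A, beta=1.0):
    """Split nodes and tags together using the joint graph's Fiedler vector."""
    C = np.asarray(C, dtype=float)         # node-by-node links (symmetric)
    A = np.asarray(A, dtype=float)         # node-by-tag links
    n, t = A.shape
    # Assumed joint matrix: beta scales the node-node block only.
    W = np.block([[beta * C, A], [A.T, np.zeros((t, t))]])
    d = W.sum(axis=1)                      # must be non-zero for every row
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(n + t) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    side = fiedler >= 0
    return side[:n], side[n:]              # node labels, tag labels


# Hypothetical example: nodes 0-1 and 2-3 are linked pairs with a weak
# 1-2 link; tag 0 is used by nodes 0-1 and tag 1 by nodes 2-3.
C_toy = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
A_toy = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
node_side, tag_side = simcut_bipartition(C_toy, A_toy, beta=1.0)
```

Under this assumed layout, β = 0 leaves only the node-tag block (the co-clustering case), while a large β lets the node-node links dominate the cut, matching the NCut case on the slide.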

  17. Clustering Tags and Graphs [Figure panels: Clustering Only Links; Clustering Links + Tags]

  18. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  19. Datasets • Citeseer: Agents, AI, DB, HCI, IR, ML; words used in place of tags • Blog data: derived from the WWE/Buzzmetrics dataset; tags associated with blogs derived from del.icio.us; for dimensionality reduction, 100 topics derived from blog homepages using LDA (Latent Dirichlet Allocation) • Pairwise similarity computed: RBF kernel for Citeseer, cosine for blogs
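The two pairwise similarity measures named on this slide can be sketched in plain Python. The `gamma` bandwidth, the function names, and the example vectors are assumptions of the sketch; the slide does not give the kernel parameters that were used.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def pairwise_similarity(vectors, similarity):
    """Full similarity matrix for a list of feature vectors."""
    return [[similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical topic-weight vectors for three items.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
toy_cosine = pairwise_similarity(toy_vectors, cosine_similarity)
```

The resulting matrix can serve as the weighted graph that the spectral methods cut, or as the similarity used when scoring clusters.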

  20. Citeseer Data Accuracy = 36% Accuracy = 62% Higher accuracy by adding ‘tag’ information

  21. Citeseer Data [Figure panels: NCut, SimCut] SimCut results in • Higher intra-cluster similarity • Lower inter-cluster similarity

  22. Citeseer Data [Figure panels: True, NCut, SimCut] SimCut constrains cuts based on both • Link Structure • Tags

  23. Blog Data [Figure panels: NCut, SimCut] SimCut results in • Higher intra-cluster similarity • Lower inter-cluster similarity

  24. Blog Data [Figure panels: NCut, SimCut; 35 Clusters] • NCut: few, large clusters with low intra-cluster similarity • SimCut: moderate-size clusters with higher intra-cluster similarity

  25. Effect of Number of Tags, Clusters: Citeseer • Mutual information compares clusters to ground truth • More tags help, to an extent • Lower mutual information if only the graph is used

  26. Effect of Number of Tags, Clusters: Blogs • Mutual information compares clusters to content-based clusters (no tags/graph) • More tags help, to an extent • Lower mutual information if only the graph is used
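Mutual information between a detected clustering and a reference labelling can be computed from label counts, as in the sketch below. It returns the unnormalised value in nats; whether the evaluation here used a normalised variant is not stated on the slides, so treat that choice and the function name as assumptions.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))   # co-occurrence counts
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical labelings of four items.
reference = [0, 0, 1, 1]
relabelled = mutual_information(reference, [1, 1, 0, 0])   # same split, renamed
independent = mutual_information(reference, [0, 1, 0, 1])  # unrelated split
```

The score does not depend on how clusters are named, which is why it suits comparing detected communities against ground-truth or content-based clusters.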

  27. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  28. Future Work • Evaluating the SimCut algorithm on derived feature types such as named entities, sentiments and opinions, and links to mainstream media • For a dataset with ground truth, a comparison of graph-based, text-based, and graph+tag-based clustering • Evaluating the effect of varying β

  29. Outline • Introduction • Community Detection • Clustering Approach • Spectral Approach • Co-Clustering • Simultaneous Clustering • Evaluation • Future Work • Conclusions

  30. Conclusions • Many social media sites allow users to tag resources • Incorporating folksonomies in community detection can yield better results • SimCut can be easily implemented and relates to NCut with two simultaneous objectives: minimize the number of node-node edges being cut, and minimize the number of node-tag edges being cut • Detected communities can be associated with meaningful, descriptive tags

  31. Thanks!

  32. http://ebiquity.umbc.edu http://socialmedia.typepad.com

  33. More Tags Only Graph SimCut

  34. Citeseer (Community Size, Similarity)

  35. Blogs (Community Size, Similarity)
