
Online Search of Overlapping Communities

Wanyun Cui, Fudan University; Yanghua Xiao, Fudan University; Haixun Wang, Microsoft Research Asia; Yiqi Lu, Fudan University; Wei Wang, Fudan University. Presenter: Wanyun Cui.


Presentation Transcript


  1. Online Search of Overlapping Communities • Wanyun Cui, Fudan University • Yanghua Xiao, Fudan University • Haixun Wang, Microsoft Research Asia • Yiqi Lu, Fudan University • Wei Wang, Fudan University • Presenter: Wanyun Cui

  2. Outline • Motivation • Model • Algorithm • Experiments • Applications

  3. Outline • Motivation • Model • Algorithm • Experiments • Applications

  4. Complex network • Complex networks are everywhere. • Figure: social network

  5. Complex network • Complex networks are everywhere. • Figure: Internet

  6. Complex network • Complex networks are everywhere. • Figure: protein network

  7. Complex network • Complex networks are everywhere. • Figures: Internet, protein network, social network

  8. Community structures • Complex networks are everywhere. • Most real-life networks have community structures. • The graph can be divided into groups such that vertices within each group are closely connected and vertices in different groups are sparsely connected. • Figures: Internet, protein network, social network

  9. Overlapping community structure • Overlapping community: a vertex may belong to multiple communities

  10. Overlapping community structure • Overlapping community: a vertex may belong to multiple communities • C1: small boat • C2: meaning of bucket • C3: big boat • C4: tableware

  11. Finding community structures • Two possible ways to find the community structure • OCD: overlapping community detection • OCS: overlapping community search

  12. OCD vs. OCS • OCD: divides the entire network to find communities

  13. OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion • Unfriendly to dynamic graph • Facebook network: over 800 million nodes and 100 billion links

  14. OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion • Unfriendly to dynamic graph • A fixed parameter or criterion is not appropriate for all vertices and queries. • Communities of a student • Communities of Barack Obama

  15. OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion • Unfriendly to dynamic graph • Graphs in real life are always evolving over time. • We cannot afford to run OCD very frequently. • OCD loses its freshness and effectiveness

  16. OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion • Unfriendly to dynamic graph • Usually performed in an offline fashion

  17. OCS: problem definition • Given: a graph G and a query vertex v • Return: all communities that v belongs to

  18. OCD vs. OCS • Advantages of OCS: • More efficient • Personalized criterion • Light weight • We just need to find communities within the local neighborhood of the vertex. • Our OCS solution needs only several milliseconds to find an answer

  19. OCD vs. OCS • Advantages of OCS: • More efficient • Personalized criterion • Friendly to dynamic graph

  20. OCD vs. OCS • Advantages of OCS: • More efficient • Personalized criterion • Light weight • A good choice to find communities in an online fashion

  21. Applications of OCS • Friend recommendation on Facebook. • Semantic expansion. • Infectious disease control. • Etc.

  22. Challenges of OCS • Modeling • A community should be dense enough • Overlapping aware • Generality • Complexity and scalability

  23. Challenges of OCS • Modeling • Complexity and scalability • In the worst case, OCS may need to enumerate an exponential number of valid communities. • Computationally hard • Approximate approach

  24. Outline • Introduction • Model • Algorithm • Experiments • Applications

  25. Model • Community structure awareness • Overlapping awareness • Generality • The inner edges of a community should be dense • Clique as the unit of community A clique of 6 vertices

  26. Model • Community structure awareness • Overlapping awareness • Generality • Two k-cliques are adjacent if they share k-1 vertices • A community is a component in the k-clique graph Original graph Clique graph (k=4)
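
The k-clique model on slide 26 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `k_clique_communities` and the edge-list input are assumptions made for illustration, and the brute-force clique enumeration is only practical for tiny graphs.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return communities as vertex sets: connected components of the k-clique graph.

    Hypothetical helper for illustration; brute force, tiny graphs only.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-subset of vertices and keep those that are cliques.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: two k-cliques are adjacent if they share k-1 vertices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    # Each component of the clique graph yields one community (union of its cliques).
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())
```

With k=3 on a graph made of triangles {1,2,3}, {2,3,4}, and {4,5,6}, the first two triangles share two vertices and merge, while the third stays separate, so vertex 4 ends up in two communities.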

  27. Model • Community structure awareness • Overlapping awareness • Generality • Weaken the strict constraint on clique density and clique adjacency • 𝛾 quasi-clique • 𝛼 adjacency

  28. Model • Community structure awareness • Overlapping awareness • Generality • Weaken the strict constraint on clique density and clique adjacency • 𝛾 quasi-clique • 𝛼 adjacency • It’s ok if a few edges are missing in the clique

  29. Model • Community structure awareness • Overlapping awareness • Generality • Loosen the strict constraint of clique and adjacency • 𝛾 quasi-clique • 𝛼 adjacency • If two cliques share at least 𝛼 vertices, they are 𝛼 adjacent.

  30. Model • Community structure awareness • Overlapping awareness • Generality • Loosen the strict constraint of clique and adjacency • 𝛾 quasi-clique • 𝛼 adjacency Original graph Clique graph (=1)
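
The two relaxations can be sketched as small predicates. The density threshold used here (at least `gamma * k(k-1)/2` edges among `k` vertices) is an assumption, since the slides only say that a few edges may be missing. The function names are hypothetical.

```python
from itertools import combinations


def is_quasi_clique(vertices, adj, gamma):
    """True if the subgraph induced by `vertices` has at least gamma * k(k-1)/2 edges.

    Assumed density rule for illustration; gamma = 1.0 means a full clique.
    """
    k = len(vertices)
    present = sum(1 for a, b in combinations(vertices, 2) if b in adj.get(a, ()))
    return present >= gamma * k * (k - 1) / 2


def alpha_adjacent(c1, c2, alpha):
    """True if two distinct cliques share at least `alpha` vertices."""
    s1, s2 = set(c1), set(c2)
    return s1 != s2 and len(s1 & s2) >= alpha
```

Setting `gamma = 1.0` and `alpha = k - 1` recovers the strict k-clique model, so the relaxed model generalizes it.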

  31. Given graph G, query vertex v, k, 𝛼, and 𝛾, find all connected quasi-clique components containing v. k=4

  32. (𝛼, 𝛾)-OCS • Given graph G, query vertex v, k, 𝛼, and 𝛾, find all connected quasi-clique components containing v. k=3
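
The problem statement can be made concrete with a brute-force reference sketch. It enumerates quasi-cliques over the whole graph, which is exactly what an online search tries to avoid, so it serves only as a specification for tiny inputs. The name `ocs_brute_force`, the parameters `alpha` and `gamma`, and the density rule `gamma * k(k-1)/2` are assumptions for illustration.

```python
from itertools import combinations


def ocs_brute_force(edges, v, k, alpha, gamma):
    """Return all communities (vertex sets) containing v. Hypothetical reference sketch."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Keep every k-subset that is dense enough (assumed quasi-clique rule).
    need = gamma * k * (k - 1) / 2
    quasi = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if sum(1 for a, b in combinations(c, 2) if b in adj[a]) >= need
    ]

    # Find connected components of the quasi-clique graph, where two
    # quasi-cliques are linked if they share at least `alpha` vertices.
    seen = set()
    result = []
    for start in range(len(quasi)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = set()
        while stack:
            i = stack.pop()
            members |= quasi[i]
            for j in range(len(quasi)):
                if j not in seen and len(quasi[i] & quasi[j]) >= alpha:
                    seen.add(j)
                    stack.append(j)
        # Report only the components whose vertex set contains the query vertex.
        if v in members:
            result.append(members)
    return result
```

Lowering `alpha` merges communities: on the three-triangle graph used in the test, `alpha=2` gives vertex 4 two communities, while `alpha=1` joins everything into one.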

  33. Parameter selection • and k • In general, larger k leads to larger • Has an upper bound and a lower bound corresponding to and k

  34. Outline • Introduction • Model • Algorithm • Experiments • Applications

  35. Algorithm • Exact algorithm • Approximate algorithm

  36. Exact Algorithm • Example • k=4, (3,1)-OCS • Query vertex = Bob

  37. Exact Algorithm • Example • k=4, (3,1)-OCS • Query vertex = Bob • Drawback • exponential enumerations

  38. Approximate Algorithm • Example • k=4, (3,1)-OCS • Query vertex = Bob • Approximate • the new clique contains at least one new vertex

  39. Approximate Algorithm • Example • k=4, (3,1)-OCS • Query vertex = Bob • Approximate • the new clique contains at least one new vertex
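
The pruning rule on the approximate-algorithm slides (a new clique must contain at least one new vertex) can be sketched as follows. This is one reading of the slide, not the authors' code: the function `expand_from` is hypothetical, and the candidate cliques are passed in as a precomputed list for simplicity rather than discovered locally around the query vertex.

```python
from itertools import combinations


def expand_from(seed, cliques, alpha, require_new_vertex):
    """Grow one community from `seed`; return (vertex set, number of cliques visited).

    With require_new_vertex=False every reachable clique is visited (exact search).
    With require_new_vertex=True a clique is skipped when all of its vertices are
    already in the community (assumed interpretation of the slide's rule).
    """
    community = set(seed)
    visited = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for cand in cliques:
            if cand in visited or len(current & cand) < alpha:
                continue
            if require_new_vertex and cand <= community:
                continue  # pruned: contributes no new vertex
            visited.add(cand)
            community |= cand
            frontier.append(cand)
    return community, len(visited)


# Example input: all triangles of a complete graph on vertices 1..5.
triangles = [frozenset(c) for c in combinations(range(1, 6), 3)]
exact = expand_from(triangles[0], triangles, alpha=2, require_new_vertex=False)
approx = expand_from(triangles[0], triangles, alpha=2, require_new_vertex=True)
```

On this dense example the pruned run reaches the same vertex set while visiting fewer cliques. In general, skipping a clique can also cut off cliques reachable only through it, which is why the result is approximate.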

  40. Outline • Introduction • Model • Algorithm • Experiments • Applications

  41. Experiments • Setup • Dataset • Intel Core2 2.13 GHz • 4 GB memory • 64-bit Windows 7

  42. Experiments • Setup • Dataset

  43. Effectiveness • It successfully unveils multiple research interests • Example: Jiawei Han, K=6 • C1: multimedia data mining • C2: stream data mining • C3: information network

  44. Effectiveness • Our model is flexible enough to support different parameters. • Example: Jiawei Han, K=9

  45. Effectiveness • For most vertices, the OCS model can find non-trivial results.

  46. Performance • OCS is more efficient than OCD. • Competitors: • LA • <Identification of overlapping community structure in complex networks using fuzzy c-means clustering> • OSLOM • <Finding statistically significant communities in networks> • Amortized time • (Total time of OCD)/n

  47. Performance: influence of parameters • For the same k and 𝛼, a smaller 𝛾 costs more time • For the same k and 𝛾, a smaller 𝛼 costs more time

  48. Accuracy of approximate algorithm • More than 70% accuracy is consistently achieved; in some cases accuracy reaches almost 90%

  49. Outline • Introduction • Model • Algorithm • Experiments • Applications

  50. Diversity-based Social Network Analysis • What is the distribution of diversity? • Can we find people with really large diversity?
