Online Search of Overlapping Communities. Wanyun Cui, Fudan University; Yanghua Xiao, Fudan University; Haixun Wang, Microsoft Research Asia; Yiqi Lu, Fudan University; Wei Wang, Fudan University. Presenter: Wanyun Cui
Outline • Motivation • Model • Algorithm • Experiments • Applications
Complex network • Complex networks are everywhere. Examples: Internet, protein network, social network
Community structures • Complex networks are everywhere. • Most real-life networks have community structures. • The graph can be divided into groups such that vertices within each group are densely connected and vertices in different groups are sparsely connected. Examples: Internet, protein network, social network
Overlapping community structure • Overlapping community: a vertex may belong to multiple communities • Example communities: C1: small boat, C2: meaning of bucket, C3: big boat, C4: tableware
Finding community structures • Two possible ways to find the community structure • OCD: overlapping community detection • OCS: overlapping community search
OCD vs. OCS • OCD: divides the entire network to find communities
OCD vs. OCS • Disadvantages of OCD • Too costly: the Facebook network has over 800 million nodes and 100 billion links • Global criterion • Unfriendly to dynamic graph
OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion: a fixed parameter or criterion is not appropriate for all vertices and queries (compare the communities of a student with the communities of Barack Obama) • Unfriendly to dynamic graph
OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion • Unfriendly to dynamic graph: real-life graphs keep evolving over time, and we cannot afford to run OCD very frequently, so OCD results lose their freshness and effectiveness
OCD vs. OCS • Disadvantages of OCD • Too costly • Global criterion • Unfriendly to dynamic graph • OCD is usually performed in an offline fashion
OCS: problem definition • Given: a graph G and a query vertex v • Return: all communities that v belongs to
OCD vs. OCS • Advantages of OCS: • More efficient: we only need to find communities within the local neighborhood of the query vertex; our OCS solution needs only several milliseconds to find the answer • Personalized criterion • Light weight
OCD vs. OCS • Advantages of OCS: • More efficient • Personalized criterion • Friendly to dynamic graph
OCD vs. OCS • Advantages of OCS: • More efficient • Personalized criterion • Light weight • A good choice to find communities in an online fashion
Applications of OCS • Friend recommendation on Facebook. • Semantic expansion. • Infectious disease control. • Etc.
Challenges of OCS • Modeling: a community should be dense enough, and the model should be overlapping aware and general • Complexity and scalability
Challenges of OCS • Modeling • Complexity and scalability: in the worst case, OCS may need to enumerate an exponential number of valid communities • Computationally hard • Approximate approach
Outline • Introduction • Model • Algorithm • Experiments • Applications
Model • Community structure awareness: the inner edges of a community should be dense, so a clique is used as the unit of a community (figure: a clique of 6 vertices) • Overlapping awareness • Generality
Model • Community structure awareness • Overlapping awareness: two k-cliques are adjacent if they share k-1 vertices; a community is a connected component of the k-clique graph (figure: original graph and its clique graph for k=4) • Generality
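The clique-graph definition on this slide can be illustrated with a minimal brute-force sketch. It is not the authors' implementation: the helper names (`make_graph`, `k_cliques`, `clique_communities`) and the toy graph are assumptions made for illustration, and enumerating all k-subsets is only feasible on tiny graphs.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques(adj, k):
    """Enumerate all k-cliques by brute force over k-subsets of the vertices."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(c, 2))]

def clique_communities(adj, k):
    """Connected components of the k-clique graph; two k-cliques are adjacent
    if they share k-1 vertices. Each component is returned as a vertex set."""
    cliques = k_cliques(adj, k)
    seen, communities = set(), []
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            c = stack.pop()
            members |= c
            for other in cliques:
                if other not in seen and len(c & other) == k - 1:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities

# Toy graph (an assumption): triangles abc and bcd share an edge,
# triangle def touches bcd only at vertex d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = make_graph(edges)
communities = clique_communities(adj, 3)
```

On this toy graph, vertex d ends up in two communities, which is the overlapping behaviour the model is meant to capture.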
Model • Community structure awareness • Overlapping awareness • Generality • Weaken the strict constraints on clique density and clique adjacency • γ quasi-clique: it is OK if a few edges are missing from the clique • α adjacency
Model • Community structure awareness • Overlapping awareness • Generality • Loosen the strict constraints on clique density and adjacency • γ quasi-clique • α adjacency: if two cliques share at least α vertices, they are α-adjacent
Model • Community structure awareness • Overlapping awareness • Generality • Loosen the strict constraints on clique density and adjacency • γ quasi-clique • α adjacency (figure: original graph and its clique graph (=1))
Given graph G, query vertex v, k, α, and γ, find all connected quasi-clique components containing v. (Example figure: k=4)
Alpha-gamma OCS • Given graph G, query vertex v, k, α, and γ, find all connected quasi-clique components containing v. (Example figure: k=3)
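The problem statement can be made concrete with a small brute-force sketch. Two things here are assumptions rather than content of the slides: the density test (a k-vertex set counts as a γ quasi-clique if it has at least ⌊γ·k(k-1)/2⌋ edges) and the function names `is_quasi_clique` and `ocs`. The code enumerates every k-subset, so it only illustrates the definition and says nothing about the efficient algorithms discussed later.

```python
from itertools import combinations
from math import floor

def is_quasi_clique(adj, nodes, gamma):
    """Assumed density test: at least floor(gamma * k(k-1)/2) of the
    possible edges among the k given vertices are present."""
    k = len(nodes)
    present = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
    return present >= floor(gamma * k * (k - 1) / 2)

def ocs(adj, v, k, alpha, gamma):
    """Return the communities of v: components of quasi-cliques linked by
    alpha-adjacency (sharing at least alpha vertices) that contain v."""
    units = [frozenset(c) for c in combinations(sorted(adj), k)
             if is_quasi_clique(adj, c, gamma)]
    seen, result = set(), []
    for start in units:
        if v not in start or start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            c = stack.pop()
            members |= c
            for other in units:
                if other not in seen and len(c & other) >= alpha:
                    seen.add(other)
                    stack.append(other)
        result.append(members)
    return result

# Toy graph (an assumption): triangles abc, bcd, and def.
adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"b", "c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
by_alpha2 = ocs(adj, "d", k=3, alpha=2, gamma=1.0)  # strict adjacency
by_alpha1 = ocs(adj, "d", k=3, alpha=1, gamma=1.0)  # looser adjacency
```

With γ = 1 and α = k-1 the sketch reduces to the strict k-clique model; lowering α merges communities that share fewer vertices, and lowering γ admits units with missing edges.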
Parameter selection • and k • In general, larger k leads to larger • Has an upper bound and a lower bound corresponding to and k
Outline • Introduction • Model • Algorithm • Experiments • Applications
Algorithm • Exact algorithm • Approximate algorithm
Exact Algorithm • Example • k=4, (3,1)-OCS • Query vertex = Bob • Drawback • an exponential number of enumerations
Approximate Algorithm • Example • k=4, (3,1)-OCS • Query vertex = Bob • Approximation • each new clique must contain at least one new vertex
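The pruning rule on this slide (a new clique is followed only if it contains at least one new vertex) can be sketched as below. This is a rough illustration and not the authors' algorithm: it assumes γ = 1 (true cliques), α ≥ 1, and a seed clique that is already known; the names `is_clique` and `approx_community` are hypothetical.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given vertices is connected."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))

def approx_community(adj, seed, alpha):
    """Grow one community from a seed k-clique (assumes gamma = 1, alpha >= 1).
    A neighboring clique is followed only if it brings in a vertex that the
    community does not contain yet, so at most |V| cliques are ever expanded."""
    k = len(seed)
    community = set(seed)
    stack = [frozenset(seed)]
    while stack:
        clique = stack.pop()
        for shared in range(alpha, k):  # keep at least alpha vertices
            for kept in combinations(sorted(clique), shared):
                # Replacement vertices must be adjacent to every kept vertex.
                pool = set.intersection(*(adj[u] for u in kept)) - clique
                for extra in combinations(sorted(pool), k - shared):
                    if community.issuperset(extra):
                        continue  # pruning: this clique adds no new vertex
                    if is_clique(adj, extra):
                        community.update(extra)
                        stack.append(frozenset(kept) | frozenset(extra))
    return community

# Toy graph (an assumption): triangles abc, bcd, and def.
adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"b", "c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
strict = approx_community(adj, {"b", "c", "d"}, alpha=2)
loose = approx_community(adj, {"b", "c", "d"}, alpha=1)
```

Because a skipped clique could have been the only bridge to further new vertices, the result may be a subset of the exact community; that trade-off is why the method is approximate.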
Outline • Introduction • Model • Algorithm • Experiments • Applications
Experiments • Setup • Intel Core 2, 2.13 GHz • 4 GB memory • 64-bit Windows 7 • Dataset
Experiments • Setup • Dataset
Effectiveness • The model successfully unveils multiple research interests • Example: query vertex Jiawei Han, k=6 • C1: multimedia data mining • C2: stream data mining • C3: information network
Effectiveness • Our model is flexible enough to support different parameters. • Example: query vertex Jiawei Han, k=9
Effectiveness • For most vertices, the OCS model can find non-trivial results.
Performance • OCS is more efficient than OCD. • Competitors: • LA: "Identification of overlapping community structure in complex networks using fuzzy c-means clustering" • OSLOM: "Finding statistically significant communities in networks" • Amortized time of OCD: (total time of OCD)/n
Performance: influence of parameters • For the same k and γ, a smaller α costs more time • For the same k and α, a smaller γ costs more time
Accuracy of approximate algorithm • The approximate algorithm consistently achieves more than 70% accuracy, and in some cases almost 90%
Outline • Introduction • Model • Algorithm • Experiments • Applications
Diversity-based Social Network Analysis • What is the distribution of diversity? • Can we find people with really large diversity?