1 / 27

DEMON A Local-first Discovery Method For Overlapping Communities

DEMON A Local-first Discovery Method For Overlapping Communities. Giulio Rossetti 2,1 ,Michele Coscia 3 , Fosca Giannotti 2 , Dino Pedreschi 2,1 1 Computer Science Dep., University of Pisa, Italy { rossetti,pedre }@ di.unipi.it

amara
Download Presentation

DEMON A Local-first Discovery Method For Overlapping Communities

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DEMONA Local-first Discovery Method For OverlappingCommunities Giulio Rossetti2,1 ,Michele Coscia3, Fosca Giannotti2, Dino Pedreschi2,1 1 Computer Science Dep., University of Pisa, Italy {rossetti,pedre}@di.unipi.it 2 ISTI - CNR KDDLab, Pisa, Italy {fosca.giannotti, giulio.rossetti}@isti.cnr.it 3 Harvard Kennedy School, Cambridge, MA, US michele_coscia@hks.harvard.edu April 23th 2013

  2. Outline • Problem Definition • Whatis a community? • Community Discovery • Communities and complex (social) networks • A matter of perspective • DEMON Algorithm(s) • Properties • Experiments • Extension • Conclusions

  3. Whatis a community? Unfortunatelydoesnotexist a completelyshareddefinition of what a community is. A generalidea isthat a community represent: “A set of entitieswhereeachentityiscloser, in the network sense, to the otherentitieswithin the community than to the entitiesoutsideit.” or “A set of nodestightlyconnectedwithineachotherthan with nodesbelonging to other sets.”

  4. Community Discovery The aim of CD algorithmsis to identifycommunitieshiddenintocomplex network structure Why Community Discovery? • “Cluster” homogeneousnodesrelying on topologicalinformation • (Clustering networkedentities) Major Problems: • Eachalgorithmmodelsdifferentpropertiesof real world communities • Comparison and evaluation of differentmethodologiesisnottrivial • Found an acceptable compromise betweennumber of communities and theirsizes • ContextDependency

  5. Community DiscoveryApproaches Given the complexity of the problem a number of differenttypologies of approacheswhereproposed, analyzing: • Directed\Undirectededges • Weighted\Unweightededges • Top-Down\Bottom-Up partitioning • Multidimensionality • OverlapamongCommunities • HierarchicalCommunities • … DEMON: Undirected,Bottom-Up, Overlapping (with Directed, Weighted, Hierarchicalextensions)

  6. Outline • Problem Definition • Whatis a community? • Community Discovery • Communities and complex (social) networks • A matter of perspective • DEMON Algorithm(s) • Properties • Experiments • Extension • Conclusions

  7. Communities in (Social) Networks • Communities can be seenas the basicbricks of a (social) network • In simple, small, networks itis easy identifythem by lookingat the structure..

  8. …butreal world networks are not “simple” • Wecan’tidentifyeasilydifferentcommunities • Too manynodes and edges

  9. Are theytwodifferentphenomena? No!

  10. A Matter of Perspective The only difference is in the scale Locally, for each node the structure makes sense Globally, we are tangled in complex overlaps Idea: a bottom-up approach!

  11. Outline • Problem Definition • Whatis a community? • Community Discovery • Communities and complex (social) networks • A matter of perspective • DEMON Algorithm(s) • Properties • Experiments • Extension • Conclusions

  12. Reducing the complexity Real Networks are Complex Objects Can wemakethem “simpler”? Ego-Networks (networks buildedupon a focalnode, the "ego”, and the nodes to whomegoisdirectlyconnected to plus the ties, ifany, among the alters)

  13. DEMON Algorithm • For eachnoden: • Extract the Ego Network of n • Removen from the Ego Network • Perform a Label Propagation1 • Insertnin each community found • Update the raw community set C • For eachraw community c in C • Merge with “similar” ones in the set (given a threshold)(i.e. merge iff at most the ε% of the smaller one is not included in the bigger one) 1 UshaN. Raghavan, R ́eka Albert, and SoundarKumara. Near linear time algorithm to detect community structures in large-scale networks. PhysicalReview E

  14. Label Propagation – The idea • Eachnodehas an uniquelabel (i.e. its id) • In the first (setup) iterationeachnode, with probability α, changeitslabel to one of the labels of itsneighbors; • At eachsubsequentiterationeachnodeadoptaslabel the oneshared(at the end of the previousiteration) by the majorityof itsneighbors; • We iterate untillconsensusisreached.

  15. Label Propagation – Discussion • Why Label Propagation? • Quasi-linear algorithm • Share our idea of what a community is • Problem: • Ping-Pong effect(the algorithmisnon-deterministic) • Solution • Multilabelallowed(weneedoverlappingcommunitiesafterall…)

  16. DEMON - Twoniceproperties • Incrementality:Given a graph G, an initialset of communities C and an incremental update ∆G consisting of new nodes and new edgesadded to G, where ∆G contains the entire ego networks of all new nodes and of all the preexistingnodesreached by new links, then DEMON(∆G∪ G,C) = DEMON(∆ G, DEMON(G,C)) • Compositionality:Consideranypartition of a graph G intotwosubgraphs G1, G2 suchthat, for anynode v of G, the entire ego network of v in G isfullycontainedeither in G1 or G2. Then, given an initial set of communities C: DEMON(G1∪ G2,C) = Max(DEMON(G1,C), DEMON(G2,C)) Thosepropertymakes the algorithmhighlyparallelizable: itcan runindependently on differentfragments of the overall network with a relativelysmall combination work

  17. Experiments • Networks (with metadata): • Congress (nodes US politicians, connected if they co-sponsor the same bills) • IMDb(nodes Actors, connected if they play in the same movies) • Amazon (nodes Products, connected if they were purchased together) • Compared Algorithms: • Infomap, non-overlapping state-of-the-art • Rosvall and Bergstrom “Maps of random walks on complex networks reveal community structure”, PNAS, 2008 • HLC, overlapping state-of-the-art • Ahn, Bagrow and Lehmann “Link communities reveal multiscale complexity in networks”, Nature, 2010

  18. Quality Evaluation – Community size • number of communities • average community size Amazon

  19. Quality Evaluation - Label Prediction • MultilabelClassificator(BRL, Binary Relevance Learner) • Community memberships of a node as known attributes, real world labels (qualitative attributes) target to be predicted; Congress IMDb

  20. QualityEvaluation - Community Cohesion • How goodisour community partition in describingreal world knowledgeabout the clusteredentities? • “Similarnodes share more qualitative attributesthan dissimilar nodes” Iff CQ(P)>1 we are groupingtogethersimilarnodes

  21. HDemon – Hierarchical merge WhyHierarchical merge? • Classic DEMON Mergefunctiondidnot scale well • Complexityissue (~O(|C|2)) • Bottleneck for huge networks (suchas social graphs) • Weneed to find the right granularityfor the communities • Extensions needed for Label PropagationAlgorithm: • Weightednetworks • Directednetworks (notyetused)

  22. HDemon – Hierarchical merge HDemon(Graph G) Cc = connectedComponent(G) C = ExtractCommunities(G) while (|C|>Cc) For c in C: N <- N ∪make_node(c) For (n,m) in N: If (n share nodes with m): E <- E ∪ (n,m) C <- ExtractCommunities(new Graph(N,E)) ExtractCommunities(Graph G) Egos <- EgoNetworks(G) for e in Egos: C = C ∪ LabelPropagation(e) return C 1 1 1 1 1 1 1 1 2 1 1 1

  23. Outline • Problem Definition • Whatis a community? • Community Discovery • Communities and complex (social) networks • A matter of perspective • DEMON Algorithm(s) • Properties • Experiments • Extension • Conclusions

  24. Future works – Framework structure HFDemon(Graph G) Cc ← |connectedComponent(G)| C ←ExtractCommunities(G) while (|C|>Cc) For c in C: N←N∪ make_node(c) For (n,m) in N: Ifn share nodes with m E ← E ∪ (n,m) C ←ExtractCommunities(new Graph(N,E)) ExtractCommunities(Graph G) C ←<OverlappingCD(G)> returnC Differentscenariosmayrequirerequires alternative communities “definitions”. • Framework for Bottom-up (and overlapping) CD • Regular vs. Hierarchical FDemon(Graph G) C ← <OverlappingCD(e)> Forall c in C(v) C ← Merge(C,c,merging_function) returnC

  25. Future Works – Social Community Evolution Thesisproposal “Evolution in Social Networks” Idea: • Social networks are notstaticobjects • Nodes, Edges can appear and disappear • The sameinteractioncouldoccur multiple times • Communitieschangesconsequently Major Problems • Size and granularity of the communitiesinfluencehevily evolutive models • Hierarchicalmerging? • Which are the nodes prone to leave\join a communities? • Roleidentification • How “strong” is a community? • Community strengthmeasure • Community life-cycle

  26. Conclusions • DEMON approaches the community discovery problem trough the analysis of simple network sub-structures (ego-networks) • Overlapping and Hierarchical algorithms are guided by a social perspective • DEMON outperforms state-of-the-art methodologies • Possible parallel implementation: high scalability

  27. Thanks! Questions? Code available @ http://kdd.isti.cnr.it/~giulio/demon/ (extensions coming soon!)

More Related