
StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization. Suqi Cheng, Research Center of Web Data Sciences & Engineering, Institute of Computing Technology, Chinese Academy of Sciences. chengsuqi@ict.ac.cn, chengsuqi@gmail.com. http://www.nascgroup.org/~chengsuqi


Presentation Transcript


  1. StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization • Suqi Cheng, Research Center of Web Data Sciences & Engineering, Institute of Computing Technology, Chinese Academy of Sciences • chengsuqi@ict.ac.cn, chengsuqi@gmail.com • http://www.nascgroup.org/~chengsuqi • Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng

  2. Outline • Background • Preliminaries • Motivation • StaticGreedy algorithm • Experiments

  3. Information Cascade • An action or idea is adopted by people one after another due to social influence • It cascades through social relationships • Main applications • Word-of-mouth marketing • Outbreak detection • Popularity prediction [Figure: social network]

  4. Word-of-Mouth Marketing • To promote a product by seeding a few users; users who adopt the product will recommend it to others • Advantages: efficient; cost-effective • How to select the optimal seed users? [Figure: a company gives a free product or discount to seed users, whose influence produces follow-up activated users]

  5. Influence Maximization for Viral Marketing • Objective function • Influence spread I(S): expected number of activated (influenced/adopted) nodes • Maximize I(S) • Input: • A social influence graph G=(V, E) • An information cascade model • An integer k, |S| ≤ k • Output: A seed set S

  6. Information Cascade Model • Independent cascade (IC) model • each edge (u, v) has a propagation probability p(u, v) • each newly activated node u independently activates its out-neighbor v with probability p(u, v) • a discrete-time model • Influence spread estimation on the IC model • Monte Carlo simulation • Heuristic methods [Figure: social influence graph with a propagation probability on each edge] [Leskovec, 2008]
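The independent cascade process and its Monte Carlo estimate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the adjacency-list layout, the function names `simulate_ic` and `estimate_spread`, and the toy graph are all assumptions made for the example.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade simulation and return the activated nodes.

    graph: dict mapping node u -> list of (v, p_uv) out-edges (assumed layout).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly activated u gets exactly one chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier  # advance one discrete time step
    return active


def estimate_spread(graph, seeds, num_runs=1000, seed=0):
    """Monte Carlo estimate of I(S): average cascade size over num_runs runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(num_runs))
    return total / num_runs


# Hypothetical toy graph, not taken from the slides.
toy_graph = {"a": [("b", 0.5), ("c", 0.2)], "b": [("c", 0.3)], "c": []}
print(estimate_spread(toy_graph, ["a"]))
```

Every call to `estimate_spread` draws fresh random cascades unless the same `seed` is passed, which is the behaviour the later slides identify as a problem for greedy selection.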

  7. Difficulties in Influence Maximization • Difficulty 1: The influence maximization problem is NP-hard [Kempe, KDD’03] • Existing solutions • Heuristics (efficient but inaccurate) • Degree • PageRank • Betweenness • Greedy approximation algorithm [Kempe, KDD’03] (accurate but inefficient) • (1-1/e-ε)-approximation • iteratively selects the node with the largest marginal influence spread • guaranteed by the submodularity and monotonicity properties of the influence spread function
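The greedy selection described on this slide can be sketched as follows. The spread function is passed in as a parameter so that the sketch stays independent of how I(S) is estimated; the name `greedy_seed_selection` and the toy coverage function are assumptions for illustration only.

```python
def greedy_seed_selection(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    spread: callable taking a list of seeds and returning the (estimated) I(S).
    """
    seeds = []
    for _ in range(k):
        base = spread(seeds)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(seeds + [v]) - base  # marginal influence spread of v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no candidates left
            break
        seeds.append(best)
    return seeds


# Hypothetical stand-in for I(S): size of the union of fixed reach sets.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6}, "d": {1, 2}}


def coverage(seed_list):
    covered = set()
    for s in seed_list:
        covered |= reach[s]
    return len(covered)


print(greedy_seed_selection(list(reach), 2, coverage))  # ['a', 'c']
```

Each of the k rounds evaluates the spread of every remaining node, which is why the plain greedy algorithm is accurate but expensive when each evaluation is itself a batch of Monte Carlo simulations.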

  8. Difficulties in Influence Maximization • Difficulty 2: Computing the influence spread exactly is #P-hard [Chen, KDD’10] • Existing solutions • Monte Carlo simulation (accurate but time-consuming) • CELF optimization [Leskovec, KDD’07] • NewGreedy [Chen, KDD’09] • CELF++ optimization [Goyal, WWW’11] • Heuristic methods (efficient but inaccurate) • DegreeDiscount [Chen, KDD’09] • CGA [Wang, KDD’10] • PMIA [Chen, KDD’10] • IRIE [Jung, ICDM’12] • A scalability-accuracy dilemma!
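The CELF optimization listed above relies on lazy evaluation: under submodularity a node's marginal gain can only shrink as the seed set grows, so a stale gain is a valid upper bound and most re-evaluations can be skipped. The following is a minimal sketch of that idea under stated assumptions, not the published implementation; the function name `celf` and the toy coverage function are hypothetical.

```python
import heapq


def celf(nodes, k, spread):
    """Lazy greedy selection: re-evaluate a node only when it reaches the heap top."""
    current = spread([])
    # Heap entries: (-gain, tie_breaker, node, seed-set size when gain was computed).
    heap = [(-(spread([v]) - current), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    seeds = []
    while heap and len(seeds) < k:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # Gain is up to date and still the largest, so v is the greedy choice.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute against the current seed set and push back.
            gain = spread(seeds + [v]) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds


# Hypothetical stand-in for I(S): size of the union of fixed reach sets.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6}, "d": {1, 2}}


def coverage(seed_list):
    covered = set()
    for s in seed_list:
        covered |= reach[s]
    return len(covered)


print(celf(list(reach), 2, coverage))  # ['a', 'c']
```

The lazy shortcut is only sound if the estimated spread function really is submodular, which ties this optimization to the concern raised in the Motivation slide.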

  9. Our work • Objective: to propose an influence maximization algorithm that solves the scalability-accuracy dilemma

  10. Preliminaries-1 • Social influence graph: G=(V, E), n=|V|, m=|E| • Influence spread: I(S) • Marginal influence spread: $M(v \mid S) = I(S \cup \{v\}) - I(S)$ • Properties of I(S) under the independent cascade model • submodularity: $I(S \cup \{v\}) - I(S) \ge I(T \cup \{v\}) - I(T)$ for all $v \in V$ and $S \subseteq T \subseteq V$ • monotonicity: $I(S \cup \{v\}) \ge I(S)$ • These properties guarantee the greedy approximation algorithm • iteratively selects the node with the largest marginal influence spread • provides a (1-1/e-ε)-approximation • The greedy algorithm relies on influence spread estimation

  11. Preliminaries-2 • Monte Carlo simulation for influence spread estimation • approximates the true value of the influence spread by sampled realizations [Figure label: equivalent]

  12. Motivation • In existing greedy algorithms • there is a risk that submodularity and monotonicity of the estimated influence spread function are not guaranteed • caused by using different Monte Carlo simulation results across different influence spread estimations • so a very large value of R is required, e.g. R=20000 • R: number of Monte Carlo simulations per estimation [Figure: an influence graph with snapshot 1 used in iteration 1 and snapshot 2 used in iteration 2; submodularity is broken!]
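The risk named on this slide can be made concrete with a deliberately extreme toy case: one realization per estimate (R = 1) and two hand-picked realizations of a hypothetical three-node graph. The graph, the snapshot contents, and the name `reach_count` are assumptions chosen only to make the effect visible.

```python
def reach_count(snapshot, seeds):
    """Number of nodes reachable from `seeds` in one realization (snapshot)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


# Hypothetical graph with edges a -> b and a -> c, each live with some probability.
snapshot_1 = {"a": ["b", "c"]}  # realization in which both edges are live
snapshot_2 = {}                 # realization in which neither edge is live

# Different realizations per estimate: I({a}) from snapshot 1, I({a, b}) from snapshot 2.
mixed_gain = reach_count(snapshot_2, ["a", "b"]) - reach_count(snapshot_1, ["a"])

# Same realization for both estimates.
static_gains = [
    reach_count(s, ["a", "b"]) - reach_count(s, ["a"])
    for s in (snapshot_1, snapshot_2)
]

print(mixed_gain)    # -1: adding a seed appears to reduce the spread
print(static_gains)  # [0, 1]: never negative when the realization is shared
```

With mismatched realizations the estimated marginal gain of b is negative, which contradicts monotonicity; a very large R shrinks such artifacts only statistically, whereas sharing realizations removes them by construction.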

  13. StaticGreedy algorithm • Core idea: always use the same snapshots for influence spread estimation • the estimated influence spread function is then submodular and monotone • a small value of R suffices, e.g. R=100 • Part 1: Generate R static snapshots • Part 2: Greedy selection
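The two parts named on this slide can be sketched as below. This is an illustrative reading of the slide, not the authors' implementation: the edge-list layout, the helper names, and the plain (non-lazy) greedy loop are assumptions, and the sketch assumes k does not exceed the number of nodes.

```python
import random


def sample_snapshot(edges, rng):
    """Part 1 helper: keep each edge (u, v, p) independently with probability p."""
    snapshot = {}
    for u, v, p in edges:
        if rng.random() < p:
            snapshot.setdefault(u, []).append(v)
    return snapshot


def reachable(snapshot, sources):
    """Set of nodes reachable from `sources` in one snapshot."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def static_greedy(nodes, edges, k, num_snapshots=100, seed=0):
    """Generate R static snapshots once, then run greedy selection on them."""
    rng = random.Random(seed)
    snapshots = [sample_snapshot(edges, rng) for _ in range(num_snapshots)]

    def spread(seed_list):
        # Every estimate reuses the same snapshots.
        return sum(len(reachable(s, seed_list)) for s in snapshots) / num_snapshots

    seeds = []
    for _ in range(min(k, len(nodes))):
        base = spread(seeds)
        candidates = [v for v in nodes if v not in seeds]
        best = max(candidates, key=lambda v: spread(seeds + [v]) - base)
        seeds.append(best)
    return seeds, spread(seeds)


# Hypothetical toy input, not one of the datasets in the slides.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b", 0.5), ("a", "c", 0.5), ("d", "e", 0.5), ("b", "c", 0.2)]
print(static_greedy(toy_nodes, toy_edges, k=2))
```

Because the averaged reachability count over a fixed set of snapshots is a coverage-style function, the estimate is monotone and submodular regardless of how small R is; R then only controls how close the estimate is to the true I(S).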

  14. Performance analysis: Convergence rate • provides a (1-1/e-ε)-approximation with a small value of R • Setting: seed set size = 50 • NetHEPT: a benchmark network • uniform independent cascade (UIC) model: p(u, v) = p = 0.01 • weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v) [Figure axes: $d_{R,k}$ versus log R]
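The two edge-probability settings defined on this slide translate directly into code. The sketch below assumes a simple directed graph given as a list of (u, v) pairs with no duplicate edges; the function names are hypothetical.

```python
from collections import Counter


def uic_probabilities(edges, p=0.01):
    """Uniform IC model: every edge gets the same propagation probability p."""
    return {(u, v): p for u, v in edges}


def wic_probabilities(edges):
    """Weighted IC model: p(u, v) = 1 / (number of in-neighbors of v)."""
    in_degree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


# Hypothetical toy edge list, not one of the benchmark networks.
toy_edges = [("a", "c"), ("b", "c"), ("a", "b")]
print(uic_probabilities(toy_edges))
print(wic_probabilities(toy_edges))
```

Under the weighted model the incoming probabilities of every node with at least one in-neighbor sum to 1, so edges into low in-degree nodes are far more likely to be live than under the uniform setting with p = 0.01.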

  15. Performance analysis: Scalability • R is significantly reduced • Running time is significantly reduced [Figure panels: minimal R required (log Rmin versus seed set size) and running time (log running time in sec versus seed set size); annotations: ≈$10^2$ times, ≈$10^3$ times]

  16. Performance analysis: Complexity • n: number of nodes in the social influence graph • m: number of edges in the social influence graph • m’: expected number of edges in a snapshot

  17. Speed up StaticGreedy • A dynamic update strategy • calculates the marginal gain in an efficient incremental manner • at each step t, for each snapshot: $M(v) \leftarrow M(v) - |R(v) \cap R(v_t^*)|$, $R(v) \leftarrow R(v) \setminus (R(v) \cap R(v_t^*))$ • trades space for time • R(v): set of nodes reachable from v in the snapshot [Figure: snapshot over nodes v1 to v8, initial state] • M(v1)=4 M(v2)=3 M(v3)=2 M(v4)=1 M(v5)=1 M(v6)=1 M(v7)=2 M(v8)=1

  18. Speed up StaticGreedy • A dynamic update strategy • calculates the marginal gain in an efficient incremental manner • at each step t, for each snapshot: $M(v) \leftarrow M(v) - |R(v) \cap R(v_t^*)|$, $R(v) \leftarrow R(v) \setminus (R(v) \cap R(v_t^*))$ • trades space for time • R(v): set of nodes reachable from v in the snapshot [Figure: the same snapshot after selecting v* = v1; affected values are updated directly: v1 by -4, v2 by -1, v3 by -2, v4 by -1, v6 by -1] • Before: M(v1)=4 M(v2)=3 M(v3)=2 M(v4)=1 M(v5)=1 M(v6)=1 M(v7)=2 M(v8)=1 • After: M(v1)=0 M(v2)=2 M(v3)=0 M(v4)=0 M(v5)=1 M(v6)=0 M(v7)=2 M(v8)=1
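The update rule can be traced on a single snapshot. The slide's drawing is not recoverable from the transcript, so the edge set below is a hypothetical snapshot chosen only because it reproduces the M values listed on slides 17 and 18; the function names are likewise assumptions.

```python
def reachable_sets(snapshot, nodes):
    """R(v) for every node v: the set of nodes reachable from v in the snapshot."""
    sets = {}
    for source in nodes:
        seen, stack = {source}, [source]
        while stack:
            u = stack.pop()
            for v in snapshot.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sets[source] = seen
    return sets


def select_and_update(R, M, chosen):
    """Apply M(v) <- M(v) - |R(v) & R(v*)| and R(v) <- R(v) - R(v*) for all v."""
    covered = set(R[chosen])  # copy, because R[chosen] itself is updated below
    for v in R:
        overlap = R[v] & covered
        M[v] -= len(overlap)
        R[v] -= overlap


# Hypothetical snapshot consistent with the numbers on the slides.
nodes = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]
snapshot = {"v1": ["v3", "v4"], "v2": ["v4", "v5"], "v3": ["v6"], "v7": ["v8"]}

R = reachable_sets(snapshot, nodes)
M = {v: len(R[v]) for v in nodes}
print(M)  # initial marginal gains: 4, 3, 2, 1, 1, 1, 2, 1

select_and_update(R, M, "v1")
print(M)  # after selecting v1: 0, 2, 0, 0, 1, 0, 2, 1
```

Storing R(v) for every node and snapshot is the space cost mentioned on the slide; in exchange, each selection step only subtracts overlaps instead of recomputing reachability from scratch. With several snapshots, the per-snapshot M values would be summed before picking the next seed.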

  19. Experiments: setup • Algorithms: • Our algorithms: StaticGreedyCELF, StaticGreedyDU • Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount • Tested datasets • Independent cascade models • uniform independent cascade (UIC) model: p(u, v) = p = 0.01 • weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v) • Metrics: influence spread, running time

  20. Experiments: influence spread • StaticGreedy achieves better accuracy than the heuristics [Figure panels: NetPHY under the UIC and WIC models; DBLP under the UIC and WIC models]

  21. Experiments: running time • StaticGreedy runs >$10^3$ times faster than CELFGreedy • StaticGreedy has comparable scalability to state-of-the-art heuristics • StaticGreedyDU always runs faster than StaticGreedyCELF [Figure panels: log running time (sec) under the UIC and WIC models]

  22. Conclusion • Essential reason for the inefficiency of existing greedy algorithms • a risk of unguaranteed submodularity and monotonicity • caused by using different Monte Carlo simulations across different estimations • a very large value of R is required → guaranteed accuracy + inefficiency • StaticGreedy algorithm • guaranteed submodularity and monotonicity • uses the same Monte Carlo simulations across different estimations • a small value of R is required → guaranteed accuracy + high scalability • runs >$10^3$ times faster than conventional greedy algorithms • A dynamic update strategy to speed up StaticGreedy • about 10 times faster

  23. Thank you! Q & A
