Lecture on Reverse Sampling: Models of Influence Diffusion and Greedy Algorithm

This lecture discusses reverse sampling in models of influence diffusion, focusing on the greedy algorithm and maximizing influence spread. It also explores the challenges and computational complexity of the problem.

Presentation Transcript


  1. Lecture 2-7: Reverse Sampling. Ding-Zhu Du, University of Texas at Dallas

  2. Outline • Greedy • Reverse Sampling

  3. Models of Influence Diffusion • Two basic classes of probabilistic diffusion models: threshold and cascade. • General operational view: • A social network is represented as a directed graph, with each person (customer) as a node. • Nodes start either active or inactive. • An active node may trigger activation of neighboring nodes. • Monotonicity assumption: active nodes never deactivate.
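The operational view above can be sketched for the cascade class of models. This is a minimal illustration assuming the independent cascade (IC) model, in which each newly activated node gets one chance to activate each inactive out-neighbor with the probability on the connecting edge. The adjacency format and the name `simulate_ic` are assumptions made for this sketch, not part of the lecture.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade diffusion and return the final active set.

    graph: dict mapping node -> list of (out_neighbor, probability) pairs
           (a hypothetical representation chosen for this sketch).
    """
    active = set(seeds)       # nodes start either active or inactive
    frontier = deque(seeds)   # newly active nodes that have not tried yet
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # One activation attempt per edge, succeeding with probability p.
            if v not in active and rng.random() < p:
                active.add(v)  # monotonicity: nodes are never removed
                frontier.append(v)
    return active


# Toy graph: a -> b and b -> d always succeed, a -> c never does.
graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [], "d": []}
final = simulate_ic(graph, ["a"], random.Random(0))
print(sorted(final))  # ['a', 'b', 'd']
```

The size of the returned set is one sample of the quantity whose expectation the next slide calls the influence spread.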

  4. Influence Maximization Problem • Influence spread of node set S: σ(S), the expected number of active nodes at the end of the diffusion process if S is the initial active set. • Problem definition (Kempe et al., 2003): Influence Maximization. Given a directed, edge-weighted social graph G = (V, E, p), a diffusion model m, and an integer k ≤ |V|, find a set S ⊆ V with |S| = k such that the expected influence spread σm(S) is maximum.

  5. Known Results • Bad news: the optimization problem is NP-hard under both the independent cascade (IC) and linear threshold (LT) models. • Good news: • σm(S) is monotone and submodular. • We can use the greedy algorithm! • Theorem: The set S returned by greedy activates, in expectation, at least a (1-1/e) fraction (>63%) of the number of nodes that any size-k set could activate.
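In symbols, the two properties and the guarantee on this slide can be written as follows. These are the standard definitions of monotonicity and submodularity; the name S_greedy for the greedy output is introduced here for illustration, and the bound as stated assumes σm can be evaluated exactly.

```latex
\begin{align*}
\text{Monotone:}\quad
  & \sigma_m(S) \le \sigma_m(T)
  && \text{for all } S \subseteq T \subseteq V, \\
\text{Submodular:}\quad
  & \sigma_m(S \cup \{u\}) - \sigma_m(S) \ge \sigma_m(T \cup \{u\}) - \sigma_m(T)
  && \text{for all } S \subseteq T \subseteq V,\ u \in V \setminus T, \\
\text{Greedy guarantee:}\quad
  & \sigma_m(S_{\text{greedy}}) \ge \Bigl(1 - \tfrac{1}{e}\Bigr) \max_{|S| = k} \sigma_m(S).
\end{align*}
```

Since 1 - 1/e ≈ 0.632, this matches the ">63%" figure quoted in the theorem.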

  6. Disadvantage • Lack of efficiency. • Computing σm(S) is #P-hard under both IC and LT models. • Each step selects a new vertex u with the largest marginal gain σm(S+u) - σm(S), which can only be approximated by Monte Carlo simulation (10,000 trials). • Assumes a weighted social graph as input. • How can influence probabilities be learned from history?
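The Monte Carlo approximation of a marginal gain can be sketched as below. The sketch assumes the IC model, for which sampling a random graph (keeping each edge independently with its probability) and counting the nodes reachable from the seeds gives one sample of the spread. All function names and the adjacency format are hypothetical.

```python
import random


def sample_live_edge_graph(graph, rng):
    """Keep each edge (u, v) independently with its probability p."""
    return {u: [v for v, p in edges if rng.random() < p]
            for u, edges in graph.items()}


def reachable(live, seeds):
    """Return the set of nodes reachable from seeds in a sampled graph."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def estimate_marginal_gain(graph, S, u, r, rng):
    """Average of |reach(S + u)| - |reach(S)| over r sampled graphs."""
    total = 0
    for _ in range(r):
        live = sample_live_edge_graph(graph, rng)
        total += len(reachable(live, set(S) | {u})) - len(reachable(live, S))
    return total / r


# Toy graph: a activates b with probability 0.5; c is isolated.
graph = {"a": [("b", 0.5)], "b": [], "c": []}
gain = estimate_marginal_gain(graph, [], "a", 10_000, random.Random(0))
print(gain)  # the exact expectation for this toy graph is 1.5
```

Each of the r samples touches every edge once, which is where the O(m) cost per measurement comes from.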

  7. What is the running time? • Let r be the number of samples used to estimate σm(S+u) - σm(S). • The algorithm runs k iterations. • Each iteration requires estimating the expected spread of O(n) node sets S+u. • Each estimate of expected spread takes measurements on r graphs, and each measurement needs O(m) time. • Total running time: O(kmnr).
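The count of spread evaluations behind the O(kmnr) bound can be made concrete with a small sketch. The greedy loop below accepts any spread estimator as a callable and counts how often it is invoked; the toy `spread` function is a deterministic stand-in for a Monte Carlo estimator, and all names are assumptions made for this illustration.

```python
def greedy_seed_selection(nodes, k, spread):
    """Pick k seeds, each maximizing spread(S + u); assumes k <= len(nodes).

    Returns (seeds, number of calls made to spread).
    """
    seeds = []
    calls = 0
    for _ in range(k):                      # k iterations
        best_node, best_value = None, None
        for u in nodes:                     # O(n) candidate sets S + u
            if u in seeds:
                continue
            value = spread(seeds + [u])     # Monte Carlo: r samples, O(m) each
            calls += 1
            if best_value is None or value > best_value:
                best_node, best_value = u, value
        seeds.append(best_node)
    return seeds, calls


# Deterministic stand-in for sigma: the number of nodes covered by the seeds.
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"},
         "d": {"d", "e"}, "e": {"e"}}


def spread(S):
    return len(set().union(*(reach[u] for u in S)))


seeds, calls = greedy_seed_selection(["a", "b", "c", "d", "e"], 2, spread)
print(seeds, calls)  # ['a', 'd'] 9
```

With n = 5 and k = 2 the loop makes 5 + 4 = 9 evaluations, roughly kn. Replacing the stand-in with an r-sample Monte Carlo estimator costing O(m) per sample gives the O(kmnr) total.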

  8. Theorem

  9. Comments • Greedy wastes time on sampling: every randomly generated graph is used only once, for a single evaluation of the objective function.

  10. Outline • Greedy • Reverse Sampling Analysis: part 1, sampling; part 2, submodular max; part 3, parameter

  11. Smart Way • Step 1. Randomly generate θ reverse reachable (RR) sets. • Step 2. Find k nodes that hit the maximum number of RR sets.
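Why hitting many RR sets is a proxy for large influence can be summarized by the identity below. It is the standard relation used in reverse influence sampling and is stated here as background rather than as the exact wording of the lecture's lemmas. R denotes the RR set of a uniformly random node, R_1, ..., R_θ are the sampled RR sets, and n = |V|.

```latex
\sigma(S) \;=\; n \cdot \Pr_{R}\bigl[\, S \cap R \neq \emptyset \,\bigr]
\qquad\Longrightarrow\qquad
\sigma(S) \;\approx\; \frac{n}{\theta} \cdot
  \bigl|\{\, i : S \cap R_i \neq \emptyset \,\}\bigr| .
```

Under this relation, maximizing the number of RR sets hit in Step 2 maximizes an estimate of σ(S), and the same θ samples are reused for every candidate seed set.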

  12. Outline • Greedy • Reverse Sampling Analysis: part 1, sampling; part 2, submodular max; part 3, parameter

  13. What Are RR Sets?

  14. Lemma 1

  15. Lemma 2

  16. Multiplicative Chernoff bound

  17. Proof of Lemma 2

  18. Outline • Greedy • Reverse Sampling Analysis: part 1, sampling; part 2, submodular max; part 3, parameter

  19. Step 2. Max Coverage • Given a collection C of subsets of a set E, find a subset S of E, with |S| ≤ k, that maximizes the number of subsets in C hit (covered) by S. • Here the subsets in C are the RR sets and S is the seed set.

  20. Step 2. Max Coverage • Given a collection C of subsets of a set E, find a subset S of E, with |S| ≤ k, that maximizes the number of subsets in C hit (covered) by S.
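A greedy heuristic for this max coverage step can be sketched as follows: repeatedly pick the element that hits the most subsets not yet hit. The function name and the list-of-sets input format are assumptions made for this illustration.

```python
def greedy_max_coverage(subsets, k):
    """Greedily choose up to k elements hitting as many subsets as possible.

    subsets: list of sets (for example, RR sets).
    Returns (chosen elements, number of subsets hit).
    """
    # Index: element -> ids of the subsets that contain it.
    containing = {}
    for i, subset in enumerate(subsets):
        for e in subset:
            containing.setdefault(e, set()).add(i)

    covered = set()   # ids of subsets already hit
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        for e, ids in containing.items():
            gain = len(ids - covered)   # newly hit subsets if e is added
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:                # every subset is already hit
            break
        chosen.append(best)
        covered |= containing[best]
    return chosen, len(covered)


rr_sets = [{"a", "b"}, {"a", "c"}, {"a"}, {"d"}, {"d", "e"}, {"b"}]
print(greedy_max_coverage(rr_sets, 2))  # (['a', 'd'], 5)
```

In the example, "a" hits three sets, then "d" hits two more, so two seeds cover five of the six sets. This sketch rescans every element in each round; a production version would typically maintain the gains incrementally.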

  21. Submodular Function Maximization

  22. Greedy Algorithm

  23. Performance Ratio Theorem (Nemhauser et al. 1978)

  24. Theorem Proof

  25. Outline • Greedy • Reverse Sampling Analysis: part 1, sampling; part 2, submodular max; part 3, parameter estimation

  26. Lemma 2

  27. Breadth-first search (BFS) • A randomized BFS is used to generate each RR set.
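A randomized BFS for generating one RR set might look like the sketch below. It assumes the IC model: start from a uniformly random node and walk edges backwards, keeping each incoming edge with its probability. The in-neighbor adjacency format and the name `generate_rr_set` are hypothetical.

```python
import random
from collections import deque


def generate_rr_set(in_edges, nodes, rng):
    """Sample one reverse reachable (RR) set under the IC model.

    in_edges: dict mapping node v -> list of (in_neighbor u, probability of u -> v).
    nodes: list of all nodes, from which the root is drawn uniformly.
    """
    root = rng.choice(nodes)
    rr = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in in_edges.get(v, []):
            # Traverse edge (u, v) backwards; it is kept with probability p.
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return rr


# Chain a -> b -> c with certain edges: every RR set contains "a".
in_edges = {"a": [], "b": [("a", 1.0)], "c": [("b", 1.0)]}
rng = random.Random(0)
samples = [generate_rr_set(in_edges, ["a", "b", "c"], rng) for _ in range(20)]
print(all("a" in rr for rr in samples))  # True
```

Nodes that appear in many RR sets, such as "a" here, are exactly the nodes that can reach many targets, which is why the max coverage step favors them.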

  28. Lemma 3

  29. Improvement

  30. References

  31. A New Springer Journal: Computational Social Networks. Editor-in-Chief: Ding-Zhu Du, My T. Thai. Welcome to submit papers.

  32. THANK YOU!

  33. Markov's inequality Proof.

  34. Theorem

  35. Multiplicative Chernoff bound
