IEEE ICDM 2013 UBLF: An Upper Bound Based Approach to Discover Influential Nodes in Social Networks Authors: C. Zhou, P. Zhang, J. Guo, X. Zhu, L. Guo Presenter: Peng Zhang, Chinese Academy of Sciences December 7-10, 2013, Dallas, Texas
Content • Background • Problem Formulation • Related work • Our solution • Experiments • Conclusion
Background • Social networks are widely used for: • Viral marketing • Information dissemination • Technology/idea transfer • Influence propagation • Influence maximization • Community detection • Influence inference • Early warning of public opinion • Link prediction/friend recommendation • Partner recommendation/social cooperation/team formation
Problem Formulation • Given a directed social graph G=(V,E), a budget k, and a stochastic propagation model M, find k nodes such that the expected spread of influence is maximized [Kempe et al., KDD’03] • Challenges: • How to evaluate the objective function M(S)? • How to find the optimal solution, i.e., the set of k most influential nodes?
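Written as an optimization problem, where M(S) denotes the expected number of nodes activated by the seed set S under the propagation model (the symbol $S^{*}$ is our notation, not the slide's):

```latex
S^{*} \in \arg\max_{S \subseteq V,\ |S| = k} M(S)
```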
Problem Formulation • How to evaluate the influence M(S)? • Stochastic propagation models • IC model • LT model • Other propagation models, e.g., continuous-time IC or LT models • Monte Carlo (MC) simulation • Exact calculation of M(S) under the IC and LT models is #P-hard (Chen et al., KDD’10). [Figure: IC propagation model on an example graph whose edges are labeled with propagation probabilities]
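Because exact computation is #P-hard, M(S) is estimated by Monte Carlo simulation. A minimal sketch under the IC model follows; the adjacency-list format and the names `simulate_ic` and `estimate_spread` are assumptions made for illustration. Each run flips one coin per edge leaving a newly activated node, and M(S) is approximated by the average number of activated nodes.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical adjacency-list format: node -> list of (out-neighbor, probability).
Graph = Dict[str, List[Tuple[str, float]]]


def simulate_ic(graph: Graph, seeds: Set[str], rng: random.Random) -> int:
    """Run one Independent Cascade (IC) diffusion; return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active in the previous step
    while frontier:
        next_frontier: List[str] = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active node gets one chance to activate each
                # still-inactive out-neighbor, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[str], runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of M(S): the average spread over independent runs."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


chain = {"a": [("b", 0.5)], "b": [("c", 0.5)]}
# The exact value for this chain is 1 + 0.5 + 0.25 = 1.75; the estimate lands close to it.
print(estimate_spread(chain, {"a"}))
```

Each estimate needs thousands of cascades, which is why the number of MC calls dominates the running time of the algorithms discussed next.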
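Greedy Algorithm • How to find a set of k nodes that are most influential? • Influence maximization under both the IC and LT models is NP-hard (Kempe et al., KDD’03). • Property 1: M(S) is monotone: $M(S) \le M(T)$ whenever $S \subseteq T \subseteq V$ • Property 2: M(S) is submodular: $M(S \cup \{v\}) - M(S) \ge M(T \cup \{v\}) - M(T)$ whenever $S \subseteq T \subseteq V$ and $v \in V \setminus T$ [Figure: the set cover problem]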
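The slide links the hardness argument to set cover. As a hedged illustration (the instance `COVERS` and the helper names are hypothetical), a coverage function f(S), the number of distinct items covered by the chosen sets, is a textbook monotone submodular function. The sketch brute-forces both definitions above on a tiny instance.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set

# Hypothetical set-cover instance: each candidate node "covers" a set of items.
COVERS: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}


def coverage(selected: FrozenSet[str]) -> int:
    """f(S) = number of distinct items covered by the sets chosen in S."""
    covered: Set[int] = set()
    for node in selected:
        covered |= COVERS[node]
    return len(covered)


def all_subsets(nodes: List[str]) -> Iterator[FrozenSet[str]]:
    for r in range(len(nodes) + 1):
        for combo in combinations(nodes, r):
            yield frozenset(combo)


def is_monotone_and_submodular(f, nodes: List[str]) -> bool:
    """Check both properties by enumerating every pair S ⊆ T (tiny inputs only)."""
    subsets = list(all_subsets(nodes))
    for s in subsets:
        for t in subsets:
            if not s <= t:
                continue
            if f(s) > f(t):  # monotonicity: S ⊆ T implies f(S) <= f(T)
                return False
            for v in nodes:
                if v in t:
                    continue
                # submodularity: adding v helps S at least as much as it helps T
                if f(s | {v}) - f(s) < f(t | {v}) - f(t):
                    return False
    return True


print(is_monotone_and_submodular(coverage, list(COVERS)))  # True
```

A function such as f(S) = |S|² is monotone but not submodular, and the same check rejects it.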
Greedy Algorithm • Advantage: approximation guarantee of 1 − 1/e ≈ 63% • Disadvantage: heavy computation cost • Inner loop: every evaluation of M(S) needs many Monte Carlo simulations • Outer loop: O(kN) evaluations of M(S), where N is the network size
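The two loops can be sketched as follows, assuming the same hypothetical adjacency-list format and an IC Monte Carlo estimator (all names here are ours). The outer loop runs k rounds; the inner loop re-estimates the spread of every remaining candidate, giving O(kN) spread estimates in total.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(out-neighbor, probability)]


def simulate_ic(graph: Graph, seeds: Set[str], rng: random.Random) -> int:
    """One Independent Cascade run; returns the number of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[str], runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of M(S); this is the costly inner-loop operation."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy(graph: Graph, k: int, runs: int = 1_000) -> List[str]:
    """Plain greedy: each round adds the node with the largest estimated spread."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v, _ in nbrs})
    seeds: List[str] = []
    for _ in range(min(k, len(nodes))):  # outer loop: k rounds
        best_node: Optional[str] = None
        best_spread = -1.0
        for v in nodes:  # inner loop: one spread estimate per remaining candidate
            if v in seeds:
                continue
            spread = estimate_spread(graph, set(seeds) | {v}, runs)
            if spread > best_spread:
                best_node, best_spread = v, spread
        assert best_node is not None
        seeds.append(best_node)
    return seeds
```

Maximizing M(S ∪ {v}) in each round is equivalent to maximizing the marginal gain, since M(S) is fixed within a round.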
Improvement direction (I): Heuristic algorithms • Heuristic algorithms • ShortestPath: Kimura and Saito (PKDD’06) “Tractable models for information diffusion in social networks” • DegreeDiscount: Chen et al. (KDD'09) “Efficient influence maximization in social networks” • MIA: Chen et al. (KDD'10) “Scalable influence maximization for prevalent viral marketing in large-scale social networks” • DAG: Chen et al. (ICDM’10) “Scalable influence maximization in social networks under the linear threshold model” • SIMPATH: Goyal et al. (ICDM’11) “SIMPATH: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model” [Figures: ShortestPath, the shortest path from a to c; DegreeDiscount, node 2’s degree shrinks once a neighbor is selected] • Advantage: faster than the Greedy algorithm • Disadvantage: no performance guarantee
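As one concrete heuristic, the DegreeDiscountIC rule of Chen et al. (KDD'09) can be sketched as below. It assumes an undirected graph and a uniform propagation probability p, which is the setting of that paper; the function name is ours. When a neighbor of v is chosen as a seed, v's degree is discounted to $dd_v = d_v - 2t_v - (d_v - t_v)\,t_v\,p$, where $t_v$ is the number of v's neighbors already selected. No Monte Carlo simulation is involved, which is where the speed comes from.

```python
from typing import Dict, List, Set


def degree_discount_ic(adj: Dict[str, Set[str]], k: int, p: float) -> List[str]:
    """DegreeDiscountIC heuristic for an undirected graph with uniform IC probability p."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    discounted = {v: float(d) for v, d in degree.items()}  # dd_v starts at d_v
    selected_neighbors = {v: 0 for v in adj}                 # t_v
    seeds: List[str] = []
    chosen: Set[str] = set()
    for _ in range(min(k, len(adj))):
        # Pick the unselected node with the largest discounted degree.
        u = max((v for v in adj if v not in chosen), key=lambda v: discounted[v])
        seeds.append(u)
        chosen.add(u)
        for v in adj[u]:
            if v in chosen:
                continue
            # v gains one more seed neighbor, so its degree is discounted:
            # dd_v = d_v - 2 t_v - (d_v - t_v) * t_v * p
            selected_neighbors[v] += 1
            t = selected_neighbors[v]
            discounted[v] = degree[v] - 2 * t - (degree[v] - t) * t * p
    return seeds


adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"a", "b"},
    "e": {"f", "g"}, "f": {"e"}, "g": {"e"},
}
print(degree_discount_ic(adj, 2, 0.1))  # ['a', 'e']
```

Ranking by raw degree would pick a and then b; after a is selected, b's discounted degree drops to 0.8 because it shares its neighborhood with a, so e is chosen instead.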
Improvement direction (II): Advanced greedy • Advanced greedy algorithms • CELF: Leskovec et al. (KDD'07) “Cost-effective outbreak detection in networks” • CELF++: Goyal et al. (WWW’11) “CELF++: optimizing the greedy algorithm for influence maximization in social networks” [Figure: marginal gains (rewards) of nodes a to e under the Greedy algorithm and the CELF algorithm] • Advantage: because M(S) is submodular, a node's marginal gain from an earlier round is an upper bound on its current gain; using these upper bounds, CELF skips most Monte Carlo calls and speeds up the greedy algorithm by up to 700 times • Disadvantage: it still needs N Monte Carlo estimations (one per node) to initialize the upper bounds, where N is the network size.
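The lazy-forward idea can be sketched as below. This is a generic CELF sketch, not the CELF++ refinement. The spread oracle is passed in as a function that stands in for a Monte Carlo estimator, and the names `celf` and `SpreadFn` are hypothetical. Each heap entry stores a node's last computed marginal gain and the round in which it was computed. By submodularity a stale gain is still an upper bound, so a node whose gain is fresh for the current round and sits at the top of the heap can be selected without re-evaluating the others.

```python
import heapq
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

SpreadFn = Callable[[FrozenSet[str]], float]


def celf(nodes: Iterable[str], spread: SpreadFn, k: int) -> Tuple[List[str], int]:
    """Lazy-forward greedy selection; returns (seeds, number of spread evaluations)."""
    calls = 0
    heap: List[Tuple[float, str, int]] = []
    # Initialization: one spread evaluation per node (N calls in total).
    for v in nodes:
        heap.append((-spread(frozenset([v])), v, 0))  # (negated gain, node, round)
        calls += 1
    heapq.heapify(heap)
    seeds: List[str] = []
    current = 0.0
    while len(seeds) < k and heap:
        neg_gain, v, computed_in = heapq.heappop(heap)
        if computed_in == len(seeds):
            # Fresh for this round and no smaller than every other stored upper bound.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale upper bound: re-evaluate the marginal gain and push it back.
            gain = spread(frozenset(seeds) | {v}) - current
            calls += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, calls


COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def covered_items(s: FrozenSet[str]) -> float:
    items: Set[int] = set()
    for x in s:
        items |= COVERS[x]
    return float(len(items))


print(celf(sorted(COVERS), covered_items, 2))  # (['a', 'c'], 5)
```

With this deterministic coverage oracle, CELF picks the same two nodes as plain greedy but needs 5 oracle calls instead of 7; note that 4 of those 5 are the initialization calls that UBLF aims to avoid.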
Our work • Motivation • Can we initialize the upper bounds without actually running the MC simulations? [Figure: CELF algorithm vs. UBLF algorithm]
The upper bound of M(S) [Figure: local view vs. global view of activation, “How many heads?”] Proposition 2 establishes a relationship between the activation probabilities at time t and time t+1.
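One way to see how a step-to-step relationship bounds M(S) is a union-bound argument. The following is an illustrative sketch under assumed notation, not necessarily the exact form of the paper's Proposition 2: let $P = (p_{uv})$ be the matrix of IC propagation probabilities, and let $u_t \in [0,1]^{|V|}$ collect the probabilities that each node becomes newly active at step $t$, with $u_0 = \mathbf{1}_S$.

```latex
% Node v can become newly active at step t+1 only through an in-neighbour u
% that became newly active at step t; the union bound gives
u_{t+1}(v) \le \sum_{u \in V} p_{uv}\, u_t(u)
\quad\Longrightarrow\quad
u_{t+1} \le P^{\top} u_t \le \left(P^{\top}\right)^{t+1} u_0 .

% The events "v becomes newly active at step t" are disjoint across t, so
M(S) = \sum_{v \in V} \sum_{t \ge 0} u_t(v)
     \le \sum_{t \ge 0} \mathbf{1}^{\top} \left(P^{\top}\right)^{t} \mathbf{1}_S .
```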
The upper bound of M(S) • M(S) is bounded by the sum of a series. Under what condition does the series converge, and what is its limit? Computing the limit exactly (its “area”) is too hard, but we know an upper bound for it!
The upper bound of M(S) • Convergence condition: the total influence to or from any node is less than 1. • Under condition (14), the series sums to a tractable upper bound.
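When the series converges, it has the closed form $\mathbf{1}^{\top}(I - P^{\top})^{-1}\mathbf{1}_S$. The sketch below assumes a hypothetical propagation matrix `P` with `P[u, v]` the probability that u activates v, and evaluates that closed form with NumPy. As the convergence test it uses the row-sum/column-sum condition on the slide: either one bounds the spectral radius of P below 1. Whether this matches the paper's condition (14) exactly is an assumption, and the function names are ours.

```python
import numpy as np


def check_convergence(P: np.ndarray) -> None:
    """Raise unless P is square, non-negative, and all row sums or all column sums are < 1."""
    if P.ndim != 2 or P.shape[0] != P.shape[1] or (P < 0).any():
        raise ValueError("P must be a square, non-negative matrix")
    out_ok = P.sum(axis=1).max() < 1.0  # total influence *from* each node
    in_ok = P.sum(axis=0).max() < 1.0   # total influence *to* each node
    if not (out_ok or in_ok):
        raise ValueError("sufficient convergence condition not met")


def set_upper_bound(P: np.ndarray, seeds) -> float:
    """1^T (I - P^T)^{-1} 1_S, i.e. the sum over t of 1^T (P^T)^t 1_S."""
    check_convergence(P)
    n = P.shape[0]
    indicator = np.zeros(n)
    indicator[list(seeds)] = 1.0
    return float(np.linalg.solve(np.eye(n) - P.T, indicator).sum())


def singleton_upper_bounds(P: np.ndarray) -> np.ndarray:
    """Bounds for every seed set {v} at once: entry v of (I - P)^{-1} 1."""
    check_convergence(P)
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - P, np.ones(n))
```

For a chain 0 → 1 → 2 with probability 0.5 per edge, the bound for seed 0 is 1.75, which is exact because only one path exists. For a diamond 0 → {1, 2} → 3 with probability 0.4 per edge it gives 2.12, slightly above the exact IC spread of 2.0944. The singleton version needs one linear solve for all N nodes and no MC simulation.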
Our UBLF algorithm • CELF: the first round is time-consuming because it needs full MC simulations (one spread estimation per node). • UBLF: the first round is calculated analytically from the upper bound of M(S).
Our work: An example for UBLF [Figure: Monte Carlo simulation] Node 1 is selected after only one MC simulation!
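The idea in this example can be sketched as a CELF-style lazy greedy whose first-round heap keys are analytic bounds instead of MC estimates. This is a minimal sketch under stated assumptions: the bound is the union-bound closed form $(I - P)^{-1}\mathbf{1}$ rather than necessarily the paper's exact bound, `ublf_style_select` is a hypothetical name, and `spread` stands in for an MC estimator. Laziness remains valid because each bound is at least M({v}), which by submodularity is at least v's marginal gain in every later round.

```python
import heapq
from typing import Callable, FrozenSet, List, Tuple

import numpy as np

SpreadFn = Callable[[FrozenSet[int]], float]


def ublf_style_select(P: np.ndarray, spread: SpreadFn, k: int) -> Tuple[List[int], int]:
    """Lazy greedy with analytic initial bounds; returns (seeds, number of spread calls)."""
    n = P.shape[0]
    if min(P.sum(axis=1).max(), P.sum(axis=0).max()) >= 1.0:
        raise ValueError("convergence condition not met")
    bounds = np.linalg.solve(np.eye(n) - P, np.ones(n))  # analytic: zero MC calls
    # Round tag -1 marks an analytic bound that has never been evaluated.
    heap = [(-float(bounds[v]), v, -1) for v in range(n)]
    heapq.heapify(heap)
    seeds: List[int] = []
    current, calls = 0.0, 0
    while len(seeds) < k and heap:
        neg_gain, v, computed_in = heapq.heappop(heap)
        if computed_in == len(seeds):
            # Gain is fresh for this round and dominates every remaining bound.
            seeds.append(v)
            current += -neg_gain
        else:
            # Bound only (or stale gain): evaluate the true marginal gain once.
            gain = spread(frozenset(seeds) | {v}) - current
            calls += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, calls
```

If a node's analytic bound is tight and largest, as for node 1 in the slide's example, the first seed is chosen after a single spread evaluation, while CELF would first evaluate all N nodes.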
Experiments • Data collection • Ca-GrQc • Digger • Ca-HepPh • Email-Enron • Benchmarks • CELF • Degree • DegreeDiscount • PageRank • Random [Table: statistics of the datasets]
Experiments • Comparison results (number of MC simulations) Observation: UBLF needs significantly fewer MC calls in total than CELF.
Experiments • Comparison results (influence spread) Observation: The spreads of UBLF and CELF are identical, which confirms that UBLF and CELF follow the same logic when selecting nodes.
Experiments • Comparison results (Time cost) Observation: UBLF is 2-5 times faster than CELF.
Conclusions • Background • Problem Formulation • Greedy Algorithm • Heuristic algorithms: DegreeDiscount, PageRank, etc. • Advanced greedy algorithms: CELF, CELF++ • UBLF • Comparisons
Questions? • Email: zhangpeng@iie.ac.cn