UBLF: An Upper Bound Based Approach to Discover Influential Nodes in Social Networks

IEEE ICDM 2013 • UBLF: An Upper Bound Based Approach to Discover Influential Nodes in Social Networks • Authors: C. Zhou, P. Zhang, J. Guo, X. Zhu, L. Guo • Presenter: Peng Zhang, Chinese Academy of Sciences • December 7-10, 2013, Dallas, Texas

Content • Background • Problem Formulation • Related work • Our solution • Experiments • Conclusion

Background • Social networks are widely used • Viral marketing • Information dissemination • Technology/Idea transfers • Influence propagation • Influence maximization • Community detection • Influence inference • Early warning of public opinion • Link Prediction/Friends Recommendation • Partner Recommendation/Social Cooperation/Team Formation

Problem Formulation • Given a directed social graph G=(V,E), a budget k, and a stochastic propagation model M, find k seed nodes such that the expected influence spread is maximized [Kempe et al., KDD’03] • Challenges: • How to measure the objective function M(S)? • How to find the optimal solution, i.e., the size-k subset of the most influential nodes?

Problem Formulation • How to measure the influence M(S)? • Stochastic propagation models • Independent Cascade (IC) model • Linear Threshold (LT) model • Other propagation models, e.g., continuous-time IC or LT models • Monte Carlo (MC) simulation • Exact calculation under IC and LT is #P-hard (Chen et al., KDD’10) • [Figure: example directed graph with lettered nodes and edge propagation probabilities between .1 and .4, illustrating the IC propagation model]
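The Monte Carlo estimate of M(S) under the IC model can be sketched as follows. This is a minimal illustration, not the authors' code: the names `simulate_ic`, `estimate_spread`, and `toy_graph`, the dictionary-based graph layout, and the edge probabilities are all assumptions made for the example.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade trial; return the number of active nodes.

    graph maps every node to a dict {out-neighbor: propagation probability}.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # A newly activated node gets exactly one chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Estimate M(S) as the average cascade size over `runs` trials."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

# Hypothetical toy graph; probabilities of 1.0 and 0.0 make the result exact.
toy_graph = {
    "a": {"b": 1.0, "c": 0.0},
    "b": {"d": 1.0},
    "c": {},
    "d": {},
}
spread_a = estimate_spread(toy_graph, ["a"], runs=50)  # a, b and d are always active
```

With realistic probabilities the estimate is noisy, which is why each evaluation of M(S) needs many trials and why the number of such evaluations dominates the running time.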

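Greedy Algorithm • How to find a subset of k nodes containing the most influential nodes? • Influence maximization under both the IC and LT models is NP-hard (Kempe et al., KDD’03) • Property 1: M(S) is monotone: M(S) ≤ M(T) for all S ⊆ T ⊆ V • Property 2: M(S) is submodular: M(S ∪ {v}) − M(S) ≥ M(T ∪ {v}) − M(T) for all S ⊆ T ⊆ V and v ∉ T • Related problem: the set cover problem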

Greedy Algorithm • Advantage: performance guarantee of 1 − 1/e ≈ 63% of the optimal spread • Disadvantage: heavy computation cost • Inner loop: estimating M(S) needs many Monte Carlo simulations • Outer loop: time complexity of O(Nk) spread evaluations, where N is the network size
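The two loops of the greedy algorithm can be sketched as below. This is an illustrative sketch under assumptions: the function names, the graph layout (every node appears as a key), and the toy graph are made up for the example, and `mc_calls` counts spread evaluations, each of which runs `runs` Monte Carlo trials.

```python
import random

def estimate_spread(graph, seeds, runs=200, seed=0):
    """Monte Carlo estimate of the IC influence spread M(S)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(active)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph[u].items():
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def greedy(graph, k, runs=200):
    """Plain greedy: every round re-evaluates every remaining node."""
    selected, mc_calls = [], 0
    for _ in range(k):                      # outer loop: k rounds
        base = estimate_spread(graph, selected, runs)
        best_node, best_gain = None, float("-inf")
        for v in graph:                     # inner loop: up to N evaluations
            if v in selected:
                continue
            gain = estimate_spread(graph, selected + [v], runs) - base
            mc_calls += 1
            if gain > best_gain:
                best_node, best_gain = v, gain
        selected.append(best_node)
    return selected, mc_calls

# Hypothetical toy graph with deterministic edges (probability 1.0).
toy_graph = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {}, "d": {"e": 1.0}, "e": {}}
greedy_seeds, greedy_calls = greedy(toy_graph, k=2, runs=20)
```

On this toy graph the sketch evaluates 5 candidates in the first round and 4 in the second, which illustrates the O(Nk) cost of the outer loop.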

Improvement direction (I): Heuristic algorithms • Heuristic algorithms • ShortestPath: Kimura and Saito (PKDD’06) “Tractable models for information diffusion in social networks” • DegreeDiscount: Chen et al. (KDD’09) “Efficient influence maximization in social networks” • MIA: Chen et al. (KDD’10) “Scalable influence maximization for prevalent viral marketing in large-scale social networks” • DAG: Chen et al. (ICDM’10) “Scalable influence maximization in social networks under the linear threshold model” • SIMPATH: Goyal et al. (ICDM’11) “SIMPATH: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model” • Advantage: faster than the Greedy algorithm • Disadvantage: no performance guarantee • [Figures: ShortestPath example showing the shortest path from a to c; DegreeDiscount example in which node 2’s degree shrinks]
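As an illustration of the heuristic direction, here is a sketch in the spirit of DegreeDiscount. It assumes an undirected graph and a single uniform propagation probability p, and it uses the discount dd(v) = d(v) − 2t(v) − (d(v) − t(v))·t(v)·p as we understand it from Chen et al. (KDD’09), where t(v) is the number of already selected neighbors of v. The function name, the adjacency-list layout, and the toy graph are assumptions.

```python
def degree_discount(adj, k, p):
    """Pick k seeds by discounted degree (assumes k <= number of nodes).

    adj maps each node to a list of its neighbors (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    discounted = dict(degree)            # dd(v), initially the plain degree
    selected_nbrs = {v: 0 for v in adj}  # t(v): neighbors already chosen
    seeds = []
    for _ in range(k):
        candidates = [v for v in adj if v not in seeds]
        u = max(candidates, key=lambda v: discounted[v])
        seeds.append(u)
        for v in adj[u]:
            if v in seeds:
                continue
            # A neighbor of a seed is worth less: shrink its effective degree.
            selected_nbrs[v] += 1
            d, t = degree[v], selected_nbrs[v]
            discounted[v] = d - 2 * t - (d - t) * t * p
    return seeds

# Hypothetical toy graph: a star around node 0 plus a small path 5-4-6.
toy_adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0], 4: [5, 6], 5: [4], 6: [4]}
dd_seeds = degree_discount(toy_adj, k=2, p=0.1)
```

No Monte Carlo simulation is involved, which is what makes this family fast and also why it carries no approximation guarantee.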

Improvement direction (II): Advanced greedy • Advanced greedy algorithms • CELF: Leskovec et al. (KDD’07) “Cost-effective outbreak detection in networks” • CELF++: Goyal et al. (WWW’11) “CELF++: optimizing the greedy algorithm for influence maximization in social networks” • [Figure: marginal rewards of nodes a to e under the Greedy algorithm and under the CELF algorithm]

Improvement direction (II): Advanced greedy • Advantage: by maintaining an upper bound on each node’s marginal gain, CELF reduces the number of Monte Carlo calls and speeds up the greedy algorithm by up to 700 times • Disadvantage: CELF needs N Monte Carlo simulations to initialize the upper bounds, where N is the network size
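The lazy-forward idea behind CELF can be sketched with a priority queue. This is a simplified illustration rather than the published implementation: the names, the graph layout, and the toy graph are assumptions, and node labels are assumed to be mutually comparable so that ties in the heap can be broken.

```python
import heapq
import random

def estimate_spread(graph, seeds, runs=200, seed=0):
    """Monte Carlo estimate of the IC influence spread M(S)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(active)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph[u].items():
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def celf(graph, k, runs=200):
    """Lazy-forward greedy: stale gains act as upper bounds (submodularity)."""
    mc_calls, heap = 0, []
    for v in graph:                      # initialization: N evaluations
        heapq.heappush(heap, (-estimate_spread(graph, [v], runs), v, 0))
        mc_calls += 1
    selected, spread = [], 0.0
    while len(selected) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(selected):
            # The top gain is fresh, so no other node can beat it.
            selected.append(v)
            spread -= neg_gain
        else:
            # Stale entry: recompute the marginal gain and push it back.
            gain = estimate_spread(graph, selected + [v], runs) - spread
            mc_calls += 1
            heapq.heappush(heap, (-gain, v, len(selected)))
    return selected, mc_calls

# Hypothetical toy graph with deterministic edges (probability 1.0).
toy_graph = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {}, "d": {"e": 1.0}, "e": {}}
celf_seeds, celf_calls = celf(toy_graph, k=2, runs=20)
```

On this toy graph the sketch selects the same seeds as plain greedy with 7 spread evaluations, 5 of which are spent on initialization alone.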

Our work • Motivation • Can we initialize the upper bounds without actually running the MC simulations? • [Figure: CELF algorithm vs. UBLF algorithm]

The upper bound of M(S) Local view Global view How many heads? Proposition 2 establishes a relationship among the activation probabilties in time t and t+1.

The upper bound of M(S) M(S) is bounded by a sum of series. In what condition the series convergent? and what is the limit? Too hard! Its aera? But we know its upper bound!

The upper bound of M(S) Convergent condition：the total influence to or from any node is less than 1. Under condition (14), we get a tractable upper bound. +……=

Our UBLF algorithm • CELF: the first round is time-consuming and needs full MC simulations. • UBLF: the first round is computed analytically.
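The difference in the first round can be sketched as follows. This is an illustration of the idea rather than the authors' implementation: the function names and the toy graph are assumptions, and the analytical bound is computed here by summing the first N terms of the series 1 + P·1 + P²·1 + ..., which we take to remain a valid upper bound because a cascade on N nodes lasts at most N − 1 steps.

```python
import heapq
import random

def estimate_spread(graph, seeds, runs=200, seed=0):
    """Monte Carlo estimate of the IC influence spread M(S)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(active)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph[u].items():
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def upper_bounds(graph):
    """Per-node bound on M({u}): truncated series sum_t (P^t 1)[u]."""
    x = {v: 1.0 for v in graph}
    for _ in range(len(graph)):
        x = {u: 1.0 + sum(p * x[v] for v, p in graph[u].items()) for u in graph}
    return x

def ublf(graph, k, runs=200):
    """Lazy-forward greedy whose initial bounds need no simulation."""
    # Round marker -1 means "analytical bound, never simulated".
    heap = [(-b, v, -1) for v, b in upper_bounds(graph).items()]
    heapq.heapify(heap)
    selected, spread, mc_calls = [], 0.0, 0
    while len(selected) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(selected):
            selected.append(v)
            spread -= neg_gain
        else:
            gain = estimate_spread(graph, selected + [v], runs) - spread
            mc_calls += 1
            heapq.heappush(heap, (-gain, v, len(selected)))
    return selected, mc_calls

# Hypothetical toy graph with deterministic edges (probability 1.0).
toy_graph = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {}, "d": {"e": 1.0}, "e": {}}
ublf_seeds, ublf_calls = ublf(toy_graph, k=2, runs=20)
```

On this toy graph the first seed is chosen after a single spread evaluation and both seeds after three, while the selection logic after initialization is the same lazy-forward loop that CELF uses.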

Our work: An example for UBLF • Monte Carlo simulation • Node 1 is selected! (only one MC simulation)

Experiments • Datasets • Ca-GrQc • Digger • Ca-HepPh • Email-Enron • Benchmark algorithms • CELF • Degree • DegreeDiscount • PageRank • Random • [Table: statistics of the datasets]

Experiments • Comparison results (Number of MC simulations) • Observation: UBLF makes significantly fewer MC calls in total than CELF.

Experiments • Comparison results (Influence spread) • Observation: the spreads of UBLF and CELF are identical, which confirms that UBLF and CELF share the same logic in selecting nodes.

Experiments • Comparison results (Time cost) • Observation: UBLF is 2-5 times faster than CELF.

Conclusions • Background • Problem Formulation • Greedy Algorithm • Heuristic algorithms: DegreeDiscount, PageRank, etc. • Advanced greedy algorithms: CELF, CELF++ • UBLF • Comparisons

Email: zhangpeng@iie.ac.cn Questions ?
