Maximizing the Spread of Influence through a Social Network

Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg, Eva Tardos Cornell University KDD 2003

Social network and spread of influence • Social network spreads INFLUENCE among its members • Opinions, ideas, information … • “Word-of-mouth” effect in Viral Marketing

Motivating scenarios • Adoption of a new drug by doctors and patients How to reach many patients? • Adoption of a new book by profs and students How to reach many students? • Bloggers blogging and publishing weblogs Follow which blogger to get the most information? • Battle of Water Sensor Networks How to find the optimize sensor placement?

Problem setting • Given • A limited budget B for initial advertising • Influence estimates between individuals • Goal • Trigger a large cascade of influence • Question • Which set of individuals should we target?

What do we have in this paper? • Form models of influence in social networks • Obtain data about particular network (inter-personal influence estimating) • Devise algorithm to maximize spread of influence

Models of influence • First mathematical models • [Schelling ‘70/’78], [Granovetter ‘78] • Large body of subsequent work • [Rogers ‘95], [Valente ‘95], [Wasserman/Faust ‘94] • Two basic classes of diffusion models: threshold and cascade • General operational view: • A social network is a directed graph, each person (individual) is a node • Nodes start either active or inactive • An active node may trigger activation of neighboring nodes • Monotonicity assumption

Linear threshold model • A node has a random threshold • A node is influenced by each neighbor according to a weight such that : • A node becomes active when at least (weighted) fraction of its neighbors are active:

Example Inactive Node 0.6 Active Node Threshold 0.2 0.2 0.3 Active neighbors X 0.1 0.4 U 0.3 0.5 Stop! 0.2 0.5 w v

Independent cascade model • When node becomes active, it has a single chance of activating each currently inactive neighbor • The activation attempt succeeds with independent probability

Example 0.6 Inactive Node 0.2 0.2 0.3 Active Node Newly active node U X 0.1 0.4 Successful attempt 0.5 0.3 0.2 Unsuccessful attempt 0.5 w v Stop!

Influence maximization problem • Influence of a node set : • Expected number of active nodes at the end, if set is the initial active set • Problem: • Given a parameter (budget), find a -node set to maximize • Constrained optimization problem with as the objective function

Properties of • Non-negative (obviously) • Monotone: • Submodular: • Let be a finite set • A set function is submodulariff

Bad news • For a submodular function , if only takes non-negative values, and is monotone, finding a -element set for which is maximized in an NP-hard optimization problem. • It is NP-hard to determine the optimum for influence maximization for both independent cascade model and linear threshold model.

Good news • We can use Greedy algorithm • Start with an empty set • For iterations: • Add node to that maximizes • How good (bad) is it? • Theorem: The greedy algorithm is a approx. • The resulting set activated at least of the number of nodes that any size- set could activate

Greedy algorithm

Other heuristics to find • High-degree • Picks nodes with highest node degree • Distance centrality • Picks nodes with lowest average distance to other nodes in the network • Random • Randomly pick nodes

Experiment setup • Co-authorship network from physics section of arXiv.org • A node is an author • A link is a co-authored paper ( links) • LT model: The edge has weight • IC model: • The edge has prob.

Experiment result on IC model • Result on LT model is similar • Not sensitive to different algorithms at high

Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance Carnegie Mellon University KDD 2007

Original Greedy Inefficient!!! 15,000 nodes takes a few days to complete Complexity Redundant!!!

Submodularity property • Recall: • When adding a vertex v to seed set S, the gain of adding v is larger if S is smaller • Therefore: a large number of nodes to not need to be re-evaluate

CELF algorithm If then discard 700 times faster than the original greedy!!!

Efficient Influence Maximization in Social Networks Wei Chen, Yajun Wang, Siyu Yang Microsoft Research, Tsinghua University KDD 2009

Improved greedy • Construct a graph • Obtain by removing edges not for propagation from with prob. • Use DFS/BFS to find out the set of vertices reachable from in • Also obtain • Remove overlapping elements

Improved greedy 15-34% faster than the original greedy!!!

Mix with CELF • Cons • CELF must consider all vertices to be added in the first round, but then we can decreased in future rounds • Improved greedy must build G’ for R times • Mix • First vertex: use Improved greedy • Other vertices: use CELF

Maximizing the Spread of Influence through a Social Network