Dynamics of networks

Jure Leskovec Computer Science Department Cornell University / Stanford University Dynamics of networks

Today: Rich data • Large on-line systems have detailed records of human activity • On-line communities: • Facebook (64 million users, billion dollar business) • MySpace (300 million users) • Communication: • Instant Messenger (~1 billion users) • News and Social media: • Blogging (250 million blogs world-wide, presidential candidates run blogs) • On-line worlds: • World of Warcraft (internal economy 1 billion USD) • Second Life (GDP of 700 million USD in ‘07)

Rich data as networks c) Social networks b) Internet (AS) a) World wide web d) Communication e) Citations f) Protein interactions

Networks: What do we know? • We know lots about the network structure: • Properties: Scale free [Barabasi ’99], 6-degrees of separation [Milgram ’67], Small-world [Watts-Strogatz ’98],Bipartite cores [Kumar et al. ’99], Network motifs [Milo et al. ‘02], Communities [Newman ‘99], Hubs and authorities [Page et al. ’98, Kleinberg ‘99] • Models: Preferential attachment [Barabasi ’99], Small-world [Watts-Strogatz ‘98], Copying model [Kleinberg el al. ’99], Heuristically optimized tradeoffs [Fabrikant et al. ‘02],Latent Space Models [Raftery et al. ‘02], Searchability [Kleinberg ‘00],Bowtie [Broder et al. ‘00], Exponential Random Graphs [Frank-Strauss ‘86], Transit-stub [Zegura ‘97], Jellyfish [Tauro et al. ‘01] We know much less about processes and dynamics of networks

This talk: Network dynamics • Network Dynamics: • Network evolution • How network structure changes as the network grows and evolves? • Diffusion and cascading behavior • How do rumors and diseases spread over networks?

This talk: Scale matters • We need massive network data for the patterns to emerge: • MSN Messenger network[WWW ’08](the largest social network ever analyzed) • 240M people, 255B messages, 4.5 TB data • Product recommendations [EC ‘06] • 4M people, 16M recommendations • Blogosphere[work in progress] • 60M posts, 120M links

This talk: The structure

Diffusion and Cascades • Behavior that cascades from node to node like an epidemic • News, opinions, rumors • Word-of-mouth in marketing • Infectious diseases • As activations spread through the network they leave a trace – a cascade Cascade (propagation graph) Network

Network diffusion • Network diffusion has been extensively studied: • Human behavior [Granovetter ‘78] • Diseases and epidemics [Bailey ‘75] • Innovations [Rogers ‘95] • On the web [Gruhl et al. ‘04] • Organizations [Burt ’04, Aral-Brynjolfsson-van Alstyne ‘07] • For marketing purposes [Richardson-Domingos ‘02, Hill-Provost-Volinsky ‘06] • Trading behaviors [Hirshleifer et al. ‘94] • Decision making [Bikhchandani ‘98, Surowiecky ‘05] We know much less about individual cascading events that lead to diffusion

[w/ Adamic-Huberman, EC ’06] 10% credit 10% off Setting 1: Viral marketing • People send and receive product recommendations, purchase products • Data:Large online retailer:4 million people, 16 million recommendations, 500k products

[w/ Glance-Hurst et al., SDM ’07] Setting 2: Blogosphere • Bloggers write posts and refer (link) to other posts and the information propagates • Data:10.5 million posts, 16 million links

[w/ Kleinberg-Singh, PAKDD ’06] Q1) What do cascades look like? • Are they stars? Chains? Trees? • Information cascades (blogosphere): propagation • Viral marketing (DVD recommendations): (ordered by frequency) • Viral marketing cascades are more social: • Collisions (no summarizers) • Richer non-tree structures

Q2) Human adoption curves • Prob. of adoption depends on the number of friends who have adopted [Bass ‘69, Shelling ’78] • What is the shape? • Distinction has consequences for models and algorithms To find the answer we need lots of data Prob. of adoption Prob. of adoption k = number of friends adopting k = number of friends adopting Diminishing returns? Critical mass?

[w/ Adamic-Huberman, EC ’06] Q2) Adoption curve: Validation DVD recommendations (8.2 million observations) Probability of purchasing # recommendations received Adoption curve follows the diminishing returns. Can we exploit this? Later similar findings were made for group membership [Backstrom-Huttenlocher-Kleinberg ‘06], and probability of communication [Kossinets-Watts ’06]

My research: The structure  

Cascade & outbreak detection • Blogs – information epidemics • Which are the influential/infectious blogs? • Viral marketing • Who are the trendsetters? • Influential people? • Disease spreading • Where to place monitoring stations to detect epidemics?

[w/ Krause-Guestrin et al., KDD ’07] The problem: Detecting cascades How to quickly detect epidemics as they spread? c1 c3 c2

[w/ Krause-Guestrin et al., KDD ’07] Two parts to the problem • Cost: • Cost of monitoring is node dependent • Reward: • Minimize the number of affected nodes: • If A are the monitored nodes, let R(A) denote the number of nodes we save We also consider other rewards: • Minimize time to detection • Maximize number of detected outbreaks A R(A) ( )

Optimization problem • Given: • Graph G(V,E), budget M • Data on how cascades C1, …, Ci,…,CKspread over time • Select a set of nodes A maximizing the reward subject to cost(A) ≤ M • Solving the problem exactly is NP-hard • Max-cover [Khuller et al. ’99] Reward for detecting cascade i

[w/ Krause-Guestrin et al., KDD ’07] Detection: Solution outline Problem structure • Submodularity of the reward functions • (think of it as “concavity”) CELF • algorithm with approximate guarantee Speed up • Lazy evaluation

[w/ Krause-Guestrin et al., KDD ’07] Problem structure: Submodularity New monitored node: • Gain of adding a node to small set is larger than gain of adding a node to large set • Submodularity: diminishing returns, think of it as “concavity”) S1 S1 S’ S’ S2 S3 Adding S’helps a lot Adding S’helps very little S2 S4 Placement A={S1, S2} Placement B={S1, S2, S3, S4}

Problem structure: Submodularity • We must show R is submodular: A B • Natural example: • Sets A1, A2,…, An • R(A) = size of union ofAi (size of covered area) • If R1,…,RKare submodular, then ∑pi Ri is submodular R(A  {u}) – R(A) ≥ R(B  {u}) – R(B) B Gain of adding a node to a small set Gain of adding a node to a large set A u

[w/ Krause-Guestrin et al., KDD ’07] Reward function is submodular • Theorem: • Reward function is submodular • Consider cascade i: • Ri(uk) = set of nodes saved from uk • Ri(A) = size of union Ri(uk), ukA • Ri is submodular • Global optimization: • R(A) =  Prob(i) Ri(A)  R is submodular Ri(u2) u2 Cascade i u1 Ri(u1)

[w/ Krause-Guestrin et al., KDD ’07] Solution: CELF Algorithm • We develop CELFalgorithm: • Two independent runs of a modified greedy • Solution set A’: ignore cost, greedily optimize reward • Solution set A’’: greedily optimize reward/cost ratio • Pick best of the two: arg max(R(A’), R(A’’)) • Theorem: If R is submodular then CELF is near optimal: • CELF achieves ½(1-1/e) factor approximation a d b c a c d b e Current solution: {a, c} Current solution: {a} Current solution: {} Marginal reward

Blogs: Information epidemics • Question: Which blogs should one read to be most up to date? • Idea:Select blogs to cover the blogosphere. • Each dot is a blog • Proximity is based on the number of common cascades

[w/ Krause-Guestrin et al., KDD ’07] Blogs: Information epidemics • Which blogs should one read to catch big stories? For more info see our website: www.blogcascade.org CELF Reward (higher is better) In-links (used by Technorati) Out-links # posts Random Number of selected blogs (sensors)

[w/ Krause et al., J. of Water Resource Planning] Same problem: Water Network • Given: • a real city water distribution network • data on how contaminants spread over time • Place sensors (to save lives) • Problem posed by the US Environmental Protection Agency c1 S S c2

[w/ Ostfeld et al., J. of Water Resource Planning] Water network: Results CELF • Our approach performed best at the Battle of Water Sensor Networks competition Degree Random Population saved (higher is better) Population Flow Number of placed sensors

This talk: The structure   

Background: Network models • Empirical findings on real graphs led to new network models • Such models make assumptions/predictions about other network properties • What about network evolution? Model Explains log prob. Power-law degree distribution Preferential attachment log degree

[w/ Kleinberg-Faloutsos, KDD ’05] Q5) Network evolution Internet • Networks are denser over time • Densification Power Law: a … densification exponent (1 ≤ a ≤ 2) • What is the relation between the number of nodes and the edges over time? • Prior work assumes: constant average degree over time E(t) a=1.2 N(t) Citations E(t) a=1.6 N(t)

[w/ Kleinberg-Faloutsos, KDD ’05] Q5) Network evolution Internet • Prior models and intuition say that the network diameter slowly grows (like log N, log log N) • Diameter shrinks over time • as the network grows the distances between the nodes slowly decrease diameter size of the graph Citations What is individual node behaviors are causing such patterns? diameter time

[w/ Backstrom-Kumar-Tomkins, KDD ’08] Q5) Edge attachment • We directly observe atomic events of network evolution(and not only network snapshots) and so on for millions… We observe evolution at finest scale • Test individual edge attachment • Directly observe events leading to network properties • Compare network models by likelihood (and not by just summary network statistics)

[w/ Backstrom-Kumar-Tomkins, KDD ’08] Setting: Edge-by-edge evolution • Network datasets • Full temporal information from the first edge onwards • LinkedIn (N=7m, E=30m), Flickr (N=600k, E=3m), Delicious (N=200k, E=430k), Answers (N=600k, E=2m) • We study 3 processes that control the evolution • P1) Node arrival: node enters the network • P2) Edge initiation: node wakes up, initiates an edge, goes to sleep • P3) Edge destination: where to attach a new edge • Are edges more likely to attach to high degree nodes? • Are edges more likely to attach to nodes that are close?

[w/ Backstrom-Kumar-Tomkins, KDD ’08] Edge attachment degree bias • Are edges more likely to connect to higher degree nodes? PA Gnp Flickr First direct proof of preferential attachment!

[w/ Backstrom-Kumar-Tomkins, KDD ’08] But, edges also attach locally • Just before the edge (u,w) is placed how many hops is between u and w? w Fraction of triad closing edges u PA v Gnp Flickr Real edges are local. Most of them close triangles!

Q6) Generating realistic graphs • Want to generate realistic networks: • Why synthetic graphs? • Anomaly detection, Simulations, Predictions, Null-model, Sharing privacy sensitive graphs, … • Q:Which network properties do we care about? • A:Don’t commit, let’s match adjacency matrices Given a real network Generate a synthetic network Compare graphs properties, e.g., degree distribution

[w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05] The model: Kronecker graphs Edge probability Edge probability • We prove Kronecker graphs mimic real graphs: • Power-law degree distribution, Densification, Shrinking/stabilizing diameter, Spectral properties pij Kronecker product of graph adjacency matrices (actually, there is also a nice social interpretation of the model) (3x3) (9x9) (27x27) Initiator Given a real graph. How to estimate the initiator G1?

[w/ Faloutsos, ICML ’07] Kronecker graphs: Estimation • Maximum likelihood estimation • Naïve estimation takes O(N!N2): • N!for different node labelings: • Our solution: Metropolis sampling: N! (big) const • N2 for traversing graph adjacency matrix • Our solution: Kronecker product (E << N2):N2E • Do stochastic gradient descent P( | ) Kronecker arg max We estimate the model in O(E)

[w/ Faloutsos, ICML ’07] Estimation: Epinions (N=76k, E=510k) • We search the space of ~101,000,000 permutations • Fitting takes 2 hours • Real and Kronecker are very close Degree distribution Path lengths “Network” values probability # reachable pairs network value node degree rank number of hops

[w/ Dasgupta-Lang-Mahoney, WWW ’08] Kronecker & Network structure • Fitting Epinions we obtained G1= • What does this tell about the network structure? As opposed to: Core-periphery No communities No good cuts 0.5 edges Core 0.9 edges 0.9 which gives a hierarchy 0.9 Periphery0.1 edges 0.1 0.5 edges 0.1

Small vs. Large networks • Small and large networks are fundamentally different Scientific collaborations (N=397, E=914) Collaboration network (N=4,158, E=13,422)

Conclusions and reflections • Why are networks the way they are? • Only recently have basic properties been observed on a large scale • Confirms social science intuitions; calls others into question. • Benefits of working with large data • What patterns do we observe in massive networks? • What microscopic mechanisms cause them? Social network of the whole world?

[w/ Horvitz, WWW ’08, Nature ‘08] Planetary look on a small-world • Small-world experiment [Milgram ‘67] • People send letters from Nebraska to Boston • How many steps does it take? • Messenger social network – largest network analyzed • 240M people, 30B conversations, 4.5TB data MSN Messenger network Milgram’s small world experiment Number of steps between pairs of people (i.e., hops + 1)

Reflections on Diffusion • Predictive modeling of information diffusion • When, where and what information will create a cascade? • Where should one tap the network to get the effect they want? Social Media Marketing • How do news and information spread • New ranking and influence measures • Sentiment analysis from cascade structure • How to introduce incentives?

What’s next?

Dynamics of networks