Epidemic Spreading in Real Networks: an Eigenvalue Viewpoint

Yang Wang, Deepayan Chakrabarti, Chenxi Wang, Christos Faloutsos

Introduction • Computer viruses are a prevalent threat • Existing defense mechanisms (e.g., scanning) focus on local behaviors only • like “curing” a contagious disease in one patient • Global defense strategies require an understanding of global propagation behaviors • like “prevention” of the spread of a contagious disease in a population • Epidemiological models can help us do exactly that

Introduction • Why do we care? • Understanding the spread of a virus is the first step in preventing it • How fast do we need to disinfect nodes so that the virus attack dies off? • How long will the virus take to die out?

Problem definition • Question: How does a virus spread across an arbitrary network? • Specifically, we want • a general analytic model for viral propagation • that applies to any network topology • and offers an easy-to-compute “threshold condition”

Framework • The network of computers consists of nodes (computers) and edges (links between nodes) • Each node is in one of two states • Susceptible (in other words, healthy) • Infected • Susceptible-Infected-Susceptible (SIS) model • Cured nodes immediately become susceptible • [Figure: state diagram; Susceptible → Infected when infected by a neighbor, Infected → Susceptible when cured internally]

Framework (Continued) • Homogeneous birth rate β on all edges between infected and susceptible nodes • Homogeneous death rate δ for infected nodes • [Figure: example network with nodes X, N1, N2, N3 marked healthy or infected; labels: Prob. β on edges, Prob. δ on an infected node]
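The β/δ framework can be made concrete with a small stochastic simulation step. This is a minimal sketch, not the authors' simulator: the function name `sis_step`, the adjacency-list input, and the simplification that a node cured in a step cannot be re-infected within that same step are all assumptions made for illustration.

```python
import random


def sis_step(neighbours, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes.

    neighbours: adjacency list, neighbours[i] is the list of nodes linked to i
    infected:   set of currently infected node indices
    beta:       per-edge infection probability (birth rate)
    delta:      per-node cure probability (death rate)
    rng:        a random.Random instance (hypothetical parameter, for reproducibility)
    """
    next_infected = set()
    for node in range(len(neighbours)):
        if node in infected:
            # An infected node stays infected unless it is cured (probability delta).
            if rng.random() >= delta:
                next_infected.add(node)
        else:
            # Each infected neighbour independently transmits with probability beta.
            for other in neighbours[node]:
                if other in infected and rng.random() < beta:
                    next_infected.add(node)
                    break
    return next_infected


# Usage: a three-node path 0 - 1 - 2 with node 0 initially infected.
path = [[1], [0, 2], [1]]
state = {0}
rng = random.Random(0)
for _ in range(5):
    state = sis_step(path, state, beta=0.3, delta=0.2, rng=rng)
```

Because cured nodes go straight back to the susceptible pool, the same node can be infected many times over a long run.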

Outline • Introduction • Classical models and their limitations • Modeling viral propagation in arbitrary network topologies • Epidemic threshold and eigenvalues • Experiments • Conclusions

Basic Homogeneous Model • [Kephart-White ’91, ’93] • Homogeneous connectivity <k> • Every node has equal probability of connecting to every other node • Many real networks deviate from this!

Power-law Networks • Many real-world networks exhibit power-law characteristics • Probability that a node has k links: P(k) = c k^{-γ} • γ = power-law exponent • The Internet: 2 ≤ γ ≤ 3 … and still evolving • [Faloutsos+ ’99, Ripeanu+ ’02]

Power-law Networks • Model for Barabási-Albert networks (PL-3) • [Pastor-Satorras & Vespignani, ’01, ’02] • Prediction limited to BA type networks • which only allow power-laws of exponent γ = 3

Power-law Networks • Model for correlated (Markovian) networks • [Boguñá & Pastor-Satorras ’02] • Additional distribution for neighbor degree correlation: P(k|k’) • Difficult to find/produce P(k|k’) in arbitrary networks • Such correlations have yet to be confirmed in real-world networks

Topology-independent epidemic model • Takes topological characteristics into account without being limited by them • Discrete time • A node is healthy at time t if it • Was healthy before t and not infected at t, OR • Was infected before t, cured, and not re-infected at t, OR • Was infected before t, therefore ignored re-infection attempts, and was subsequently cured at t • [Figure: example network with nodes X, N1, N2, N3 marked healthy or infected]

Topology-independent epidemic model • Deterministic time evolution of infection • 1 - p_{i,t}: probability that node i is healthy at time t • ζ_{k,t}: probability that a k-linked node will not receive infections from its neighbors at time t • Assume a 50% probability that curing takes place before the infection attempts • Solve Equation 1 numerically
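Equation 1 itself is not reproduced in this text. The sketch below assumes one plausible form of it, assembled from the three "healthy at time t" cases, with ζ computed per node as the product over neighbors of (1 - β p_j). The function names `model_step` and `run_model`, the adjacency-list input, and the exact weighting of the third term by 1/2 are assumptions for illustration, not the authors' stated equation.

```python
def model_step(neighbours, p, beta, delta):
    """Advance per-node infection probabilities p by one time step."""
    new_p = []
    for i, linked in enumerate(neighbours):
        # zeta: probability that no neighbour infects node i during this step.
        zeta = 1.0
        for j in linked:
            zeta *= 1.0 - beta * p[j]
        healthy = (
            (1.0 - p[i]) * zeta                    # was healthy, not infected
            + delta * p[i] * zeta                  # was infected, cured, not re-infected
            + 0.5 * delta * p[i] * (1.0 - zeta)    # cured after an ignored attempt (50% ordering assumption)
        )
        new_p.append(1.0 - healthy)
    return new_p


def run_model(neighbours, p0, beta, delta, steps):
    """Iterate the model; returns the expected number of infected nodes per step."""
    p = list(p0)
    history = [sum(p)]
    for _ in range(steps):
        p = model_step(neighbours, p, beta, delta)
        history.append(sum(p))
    return history


# Usage: a triangle with one node initially infected.
triangle = [[1, 2], [0, 2], [0, 1]]
dying = run_model(triangle, [1.0, 0.0, 0.0], beta=0.1, delta=0.5, steps=200)
persisting = run_model(triangle, [1.0, 0.0, 0.0], beta=0.4, delta=0.2, steps=200)
```

With these illustrative parameters the first run decays toward zero while the second settles at a nonzero infection level, the qualitative split that the epidemic threshold describes.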

Simulation evaluation of model (1/2) • [Figure: 1000-node homogeneous network; curves: KW model, our model, simulation]

Simulation evaluation of model (2/2) • Our model’s predictions consistently match or outperform those of models designed for specific topologies • [Figure: real-world 10900-node Oregon network; curves: PL-3 model, our model, simulation]

Epidemic threshold • The epidemic threshold τ is the value such that, if β/δ < τ, there is no epidemic • where β = birth rate and δ = death rate • What is this threshold for an arbitrary graph? • [Theorem 1] τ = 1/λ_{1,A}, where λ_{1,A} is the largest eigenvalue of the adjacency matrix A of the topology • λ_{1,A} alone captures the relevant property of the graph!
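The threshold in Theorem 1 is easy to compute for a small graph. The following is a minimal pure-Python sketch using power iteration. The function names and the dense list-of-lists matrix format are assumptions for illustration, and the iteration is run on A + I rather than A so that bipartite graphs (such as stars) also converge. For large graphs a sparse eigenvalue solver would be the practical choice.

```python
import math


def largest_eigenvalue(adj, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iterations):
        # w = (A + I) v; the shift makes the top eigenvalue strictly dominant.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient of A (not A + I) at the new unit vector.
        new_estimate = sum(
            w[i] * sum(adj[i][j] * w[j] for j in range(n)) for i in range(n)
        )
        converged = abs(new_estimate - estimate) < tol
        v, estimate = w, new_estimate
        if converged:
            break
    return estimate


def epidemic_threshold(adj):
    """Return tau = 1 / lambda_1; infinite if the graph has no edges."""
    lam = largest_eigenvalue(adj)
    return math.inf if lam == 0.0 else 1.0 / lam


# Usage: a triangle has largest eigenvalue 2, so tau is approximately 0.5.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
tau = epidemic_threshold(triangle)
```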

Epidemic threshold for various networks • Our epidemic threshold condition is accurate and general • Homogeneous networks • λ_{1,A} = <k>; τ = 1/<k> • where <k> = average degree • This is the same result as that of Kephart & White! • Star networks • λ_{1,A} = √d; τ = 1/√d • where d = the degree of the central node • Infinite power-law networks • λ_{1,A} = ∞; τ = 0; this concurs with previous results • Finite power-law networks • τ = 1/λ_{1,A}
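The two closed-form eigenvalues quoted for homogeneous and star networks can be checked by hand. The following is a short sketch rather than the authors' derivation: the symbols c and ℓ (entries of a candidate eigenvector at the hub and at each leaf) and the all-ones vector are introduced here for illustration, and the homogeneous case is idealized as a graph in which every node has exactly <k> links.

```latex
\begin{aligned}
% Star with a hub of degree d: value c at the hub, \ell at every leaf.
\lambda c &= d\,\ell, \qquad \lambda \ell = c
  \;\Longrightarrow\; \lambda^2 = d
  \;\Longrightarrow\; \lambda_{1,A} = \sqrt{d}, \quad \tau = \frac{1}{\sqrt{d}} \\
% Regular graph: every row of A sums to <k>, so the all-ones vector is an eigenvector.
A\,\mathbf{1} &= \langle k \rangle\,\mathbf{1}
  \;\Longrightarrow\; \lambda_{1,A} = \langle k \rangle, \quad \tau = \frac{1}{\langle k \rangle}
\end{aligned}
```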

Epidemic threshold • [Theorem 1] The epidemic threshold is given by • τ = 1/λ_{1,A} • How fast will an infection die out? • [Theorem 2] Below the epidemic threshold, the epidemic dies out exponentially • If β/δ < τ (β = birth rate, δ = death rate), then • any local outbreak of infection dies out exponentially fast
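One way to see why being below the threshold leads to exponential decay is to linearize the dynamics when infection probabilities are small. The sketch below is an illustration under that small-probability assumption, not a reproduction of the authors' proof. The vector p_t of per-node infection probabilities and the matrix S are notation introduced here.

```latex
\begin{aligned}
p_{i,t} &\approx (1-\delta)\,p_{i,t-1} + \beta \sum_{j} A_{ij}\,p_{j,t-1}
  \quad\Longleftrightarrow\quad
  \mathbf{p}_t \approx S\,\mathbf{p}_{t-1}, \qquad S = (1-\delta)\,I + \beta A \\
\lambda_{1,S} &= 1 - \delta + \beta\,\lambda_{1,A} < 1
  \quad\Longleftrightarrow\quad
  \frac{\beta}{\delta} < \frac{1}{\lambda_{1,A}} = \tau \\
\lVert \mathbf{p}_t \rVert &\lesssim \lambda_{1,S}^{\,t}\,\lVert \mathbf{p}_0 \rVert
  \quad \text{(geometric decay in } t \text{ whenever } \lambda_{1,S} < 1\text{)}
\end{aligned}
```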

Epidemic threshold experiments (Star) • [Figure: curves for β/δ > τ (above threshold), β/δ = τ (close to the threshold), β/δ < τ (below threshold)]

Epidemic threshold experiments (Oregon) • [Figure: curves for β/δ > τ (above threshold), β/δ = τ (at the threshold), β/δ < τ (below threshold)]

Our prediction vs. previous prediction • When we do not subsume previous predictions, our predictions are much more accurate • [Figure: two panels, Oregon and Star; x-axis: β/δ; y-axis: number of infected nodes; curves: PL-3, Our]

Contributions • We match our goals • ✓ A general analytic model for viral propagation (Equation 1) • ✓ that applies to any network topology • ✓ and offers an easy-to-compute “threshold condition” (Theorem 1)

Contributions • We created a new topology-independent epidemic model • More accurate than previous models • More general than previous models • We derived a new epidemic threshold condition • Only requires one parameter (λ_{1,A}) that can be calculated with existing tools • Subsumes previous theories for the epidemic threshold condition • Where it does not subsume them, our theory is more accurate

Halting viruses • Immunization strategies must concentrate on nodes that are statistically significant • Statistically significant nodes are not necessarily limited to ones that are highly connected • We are building mathematical models to identify the most significant nodes in power-law models • Other system parameters may also matter

Summary and future work • Our models will provide a theoretical basis for global defense strategies for • intelligent immunization • mechanisms to guard against distributed denial-of-service (DDoS) attacks • those that propagate via virus code • Phase transition phenomena at the epidemic threshold • Additional environmental factors that affect epidemic behavior

Basic homogeneous model - KW • Homogeneous connectivity <k> • Homogeneous birth rate β on all edges between infected and susceptible nodes • Homogeneous death rate δ for infected nodes • Susceptible-Infected-Susceptible (SIS) model • Cured individuals immediately become susceptible • Susceptible-Infected-Removed (SIR) model • Cured individuals are removed from the population

Homogeneous model equations • Deterministic time evolution of the infected population η_t • Change = birth term - death term • Equilibrium point of infection: η = 1 - ρ’, where ρ’ = δ/(β<k>) • For homogeneous or Erdős-Rényi (random) networks

Homogeneous model • η = 1 - ρ’ = 1 - δ/(β<k>) = 1 - 0.1 = 0.9
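The equilibrium value 0.9 can be checked numerically. The sketch below assumes the usual Kephart-White form dη/dt = β<k>η(1 - η) - δη, which matches the "birth term minus death term" description but is not written out in this text. The function names, the forward-Euler integrator, and the parameter values (chosen so that δ/(β<k>) = 0.1) are illustrative assumptions.

```python
def kw_equilibrium(beta, mean_degree, delta):
    """Closed-form equilibrium infected fraction, floored at zero."""
    return max(0.0, 1.0 - delta / (beta * mean_degree))


def integrate_kw(beta, mean_degree, delta, eta0=0.01, dt=0.01, steps=20_000):
    """Forward-Euler integration of the assumed homogeneous model."""
    eta = eta0
    for _ in range(steps):
        birth = beta * mean_degree * eta * (1.0 - eta)  # new infections
        death = delta * eta                             # cures
        eta += dt * (birth - death)
    return eta


# Usage: beta * <k> = 1.0 and delta = 0.1, so rho' = 0.1.
predicted = kw_equilibrium(beta=0.25, mean_degree=4, delta=0.1)
simulated = integrate_kw(beta=0.25, mean_degree=4, delta=0.1)
```

Both values should agree closely, since the integration runs long enough for the infected fraction to settle at its fixed point.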

Power-law networks • Discrepancy between simulation results and homogeneous model predictions

Power-law networks • There exist statistically significant nodes • Node 928 was infected 9473 times • Run #3 hits node 928 around time 20 • Runs #1 and #2 both hit node 928 early on

Models for power-law and correlated networks • Many real-world networks exhibit power-law characteristics • P(k) ∝ k^{-γ}: probability that a node has k links • The Internet: 2 ≤ γ ≤ 3 … and still evolving • Model for Barabási-Albert networks (SV) • γ = 3 • Steady state: η = 2e^{-δ/(mβ)}, m = minimum connectivity • Prediction limited to BA type networks • Model for correlated (Markovian) networks • Additional distribution for neighbor degree correlation: P(k|k’) • Difficult to find/produce P(k|k’) in arbitrary networks • Such correlations have yet to be confirmed in real-world networks

Epidemic threshold • The epidemic threshold τ is the value of β/δ (the ratio of birth rate to death rate) below which there is no epidemic • Epidemic threshold for existing models: • Threshold of the basic homogeneous model: 1/<k> • Threshold of the SV power-law model: <k>/<k^2> • We derive an epidemic threshold condition from our model • τ = 1/λ_{1,A} • λ_{1,A}: largest eigenvalue of the adjacency matrix A of the topology • Power-law networks have an extremely low threshold since connectivity variance is usually high, resulting in a large λ_{1,A}
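To see how far the three threshold formulas can diverge on a high-variance topology, the sketch below evaluates them on a star graph. It is purely illustrative: the function names are hypothetical, the eigenvalue threshold uses the closed form 1/√d for a star, and applying the SV formula to a star falls outside the BA setting it was designed for.

```python
import math


def degree_based_thresholds(degrees):
    """Return (1/<k>, <k>/<k^2>) computed from a list of node degrees."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return 1.0 / mean_k, mean_k / mean_k2


def star_thresholds(d):
    """Compare the three threshold formulas on a star with d leaves."""
    degrees = [d] + [1] * d  # one hub of degree d, d leaves of degree 1
    homogeneous, sv = degree_based_thresholds(degrees)
    eigenvalue = 1.0 / math.sqrt(d)  # closed form: lambda_1 = sqrt(d) for a star
    return {"homogeneous": homogeneous, "sv": sv, "eigenvalue": eigenvalue}


# Usage: a hub with 99 leaves.
thresholds = star_thresholds(99)
```

For this example the homogeneous formula gives a threshold of about 0.5, the SV formula gives 0.02, and the eigenvalue formula gives about 0.1, so the degree-based formulas bracket the eigenvalue result from opposite sides.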
