DAVA: Distributing Vaccines over Networks under Prior Information. Yao Zhang, B. Aditya Prakash. Department of Computer Science, Virginia Tech. SDM, Philadelphia, April 24, 2014.
Motivation: Epidemiology • Virus spreads over contact networks • SIR model [Anderson+ 1991] • Susceptible-Infectious-Recovered • Weights pij: propagation probability from i to j • Recovery probability δ for each node • (models mumps-like infections) Zhang and Prakash, SDM2014
Motivation: Social Media • Memes and rumors spread over friendship networks • E.g.: Twitter following network • Independent cascade (IC) model [Kempe+ KDD2003] • Each node has only one chance to infect its neighbors • Special case of the SIR model
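A minimal sketch of one independent cascade run, to make the "only one chance to infect" rule concrete. The function name `simulate_ic` and the dict-of-dicts graph layout are assumptions made for this illustration, not something taken from the slides.

```python
import random


def simulate_ic(graph, seeds, rng=None):
    """Run one independent cascade and return the set of ever-infected nodes.

    graph: dict mapping node -> {neighbor: propagation probability p_ij}
           (hypothetical layout chosen for this sketch).
    seeds: iterable of initially infected nodes.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    frontier = list(infected)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # u gets exactly one attempt on each still-healthy neighbor v
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    next_frontier.append(v)
        # nodes in the old frontier never try again
        frontier = next_frontier
    return infected
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected number of infected nodes.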
Immunization • The Centers for Disease Control (CDC) cares about containing epidemic diseases • E.g.: ~400 million dollars used for vaccines for children in 2013 • Twitter tries to stop rumor spread • E.g.: rumors about victims after the Boston Marathon bombing in 2013 • How to choose the best nodes to vaccinate (remove)?
Immunization • Pre-emptive immunization (choose nodes before the epidemic starts) • Acquaintance strategy [Cohen+ 2003] • Pick a random person, immunize one of its neighbors at random • Netshield [Tong+ 2010] • Minimize the epidemic threshold (the point at which the virus takes off) • Good baseline strategies
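The acquaintance strategy can be sketched in a few lines. This is only an illustration of the one-line description on the slide; the function name, the `max_tries` safety cap and the adjacency-set layout are assumptions.

```python
import random


def acquaintance_immunize(adj, k, rng=None, max_tries=10_000):
    """Pick up to k nodes by the acquaintance rule.

    adj: dict mapping node -> set of neighbors (hypothetical layout).
    Repeatedly picks a random person, then immunizes one of that
    person's neighbors chosen at random.
    """
    rng = rng or random.Random()
    people = [u for u in adj if adj[u]]  # only people who have a neighbor
    immunized = set()
    tries = 0
    # max_tries guards against looping forever when fewer than k nodes
    # can ever be selected
    while people and len(immunized) < k and tries < max_tries:
        tries += 1
        person = rng.choice(people)
        immunized.add(rng.choice(list(adj[person])))
    return immunized
```

Because a random neighbor is more likely to be a high-degree node than a random person is, this rule favors hubs without needing any global knowledge of the graph.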
In reality • Pre-emptive immunization (choose nodes before the epidemic starts) • Acquaintance strategy [Cohen+ 2003] • Netshield [Tong+ 2010] • Typically the epidemic has already started! • More realistic intervention • Which nodes to vaccinate now? • We call it Data-Aware Immunization (this paper)
Outline • Motivation • Problem Definition • Complexity • Our Proposed Methods • Experiments • Conclusion
Data-Aware Vaccination Problem • Problem: Given a set of infected nodes and a contact graph, how to distribute k vaccines (node removal) to minimize the expected number of infected nodes at the end of the epidemic? • Example with 1 vaccine (figure: graph on nodes A-F, pij = 1 for all edges): Remove A, save {A, D}; Remove B, save {B}; Remove C, save {C} • Best solution: remove A
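For the special case pij = 1 the objective is deterministic: every node still reachable from an infected node ends up infected. The brute-force sketch below evaluates that objective. The graph is hypothetical, chosen only so that it reproduces the "save" sets quoted on the slide (the actual figure is not recoverable), and the function names are assumptions.

```python
from itertools import combinations


def infected_at_end(adj, infected, vaccinated):
    """Nodes infected at the end when every edge transmits (p_ij = 1)."""
    removed = set(vaccinated)
    seen = set(infected) - removed
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen


def best_vaccination(adj, infected, k):
    """Exhaustively try every set of k healthy nodes (exponential in k)."""
    healthy = sorted(u for u in adj if u not in infected)
    return min(combinations(healthy, k),
               key=lambda chosen: len(infected_at_end(adj, infected, chosen)))


# Hypothetical graph: E and F are infected; A shields D.
adj = {"E": {"A", "B"}, "F": {"C"}, "A": {"E", "D"},
       "D": {"A"}, "B": {"E"}, "C": {"F"}}
print(best_vaccination(adj, {"E", "F"}, 1))  # ('A',)
```

Exhaustive search is only feasible for tiny instances, which is consistent with the hardness results stated later in the talk.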
Complexity of DAV (Data-Aware Vaccination) • NP-hard • Reduction from the Maximum K-Intersection Problem (MaxKI: maximizing the intersection of k subsets) • MaxKI is NP-Complete [Vinterbo 2004] • Approximation algorithm? • Not submodular • Actually, DAV is hard to approximate within an absolute error! • See paper for details
Our Proposed Methods • Assumptions: IC model and undirected graph
1: Simplify - Merging infected nodes • Idea: merge all the infected nodes into a single 'super infected' node I • [Figure: original graph and the equivalent merged graph with super node I; healthy nodes A, B, C are connected to I with probabilities pA, pB, pC] • Logical-OR: if B is adjacent to infected nodes with probabilities pX and pY, then pB = 1-(1-pX)(1-pY)
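The merging step can be sketched as follows, assuming an undirected graph stored symmetrically as a dict of dicts. The function name `merge_infected` and the label `"I"` for the super node are illustrative choices.

```python
def merge_infected(graph, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    graph: dict node -> {neighbor: p}, stored symmetrically (assumption).
    Parallel edges into the same healthy node are combined with
    logical-OR: p = 1 - (1 - p_x)(1 - p_y)...
    """
    infected = set(infected)
    merged = {super_node: {}}
    for u, nbrs in graph.items():
        if u in infected:
            continue
        merged.setdefault(u, {})
        for v, p in nbrs.items():
            if v in infected:
                prev = merged[u].get(super_node, 0.0)
                combined = 1.0 - (1.0 - prev) * (1.0 - p)
                merged[u][super_node] = combined
                merged[super_node][u] = combined
            else:
                merged[u][v] = p
    return merged
```

Edges between two infected nodes disappear in the merge, since they cannot change which healthy nodes get infected.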
2: DAVA-Tree Algorithm: Idea • Select nodes with the largest "benefit" • Objective: the expected number of nodes saved after removing set S from graph G • Benefit of adding an additional node j to S: the additional number of saved nodes when j is added to S • Example (merged infected node, pij = 1 for all edges): three candidate nodes with benefits 5, 4 and 2
DAVA-Tree Alg.: Optimal on Trees • For any set S: • Fact 1: the chosen nodes in the optimal set must be neighbors of the merged infected node I • Fact 2: the benefit of each such node is independent of the rest of the set S • Example (pij = 1 for all edges): neighbors of I with benefits 5, 4 and 2 • DAVA-tree algorithm (linear time): select the top-k nodes with the maximum benefit among I's neighbors
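A sketch of the tree case under one stated assumption: the benefit of a neighbor j of I is taken to be the expected number of infections in j's subtree, i.e. the sum over subtree nodes of the product of edge probabilities on the path from I. With pij = 1 this reduces to the subtree size, matching the "Benefit: 5 / 4 / 2" labels. The data layout and names are hypothetical, and `heapq.nlargest` is used for brevity rather than a strictly linear-time selection.

```python
import heapq


def dava_tree(children, prob, root, k):
    """Return (top-k neighbors of root by benefit, all benefits).

    children: dict node -> list of child nodes in the rooted tree.
    prob:     dict (parent, child) -> propagation probability.
    """
    def subtree_benefit(child):
        total = 0.0
        # (node, probability that the infection reaches this node from root)
        stack = [(child, prob[(root, child)])]
        while stack:
            node, reach = stack.pop()
            total += reach
            for c in children.get(node, []):
                stack.append((c, reach * prob[(node, c)]))
        return total

    benefits = {c: subtree_benefit(c) for c in children.get(root, [])}
    return heapq.nlargest(k, benefits, key=benefits.get), benefits
```

Fact 2 is what makes a single pass sufficient: each neighbor's benefit can be computed once and never needs updating as other nodes are selected.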
3: General Case – Arbitrary Graphs • Idea • We have the optimal algorithm for a tree • Extract a spanning tree, then run DAVA-tree • What kind of tree? • Minimum spanning tree? • [Figure, pij = 1 for all edges: an MST of the graph, marking the node that DAVA-tree finds optimal on the MST and the node that is the optimal solution]
3: General Case – Arbitrary Graphs • Idea • We have the optimal algorithm for a tree • Build a spanning tree first • What kind of tree? • Not a minimum spanning tree: we propose to use the dominator tree (a concept from software engineering) • u dominates v: every path from I to v contains u • Example (pij = 1 for all edges): node 4 dominates nodes 8, 9, 10, 11
Dominator Tree • u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u • Dominator tree: add an edge between every such u and v • Computable in linear time [Buchsbaum, Tarjan 1998] • Fact 1: the optimal solution should be among the children of root I in the dominator tree, for any arbitrary graph • Fact 2 (special case k = 1, p = 1): running DAVA-tree on the dominator tree gives the optimal solution • [Figure, pij = 1 for all edges: merged graph and its dominator tree, marking the optimal node from DAVA-tree and the optimal solution]
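The definitions above translate directly into a naive construction: u dominates v exactly when removing u disconnects v from I. The sketch below uses that observation and is deliberately simple; it is not the linear-time algorithm cited on the slide, and all names are illustrative.

```python
def reachable(adj, root, removed=None):
    """Nodes reachable from root when `removed` is deleted from the graph."""
    seen = {root}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def immediate_dominators(adj, root):
    """Return {v: immediate dominator of v} for every reachable v != root."""
    nodes = reachable(adj, root)
    strict_doms = {v: set() for v in nodes}
    for u in nodes:
        if u == root:
            continue
        # every node cut off from the root by deleting u is dominated by u
        for v in nodes - reachable(adj, root, removed=u) - {u}:
            strict_doms[v].add(u)
    for v in nodes:
        if v != root:
            strict_doms[v].add(root)
    # strict dominators of v form a chain; the immediate dominator is the
    # deepest one, i.e. the one that itself has the most strict dominators
    return {v: max(strict_doms[v], key=lambda u: len(strict_doms[u]))
            for v in nodes if v != root}
```

The dominator tree is then the tree with an edge from each immediate dominator to the node it dominates. This sketch costs one graph traversal per node, so it is suitable only for small graphs.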
Weighting the dominator tree • Computing the exact weights is #P-complete • Our solution: use the maximum propagation path probability between nodes I and v (computed with Dijkstra's algorithm) • [Figure: merged graph with edge probabilities p1, p3, p6 and dominator tree with weights w1, w3, w6]
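The maximum propagation path probability can be computed with a Dijkstra variant that maximizes a product instead of minimizing a sum (equivalent to shortest paths under edge lengths -log p). The sketch below shows only that computation; how the paper converts these values into dominator-tree edge weights is not shown on the slide, so that step is left out. Names and graph layout are assumptions.

```python
import heapq


def max_path_probability(graph, source):
    """Best single-path propagation probability from source to each node.

    graph: dict node -> {neighbor: p} with 0 <= p <= 1 (assumed layout).
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated probabilities
    done = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in graph[u].items():
            cand = -neg_p * p
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best
```

The greedy argument behind Dijkstra still holds here because multiplying by a probability can never increase the value of a path.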
DAVA algorithm • Steps: 1. T = build a dominator tree of the merged graph G 2. v = run DAVA-tree on T with budget = 1 3. Add v to S and remove v from G 4. Go to Step 1 until |S| = k • [Figure: merged graph (pij = 1 for all edges) and its dominator tree; example with |S| = 2, iteration 1]
DAVA algorithm (continued) • After iteration 1, remove the selected node from the merged graph and rebuild the dominator tree for iteration 2 • Repeat until |S| = k (example: |S| = 2) • Running time: O(k(|E| + |V| log|V|)) • Too slow for large networks!
DAVA-fast: a faster algorithm • Steps: 1. T = build a dominator tree of the merged graph 2. S = run DAVA-tree on T with budget = k • In practice, the performance of DAVA-fast is very close to that of DAVA • Time complexity: subquadratic • DAVA-fast: O(|E| + |V| log|V|)
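To contrast the two variants, here is a toy sketch restricted to pij = 1, where the benefit of a child of I in the dominator tree is simply the number of nodes that become unreachable once that child is removed. It uses naive reachability instead of a real dominator-tree routine, ignores edge weights entirely, and all names and the example graph are hypothetical.

```python
def reachable(adj, root, removed):
    """Nodes reachable from root after deleting the set `removed`."""
    seen = {root}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def root_children_benefits(adj, root, removed):
    """Benefit (number of saved nodes) of each dominator-tree child of root."""
    base = reachable(adj, root, removed)
    saved = {u: base - reachable(adj, root, removed | {u})
             for u in base if u != root}
    dominated = set()
    for u, s in saved.items():
        dominated |= s - {u}  # nodes with a dominator other than the root
    return {u: len(s) for u, s in saved.items() if u not in dominated}


def dava(adj, root, k):
    """Rebuild the benefits after every single pick (budget = 1 each round)."""
    removed, chosen = set(), []
    for _ in range(k):
        benefits = root_children_benefits(adj, root, removed)
        if not benefits:
            break
        best = max(benefits, key=benefits.get)
        chosen.append(best)
        removed.add(best)
    return chosen


def dava_fast(adj, root, k):
    """Compute the benefits once and take the top k."""
    benefits = root_children_benefits(adj, root, set())
    return sorted(benefits, key=benefits.get, reverse=True)[:k]
```

On a graph where a and b both lead to c, and a also has four leaves of its own, the two variants can disagree: after a is removed, b becomes the only route to c and its benefit grows, which only the iterative variant notices. That gap is the price DAVA-fast pays for building the tree once.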
Extending to SIR model • See the paper
Experiments • Virus Propagation Models • IC and SIR • Settings (see more settings in the paper) • Initial infected nodes chosen uniformly at random • Baseline Algorithms • RANDOM: healthy nodes chosen uniformly at random • DEGREE: choose nodes with the top weighted degrees • PAGERANK: choose nodes with the top PageRank scores • NETSHIELD • State-of-the-art pre-emptive immunization algorithm that minimizes the epidemic threshold of the graph [Tong+ ICDM 2010] • Assumes no data is given before the epidemic starts
Experiments: Datasets • Datasets are chosen from different domains • Social media (IC model) • OREGON: AS router graph • STANFORD: hyperlink network • GNUTELLA: peer-to-peer network • BRIGHTKITE: friendship network • Epidemiology (SIR model) • PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]
Experiments: Quality • [Plots: PORTLAND (SIR model) and GNUTELLA (IC model); higher is better] • DAVA consistently outperforms the baseline algorithms • Furthermore, DAVA-fast performs almost as well as DAVA • (See more results in the paper)
Experiments: Scalability • [Plot: running time (sec.), lower is better; one annotation reads "did not finish within 10 hours"]
Conclusion • Data-Aware Vaccination problem • Given: graph and infected nodes • Find: 'best' nodes for immunization • Complexity • NP-hard • Hard to approximate within an absolute error • DAVA-tree • Optimal solution on a tree • DAVA and DAVA-fast • Merge infected nodes • Build a dominator tree, and run DAVA-tree • Running time: subquadratic • DAVA: O(k(|E| + |V| log|V|)) • DAVA-fast: O(|E| + |V| log|V|) • [Figure: graph with infected nodes, merged graph, dominator tree]
Any Questions? • Code at: http://people.cs.vt.edu/~yaozhang • Yao Zhang, B. Aditya Prakash • Thanks for the support of NSF (Grant No. IIS-1353346).