Understanding and Managing Cascades on Large Graphs. B. Aditya Prakash, Computer Science, Virginia Tech. CS Seminar, 11/30/2012. Networks are everywhere! Facebook Network [2010]. Gene Regulatory Network [Decourty 2008]. Human Disease Network [Barabasi 2007]. The Internet [2005].
Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012
Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005] Prakash 2012
Dynamical Processes over networks are also everywhere!
Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • Localized effects: riots…
Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks
Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize?
Why do we care? (1: Epidemiology) [US-MEDICARE NETWORK 2005] CURRENT PRACTICE vs. OUR METHOD: ~6x fewer! Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year)
Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users
Why do we care? (2: Online Diffusion) • Dynamical Processes over networks Buy Versace™! Followers Celebrity Social Media Marketing
Why do we care? (4: To change the world?) • Dynamical Processes over networks Social networks and Collaborative Action
High Impact – Multiple Settings epidemic outbreaks Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? products/viruses transmit s/w patches
Research Theme ANALYSIS Understanding POLICY/ ACTION Managing DATA Large real-world networks & processes
Research Theme – Public Health ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control outbreaks? DATA Modeling # patient transfers
Research Theme – Social Media ANALYSIS # cascades in future? POLICY/ ACTION How to market better? DATA Modeling Tweets spreading
In this talk Q1: How to immunize and control outbreaks better? Q2: How to find culprits of epidemics? POLICY/ ACTION Managing
In this lecture Q3: What do cascades look like? Q4: How does activity evolve over time? DATA Large real-world networks & processes
Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion
Part 1: Algorithms • Q1: Whom to immunize? • Q2: How to detect culprits?
Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos “Gelling, and Melting, Large Graphs by Edge Manipulation” in ACM CIKM 2012 (Best Paper Award) [Thanks to Hanghang Tong for some slides!]
An Example: Flu/Virus Propagation [figure labels: Sick, Healthy, Contact] 1: Sneeze to neighbors 2: Some neighbors get sick 3: Try to recover Q: How to guide propagation by optimizing link structure? - Q1: Understand tipping point (existing work) - Q2: Minimize the propagation (this paper) - Q3: Maximize the propagation (this paper)
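The three steps on this slide (sneeze to neighbors, some neighbors get sick, try to recover) can be read as a simple infect-and-recover loop. The following is a minimal sketch under that reading, not the authors' simulator; the names `simulate_sis`, `beta` (per-contact infection probability) and `delta` (per-tick recovery probability) are assumptions that do not appear on the slides.

```python
import random

def simulate_sis(neighbors, seeds, beta, delta, steps, rng):
    """Return the number of infected nodes after each time tick.

    neighbors: dict mapping each node to a list of its contacts.
    beta, delta: hypothetical infection / recovery probabilities.
    """
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        # Steps 1 and 2: every sick node "sneezes"; some healthy neighbors get sick.
        newly_sick = set()
        for node in infected:
            for nbr in neighbors[node]:
                if nbr not in infected and rng.random() < beta:
                    newly_sick.add(nbr)
        # Step 3: every node that was already sick tries to recover.
        recovered = {node for node in infected if rng.random() < delta}
        infected = (infected | newly_sick) - recovered
        history.append(len(infected))
    return history

# A 4-node chain with one initially sick node.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(simulate_sis(chain, [0], beta=0.5, delta=0.2, steps=10, rng=random.Random(7)))
```

Passing the random generator in explicitly keeps runs repeatable, which helps when comparing the effect of deleting or adding edges on the same graph.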
Vulnerability measure λ [ICDM 2011, PKDD 2010] λ is the epidemic threshold. Increasing λ → increasing vulnerability: “Safe” → “Vulnerable” → “Deadly”
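To make the vulnerability measure concrete, here is a small sketch that assumes λ is the largest eigenvalue of the graph's adjacency matrix and estimates it by power iteration. The function name and the three example graphs (chain, star, clique) are illustrative assumptions, not taken from the slides.

```python
def leading_eigenvalue(neighbors, iters=500):
    """Estimate the largest adjacency eigenvalue by power iteration.

    neighbors: list where neighbors[i] holds the nodes adjacent to node i.
    """
    n = len(neighbors)
    x = [1.0] * n
    shifted = 1.0
    for _ in range(iters):
        # Multiply by (A + I). The identity shift keeps bipartite graphs
        # (chains, stars) from oscillating between two vectors.
        y = [x[i] + sum(x[j] for j in neighbors[i]) for i in range(n)]
        shifted = max(y)  # converges to lambda + 1
        x = [val / shifted for val in y]
    return shifted - 1.0

chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
clique = [[j for j in range(5) if j != i] for i in range(5)]

for name, graph in [("chain", chain), ("star", star), ("clique", clique)]:
    print(name, round(leading_eigenvalue(graph), 3))
```

With the same number of nodes, the denser graph has the larger λ, which matches the slide's ordering from "Safe" to "Deadly".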
Minimizing Propagation: Edge Deletion • Given: a graph A, virus prop model and budget k; • Find: delete k ‘best’ edges from A to minimize λ Bad Good
Q: How to find k best edges to delete efficiently? Score each edge by (left eigen-score of source) × (right eigen-score of target)
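The scoring rule on this slide can be sketched as follows: compute the left and right principal eigenvectors, score each edge by the product of the source's left score and the target's right score, and delete the k highest-scoring edges. This is a sketch consistent with the slide, not the authors' implementation; the function names and the shifted power iteration are assumptions.

```python
def principal_eigenvector(n, edges, left=False, iters=500):
    """Power iteration on (A + I) for a directed edge list (src, dst)."""
    x = [1.0] * n
    for _ in range(iters):
        y = list(x)  # identity shift avoids oscillation on bipartite graphs
        for src, dst in edges:
            if left:
                y[dst] += x[src]  # accumulates (A^T x)[dst]
            else:
                y[src] += x[dst]  # accumulates (A x)[src]
        top = max(y)
        x = [val / top for val in y]
    return x

def best_edges_to_delete(n, edges, k):
    """Return the k edges with the highest u[source] * v[target] score."""
    u = principal_eigenvector(n, edges, left=True)   # left eigen-scores
    v = principal_eigenvector(n, edges, left=False)  # right eigen-scores
    ranked = sorted(edges, key=lambda e: u[e[0]] * v[e[1]], reverse=True)
    return ranked[:k]

# Triangle 0-1-2 with a tail 2-3-4; each undirected edge is listed in both directions.
undirected = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
edges = undirected + [(b, a) for a, b in undirected]
print(best_edges_to_delete(5, edges, 4))
```

On this toy graph the selected edges touch the hub of the triangle (node 2), which is where removing a link hurts the leading eigenvalue most.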
Minimizing Propagation: Evaluations [Plot: Log (Infected Ratio) vs. Time Ticks; Our Method (better)] Data set: Oregon Autonomous System Graph (14K nodes, 61K edges)
Discussions: Node Deletion vs. Edge Deletion • Observations: • Node or edge deletion → λ decreases • Edges of A = nodes of its line graph L(A) [figure: Original Graph A, Line Graph L(A)] • Questions: • Is edge deletion on A = node deletion on L(A)? • Which strategy is better (when both are feasible)?
Discussions: Node Deletion vs. Edge Deletion • Q: Is Edge Deletion on A = Node Deletion on L(A)? • A: Yes! • But node deletion itself is not easy: Theorem (Line Graph Spectrum): relates the eigenvalues of A to the eigenvalues of L(A). Theorem (Hardness of Node Deletion): finding the optimal k-node immunization is NP-hard.
Discussions: Node Deletion vs. Edge Deletion • Q: Which strategy is better (when both are feasible)? • A: Edge Deletion > Node Deletion (better) Green: Node Deletion (e.g., shut down a Twitter account) Red: Edge Deletion (e.g., un-friend two users)
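Maximizing Propagation: Edge Addition • Given: a graph A, virus prop model and budget k; • Find: add k ‘best’ new edges into A. • By 1st order perturbation, we have $\lambda_S - \lambda \approx Gv(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$, where $u(i_e)$ is the left eigen-score of the source and $v(j_e)$ is the right eigen-score of the target of edge e [figure: Low Gv vs. High Gv] • So, are we done? Not yet: scoring every candidate new edge needs $O(n^2 - m)$ complexity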
Maximizing Propagation: Edge Addition $\lambda_S - \lambda \approx Gv(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$ • Q: How to find k new edges w/ highest Gv(S)? • A: Modified Fagin’s algorithm • #1: Sort sources by u • #2: Sort targets by v • #3: Search space [figure: block of the top k+d sources and top k+d targets; marked cells are existing edges] • Time complexity: $O(m + nt + kt^2)$, $t = \max(k, d)$
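For contrast with the modified Fagin's algorithm named on the slide, the sketch below is the brute-force baseline: it scores every ordered pair that is not already an edge and keeps the k best. It is not the slide's algorithm. It assumes the eigen-scores `u` and `v` are already computed, and the function name and toy values are hypothetical.

```python
import heapq

def best_edges_to_add_bruteforce(u, v, existing_edges, k):
    """Score every missing directed edge (i, j) by u[i] * v[j]; keep the top k.

    u: left eigen-scores (indexed by source), v: right eigen-scores (by target).
    Examines all ordered pairs that are not already edges (no self-loops).
    """
    n = len(u)
    existing = set(existing_edges)
    candidates = (
        (u[i] * v[j], (i, j))
        for i in range(n)
        for j in range(n)
        if i != j and (i, j) not in existing
    )
    return [edge for _, edge in heapq.nlargest(k, candidates)]

# Toy scores, made up for illustration only.
u = [0.9, 0.5, 0.1]
v = [0.2, 0.8, 0.4]
print(best_edges_to_add_bruteforce(u, v, [(0, 1)], 2))
```

The quadratic scan is what makes this baseline impractical on large graphs, which is the motivation the slide gives for restricting the search to the highest-ranked sources and targets.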
Maximizing Propagation: Evaluation [Plot: Log (Infected Ratio) vs. Time Ticks; Our Method (better)]
Fractional Immunization of Networks B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos Under Submission
Previously: Full Static Immunization Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal). k = 2 ? ?
Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect
Now: Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect
Fractional Asymmetric Immunization Drug-resistant Bacteria (like XDR-TB) Another Hospital Hospital
Fractional Asymmetric Immunization = f Drug-resistant Bacteria (like XDR-TB) Another Hospital Hospital
Fractional Asymmetric Immunization Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved? Another Hospital Hospital
Our Algorithm “SMART-ALLOC” [US-MEDICARE NETWORK 2005] • Each circle is a hospital, ~3000 hospitals • More than 30,000 patients transferred • CURRENT PRACTICE vs. SMART-ALLOC: ~6x fewer!
Running Time (wall-clock time; lower is better) • Simulations: > 1 week • SMART-ALLOC: 14 secs • > 30,000x speed-up!
Experiments (lower is better) • SECOND-LIFE: ~5x, K = 200 • PENN-NETWORK: ~2.5x, K = 2000
Part 1: Algorithms • Q1: Whom to immunize? • Q2: How to detect culprits?
B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ in ICDM 2012, Brussels Prakash and Faloutsos 2012
Culprits: Problem definition 2-d grid ‘+’ -> infected Who started it?
Culprits: Problem definition 2-d grid ‘+’ -> infected Who started it? Prior work: [Lappas et al. 2010, Shah et al. 2011]
Culprits: Exoneration
Who are the culprits? • Two-part solution • use MDL for number of seeds • for a given number: exoneration = centrality + penalty • Running time = linear! (in edges and nodes)
Modeling using MDL • Minimum Description Length Principle == Induction by compression • Related to Bayesian approaches • MDL = Model + Data • Model: scoring the seed-set (two terms: encoding the integer |S|, and the number of possible |S|-sized sets)
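The two model-cost terms on this slide can be sketched as a bit count: one term for the integer |S| and one for choosing which |S| of the N nodes are seeds. The sketch assumes Rissanen's universal code for integers for the first term, since the slide does not name the integer code, and the function names are hypothetical.

```python
import math

def universal_integer_code_length(z):
    """Bits to encode an integer z >= 1 with the universal code for integers.

    L(z) = log2(c0) + log2(z) + log2(log2(z)) + ..., keeping positive terms only.
    """
    if z < 1:
        raise ValueError("z must be a positive integer")
    bits = math.log2(2.865064)  # normalizing constant c0
    term = math.log2(z)
    while term > 0:
        bits += term
        term = math.log2(term)
    return bits

def seed_set_model_cost(num_nodes, num_seeds):
    """Model cost in bits: encode |S|, then pick one of the C(N, |S|) seed sets."""
    return (universal_integer_code_length(num_seeds)
            + math.log2(math.comb(num_nodes, num_seeds)))

for s in (1, 2, 5):
    print(s, round(seed_set_model_cost(1000, s), 2))
```

Larger seed sets cost more bits to describe, so under MDL an extra seed is only worth adding if it shortens the description of the data by more than it adds to the model.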
Modeling using MDL • Data: Propagation Ripples [figure: Original Graph, Infected Snapshot, Ripple R1, Ripple R2]