Making Diffusion Work for You

Making Diffusion Work for You • B. Aditya Prakash, Computer Science, Virginia Tech • GraphEx Symposium, MIT Endicott House, Aug 21, 2014

Thanks! • Ali Pinar • Ben Miller Prakash 2014

Networks are everywhere! • Facebook Network [2010] • Gene Regulatory Network [Decourty 2008] • Human Disease Network [Barabasi 2007] • The Internet [2005]

Dynamical Processes over networks are also everywhere!

Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ...

Why do we care? (1: Epidemiology) • Dynamical processes over networks: diseases over contact networks (SI model) • Figure: CDC data, visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]

Why do we care? (1: Epidemiology) • Dynamical processes over networks • Figure [US-MEDICARE NETWORK 2005]: each circle is a hospital; ~3000 hospitals; more than 30,000 patients transferred • Problem: Given k units of disinfectant, whom to immunize?

Why do we care? (1: Epidemiology) • Figure [US-MEDICARE NETWORK 2005]: CURRENT PRACTICE vs. OUR METHOD, ~6x fewer! • Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

Why do we care? (2: Online Diffusion) • > 800m users, ~$1B revenue [WSJ 2010] • ~100m active users • > 50m users

Why do we care? (2: Online Diffusion) • Dynamical processes over networks: social media marketing • Figure labels: Celebrity, Followers, “Buy Versace™!”

Why do we care? (3: To change the world?) • Dynamical processes over networks: social networks and collaborative action

High Impact – Multiple Settings • Epidemic outbreaks • Products/viruses • Transmit s/w patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?

Research Theme • ANALYSIS: Understanding • POLICY/ACTION: Managing/Utilizing • DATA: Large real-world networks & processes

Research Theme – Public Health • ANALYSIS: Will an epidemic happen? • POLICY/ACTION: How to control outbreaks? • DATA: Modeling # patient transfers

Research Theme – Social Media • ANALYSIS: # cascades in future? • POLICY/ACTION: How to market better? • DATA: Modeling Tweets spreading

In this talk • POLICY/ACTION (Utilizing) • Q1: How to ‘zoom-out’ of graphs? • Q2: How to control outbreaks?

In this talk • DATA (Large real-world networks & processes) • Q3: How does ‘activity’ evolve over time?

Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion

Part 1: Algorithms • Q1: How to zoom-out of a network? • Q2: How to control out-breaks? (Broad theme: Network Topology Manipulation)

“Zoom-out” of the network • “Zoom-out” of the cascade graph to get a quick picture (= summarization) • Figure: a big graph (nodes A to F) is zoomed out to a smaller representation of the network • Coarsening [Purohit, Prakash, et al. SIGKDD 2014]

Challenges • C1: How do we maintain diffusive characteristics when coarsening networks? • C2: How do we merge nodes to get the coarse network? • C3: How do we find the best nodes to merge quickly?

C1: Modeling diffusion
• Information spreads over networks
• e.g., a rumor/meme spreads over the Twitter following network (figure: meme spreading)
• Independent cascade model (IC) [Kempe+, KDD03]
• Weights pij: propagation probability from i to j
• Each node has only one chance to infect each of its neighbors
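The IC model described above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names `simulate_ic` and `expected_spread`, and the edge-dictionary representation of the weights pij, are assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(p, seeds, rng=None):
    """Run one independent-cascade simulation.

    p maps a directed edge (i, j) to its propagation probability p_ij
    (hypothetical representation chosen for this sketch).
    Returns the set of nodes that end up active.
    """
    rng = rng or random.Random()
    out_edges = {}
    for (i, j), p_ij in p.items():
        out_edges.setdefault(i, []).append((j, p_ij))

    active = set(seeds)
    frontier = deque(active)
    while frontier:
        i = frontier.popleft()
        # Node i is processed once, so it gets exactly one chance
        # to infect each currently inactive neighbor.
        for j, p_ij in out_edges.get(i, []):
            if j not in active and rng.random() < p_ij:
                active.add(j)
                frontier.append(j)
    return active


def expected_spread(p, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(p, seeds, rng)) for _ in range(runs)) / runs
```

For example, `expected_spread({("a", "b"): 0.5, ("b", "c"): 0.5}, ["a"])` estimates how many nodes a meme started at `a` reaches on average.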

Diffusive characteristics
• The first eigenvalue λ1 (of the adjacency matrix) is sufficient to characterize vulnerability for most diffusion models. [Prakash et al. ICDM’12 selected for best papers]
• λ1 determines the epidemic threshold (will there be an epidemic?)
• Figure: “Safe”, “Vulnerable”, “Deadly”: increasing λ1, increasing vulnerability
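To make λ1 concrete, here is a small sketch that estimates it by power iteration in plain Python. The function names are hypothetical, and the SIS check is an assumption that goes beyond the slide: it uses the commonly stated condition that an SIS infection with infection rate beta and recovery rate delta dies out when λ1 · beta / delta < 1.

```python
import math


def leading_eigenvalue(adj, iters=1000, tol=1e-12):
    """Estimate lambda_1 of a non-negative adjacency matrix (list of lists).

    Power iteration is run on A + I; the shift avoids oscillation on
    bipartite graphs, and 1 is subtracted again at the end of each step.
    """
    n = len(adj)
    if n == 0:
        return 0.0
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        new_lam = norm - 1.0
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def below_sis_threshold(lam1, beta, delta):
    """Assumed SIS condition: the infection dies out if lam1 * beta / delta < 1."""
    return lam1 * beta / delta < 1.0
```

A triangle has λ1 = 2 and a star with three leaves has λ1 = √3, so under this assumed condition the triangle is the more vulnerable of the two for the same beta and delta.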

C1: maintain diffusive characteristics • Goal: maintain the diffusive characteristics of the original network in the coarsened network • Make the coarsened network have the least change in the first eigenvalue • Figure: the original network (nodes A to F) is coarsened into the coarsened network

C2: How to merge nodes • Goal: Merge nodes of graph G to get the coarsened graph that “approximates” G with respect to diffusion • In the original network, merging b and a gives the least change in λ1 • Influence from d to b: 0.5; influence from d to a: 0.25; average: 0.375 • 0.375! Is this correct?

C2: How to merge nodes • In general: Merging a,b

Problem Definition: Graph Coarsening Problem (GCP)
• Given: a large graph G and the reduction factor α
• Find: the best set of adjacent nodes to merge
• To minimize |λG-λH|, where H is the coarsened graph
• i.e., H has the least change in the first eigenvalue
• We use λG to represent the first eigenvalue of G
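Written as a single optimization problem, the definition above might read as follows. This is a restatement under assumptions: the set notation \mathcal{C}_{\alpha}(G) is introduced here for illustration, and the "αn nodes left" stopping condition is taken from the algorithm slide later in the talk. The operator \operatorname* requires amsmath.

```latex
\[
H^{*} \;=\; \operatorname*{arg\,min}_{H \,\in\, \mathcal{C}_{\alpha}(G)}
\bigl|\lambda_G - \lambda_H\bigr|,
\qquad
\mathcal{C}_{\alpha}(G) \;=\;
\bigl\{\, H : H \text{ is obtained from } G \text{ by repeatedly merging adjacent node pairs until } \alpha n \text{ nodes are left} \,\bigr\}
\]
```

Here n is the number of nodes of G, and λG and λH are the first eigenvalues of the adjacency matrices of G and H.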

C3: which nodes to merge • Goal: find the best nodes to merge • Fast, scalable to large networks • Figure: the original network (nodes A to F) is coarsened into the coarsened network

Naive Greedy Heuristic
• Score every edge by the change in eigenvalue
• Greedily choose the edge (a,b) with the least score, and merge (a,b)
• Re-evaluate the scores of every edge and repeat
• Too slow! O(m²) time to score all edges
• This loses the time benefit of analyzing the smaller graph
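The naive heuristic above can be sketched as follows. Several pieces are assumptions for illustration only: the graph is an undirected weighted dict of dicts with sortable node labels, λ1 is estimated by power iteration, and the merge rule simply averages the weights from a and b to each neighbor (matching the 0.375 averaging example, but not necessarily the general merge rule used in the paper).

```python
import math


def lambda1(w, iters=300):
    """Leading eigenvalue of a symmetric weighted graph (dict of dicts),
    estimated by power iteration on A + I."""
    nodes = list(w)
    if not nodes:
        return 0.0
    x = {u: 1.0 / math.sqrt(len(nodes)) for u in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {u: x[u] + sum(wt * x[v] for v, wt in w[u].items()) for u in nodes}
        norm = math.sqrt(sum(t * t for t in y.values()))
        x = {u: y[u] / norm for u in nodes}
        lam = norm - 1.0
    return lam


def merge(w, a, b):
    """Return a copy of w with b merged into a.

    Assumed rule: the new weight to each neighbor v is the average of
    w(a, v) and w(b, v), with a missing edge counted as 0.
    """
    h = {u: {v: wt for v, wt in nbrs.items() if v != b}
         for u, nbrs in w.items() if u != b}
    for v in set(w[a]) | set(w[b]):
        if v in (a, b):
            continue
        avg = (w[a].get(v, 0.0) + w[b].get(v, 0.0)) / 2.0
        h[a][v] = avg
        h[v][a] = avg
    return h


def naive_greedy_coarsen(w, alpha):
    """Merge the least-scoring edge repeatedly until alpha * n nodes are left."""
    target = max(1, int(alpha * len(w)))
    g = w
    while len(g) > target:
        base = lambda1(g)
        edges = [(a, b) for a in g for b in g[a] if a < b]
        if not edges:
            break
        # Scoring one edge needs a full eigenvalue computation, and every
        # edge is re-scored after each merge: this is why the heuristic is slow.
        best = min(edges, key=lambda e: abs(base - lambda1(merge(g, *e))))
        g = merge(g, *best)
    return g
```

Each call to `lambda1` already touches every edge, which is where the quadratic cost of scoring all edges comes from.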

CoarseNet: idea
• Can we approximate the edge scores faster?
• Yes!
• Use matrix perturbation arguments to estimate (up to first-order terms) the score of an edge in constant time (skipping details)
• Score all edges in O(m) time
• Naive heuristic: O(m²) time
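The talk skips the details of the CoarseNet score, so the sketch below does not reproduce it. It only illustrates the underlying first-order perturbation idea on a simpler change, deleting one edge from a symmetric graph: with unit leading eigenvector u, the change in λ1 is approximately uᵀ ΔA u, which for deleting edge (i, j) of weight w is −2·w·u_i·u_j. The function names are hypothetical.

```python
import math


def leading_pair(adj, iters=500):
    """(lambda_1, unit leading eigenvector) of a symmetric non-negative
    matrix given as a list of lists, via power iteration on A + I."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        lam = norm - 1.0
    return lam, x


def first_order_edge_scores(adj):
    """First-order estimate of the change in lambda_1 when each edge is deleted.

    The eigenvector is computed once; after that each edge costs O(1),
    so all m edges are scored in O(m) time.
    """
    _, u = leading_pair(adj)
    n = len(adj)
    return {
        (i, j): -2.0 * adj[i][j] * u[i] * u[j]
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j] != 0
    }
```

On a triangle the estimate for deleting any edge is −2/3, while the exact change is √2 − 2 ≈ −0.586: close, but only a first-order approximation.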

CoarseNet: Complete algorithm • Step 1: Compute scores for all edges (adjacent node pairs) • Step 2: Merge the nodes with the smallest score • Step 3: Go to Step 1 until αn nodes are left • Figure: assigning scores and merging edges, from the original network (weight=0.5) to the coarsened network

How do we perform? • Figure: results on DBLP and Amazon (higher is better) • The first eigenvalue is preserved well up to large coarsening factors! (See more results in the paper)

Application 1: Influence Maximization
• Methodology:
• Step 1: Coarsen the large social network using CoarseNet
• Step 2: Solve influence maximization on the coarsened network
• Step 3: Randomly select one node from each selected “supernode”
• Figure: the three steps on an example network (nodes A to F)
• We call it CSPIN
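The three-step methodology above can be sketched as a small pipeline. Everything here is a hypothetical interface: `coarsen` and `solve_im` stand in for CoarseNet and for any influence-maximization solver, and their signatures are assumptions made for this example, not the authors' API.

```python
import random


def cspin(graph, k, coarsen, solve_im, rng=None):
    """Sketch of the coarsen / solve / expand pipeline.

    coarsen(graph)      -> (coarse_graph, members), where members maps each
                           supernode to the original nodes merged into it.
    solve_im(coarse, k) -> list of k selected supernodes.
    Both callables are placeholders supplied by the caller.
    """
    rng = rng or random.Random()
    coarse_graph, members = coarsen(graph)          # Step 1
    super_seeds = solve_im(coarse_graph, k)         # Step 2
    # Step 3: pick one original node uniformly at random per supernode.
    return [rng.choice(list(members[s])) for s in super_seeds]
```

The point of the design is that the expensive solver in Step 2 only ever sees the smaller graph.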

Quality of CSPIN w.r.t • We can merge up to 95% of the vertices without significantly affecting the influence spread!

Application 2: Diffusion Characterization
• Goal: use Graph Coarsening to understand information cascades
• Dataset: Flixster
  • a friendship network with movie ratings
  • Cascade: the same movie rating from friends
• Methodology
  • coarsen the network using CoarseNet with the reduction factor α=0.5
  • study the formed groups (supernodes)
• Can get non-network surrogates

Diffusion observation
• Stats:
  • 1891 groups
  • mean group size: 16.6
  • the largest group: 22061 nodes (roughly 40% of nodes)
• Observation 1: a very large fraction of movies propagate in a small number of groups
• Observation 2: a multi-modal distribution
• (See more results in the paper)

Future work… • How is it related to community structure? • More applications, like Visualization… • Parallelization

Part 1: Algorithms • Q1: How to zoom-out of a network? • Q2: How to control out-breaks?

Immunization (= Interventions) • Different Flavors: • Pre-emptive • Data-aware

Pre-emptive: Vulnerability (Again!) • The first eigenvalue λ1 (of the adjacency matrix) is sufficient to characterize vulnerability for most diffusion models. [Prakash et al. ICDM’12 selected for best papers] • λ1 determines the epidemic threshold • Figure: “Safe”, “Vulnerable”, “Deadly”: increasing λ1, increasing vulnerability

Goal • Decrease λ1 as much as possible • Node-based [Tong, P., + ICDM 2010] • Edge-based [Tong, P., Eliassi-Rad+ CIKM 2012, Best Paper Award] • Edge-Manipulation [P., Adamic+ SDM 2013]
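As a baseline for the node-based flavor of this goal, one can brute-force it: repeatedly remove the node whose deletion leaves the smallest λ1. The sketch below is that brute force only, written for illustration; it is not the algorithm of any of the cited papers, and the graph representation (symmetric dict of dicts) and function names are assumptions.

```python
import math


def lambda1(w, iters=300):
    """Leading eigenvalue of a symmetric weighted graph (dict of dicts),
    estimated by power iteration on A + I."""
    nodes = list(w)
    if not nodes:
        return 0.0
    x = {u: 1.0 / math.sqrt(len(nodes)) for u in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {u: x[u] + sum(wt * x[v] for v, wt in w[u].items()) for u in nodes}
        norm = math.sqrt(sum(t * t for t in y.values()))
        x = {u: y[u] / norm for u in nodes}
        lam = norm - 1.0
    return lam


def remove_node(w, r):
    """Return a copy of w without node r and its incident edges."""
    return {u: {v: wt for v, wt in nbrs.items() if v != r}
            for u, nbrs in w.items() if u != r}


def greedy_immunize(w, k):
    """Pick k nodes one at a time, each time removing the node whose
    deletion yields the smallest remaining lambda_1."""
    g, chosen = w, []
    for _ in range(min(k, len(w))):
        best = min(g, key=lambda r: lambda1(remove_node(g, r)))
        chosen.append(best)
        g = remove_node(g, best)
    return chosen, lambda1(g)
```

On a star graph the first pick is the hub, which drops λ1 to 0. Recomputing λ1 for every candidate is expensive, which is one reason faster dedicated algorithms are of interest.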

Latest results • First (provable) approximation algorithms for the edge-based problem (under submission [Saha, Adiga, P., Vullikanti 2014]) • O(log² n)-factor (can be improved to O(log n)) • Based on the idea of removing closed walks • Semi-Definite Programming rounding-based O(1) factor

Data-aware Immunization [Zhang and Prakash, SDM 2014]
• Given: graph and infected nodes
• Find: ‘best’ nodes for immunization
• Complexity
  • NP-hard
  • Hard to approximate within an absolute error
• DAVA-tree
  • Optimal solution on the tree
• DAVA and DAVA-fast
  • Merging infected nodes
  • Build a “dominator tree”, and run DAVA-tree
• Running time: subquadratic
  • DAVA: O(k(|E|+|V|log|V|))
  • DAVA-fast: O(|E|+|V|log|V|)
• Figure: graph with infected nodes and its dominator tree
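The "merging infected nodes" step can be illustrated with a small sketch. The combination rule used here is an assumption, not taken from the slide: if infected neighbors transmit independently, the probability that at least one of them infects a healthy node v is 1 − ∏(1 − p_uv). The function name, the edge-dictionary representation, and the supernode label "I" are all hypothetical.

```python
def merge_infected(p, infected, super_node="I"):
    """Collapse all infected nodes into a single supernode.

    p maps a directed edge (u, v) to its transmission probability.
    Edges between infected nodes and edges into infected nodes are dropped;
    parallel edges from infected nodes into the same healthy node are
    combined under an assumed independence of transmissions.
    """
    infected = set(infected)
    escape = {}   # v -> probability that no infected neighbor infects v
    merged = {}
    for (u, v), p_uv in p.items():
        if v in infected:
            continue                      # target is already infected
        if u in infected:
            escape[v] = escape.get(v, 1.0) * (1.0 - p_uv)
        else:
            merged[(u, v)] = p_uv         # untouched healthy-to-healthy edge
    for v, q in escape.items():
        merged[(super_node, v)] = 1.0 - q
    return merged
```

For instance, two infected nodes that each reach x with probability 0.5 become one supernode edge of probability 0.75, giving a single-source graph on which a dominator tree could then be built.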

Extensions • Can be extended to uncertain and noisy initial data as well! [Zhang and Prakash, CIKM 2014] • Figure: Twitter Firehose API, 1% sample

Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion

Part 2: Empirical Studies • Q3: How does activity evolve over time?

Google Search Volume • e.g., given (1) the first spike, (2) the release dates of two sequel movies, (3) the access volume before the release date • Figure labels: (1) First spike, (2) Release date, (3) Two weeks before release, ? ?

Patterns • Figure: plot with axes X and Y

Patterns • Figure: plot with axes X and Y, with more data

Patterns • Figure: plot with axes X and Y; Anomaly?
