Modeling Information Diffusion in Networks with Unobserved Links
Quang Duong, Michael P. Wellman, Satinder Singh
Computer Science and Engineering, University of Michigan
Networks with unobserved links
• Links help to model how information diffuses from one node to another
• Real-world agents/nodes have connections unobserved by third parties
Problem Overview
Given: a network (with missing links) and snapshots of the network states over time.
Objective: model information diffusion on networks.
We examine two different approaches:
• Learning the underlying network, upon which a diffusion model is built (similar to the approach of some previous work)
• Building a flexible model without learning the missing links
Problem Overview (cont.)
Formalism
• A node/agent is in state s_t = 1 if infected, and -1 otherwise, at time t (infection persists)
• A diffusion instance/trace s records snapshots of the network’s states over time
• Underlying network G*
• Input network G (G* with missing edges)
• N_i is the neighborhood of i in G (including i itself)
Underlying diffusion process: cascade
• The probability of infection is proportional to the number of infected neighbors
• The model’s parameters determine (a) the diffusion rate and (b) the spontaneous infection rate
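A minimal simulation sketch of such a cascade may help make the formalism concrete. The slides do not give the exact infection formula, so the linear form `beta + alpha * k` (capped at 1) is an assumption, as are the names `alpha` (diffusion rate), `beta` (spontaneous infection rate), `infection_probability`, and `simulate_cascade`. For convenience the neighbor sets here exclude the node itself, unlike N_i above.

```python
import random

def infection_probability(num_infected_neighbors, alpha, beta):
    # Assumed form (not from the slides): spontaneous rate beta plus
    # alpha per infected neighbor, capped at 1.
    return min(1.0, beta + alpha * num_infected_neighbors)

def simulate_cascade(neighbors, alpha, beta, steps, rng):
    """Simulate one diffusion trace.

    neighbors: dict mapping node -> set of neighboring nodes (excluding itself).
    Returns a list of snapshots; each snapshot maps node -> state,
    where +1 means infected and -1 means not infected.
    """
    state = {v: -1 for v in neighbors}
    trace = [dict(state)]
    for _ in range(steps):
        new_state = dict(state)
        for v in neighbors:
            if state[v] == 1:
                continue  # infection persists
            k = sum(1 for u in neighbors[v] if state[u] == 1)
            if rng.random() < infection_probability(k, alpha, beta):
                new_state[v] = 1
        state = new_state
        trace.append(dict(state))
    return trace

# Usage: a three-node path graph, ten time steps.
graph = {0: {1}, 1: {0, 2}, 2: {1}}
example_trace = simulate_cascade(graph, alpha=0.3, beta=0.05, steps=10,
                                 rng=random.Random(0))
```

Updating all nodes from the previous snapshot (rather than in place) keeps each time step synchronous, which matches the snapshot view of a trace.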
Problem Summary
Input:
1. Network G
2. A set of diffusion traces s (training)
Approaches:
1. Structure learning approach: learn network G’, then learn parameters for a cascade model built on G’
2. Graphical model approach: learn parameters for a graphical multiagent model built on G
Objective function: log likelihood of diffusion traces L(s), which captures diffusion dynamics
Evaluation: on testing sets of diffusion traces
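As an illustration of the objective, the log likelihood L(s) of one trace under a cascade model could be computed roughly as follows. This is a sketch under assumptions: the infection probability `beta + alpha * k` is a hypothetical linear form, and `trace_log_likelihood`, `alpha`, `beta`, and `eps` are illustrative names, not the authors’ definitions.

```python
import math

def trace_log_likelihood(trace, neighbors, alpha, beta, eps=1e-12):
    """Log likelihood of a trace under an assumed cascade model.

    trace: list of snapshots, each a dict node -> state (+1 infected, -1 not).
    neighbors: dict node -> set of neighboring nodes (excluding itself).
    """
    total = 0.0
    for prev, curr in zip(trace, trace[1:]):
        for v in neighbors:
            if prev[v] == 1:
                continue  # already infected: persistence contributes log(1) = 0
            k = sum(1 for u in neighbors[v] if prev[u] == 1)
            p = min(1.0, beta + alpha * k)       # assumed infection probability
            p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
            total += math.log(p) if curr[v] == 1 else math.log(1.0 - p)
    return total

# Usage: two connected nodes, one observed transition.
pair = {0: {1}, 1: {0}}
observed = [{0: -1, 1: -1}, {0: 1, 1: -1}]
ll = trace_log_likelihood(observed, pair, alpha=0.0, beta=0.5)
```

Summing this quantity over a held-out set of traces gives a test score of the kind the evaluation step refers to.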
Approach 1: Learning Missing Links
MaxInf algorithm (maxC)
• Assumption: nodes can be infected by multiple neighbors, as in the cascade model
• Objective function: likelihood of traces L(s)
• Outline:
  • Greedily add the edge that increases the objective function the most
  • Learn model parameters after each addition
  • Repeat until the objective function starts to decrease
Related work: NetInf [Gomez-Rodriguez et al. ’10]
• Adapted version: NetInf’ (netC)
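The greedy outline can be sketched generically as below. This is not the authors’ implementation: `greedy_add_edges` and the `score` callback are hypothetical names, and `score` stands in for "refit the cascade parameters on this edge set and return the training log likelihood". The sketch stops as soon as no candidate edge improves the score.

```python
from itertools import combinations

def greedy_add_edges(nodes, edges, score):
    """Greedily add undirected edges while the objective improves.

    nodes: iterable of node ids.
    edges: iterable of (u, v) pairs present in the input graph G.
    score: callable taking a set of frozenset edges and returning a float
           (assumed to refit model parameters internally).
    """
    nodes = list(nodes)
    current = {frozenset(e) for e in edges}
    best = score(current)
    candidates = {frozenset(p) for p in combinations(nodes, 2)} - current
    while candidates:
        # Score every single-edge addition and pick the best one.
        gains = {e: score(current | {e}) for e in candidates}
        e_star = max(gains, key=gains.get)
        if gains[e_star] <= best:
            break  # no addition improves the objective
        current.add(e_star)
        candidates.remove(e_star)
        best = gains[e_star]
    return current

# Usage with a toy objective that rewards matching a known edge set.
true_edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
learned = greedy_add_edges(range(4), [(0, 1)],
                           lambda es: -len(es ^ true_edges))
```

Each round rescoring all candidate edges is quadratic in the number of nodes, which is one reason scalability matters for this kind of approach.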
Approach 2: History-Dependent Graphical Multiagent Model
• hGMM [Duong, Wellman, Singh, and Vorobeychik, AAMAS ’10]
• Directed edges from nodes in N_i^d to i: how neighbors’ past states affect i’s present state
• Undirected edges define N_i^u: correlations/interdependencies among nodes at the same time t
(*) Cascade and many other models assume conditional independence given history (N_i^u contains i itself only)
(**) For simplicity, we assume N_i = N_i^d = N_i^u
Approach 2: hGMM (cont.)
Each neighborhood is associated with a potential function π_i that represents the unnormalized likelihood of the joint states s_{N_i}.
[Equation not recovered. Its annotated terms: joint probability distribution of the system’s states at time t; potential of the neighborhood’s joint states at t; abstracted history; neighborhood-relevant abstracted history]
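One plausible reading of the annotated terms is the usual product-of-potentials form for graphical multiagent models, written out below. This is a hedged reconstruction, not the slide’s own equation: the symbols H^t (abstracted history), H^t_{N_i} (neighborhood-relevant abstracted history), and Z (normalizing constant) are assumed notation.

```latex
\Pr\left(s^{t} \mid H^{t}\right)
  \;=\; \frac{1}{Z} \prod_{i} \pi_i\left(s^{t}_{N_i} \mid H^{t}_{N_i}\right),
\qquad
Z \;=\; \sum_{\tilde{s}^{t}} \prod_{i} \pi_i\left(\tilde{s}^{t}_{N_i} \mid H^{t}_{N_i}\right)
```

Under this reading, the left-hand side is the joint probability distribution of the system’s states at time t, and each factor is the potential of one neighborhood’s joint states at t.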
Approach 2: hGMM (cont.)
• hGMMs allow reasoning about state correlations between neighbors who appear disconnected in the input graphical structure
• Example: hGMMs could use the potential function of node 2 to express correlations between nodes 1 and 3 to compensate for the missing edge (1, 3)
[Figure: example graphs over nodes 1, 2, 3, and 4]
Approach 2: hGMM (cont.)
A. Tabular hGMM (taG): the potential π_i of each neighborhood is a function of 5 features:
• number of agents infected at t-1
• number of agents becoming infected at t
• neighborhood size
• i’s state at t (present)
• i’s state at t-1 (past)
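The five features can be extracted from two consecutive snapshots along the following lines. This is a sketch under assumptions: the slides do not say whether the counts include node i itself (here they do, since N_i includes i), and `tabular_features`, `potential`, and the default value 1.0 are illustrative choices.

```python
def tabular_features(i, neighborhood, prev, curr):
    """Return the 5-feature key for node i's neighborhood.

    neighborhood: iterable of nodes in N_i (assumed to include i itself).
    prev, curr: dicts node -> state (+1 infected, -1 not) at t-1 and t.
    """
    members = list(neighborhood)
    infected_before = sum(1 for v in members if prev[v] == 1)
    newly_infected = sum(1 for v in members
                         if prev[v] == -1 and curr[v] == 1)
    return (infected_before, newly_infected, len(members), curr[i], prev[i])

def potential(table, features, default=1.0):
    # A tabular potential is a lookup keyed by the feature tuple;
    # the default for unseen keys is an assumption of this sketch.
    return table.get(features, default)

# Usage: node 2 with neighborhood {1, 2, 3}; node 2 becomes infected at t.
prev_snap = {1: 1, 2: -1, 3: -1}
curr_snap = {1: 1, 2: 1, 3: -1}
key = tabular_features(2, [1, 2, 3], prev_snap, curr_snap)
```

Keying the table on counts rather than on full joint states keeps its size manageable as neighborhoods grow.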
Approach 2: hGMM (cont.)
B. Parametric hGMM (paG): based on the cascade model and our empirical study of taG, π_i is the product of three components:
(Recall: π_i represents the unnormalized likelihood of the joint states s_{N_i})
• [1] probability of node i’s infection as in the cascade model
• [2] joint probability of c nodes in N’_i = N_i \ {i} becoming infected
• [3] joint probability of (|N’_i| - c) nodes staying uninfected
Approach 2: hGMM (cont.)
Component [2]: joint probability of c nodes in N’_i = N_i \ {i} becoming infected
• If we assume independence among the c agent states in N’_i, component [2] is simply the product of the infection probabilities of the c nodes
• If we capture the correlation among infections, component [2] is a product of the infection probabilities of |c - γ|N’_i|| “nodes,” where γ captures state correlations/interdependence
Empirical Study
• Generate graphs G* (random Erdős-Rényi, ER, and preferential attachment, PA) of 30 and 100 nodes
• Randomly delete half of the edges to create G
• Generate cascades with the parameters learned from empirical data by Stonedahl et al. (’10)
  • 2 domains: fast and normal
• Generative model (on fully observed graphs): C on G*
• Vary the amount of training data (25 and 100 cascades):
  • paG (parametric hGMM on the given graph G): learn parameters
  • maxC (cascade model with G’ learned by MaxInf): learn parameters + connections
  • netC (cascade model with G’ learned by NetInf’): learn connections (given the generative model’s parameters)
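The graph-preparation step for the ER case might look like the sketch below. The edge probability `p` is an assumption (the slides do not state it), as are the function names `erdos_renyi` and `delete_half`; preferential-attachment graphs would need a separate generator.

```python
import random

def erdos_renyi(n, p, rng):
    """Undirected ER graph on nodes 0..n-1 as a set of frozenset edges."""
    return {frozenset((u, v))
            for u in range(n) for v in range(u + 1, n)
            if rng.random() < p}

def delete_half(edges, rng):
    """Return the observed graph G: G* with half of its edges removed."""
    ordered = sorted(tuple(sorted(e)) for e in edges)  # stable order for sampling
    keep_count = len(ordered) - len(ordered) // 2
    return {frozenset(e) for e in rng.sample(ordered, keep_count)}

# Usage: a 30-node underlying graph G* and its partially observed version G.
g_star = erdos_renyi(30, 0.2, random.Random(0))
g_obs = delete_half(g_star, random.Random(1))
```

Sorting the edges before sampling makes the result reproducible for a given seed, independent of set iteration order.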
Evaluation Metrics
• Capturing diffusion dynamics: log likelihood of diffusion traces (the objective function)
• Predicting the fraction of infected nodes: skewed KL divergence between the predicted and actual distributions of fractions of infected nodes
• Structural difference between the learned and actual graphs (only applicable to the structure learning approach)
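For the second metric, a common definition of skewed KL divergence mixes a little of the first distribution into the second so the result stays finite when the second has zero-probability bins. The sketch below uses that definition; the skew value `alpha=0.99`, the argument order (actual first, predicted second), and the name `skew_divergence` are assumptions, since the slides do not specify them.

```python
import math

def skew_divergence(p, q, alpha=0.99):
    """Skewed KL divergence: KL(p || alpha * q + (1 - alpha) * p).

    p, q: sequences of probabilities over the same bins (each summing to 1).
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            mix = alpha * q_i + (1 - alpha) * p_i  # positive whenever p_i > 0
            total += p_i * math.log(p_i / mix)
    return total

# Usage: distributions over binned fractions of infected nodes.
actual = [0.5, 0.5, 0.0]
predicted = [1.0, 0.0, 0.0]
d = skew_divergence(actual, predicted)
```

Plain KL divergence would be infinite for this example because the prediction assigns zero mass to a bin that actually occurs; the skewed version remains finite.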
Detailed Prediction Results
Legend:
• paG: parametric hGMM on G
• maxC: cascade model with G’ learned by MaxInf
• C: generative cascade model on G
Model 1 vs. Model 2:
• Black: 1 outperforms 2 (p < 0.05)
• White: 2 outperforms 1 (p < 0.05)
• Grey: otherwise
Summary: With sufficient data, paG is the best model. In some fast-diffusion cases, maxC outperforms paG. C is the best model when the graph is fully observed.
Aggregate Prediction Results
KL divergence: better-performing models have lower divergence
Graph Results
• NetInf’ discovers more missing edges than MaxLInf, but also adds more spurious edges
• paG’s learned parameters help to detect whether the given network has missing edges
Conclusions
Contributions
• We introduce two solutions: learning an hGMM on the given network structure, and directly discovering the missing connections
• Our approaches can improve prediction over existing methods in various settings with a considerable number of missing edges
Future work
• Improve scalability (treating undirected and directed edges differently)
• Develop a more systematic analysis to detect whether there are missing edges
• Interleave graph learning and model parameter learning more effectively
THANK YOU! qduong@umich.edu http://eecs.umich.edu/~qduong