Modeling Information Diffusion in Networks with Unobserved Links


Presentation Transcript


  1. Modeling Information Diffusion in Networks with Unobserved Links. Quang Duong, Michael P. Wellman, Satinder Singh. Computer Science and Engineering, University of Michigan

  2. Networks with unobserved links • Links help to model how information diffuses from one node to another • Real-world agents/nodes have connections that third parties cannot observe

  3. Problem Overview. Given: a network (with missing links) and snapshots of the network's states over time. Objective: model information diffusion on networks. We examine two different approaches: • Learning the underlying network, upon which a diffusion model is built (similar to the approach of some previous work) • Building a flexible model without learning the missing links

  4. Problem Overview (cont.) Formalism: • A node/agent is in state s_t = 1 if infected, and -1 otherwise, at time t (infection persists) • A diffusion instance/trace s records snapshots of the network's states over time • Underlying network G* • Input network G (G* with missing edges) • N_i is the neighborhood of i in G (including i itself). Underlying diffusion process: cascade • The probability of infection is proportional to the number of infected neighbors • The model's parameters determine: (a) the diffusion rate and (b) the spontaneous infection rate.
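The cascade process on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: the slide only says that infection probability grows with the number of infected neighbors and that two parameters set the diffusion rate and the spontaneous infection rate. The linear form `beta + alpha * k`, the names `alpha` and `beta`, and the functions `infection_probability` and `simulate_cascade` are all assumptions made for this example.

```python
import random

def infection_probability(num_infected_neighbors, alpha, beta):
    """Assumed linear form: spontaneous rate beta plus alpha per infected
    neighbor, capped at 1. The slides do not give the exact formula."""
    return min(1.0, beta + alpha * num_infected_neighbors)

def simulate_cascade(neighbors, alpha, beta, steps, seed=0):
    """Return a trace: a list of {node: state} snapshots, state in {1, -1}.

    neighbors maps each node i to its neighborhood N_i (which may include i).
    """
    rng = random.Random(seed)
    state = {i: -1 for i in neighbors}      # everyone starts uninfected
    trace = [dict(state)]
    for _ in range(steps):
        nxt = dict(state)
        for i, nbrs in neighbors.items():
            if state[i] == 1:
                continue                    # infection persists
            k = sum(1 for j in nbrs if j != i and state[j] == 1)
            if rng.random() < infection_probability(k, alpha, beta):
                nxt[i] = 1
        state = nxt
        trace.append(dict(state))
    return trace
```

A call such as `simulate_cascade({0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}, alpha=0.2, beta=0.05, steps=10)` yields one diffusion trace s in the sense defined above.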

  5. Problem Summary. Input: (1) network G; (2) a set of diffusion traces s (training). Objective function: log likelihood of diffusion traces L(s), which captures diffusion dynamics. Approach 1, structure learning: learn network G', then learn parameters for a cascade model built on G'. Approach 2, graphical model: learn parameters for a graphical multiagent model built on G. Evaluation: on testing sets of diffusion traces.

  6. Approach 1: Learning Missing Links. MaxInf algorithm (maxC) • Assumption: nodes can be infected by multiple neighbors, as in the cascade model • Objective function: likelihood of traces L(s) • Outline: • Greedily add the edge that increases the objective function the most • Learn model parameters after each addition • Repeat until the objective function starts to decrease. Related work: NetInf [Gomez-Rodriguez et al. '10] • Adapted version NetInf' (netC)
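The greedy outline above can be sketched as follows. This is not the MaxInf implementation; it only illustrates the loop of adding the best edge, refitting parameters, and stopping when the likelihood no longer improves. The linear cascade likelihood, the grid search in `fit_alpha`, the fixed spontaneous rate `beta`, and all function names are assumptions made for this example.

```python
import math
from itertools import combinations

def log_likelihood(edges, traces, alpha, beta):
    """Log likelihood of traces under an assumed linear cascade:
    P(infect) = beta + alpha * (# infected neighbors), clipped to (0, 1)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = 0.0
    for trace in traces:
        for prev, cur in zip(trace, trace[1:]):
            for i in prev:
                if prev[i] == 1:
                    continue                # already infected: no event
                k = sum(1 for j in nbrs.get(i, ()) if prev[j] == 1)
                p = min(1.0 - 1e-9, max(1e-9, beta + alpha * k))
                total += math.log(p if cur[i] == 1 else 1.0 - p)
    return total

def fit_alpha(edges, traces, beta, grid=(0.05, 0.1, 0.2, 0.4, 0.8)):
    """Stand-in for parameter learning: pick the best alpha from a grid."""
    return max(grid, key=lambda a: log_likelihood(edges, traces, a, beta))

def greedy_add_edges(nodes, edges, traces, beta=0.01):
    """Add one edge at a time while the training log likelihood improves."""
    edges = {frozenset(e) for e in edges}
    alpha = fit_alpha(edges, traces, beta)
    best = log_likelihood(edges, traces, alpha, beta)
    while True:
        scored = []
        for pair in combinations(nodes, 2):
            cand = frozenset(pair)
            if cand in edges:
                continue
            trial = edges | {cand}
            a = fit_alpha(trial, traces, beta)   # refit after each addition
            scored.append((log_likelihood(trial, traces, a, beta), cand, a))
        if not scored:
            break
        score, cand, a = max(scored, key=lambda t: t[0])
        if score <= best:
            break                                # no edge improves L(s)
        edges.add(cand)
        best, alpha = score, a
    return edges, alpha, best
```

Scoring every candidate edge with a refit is quadratic in the number of nodes per round, which is acceptable for a sketch on small graphs but would need pruning at larger scale.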

  7. Approach 2: History-Dependent Graphical Multiagent Model • hGMM [Duong, Wellman, Singh, and Vorobeychik, AAMAS'10] • Directed edges from nodes in N_i^d to i: how neighbors' past states affect i's present state. • Undirected edges define N_i^u: correlations/interdependencies among nodes at the same time t. (*) Cascade and many other models assume conditional independence given history (N_i^u contains i itself only). (**) For simplicity, we assume N_i = N_i^d = N_i^u.

  8. Approach 2: hGMM (cont.) Each neighborhood is associated with a potential function π_i that represents the unnormalized likelihood of the joint states s_{N_i}. [Equation on slide, not preserved in the transcript: the joint probability distribution of the system's states at time t, given the abstracted history, expressed through each neighborhood's potential over its joint states at t given the neighborhood-relevant abstracted history.]
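The slide's equation did not survive extraction. Based only on the labels that remain (neighborhood potentials, joint distribution at time t, abstracted history), a product-of-potentials form along the following lines seems likely. The symbols h^{t-1} for the abstracted history, h^{t-1}_{N_i} for its neighborhood-relevant part, and Z for the normalizer are notation assumed here, not taken from the slide.

```latex
\Pr\left(s^{t} \mid h^{t-1}\right)
  = \frac{\prod_{i} \pi_i\left(s^{t}_{N_i} \mid h^{t-1}_{N_i}\right)}{Z},
\qquad
Z = \sum_{\tilde{s}^{t}} \prod_{i} \pi_i\left(\tilde{s}^{t}_{N_i} \mid h^{t-1}_{N_i}\right)
```

Under this reading, Z sums over all joint state configurations at time t, which is what makes the potentials "unnormalized" likelihoods.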

  9. Approach 2: hGMM (cont.) • hGMMs allow reasoning about state correlations between neighbors who appear disconnected in the input graphical structure • Example: hGMMs could use the potential function of node 2 to express correlations between nodes 1 and 3 to compensate for the missing edge (1, 3). [Figure: three example graphs over nodes 1, 2, 3, and 4.]

  10. Approach 2: hGMM (cont.) A. Tabular hGMM (taG): the potential π_i of each neighborhood is a function of 5 features: • number of agents infected at t-1, • number of agents becoming infected at t, • neighborhood size, • i's state at t (present), • i's state at t-1 (past)
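To make the five features concrete, here is a small sketch that extracts them for one neighborhood and uses the resulting tuple as a lookup key. It assumes the counts are taken within N_i, which the slide does not state explicitly; `tabular_features`, `potential`, and the default value for unseen keys are hypothetical names and choices for this example.

```python
def tabular_features(i, neighborhood, prev_state, cur_state):
    """Return the 5-feature key for node i's neighborhood N_i.

    prev_state and cur_state map node -> 1 (infected) or -1 (uninfected)
    at times t-1 and t. Counts are assumed to range over N_i.
    """
    infected_before = sum(1 for j in neighborhood if prev_state[j] == 1)
    newly_infected = sum(
        1 for j in neighborhood if prev_state[j] == -1 and cur_state[j] == 1
    )
    return (
        infected_before,       # number of agents infected at t-1
        newly_infected,        # number of agents becoming infected at t
        len(neighborhood),     # neighborhood size
        cur_state[i],          # i's state at t (present)
        prev_state[i],         # i's state at t-1 (past)
    )

def potential(table, i, neighborhood, prev_state, cur_state, default=1.0):
    """Look up the neighborhood potential in a table keyed by features."""
    key = tabular_features(i, neighborhood, prev_state, cur_state)
    return table.get(key, default)
```

Keying the table on counts rather than on the full joint state keeps the number of entries polynomial in the neighborhood size.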

  11. Approach 2: hGMM (cont.) B. Parametric hGMM (paG): based on the cascade model and our empirical study of taG, π_i is the product of three components. (Recall that π_i represents the unnormalized likelihood of the joint states s_{N_i}.) • [1] probability of node i's infection as in the cascade model • [2] joint probability of c nodes in N'_i = N_i \ {i} becoming infected • [3] joint probability of (|N'_i| - c) nodes staying uninfected

  12. Approach 2: hGMM (cont.) Component [2]: joint probability of c nodes in N'_i = N_i \ {i} becoming infected • If we assume independence of the c agent states in N'_i, component [2] is simply the product of the infection probabilities of the c nodes. • If we capture the correlation among infections, component [2] is a product of the infection probabilities of |c - γ|N'_i|| "nodes," where γ captures state correlations/interdependence

  13. Empirical Study • Generate graphs G* (random ER and preferential attachment PA) of 30 and 100 nodes • Randomly delete half of the edges of G* to create G • Generate cascades with parameters learned from empirical data by Stonedahl et al. ('10) • 2 domains: fast and normal • Generative model (on fully observed graphs): C on G* • Vary the amount of training data (25 and 100 cascades). Models compared: • paG (parametric hGMM on the given graph G): learn parameters • maxC (cascade model with G' learned by MaxInf): learn parameters + connections • netC (cascade model with G' learned by NetInf'): learn connections (given the generative model's parameters)

  14. Evaluation Metrics • Capturing diffusion dynamics: log likelihood of diffusion traces (the objective function) • Predicting the fraction of infected nodes: KL (skewed) divergence between the predicted and actual distributions of fractions of infected nodes • Structural difference between the learned and actual graphs (only applicable to the structure learning approach)
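The slide names a skewed KL divergence without giving its form. One common definition mixes a small amount of the reference distribution into the other before taking the KL divergence, so the result stays finite when the prediction assigns zero probability to an observed outcome. The sketch below uses that definition; the mixing weight `alpha=0.99` and the function names are assumptions, and the talk may have used a different variant.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_divergence(actual, predicted, alpha=0.99):
    """KL(actual || alpha * predicted + (1 - alpha) * actual).

    Mixing in a little of `actual` keeps every term finite even where
    `predicted` is zero.
    """
    mixed = [alpha * q + (1 - alpha) * p for p, q in zip(actual, predicted)]
    return kl_divergence(actual, mixed)
```

Here `actual` and `predicted` would be histograms over binned fractions of infected nodes, each normalized to sum to 1; lower values indicate a closer match.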

  15. Detailed Prediction Results • Legend: • paG: parametric hGMM on G • maxC: cascade model with G' learned by MaxInf • C: generative cascade model on G* • Model 1 vs. Model 2: • Black: 1 outperforms 2 (p < 0.05) • White: 2 outperforms 1 (p < 0.05) • Grey: otherwise • Summary: With sufficient data, paG is the best model. In some fast diffusion cases, maxC outperforms paG. C is the best model when the graph is fully observed.

  16. Aggregate Prediction Results. KL divergence: better-performing models have lower divergence.

  17. Graph Results • NetInf' discovers more missing edges than MaxInf, but also adds more spurious edges. • paG's learned parameters help to detect whether the given network has missing edges

  18. Conclusions. Contributions: • We introduce two solutions: learning an hGMM on the given network structure, and directly discovering the missing connections. • Our approaches can improve prediction over existing methods in various settings with a considerable number of missing edges. Future work: • Improve scalability (treating undirected and directed edges differently) • Develop a more systematic analysis to detect whether there are missing edges • Interleave graph learning and model parameter learning more effectively

  19. THANK YOU! qduong@umich.edu http://eecs.umich.edu/~qduong
