I2.2: Analysis of significant substructures in time-varying networks
Ambuj Singh (in collaboration with P. Bogdanov, M. Mongiovi, X. Yang)
NS-CTA INARC Mid-Year Review, March 2011
03/22/11
Dynamic networks
• Dynamic networks are commonplace
  • online interaction networks
    • Twitter, Wikipedia, LinkedIn, Facebook, ...
  • mobile networks
    • Cyber-physical scenario (EDIN, INARC)
  • virus propagation (E2.1)
• Generative models to explain the network structure
  • preferential attachment [Barabasi '99]
  • forest-fire [Leskovec '09]
• Markov chain models (discrete, continuous)
  • when, where, what changes [Avin '08, Clementi '09]
• Latent space / context models [Zheng '05]
• Network flow/traffic [Daganzo '94, Bickel '01, Stoev '09]
• Disease propagation, blog cascade, SIS [Leskovec '07]
• Stochastic actor-based models [Snijders '09]
Our focus
• Dynamic edge attributes
• Simplest case
  • edge is +1 or -1
  • +1 means flow of interest
    • congestion, flow above historical threshold
  • real values are the general case and can also be considered
• Query: find highest scoring substructures in graph over time
  • combines graph structure and time
Motivation: traffic congestion
Re-tweet rate of #music in Twitter
Outline
• Motivation
• Problem definition
• Solving for a fixed time interval
• Heuristic for multiple time intervals
• Path forward
Problem definition
• A time-evolving graph
  • G = (V, E, F_t(e))
  • V: set of nodes
  • E: set of edges
  • F_t(e): mapping of edges to {-1, 1} at time t
• Score of an edge e in interval [t1, t2]: $score(e) = \sum_{t=t_1}^{t_2} F_t(e)$
• Score of a subgraph in the interval: $\sum_{e} score(e)$, over all edges e in the subgraph
[Figure: example graph with edge values of 1 or -1 at time steps t1, t2, t3, t4]
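The two scores defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the dictionary `F`, the edge names and the 0-based inclusive time indices are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]

def edge_score(values: List[int], t1: int, t2: int) -> int:
    """Sum of F_t(e) over the inclusive interval [t1, t2] (0-based time steps)."""
    return sum(values[t1:t2 + 1])

def subgraph_score(F: Dict[Edge, List[int]], edges: Iterable[Edge],
                   t1: int, t2: int) -> int:
    """Sum of edge scores over all edges of the subgraph."""
    return sum(edge_score(F[e], t1, t2) for e in edges)

# Hypothetical toy data: three edges observed over four time steps.
F = {
    ("a", "b"): [1, -1, 1, 1],
    ("b", "c"): [1, 1, 1, -1],
    ("c", "d"): [-1, -1, -1, -1],
}

path_ab_bc = [("a", "b"), ("b", "c")]
print(subgraph_score(F, path_ab_bc, 0, 2))
```

Keeping the per-edge time series as plain lists makes the interval sum a slice; prefix sums would give constant-time interval scores if many intervals are queried.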
Prize-collecting Steiner Tree (PCST)
• Given a graph G = (V, E) with positive node weights (prizes) p(v) and edge weights c(e) that act as costs, find a subtree T' = (V', E') that optimizes one of:
  • Goemans-Williamson Minimization (GW-PCST): minimize $GW(T') = \sum_{e \in E'} c(e) + \sum_{v \notin V'} p(v)$
  • Net Worth Maximization (NW-PCST): maximize $NW(T') = \sum_{v \in V'} p(v) - \sum_{e \in E'} c(e)$
• Both are NP-hard (equivalent objective functions) [Johnson '00]
• GW-PCST has an approximation factor of 2 - 1/(n-1)
• The rooted version of NW-PCST is NP-hard to approximate within any constant factor [Feig '01]
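The remark that the two objectives are equivalent can be checked by adding them for the same tree T'. The short derivation below uses only the definitions of GW and NW given on this slide:

```latex
\begin{aligned}
GW(T') + NW(T')
  &= \Big(\sum_{e \in E'} c(e) + \sum_{v \notin V'} p(v)\Big)
   + \Big(\sum_{v \in V'} p(v) - \sum_{e \in E'} c(e)\Big) \\
  &= \sum_{v \in V} p(v)
\end{aligned}
```

The right-hand side does not depend on T', so a tree that minimizes GW also maximizes NW. The optimal solutions coincide, but an approximation ratio for one objective does not carry over to the other.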
Why doesn't the same guarantee hold for NW?
• In this specific example (optimal solution: the whole graph):
  • GW-PCST
    • APX = 3*(k-1)
    • OPT = 2*k
    • ratio OPT/APX ≈ 2/3
  • NW-PCST
    • OPT = k
    • APX = 3
    • ratio OPT/APX = k/3
[Figure: example graph labeled with the values 0, 2, 3 and k, with the OPT and APX solutions marked]
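The arithmetic of this example can be replayed for growing k. The sketch below only evaluates the OPT and APX expressions quoted on the slide; the function names are made up for illustration.

```python
def gw_ratio(k: int) -> float:
    """OPT/APX for GW-PCST in the slide's example: 2k / (3(k-1))."""
    return (2 * k) / (3 * (k - 1))

def nw_ratio(k: int) -> float:
    """OPT/APX for NW-PCST in the slide's example: k / 3."""
    return k / 3

for k in (10, 100, 1000):
    # The GW ratio settles near 2/3 while the NW ratio grows with k.
    print(k, round(gw_ratio(k), 4), round(nw_ratio(k), 2))
```

The same approximate tree is within a constant factor under the GW objective, yet arbitrarily far from optimal under NW as k grows.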
Merge-and-refine approximation
• Merge nodes into clusters in a bottom-up fashion
  • Build a shortest-path metric graph using edge costs
  • Merge triangle and star structures, considering both node values and interconnect cost
  • Multiple refinement iterations
• Approximation quality
  • OPT <= APX + c*N(OPT), where N is the cost of interconnection
  • Good approximation for instances in which there are cheaply connected clusters of high-prize nodes
• Challenges
  • Relatively high computational cost due to all-pairs shortest path computation
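The shortest-path metric graph is the expensive step named under Challenges. The slides do not say which all-pairs algorithm is used; the sketch below assumes Floyd-Warshall on an undirected graph with nodes numbered 0..n-1, purely to show where the cubic cost comes from.

```python
import math
from typing import Dict, List, Tuple

def metric_closure(n: int, edge_costs: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """All-pairs shortest-path distances via Floyd-Warshall (O(n^3) time)."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (u, v), c in edge_costs.items():
        # Undirected graph: keep the cheapest cost in both directions.
        dist[u][v] = min(dist[u][v], c)
        dist[v][u] = min(dist[v][u], c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical 3-node example: the direct edge 0-2 is costlier than going through 1.
d = metric_closure(3, {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 5.0})
print(d[0][2])
```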
An example
• Aggregate edge values within the interval
• Transform the edge-weighted graph into NW-PCST
• Apply the Merge-and-refine approximation
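The first two steps can be sketched as follows. The slides do not spell out the transformation, so the construction here is an assumption: an edge with a positive aggregate score is replaced by a prize node joined to its endpoints by zero-cost edges, an edge with a non-positive score becomes an edge whose cost is the absolute score, and original nodes get prize 0. The names `aggregate` and `to_nw_pcst` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]

def aggregate(F: Dict[Edge, List[int]], t1: int, t2: int) -> Dict[Edge, int]:
    """Aggregate each edge's values over the inclusive interval [t1, t2]."""
    return {e: sum(vals[t1:t2 + 1]) for e, vals in F.items()}

def to_nw_pcst(scores: Dict[Edge, int]):
    """Assumed transformation of edge scores into node prizes and edge costs."""
    prizes: Dict[Hashable, int] = {}
    costs: Dict[Edge, int] = {}
    for (u, v), s in scores.items():
        prizes.setdefault(u, 0)
        prizes.setdefault(v, 0)
        if s > 0:
            w = ("edge", u, v)      # artificial node carrying the edge's prize
            prizes[w] = s
            costs[(u, w)] = 0
            costs[(w, v)] = 0
        else:
            costs[(u, v)] = -s      # non-positive score becomes a cost
    return prizes, costs

# Hypothetical toy data.
F = {
    ("a", "b"): [1, -1, 1, 1],
    ("b", "c"): [1, 1, 1, -1],
    ("c", "d"): [-1, -1, -1, -1],
}
prizes, costs = to_nw_pcst(aggregate(F, 0, 3))
```

Under this construction the net worth of a tree in the transformed graph equals the score of the corresponding edge set in the original graph, which is why a NW-PCST solver can be applied to it.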
Running time of merge-and-refine
• All-pairs shortest paths (APSP) comprises 90% of the approximation running time
• Takes more than a second for N=360 for one interval
Baseline solution across time
• Find the best subgraph in time by exhaustive enumeration
  • Consider all O(t^2) intervals
  • Apply the solution for a fixed interval in each
  • Take the best subgraph obtained over all intervals
• Polynomial cost, but impractical for real-world problems
  • The highway system of Southern California has ~4k edges with live-traffic measurements
  • The Autonomous Systems (AS)-level Internet backbone has hundreds of thousands of links
  • The baseline solution would not be practical for networks of this scale
• Need for scalable solutions of acceptable quality
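The exhaustive baseline amounts to a double loop over interval endpoints. In the sketch below the fixed-interval solver is passed in as a callable; `best_single_edge` is a deliberately trivial stand-in (it is not Merge-and-refine) so that the example runs on its own, and all names and data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]

def all_intervals(T: int) -> List[Tuple[int, int]]:
    """All T*(T+1)/2 inclusive intervals over time steps 0..T-1."""
    return [(t1, t2) for t1 in range(T) for t2 in range(t1, T)]

def baseline(F: Dict[Edge, List[int]], T: int, solve_fixed_interval: Callable):
    """Run the fixed-interval solver on every interval and keep the best result."""
    best = None
    for t1, t2 in all_intervals(T):
        score, subgraph = solve_fixed_interval(F, t1, t2)
        if best is None or score > best[0]:
            best = (score, subgraph, (t1, t2))
    return best

def best_single_edge(F, t1, t2):
    """Stand-in solver: the single highest-scoring edge in the interval."""
    e = max(F, key=lambda edge: sum(F[edge][t1:t2 + 1]))
    return sum(F[e][t1:t2 + 1]), [e]

F = {("a", "b"): [1, -1, 1, 1], ("b", "c"): [1, 1, 1, -1]}
print(baseline(F, 4, best_single_edge))
```

The number of solver calls grows quadratically in the number of time steps, which is what makes the baseline impractical when each call already takes about a second.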
Best-first approach using bounds
• Idea: reduce the number of calls to Merge-and-refine
  • Estimate solutions for different intervals
  • Evaluate the most "promising" intervals first
  • Prune intervals that do not contain the best solution
• Bound the solution in an interval
  • Computationally simple to compute
  • Effective in terms of pruning power
• Best-first procedure
  • Order intervals by their upper bound
  • Prune infeasible intervals using the lower bound
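One way to read the best-first procedure is as a priority queue keyed on the upper bound. The sketch below is an interpretation under assumptions, not the authors' implementation: `upper_bound`, `lower_bound` and `solve` are hypothetical callables, the bounds are assumed valid for the value `solve` returns, and the toy scores are invented.

```python
import heapq
import math

def best_first(intervals, upper_bound, lower_bound, solve):
    """Evaluate intervals in decreasing upper-bound order, pruning with bounds."""
    intervals = list(intervals)
    # Some interval achieves at least this score, so any interval whose
    # upper bound is below it cannot contain the best solution.
    best_lb = max(lower_bound(iv) for iv in intervals)
    heap = [(-upper_bound(iv), iv) for iv in intervals]   # max-heap via negation
    heapq.heapify(heap)
    best_score, best_iv, calls = -math.inf, None, 0
    while heap:
        neg_ub, iv = heapq.heappop(heap)
        ub = -neg_ub
        if ub < best_lb or ub <= best_score:
            break              # every remaining interval has an even smaller bound
        score = solve(iv)      # the expensive call (Merge-and-refine in the slides)
        calls += 1
        if score > best_score:
            best_score, best_iv = score, iv
    return best_score, best_iv, calls

# Hypothetical scores per interval, with bounds that are off by one on each side.
true_score = {(0, 0): 1, (0, 1): 5, (1, 1): 2, (0, 2): 3}
result = best_first(true_score,
                    lambda iv: true_score[iv] + 1,
                    lambda iv: true_score[iv] - 1,
                    lambda iv: true_score[iv])
print(result)
```

In this toy run only one of the four intervals is passed to the solver; how much is saved in practice depends on how tight the bounds are.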
Upper bound (UB)
• Offline:
  • Consider a hyper-graph in which original edges become nodes and original nodes become hyper-edges
  • Split the original edges into k partitions via hyper-graph partitioning
  • Maintain edges at partition "boundaries"
• Online UB estimation for a fixed interval:
  • UB of a partition is the aggregate of its positive edges
  • Edges between partitions:
    • 0 cost if there is at least one positive boundary edge
    • cheapest boundary edge otherwise
  • Solve the NW-PCST on the obtained coarse-level graph
Upper bound example
Upper bound effectiveness
• The upper bound is more effective if:
  • Partitions are well connected (small diameter)
  • Edges within partitions are correlated
  • Boundary edges are minimal and have expected value closer to -1 than within-partition edges
• The upper bound is a coarse aggregation of the original graph
  • Coarseness is controlled by the number of partitions
  • Trade-off between efficiency and effectiveness
Upper bound quality
• Random Markovian graph (N=150, M=180, T=300)
• Number of partitions: 2-64
• "Random 64" is a random partitioning of edges into 64 partitions
Lower bound
• Local iterative search in the solution space within an interval
  • Simulated Annealing (SA) procedure that grows/shrinks a subgraph within an interval
  • Possible moves: add/remove an edge from an existing solution
  • Allow sub-optimal moves according to an annealing schedule
• Better quality than a simple greedy algorithm
  • Due to sub-optimal moves, high-score clusters can be joined even if they are more than 2 hops away
• Better running time than Merge-and-refine
  • No computation of all-pairs shortest paths
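A grow/shrink annealing search of this kind could look like the sketch below. The details are assumptions made for illustration: the schedule is geometric, the search starts from the best single edge, the subgraph is kept connected, and the function name, parameters and toy scores are hypothetical. Any connected subgraph it returns is feasible, so its score is a valid lower bound.

```python
import math
import random

def is_connected(edges):
    """True if the edge set forms a single connected component."""
    edges = list(edges)
    if not edges:
        return True
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(adj)

def anneal_lower_bound(scores, steps=2000, temp0=2.0, cooling=0.995, seed=0):
    """Simulated annealing over connected edge sets; returns (best score, best edges)."""
    rng = random.Random(seed)
    start = max(scores, key=scores.get)
    current, cur_score = {start}, scores[start]
    best, best_score = set(current), cur_score
    temp = temp0
    for _ in range(steps):
        nodes = {x for e in current for x in e}
        # Grow: edges touching the current subgraph. Shrink: edges whose removal keeps it connected.
        grow = [e for e in scores if e not in current and (e[0] in nodes or e[1] in nodes)]
        shrink = [e for e in sorted(current)
                  if len(current) > 1 and is_connected(current - {e})]
        moves = [(e, 1) for e in grow] + [(e, -1) for e in shrink]
        if not moves:
            break
        e, sign = rng.choice(moves)
        delta = sign * scores[e]
        # Accept improving moves always, worsening moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            if sign > 0:
                current = current | {e}
            else:
                current = current - {e}
            cur_score += delta
            if cur_score > best_score:
                best, best_score = set(current), cur_score
        temp *= cooling
    return best_score, best

# Hypothetical aggregated edge scores on a path a-b-c-d-e.
toy = {("a", "b"): 3, ("b", "c"): -1, ("c", "d"): 4, ("d", "e"): -5}
lb_score, lb_edges = anneal_lower_bound(toy)
```

Accepting the occasional worsening move is what lets the search cross the negative edge b-c and join the two positive edges, something a purely greedy search starting from c-d would refuse to do.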
Summary
• Dynamic graphs with changing edge attributes
• Simplest query: find the highest scoring substructure
• Heuristics under development
  • Approximation guarantee
  • Empirical validation on
    • traffic networks
    • Twitter messages
Path forward
• Maximal scoring subgraph is a building block for richer queries and analyses
  • What is the structure of a congestion? Global (short and large), longitudinal (prolonged and localized), or a combination of both?
  • What characterizes the evolution of a network?
  • How do different network regions compare?
  • Is evolution similar across networks of different genres?
• Index structures
  • Use statistical models for indexing real-world networks
  • Exploit locality within the network and locality in time
  • Represent the network at different levels of coarseness
• Queries constrained by
  • Time
  • Neighborhood
• Similarity queries
Connections
• Queries/analysis of information flow (E2.1)
• Queries on mobile networks (E2.2, E2.3)
• Formal modeling of time (E1.1)
• Dynamic network models (E2.1)
Army relevance
• Query/analysis of mobility networks
  • Cyber-physical scenario
• Query/analysis of evolving networks
  • Patterns of behavior in composite networks
  • Find terrorist groups using temporal interactions
Publications
• P. Bogdanov, B. Baumer, P. Basu, A. Singh, and A. Bar-Noy, "Discovering Influential Groups of Agents Using Composite Network Analysis," submitted to NetSci 2011.
• P. Bogdanov, Nicholas D. Larusso, and Ambuj K. Singh, "Towards Community Discovery in Signed Collaborative Interaction Networks," published in SIASP at 2010 IEEE International Conference on Data Mining, 2010.
• K. Macropol and A. Singh, "Content-based Modeling and Prediction of Information Dissemination," submitted to ASONAM 2011.
• M. Mongiovi, A. Singh, X. Yan, B. Zong, K. Psounis, "An Indexing System for Mobility-aware Information Management," submitted to VLDB.
• Ziyu Guan, Jian Wu, Zheng Yun, Ambuj K. Singh, and Xifeng Yan, "Assessing and Ranking Structural Correlations in Graphs," to appear at SIGMOD 2011.
• Nicholas D. Larusso and Ambuj K. Singh, "Synopses for Probabilistic Data over Large Domains," in EDBT 2011.
THANK YOU!
Markovian dynamic models
• Markovian: the graph state is a Markov chain
  • Fixed set of nodes
  • Edges at time t depend on edges at time t-1
• Cover Time of Dynamic Graphs [Avin et al. '08]
  • Introduction of Markovian Dynamic Graphs
  • Exponential cover time
  • Lazy random walks
• Information spread in Markovian graphs [Clementi '09]
  • Edge-Markovian
  • Geometric Markovian: node mobility
• Evolving range-dependent graphs [Grindrod '09]
  • Edge dynamics as a birth/death process
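An edge-Markovian graph with birth and death probabilities can be simulated in a few lines. The sketch below assumes the usual two-parameter form, in which an absent edge appears with probability `p_birth` and a present edge disappears with probability `q_death`, independently per edge and per step; the function name, the all-absent initial state and the parameter values are assumptions.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]

def edge_markovian(edges: Iterable[Edge], T: int, p_birth: float,
                   q_death: float, seed: int = 0) -> List[Dict[Edge, bool]]:
    """Simulate T steps; each edge's next state depends only on its current state."""
    rng = random.Random(seed)
    state = {e: False for e in edges}        # assumed start: no edge present
    history = []
    for _ in range(T):
        nxt = {}
        for e, alive in state.items():
            if alive:
                nxt[e] = rng.random() >= q_death   # survives unless it dies
            else:
                nxt[e] = rng.random() < p_birth    # is born with probability p_birth
        state = nxt
        history.append(dict(state))
    return history

# Hypothetical run on a fixed set of three candidate edges.
history = edge_markovian([(0, 1), (1, 2), (2, 3)], T=5, p_birth=0.3, q_death=0.2)
print([sum(snapshot.values()) for snapshot in history])
```

Mapping a present edge to +1 and an absent edge to -1 would turn such a simulation into the ±1 edge attributes used in the problem definition.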
Dynamic models of traffic
• The cell transmission model (CTM) [Daganzo '94]
  • Dynamic model of highway traffic
  • Inspired by hydrodynamic theory
• Traffic Flow on a Freeway Network [Bickel '01]
  • Time and context Markovian model of the traffic flow
  • The state of a segment at time t depends on the state of its neighbors and itself at time t-1
  • Model of a single highway. How about junctions?
• Computer Network Traffic [Stoev '09]
  • Statistical model of traffic flow across all links
  • Applied to traffic prediction
Background literature
[Avin '08] Chen Avin and Zvi Lotker. "How to Explore a Fast-Changing World." 2008.
[Bickel '01] Peter Bickel, Chao Chen, Jaimyoung Kwon, and John Rice. "Traffic Flow on a Freeway Network." Electrical Engineering, 2001.
[Clementi '09] Andrea Clementi, Angelo Monti, Francesco Pasquale, and Riccardo Silvestri. "Information Spreading in Stationary Markovian Evolving Graphs." Informatica, 2009.
[Feig '01] J. Feigenbaum, C. Papadimitriou, and S. Shenker. "Sharing the Cost of Multicast Transmissions." JCSS, 63, 21-41, 2001.
[Grindrod '09] Peter Grindrod and Desmond J. Higham. "Evolving Graphs: Dynamical Models, Inverse Problems and Propagation." 2009.
[Johnson '00] D. Johnson, M. Minkoff, and S. Phillips. "The Prize Collecting Steiner Tree Problem: Theory and Practice." ACM SODA, 2000.
[Leskovec '07] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, and Matthew Hurst. "Cascading Behavior in Large Blog Graphs: Patterns and a Model." SDM, 2007.
Background literature (continued)
[Ribeiro '11] B. Ribeiro, D. Figueiredo, E. de Souza e Silva, and D. Towsley. "Characterizing Dynamic Graphs with Continuous-time Random Walks." SIGMETRICS, 2011.
[Snijders '09] Tom A.B. Snijders, Gerhard G. van de Bunt, and Christian E.G. Steglich. "Introduction to Stochastic Actor-Based Models for Network Dynamics." Social Networks, 2009.
[Stoev '09] Stilian A. Stoev, George Michailidis, and Joel Vaughan. "Global Modeling and Prediction of Computer Network." Arxiv, 2009.
[Zheng '05] A. X. Zheng and A. Goldenberg. "A Generative Model for Dynamic Contextual Friendship Networks." Learning, 2005.