An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs. Sitaram Asur, Srinivasan Parthasarathy and Duygu Ucar. Department of Computer Science, The Ohio State University.
Motivation • Interaction Networks • Represent scientific data from various domains • Nodes represent entities • Edges represent interactions among entities • Examples: • Biological Networks - Protein-Protein Interaction (PPI) networks, gene expression networks • Collaboration networks • Social networks, online communities, blog networks [Figure: Protein-protein interactions in yeast (Jeong et al, 2001)] [Figure: Physicist collaboration network (Newman and Girvan, 2004)]
Motivation • Mining interaction networks is important • Gain insight into structure, properties and behavior of these networks [Newman, 2001] • Modular nature of interaction networks is important • Co-expression networks: dense components -> functional modules • Social networks: clusters -> community structure
Motivation • A large number of earlier approaches focused on mining static interaction networks • Many important real-world networks are dynamic [Figure: Temporal protein interaction network of the yeast mitotic cell cycle (Ulrik de Lichtenberg et al., Science 307, 724 (2005))]
Motivation • Dynamic Interaction Networks • Nodes and interactions change over time • Structure changes in the network • Need for a structured method to characterize and model evolution • Understand nature of change (evolution) in networks • Consider evolution of individuals and communities • Develop models for reasoning and inference of future events
Workflow [Figure: workflow diagram with the stages Evolving Graph, Temporal Snapshots (S_i, S_{i+1}), Clustering (C_i, C_{i+1}), Event Detection, Behavioral Patterns, and Analysis and Inference, iterating over i]
Temporal Snapshots • Split the graph data into non-overlapping temporal snapshots • Each snapshot corresponds to a graph • Consists of all nodes and interactions active in that time period • Nodes are active if they have an interaction in a particular time period [Figure: example snapshots T1 and T2 over nodes A to G]
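The snapshot construction described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes interactions arrive as `(u, v, t)` tuples with numeric timestamps and that snapshots are fixed-width windows; the function name `split_into_snapshots` and its parameters are hypothetical.

```python
from collections import defaultdict


def split_into_snapshots(interactions, start, width):
    """Group (u, v, t) interactions into non-overlapping windows of size `width`.

    Returns {snapshot_index: {"nodes": set, "edges": set}}.
    Assumption: fixed-width windows starting at `start` (not specified in the slides).
    """
    snapshots = defaultdict(lambda: {"nodes": set(), "edges": set()})
    for u, v, t in interactions:
        if t < start:
            continue  # interaction falls before the observed range
        idx = int((t - start) // width)
        snap = snapshots[idx]
        snap["edges"].add(frozenset((u, v)))  # undirected interaction
        snap["nodes"].update((u, v))          # a node is active only if it interacts
    return dict(snapshots)


# Toy data: yearly snapshots of a small collaboration graph.
interactions = [("A", "B", 1997), ("B", "C", 1997), ("A", "B", 1998), ("D", "E", 1998)]
snaps = split_into_snapshots(interactions, start=1997, width=1)
```

Node C is absent from the second snapshot because it has no interaction in that window, which matches the slide's definition of an active node.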
Clustering • Represent the snapshot graphs using clusters • Clusters of a graph can provide structure information • Examine the evolution of clusters over time • Can provide insight on corresponding changes to the graph • MCL clustering algorithm employed in this work • Ensemble clustering approaches can be employed to obtain robust clusters (Asur et al, ISMB 2007) [Figure: clustered example snapshots T1 and T2 over nodes A to G]
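For readers unfamiliar with MCL (Markov Clustering), the sketch below shows its two alternating steps, expansion (matrix squaring) and inflation (element-wise power followed by column normalization), on a dense adjacency matrix in pure Python. The slides do not state the parameters used, so the inflation value, iteration limit and tolerances here are assumptions, and a real run on a graph of this size would use an optimized MCL implementation.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    """Plain dense matrix product for square matrices."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster one snapshot graph; returns a set of frozensets of node indices."""
    n = len(adjacency)
    # Self-loops are added before normalizing, a common MCL convention.
    m = [[float(adjacency[r][c]) + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                                   # expansion
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])  # inflation
        delta = max(abs(inflated[r][c] - m[r][c]) for r in range(n) for c in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-zero mass are read off as clusters.
    clusters = set()
    for row in m:
        members = frozenset(c for c, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Two disconnected triangles should come out as two clusters.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adj)
```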
Community-based Event Detection • Continue • Merge • Split • Form • Dissolve [Figure: example clusters evolving across snapshots T=1 to T=6]
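The five community events can be sketched with plain set operations. The slide only names the events, so the concrete rules below are simplifying assumptions made for illustration: an overlap threshold `kappa`, the requirement that each part contributes more than half of its nodes to a merge or split, and "no two members shared a cluster" as the test for Form and Dissolve. Clusters are assumed to be non-empty sets with at least two nodes.

```python
from itertools import combinations


def community_events(prev, curr, kappa=0.5):
    """prev, curr: lists of node sets (clusters at snapshots i and i+1).

    All thresholds and rules here are illustrative assumptions.
    """
    events = []
    # Continue: a cluster reappears with identical membership.
    for j, a in enumerate(prev):
        for k, b in enumerate(curr):
            if a == b:
                events.append(("continue", j, k))
    # Merge: two earlier clusters are mostly absorbed by one later cluster.
    for (j, a), (l, b) in combinations(enumerate(prev), 2):
        union = a | b
        for k, c in enumerate(curr):
            if (len(union & c) / max(len(union), len(c)) > kappa
                    and len(a & c) > len(a) / 2
                    and len(b & c) > len(b) / 2):
                events.append(("merge", (j, l), k))
    # Split: the mirror image of Merge.
    for j, a in enumerate(prev):
        for (k, b), (l, c) in combinations(enumerate(curr), 2):
            union = b | c
            if (len(union & a) / max(len(union), len(a)) > kappa
                    and len(b & a) > len(b) / 2
                    and len(c & a) > len(c) / 2):
                events.append(("split", j, (k, l)))
    # Form: no two members of a new cluster were clustered together before.
    for k, c in enumerate(curr):
        if not any(len(c & a) >= 2 for a in prev):
            events.append(("form", None, k))
    # Dissolve: no two members of an old cluster stay together afterwards.
    for j, a in enumerate(prev):
        if not any(len(a & c) >= 2 for c in curr):
            events.append(("dissolve", j, None))
    return events


prev = [{1, 2, 3}, {4, 5, 6}, {7, 8}]
curr = [{1, 2, 3, 4, 5, 6}, {9, 10}, {7, 8}]
events = community_events(prev, curr)
split_only = community_events([{1, 2, 3, 4}], [{1, 2}, {3, 4}])
```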
Entity-based Event Detection • Appear • Disappear • Join • Leave [Figure: example nodes A and B moving between clusters across snapshots T=1 to T=4]
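The entity-level events can be sketched in the same style. As before, the slide gives only the event names, so the rule used here to decide that a later cluster is the successor of an earlier one (it retains more than half of the earlier cluster's nodes) is an assumption, and `entity_events` is a hypothetical helper name.

```python
def entity_events(prev, curr):
    """prev, curr: lists of node sets (clusters at snapshots i and i+1).

    Returns a dict of event name -> set. Join and Leave entries are
    (node, cluster_index) pairs; the successor rule is an assumption.
    """
    prev_nodes = set().union(*prev)
    curr_nodes = set().union(*curr)
    events = {
        "appear": curr_nodes - prev_nodes,      # node is new to the network
        "disappear": prev_nodes - curr_nodes,   # node is no longer active
        "join": set(),
        "leave": set(),
    }
    for j, a in enumerate(prev):
        for k, c in enumerate(curr):
            if len(a & c) > len(a) / 2:  # c is treated as the successor of a
                events["join"] |= {(v, k) for v in c - a}
                events["leave"] |= {(v, j) for v in a - c}
    return events


prev = [{"A", "B", "C"}, {"D", "E"}]
curr = [{"A", "B", "F"}, {"D", "E", "C"}]
ev = entity_events(prev, curr)
```

In this toy case node F appears and joins the first cluster, while node C leaves the first cluster and joins the second.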
Event Detection • Represent each set of snapshot clusters as a k × N binary cluster-membership matrix (one row per cluster, one column per node) • Use bitwise operators to compute the events between each successive pair of matrices (snapshots) • Example: Continue event: Continue(C_j, C_k) holds when AND(S_i(j), S_{i+1}(k)) == OR(S_i(j), S_{i+1}(k)), where S_i(j) is row j of the matrix for snapshot i • Event Detection algorithm is linear in the number of nodes in the graph: O(N)
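The Continue test on this slide can be illustrated with Python integers used as bit vectors, one integer per matrix row. The helper names `to_bitmask`, `is_continue` and `continue_events` and the fixed node ordering are assumptions for this sketch; the point is only that AND equals OR exactly when the two rows are identical.

```python
def to_bitmask(cluster, node_index):
    """Encode a cluster as an integer whose bit n is set iff node n is a member."""
    mask = 0
    for node in cluster:
        mask |= 1 << node_index[node]
    return mask


def is_continue(row_prev, row_curr):
    """AND == OR holds only when both membership rows are identical.

    Note: two empty rows would also pass, so clusters are assumed non-empty.
    """
    return (row_prev & row_curr) == (row_prev | row_curr)


def continue_events(clusters_prev, clusters_curr, node_index):
    """Return (j, k) pairs where cluster j at snapshot i continues as cluster k."""
    rows_prev = [to_bitmask(c, node_index) for c in clusters_prev]
    rows_curr = [to_bitmask(c, node_index) for c in clusters_curr]
    return [(j, k)
            for j, rp in enumerate(rows_prev)
            for k, rc in enumerate(rows_curr)
            if is_continue(rp, rc)]


node_index = {name: i for i, name in enumerate("ABCDE")}
s_i = [{"A", "B", "C"}, {"D", "E"}]
s_next = [{"D", "E"}, {"A", "B"}]
pairs = continue_events(s_i, s_next, node_index)
```

Here only the cluster {D, E} continues; {A, B, C} shrinks to {A, B}, so its AND and OR differ.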
Temporal Analysis • Use critical events for analysis • Form and Dissolve events • Used to study group formation and dissipation • Merge and Split events • Evolution of groups • Continue events • Stability of clusters/groups • Evolution of topics in a collaboration network
Behavioral Analysis • Use entity-based critical events discovered to compose incremental measures for capturing behavioral patterns • Behavioral measures can then be used to analyze evolutionary behavior of nodes and clusters • Four Behavioral measures • Stability Index • Sociability Index • Popularity Index • Influence Index
Case Study 1 : DBLP Collaboration network • Data from 28 key conferences in databases/data mining/AI over 10 years • Authors (nodes) connected by collaborations (edges) • 23136 nodes and 54989 edges • Collaboration networks display many of the structural features of social networks (Kempe, Kleinberg and Tardos 2003, Newman 2001)
Case Study 2 : Clinical Trials Network • Clinical Trials • Can provide information on risks, benefits and optimal dosage levels. • Consists of observations of patients under drug use as well as some under placebo • Generally represented as a set of multivariate time series • Evolving clinical trials network • Nodes representing patients • Correlations among patients modeled as edges • Edges change over time as correlations change • Motivation: Use evolution of correlation to identify potential toxic effects of drugs
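One way to picture the evolving clinical trials network is to link two patients whenever their measurements over a time window are strongly correlated. The sketch below makes several assumptions that the slide does not state: a single measured variable per patient (the slide mentions multivariate series), Pearson correlation as the correlation measure, and a fixed threshold of 0.8. Patient ids and values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_snapshot(series, start, stop, threshold=0.8):
    """Edges between patients whose readings over [start, stop) correlate >= threshold.

    series: patient_id -> list of measurements (univariate here, an assumption).
    """
    ids = sorted(series)
    edges = set()
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if pearson(series[p][start:stop], series[q][start:stop]) >= threshold:
                edges.add((p, q))
    return edges


# Hypothetical readings: p1 and p2 move together at first, then diverge.
series = {
    "p1": [1, 2, 3, 4, 4, 3, 2, 1],
    "p2": [2, 4, 6, 8, 1, 2, 3, 4],
    "p3": [5, 5, 5, 5, 5, 5, 5, 5],
}
first = correlation_snapshot(series, 0, 4)
second = correlation_snapshot(series, 4, 8)
```

The edge between p1 and p2 exists in the first window and vanishes in the second, which is the kind of change the event framework is meant to pick up.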
Stability Index • Propensity of a node to interact with the same group of people over time • Stability for a node over time incrementally computed based on the stability of the clusters it belongs to
Stability for Clinical Trials data • Nodes with low Stability Index values represent patients with fluctuating correlation values (outliers) • Null Hypothesis: • If the drug does not result in toxicity, then outliers are likely to be flagged at random from each group (drug and placebo). • Experiment on clinical trials network for diabetes patients • 19 nodes (patients) found having Stability Index below threshold. • The drug under study was discontinued due to possible toxic effects. 18 out of the 19 were on the drug!!!
Sociability Index • Incremental measure of the different interactions a node participates in • Opposite of the Stability Index • Does not represent degree!
Sociability Index for Community Prediction • Goal : To identify future cluster co-occurrences based on history data for the DBLP dataset • Key Intuition: If two authors have high sociability, and they have not yet collaborated (not been clustered together), there is a high chance they will. • Setup : Use the data for 1997-2001 to predict cluster co-occurrences for 2002-2006
Experimental Results • Comparison with other measures (Liben-Nowell and Kleinberg, CIKM 2003) • Common Neighbor • Adamic-Adar • Jaccard
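For reference, the three baseline measures have standard definitions that can be written over neighbor sets: Common Neighbors counts shared neighbors, Jaccard divides that count by the size of the neighbor union, and Adamic-Adar sums 1/log(degree) over shared neighbors. The toy graph below is made up for illustration and is not from the DBLP data.

```python
from math import log


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the neighbor union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors; rare neighbors count more."""
    return sum(1.0 / log(len(adj[z])) for z in adj[u] & adj[v] if len(adj[z]) > 1)


# Toy undirected graph as node -> set of neighbors.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
cn = common_neighbors(adj, "a", "b")
jc = jaccard(adj, "a", "b")
aa = adamic_adar(adj, "a", "b")
```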
Popularity Index • Measure of attraction of nodes to a cluster • Influence measure of a cluster • Does not reflect the size of the cluster • DBLP dataset • Can be used to identify hot topics • If a large number of nodes join a cluster and they are all working on a similar topic, it indicates a buzz around that topic for that year
Application of Popularity Index • Example : XML • Year 1999 : 3 authors (XML and web applications) • Year 2000 : 50 joins • 30 of these authors published papers on XML
Influence Index • Measure of influence of a node on others • Influence in terms of participation in critical events • Influence of a node initially computed as • Follower nodes need to be pruned! unless
Diffusion Models • Study the spread of information in an evolving interaction network (Kempe et al, 2003, 2005) • Nodes activated with information • Newly activated nodes become contagious briefly • Information propagates through the network • Activation function maps weights of the links of a node to determine if it is activated • SUM Activation: If sum of weights > threshold, activate • MAX Activation: If any single weight > threshold, activate [Figure: diffusion across time steps t1 to t4]
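The SUM and MAX activation rules can be sketched as a small simulation. This is a simplified illustration under stated assumptions: the network is static (the talk considers an evolving one), a node stays contagious for exactly one step after activation, only links from currently contagious neighbors count toward activation, and `weights[v][u]` is a hypothetical encoding of the weight of the link from u into v.

```python
def diffuse(weights, seeds, threshold, mode="sum", steps=10):
    """Spread activation from `seeds`; returns the set of all activated nodes.

    weights[v][u]: weight of the link from u into v (assumed encoding).
    mode: "sum" activates when the summed weight exceeds the threshold,
          "max" activates when any single weight exceeds it.
    """
    active = set(seeds)
    contagious = set(seeds)
    for _ in range(steps):
        newly = set()
        for v, incoming in weights.items():
            if v in active:
                continue
            signal = [w for u, w in incoming.items() if u in contagious]
            if not signal:
                continue
            score = sum(signal) if mode == "sum" else max(signal)
            if score > threshold:
                newly.add(v)
        if not newly:
            break
        active |= newly
        contagious = newly  # only freshly activated nodes remain contagious
    return active


weights = {
    "a": {},
    "b": {},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.6},
}
reached_sum = diffuse(weights, {"a", "b"}, threshold=0.5, mode="sum")
reached_max = diffuse(weights, {"a", "b"}, threshold=0.5, mode="max")
```

With SUM activation the two weak links into c add up and the information reaches d; with MAX activation neither link alone is strong enough, so the spread stops at the seeds.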
Diffusion Models – Influence Maximization • Influence Maximization Problem: Find the initial set of nodes that activates the largest number of nodes over a time period • Critical in applications such as viral marketing and epidemiological research • Complicated in the case of dynamic interaction networks, as the network changes over time • Need for dynamic measures that reflect the current status of the network • Sociability Index used to weight links • Highly sociable nodes have high propensity to pass on information • Influence Index used to determine the initial set of active nodes • Comparison with random choice of nodes and degree-based selection (Wasserman and Faust, 1994)
Conclusions • Most real-world graphs are dynamic in nature • Need for analysis, reasoning and inference • Proposed an event-based framework • Clusters to capture structure at different snapshots • Critical events over clusters to identify dynamic properties of graphs • Behavioral patterns incrementally composed from critical events • Proposed method useful in many application domains • Protein function prediction, drug design, recommender systems, viral marketing, epidemiology [Figure: workflow recap with the stages Temporal Snapshots, Clustering, Event Detection, Behavioral Patterns, and Analysis and Inference]
Future Directions • Extensions to large interaction graphs • Use of semantic information for reasoning and inference • Merge and Split Events • If two clusters have high semantic similarity, probability of a Merge is high • Continue events • Track the evolution of topics • Sequences of Form, Continue, Continue … • Multi-scale temporal modeling • Analyze snapshots of different granularity
Thanks! • Poster # 36, this evening (Mon 13th Aug, 6:15 – 9:15 pm) • This work was supported by the following grants: • DOE Early Career Principal Investigator Award No. DE-FG02-04ER25611 • NSF CAREER Grant IIS-0347662 • Contacts: • Sitaram Asur : asur@cse.ohio-state.edu • Dr Srinivasan Parthasarathy : srini@cse.ohio-state.edu • Duygu Ucar : ucar@cse.ohio-state.edu • Group Webpage : http://dmrl.cse.ohio-state.edu