Bursty Subgraphs in Social Networks
Milad Eftekhar, Nick Koudas, Yashar Ganjali
University of Toronto
Bursts in social networks
• Information burst
  • Unusual activity compared to the average case
• Types
  • External source case
    • Cases where users act independently
    • Examples: an earthquake, a soccer game
  • Influence case
    • The activity of neighbors has an influence on the activity of a node
    • Example: protests in Egypt
Burst examples
[Figures: a soccer game ("Goal!") and a protest]
External source case
Assign a burst state to each node of the graph!
[Figure: four nodes with likelihoods (fB = 0.8, fN = 0.1), (fB = 0.3, fN = 0.4), (fB = 0.1, fN = 0.6), (fB = 0.9, fN = 0.1), each labelled B (bursty) or N (non-bursty)]
Naïve approach: assign states to nodes separately (in isolation)
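A minimal sketch of the naïve approach, assuming each node simply takes the state with the larger likelihood. The node names `a` to `d` and the function name are hypothetical; the likelihood values are the ones shown on the slide.

```python
def naive_states(likelihoods):
    """Label each node 'B' (bursty) or 'N' (non-bursty) in isolation.

    likelihoods maps a node to (f_b, f_n); links are ignored entirely.
    """
    return {u: ("B" if f_b > f_n else "N") for u, (f_b, f_n) in likelihoods.items()}


# Likelihoods from the slide; the node names are made up for illustration.
nodes = {"a": (0.8, 0.1), "b": (0.3, 0.4), "c": (0.1, 0.6), "d": (0.9, 0.1)}
states = naive_states(nodes)
print(states)
```

Because no link is consulted, a node such as `b`, whose two likelihoods are close, can end up with a different state from all of its neighbors.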
Problem with the naïve approach
[Figure: a graph whose nodes carry scattered B and N labels]
• Goal: group bursty users
• The naïve approach does not consider the links
• Nodes in the same neighborhood behave similarly
• Fragmentation
Intrinsic burst model
[Figure: four nodes with likelihoods (fB = 0.8, fN = 0.1), (fB = 0.3, fN = 0.4), (fB = 0.1, fN = 0.6), (fB = 0.9, fN = 0.1); three are labelled B and one N]
Generalizes a similar problem for the case of time-series data
The Optimal Algorithm
[Figure: graph G extended to G’ with two terminals s and t, labelled non-bursty and bursty; edge weights shown are W = log(1/fB(u)), W = log(1/fN(u)), and W = log(λ/(1-λ))]
• This problem is equivalent to the Min Cut problem on weighted graphs.
• Time complexity: [expression missing from the transcript]
• NOT efficient for large graphs with millions of nodes and billions of edges.
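The min-cut reduction can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes terminal s stands for non-bursty and t for bursty, that edge (s, u) carries log(1/fB(u)), edge (u, t) carries log(1/fN(u)), every original link carries log(λ/(1-λ)) with λ > 0.5, and that all likelihoods lie in (0, 1]. The max-flow routine is a plain Edmonds-Karp, chosen only to keep the example self-contained; the function and node names are hypothetical.

```python
import math
from collections import deque


def _add_edge(cap, u, v, c):
    # Forward capacity plus a zero-capacity reverse entry for the residual graph.
    cap.setdefault(u, {})
    cap.setdefault(v, {})
    cap[u][v] = cap[u].get(v, 0.0) + c
    cap[v].setdefault(u, 0.0)


def _search(cap, s, t):
    # BFS over edges with residual capacity; returns the parent map.
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in parent:
                parent[v] = u
                if v == t:
                    return parent
                queue.append(v)
    return parent


def min_cut_states(likelihoods, edges, lam):
    """Assign 'B'/'N' by a minimum s-t cut (assumed weights, see lead-in)."""
    s, t = "_s", "_t"  # hypothetical terminal names
    cap = {s: {}, t: {}}
    for u, (f_b, f_n) in likelihoods.items():
        _add_edge(cap, s, u, math.log(1 / f_b))  # cut when u is bursty
        _add_edge(cap, u, t, math.log(1 / f_n))  # cut when u is non-bursty
    w = math.log(lam / (1 - lam))
    for u, v in edges:
        _add_edge(cap, u, v, w)  # undirected link: capacity in both directions
        _add_edge(cap, v, u, w)
    while True:
        parent = _search(cap, s, t)
        if t not in parent:
            break  # no augmenting path left: parent holds the s side of the cut
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= flow
            cap[b][a] += flow
    return {u: ("N" if u in parent else "B") for u in likelihoods}


# Slide likelihoods on a made-up star topology centred on node "b".
nodes = {"a": (0.8, 0.1), "b": (0.3, 0.4), "c": (0.1, 0.6), "d": (0.9, 0.1)}
edges = [("a", "b"), ("b", "c"), ("b", "d")]
states = min_cut_states(nodes, edges, lam=0.7)
print(states)
```

On this toy graph the node `b`, which leans slightly towards N in isolation, is pulled to B by its two strongly bursty neighbors.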
DIBA: Dynamic programming Intrinsic Burst detection Algorithm
• Simplify the problem
  • Remove some edges to create a tree
  • Predict the edges with the highest impact on the final state assignment
• DIBA is a linear-time, optimal solution for this simplified problem.
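Once the graph is reduced to a tree, an optimal assignment can be found in one bottom-up and one top-down pass. The sketch below shows a generic two-state tree dynamic program, not DIBA itself: it assumes the cost of an assignment is the sum of log(1/f) over the chosen node states plus log(λ/(1-λ)) for every tree edge whose endpoints differ. The step that selects which edges to keep is not shown, and all names are hypothetical.

```python
import math


def tree_dp_states(likelihoods, children, root, lam):
    """Minimum-cost 'B'/'N' labelling of a rooted tree in linear time."""
    w = math.log(lam / (1 - lam))  # assumed penalty for an edge with differing states
    other = {"B": "N", "N": "B"}

    # Preorder traversal without recursion: parents appear before children.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))

    # Bottom-up pass: cost[u][s] is the best cost of u's subtree if u takes state s.
    cost = {}
    for u in reversed(order):
        f_b, f_n = likelihoods[u]
        c = {"B": math.log(1 / f_b), "N": math.log(1 / f_n)}
        for v in children.get(u, []):
            for s in "BN":
                c[s] += min(cost[v][s], cost[v][other[s]] + w)
        cost[u] = c

    # Top-down pass: fix the root, then give each child its best compatible state.
    state = {root: min("BN", key=lambda s: cost[root][s])}
    for u in order:
        s = state[u]
        for v in children.get(u, []):
            state[v] = s if cost[v][s] <= cost[v][other[s]] + w else other[s]
    return state


# Slide likelihoods on a made-up tree rooted at "b".
nodes = {"a": (0.8, 0.1), "b": (0.3, 0.4), "c": (0.1, 0.6), "d": (0.9, 0.1)}
tree = {"b": ["a", "c", "d"]}
states = tree_dp_states(nodes, tree, root="b", lam=0.7)
print(states)
```

Each node and each tree edge is visited a constant number of times, which is where the linear running time comes from.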
Social Burst Model
[Figure: two linked nodes u and v]
• Influence case
• Burst values
  • Internal burst probability
  • Neighbors’ burst values, for all neighbors
• SODA (SOcial burst Detection Algorithm): an iterative algorithm to calculate the burst values.
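The exact SODA update rule is not preserved in this transcript, so the following is only a sketch of what an iterative computation of burst values could look like. It assumes a hypothetical rule in which a node's burst value is a convex combination of its internal burst probability and the mean burst value of its neighbors; the mixing weight `alpha`, the stopping tolerance, and all names are illustrative and do not come from the paper.

```python
def iterate_burst_values(internal, neighbors, alpha=0.5, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for burst values under an ASSUMED update rule.

    internal:  {node: internal burst probability in [0, 1]}
    neighbors: {node: list of neighboring nodes}
    alpha:     hypothetical weight of the internal probability, in (0, 1]
    """
    if not internal:
        return {}
    values = dict(internal)  # start from the internal probabilities
    for _ in range(max_iter):
        new = {}
        for u, p in internal.items():
            nbrs = neighbors.get(u, [])
            # A node without neighbors falls back to its internal probability.
            avg = sum(values[v] for v in nbrs) / len(nbrs) if nbrs else p
            new[u] = alpha * p + (1 - alpha) * avg
        delta = max(abs(new[u] - values[u]) for u in internal)
        values = new
        if delta < tol:
            break
    return values


# Two linked nodes u and v with made-up internal burst probabilities.
burst = iterate_burst_values({"u": 0.9, "v": 0.1}, {"u": ["v"], "v": ["u"]})
print(burst)
```

With `alpha` strictly between 0 and 1 each sweep shrinks the largest change by a factor of at most `1 - alpha`, so the loop settles on a fixed point; whether the published algorithm has the same property depends on its actual update rule.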
Experiments
• Dataset
  • 30 days of the Twitter firehose
  • Each day contains 30 million distinct users (connected by 1.5 billion edges) generating 300-400 million tweets
• Identify bursty subgraphs for hot topics
Run time results
Please see the paper for the figures on the sensitivity of DIBA and SODA (the run time and the quality of the results) to different parameters.
• Both algorithms are fast
  • DIBA: 2.5 minutes
  • SODA: 2 hours
Sample qualitative results
• Grand Prix
  • Fans of car racing across different countries talking about the Canadian Formula 1 Grand Prix
  • Groups of volleyball-related Twitter handles tweeting about the FIVB Volleyball World Grand Prix
  • Groups of Twitter handles tweeting about the Women’s Thailand Open Grand Prix Gold badminton tournament
Future work
• Exploring applications of our techniques in the web search domain
  • e.g. utilizing the subgraphs detected for “grand prix” to address problems such as
    • Diversified search
    • Query expansion
• Dynamics of the bursty subgraphs
Thank you all! Questions?