PET: A Statistical Model for Popular Events Tracking in Social Communities
Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)
Anant Pradhan
Introduction
• Challenge: tracking the evolution of a popular topic
Introduction
• Observing and tracking:
  • Popular events
  • Topics that evolve over time
• Existing approaches focus on:
  • Burstiness
  • Evolution of networks
• These approaches ignore the interplay between textual topics and network structures.
Introduction
• Proposes PET, a novel statistical method that models the popularity of events over time by considering:
  • Burstiness of user interest
  • Information diffusion on the network structure
  • Evolution of textual topics
Introduction
• A Gibbs Random Field models:
  • The influence of historical status
  • Dependency relationships in the graph
• A topic model explains the generation of text data.
• The two models capture their interplay by regularizing each other.
Problem Definition
• Network stream: $G = \{G_1, G_2, \ldots, G_T\}$
• Snapshot of the network at time $k$: $G_k = \{V_k, E_k\}$, with vertex set $V_k$ and edge set $E_k$
• Document stream: $D = \{D_1, D_2, \ldots, D_T\}$
• Topic: $\theta$
• Event: $\Theta_E = \{\theta_{E0}, \theta_{E1}, \theta_{E2}, \ldots, \theta_{ET}\}$
• Interest: $H_k = \{h_k(1), h_k(2), \ldots, h_k(N)\}$
Problem Definition
• Event-related information in a social community:
  • An observed stream of network structures
  • An observed stream of text documents
  • A latent stream of topics about the event
  • A latent stream of interests
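The observed and latent streams can be held snapshot by snapshot. A minimal Python (3.9+) sketch; the class names `Snapshot` and `LatentState`, the field layout and the toy data are hypothetical illustrations, not structures from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Observed data at time step k: network G_k = {V_k, E_k} and documents D_k."""
    vertices: set[str]                      # V_k: users present at time k
    edges: set[tuple[str, str]]             # E_k: user-user links
    documents: list[tuple[str, list[str]]]  # D_k: (author, tokens) pairs


@dataclass
class LatentState:
    """Hypothetical container for what is inferred at time k."""
    interest: dict[str, float] = field(default_factory=dict)  # H_k: user -> h_k(i)
    topic: dict[str, float] = field(default_factory=dict)     # Theta_k: word -> probability


# Toy streams G = {G_1, G_2} and D = {D_1, D_2}, stored snapshot by snapshot.
stream = [
    Snapshot({"alice", "bob"}, {("alice", "bob")},
             [("alice", ["avatar", "3d"])]),
    Snapshot({"alice", "bob", "carol"}, {("alice", "bob"), ("bob", "carol")},
             [("bob", ["avatar", "pandora"]), ("carol", ["avatar"])]),
]

# One latent state per snapshot, to be filled in by inference (empty here).
latent = [LatentState() for _ in stream]
```

Keeping observed and latent quantities in separate structures mirrors the split between the two observed streams and the two latent streams.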
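The General Model
• The task is cast as inferring the current interest status $H_k$ and topics $\Theta_k$ from the current observations and the previous interest status: $P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})$
• Assumption 1: the current interest status $H_k$ is independent of the document collection $D_k$ (given $G_k$ and $H_{k-1}$)
• Assumption 2: the current topic model $\Theta_k$ is independent of the network structure $G_k$ and the previous interest status $H_{k-1}$ (given $H_k$ and $D_k$)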
The General Model
• From the assumptions:
$$P(H_k, \Theta_k \mid G_k, D_k, H_{k-1}) = \underbrace{P(H_k \mid G_k, H_{k-1})}_{\text{interest model}} \cdot \underbrace{P(\Theta_k \mid H_k, D_k)}_{\text{topic model}}$$
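The factorization follows from the chain rule followed by the two independence assumptions, one step each:

```latex
\begin{aligned}
P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})
  &= P(H_k \mid G_k, D_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(chain rule)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(Assumption 1)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, D_k)
     && \text{(Assumption 2)}
\end{aligned}
```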
The Interest Model
• Modelled as a Gibbs Random Field on the network $G_k$
• Uses specially designed potential functions
• Uses a weighting scheme motivated by real-world networks
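The slide does not spell out PET's potential functions, so the following is only a minimal sketch of a Gibbs Random Field of this kind, assuming a hypothetical quadratic energy: a temporal term weighted by λT pulls each user's interest toward its previous value, and a neighbour-agreement term weighted by λA pulls linked users together. Uniform edge weights stand in for the paper's weighting scheme. This is a Gaussian Markov random field chosen for illustration, not PET's actual potentials; the function names are hypothetical.

```python
def grf_energy(h, h_prev, edges, lam_t=1.0, lam_a=3.0):
    """Hypothetical quadratic energy; P(H_k | G_k, H_{k-1}) is taken as proportional to exp(-energy).

    h, h_prev: user -> interest at times k and k-1
    edges:     (u, v) -> weight w_uv for each edge in E_k
    """
    temporal = sum((h[u] - h_prev[u]) ** 2 for u in h)                       # historical term
    structural = sum(w * (h[u] - h[v]) ** 2 for (u, v), w in edges.items())  # network term
    return lam_t * temporal + lam_a * structural


def grf_mode(h_prev, edges, lam_t=1.0, lam_a=3.0, max_iters=500, tol=1e-12):
    """Most probable H_k under the energy above, found by Gauss-Seidel sweeps.

    Setting the derivative with respect to h(u) to zero gives
    h(u) = (lam_t*h_prev(u) + lam_a*sum_v w_uv*h(v)) / (lam_t + lam_a*sum_v w_uv).
    Requires lam_t > 0 so that isolated users keep a well-defined value.
    """
    neighbours = {u: [] for u in h_prev}
    for (u, v), w in edges.items():
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))
    h = dict(h_prev)  # start from the previous interest status
    for _ in range(max_iters):
        largest_change = 0.0
        for u in h:
            num = lam_t * h_prev[u] + lam_a * sum(w * h[v] for v, w in neighbours[u])
            den = lam_t + lam_a * sum(w for _, w in neighbours[u])
            new = num / den
            largest_change = max(largest_change, abs(new - h[u]))
            h[u] = new
        if largest_change < tol:
            break
    return h


h_prev = {"alice": 1.0, "bob": 0.0, "carol": 0.0}
edges = {("alice", "bob"): 1.0, ("bob", "carol"): 1.0}  # uniform weights (assumption)
h_k = grf_mode(h_prev, edges)
print({u: round(x, 3) for u, x in h_k.items()})
# {'alice': 0.475, 'bob': 0.3, 'carol': 0.225}
```

With λT = 1 and λA = 3, interest diffuses from alice to bob and then to carol. With these quadratic potentials the total interest (1.0) is conserved; that is a property of this sketch, not a claim about PET.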
The Topic Model
• Takes into account the historical interest status and the relationships on the network (through $H_k$)
• Allows the topics and the popularity of an event to influence each other over time
• By Bayes' rule:
$$P(\Theta_k \mid H_k, D_k) \propto P(D_k \mid H_k, \Theta_k)\, P(\Theta_k \mid H_k)$$
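One way interest can shape the topic is to let each document count in proportion to its author's interest. A minimal sketch, assuming a single event topic with no background topic, and assuming the previous topic acts as a Dirichlet prior whose strength plays the role of μE; the function name `update_topic` and this update rule are hypothetical simplifications, not PET's inference procedure. The result is the posterior mean of a multinomial under that Dirichlet prior, treating interest-weighted counts as fractional observations.

```python
from collections import Counter


def update_topic(documents, interest, theta_prev, mu_e=1.0):
    """Hypothetical interest-weighted update of one event topic.

    documents:  list of (author, tokens) pairs for time k (D_k)
    interest:   author -> h_k(author) (H_k)
    theta_prev: word -> probability (the topic at time k-1), used as a
                Dirichlet prior with total pseudo-count mu_e
    Returns the posterior-mean word distribution for time k.
    """
    counts = Counter()
    for author, tokens in documents:
        weight = interest.get(author, 0.0)  # authors with no interest contribute nothing
        for word in tokens:
            counts[word] += weight
    vocab = set(counts) | set(theta_prev)
    pseudo = {w: mu_e * theta_prev.get(w, 0.0) for w in vocab}
    total = sum(counts.values()) + sum(pseudo.values())
    if total == 0:
        raise ValueError("no interest-weighted evidence and an empty prior")
    return {w: (counts[w] + pseudo[w]) / total for w in vocab}


docs_k = [("bob", ["avatar", "pandora"]), ("carol", ["avatar"])]
interest_k = {"bob": 0.3, "carol": 0.225}
theta_prev = {"avatar": 0.5, "3d": 0.5}
theta_k = update_topic(docs_k, interest_k, theta_prev, mu_e=1.0)
```

Here "avatar" gains probability because both interested authors use it, while "3d" survives only through the prior; a larger `mu_e` would make the topic change more slowly over time.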
Connection to Existing Models
• Existing models are special cases of PET under certain conditions:
  • The state automaton model: when the network effect is omitted
  • The contagion model: when the topic effect is omitted
Complexity Analysis
• For N documents, M words, t topics, m rounds (iterations) and T time points:
  • PLSA (Probabilistic Latent Semantic Analysis): $O((N + M)\,m\,t)$
  • PET: $O(N M m T)$
• This cost is presented as reasonable.
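Plugging numbers into the two bounds shows how they scale: PET's bound grows with the product N·M rather than the sum N + M. A minimal sketch; the sizes below are hypothetical, and big-O bounds hide constant factors, so the counts only indicate scaling, not actual running times.

```python
def plsa_cost(n_docs, n_words, n_topics, rounds):
    """Operation count implied by PLSA's O((N + M) m t), constants ignored."""
    return (n_docs + n_words) * rounds * n_topics


def pet_cost(n_docs, n_words, rounds, n_times):
    """Operation count implied by PET's O(N M m T), constants ignored."""
    return n_docs * n_words * rounds * n_times


# Hypothetical sizes, chosen only to show how the two bounds scale.
N, M, t, m, T = 10_000, 5_000, 10, 50, 100
print(plsa_cost(N, M, t, m))  # 7500000
print(pet_cost(N, M, m, T))   # 250000000000
```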
Experiments
• JonK: the state automaton model (first baseline)
• Cont: the contagion model (second baseline)
• PET–: PET without network structures
• BOM: box office earnings (gold standard for movie-related events)
• GInt: Google Insight (gold standard for news-related events)
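The slides do not state which metric compares a model's popularity curve with the BOM or GInt gold standards. A minimal sketch, assuming Pearson correlation between a model's per-time-step popularity series (for example, hypothetically, the summed interest over all users) and the gold-standard series; the metric choice, the function name and the values are illustrative only, not the paper's evaluation protocol or results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


predicted = [0.1, 0.4, 0.9, 0.7]  # hypothetical model popularity per week
gold = [1.0, 3.0, 8.0, 6.0]       # hypothetical gold-standard values per week
r = pearson(predicted, gold)
```

Correlation ignores scale, which suits comparing unitless interest levels with dollar earnings or search volumes.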
Experiments
• Twitter:
  • 5,000 users
  • 1,438,826 tweets
  • From Oct 2009 to Jan 2010
• Events:
  • 2 movies (Avatar, Twilight)
  • 2 news events (Tiger Woods affair, Copenhagen Climate Conference)
Experiments
• Setup:
  • $\lambda_T$ (interest model): weight for historical information
  • $\lambda_A$ (interest model): weight for structural information
  • $\mu_E$ (topic model)
  • Values: $\lambda_T = 1$, $\lambda_A = 3$, $\mu_E = 1$
Result Analysis
• PET has the best performance.
• Cont has the worst performance.
• JonK generally performs well but is less accurate than PET.
Network Diffusion Analysis
• Cont cannot distinguish between different interest levels.
• Both PET and PET– capture the rising trend of popularity.
• PET is still superior.
Event Analysis on DBLP
• For popular events, PET generates:
  • More accurate trends
  • Smoother diffusion
  • Meaningful content evolution
Future Work
• Apply the model to track the evolution of ideas and scientific innovation.
• Build a real-time event search system.
Conclusion
• A novel approach.
• The experimental evidence is convincing.
• Complexity might be a cause for concern.