Rise and Fall Patterns of Information Diffusion: Model and Implications. Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU). Motivation. Q: How do news and rumors spread in social media?
Rise and Fall Patterns of Information Diffusion: Model and Implications Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU) Y. Matsubara et al.
Motivation • Q: How do news and rumors spread in social media? • Social media facilitate faster diffusion of news and rumors
News spread in social media • MemeTracker [Leskovec et al. KDD’09]: short phrases sourced from U.S. politics in 2008 [Figure: # of mentions in blogs, per hour over 1 week, for “you can put lipstick on a pig” and “yes we can”]
News spread in social media • MemeTracker [Leskovec et al. KDD’09]: short phrases sourced from U.S. politics in 2008 [Figure: # of mentions in blogs, per hour over 1 week, for “you can put lipstick on a pig” and “yes we can”; annotations: breaking news, news spread, decay]
Rise and fall patterns in social media • Twitter (# of hashtags per hour) • Google trend (# of queries per week) [Figure panels: “#assange” (per hour, 1 week); “#stevejobs” (per hour, 1 week); “tsunami” (in 2005; per week, 1 year); “harry potter” (2010 - 2011; per week, 2 years)]
Rise and fall patterns in social media • How many patterns are there? • Earlier work claims there are several classes: • four classes on YouTube [Crane et al. PNAS’08] • six classes on media [Yang et al. WSDM’11]
Rise and fall patterns in social media • Q. How many classes are there after all? • A. Our answer is “ONE”! We can represent all patterns by a single model
Outline • Motivation • Problem definition • Proposed method • Experiments • Discussion – SpikeM at work • Conclusions
Problem definition • Problem 1 (What-if?) • Given: network of bloggers/users; external shock/event; quality of the event (β) • Find: how blogging activity will evolve over time • Goal: predict/model social activity
Problem definition • Problem 2 (Model design) • Given: behavior of spikes • Find: an equation/model that can explain them, e.g., via the # of potential bloggers, the strength of the external shock, the quality of the event (β), and an epidemic process by word-of-mouth • Goal: predict/model social activity
Outline • Motivation • Problem definition • Proposed method • Experiments • Discussion – SpikeM at work • Conclusions
Proposed method • SpikeM captures 3 properties of real spikes: 1. periodicities
Proposed method • SpikeM captures 3 properties of real spikes: 1. periodicities 2. avoid infinity
Proposed method • SpikeM captures 3 properties of real spikes: 1. periodicities 2. avoid infinity 3. power-law fall
Proposed method • SpikeM captures 3 properties of real spikes: 1. periodicities 2. avoid infinity 3. power-law fall • SpikeM captures the behavior of real spikes using few parameters
Main idea (details) • Nodes (bloggers) are in one of two states: • U: un-informed of the rumor • B: informed, and blogged about the rumor • 1. Un-informed bloggers (clique of N bloggers/nodes) at time n=0
Main idea (details) • 1. Un-informed bloggers (clique of N bloggers/nodes) at time n=0 • 2. External shock at time n_b (e.g., breaking news) • The event happens at time n_b • Bloggers are informed and blog about the news
Main idea (details) • 1. Un-informed bloggers (clique of N bloggers/nodes) at time n=0 • 2. External shock at time n_b (e.g., breaking news) • 3. Infection (word-of-mouth effects) from time n_b+1 • Infectiveness of a blog post depends on: • Strength of infection β (quality of news) • Decay function (how infective a blog posting is)
Main idea (details) • 1. Un-informed bloggers (clique of N bloggers/nodes) at time n=0 • 2. External shock at time n_b (e.g., breaking news) • 3. Infection (word-of-mouth effects) from time n_b+1 • Infectiveness of a blog post depends on: • Strength of infection β (quality of news) • Decay function (how infective a blog posting is) [Figure: decay function, scaled by β, on linear and log scales; slope -1.5 on the log scale]
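As a small illustration of the decay function on this slide, the sketch below assumes the form f(n) = β · n^(-1.5), which is what the β label and the -1.5 slope on the log-scale plot suggest. The function name `decay` and the sample value of β are made up for this example.

```python
import math

def decay(tau, beta, exponent=-1.5):
    """Infectiveness of a post `tau` ticks after it was written (assumed form)."""
    if tau < 1:
        raise ValueError("tau must be at least 1 tick")
    return beta * tau ** exponent

# On a log-log plot a power law is a straight line whose slope is the exponent.
beta = 0.5  # hypothetical strength of infection
slope = (math.log(decay(100, beta)) - math.log(decay(10, beta))) / (
    math.log(100) - math.log(10)
)
print(round(slope, 6))
```

Changing β only shifts the line up or down on the log scale; the slope stays at the exponent.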
SpikeM-base (details) • Equations of SpikeM (base), over the states Blogged and Un-informed, use: • Total population of available bloggers • Strength of infection/news • External shock at birth (time n_b) • Background noise
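The equations themselves are not reproduced in this transcript, so the following is only a sketch of how a recurrence built from these four ingredients might look. It is an assumption based on the published SpikeM formulation, not a copy of the slide. The names `S_b` (size of the shock) and `eps` (background noise), the -1.5 decay exponent, the clamping step, and all parameter values are illustrative.

```python
def spikem_base(N, beta, n_b, S_b, eps, n_ticks):
    """Simulate an assumed SpikeM-base recurrence.

    dB[n+1] = U[n] * sum_{t=n_b..n} (dB[t] + S[t]) * beta * (n+1-t)**-1.5 + eps
    U[n+1]  = U[n] - dB[n+1],  with U[0] = N and S[t] = S_b only at t = n_b.
    """
    U = [0.0] * n_ticks   # un-informed bloggers per tick
    dB = [0.0] * n_ticks  # bloggers who newly blog at each tick
    U[0] = float(N)
    for n in range(n_ticks - 1):
        pressure = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            pressure += (dB[t] + shock) * beta * (n + 1 - t) ** -1.5
        # Clamp so the finite population is never exceeded ("avoid infinity").
        new = min(max(U[n] * pressure + eps, 0.0), U[n])
        dB[n + 1] = new
        U[n + 1] = U[n] - new
    return dB, U

# Hypothetical parameter values, chosen only to produce a visible spike.
dB, U = spikem_base(N=2000, beta=0.0005, n_b=10, S_b=5.0, eps=0.0, n_ticks=120)
peak = max(range(len(dB)), key=lambda n: dB[n])
print("peak tick:", peak, "remaining un-informed:", round(U[-1], 1))
```

Because each newly blogged user is removed from the un-informed pool, the total number of posts can never exceed N, which is how a finite-population model of this kind avoids unbounded growth.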
SpikeM - with periodicity (details) • Full equation of SpikeM: the base model (Blogged / Un-informed) extended with periodicity • Bloggers change their activity over time (e.g., daily, weekly, yearly) [Figure: activity vs. time n; peak activity at 12pm, low activity at 3am]
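The full equation is not preserved here, so the sketch below only illustrates one way a periodic activity multiplier could be written: a sinusoid that scales activity between 1 - P_a and 1. The sinusoidal form and the names `P_a` (amplitude), `P_s` (phase shift) and `P_p` (period in ticks) are assumptions drawn from the published SpikeM formulation, and the numbers are made up.

```python
import math

def periodicity(n, P_a, P_s, P_p=24):
    """Assumed activity multiplier in [1 - P_a, 1] with period P_p ticks."""
    return 1.0 - 0.5 * P_a * (math.sin(2.0 * math.pi * (n + P_s) / P_p) + 1.0)

# Hypothetical hourly data: a 40% dip at the quietest hour, and a phase
# shift of 6 ticks so that the multiplier peaks at noon.
day = [periodicity(n, P_a=0.4, P_s=6) for n in range(24)]
print("busiest hour:", day.index(max(day)), "quietest hour:", day.index(min(day)))
```

In a model of this shape the multiplier would scale the number of new posts at each tick, so that the same infection pressure produces fewer posts at quiet hours.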
Model fitting (details) • SpikeM consists of 7 parameters • Learning the parameters: • Given a real time sequence • Minimize the error between the sequence and the model output • (Levenberg-Marquardt (LM) fitting)
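To make the "minimize the error" step concrete, here is a dependency-free sketch. It does not implement Levenberg-Marquardt: a plain grid search over a single parameter stands in for it, and the simulator is an assumed, simplified SpikeM-base recurrence without noise or periodicity. All function names and values are hypothetical.

```python
def simulate(N, beta, n_b, S_b, n_ticks):
    """Assumed SpikeM-base recurrence without noise or periodicity."""
    U, dB = float(N), [0.0] * n_ticks
    for n in range(n_ticks - 1):
        pressure = sum(
            (dB[t] + (S_b if t == n_b else 0.0)) * beta * (n + 1 - t) ** -1.5
            for t in range(n_b, n + 1)
        )
        dB[n + 1] = min(U * pressure, U)  # never exceed the remaining population
        U -= dB[n + 1]
    return dB

def squared_error(observed, model):
    """Sum of squared differences between the data and the model output."""
    return sum((x - y) ** 2 for x, y in zip(observed, model))

def fit_beta(observed, candidates, N, n_b, S_b):
    """Pick the candidate beta with the smallest error (grid search, not LM)."""
    return min(
        candidates,
        key=lambda b: squared_error(observed, simulate(N, b, n_b, S_b, len(observed))),
    )

# Synthetic "observed" sequence generated with a known beta, then recovered.
observed = simulate(N=2000, beta=0.0005, n_b=10, S_b=5.0, n_ticks=80)
best = fit_beta(observed, [0.0003, 0.0004, 0.0005, 0.0006, 0.0007],
                N=2000, n_b=10, S_b=5.0)
print("recovered beta:", best)
```

A real fit would search all parameters jointly with a gradient-based least-squares routine such as LM; the objective being minimized is the same squared error.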
Analysis • SpikeM vs. SI model (susceptible-infected model) • SpikeM matches reality: exponential rise and power-law fall
Analysis • Rise part (x-axis reversed) • SpikeM: exponential • SI model: exponential [Figure: rise part on linear-log and log-log axes]
Analysis • Fall part • SpikeM: power law • SI model: exponential • SpikeM matches reality [Figure: fall part on linear-log and log-log axes]
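The linear-log versus log-log comparison can be checked numerically: a power law is a straight line on log-log axes, while an exponential is a straight line on linear-log axes. The sketch below uses two synthetic sequences (not the slide data) with made-up constants and measures each slope by ordinary least squares.

```python
import math

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

ticks = list(range(1, 51))
power_law = [3.0 * n ** -1.5 for n in ticks]             # power-law fall
exponential = [3.0 * math.exp(-0.1 * n) for n in ticks]  # exponential fall

log_n = [math.log(n) for n in ticks]
# Power law: straight line on log-log axes, slope equals the exponent.
print(round(slope(log_n, [math.log(y) for y in power_law]), 3))
# Exponential: straight line on linear-log axes, slope equals the rate.
print(round(slope(ticks, [math.log(y) for y in exponential]), 3))
```

The practical consequence is that a power-law tail stays visible far longer than an exponential one, which is the difference the fall-part plots highlight.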
Outline • Motivation • Problem definition • Proposed method • Experiments • Discussion – SpikeM at work • Conclusions
Experiments • We answer the following questions: • Q1. Match real spikes • Q1-1: K-SC clusters • Q1-2: MemeTracker • Q1-3: Twitter • Q1-4: Google trend • Q2. Forecast future patterns
Q1-1 Explaining K-SC clusters • Six patterns of K-SC [Yang et al. WSDM’11] • SpikeM can generate all patterns in K-SC
Q1-2 Matching MemeTracker patterns • MemeTracker (memes in blogs) [Leskovec et al. KDD’09] • SpikeM can fit various patterns in blogs [Figure: fits on linear and log scales; annotations: noise-robust fitting, outliers]
Q1-3 Matching Twitter data • Twitter data (hashtags) • SpikeM can generate various patterns in social media [Figure: fits on linear and log scales]
Q1-4 Matching Google trend data • Volume of searches for queries on Google • SpikeM can capture various patterns
Q2 Tail-part forecasts • Given the first part of a spike, forecast the tail part • SpikeM can capture the tail part (AR fails)
Outline • Motivation • Problem definition • Proposed method • Experiments • Discussion – SpikeM at work • Conclusions
SpikeM at work • SpikeM is capable of various applications: • A1. What-if forecasting • A2. Outlier detection • A3. Reverse engineering
A1. “What-if” forecasting • Forecast not only the tail part, but also the rise part! • e.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume two weeks before each release date [Figure: first spike followed by two upcoming spikes marked “?”]
A1. “What-if” forecasting • Forecast not only the tail part, but also the rise part! • SpikeM can forecast upcoming spikes [Figure: (1) first spike, (2) release date, (3) two weeks before release]
A2. Outlier detection • Fitting result of “tsunami” (Google trend) in log-log scale [Figure annotations: Indian Ocean earthquake; another earthquake; one year after the Indian Ocean earthquake]
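The slide does not say how outliers are identified, so the following is just one plausible way to do it once a fitted curve is available: compare data and fit on a log scale and flag ticks where the data exceed the fit by a chosen factor. The function name, the threshold, the floor, and the numbers are all assumptions.

```python
import math

def flag_outliers(observed, fitted, factor=3.0, floor=1.0):
    """Return tick indices where the data exceed the fit by `factor` or more.

    The comparison is done on log values, matching a log-scale plot; `floor`
    guards against log(0). Both thresholds are illustrative assumptions.
    """
    limit = math.log(factor)
    flagged = []
    for n, (obs, fit) in enumerate(zip(observed, fitted)):
        gap = math.log(max(obs, floor)) - math.log(max(fit, floor))
        if gap >= limit:
            flagged.append(n)
    return flagged

# Made-up numbers: a decaying fit and data with one extra burst at tick 6.
fitted = [100.0, 60.0, 40.0, 30.0, 24.0, 20.0, 17.0, 15.0]
observed = [104.0, 57.0, 43.0, 29.0, 25.0, 19.0, 90.0, 16.0]
print(flag_outliers(observed, fitted))
```

Bursts that a single-spike model cannot explain, such as a second event or an anniversary, would show up as flagged ticks of this kind.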
A3. Reverse engineering • SpikeM provides an intuitive explanation • PDF of parameters over 1,000 memes/hashtags [Figure panels: Meme, Twitter]
A3. Reverse engineering • SpikeM provides an intuitive explanation • PDF of parameters over 1,000 memes/hashtags • Observation 1: the total population N is almost the same [Figure panels: Meme, Twitter]
A3. Reverse engineering • SpikeM provides an intuitive explanation • PDF of parameters over 1,000 memes/hashtags • Observation 2: Strength of first burst (news) is [Figure panels: Meme, Twitter]
A3. Reverse engineering • SpikeM provides an intuitive explanation • PDF of parameters over 1,000 memes/hashtags • Observation 3: daily periodicity with phase shift • Meme: every meme has the same periodicity without lag • Twitter: daily periodicity with more spread (i.e., multiple time zones)
Outline • Motivation • Problem definition • Proposed method • Experiments • Discussion – SpikeM at work • Conclusions
Conclusions • SpikeM has the following advantages: • Unification power: it includes earlier patterns/models • Practicality: it matches real datasets • Parsimony: it requires only 7 parameters • Usefulness: what-if scenarios, outliers, etc.
Acknowledgements • Thanks to Jaewon Yang & Jure Leskovec for the six clusters [WSDM’11] • Funding
Thank you • Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos • Code: http://www.kecl.ntt.co.jp/csl/sirg/people/yasuko/software.html • Email: matsubara.yasukolab.ntt.co.jp