Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models. Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash. Computer Science at Virginia Tech. Introduction: Surveillance. How to estimate and predict flu trends?
Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models • Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash • Computer Science at Virginia Tech
Introduction: Surveillance • How to estimate and predict flu trends? • Surveillance report sources: hospital records, lab surveys, population surveys
Introduction: GFT & Twitter • Estimate flu trends using online electronic sources, such as Google Flu Trends (GFT) and Twitter • Example tweets: • “So cold today, I’m catching cold.” • “I have headache, sore throat, I can’t go to school today.” • “My nose is totally congested, I have a hard time understanding what I’m saying.”
Outline • Observations • HFSTM Model • Inference • Experiments • Conclusion • Future work
Observation 1: States • There are different states in an infection cycle. • SEIR model: 1. Susceptible 2. Exposed 3. Infected 4. Recovered
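To make the four compartments concrete, here is a minimal discrete-time SEIR sketch. The rates `beta`, `sigma`, and `gamma`, the step count, and the population numbers are illustrative assumptions, not values from the talk.

```python
def simulate_seir(s, e, i, r, beta, sigma, gamma, steps):
    """Discrete-time SEIR simulation; returns a list of (S, E, I, R) tuples."""
    n = s + e + i + r
    history = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i / n   # S -> E: contact with infected people
        new_infected = sigma * e         # E -> I: end of the incubation period
        new_recovered = gamma * i        # I -> R: recovery
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical population of 1000 with 10 initially infected people.
history = simulate_seir(s=990.0, e=0.0, i=10.0, r=0.0,
                        beta=0.5, sigma=0.2, gamma=0.1, steps=100)
```

Each step only moves people between compartments, so the total population stays constant.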
Observation 2: Ep. & So. (Epidemiology & Social Media) Gap • In epidemiology, infection cases drop exponentially (Hethcote 2000) • In social media, keyword mentions drop following a power law (Matsubara 2012)
HFSTM Model • Hidden Flu-State from Tweet Model (HFSTM) • Each word (w) in a tweet (O_i) can be generated by: • A background topic • Non-flu-related topics • State-related topics • Model variables (diagram legend): latent state; initial probability; transition switch; transition probability; binary non-flu-related switch; binary background switch; word distribution
HFSTM Model • Generating tweets: 1. Generate the state for a tweet. State: [S, E, I] 2. Generate the topic for each word. Topic: [Background, Non-flu, State] • Example tweets by state: • S: “This restaurant is really good” • E: “The movie was good but it was freezing” • I: “I think I have flu”
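The two-level generative story (a state per tweet, then a topic per word) can be sketched as follows. This is a simplified illustration, not the authors' code: every probability and word list is made up, and the transition switch from the model legend is omitted, so each next state is drawn directly from a transition row.

```python
import random

STATES = ["S", "E", "I"]

# All numbers and word lists below are invented for illustration only.
INITIAL = {"S": 0.8, "E": 0.15, "I": 0.05}
TRANSITION = {
    "S": {"S": 0.8, "E": 0.15, "I": 0.05},
    "E": {"S": 0.2, "E": 0.5, "I": 0.3},
    "I": {"S": 0.3, "E": 0.1, "I": 0.6},
}
BACKGROUND_WORDS = ["this", "is", "i", "the", "was"]
NON_FLU_WORDS = ["restaurant", "movie", "good", "school"]
STATE_WORDS = {
    "S": ["great", "fun", "happy"],
    "E": ["cold", "freezing", "tired"],
    "I": ["flu", "fever", "sick"],
}
P_BACKGROUND = 0.4  # chance that the background switch fires for a word
P_NON_FLU = 0.5     # chance of a non-flu topic, given it is not background


def draw(dist, rng):
    """Sample a key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_tweet(state, length, rng):
    """Pick a topic for each word, then a word from that topic."""
    words = []
    for _ in range(length):
        if rng.random() < P_BACKGROUND:
            words.append(rng.choice(BACKGROUND_WORDS))
        elif rng.random() < P_NON_FLU:
            words.append(rng.choice(NON_FLU_WORDS))
        else:
            words.append(rng.choice(STATE_WORDS[state]))
    return words


def generate_user(num_tweets, tweet_length, seed=0):
    """Generate (state, words) pairs for one user's tweet sequence."""
    rng = random.Random(seed)
    state = draw(INITIAL, rng)
    tweets = []
    for _ in range(num_tweets):
        tweets.append((state, generate_tweet(state, tweet_length, rng)))
        state = draw(TRANSITION[state], rng)
    return tweets
```

Calling `generate_user(5, 6)` returns five tweets of six words each, every one tagged with the hidden state that produced it.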
Inference • EM-based algorithm: HFSTM-FIT • E-step (forward, backward, and posterior state probabilities): • $A_t(i) = P(O_1, O_2, \ldots, O_t, S_t = i)$ • $B_t(i) = P(O_{t+1}, \ldots, O_{T_u} \mid S_t = i)$ • $\gamma_t(i) = P(S_t = i \mid O_u)$ • M-step: re-estimate the model parameters, such as state transition probabilities and topic distributions.
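The E-step quantities have the same form as the forward and backward variables of a standard hidden Markov model. The sketch below computes them for a plain HMM with per-step scaling. It is not HFSTM-FIT itself: the per-tweet likelihoods `emission_probs[t][i]`, which stand in for P(O_t | S_t = i), are assumed to be given.

```python
import math


def forward_backward(initial, transition, emission_probs):
    """Scaled forward-backward pass for a standard HMM.

    initial[i]           = P(S_1 = i)
    transition[i][j]     = P(S_{t+1} = j | S_t = i)
    emission_probs[t][i] = P(O_t | S_t = i)
    Returns (gamma, log_likelihood), where gamma[t][i] = P(S_t = i | O_1..O_T).
    """
    num_steps = len(emission_probs)
    states = range(len(initial))

    # Forward pass: alpha[t] is A_t rescaled to sum to one.
    alpha, scale = [], []
    for t in range(num_steps):
        if t == 0:
            row = [initial[i] * emission_probs[0][i] for i in states]
        else:
            row = [
                sum(alpha[t - 1][j] * transition[j][i] for j in states)
                * emission_probs[t][i]
                for i in states
            ]
        norm = sum(row)
        scale.append(norm)
        alpha.append([value / norm for value in row])

    # Backward pass: beta[t] is B_t divided by the same scale factors.
    beta = [[1.0 for _ in states] for _ in range(num_steps)]
    for t in range(num_steps - 2, -1, -1):
        for i in states:
            beta[t][i] = sum(
                transition[i][j] * emission_probs[t + 1][j] * beta[t + 1][j]
                for j in states
            ) / scale[t + 1]

    # Posterior state probabilities.
    gamma = []
    for t in range(num_steps):
        row = [alpha[t][i] * beta[t][i] for i in states]
        norm = sum(row)
        gamma.append([value / norm for value in row])

    log_likelihood = sum(math.log(c) for c in scale)
    return gamma, log_likelihood
```

Scaling each forward row keeps the numbers from underflowing on long tweet sequences, and the product of the scale factors recovers the likelihood of the whole sequence.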
Vocabulary & Dataset • Vocabulary (230 words): • Flu-related keyword list by Chakraborty (SDM 2014) • Extra state-related keyword list • Dataset (34,000 tweets): • Identify infected users and collect their tweets • Train on data from Jun 20, 2013 to Aug 06, 2013 • Test on two time periods: • Dec 01, 2012 to Jul 08, 2013 • Nov 10, 2013 to Jan 26, 2014
Learned word distributions • The most probable words learned in each state • S: probably healthy • E: having symptoms • I: definitely sick
Learned state transition • Transition probabilities • Transition in real tweets • Learned by HFSTM: not directly flu-related, yet correctly identified
Flu trend fitting • Ground truth: • Data from the Pan American Health Organization (PAHO) • Algorithms: • Baseline: • Count the number of keywords weekly as features, and regress to the ground-truth curve. • Google Flu Trends: • Take the Google Flu Trends data as input, and regress to the PAHO curve. • HFSTM: • Distinguish the different states of keywords, and use only the number of keywords in the I state. Again, regress to the PAHO curve.
Flu trend fitting • Linear regression to the case count reported by PAHO (the ground-truth)
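The regression step can be illustrated with a single-feature least-squares fit, for example weekly I-state keyword counts against weekly case counts. The numbers below are invented for illustration and are neither PAHO data nor results from the talk. The talk's baseline may use several keyword counts as features, whereas this sketch uses one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the fitted line and the targets."""
    return sum((slope * x + intercept - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical weekly numbers, invented for illustration only.
weekly_i_state_keywords = [3, 5, 9, 14, 11, 6]
weekly_case_counts = [40, 62, 101, 150, 118, 70]

slope, intercept = fit_line(weekly_i_state_keywords, weekly_case_counts)
error = mean_squared_error(weekly_i_state_keywords, weekly_case_counts,
                           slope, intercept)
```

The same `mean_squared_error` can be computed for each method's features to compare how closely each one tracks the ground-truth curve.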
Bridging the Ep. & So. Gap • Select a flu-related keyword • Plot its number of mentions over time • Identify the falling part of the curve • Fit the falling part with an exponential function and with a power law.
Bridging the Ep. & So. Gap • Fitting the falling part of the curve with power-law and exponential functions
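One simple way to compare the two decay shapes is to fit each by least squares in log space and then compare the errors on the original scale. The talk does not say how its fits were computed, so the procedure and the function names below are assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compare_decay_fits(times, counts):
    """Fit exponential and power-law decays; times and counts must be > 0."""
    log_t = [math.log(t) for t in times]
    log_y = [math.log(y) for y in counts]

    # Exponential decay: log y is linear in t.
    exp_slope, exp_icpt = fit_line(times, log_y)
    # Power-law decay: log y is linear in log t.
    pow_slope, pow_icpt = fit_line(log_t, log_y)

    # Sum of squared errors on the original (non-log) scale.
    exp_sse = sum((math.exp(exp_icpt + exp_slope * t) - y) ** 2
                  for t, y in zip(times, counts))
    pow_sse = sum((math.exp(pow_icpt + pow_slope * math.log(t)) - y) ** 2
                  for t, y in zip(times, counts))
    return {
        "exponential": {"rate": -exp_slope, "sse": exp_sse},
        "power_law": {"exponent": -pow_slope, "sse": pow_sse},
    }
```

Here `times` would be the days since the peak of the keyword's mentions (starting at 1, since the power law needs positive times) and `counts` the mentions on those days. The lower `sse` indicates which shape fits the falling part better.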
Conclusions • HFSTM: • infers biological states for Twitter users. • learns word distributions and state transitions. • helps predict the flu trend. • reconciles the social contagion activity profile with standard epidemiological models.
Future work • A possible issue with HFSTM: • It suffers from a large, noisy vocabulary. • Semi-supervision for improvement: • Introduce weak supervision into HFSTM.
Code at: http://people.cs.vt.edu/~liangzhe • Questions? • Naren Ramakrishnan, Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, B. Aditya Prakash