Tracking the Flu Pandemic by Monitoring the Social Web. Vasileios Lampos and Nello Cristianini. Jedsada Chartree, 04/11/11.
Introduction
• Growing interest in monitoring disease outbreaks.
• Growth in Twitter use:
- February 2010: 50 million tweets/day
- June 2010: 65 million tweets/day (750 tweets/s)
- 190 million users (Source: http://en.wikipedia.org/wiki/Twitter)
- 5.5 million users in the UK (2009)
Introduction
• Official national statistics on flu are reported with a delay of 1 to 2 weeks.
• Twitter can reveal the current situation.
Methodology
• Data
1. Official health reports from the Health Protection Agency (HPA), UK.
2. Twitter, UK
- Daily average of 160,000 tweets (24 weeks, from 06/22/2009 to 12/06/2009)
- Twitter geolocation (geographical coordinates).
Methodology
• Data
• Region A = Central England & Wales
• Region B = South England
• Region C = North England
• Region D = England & Wales
• Region E = Wales & Northern Ireland
• Sources: RCGP, Qsur
• RCGP = Royal College of General Practitioners
• Qsur = QSurveillance, University of Nottingham and Egton Medical Information Systems
Methodology
[Flow diagram with the stages: HPA flu rates, Twitter data, flu-score, correlation coefficient]
Methodology
• Flu-score
- k = total number of markers
- n = total number of tweets for one day
- i ∈ [1, k], j ∈ [1, n]
- M = set of textual markers = {m_i}
- T = daily set of tweets = {t_j}
- [Equation: the flu-score of a tweet]
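The definitions on this slide can be illustrated with a short Python sketch. It assumes that the flu-score of a tweet is the fraction of the k markers that occur in it, and that the daily flu-score is the mean of the tweet scores over the n tweets of that day. The marker list, the sample tweets, and the function names are hypothetical and do not come from the slides.

```python
def tweet_flu_score(tweet, markers):
    """Fraction of the markers that occur in the tweet (assumed definition).

    Matching is a plain case-insensitive substring test, so a marker such
    as "flu" would also match "influence"; a real system would tokenize.
    """
    text = tweet.lower()
    hits = sum(1 for m in markers if m.lower() in text)
    return hits / len(markers)


def daily_flu_score(tweets, markers):
    """Mean tweet flu-score over one day's tweets (0.0 for an empty day)."""
    if not tweets:
        return 0.0
    return sum(tweet_flu_score(t, markers) for t in tweets) / len(tweets)


# Hypothetical markers and tweets, for illustration only.
markers = ["fever", "sore throat", "headache", "flu"]
tweets = [
    "Stuck in bed with a fever and a sore throat",
    "Lovely weather in Bristol today",
]
score = daily_flu_score(tweets, markers)
```

Here the first tweet contains two of the four markers and the second contains none, so the tweet scores are 0.5 and 0.0.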
Results
Flu rates from the Health Protection Agency (HPA)
Results
Twitter’s flu-scores for regions A-E (weeks 26 to 49, 2009)
Results
Correlation coefficients between Twitter’s flu-scores and HPA’s flu rates
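As a rough illustration of how such a coefficient can be computed, the sketch below implements the Pearson correlation between two equally long series. It assumes the flu-scores and the HPA rates have already been aligned to the same time points; the slides do not say how that alignment was done. The numbers are made up and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only (not data from the study).
flu_scores = [0.10, 0.20, 0.40, 0.30]
hpa_rates = [12.0, 20.0, 41.0, 33.0]
r = pearson(flu_scores, hpa_rates)
```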
Results
Twitter’s flu-score and HPA rates for region D (England & Wales)
Methodology
• Learning HPA’s flu rates from the Twitter flu-score
- k = total number of markers
- n = total number of tweets for one day
- i ∈ [1, k], j ∈ [1, n]
- M = set of textual markers = {m_i}
- T = daily set of tweets
- w = weights
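One way to picture this learning step is as an ordinary least-squares fit. The sketch below assumes that each day is described by one sub-score per marker, and that the weights w are chosen so that the weighted sum of sub-scores approximates the HPA rate. It omits an intercept and solves the normal equations directly, which is adequate only for small, well-conditioned problems. All names and numbers are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(X, y):
    """Weights minimizing ||X w - y||^2 via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)]
           for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def predict(X, w):
    """Weighted sum of the marker sub-scores for each day."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


# Hypothetical data: rows are days, columns are marker sub-scores.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 5.0, 8.0, 11.0]  # constructed as 3*x0 + 5*x1
w = least_squares(X, y)
```

Because the toy targets were built from the weights 3 and 5, the fit recovers those values up to rounding.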
Results
Linear regression using the markers
Methodology
• Automatic extraction of influenza-like illness (ILI) textual markers
1. Creating candidate markers from:
- Encyclopedic references
- Informal references
2. Forming the flu-subscore time series for the candidate markers.
- Ranking the weights by applying the LASSO method.
Methodology
• LASSO
- t = shrinkage parameter
- w = the sparse solution vector
- w^(ls) = the least-squares estimates for the regression problem
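To show how LASSO produces a sparse weight vector, here is a minimal coordinate-descent sketch. Note the swap: the slide describes the constrained form with a shrinkage parameter t, whereas this sketch uses the equivalent penalized form, minimizing (1/2n)·||y − Xw||² + lam·||w||₁, where the penalty lam is an assumption of this example. The data are made up so that only the first column truly drives the target.

```python
def soft_threshold(z, g):
    """Shrink z toward zero by g; values within [-g, g] become exactly 0."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(k)]
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0:
                continue  # an all-zero column keeps weight 0
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(
                row[j] * (yi - sum(w[m] * row[m] for m in range(k) if m != j))
                for row, yi in zip(X, y)
            ) / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


# Hypothetical design with mutually orthogonal columns.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Target is 2 * column 0 plus a small contribution (0.05) from column 1.
y = [2.05, 1.95, -1.95, -2.05]
w = lasso_cd(X, y, lam=0.1)
```

With this penalty the weak second column and the irrelevant third column receive a weight of exactly zero, while the first weight is shrunk from 2 to about 1.9. This zeroing of weak candidates is what makes the method usable for selecting markers.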
Methodology
Stemmed markers extracted by applying LASSO regionally
Results
Linear regression using the markers on the test sets after performing LASSO
Methodology
Stemmed markers extracted by applying LASSO on the aggregated data
Conclusion
• Twitter messages can be used to track the flu outbreak in the UK.
• The flu-score correlates highly with the HPA flu rates (greater than 95%).
Reference
• V. Lampos and N. Cristianini. 2010. Tracking the flu pandemic by monitoring the social web. International Workshop on Cognitive Information Processing. 6 pp.