Syndromic Classification of Twitter Messages
Nigel Collier and Son Doan
National Institute of Informatics, Tokyo
collier@nii.ac.jp, sondoan@gmail.com
E-HEALTH, November 2011
[Figure: information sources arranged by time and certainty: sentinel networks, field workers, laboratory reports, rumours, GP reports]
Blog rumour> “I’m sick with a chest infection”
Blog rumour> “Ahh! Really bad throat.”
Blog rumour> “Still getting worse. Staying at home temp is up to 39.5.”
News report> “Mystery illness causes concern.”
News report> “Influenza starts early this year.”
Overview
• Research context
• Method
• Results
• Significance and limitations
Syndromic classification of Twitter messages RESEARCH CONTEXT
Alerting real world events
What signals should we be looking for?
1. Personal event
2. Web microblog response
3. Text mining on unstructured blogs
4. Detecting unusual events
5. Issue alert
Example (seeking medical intervention): “i’ve been waitin at the docs all morning with flu”
[Figure: alert level and news volume over time]
‘See what the world is doing right now’ – microblogs versus newswire
• Newswire:
  • Event based reports
  • Near real time
  • Reporting bias (focus on reader’s concerns)
  • Editorial quality control (low level of noise)
  • Good for health event alerting; unknown for case counting
• Social media microblogs:
  • Personal reports and event based re-reporting
  • Real time
  • Reporting bias (focus on writer’s concerns)
  • Large-scale and independent
  • Little quality control (high level of noise)
  • Unknown for health event alerting; probably good for case counting
Twitter characteristics [1]
• Twitter posts (tweets) are limited to 140 characters
• Low user investment in time and thought for content generation (Java et al. 2007)
• High use of abbreviations and aliases
• Dynamic lexicon of semantic tags (hashtags)
• Very high volume of data:
  • 55 million tweets per day
  • Hundreds of micro-blogs each second for major events (Petrovic et al. 2010)
  • Compared to ~0.1 news reports each second for newswire
  • Surge capacity requires highly efficient algorithms
• High numbers of users
  • 106 million announced at Twitter’s developer conference in 2010
Twitter characteristics [2]
• Typical tweet contents (Nardi et al. 2004)
  • Daily experience (All about me)
  • Share opinions
  • Commentary on events
  • Spam
• Metadata:
  • Geo-tagging
  • Time stamping
  • User profile
• Event reports sometimes ahead of newswire, e.g. Iranian presidential protests, swine flu outbreak reports from CDC, deaths of famous people (Petrovic et al. 2010)
Previous work on online personal signal analysis
• Google Flu Trends (Ginsberg et al. 2009, Valdivia et al. 2010)
• Ushahidi (Okolloh 2009)
• Flutracker (http://flutracker.rhizalabs.com/)
• Twitter earthquake detector (Guy et al. 2010)
• First story detection (Petrovic et al. 2010)
• Maximum story coverage (Saha and Getoor 2009)
• New study:
  • GP consultation correlation for ILI in the UK (Lampos et al. 2010)
Schema development
• Syndromic categories
  • A syndrome is a collection of symptoms (specific and non-specific) that are indicative of a class of diseases;
  • Six syndrome categories were chosen: constitutional, respiratory, gastrointestinal, hemorrhagic, rash and neurological;
  • Syndromes and symptoms were based on those in the BioCaster ontology, developed by experts in computational linguistics, public health, genetics and anthropology.
• Symptom lists were expanded to include informal synonyms found in Twitter data, e.g. ‘stomach ache’, ‘belly ache’, ‘belly pain’, ‘stomach hurt’.
• Case descriptions for each syndrome were then developed with positive and negative examples.
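The informal symptom lists lend themselves to a simple keyword lookup. The following is a minimal sketch only: the function name `candidate_syndromes` and the dictionary layout are hypothetical, and filing the four example phrases under "gastrointestinal" is an assumption made for illustration. It is not the BioCaster ontology or the authors' collection code.

```python
import re

# Hypothetical, partial lexicon. The four phrases are the examples given on
# the slide; assigning them to "gastrointestinal" is an assumption.
SYNDROME_KEYWORDS = {
    "gastrointestinal": ["stomach ache", "belly ache", "belly pain", "stomach hurt"],
}


def candidate_syndromes(tweet, lexicon=SYNDROME_KEYWORDS):
    """Return the syndromes whose informal symptom phrases occur in the tweet."""
    # Lower-case and collapse runs of whitespace so phrase matching is robust.
    text = re.sub(r"\s+", " ", tweet.lower())
    return sorted(
        syndrome
        for syndrome, phrases in lexicon.items()
        if any(phrase in text for phrase in phrases)
    )
```

A keyword hit only marks a tweet as a candidate; deciding whether it is a genuine case report is the job of the annotators and classifiers described next.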
Gold standard data [1]
• Three students annotated 2000 tweets per syndrome as positive or negative;
• Data was sourced from Twitter between 9th and 24th July 2010 using symptom keywords and removing duplicate messages;
• We then chose messages where all 3 annotators agreed on the classification to train the classifiers.
• Pairwise kappa ranged from 0.42 (Neurological) to 0.92 (Hemorrhagic).
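The agreement step can be illustrated with a short sketch. It assumes that "pairwise kappa" means Cohen's kappa computed for each pair of annotators, which the slides do not state; the function names and the "pos"/"neg" labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(annotations):
    """Map each annotator pair to its kappa; annotations maps name -> labels."""
    return {
        (p, q): cohen_kappa(annotations[p], annotations[q])
        for p, q in combinations(sorted(annotations), 2)
    }


def unanimous(messages, annotations):
    """Keep (message, label) pairs on which every annotator agreed."""
    rows = zip(messages, *annotations.values())
    return [(msg, labels[0]) for msg, *labels in rows if len(set(labels)) == 1]
```

For example, `unanimous(["m1", "m2"], {"A": ["pos", "neg"], "B": ["pos", "pos"]})` keeps only `("m1", "pos")`, mirroring the rule that only fully agreed messages were used for training.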
Gold standard data [2]
• Messages were tagged positive only when the subject was the user or a close family member;
• Hypothetical reports are negative;
• User opinions about other people are negative;
• Reported conditions must fall within one week of the posting time;
• Reported syndromes can belong to more than one category.
Features and Models
• Features: bag of words including hashtags but excluding links
• Models: Naïve Bayes (McCallum’s Rainbow), SVM (SVM Light) with polynomial kernel (p=1,2,3) and radial basis function (RBF) kernel
• 10-fold cross validation
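The feature extraction and evaluation split can be sketched as follows. This is an illustration under assumptions, not the Rainbow or SVM Light pipeline used in the study: the regular expressions, the lower-casing, and the unshuffled round-robin fold assignment are all choices made here for brevity.

```python
import re
from collections import Counter

# Assumed patterns: links start with http(s):// or www.; a token is a run of
# word characters with an optional leading '#', so hashtags stay intact.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"#?\w+")


def bag_of_words(tweet):
    """Count tokens in a tweet, keeping hashtags and dropping links."""
    text = URL_RE.sub(" ", tweet.lower())
    return Counter(TOKEN_RE.findall(text))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross validation.

    Fold i holds out every k-th item starting at position i. Real
    experiments would normally shuffle or stratify the data first.
    """
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test
```

Each item appears in exactly one test fold, so every annotated message is scored once by a model that never saw it during training.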
Classifying Twitter messages for syndromes
• SVM with degree 1 kernel performed the best;
• Precision ranged from 82.0 to 83.8 (SVM degree 1);
• Recall ranged from 58.3 to 96.2 (SVM degree 1);
• Performance moderately correlates with P/N ratio;
• Noticeably weak performance for Hemorrhagic and Gastrointestinal where positive data was scarce and Kappa was lower.
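For reference, precision and recall for one syndrome classifier can be computed as in the sketch below. The function name and the "pos"/"neg" label strings are hypothetical; the slides report the scores but not the code that produced them.

```python
def precision_recall(gold, predicted, positive="pos"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    # Guard against empty denominators when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A classifier that labels few messages positive can score high precision with low recall, which is one way scarce positive data can show up in results like those above.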
Difficult cases
• Metaphoric symptoms
  • Cabin fever setting in right now.
• Wide range of common meanings
  • Exhausted after days of housework.
• Interrogative sentences
  • wonder how long u get off work with swine flu?
• Hypothetical sentences
  • I can ignore this sore throat no longer. And, um, maybe I should have gotten that H1N1 vaccine.
  • It's a mask I use with spray paint, but if I did have swine flu, why would I need a mask?
• Others
  • Too much lemonade. My throat is burning.
BioCaster: early alerting for public health events
[Figure: BioCaster system overview. Components shown: ontology browsing; email/GeoRSS alerting; watchboard, etc.; trend graphs; event database search; event maps; up to date news in multiple languages; event alerts; real time Twitter analysis. GHSAG partners: WHO, IT, JP, CA, US, UK, FR, DE.]
Syndromic classification of Twitter messages Significance and limitations
Discussion
• Twitter offers unique challenges and opportunities for epidemic surveillance;
• Very challenging environment for automated classification, but evidence from several studies points to close correlation between ILI keywords and laboratory data. No studies yet on correlating other syndromes.
• The six classifiers are available as part of the experimental DIZIE project online at the BioCaster portal.
• Future work will look into change point detection and integrating social media reports with evidence from news events for situational awareness.
Funding: 2010 NII internship grant and a grand challenge grant
Please see: http://born.nii.ac.jp
The landscape of online health event monitoring
• GPHIN (Ginsberg et al. 2009)
• MiTaP (Damianos et al. 2002)
• Argus (Wilson et al. 2008)
• HealthMap (Friefeld et al. 2008)
• EpiSpider (Tolentino et al. 2007)
• BioCaster (Collier et al. 2008)
• Medisys (Yangarber et al. 2007)
• ProMed-mail (Madoff 2004)
• Ushahidi (Okolloh et al. 2009)
• Twitter Earthquake Detector (Guy et al. 2010)
• Google Flu Trends (Ginsberg et al. 2009)
Discussion [2]
• Limitations of Twitter
  • Representation of population by country, city and age group
[Figures: Twitter user age distribution; new Twitter users by country; Twitter usage by major city. Source: sysomos.com]