Leveraging Machine Learning Based Approaches to Assess HPV Vaccination Sentiment with Twitter

Jingcheng Du, B.S., Jun Xu, Ph.D., Hsingyi Song, MPH, Cui Tao, Ph.D.
Ontology Research Group, School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth)
IRB Number: HSC-SBMI-16-0291

Social Media for Public Health
• An important medium for the public, patients and health professionals to communicate about health-related issues [1]
• 90% of respondents (age 18 to 24) said they would trust medical information shared by others on their social media networks [2]
• Studies show that information shared on social media can alter vaccine acceptance and decision-making
Sources:
1. Moorhead SA, Hazlett DE, Harrison L, et al. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 2013;15:e85.
2. https://getreferralmd.com/2013/09/healthcare-social-media-statistics/

HPV and HPV vaccine
• Nearly all sexually active men and women get HPV at some point in their lives
• HPV leads to 25-30% of oral and throat cancers, 90% of anal cancers, 40% of penile cancers, and nearly 100% of cervical cancers
• Have not started the HPV vaccine series:
  • 4 out of 10 adolescent girls
  • 6 out of 10 adolescent boys
Sources: https://www.cdc.gov/std/hpv/stdfact-hpv.htm https://www.cdc.gov/media/releases/2015/p0730-hpv.html

Tweets Collection & Annotation
• Tweets collection
  • Training corpus: July 15, 2015 to August 17, 2015; 33,228 tweets collected
  • Prediction corpus: November 2, 2015 to March 28, 2016; 184,214 tweets collected
• Keywords
  • hpv, human papillomavirus, cervarix and gardasil
• Annotation
  • 6,000 tweets randomly sampled from the training corpus
  • Three annotators

Sentiment Classification Overview of the scheme for HPV vaccine sentiment classification on Twitter

Sentiment Distribution • Kappa inter-rater value: 0.851 • Highly unbalanced class distribution Sentiment distribution in gold standard
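The slide reports a kappa of 0.851 but does not say which kappa statistic was used. As a minimal sketch, assuming Fleiss' kappa (a common choice when there are more than two annotators, as with the three annotators here), agreement could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings: list of per-item label lists, e.g. [["Pos", "Pos", "Neg"], ...].
    Assumption: this is one possible agreement measure; the slide does not
    state which kappa variant produced the reported value.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:  # every rating is the same label
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy annotations from three raters.
toy = [["Positive"] * 3, ["Negative"] * 3]
print(fleiss_kappa(toy))
```

Values near 1 indicate agreement well above chance; values near 0 indicate agreement no better than chance.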

Machine Learning System
• Pre-processing
  • Remove URLs, hashtags and Twitter user names
  • Collapse repeated letters, e.g. “wooooow” -> “woow”
  • Convert the text to lowercase
• Feature extraction
  • Word n-grams: contiguous 1-grams and 2-grams of words
  • Word cluster features: map tweet tokens into 1,000 clusters
  • POS tags: extracted by TweeboParser
• Classification algorithm
  • Support vector machines (SVM) with an RBF kernel
• Evaluation
  • 10-fold cross-validation on the gold standard
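The pre-processing and word n-gram steps listed on this slide can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' code: the regular expressions, the function names, and the choice to drop a hashtag or user name as a whole token (rather than only the leading symbol) are assumptions.

```python
import re

# Assumed patterns; the slide does not specify how each item was matched.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_OR_USER_RE = re.compile(r"[#@]\w+")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated three or more times

def preprocess(tweet):
    """Remove URLs, hashtags and user names, lowercase, collapse repeated letters."""
    text = URL_RE.sub(" ", tweet)
    text = TAG_OR_USER_RE.sub(" ", text)
    text = text.lower()
    text = REPEAT_RE.sub(r"\1\1", text)  # "wooooow" -> "woow"
    return " ".join(text.split())        # normalize whitespace

def word_ngrams(text, max_n=2):
    """Return contiguous word 1-grams up to max_n-grams (1 and 2 by default)."""
    tokens = text.split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


clean = preprocess("Wooooow @user the #HPV vaccine works http://t.co/abc")
print(clean)
print(word_ngrams(clean))
```

The resulting n-gram lists would then be turned into feature vectors, combined with the word cluster and POS tag features, and passed to the SVM; those steps depend on external tools and are not shown here.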

Baseline Model Baseline model: uses n-gram features only and treats all classes equally

Hierarchical Classification Scheme Level 1 Level 2 Level 3 Hierarchical classification scheme for HPV vaccine sentiment classification on Twitter
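The figure for this slide is not reproduced in the transcript, so the exact split at each level is not recoverable from the text. As a rough sketch of how a three-level cascade can work, assume Level 1 separates related from unrelated tweets, Level 2 assigns Positive, Neutral or Negative, and Level 3 refines Negative tweets into subclasses. The class names and the keyword-based stand-in classifiers below are hypothetical; in the presented system each level would be a trained SVM.

```python
def classify_hierarchically(tweet, is_related, polarity, negative_subtype):
    """Route a tweet through three classifiers, one per level of the hierarchy.

    Each argument after `tweet` is a callable standing in for a trained model.
    """
    # Level 1 (assumed): discard tweets unrelated to the HPV vaccine.
    if not is_related(tweet):
        return "Unrelated"
    # Level 2 (assumed): coarse sentiment of related tweets.
    label = polarity(tweet)
    if label != "Negative":
        return label
    # Level 3 (assumed): refine negative tweets into a subclass.
    return negative_subtype(tweet)


# Hypothetical keyword rules, used only to make the sketch runnable.
def toy_is_related(tweet):
    return "vaccine" in tweet or "gardasil" in tweet

def toy_polarity(tweet):
    if "dangerous" in tweet or "expensive" in tweet:
        return "Negative"
    if "protects" in tweet:
        return "Positive"
    return "Neutral"

def toy_negative_subtype(tweet):
    return "NegativeSafety" if "dangerous" in tweet else "NegativeOther"


for text in ["hpv facts", "the vaccine protects", "gardasil is dangerous"]:
    print(text, "->", classify_hierarchically(
        text, toy_is_related, toy_polarity, toy_negative_subtype))
```

One motivation for such a cascade on unbalanced data is that each level faces a smaller, less skewed decision than a single flat classifier over all classes.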

Performance Improvement Performance comparison of the baseline model and hierarchical classification model

Performance improvement through optimization Du, Jingcheng, et al. "Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets." Journal of Biomedical Semantics 8.1 (2017): 9.

Evaluation on Unlabeled Tweets Dataset Evaluation on 500 randomly selected samples from the prediction corpus (184,214 tweets)

Trends for different sentiments (figure annotation: Feb 22, 16)

The association of different days of the week with the relative proportions of tweets containing Negative, Neutral and Positive opinions
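The slide does not spell out how the relative proportions were computed. One plausible reading, sketched below, is the share of each sentiment among all tweets posted on a given weekday. The function name and the sample records are hypothetical and not drawn from the study's data.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def sentiment_share_by_weekday(labeled_tweets):
    """Map each weekday to {sentiment: share of that weekday's tweets}.

    labeled_tweets: iterable of (posting_date, predicted_sentiment) pairs.
    Assumption: proportions are normalized within each weekday.
    """
    counts = defaultdict(Counter)
    for day, label in labeled_tweets:
        counts[WEEKDAYS[day.weekday()]][label] += 1

    shares = {}
    for weekday, counter in counts.items():
        total = sum(counter.values())
        shares[weekday] = {label: n / total for label, n in counter.items()}
    return shares


# Hypothetical records; Feb 22, 2016 was a Monday.
sample = [
    (date(2016, 2, 22), "Positive"),
    (date(2016, 2, 22), "Negative"),
    (date(2016, 2, 22), "Positive"),
    (date(2016, 2, 23), "Neutral"),
]
print(sentiment_share_by_weekday(sample))
```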

Summary
• Contributions:
  • Apply hierarchical classification to improve prediction performance on a highly unbalanced tweet dataset
  • Identify the interaction of real-world outcomes with Twitter discussion and discover the association of HPV-related tweeting behaviors with different opinions
• Limitations:
  • Limited keywords, limited features
  • Training and prediction corpora are separate
  • Poor performance on classes with very limited samples
• Ongoing projects:
  • Uncover discussion topics using topic models
  • Evaluate deep learning (e.g., CNN) on this task
  • Predict demographic information for these social media users

Acknowledgments
• UTHealth SBMI
  • Dr. Cui Tao’s research group
  • Dr. Hua Xu’s research group
• UTHealth CPRIT Fellowship
  • Dr. Roberta Ness
  • Dr. Patricia Dolan Mullen
  • Dr. David Loose
  • All the fellows
• Grants:
  • National Institutes of Health under Award Number R01LM011829
  • National Institutes of Health under Award Number R01AI130460
  • Cancer Prevention and Research Institute of Texas grant # RP160015

Contact: cui.tao@uth.tmc.edu, jingcheng.du@uth.tmc.edu, @jingchengdu
Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Cancer Prevention and Research Institute of Texas.

Classification Definition Detailed definition of different classes
