Leveraging Machine Learning Based Approaches to Assess HPV Vaccination Sentiment with Twitter

This study explores the use of machine learning to assess sentiment towards HPV vaccination using Twitter data. A hierarchical classification approach improves prediction performance on the highly unbalanced dataset, and the analysis identifies associations between real-world outcomes and social media discussion. Limitations include a limited set of keywords and features, separate training and prediction corpora, and poor performance on classes with few samples. Future work includes uncovering discussion topics using topic modeling, evaluating deep learning approaches, and predicting demographic information.

Presentation Transcript


  1. Leveraging Machine Learning Based Approaches to Assess HPV Vaccination Sentiment with Twitter Jingcheng Du, B.S., Jun Xu, Ph.D., Hsingyi Song, MPH, Cui Tao, Ph.D. Ontology Research Group School of Biomedical Informatics University of Texas Health Science Center at Houston (UTHealth) IRB Number: HSC-SBMI-16-0291

  2. Social Media for Public Health • An important medium for the public, patients and health professionals to communicate about health-related issues • 90% of respondents (age 18 to 24) said they would trust medical information shared by others on their social media networks • Studies show that information shared on social media can alter vaccine acceptance and decision-making 1. Moorhead SA, Hazlett DE, Harrison L, et al. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 2013;15:e85. 2. https://getreferralmd.com/2013/09/healthcare-social-media-statistics/

  3. HPV and HPV vaccine • Nearly all sexually active men and women get HPV at some point in their lives • HPV leads to 25-30% of oral and throat cancers; 90% of anal cancers; 40% of penile cancers; nearly 100% of cervical cancers • Have not started the HPV vaccine series • 4 out of 10 adolescent girls • 6 out of 10 adolescent boys https://www.cdc.gov/std/hpv/stdfact-hpv.htm https://www.cdc.gov/media/releases/2015/p0730-hpv.html

  4. Tweets Collection & Annotation • Tweet collection • Training corpus: July 15, 2015 to August 17, 2015 • 33,228 tweets collected • Prediction corpus: November 2, 2015 to March 28, 2016 • 184,214 tweets collected • Keywords • hpv, human papillomavirus, cervarix and gardasil • Annotation • 6,000 tweets randomly sampled from the training corpus • Three annotators

  5. Sentiment Classification Overview of the scheme for HPV vaccine sentiment classification on Twitter

  6. Sentiment Distribution • Kappa inter-rater value: 0.851 • Highly unbalanced class distribution Sentiment distribution in gold standard
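The slide reports a kappa value without naming the variant. As an illustration only, the sketch below computes Fleiss' kappa, one common choice when more than two annotators label every item. The choice of Fleiss' kappa, the function name `fleiss_kappa`, and the toy labels are assumptions, not details taken from the presentation.

```python
from typing import List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings[i] holds the labels assigned to item i, one per rater.
    Assumes at least two raters and at least two distinct labels overall.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories: List[str] = sorted({label for item in ratings for label in item})

    # counts[i][j] = number of raters who gave item i the j-th category
    counts = [[list(item).count(c) for c in categories] for item in ratings]

    # Share of all assignments that went to each category
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    # Observed agreement per item
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Toy data (hypothetical): four tweets, three annotators each
toy_ratings = [
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "pos"],
    ["neg", "neg", "pos"],
]
toy_kappa = fleiss_kappa(toy_ratings)
```

A value near 1 indicates near-perfect agreement, and a value near 0 indicates agreement no better than chance.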

  7. Machine Learning System • Pre-processing • Remove URLs, hashtags and Twitter user names • Remove duplicate letters, “wooooow” -> “woow” • Convert text to lowercase • Feature extraction • Word n-grams: contiguous 1- and 2-grams of words • Word cluster features: map tweet tokens into 1000 clusters • POS tags: extracted by TweeboParser • Classification algorithm • Support vector machines (SVM), RBF kernel • Evaluation • 10-fold cross-validation on gold standard
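The pre-processing and word n-gram steps listed on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the regular expressions, whitespace tokenization, and the function names `preprocess` and `word_ngrams` are all hypothetical choices. Word clusters, POS tagging and the SVM itself are not shown.

```python
import re
from typing import List, Sequence

# Assumed patterns; the presentation does not specify the exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Three or more repeats of the same letter are squeezed down to two
REPEAT_RE = re.compile(r"([A-Za-z])\1{2,}")


def preprocess(tweet: str) -> str:
    """Strip URLs, user names and hashtags, squeeze repeated letters, lowercase."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)   # "wooooow" -> "woow"
    text = text.lower()
    return " ".join(text.split())         # collapse leftover whitespace


def word_ngrams(text: str, n_values: Sequence[int] = (1, 2)) -> List[str]:
    """Contiguous word n-grams (1- and 2-grams by default) from whitespace tokens."""
    tokens = text.split()
    grams: List[str] = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


cleaned = preprocess("Wooooow @cdc says #HPV vaccine works http://t.co/abc")
features = word_ngrams(cleaned)
```

In a full system these n-grams would typically be turned into a sparse feature vector and combined with the other feature types before being passed to the classifier.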

  8. Baseline Model Baseline model: uses n-gram features only and treats all classes equally

  9. Hierarchical Classification Scheme Hierarchical classification scheme for HPV vaccine sentiment classification on Twitter (figure with three levels: Level 1, Level 2, Level 3)
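The idea of a hierarchical scheme is that a tweet is classified level by level, and a finer-grained classifier is consulted only for tweets that reach its parent class. The slide's figure is not reproduced in the transcript, so the sketch below uses placeholder labels and toy keyword rules in place of trained models. The class `HierarchicalClassifier`, the label names, and the rules are all hypothetical.

```python
from typing import Callable, Dict, List

# A classifier here is any function from text to a label.
Classifier = Callable[[str], str]


class HierarchicalClassifier:
    """Routes an input through a tree of classifiers, one level at a time."""

    def __init__(self, root: Classifier, children: Dict[str, Classifier]):
        self.root = root
        # Maps a predicted label to the classifier that refines it
        self.children = children

    def predict(self, text: str) -> List[str]:
        label = self.root(text)
        path = [label]
        # Descend while the current label has a refining classifier
        while label in self.children:
            label = self.children[label](text)
            path.append(label)
        return path


# Toy stand-ins for trained models (placeholder labels and keyword rules)
def level1(text: str) -> str:
    return "related" if "vaccine" in text else "unrelated"


def level2(text: str) -> str:
    if "unsafe" in text or "useless" in text:
        return "negative"
    return "positive" if "protect" in text else "neutral"


def level3(text: str) -> str:
    return "negative_safety" if "unsafe" in text else "negative_other"


model = HierarchicalClassifier(
    root=level1,
    children={"related": level2, "negative": level3},
)
example_path = model.predict("the hpv vaccine is unsafe")
```

One plausible reason this helps on unbalanced data is that each lower-level classifier is trained only on the examples belonging to its parent class, so rare subclasses are no longer competing against the large majority classes.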

  10. Performance Improvement Performance comparison of the baseline model and hierarchical classification model

  11. Performance improvement through optimization Du, Jingcheng, et al. "Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets." Journal of Biomedical Semantics 8.1 (2017): 9.

  12. Evaluation on Unlabeled Tweets Dataset Evaluation on 500 randomly selected samples from the prediction corpus of 184,214 tweets

  13. Trends for different sentiments (figure; date marked on the plot: Feb 22, 16)

  15. The association of different days of the week with the relative proportions of tweets containing Negative, Neutral and Positive opinions
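The relative proportions per weekday can be computed from predicted labels with a simple group-and-normalize step. The sketch below assumes each tweet is available as a (date, predicted label) pair. The function name `sentiment_share_by_weekday` and the sample data are hypothetical and carry no results from the study.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def sentiment_share_by_weekday(
    tweets: Iterable[Tuple[date, str]],
) -> Dict[str, Dict[str, float]]:
    """For each weekday, return the fraction of tweets carrying each label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for day, label in tweets:
        counts[WEEKDAYS[day.weekday()]][label] += 1

    shares: Dict[str, Dict[str, float]] = {}
    for weekday in WEEKDAYS:
        total = sum(counts[weekday].values())
        if total:  # skip weekdays with no tweets
            shares[weekday] = {
                label: n / total for label, n in counts[weekday].items()
            }
    return shares


# Made-up sample: three tweets on Mondays, one on a Tuesday
sample = [
    (date(2015, 11, 2), "positive"),
    (date(2015, 11, 2), "negative"),
    (date(2015, 11, 9), "positive"),
    (date(2015, 11, 3), "neutral"),
]
weekday_shares = sentiment_share_by_weekday(sample)
```

Normalizing within each weekday, rather than across the whole corpus, keeps days with high tweet volume from dominating the comparison.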

  16. Summary • Contributions: • Apply hierarchical classification to improve prediction performance on a highly unbalanced tweet dataset • Identify the interaction of real-world outcomes with Twitter discussion and discover the association of HPV-related tweeting behaviors with different opinions • Limitations: • Limited keywords, limited features • Training and prediction corpora are separate • Poor performance on classes with very few samples • Ongoing projects: • Uncover discussion topics using topic models • Evaluate deep learning (e.g., CNN) on this task • Predict demographic information for these social media users

  19. Acknowledgments • UTHealth SBMI • Dr. Cui Tao’s research group • Dr. Hua Xu’s research group • UTHealth CPRIT Fellowship • Dr. Roberta Ness • Dr. Patricia Dolan Mullen • Dr. David Loose • All the fellows • Grants: • National Institutes of Health under Award Number R01LM011829 • National Institutes of Health under Award Number R01AI130460 • Cancer Prevention and Research Institute of Texas grant # RP160015

  20. cui.tao@uth.tmc.edu jingcheng.du@uth.tmc.edu @jingchengdu Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Cancer Prevention and Research Institute of Texas.

  21. Classification Definition Detailed definition of different classes