1 / 27

SentiStrength: Sentiment Strength Detection in MySpace and Twitter

Virtual Knowledge Studio (VKS). Information Studies. SentiStrength: Sentiment Strength Detection in MySpace and Twitter. Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK. SentiStrength Objective.

ronat
Download Presentation

SentiStrength: Sentiment Strength Detection in MySpace and Twitter

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Virtual Knowledge Studio (VKS) Information Studies SentiStrength: Sentiment Strength Detection in MySpace and Twitter Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK

  2. SentiStrength Objective • Detect positive and negative sentiment strength in short informal text • Develop workarounds for lack of standard grammar and spelling • Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!) • Classify simultaneously as positive 1-5 AND negative 1-5 sentiment • Apply to MySpace comments and social issues

  3. SentiStrength Algorithm - Core • List of 890 positive and negative sentiment terms and strengths (1 to 5), e.g. • ache = -2, dislike = -3, hate=-4, excruciating -5 • encourage = 2, coolest = 3, lover = 4 • Sentiment strength is highest in sentence; or highest sentence if multiple sentences

  4. Examples positive, negative -2 1, -2 • My legs ache. • You are the coolest. • I hate Paul but encourage him. 3 3, -1 2, -4 -4 2

  5. Term Strength Optimisation • Term strengths (e.g., ache = -2) initially fixed by human coder • Term strengths optimised on training set with 10-fold cross-validation • Adjust term strengths to give best training set results then evaluate on test set • E.g., training set: “My legs ache”: coder sentiment = 1,-3 => adjust sentiment of “ache” from -2 to -3.

  6. SentiStrength Algorithm -Extra • Spelling correction for repeated letters • Helllllo -> Hello (emphasis: llll) • Tagging approach used • (see next slide) • Extra heuristics • Emphasis acts to enhance + or – emotion • Emotion words ignored in questions • Take strongest positive or negative expression in whole comment • Booster words (e.g., very, some)

  7. Tagging • HIIIIII MY MATE!!!!!!!! <w equiv="HI" em="IIIII">HIIIIII</w> <w>MY</w> <w>MATE</w> <p equiv="!" em="!!!!!!!">!!!!!!!!</p> • HI MY MATE! 3 2 mate = 2 Overall 3, -1

  8. Experiments • Development data = 2600 MySpace comments coded by 1 coder • Test data = 1041 MySpace comments coded by 3 independent coders • Comparison against a range of standard machine learning algorithms

  9. Inter-coder agreement Krippendorff’s inter-coder weighted alpha = 0.5743 for positive and 0.5634 for negative sentiment Only moderate agreement between coders but it is a hard 5-category task

  10. Machine learning methods +ve

  11. Machine learning methods -ve

  12. Results:+ve sentiment strength

  13. Results:-ve sentiment strength

  14. SentiStrength Components

  15. Example differences/errors • THINK 4 THE ADD • Computer (1,-1), Human (2,-1) • 0MG 0MG 0MG 0MG 0MG 0MG 0MG 0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!! • Computer (2,-1), Human (5,-1)

  16. Selected variations tested

  17. Application - Evidence of emotion homophily in MySpace • Automatic analysis of sentiment in 2 million comments exchanged between MySpace friends • Correlation of 0.227 for +ve emotion strength and 0.254 for –ve • People tend to use similar but not identical levels of emotion to their friends in messages

  18. Sentistrength Collective Emotions in Cyberspace CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs

  19. Application – sentiment in Twitter events • Analysis of a corpus of 1 month of English Twitter posts • Automatic detection of spikes (events) • Sentiment strength classification of all posts • Assessment of whether sentiment strength increases during important events • Result – negative sentiment normally increases, positive sentiment might tend to increase

  20. Automatically-identified Twitter spikes

  21. Chile

  22. Hawaii

  23. #oscars

  24. Tiger Woods

  25. Conclusion • Automatic classification of emotion on a 5 point positive and negative scale seems possible for MySpace • …And other similar short computer text messages? • Hard to get accuracy much over 60%? • Next = analyse emotion in online debates

  26. Publication • Thelwall, M., Buckley, K., Paltoglou, G. Cai, D., & Kappas, A. (in press). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology. • Thelwall, M., Wilkinson, D. & Uppal, S.(2010). Data mining emotion in social network communication: Gender differences in MySpace, Journal of the American Society for Information Science and Technology, 61(1), 190-199.

More Related