Instagram #Hashtag Sentiment Analysis
Jared Plumb, Nipun Gunawardena, Nan Xiao, Hao Zhang
Can we successfully predict the sentiment of an Instagram hashtag?
Instagram Overview
• Instagram: a photo-sharing social network
• Each post can contain a caption and hashtags
• Hashtags group posts into categories
• Hashtags also express extra information or emotion about the post
Hashtag Overview
• Composed of words, phrases, and acronyms
• Often contain misspellings, made-up words, and slang/vernacular
Examples: #love #myfriendsarehotterthanyourfriends #likeabos #ugly #depresstion #selfharmmm
Sentiment Analysis Overview
• Natural language processing method used to identify sentiment within text
• Early work by Pang et al. [1] analyzed movie reviews from IMDB
• Later work by Davidov et al. [2] analyzed the sentiment of Twitter posts using hashtags and smileys
Our Process
• Analyze the sentiment of individual Instagram hashtags using a Naïve Bayes classifier
• Naïve Bayes:
  • Initially used for spam detection
  • Simple but powerful
  • Other methods (e.g., SVM) may often work better, but Naïve Bayes is often used as a baseline
Naïve Bayes Classifier
• Assume independence and generalize
• Convert to algorithm
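The equations from this slide did not survive, so what follows is the standard textbook derivation that the two step names appear to describe. It is not necessarily the authors' exact notation. The symbols are assumptions: $c$ is a sentiment class (positive or negative) and $f_1, \dots, f_n$ are the features extracted from a hashtag.

```latex
\begin{align}
P(c \mid f_1, \dots, f_n)
  &= \frac{P(c)\, P(f_1, \dots, f_n \mid c)}{P(f_1, \dots, f_n)}
  && \text{(Bayes' rule)} \\
  &\propto P(c) \prod_{i=1}^{n} P(f_i \mid c)
  && \text{(assume independence and generalize)} \\
\hat{c}
  &= \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(f_i \mid c) \Big]
  && \text{(convert to algorithm)}
\end{align}
```

The denominator is dropped in the second line because it is the same for every class. The last line uses logarithms so that a product of many small probabilities does not underflow.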
Training Data
• Pang and Lee's movie reviews
  • Approximately 1400 movie reviews from IMDB, evenly split between positive and negative
  • Only used a subset of this data
• Hu and Liu positive/negative word lists
  • Approximately 6800 popular positive and negative English words, unevenly split
  • Includes common online misspellings
• Positive/negative hashtags we mined
  • Trained randomly on 20% of this data
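As a rough illustration of how labeled examples like these could train a Naïve Bayes classifier, here is a minimal sketch. The class name `NaiveBayes`, the add-one (Laplace) smoothing, and the toy training examples are all assumptions made for illustration. The slides do not say how the authors implemented or smoothed their classifier.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed choice)."""

    def __init__(self):
        self.class_counts = Counter()               # label -> number of training examples
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocabulary = set()                     # every feature seen in training

    def train(self, examples):
        """examples: iterable of (list_of_features, label) pairs."""
        for features, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)

    def log_scores(self, features):
        """Return log P(label) + sum of log P(feature | label) for each label."""
        total_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total_examples)
            total_features = sum(self.feature_counts[label].values())
            for feature in features:
                smoothed = self.feature_counts[label][feature] + 1
                score += math.log(smoothed / (total_features + vocab_size))
            scores[label] = score
        return scores

    def classify(self, features):
        """Return the label with the highest log score."""
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch (not the authors' data).
toy_examples = [
    (["love", "happy"], "positive"),
    (["great", "love"], "positive"),
    (["ugly", "sad"], "negative"),
    (["hate", "ugly"], "negative"),
]

model = NaiveBayes()
model.train(toy_examples)
print(model.classify(["love"]))
print(model.classify(["ugly", "sad"]))
```

The features here are whole words for brevity. Any other feature list, such as character n-grams of a hashtag, can be passed in the same way.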
N-grams
• N contiguous "units" of language
• Words are typically used as units in sentiment analysis
  2gram("I like turtles") = ["I like", "like turtles"]
• We used characters as units
• Through experimentation, we found that 3-grams or 4-grams worked best
• Removed non-alphanumeric characters
  4gram("Love it") = ["Love", "ovei", "veit"]
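The two n-gram examples on this slide can be reproduced with a short sketch. The function names `word_ngrams` and `char_ngrams` are hypothetical. Treating "non-alphanumeric" as anything outside ASCII letters and digits is also an assumption, as is leaving letter case unchanged, since the slide does not mention case folding.

```python
import re


def word_ngrams(text, n):
    """Return the n-grams of text using whitespace-separated words as units."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(text, n):
    """Return the n-grams of text using characters as units.

    Non-alphanumeric characters (spaces, '#', punctuation) are removed first.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", text)
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


print(word_ngrams("I like turtles", 2))  # ['I like', 'like turtles']
print(char_ngrams("Love it", 4))         # ['Love', 'ovei', 'veit']
```

Text shorter than n characters after cleaning yields an empty list, so very short hashtags would contribute no features under this scheme.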
Conclusion/Interesting Notes
• The positive/negative word list performs best as training data
• Mined hashtag training data may perform better if more popular hashtags are used
• Movie reviews don't perform well as training data
• At first glance, Instagram is overwhelmingly positive
• Sentiment analysis may have an effective accuracy limit of about 80% [3, 4]
• Neutral posts weren't counted
Thanks! #Questions?
References
• [1] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: Sentiment classification using machine learning techniques." Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10. Association for Computational Linguistics, 2002. v1.1.
• [2] Davidov, Dmitry, Oren Tsur, and Ari Rappoport. "Enhanced sentiment learning using Twitter hashtags and smileys." Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics, 2010.
• [3] http://www.informationweek.com/software/information-management/expert-analysis-is-sentiment-analysis-an-80--solution/d/d-id/1087919?
• [4] http://www.socialmediaexplorer.com/social-media-monitoring/never-trust-sentiment-accuracy-claims/
Image Sources
• [1] http://a2.mzstatic.com/us/r30/Purple/v4/11/0a/6c/110a6c60-3bf1-0f8d-089b-ab82407774ad/mzl.ikicqhss.png