Sentiment analysis Or, how to find happiness.
Why do we want sentiment info? • Useful input for detection • Brand sentiment • Useful input for prediction • Stock market, box office revenues, political outcomes • Potentially for social uprisings, terrorist incidents
Three considerations for a sentiment analysis system • Data cleaning • One piece of the puzzle • Simple works best
Data cleaning: on Twitter… • Spam accounts • Bots (Weather, sport, etc…) Answer: a) http://trst.me/ (from infochimps) b) Make your own system
Data cleaning: from sentences to words • Tokenize the sentence(s) into words. (This may not be as easy as it seems.) • Maybe remove stop words and/or stem, depending on the application. • Pick a minimum number of times a word must appear in the training set; ignore words seen less often. • Build a dictionary of words. Answer: a) Twokenize.py b) Write your own
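A minimal sketch of the "write your own" route, assuming a deliberately simple regex tokenizer. It is not Twokenize.py; the pattern, the function names `tokenize` and `build_vocabulary`, and the default `min_count` are all hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical token pattern, applied to lowercased text:
# URLs, a few common emoticons, then words with optional @/# prefix.
TOKEN_RE = re.compile(
    r"https?://\S+"          # URLs kept as single tokens
    r"|[:;=][-']?[()dp]"     # emoticons such as :) ;-( :d
    r"|[@#]?\w+(?:'\w+)?"    # words, @mentions, #hashtags, contractions
)

def tokenize(text):
    """Split a tweet into lowercase tokens."""
    return TOKEN_RE.findall(text.lower())

def build_vocabulary(texts, min_count=2):
    """Map each token seen at least min_count times to an integer index."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(kept)}
```

Real tweets contain many cases this pattern misses (unusual emoticons, curly apostrophes, elongated words), which is why tokenizing is harder than it looks.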
Always make it part of a system • When it’s wrong (and this is quite often) it will be very obviously wrong • People don’t need to see this • This doesn’t actually detract from the utility of the system
Success: • Tracking political polls. • Predicting box office revenues. • Predicting the stock market.
The quick version • Use a supervised/semi-supervised learning method. • For most cases I would recommend Naïve Bayes on the Bag of Words representation. It is very simple to implement and gives near-best performance. • If you don’t have any examples of happy/sad tweets (for your purpose), label tweets using known keywords, such as emoticons.
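The recipe above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the author's implementation: the emoticon sets, the whitespace tokenization, the add-one smoothing and all function names are hypothetical. Emoticons are used only to assign labels and are removed from the features, so the classifier has to learn from the remaining words.

```python
import math
from collections import Counter

# Assumed emoticon sets used as noisy labels (lowercased).
POSITIVE = {":)", ":-)", ":d"}
NEGATIVE = {":(", ":-("}

def label_with_emoticons(tweets):
    """Return (tokens, label) pairs for tweets with an unambiguous emoticon."""
    labelled = []
    for text in tweets:
        tokens = text.lower().split()  # simplistic tokenization, for brevity
        has_pos = any(t in POSITIVE for t in tokens)
        has_neg = any(t in NEGATIVE for t in tokens)
        if has_pos == has_neg:
            continue  # no emoticon, or mixed signals: skip
        words = [t for t in tokens if t not in POSITIVE | NEGATIVE]
        labelled.append((words, "pos" if has_pos else "neg"))
    return labelled

def train_naive_bayes(labelled):
    """Count documents per class and word occurrences per class."""
    class_counts, word_counts, vocab = Counter(), {}, set()
    for tokens, label in labelled:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens, alpha=1.0):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, after training on `label_with_emoticons(["love this movie :)", "hate this traffic :("])`, calling `predict(model, ["love"])` returns `"pos"`.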
Things that don’t really help (generally less than 2% improvement) • More advanced classifiers (e.g. SVMs) • Part-of-speech tagging • Parse trees • Semi-supervised methods, if you have very large amounts of data
Basic positive/negative Twitter sentiment word list • http://alexdavies.net/projects/twitter-sentiment-word-lists/