Consumer sentiment analysis with Twitter • Reetta Suonperä • August 2013
My dataset • Two months, one csv.gz file per day • In total about 1.2 billion tweets • Example record: It's always easy for a person to say get over, but you don't feel what heart feels to make that statment|PrettynPinkC215|2011-02-01T04:01:16Z|2011-02-01T04:00:48Z|1296532876139018784|
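A minimal sketch of how one daily file might be read, assuming every record is a single pipe-delimited line shaped like the sample above. The field names (text, user, two timestamps, tweet ID) are guesses from that sample, since the slide does not name them, and `read_tweets` is a hypothetical helper.

```python
import csv
import gzip
from collections import namedtuple

# Field names are assumptions inferred from the sample record on the slide.
Tweet = namedtuple("Tweet", "text user time_a time_b tweet_id")


def read_tweets(path):
    """Yield one Tweet per well-formed line of a pipe-delimited .csv.gz file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters in tweet text as ordinary text.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            # The sample line ends with "|", which produces an empty last field.
            if row and row[-1] == "":
                row = row[:-1]
            # Lines that do not match the assumed five-field layout are skipped,
            # including tweets whose text itself contains a "|".
            if len(row) != 5:
                continue
            yield Tweet(*row)
```

Reading lazily with a generator keeps memory use flat, which matters with about 1.2 billion tweets spread over the daily files.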
The tools I use • General approach: natural language processing (NLP) • The Natural Language Toolkit (NLTK)
Introduction: the consumer sentiment index • A survey-based indicator of consumer confidence or sentiment • History goes back to 1946 at the University of Michigan • Ireland’s consumer sentiment index by the ESRI since 1996
ESRI survey questions • Q1: Economic situation in the country (next 12 months) • Q2: Unemployment in the country (next 12 months) • Q3: Household financial situation (12 months ago) • Q4: Household financial situation (next 12 months) • Q5: Good/bad time to buy large household items • Answers: positive/neutral/negative
The KBC/ESRI consumer sentiment index • This is what it looks like: [figure]
We can speculate on what drives sentiment, but we can’t really know • On the June 2013 improvement in households’ assessment of their personal finances: “We think that the ECB rate cut in May played some role … a combination of low inflation, early summer sales and increasing signs of improvement in the residential property market could have contributed…” • On the decline in the July 2013 index: “We think reports that the Irish economy had fallen back into recession and a couple of high profile job loss announcements unnerved consumers last month.”
Motivation: why using Twitter could help • More timely • Continuous information • Save money • Insight into what drives sentiment
Previous research • O’Connor et al (2010): From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series • An index based on tweets containing the word “jobs” correlates with the Michigan index and Gallup’s daily poll • Indices based on “economy” or “job” correlate poorly!
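The difference between “jobs” and “job” is easier to see once the keyword is matched as a whole word. The sketch below illustrates only that selection step, under the assumption of whole-word, case-insensitive matching. It is not the pipeline from O’Connor et al; `contains_keyword` is a hypothetical helper and the three tweets are made up.

```python
import re


def contains_keyword(text, keyword):
    """Return True if `keyword` occurs in `text` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up examples, not taken from the dataset.
tweets = [
    "No jobs around here at all",
    "Loving my new job",
    "The economy is picking up",
]

# With whole-word matching, "jobs" and "job" select different tweets,
# so indices built on them can behave differently.
jobs_tweets = [t for t in tweets if contains_keyword(t, "jobs")]
job_tweets = [t for t in tweets if contains_keyword(t, "job")]
```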
Using WordNet to expand seed wordlist • Use WordNet to find synonyms for the initial keyword list • Words have many different meanings • Include part-of-speech tag • What if a word doesn’t exist in WordNet? • Output does not include tenses or plurals
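A minimal sketch of the expansion step, assuming NLTK 3 and its WordNet corpus are installed. `expand_seed_words` is a hypothetical helper, not the code behind the slides. The lookup function is passed in as a parameter so the logic can be tried without the corpus; by default it falls back to `nltk.corpus.wordnet.synsets`.

```python
def expand_seed_words(seeds, synsets=None):
    """Return the seed words plus the lemma names of their WordNet synsets.

    `seeds` maps each word to a WordNet part-of-speech tag ("n", "v", "a", "r").
    `synsets` is a lookup callable; by default NLTK's wordnet.synsets is used.
    """
    if synsets is None:
        # Needs NLTK and its WordNet corpus: nltk.download("wordnet")
        from nltk.corpus import wordnet
        synsets = wordnet.synsets

    expanded = set()
    for word, pos in seeds.items():
        expanded.add(word)
        # A word missing from WordNet yields no synsets, so only the seed is kept.
        for synset in synsets(word, pos=pos):
            for lemma in synset.lemma_names():
                # WordNet joins multiword lemmas with underscores.
                expanded.add(lemma.replace("_", " ").lower())
    return expanded
```

A call such as `expand_seed_words({"job": "n", "economy": "n"})` would use the real WordNet data. Restricting each seed to one part of speech narrows the senses returned but does not remove unrelated meanings, and the lemmas are base forms, so plurals and tenses still have to be added separately.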
Pre-processing tasks • Regular expressions for more basic tasks: cleaning, tokenising URLs, usernames • NLTK functionality for more complex tasks: stopword removal, stemming, POS-tagging
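The regular-expression side of this can be sketched as follows. The patterns, the `URL` and `USER` placeholder tokens, and the tiny stopword set are illustrative assumptions. In practice the stopword list would come from NLTK, and stemming and POS-tagging are left out of the sketch.

```python
import re

# Illustrative patterns; real tweets will need more careful handling.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

# Tiny stand-in for a proper stopword list such as NLTK's.
STOPWORDS = frozenset({"a", "the", "to", "and", "of", "in", "is", "it"})


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase a tweet, replace URLs and usernames, tokenise, drop stopwords."""
    text = text.lower()
    # Upper-case placeholders stay distinguishable from the lowercased words.
    text = URL_RE.sub(" URL ", text)
    text = USER_RE.sub(" USER ", text)
    tokens = TOKEN_RE.findall(text)
    return [token for token in tokens if token not in stopwords]
```

Replacing URLs and usernames with placeholder tokens, instead of deleting them, keeps the information that a link or mention was present.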
Fine selection: not there yet… • Do more filtering using bigrams? “I broke”, “pay cut”, “new job” • Use POS tags? • Classification?
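One way the bigram idea could be tried is sketched below, assuming tweets have already been tokenised. The target set holds only the three examples from the slide, and `matches_target` is a hypothetical helper, since this step was still an open question in the talk.

```python
# The three example bigrams from the slide, lowercased.
TARGET_BIGRAMS = {("i", "broke"), ("pay", "cut"), ("new", "job")}


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def matches_target(tokens, targets=TARGET_BIGRAMS):
    """Return True if any adjacent pair of tokens is one of the target bigrams."""
    lowered = [token.lower() for token in tokens]
    return any(pair in targets for pair in bigrams(lowered))
```

Because only adjacent pairs are checked, word order matters: "pay cut" matches but "cut my pay" does not.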
The to-do list • Finalise fine selection • Sentiment classification • Visualisation
Resources • www.nltk.org • Natural Language Processing with Python: http://nltk.org/book/ • Python Text Processing with NLTK 2.0 Cookbook
Resources • O’Connor et al (2010): From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series • Bollen et al (2011): Twitter mood predicts the stock market • Bollen et al (2011): Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena • Go et al (2009): Twitter sentiment classification using distant supervision • Jiang et al (2011): Target-dependent Twitter Sentiment Classification