1 / 36

Time series Word frequencies Sentiment analysis

Learn how to gather Twitter data, analyze time series trends, perform word frequency analysis, and detect sentiment using tools like Mozdeh and SentiStrength.

Download Presentation

Time series Word frequencies Sentiment analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Twitter Analysis for the Social Sciences and Humanities Time series Word frequencies Sentiment analysis

  2. Twitter Data: Quick and Slow 1. Quick: http://topsy.com search, then click Tweets per day including ‘Nobel’ 2. Slow: Download Mozdeh http://mozdeh.wlv.ac.uk, collect data in advance then analyse Tweets per hour including ‘Nobel’

  3. Gathering Twitter data • Tweets can be obtained via Mozdeh • Also: Chorus http://www.chorusanalytics.co.uk/ NodeXL and other software • Must be gathered in near to real time • Not available after about 2 weeks • Gather using keyword or phrase searches • If historical tweets needed, buy from a data provider

  4. Twitter Time Series Analysis with Mozdeh: Step by Step Download free from http://mozdeh.wlv.ac.uk for Windows

  5. Search engine part of Mozdeh – for the saved tweets

  6. Sentiment-based searches

  7. Gender-based searches

  8. Identifies words that are Relatively frequent for the query

  9. Co-word analysis • Am automatic method to compare different subsets of tweets for key differences • Identifies words that are relatively frequent compared to one keyword (or gender) compared to another

  10. The above example will compare the words in tweets containing book and read by gender

  11. Words more common in male tweets containing “book” than in female tweets containing “book”

  12. Words more common in ? tweets than in ? Tweets (male or female)

  13. A statistical measure of the significance of the difference between genders Words more common in ? tweets than in ? Tweets (male or female)

  14. Twitter research ideas • Monitor keyword searches for a topic • Analysis ideas • Time series – identify trends & causes of any peaks • Content analysis of random sample tweets – why is the topic tweeted? • Gender/sentiment/high frequency keywords • Network analysis of tweeters/tweeting & qualitative analysis of key tweeters – who is tweeting and who is successful?

  15. How and why scholars cite on Twitter – Priem & Costello • Example of a Twitter study using interviews and content analysis of tweets • Gathered and analysed a sample of tweets sent by scholars • Concluded that: “Twitter citations are also uniquely conversational, reflecting a broader discussion crossing traditional disciplinary boundaries.”

  16. High frequency word analysis • An analysis of high frequency words for Nobel prizes found tweets about: • Alternative winners’ names non-academic prizes only • Gender referencesFemale winner only (9% mentioned her gender). • Expressions of Sentimentliterature prize only (42% positive) • A quick analysis can give new insights.

  17. Sentiment Strength Detection in the Social Web -SentiStrength • Detect positive and negative sentiment strength in short informal text • Develop workarounds for lack of standard grammar and spelling • Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!) • Classify simultaneously as positive 1-5 AND negative 1-5 sentiment Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, 63(1), 163-173. Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.

  18. SentiStrength Algorithm - Core • List of 2,489 positive and negative sentiment term stems and strengths (1 to 5), e.g. • ache = -2, dislike = -3, hate=-4, excruciating -5 • encourage = 2, coolest = 3, lover = 4 • Sentiment strength is highest in sentence; or highest sentence if multiple sentences

  19. positive, negative -2 1, -2 • My legs ache. • You are the coolest. • I hate Paul but encourage him. 3 3, -1 2, -4 -4 2

  20. Extra sentiment methods • spelling correctionnicce -> nice • booster words alter strength very happy • negating words flip emotions not nice • repeated letters boost sentiment/+ve niiiice • emoticon list :) =+2 • exclamation marks count as +2 unless –ve hi! • repeated punctuation boosts sentiment good!!! • negative emotion ignored in questions u h8 me? • Sentiment idiom list shock horror = -2 Online as http://sentistrength.wlv.ac.uk/

  21. Tests against human coders SentiStrength agrees with humans as much as they agree with each other 1 is perfect agreement, 0 is random agreement

  22. Why the bad results for BBC? (and Digg) • Irony, sarcasm and expressive language e.g., • David Cameron must be very happy that I have lost my job. • It is really interesting that David Cameron and most of his ministers are millionaires. • Your argument is a joke. $

  23. Example – sentiment in major media events • Analysis of a corpus of 1 month of English Twitter posts (35 Million, from 2.7M accounts) • Automatic detection of spikes (events) • Assessment of whether sentiment changes during major media events

  24. Automatically-identified Twitter spikes Proportion of tweets mentioning keyword 9 Mar 2010 9 Feb 2010 Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events.  Journal of the American Society for Information Science and Technology, 62(2), 406-418.

  25. Chile Proportion of tweets mentioning Chile matching posts Date and time 9 Mar 2010 9 Feb 2010 Av. +ve sentiment Just subj. Av. -ve sentiment Just subj. Increase in –ve sentiment strength Sentiment strength Subj. 9 Feb 2010 Date and time 9 Mar 2010

  26. #oscars Proportion of tweets mentioning the Oscars % matching posts Date and time 9 Mar 2010 9 Feb 2010 Av. +ve sentiment Just subj. Av. -ve sentiment Just subj. Increase in –ve sentiment strength Sentiment strength Subj. 9 Feb 2010 Date and time 9 Mar 2010

  27. Sentiment and spikes • Statistical analysis of top 30 events: • Strong evidence that higher volume hours have stronger negative sentiment than lower volume hours • No evidence that higher volume hours have different positive sentiment strength than lower volume hours => Spikes are typified by small increases in negativity

  28. Summary • Tweets gathered free with Mozdeh • Search tweets and conduct content analysis • Time series analysis graphs – trends over time • Identify topics causing high sentiment • Identify differences by gender • Identify important topics by high freq. keywords • Identify important topic differences by co-words • Pilot test your ideas first • Gather tweets in advance for a period of time • Can give quick insights into your research goals – or can be a primary research method

More Related