1 / 44

Sentiment Analysis & Opinion Mining

Sentiment Analysis & Opinion Mining. Lecture Two: March 3, 2011. Aditya M Joshi M Tech3, CSE IIT Bombay {adityaj@cse.iitb.ac.in}. Sentiment analysis (SA). Task of tagging text with orientation of opinion This is a good movie. This is a bad movie. The movie is set in Australia.

daisy
Download Presentation

Sentiment Analysis & Opinion Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sentiment Analysis & Opinion Mining Lecture Two: March 3, 2011 Aditya M Joshi M Tech3, CSE IIT Bombay {adityaj@cse.iitb.ac.in}

  2. Sentiment analysis (SA) Task of tagging text with orientation of opinion This is a good movie. This is a bad movie. The movie is set in Australia. RECAP Subjective Objective

  3. Challenges of SA • Domain dependent • Sarcasm • Thwarted expressions • Negation • Implicit polarity • Time-bounded Sarcasm uses words of a polarity to represent another polarity. Example: “The perfume is so amazing that I suggest you wear it with your windows shut” Sentiment of a word is w.r.t. the domain. Example: ‘unpredictable’ For steering of a car, For movie review, “I did not like the movie.” “Not only is the movie boring, it is also the biggest waste of producer’s money.” “Not withstanding the pressure of the public, let me admit that I have loved the movie.” “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.” “This phone allows me to send SMS.” “This phone has a touch-screen.” the sentences/words that contradict the overall sentiment of the set are in majority Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.” RECAP

  4. How much opinion? Chart created using : www.technorati.com/chart/ RECAP

  5. Using ML for NLP • Documents represented as feature vectors for classifiers • Features: unigrams, etc. • Models: SVM, NB, etc. The: 2 movie: 2 is: 2 set: 1 in: 1 Australia: 1 good: 1 The movie is set in Australia. The movie is good. Chart created using : www.technorati.com/chart/ RECAP

  6. Support vector machines • Basic idea Margin “Maximum separating-margin classifier” Support vectors Separating hyperplane RECAP

  7. Results Compared to list-based classifiers (58-69%) RECAP

  8. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications

  9. Resources for SA SentiWordNet • WordNet synsets marked with three types of scores: positive, negative, objective I am feeling happy. I am feeling happy.

  10. Seed-set expansion in SWN also-see Seed words antonymy Ln Lp The sets at the end of kth step are called Tr(k,p) and Tr(k,n) Tr(k,o) is the set that is not present in Tr(k,p) and Tr(k,n)

  11. Building SentiWordnet • Classifier alternatives used: Rocchio (BowPackage) & SVM(LibSVM) • Different training data based on expansion • POS –NOPOS and NEG-NONEG classification • Total eight classifiers • For different combinations of k and classifiers • Synsets not in the expanded seed set are used as test synsets • Score is average of scores returned by the classifiers

  12. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications

  13. Subjectivity detection • Aim: To extract subjective portions of text • Algorithm used: Minimum cut algorithm

  14. Constructing the graph • Why graphs? • Nodes and edges? • Individual Scores • Association scores • To model item-specific • and pairwise information • independently. Nodes: Sentences of the document and source & sink Source & sink represent the two classes of sentences Edges: Weighted with either of the two scores Prediction whether the sentence is subjective or not Indsub(si)= Prediction whether two sentences should have the same subjectivity level • T : Threshold – maximum distance upto • which sentences may be considered proximal • f: The decaying function • i, j : Position numbers

  15. Constructing the graph • Build an undirected graph G with vertices {v1, v2…,s, t} (sentences and s,t) • Add edges (s, vi) each with weight ind1(xi) • Add edges (t, vi) each with weight ind2(xi) • Add edges (vi, vk) with weight assoc (vi, vk) • Partition cost:

  16. Example Sample cuts:

  17. Results (1/2) • Naïve Bayes, no extraction : 82.8% • Naïve Bayes, subjective extraction : 86.4% • Naïve Bayes, ‘flipped experiment’ : 71 % Subjectivity detector POLARITY CLASSIFIER Subjective Document Document Objective

  18. Results (2/2)

  19. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications

  20. Adjectives for SA • Many adjectives have high sentiment value • A ‘beautiful’ bag • A ‘wooden’ bench • An ‘embarrassing’ performance • A ‘nice wooden’ bench • A ‘wooden nice’ bench • An idea would be to augment this polarity information to adjectives in the WordNet

  21. Setup • Two anchor words (extremes of the polarity spectrum) were chosen • PMI of adjectives with respect to these adjectives is calculated Polarity Score (W)= PMI(W,excellent) – PMI (W, poor) word PMI PMI excellent poor

  22. Experimentation • K-means clustering algorithm used on the basis of polarity scores • The clusters contain words with similar polarities • These words can be linked using an ‘isopolarity link’ in WordNet

  23. Results • Three clusters seen • Major words were with negative polarity scores • The obscure words were removed by selecting adjectives with familiarity count of 3 • the ones that are not very common • Also reports an improvement when scores are used as feature values

  24. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications

  25. Subject-based SA The horse bolted. The movie lacks a good story.

  26. Lexicon subj. bolt Argument that receives the sentiment (subj./obj.) b VB bolt subj subj. lack obj. b VB lack obj ~subj Argument that receives the sentiment (subj./obj.) Argument that sends the sentiment (subj./obj.)

  27. Lexicon • Also allows ‘\S+’ characters • Similar to regular expressions • E.g. to put \S+ to risk • The favorability of the subject depends on the favorability of ‘\S+’.

  28. Example The movie lacks a good story. The movie lacks \S+. Lexicon : • Steps : • Consider a context window of upto five words • Shallow parse the sentence • Step-by-step calculate the sentiment value based on lexicon and by adding ‘\S+’ characters at each step G JJ good obj. B VB lack obj ~subj.

  29. Results

  30. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications Cross-lingual SA Cross-domain SA Opinion Spam SA for tweets

  31. Cross-lingual SA • Multilingual content on the internet growing • How can the sentiment it carries be identified? • Can we take help of the ‘rich cousin’ English? Sentiment Analysis System Sentiment Analysis System Hindi document English document Sentiment Label

  32. Alternatives to Cross-lingual SA Strategies for SA for target language Use corpus in target language Translate to a ‘rich’ source language Develop resources for target language

  33. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications Cross-lingual SA Cross-domain SA Opinion Spam SA for tweets

  34. Domain-dependence of words • ‘deadly’ • It was one deadly match! • There are some deadly poisonous snakes in the jungles of Amazon.

  35. General Approach • Retain the ‘common-to-all-domain’ words • Learn only the ‘special domain’ words • Domain differences can be substantial

  36. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications Cross-lingual SA Cross-domain SA Opinion Spam SA for tweets

  37. Opinion spam: A side-effect of UGC Low quality reviews, review spam / opinion Spam. Positive opinion -> Financial gain for organization • Reviews contain rich user opinions on products and services • Anyone can write anything on the Web • No quality control • Result • Incentives

  38. Different types of spam reviews Reference : [Jindalet al, 2008] • Giving undeserving reviews to some • target objects in order • to promote/demote the object • hyper spam - undeserving positive reviews • defaming spam - malicious negative reviews • DUPLICATES • No comment on the product • Comments on brands, manufacturer or • sellers of the product • Advertisements • Other irrelevant reviews containing no opinions • e.g. questions, answers and random text Although you should not expect prompt shippin. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right - http://www.pricegrabber.com It’s from nikon, what more you want.. • Type 1 (untruthful opinions) • Type 2 (reviews on brands only) • Type 3 (non-reviews)

  39. Outline Lecture 1 Lecture 2 Motivation & Introduction Approaches to SA Challenges of SA: Why SA is non-trivial Variants of SA: What forms does it exist in? Opinion on the web: Is doing SA really worth it? Resources for SA: SentiWordNet Subjectivity detection: Separating the opinion from facts Adjectives for SA: Adjectives are great! Subject-based SA: Who defeated whom? Classifiers for SA Fundamentals of supervised approaches Standard ML techniques Comparing different classifiers for SA Applications Cross-lingual SA Cross-domain SA Opinion Spam SA for tweets

  40. Challenges with tweets • Ill-formed • Spelling mistakes • Informal words/emoticons • Extensions of words (‘happppyyyyy’) • Vague topics www.clia.iitb.ac.in:8080/TwitterApp/index.jap

  41. Mood analysis • Real-time updation of moods w. r. t. a topic SOME ACTUAL APPLICATIONS Snapshot: MoodViews

  42. Semantic search • Sentiment search API by Evri • Claims to allow deeper answers like “who”, “why”

  43. A zeitgeist • Understanding the ‘climate’ Snapshot: Twitscoop

  44. … and many more

More Related