Sentiment Detection Rik Sarkar (03305048) Kedar Godbole (03305805)
Outline • Sentiment detection: the problem statement • Difficulties in sentiment detection • Approaches to sentiment detection • Conclusion • Project proposal
Problem Statement • Detect the polarity of the sentiment expressed about a particular topic in a document • Polarity: - Positive - Negative - Mixed - Neutral
Motivation Reviews on the Web • Opinions about a product • Opinions about the individual aspects of a product • Movie/book reviews • Feedback/evaluation forms
Issues • Reference to multiple objects in the same document - The NR70 is trendy. T-Series is fast becoming obsolete. • Dependence on the context of the document - “Unpredictable” plot ; “Unpredictable” performance • Negations have to be captured - Monochrome display is not what the user wants
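The negation issue above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not a method from the slides: the toy `LEXICON`, the `NEGATORS` set, and the rule that a negator flips the polarity of polar words within the next few tokens.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"trendy": 1, "good": 1, "obsolete": -1, "inconvenient": -1}
# Hypothetical set of negation cues.
NEGATORS = {"not", "no", "never"}

def polarity_with_negation(sentence, window=3):
    """Sum word polarities, flipping any polar word that appears
    within `window` tokens after a negation cue."""
    tokens = re.findall(r"[a-z0-9'-]+", sentence.lower())
    score = 0
    flip_left = 0  # how many upcoming tokens are still under negation
    for tok in tokens:
        if tok in NEGATORS:
            flip_left = window
            continue
        value = LEXICON.get(tok, 0)
        if value and flip_left > 0:
            value = -value
        score += value
        flip_left = max(0, flip_left - 1)
    return score
```

A fixed token window is a crude stand-in for real negation scope, which is one reason the later slides argue for NLP-based analysis.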
Issues (contd.) • Metaphors/Similes - The metallic body is solid as a rock • Part-of and Attribute-of relationships - The small keypad is inconvenient • Absence of a polar word - How can someone sit through this seminar?
Approaches to Sentiment Detection • Based on pre-selected sets of words • Naive Bayes • Support Vector Machines • Unsupervised learning • Enhancement by NLP
An Unsupervised Learning Technique Extract phrases from the review based on patterns of POS tags • JJ – Adjective • RB – Adverb • NN – Noun
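A minimal sketch of the phrase-extraction step, assuming the review has already been POS-tagged into (word, tag) pairs. The two tag patterns in `PATTERNS` are illustrative assumptions; the slide names only the tag classes (JJ, RB, NN), not the exact patterns.

```python
# Assumed two-word tag patterns: adjective + noun, adverb + adjective.
PATTERNS = {("JJ", "NN"), ("RB", "JJ")}

def extract_phrases(tagged_tokens):
    """Return every pair of adjacent words whose tags match a pattern."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1, t2) in PATTERNS:
            phrases.append(f"{w1} {w2}")
    return phrases

# Hand-tagged example sentence (tags supplied manually, not by a tagger).
tagged = [
    ("the", "DT"), ("small", "JJ"), ("keypad", "NN"),
    ("is", "VBZ"), ("very", "RB"), ("inconvenient", "JJ"),
]
phrases = extract_phrases(tagged)
```

For the hand-tagged sentence, the extracted phrases are "small keypad" and "very inconvenient".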
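Unsupervised Learning Pointwise Mutual Information (PMI) and Semantic Orientation (SO) • $\mathrm{PMI}(word_1, word_2) = \log_2 \dfrac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$ • $\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{“excellent”}) - \mathrm{PMI}(phrase, \text{“poor”})$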
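As a rough illustration of the PMI definition, the sketch below estimates the probabilities from raw counts. The function name `pmi` and the count-based estimate p(x) = count / total are assumptions made for this example.

```python
import math

def pmi(count_both, count_word1, count_word2, total):
    """Pointwise mutual information (base 2) estimated from counts.

    count_both: number of contexts containing both words
    count_word1, count_word2: number of contexts containing each word
    total: total number of contexts
    """
    p_both = count_both / total
    p_word1 = count_word1 / total
    p_word2 = count_word2 / total
    return math.log2(p_both / (p_word1 * p_word2))

def semantic_orientation(pmi_with_excellent, pmi_with_poor):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return pmi_with_excellent - pmi_with_poor
```

A PMI of zero means the two words co-occur about as often as independence would predict; positive values indicate association.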
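Unsupervised Learning • Determine the Semantic Orientation (SO) of the phrases • Estimate the probabilities from hit counts returned by AltaVista searches using the NEAR operator • $\mathrm{SO}(phrase) = \log_2 \dfrac{\mathrm{hits}(phrase \ \mathrm{NEAR}\ \text{“excellent”}) \cdot \mathrm{hits}(\text{“poor”})}{\mathrm{hits}(phrase \ \mathrm{NEAR}\ \text{“poor”}) \cdot \mathrm{hits}(\text{“excellent”})}$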
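A minimal sketch of computing SO from search hit counts. The hit counts are plain function arguments here (no search engine is queried), and the small `smoothing` constant added to the NEAR counts to avoid division by zero is an assumption of this example.

```python
import math

def so_from_hits(hits_near_excellent, hits_near_poor,
                 hits_excellent, hits_poor, smoothing=0.01):
    """Semantic orientation of a phrase from hit counts.

    hits_near_excellent: hits for the query  phrase NEAR "excellent"
    hits_near_poor:      hits for the query  phrase NEAR "poor"
    hits_excellent:      hits for "excellent" alone
    hits_poor:           hits for "poor" alone
    smoothing:           assumed constant that keeps the ratio defined
                         when a NEAR query returns no hits
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)
```

A phrase that co-occurs more often with "excellent" than with "poor" (relative to how common those two words are) gets a positive score, and vice versa.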
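Unsupervised Learning • Calculate the average semantic orientation of the phrases extracted from the document • A positive average indicates a positive review; a negative average indicates a negative review • Example: Average Semantic Orientation = 0.524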
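The document-level step can be sketched as follows. The phrase scores in the usage example are made-up illustrative values, and treating an empty review as "neutral" is an assumption of this sketch.

```python
def classify_review(phrase_orientations):
    """Average the per-phrase SO scores and label the review by the sign."""
    if not phrase_orientations:
        return 0.0, "neutral"  # assumed fallback when no phrase was extracted
    average = sum(phrase_orientations) / len(phrase_orientations)
    label = "positive" if average > 0 else "negative"
    return average, label

# Hypothetical SO values for three phrases of one review.
average, label = classify_review([1.5, -0.5, 0.5])
```

With these hypothetical scores the average is 0.5, so the review is labelled positive.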
Need for NLP • Identifying phrases is not enough: the subject/object of the sentiment must also be identified - The NR70 is trendy. T-Series is fast becoming obsolete. • Part-of and attribute-of relationships must be identified - The battery is long-lasting
Focus of the sentiment Feature/attribute terms: • BNP - Base Noun Phrases - battery, display, keypad • dBNP - Definite Base Noun Phrases (BNPs preceded by the definite article “the”) - “the display” • bBNP - Beginning Definite Base Noun Phrases (dBNPs at the beginning of a sentence, followed by a verb phrase) - “The battery is long-lasting”
Sentiment Analyzer • Sentiment lexicon database - <lexical_entry> <POS> <sent_category> - “excellent” JJ + • Sentiment pattern database - <predicate> <sent_category> <target> - “I am impressed with the flash capabilities” - impress + PP(by;with) target
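One possible in-memory representation of the two databases, as a minimal sketch. The class and field names (`LexiconEntry`, `PatternEntry`, `target_role`, `prepositions`) are hypothetical; only the entry formats and the two example entries come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexiconEntry:
    """<lexical_entry> <POS> <sent_category>"""
    term: str
    pos: str
    polarity: str  # "+" or "-"

@dataclass(frozen=True)
class PatternEntry:
    """<predicate> <sent_category> <target>"""
    predicate: str
    polarity: str
    target_role: str          # where the target sits, e.g. "PP"
    prepositions: tuple = ()  # allowed prepositions when the target is a PP

# Example entries taken from the slide.
lexicon = {(e.term, e.pos): e for e in [LexiconEntry("excellent", "JJ", "+")]}
patterns = {p.predicate: p for p in [
    PatternEntry("impress", "+", "PP", ("by", "with")),
]}
```

Keying the lexicon on (term, POS) lets the same surface word carry different polarities for different parts of speech.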
SA (contd.) • Identify sentences containing feature terms • Ternary expressions (T-expressions) - +ve/-ve sentiment verbs <target, verb, “”> - trans verbs <target, verb, source> • Binary expressions (B-expressions) - <adjective, target>
SA (contd.) • Identify sentiment phrases within subject, object phrases • Associating sentiment with the target - Based on sentiment patterns “I was impressed by the flash capabilities” “This camera takes excellent pictures” - Based on B-expressions “Poor performance in a dark room”
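The two association routes above can be sketched as simple lookups. This assumes a parser has already produced the predicate, preposition, and noun phrases; the function names and the dictionary layout are hypothetical, and the entries are limited to the examples on the slides.

```python
# Toy lexicon and pattern tables built from the slide examples.
LEXICON = {("excellent", "JJ"): "+", ("poor", "JJ"): "-"}
PATTERNS = {"impress": ("+", {"by", "with"})}  # predicate -> (polarity, prepositions)

def sentiment_from_b_expression(adjective, target):
    """B-expression <adjective, target>: the adjective's polarity
    is assigned to the target it modifies."""
    polarity = LEXICON.get((adjective.lower(), "JJ"))
    return (target, polarity) if polarity else None

def sentiment_from_pattern(predicate, preposition, pp_object):
    """Sentiment pattern: the predicate's polarity is assigned to the
    object of a matching prepositional phrase."""
    entry = PATTERNS.get(predicate)
    if entry is None:
        return None
    polarity, allowed_prepositions = entry
    if preposition in allowed_prepositions:
        return (pp_object, polarity)
    return None
```

For "Poor performance in a dark room", the B-expression <poor, performance> yields a negative sentiment on "performance"; for "I was impressed by the flash capabilities", the pattern for "impress" yields a positive sentiment on "flash capabilities".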
Other issues • Position of the sentiment words - Words at the beginning and end of a review • Abstraction: sentiment about the characters in a movie versus sentiment about the actors - “He played the role of a very corrupt politician” - “He played the role brilliantly”
Conclusion • Sentiment detection can be used in areas ranging from marketing research to movie reviews. • Sentiment Detection is a “hard” problem due to context-sensitivity, complex sentences, etc. • Statistical methods should be augmented with NLP techniques.
References • Yi, Nasukawa, et al. Sentiment Analyzer: Extracting Sentiments about a Given Topic using NLP techniques. Proceedings of the Third IEEE International Conference on Data Mining, p. 427, Nov 19-22, 2003 • Peter D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of the 40th Annual Meeting of ACL, p. 417-424, 2002 • Matthew Hurst and Kamal Nigam. Retrieving Topical Sentiments from Online Document Collections. Document Recognition and Retrieval XI, p. 27-34, 2004
References (contd.) • B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using Machine Learning techniques. Proceedings of the 2002 ACL EMNLP Conference, p. 79-86, 2002
Project • Sentiment analyzer for a specific domain • Given set of features, initial list of polar words • Learns new polar words from documents analyzed