Sentiment Analysis An Overview of Concepts and Selected Techniques
Terms • Sentiment • A thought, view, or attitude, especially one based mainly on emotion instead of reason • Sentiment Analysis • Also called opinion mining • The use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
Motivation • Consumer information • Product reviews • Marketing • Consumer attitudes • Trends • Politics • Politicians want to know voters’ views • Voters want to know politicians’ stances and who else supports them • Social • Find like-minded individuals or communities • Webpage
Problem • How to interpret features for sentiment detection? • Bag of words (IR) • Annotated lexicons (WordNet, SentiWordNet) • Syntactic patterns • Which features to use? • Words (unigrams) • Phrases/n-grams • Sentences
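The feature choices above (unigrams versus phrases/n-grams in a bag-of-words representation) can be sketched as follows. This is a minimal illustration, not a method from the slides: the helper names `tokenize`, `ngrams`, and `bag_of_ngrams` and the sample review are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return all n-grams (as tuples) in the order they occur."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(text, n=1):
    """Count n-gram occurrences; word order beyond the n-gram is discarded."""
    return Counter(ngrams(tokenize(text), n))

# Hypothetical review sentence, used only for illustration.
review = "The battery life is good, but the screen is not good."
unigrams = bag_of_ngrams(review, 1)
bigrams = bag_of_ngrams(review, 2)

print(unigrams[("good",)])        # how often "good" occurs as a unigram
print(bigrams[("not", "good")])   # the bigram keeps the negation attached
```

With unigrams alone, both occurrences of "good" look identical; the bigram ("not", "good") preserves the negation, which is one reason phrases are considered as features.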
Challenges • Must consider other features due to… • Subtlety of sentiment expression • irony • Domain/context dependence • words/phrases can mean different things in different contexts and domains
Approaches • Machine learning • Naïve Bayes • Maximum Entropy Classifier • SVM • Markov Blanket Classifier • Unsupervised methods • Use lexicons • Assume pairwise independent features
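As one concrete instance of the machine learning approaches listed above, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The four training documents are invented for illustration, and the function names `train_naive_bayes` and `classify` are assumptions, not part of any library.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns (class counts, per-class word counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior under the independence assumption."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood of the word given the class.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training set, for illustration only.
training = [
    ("great excellent plot".split(), "pos"),
    ("good acting great fun".split(), "pos"),
    ("boring bad plot".split(), "neg"),
    ("terrible boring acting".split(), "neg"),
]
model = train_naive_bayes(training)
print(classify("great fun".split(), model))
print(classify("boring plot".split(), model))
```

Each word contributes to the score independently of the others, which is the simplifying assumption that makes the classifier cheap to train.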
Three levels of meaning • Lexical Semantics • The meanings of individual words • Sentential / Compositional / Formal Semantics • How those meanings combine to make meanings for individual sentences • Discourse or Pragmatics • How those meanings combine with each other and with other facts about various kinds of context to make meanings for a text or discourse (+ Dialog or Conversational Semantics)
WordNet [1][2] • WordNet resulted from the research efforts of the Department of Linguistics and Psychology at Princeton University toward a better understanding of English language and semantics. • WordNet is available as a database, searchable via a web interface or a variety of software APIs, providing a comprehensive database of over 150,000 unique terms organised into more than 117,000 different meanings (WORDNET, 2006). • WordNet has also grown through extensions of its structure to a number of other languages (WORDNET, 2009).
WordNet • A hierarchically organized lexical database • On-line thesaurus + aspects of a dictionary • Versions for other languages are under development
Category     Unique Forms
Noun         117,097
Verb         11,488
Adjective    22,141
Adverb       4,601
How is “sense” defined in WordNet? • The set of near-synonyms for a WordNet sense is called a synset (synonym set); it is WordNet’s version of a sense or a concept • Example: chump as a noun meaning ‘a person who is gullible and easy to take advantage of’ • Each sense in the synset shares this same gloss • Thus, for WordNet, the meaning of this sense of chump is the list of near-synonyms in its synset.
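The synset idea can be sketched as a small data structure. This is a toy model, not the WordNet API: the `Synset` class, the `senses` helper, the partial lemma lists, and the second gloss are all assumptions invented for the example; only the chump gloss comes from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Synset:
    lemmas: tuple   # near-synonyms that share this sense
    gloss: str      # the definition shared by every lemma in the set
    pos: str        # part of speech, e.g. "n" for noun

# Toy entries: lemma lists are partial and the second gloss is invented.
TOY_SYNSETS = [
    Synset(("chump", "fool", "sucker"),
           "a person who is gullible and easy to take advantage of", "n"),
    Synset(("fool", "jester"),
           "a clown who entertains a court", "n"),
]

def senses(word):
    """Return every toy synset that lists the word as a lemma."""
    return [s for s in TOY_SYNSETS if word in s.lemmas]

for sense in senses("fool"):
    print(sense.lemmas, "->", sense.gloss)
```

A word with several senses, such as "fool" here, appears in several synsets, while each synset carries exactly one gloss.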
SentiWordNet [3] • Based on WordNet “synsets” • http://wordnet.princeton.edu/ • SentiWordNet is a lexical resource for sentiment analysis built from the synsets of WordNet, a thesaurus-like resource; each synset is assigned positive, negative, and objective sentiment scores. • These scores are generated automatically using a semi-supervised method. • Each term in the WordNet database is assigned scores between 0 and 1 in SentiWordNet, which indicate its polarity. • Strongly opinionated terms receive higher scores, whereas less biased or subjective terms receive lower scores.
The values in the three dimensions (positive P, negative N, objective O) sum to 1. Ex: P = 0.75, N = 0, O = 0.25
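The constraint that the three scores sum to 1 means the objective score follows from the other two. A minimal sketch, assuming a hypothetical `SentiScore` class (not part of SentiWordNet itself) and a simple larger-score-wins rule for the polarity label:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SentiScore:
    pos: float  # positive score P in [0, 1]
    neg: float  # negative score N in [0, 1]

    def __post_init__(self):
        in_range = 0.0 <= self.pos <= 1.0 and 0.0 <= self.neg <= 1.0
        if not in_range or self.pos + self.neg > 1.0:
            raise ValueError("P and N must lie in [0, 1] and satisfy P + N <= 1")

    @property
    def obj(self):
        """Objective score O, derived from P + N + O = 1."""
        return 1.0 - self.pos - self.neg

    def polarity(self):
        """Assumed labelling rule: whichever of P and N is larger wins."""
        if self.pos > self.neg:
            return "positive"
        if self.neg > self.pos:
            return "negative"
        return "neutral"

# The example from the slide: P = 0.75, N = 0, so O = 0.25.
example = SentiScore(pos=0.75, neg=0.0)
print(example.obj, example.polarity())
```

Storing only P and N and deriving O keeps the three values consistent by construction.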
Demo Explore the sentiment lexicons discussed here: • http://sentiment.christopherpotts.net/lexicon/ • Our Demo: • http://www.tripadvisor.com/Hotel_Review-g187147-d290407-Reviews-Paris_France_Hotel-Paris_Ile_de_France.html • Tutorial page: http://sentiment.christopherpotts.net/lexicons.html#building
Polarity classification or semantic orientation determination of sentiment-expressing phrases • a positive sentiment, a negative sentiment • Intensity or strength determination of sentiment-expressing phrases • the word excellent is a strong positive word whereas the word good is a weak positive word • Product feature extraction • for example battery life, image quality and resolution in a camera domain and seating comfort, maximum speed, wheels and steering in a car domain. • Opinion and sentiment-expressing phrase extraction • for example extremely comfortable, not smooth, quite heavy, good and bad
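Polarity and intensity determination for short phrases like those above can be sketched with a small lexicon plus rules for intensifiers and negation. Everything numeric here is an assumption: the lexicon weights, the intensifier multipliers, and the `phrase_score` function are invented for illustration and do not come from any published resource.

```python
# Hypothetical prior polarities in [-1, 1]; the values are invented.
LEXICON = {
    "excellent": 0.9, "good": 0.4, "comfortable": 0.5,
    "smooth": 0.4, "heavy": -0.3, "bad": -0.5,
}
INTENSIFIERS = {"extremely": 1.5, "quite": 1.2}  # assumed multipliers
NEGATORS = {"not"}

def phrase_score(phrase):
    """Score a phrase: modifiers apply to the next sentiment word they precede."""
    score, sign, weight = 0.0, 1.0, 1.0
    for token in phrase.lower().split():
        if token in NEGATORS:
            sign = -sign
        elif token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
        elif token in LEXICON:
            score += sign * weight * LEXICON[token]
            sign, weight = 1.0, 1.0  # modifiers are consumed by this word
    # Clip so that intensified words stay inside [-1, 1].
    return max(-1.0, min(1.0, score))

for phrase in ["extremely comfortable", "not smooth", "quite heavy", "good", "bad"]:
    value = phrase_score(phrase)
    label = "positive" if value > 0 else "negative" if value < 0 else "neutral"
    print(f"{phrase!r}: {value:+.2f} ({label})")
```

The sign of the score gives the polarity and its magnitude gives the intensity, so "excellent" outranks "good" while "not smooth" flips to negative.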
References • [1] http://www.answers.com/sentiment, 9/22/08. • [2] B. Pang, L. Lee, and S. Vaithyanathan. “Thumbs up? Sentiment classification using machine learning techniques.” In Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002. • [3] A. Esuli and F. Sebastiani. “SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining.” In Proc. of LREC 2006, 5th Conf. on Language Resources and Evaluation, 2006. • [4] E. Zhang and Y. Zhang. “UCSC on TREC 2006 Blog Opinion Mining.” TREC 2006 Blog Track, Opinion Retrieval Task. • [5] A. Devitt and K. Ahmad. “Sentiment Polarity Identification in Financial News: A Cohesion-based Approach.” ACL 2007. • [6] B. Pang and L. Lee. “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts.” In Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics, pp. 271–278, July 21–26, 2004.