“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI
FACULTY OF COMPUTER SCIENCE
The Semantics and Pragmatics of Natural Language
Daniela GÎFU
daniela.gifu@info.uaic.ro
SENTIMENT ANALYSIS – AN OVERVIEW
IMPACT OF TOPIC
• Sentiment Analysis (SA) is one of the most topical subjects in NLP.
• SA makes it possible to monitor, identify, and understand in real time consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.
• SA is very popular in social media.
• Target: academia and industry.
IMPACT IN SOCIAL MEDIA
• Social media carries personal and socially related opinions.
• SA plays a vital role in understanding the opinions expressed in conversations, posts, blogs, etc., and in deriving a sensible short summary of the most relevant opinions.
SA helps to:
• take quick decisions
• change the strategy and tactics used
• understand the mood of the market
• keep up with changing trends
• improve one’s product
VALIDITY OF SA
Evaluated by comparing the sentiment scores of specific comments with their respective star ratings, which are common cues that individuals use to filter what they read during information acquisition.
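One way to make this comparison concrete is a rank correlation between sentiment scores and star ratings. The sketch below is only an illustration: the scores and ratings are invented sample values, and Spearman's rho is an assumed choice of measure, not one named on the slides.

```python
from statistics import mean

def ranks(values):
    """Return 1-based average ranks; tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: one sentiment score and one star rating per comment.
sentiment_scores = [-0.8, -0.3, 0.1, 0.4, 0.9]
star_ratings = [1, 2, 3, 4, 5]
rho = spearman(sentiment_scores, star_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A rho close to +1 would suggest that the scores order the comments the same way the star ratings do; values near 0 would suggest little agreement.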
RESEARCH QUESTIONS...
• How comparable are sentiment scores for reviews/comments to their respective star ratings?
• How do sentiment scores impact decision outcomes?
PURPOSE AND MOTIVATION
• to produce a complete state-of-the-art (SOTA) survey of SA, with a focus on social media posts;
• to enhance the results of context-based SA;
• to clarify the behavior of the receiver (the reader), who is affected by the multitude of information on forums;
• to improve the performance of SA classifiers based on two approaches (machine learning and lexicon).
CONTENT
1. Introduction
2. A general view on the subject
3. SA levels
3.1. SA at document level
3.2. SA at clause/sentence level
3.3. Feature-based SA
3.4. Comparative sentiment analysis
3.5. Sentiment lexicon acquisition
3.6. Conclusions
4. Applications
4.1. Business and government
4.2. Review sites
4.3. Other domains: politics and sociology
4.4. Conclusions
5. Conclusions and discussions
2. A general view on the subject
SA: the process of detecting the contextual polarity of text.
SA terminology:
- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson, 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor, 2014].
3. Sentiment classification techniques
Fig. 1 Sentiment classification techniques
3. SA levels - document
a) supervised approach
Fig. 2 Supervised learning for three classes (positive, negative, neutral)
3. SA levels - document
a) supervised approach
Fig. 2 Python NLTK Demos for Natural Language Text Processing
http://text-processing.com/demo/
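The supervised, three-class setting can be sketched with a tiny multinomial Naive Bayes classifier. This is a plain-Python stand-in written for illustration, not the NLTK demo referenced above; the training sentences are invented, and Naive Bayes with add-one smoothing is an assumed choice of learner.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training set covering the three classes.
train_texts = [
    "great phone love it", "excellent battery great screen",
    "terrible phone hate it", "awful battery terrible screen",
    "the phone arrived today", "the box contains a charger",
]
train_labels = ["positive", "positive", "negative", "negative",
                "neutral", "neutral"]

clf = NaiveBayes()
clf.fit(train_texts, train_labels)
print(clf.predict("great screen"))
```

With real data the same interface would be trained on labeled reviews; a feature set richer than bare lowercase tokens would normally be needed.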
3. SA levels - document
b) unsupervised approach
Based on determining the semantic orientation (SO) of specific words/phrases.
Sentiment lexicon (words/expressions) – [Taboada et al., 2011]
Set of predefined POS patterns – [Turney, 2002]
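A lexicon-based semantic orientation score can be sketched as the average of the lexicon values found in a text. The lexicon entries, their values, and the zero threshold below are hypothetical; the sketch does not reproduce the cited systems.

```python
# Hypothetical sentiment lexicon: word -> semantic orientation value.
LEXICON = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "awful": -2.0}

def semantic_orientation(text):
    """Average the lexicon values of the words found in the text."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def label(text, threshold=0.0):
    """Map the SO score to a polarity label (threshold is an assumption)."""
    so = semantic_orientation(text)
    if so > threshold:
        return "positive"
    if so < -threshold:
        return "negative"
    return "neutral"

print(label("The screen is excellent, but the battery is bad."))
```

No training data is needed here, which is what makes the approach unsupervised; its quality depends entirely on the coverage of the lexicon.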
3. SA levels – clause/sentence
• More complex: identifying whether a sentence is opinionated and establishing the nature of the opinion;
• using supervised methods:
1. classifying clauses into two classes [Yu and Hatzivassiloglou, 2003];
2. an approach based on minimum cuts [Pang and Lee, 2004].
• The problem: how can we classify questions, sarcasm, metaphor, humor, etc.?
3. SA levels – features
• several entities for each analyzed text, or several attributes for each entity;
• extraction of the attributes of an object;
• Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)
• extract and store all NPs;
• keep only the NPs whose frequency is above an experimentally determined threshold [Hu and Liu, 2004]
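The last two steps (store all NPs, keep the frequent ones) can be sketched as a simple support count. NP extraction itself is assumed to have been done already; the reviews and the 0.5 threshold are hypothetical values chosen for the example.

```python
from collections import Counter

def frequent_features(noun_phrases_per_review, min_support):
    """Keep NPs that occur in at least `min_support` (a fraction) of reviews."""
    counts = Counter()
    for phrases in noun_phrases_per_review:
        # Count each NP once per review, ignoring case.
        counts.update({p.lower() for p in phrases})
    n_reviews = len(noun_phrases_per_review)
    return {p for p, c in counts.items() if c / n_reviews >= min_support}

# Hypothetical NPs already extracted from four reviews.
reviews = [
    ["battery", "screen"],
    ["battery", "price"],
    ["Battery", "camera"],
    ["screen"],
]
features = frequent_features(reviews, min_support=0.5)
print(sorted(features))
```

In practice the threshold would be tuned on data, as the slide notes.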
3. SA levels – comparative
• When a user does not offer a direct opinion about a product. [Jindal and Liu, 2006]
• Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)
• comparative adjectives and adverbs: mai mult, mai puţin (En. - more, less)
• superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
• additional keywords: decât, împotriva (En. - rather than, against).
• these keywords cover 98% of the comparative opinions
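A keyword filter for candidate comparative sentences can be sketched as follows. The keyword list is taken from the slide; the whole-word regular-expression matching is an assumption, and a keyword hit only flags a candidate, it does not prove the sentence is comparative.

```python
import re

# Romanian comparative keywords listed on the slide.
COMPARATIVE_KEYWORDS = [
    "mai mult", "mai puţin", "cel puţin", "mai", "decât", "împotriva",
]

def find_comparative_keywords(sentence):
    """Return the keywords that occur as whole words in the sentence."""
    text = sentence.lower()
    found = []
    for keyword in COMPARATIVE_KEYWORDS:
        # (?<!\w) and (?!\w) prevent matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(keyword)
    return found

sentence = "Dacia Logan arată mult mai bine decât Dacia Solenza."
print(find_comparative_keywords(sentence))
```

Note that Romanian text may spell "ţ" with either a cedilla or a comma below; a real system would normalize both forms before matching.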
3. SA levels – sentiment lexicon
Manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]
Our work: AnaDiP-2010, inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.
<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>
3. SA levels – sentiment lexicon
Our software performs part-of-speech (POS) tagging and lemmatization of words. For example:
<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
  …
</lexic>
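The two XML fragments can be combined to resolve the class names of a lemma. The sketch below uses Python's standard ElementTree parser; the helper names are hypothetical, and only the two lexicon entries shown on the slide are included (the elided entries are left out).

```python
import xml.etree.ElementTree as ET

CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>"""

LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>"""

def load_classes(xml_text):
    """Map class id -> (name, parent id or None)."""
    return {c.get("id"): (c.get("name"), c.get("parent"))
            for c in ET.fromstring(xml_text)}

def load_lexicon(xml_text):
    """Map lemma -> list of class ids."""
    return {w.get("lemma"): w.get("classes").split(",")
            for w in ET.fromstring(xml_text)}

def class_names(lemma, lexicon, classes):
    """Resolve the class names attached to a lemma (empty if unknown)."""
    return [classes[cid][0] for cid in lexicon.get(lemma, [])]

classes = load_classes(CLASSES_XML)
lexicon = load_lexicon(LEXICON_XML)
print(class_names("clevetitor", lexicon, classes))
```

Each lemma lists its full path in the hierarchy, so "clevetitor" (classes 1, 3, 6) resolves to emotional, negative, sadness.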
3. SA levels – sentiment lexicon
• corpus-based approaches: a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain.
• a classical work [Hatzivassiloglou and McKeown, 1997] uses a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, nor, either).
• Examples:
• bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man)
• femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?)
• băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
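The core idea of connector-based lexicon expansion can be sketched as polarity propagation: adjectives joined by a connector such as "şi" (and) are assumed to share orientation, so a known seed word passes its polarity to its conjuncts. The rule that "şi" links same-orientation words, the seed, and the second pair below are assumptions for illustration; the other connectors are not modeled.

```python
from collections import deque

def propagate(seeds, same_links):
    """Spread seed polarities along same-orientation links (breadth-first)."""
    graph = {}
    for a, b in same_links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word]
                queue.append(neighbour)
    return polarity

# Seed polarity (hypothetical) and adjective pairs joined by "şi".
seeds = {"puternic": "positive"}
pairs = [
    ("puternic", "armonios"),  # from "bărbat puternic şi armonios"
    ("armonios", "elegant"),   # hypothetical second pair
]
expanded = propagate(seeds, pairs)
print(expanded)
```

A fuller treatment would also need links that signal opposite orientation and a way to resolve conflicting evidence.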
4. Applications – business and government
• “Why aren’t consumers buying our laptop?”, when the price is good and the weight is clearly in line with consumers’ wishes. [Lee, 2004]
• Two kinds of answers:
- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky), or
- misperceptions (which matter even though they are wrong).
• Solution: by tracking consumers’ opinions, one can predict sales trends, etc. [Mishne & Glance, 2006].
4. Applications – business and government
• Solution based on a dictionary plus the semantic role of negations and pragmatic connectors:
• classification of emotionally charged words into two classes, positive and negative (plus a neutral class);
• more classes, associating each word with a value in the range -5 to +5;
• [Gîfu and Cristea, 2012a]: a scale over the interval -3 to +3;
• [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
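A dictionary-based scorer with negation handling can be sketched as below, using the -1 to +1 scale. The valence values, the single negator "nu", and the two-token negation window are all hypothetical choices for illustration, not the published resources.

```python
# Hypothetical valence dictionary on the -1..+1 scale.
VALENCE = {"bun": 0.8, "odios": -0.9, "oribil": -0.9}
NEGATORS = {"nu"}  # Romanian "not"; assumed to be the only negator here

def score(tokens, window=2):
    """Sum word valences, flipping the sign of a word that follows a negator.

    A negator affects the first sentiment word found among the next
    `window` tokens (an assumed, simplified notion of negation scope).
    """
    total = 0.0
    negate_left = 0
    for token in tokens:
        token = token.lower()
        if token in NEGATORS:
            negate_left = window
            continue
        if token in VALENCE:
            value = VALENCE[token]
            if negate_left > 0:
                value = -value
                negate_left = 0
            total += value
        elif negate_left > 0:
            negate_left -= 1
    # Clamp the result to the -1..+1 scale.
    return max(-1.0, min(1.0, total))

print(score("nu este bun".split()))
```

Pragmatic connectors could be handled in the same loop, for example by weighting the clause that follows a contrastive connector more heavily; that part is not sketched here.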
4. Process phases: POS-tagger & NER & Anaphora Resolution
Annotated example: “Nimic mai odios, mai oribil decât pantofii sport cu platformă” (En. - Nothing more odious, more horrible than platform sneakers).
<DOCUMENT>
  <P ID="1">
    <S ID="1">
      <W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
      <NP HEADID="11.2" ID="0" ref="0">
        <W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
        <W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
        <W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
        <W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
        <W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
        <W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
        <W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
      </NP>
      <NP HEADID="11.9" ID="1" ref="1">
        <W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
        <NP HEADID="11.10" ID="2" ref="2">
          <W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
          <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
          <NP HEADID="11.12" ID="3" ref="3">
            <W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
          </NP>
        </NP>
      </NP>
    </S>
  </P>
</DOCUMENT>
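Output in this format can be read back with a standard XML parser. The sketch below uses an abridged fragment of the annotation (most attributes dropped for brevity) and collects the tokens and the head lemma of each NP; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged fragment of the annotated sentence shown above.
SAMPLE = """<S ID="1">
  <NP HEADID="11.9" ID="1" ref="1">
    <W ID="11.9" LEMMA="pantof" MSD="Ncmpry" POS="NOUN">pantofii</W>
    <NP HEADID="11.10" ID="2" ref="2">
      <W ID="11.10" LEMMA="sport" MSD="Ncmsrn" POS="NOUN">sport</W>
      <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION">cu</W>
    </NP>
  </NP>
</S>"""

root = ET.fromstring(SAMPLE)

# (surface form, lemma, POS) for every word, in document order.
tokens = [(w.text, w.get("LEMMA"), w.get("POS")) for w in root.iter("W")]

# Head lemma of each noun phrase, resolved through the HEADID attribute.
words_by_id = {w.get("ID"): w for w in root.iter("W")}
np_heads = {np.get("ID"): words_by_id[np.get("HEADID")].get("LEMMA")
            for np in root.iter("NP")}

print(tokens)
print(np_heads)
```

The lemma and POS pairs obtained this way are what a lexicon lookup or a rule matcher would consume in later phases.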
4. Process phases: POS-tagger & NER & Anaphora Resolution Fig. 3 The interface of the EAT system
4. Applications – business and government
• 46 rules for assigning values.
<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="POS" value="ADJECTIVE"/>
</rule>
Ex: cel mai bun (En. - the best)
<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="LEMMA" value="bun"/>
</rule>
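Applying such a rule amounts to sliding it over the annotated tokens and checking one attribute constraint per position. The sketch below is a guess at how a matcher could work, not the actual system: the function names are hypothetical, and the POS tag given to "cel" is an assumed value.

```python
import xml.etree.ElementTree as ET

RULE_XML = """<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="POS" value="ADJECTIVE"/>
</rule>"""

def load_rule(xml_text):
    """Turn a <rule> element into a list of (attribute, value) constraints."""
    return [(w.get("attribute"), w.get("value"))
            for w in ET.fromstring(xml_text)]

def find_matches(rule, tokens):
    """Return the start index of every token window that satisfies the rule."""
    matches = []
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all(token.get(attribute) == value
               for (attribute, value), token in zip(rule, window)):
            matches.append(start)
    return matches

# Tokens for "... este cel mai bun"; the POS of "cel" is an assumed tag.
tokens = [
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel", "POS": "DETERMINER"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
]
rule = load_rule(RULE_XML)
print(find_matches(rule, tokens))
```

Each match position could then trigger the value adjustment attached to the rule, for example boosting the valence of the adjective in a superlative.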
4. Applications – review sites
• to assess the reviews and ratings about your company or yourself;
• to summarize reviews.
• Our work: consumer behaviour and civic identity [Gîfu et al., 2013]
• 6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter.
• we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.
4. Applications – politics/sociology
Two dimensions in politics:
1. to know what voters think about the political candidates [Efron, 2004, Goldberg et al., 2007, Laver et al., 2003, Mullen and Malouf, 2008];
2. to clarify the politicians’ positions in order to enhance the quality of information that voters have access to [Bansal et al., 2008, Gîfu, 2013b]
In sociology:
- how ideas and innovations are propagated [Rosen, 1974]
Ex: the polls on different issues
CONCLUSIONS AND DISCUSSIONS
SA - a complex task;
SA - an emerging discipline with promising academic and, most importantly, industrial applications;
.... the sentiment classification problem - more challenging
Future work...
- to develop an independent sentiment classifier using machine learning methods;
- to compare the results of machine learning on sentiment classification with those on traditional topic-based categorization;
- to analyse the sentiment lexicon of old Romanian in terms of diachronic semantics.