This research paper explores methods and approaches for incorporating negation in sentiment analysis. It evaluates different negation scoping methods and their impact on accuracy. The results show that using a fixed window length of 2 words following a negation keyword yields the highest increase in accuracy compared to not accounting for negation.
Determining Negation Scope and Strength in Sentiment Analysis Alexander Hogenboom Erasmus School of Economics Erasmus University Rotterdam hogenboom@ese.eur.nl SMC 2011
Outline • Introduction • Sentiment Analysis • Accounting for Negation • Framework • Evaluation • Conclusions • Future Work
Introduction (1) • Need for information monitoring tools for tracking sentiment in today’s complex systems • The Web offers an overwhelming amount of textual data, containing traces of sentiment
Introduction (2) • Existing sentiment analysis approaches are based on word frequencies • There is a trend toward involving various other aspects of content in automated sentiment analysis • Accounting for negation seems promising, but how should the influence of negation keywords on the conveyed sentiment be modeled?
Sentiment Analysis • Sentiment analysis is typically focused on determining the polarity of natural language texts • Applications in summarizing reviews, determining a general mood (consumer confidence, politics) • Common approach to sentiment analysis: • Creation of lexicon (list of words and their sentiment scores) • Utilization of lexicon to determine sentiment in text • Sentiment analysis approaches differ on several distinguishing features, e.g., • Analysis level and focus • Handling of syntactic variations, amplification, and negation
Accounting for Negation (1) • Common approach: exploitation of negation keywords • Challenge lies in finding the negation scope • Sophisticated approaches involve complex rules, compositional semantics, or machine learning • Many existing sentiment analysis frameworks use rather simple conceptualizations of negation scope
Accounting for Negation (2) • Let us consider the following positive sentence: • Example: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • Rest of Sentence (RoS): • Following: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • Around: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • First Sentiment-Carrying Word (FSW): • Following: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • Around: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes!
Accounting for Negation (3) • Let us consider the following positive sentence: • Example: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • Next Non-Adverb (NNA): • Following: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • Fixed Window Length (FWL): • Following (3): Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes! • Around (3): Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes!
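The four scoping methods named on these slides can be sketched as functions that return the token indices falling inside the negation scope. This is a minimal illustration, not the authors' implementation: the function names, the `is_sentiment` predicate, the `"ADV"` tag, and the reading of "around" as "the same rule applied on both sides of the keyword" are all assumptions made for the sketch.

```python
from typing import Callable, List, Set


def rest_of_sentence(tokens: List[str], neg_idx: int, around: bool = False) -> Set[int]:
    """RoS: every token after the negation keyword (and before it, if around)."""
    scope = set(range(neg_idx + 1, len(tokens)))
    if around:
        scope |= set(range(0, neg_idx))
    return scope


def first_sentiment_word(tokens: List[str], neg_idx: int,
                         is_sentiment: Callable[[str], bool],
                         around: bool = False) -> Set[int]:
    """FSW: the nearest sentiment-carrying token after the keyword
    (and, if around, also the nearest one before it; assumed reading)."""
    scope: Set[int] = set()
    for i in range(neg_idx + 1, len(tokens)):
        if is_sentiment(tokens[i]):
            scope.add(i)
            break
    if around:
        for i in range(neg_idx - 1, -1, -1):
            if is_sentiment(tokens[i]):
                scope.add(i)
                break
    return scope


def next_non_adverb(pos_tags: List[str], neg_idx: int) -> Set[int]:
    """NNA: the first token after the keyword whose POS tag is not an adverb."""
    for i in range(neg_idx + 1, len(pos_tags)):
        if pos_tags[i] != "ADV":  # hypothetical tag name
            return {i}
    return set()


def fixed_window(tokens: List[str], neg_idx: int, n: int, around: bool = False) -> Set[int]:
    """FWL: the n tokens after the keyword (and the n before it, if around)."""
    scope = set(range(neg_idx + 1, min(len(tokens), neg_idx + 1 + n)))
    if around:
        scope |= set(range(max(0, neg_idx - n), neg_idx))
    return scope


# The example sentence from the slides, lowercased and stripped of punctuation.
tokens = ("luckily the smelly poo did not leave awfully nasty "
          "stains on my favorite shoes").split()
neg_idx = tokens.index("not")
toy_sentiment_words = {"luckily", "smelly", "awfully", "nasty", "favorite"}  # assumed
fsw = first_sentiment_word(tokens, neg_idx, lambda t: t in toy_sentiment_words)
fwl = fixed_window(tokens, neg_idx, 3)
```

Under these assumptions, `fsw` selects "awfully" and `fwl` selects "leave awfully nasty" for the example sentence.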
Framework (1) • Lexicon-based sentence-level sentiment scoring by using SentiWordNet • Optional support for sentiment negation • Individual words are scored in the range [-1,1] • Word scores are used to classify a sentence as positive (1) or negative (-1)
Framework (2) • Score sentences in test corpus for their sentiment • For an arbitrary sentence: • Retrieve all words (simple and compound) • Retrieve each word’s Part-Of-Speech (POS) and lemma • Disambiguate word senses (Lesk algorithm) • Retrieve words’ sentiment scores from lexicon • Negate sentiment scores of negated words, as determined by means of one of the considered approaches, by multiplying the scores with an inversion factor (typically negative) • Calculate sentence score as sum of words’ scores • Classify sentence as either positive (score ≥ 0) or negative (score < 0)
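The scoring steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework itself: the lexicon is a hand-made toy dictionary rather than SentiWordNet, POS tagging, lemmatization, and Lesk disambiguation are skipped, negation scope is a fixed window following the keyword, and all names (`TOY_LEXICON`, `score_sentence`, `classify`) are hypothetical.

```python
from typing import Dict, List, Set

# Toy stand-in for a sentiment lexicon; scores lie in [-1, 1]. Assumed values.
TOY_LEXICON: Dict[str, float] = {
    "luckily": 0.5,
    "smelly": -0.5,
    "awfully": -0.25,
    "nasty": -0.5,
    "favorite": 0.5,
}
NEGATION_KEYWORDS: Set[str] = {"not", "no", "never"}  # assumed keyword list


def score_sentence(tokens: List[str], window: int = 2,
                   inversion_factor: float = -1.0) -> float:
    """Sum the word scores, multiplying scores of negated words by the inversion factor."""
    # Mark the `window` tokens following each negation keyword as negated.
    negated: Set[int] = set()
    for i, token in enumerate(tokens):
        if token in NEGATION_KEYWORDS:
            negated.update(range(i + 1, min(len(tokens), i + 1 + window)))

    total = 0.0
    for i, token in enumerate(tokens):
        score = TOY_LEXICON.get(token, 0.0)  # words not in the lexicon score 0
        if i in negated:
            score *= inversion_factor
        total += score
    return total


def classify(score: float) -> int:
    """Positive (1) if score >= 0, negative (-1) otherwise, as on the slide."""
    return 1 if score >= 0 else -1


tokens = ("luckily the smelly poo did not leave awfully nasty "
          "stains on my favorite shoes").split()
baseline_label = classify(score_sentence(tokens, window=0))  # no negation handling
negated_label = classify(score_sentence(tokens, window=3))   # 3 words after "not"
```

With these toy scores, the baseline sum is negative because "smelly", "awfully", and "nasty" outweigh the positive words, while inverting the three words after "not" flips the sentence to positive.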
Evaluation (1) • Implementation in C#, Microsoft SQL Server database, SharpNLP-based POS tagger, WordNet.Net API for lemmatization and word sense disambiguation, SentiWordNet sentiment lexicon • Corpus of 930 positive and 1,355 negative manually classified English movie review sentences (60% training set, 40% test set)
Evaluation (2) • Baseline: sentiment without accounting for negation • Alternatives: negation scoping with RoS, FSW, NNA, and FWL (window sizes ranging from 1 to 4) • Inversion factor for the best alternative optimized within the range [-2, 0] (hill climbing on the training set)
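The slides do not describe the hill-climbing procedure in detail, so the following is only a generic sketch of a one-dimensional hill climber constrained to [-2, 0]. The starting point, step size, step-halving rule, and the toy objective are assumptions; in the evaluation described above, the objective would be a performance measure computed on the training set for a given inversion factor.

```python
from typing import Callable, Tuple


def hill_climb(objective: Callable[[float], float],
               start: float = -1.0,
               step: float = 0.1,
               lo: float = -2.0,
               hi: float = 0.0,
               min_step: float = 0.001) -> Tuple[float, float]:
    """Maximize `objective` over [lo, hi] by moving to better neighbors.

    When neither neighbor improves on the current point, the step is halved;
    the search stops once the step drops below `min_step`.
    """
    best_x, best_val = start, objective(start)
    while step >= min_step:
        improved = False
        # Both candidates are computed around the point held at loop entry.
        for candidate in (best_x - step, best_x + step):
            if lo <= candidate <= hi:
                value = objective(candidate)
                if value > best_val:
                    best_x, best_val = candidate, value
                    improved = True
        if not improved:
            step /= 2
    return best_x, best_val


def toy_objective(factor: float) -> float:
    """Smooth toy stand-in for training-set performance; peak placed at -1.5."""
    return -(factor + 1.5) ** 2


best_factor, best_value = hill_climb(toy_objective)
```

A real accuracy curve is piecewise constant in the inversion factor, so a climber like this can stall on a plateau; restarting from several starting points is one common mitigation.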
Evaluation (3)
Conclusions • Recent sentiment analysis methods consider more and more aspects of content other than word frequencies • Our corpus-based evaluation of several common negation scoping methods shows that only some perform significantly better than our baseline of not accounting for negation • FWL with a window of 2 words following a negation keyword yields the highest increase in accuracy (5.5%) and macro-level F1 (6.2%) compared to the baseline • An optimized inversion factor of -1.27 rather than -1 yields an accuracy increase of 7.0% and a macro-level F1 increase of 8.0% compared to the baseline
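For reference, the two reported measures can be computed as sketched below. The slides do not define "macro-level F1"; this sketch assumes the usual reading, the unweighted mean of the per-class F1 scores over the positive (1) and negative (-1) classes. The function names and the tiny label lists are illustrative only, not data from the evaluation.

```python
from typing import List, Sequence


def accuracy(gold: List[int], pred: List[int]) -> float:
    """Fraction of sentences whose predicted label equals the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def f1_for_class(gold: List[int], pred: List[int], label: int) -> float:
    """Harmonic mean of precision and recall for one class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # avoids division by zero when the class is never hit
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: List[int], pred: List[int],
             labels: Sequence[int] = (1, -1)) -> float:
    """Unweighted mean of per-class F1 (assumed meaning of macro-level F1)."""
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)


# Illustrative labels only.
gold = [1, 1, -1, -1]
pred = [1, -1, -1, -1]
acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Because the corpus has more negative than positive sentences, a macro-averaged measure weights both classes equally, whereas accuracy is dominated by the larger class.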
Future Work • Let the negation scope detection method depend on the position of a negation keyword • Deeper understanding of semantics in order to cope with, e.g., context-dependent interpretations • Distinct sentiment inversion factors for negated positive and negative words
Questions? Alexander Hogenboom, Erasmus School of Economics, Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands hogenboom@ese.eur.nl