790 likes | 1.37k Views
Università di Pisa. Aspect Based Sentiment Analysis. Human Language Technologies Giuseppe Attardi. Aspect Terms. The lasagna was great, but we had to wait 20 minutes just to be seated. Aspect term: lasagna (positive sentiment). Aspect Categories.
E N D
Università di Pisa Aspect Based Sentiment Analysis Human Language Technologies Giuseppe Attardi
Aspect Terms The lasagna was great, but we had to wait 20 minutes just to be seated. Aspect term: lasagna (positive sentiment)
Aspect Categories The lasagna was great, but we had to wait 20 minutes just to be seated. Aspect categories: food (positive sentiment), service (negative sentiment)
SemEval 2014 • SemEval-2014 Task 4 (Aspect Based Sentiment Analysis) • Given a customer review, automatic systems are to determine aspect terms, aspect categories, and sentiment towards these aspect terms and categories • Training data • Two domain-specific datasets for laptops and restaurants, consisting of over 6K sentences with fine-grained aspect-level human annotations.
Subtasks • Aspect Term Extraction • Aspect Term Polarity • Aspect Category Detection • customer reviews provided for two domains: restaurants and laptops • five aspect categories are defined for the restaurant domain: food, service, price, ambience, and anecdotes. • no aspect category for the laptop reviews • Aspect Category Polarity • Automatic systems are to determine if any of those aspect categories are described in a review
NRC-Canada • NRC-Canada-2014: DetectingAspects and Sentiment in CustomerReviews, SvetlanaKiritchenko, XiaodanZhu, Colin Cherry, and Saif M. Mohammad. In Proceedings of the eighthinternational workshop on Semantic Evaluation Exercises (SemEval-2014), August 2014, Dublin, Ireland. • Builds on the NRC-Canada sentiment analysis system which determines the overall sentiment of a message (top results in SemEval-2013 Task 2 and SemEval-2014 Task 9 on Sentiment Analysis of Tweets)
Resources In-domain corpora • 180,000 Yelp restaurant reviews (Phoenix Academic dataset) • 125,000 Amazon laptop reviews (McAuley & Leskovec, 2013) Sentiment lexicons - terms and degree of their association with positive or negative sentiment Word-Aspect Association lexicon - terms and degree of their association with the aspect categories
Corpora • The Yelp Phoenix Academic Dataset contains customer reviews posted on the Yelp website. The businesses for which the reviews are posted are classified into over 500 categories • Amazon laptop reviews corpus: McAuley and Leskovec (2013) collected reviews posted on Amazon.com.A subset of 124,712 reviews that mention either laptop or notebook • 5 star ratings (1-2 negative, 4-5 positive)
Word-Aspect Corpora Generation • star ratings in reviews are used as weak labels • score(w) = PMI(w, positive) PMI(w, negative) • if score(w) > 0, then word w is positive • if score(w) < 0, then word w is negative • affirmative and negated contexts are treated separately The desserts are very overpriced and not very tasty Sentiment of review Affirmative context Negated context Negated Context Corpus Affirmative Context Corpus
Pointwise Mutual Information • freq(w, pos) is the number of times a term w occurs in positive reviews • freq(w) is the total frequency of term w in the corpus • freq(pos) is the total number of tokens in positive reviews • N is the total number of tokens in the corpus • PMI(w, neg) calculated in a similar way • ignored terms that occurred less than five times in each (positive and negative) groups of reviews
Negated Contexts • Negated contexts (defined as text spans between a negation word and a punctuation mark) and affirmative (non-negated) contexts • sentiment scores were then calculated separately for the two types of contexts • the term good in affirmative contexts has a sentiment score of 1.2 whereas the same term in negated contexts has a score of -1.4
Out Domain Sentiment Lexica • Large-coverage automatic tweet sentiment lexicons: • Hashtag Sentiment lexicons • Sentiment140 lexicons (Kiritchenko et al., 2014) • Three manually created sentiment lexicons • NRC EmotionLexicon (Mohammad and Turney, 2010) • Bing Liu’sLexicon (Hu and Liu, 2004) • MPQA SubjectivityLexicon (Wilson et al., 2005)
Yelp Restaurant Word–Aspect Association Lexicon • Each sentence of the corpus was labeled with zero, one, or more of the five aspect categories by our aspect category classification system • For each term w and each category c an association score was calculated: score(w, c) = PMI(w, c) − PMI(w, ¬c)
Subtask 1: Aspect Term Extraction Task: to detect aspect terms in a sentence Approach: • Semi-Markov discriminative tagger, trained with MIRA (perceptron) • tags phrases, not tokens, can use phrase-level features Features: • emission features: token identity (cased, lowercased) in a 2-word window, prefixes and suffixes up to 3 chars, phrase identity (cased, lowercased) • transition features: tag ngrams Results:
Tagger Features • token feature templates for wi • token-identity within a window (wi−2 ...wi+2) • lower-cased token-identity within a window (lc(wi−2)...lc(wi+2)) • prefixes and suffixes of wi (up to 3 characters in length) • phrase-level emission feature templates: • the cased and uncased identity of the entire phrase covered by a tag, which allow the system to memorize complete terms such as, “getting a table” or “fish and chips.” • Transition features templates are short n-grams of tag identities: yj; yj,yj−1; and yj,yj−1,yj−2.
Subtask 2: Aspect Term Polarity Task: to detect sentiment towards a givenaspectterm Approach: SVM with linear kernel Features: • surface features: ngrams, context-target bigrams • sentiment lexicon features: counts, sum, max • syntactic features: ngrams and context-target bigrams on parse trees, parse label features Results:
Features • Surface features: • unigrams (single words) and bigrams (2-word sequences) extracted from a term and its surface context • context-target bigrams (i.e., bigrams formed by a word from the surface context and a word from the term itself) • Lexicon features: • the number of positive/negative tokens • the sum of the tokens’ sentiment scores • The maximal sentiments core • Parse features: • word- and POS-ngram • context-target bigrams, i.e., bigrams composed of a word from the parse context and a word from the term • all paths that start or end with the root of the target terms
Subtask 3: Aspect Category Detection Task: to detectaspectcategoriesdiscussed in a sentence Approach: • SVM with linear kernel • fivebinaryclassifiers (one-vs-all) • assigncmax = argmaxcP(c|d) ifP(cmax|d > 0.4) Features: • word and characterngrams • stemmedngrams (Porter stemmer) • word cluster ngrams (Brown clusteringalgorithm) • YelpRestaurant Word-AspectAssociationlexicon Results:
Subtask 4: Aspect Category Polarity Task: to detect sentiment towards a given aspect category Approach: • one 4-class SVM classifier with 2 copies of each feature: generic and category-specific • add features for terms associated with aspect category Features: • word and character ngrams, POS tags • word cluster ngrams • sentiment lexicon features Results:
NRC Canada Results • Top results on subtasks 2, 3, and 4 • Statistical approaches with surface-form and lexicon features • Most useful features: derived from automatically generated in-domain lexical resources • Resources to download: www.purl.com/net/sentimentoftweets
SemEval 2017 Task 4 http://alt.qcri.org/semeval2017/task4/data/uploads/semeval2017-task4.pdf
Tasks • Subtask A: Given a tweet, decide whether it expresses POSITIVE, NEGATIVE or NEUTRAL sentiment. • Subtask B: Given a tweet and a topic, classify the sentiment conveyed towards that topic on a two-point scale: POSITIVE vs. NEGATIVE. • Subtask C: Given a tweet and a topic, classify the sentiment conveyed in the tweet towards that topic on a five-point scale: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE. • Subtask D: Given a set of tweets about a topic, estimate the distribution of tweets across the POSITIVE and NEGATIVE classes. • Subtask E: Given a set of tweets about a topic, estimate the distribution of tweets across the five classes: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE
Best Submission The top results for subtask A were achieved by: C. Baziotis, N. Pelekis, C. Doulkeridis. 2017. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation. Vancouver, Canada, SemEval ’17, pages 747–754. https://aclweb.org/anthology/S17-2126
Text Processing • Unlabeled Dataset. Collected a big dataset of 330M English Twitter messages, which is used for: • calculating words statistics needed by our text processor • training word embeddings • Tokenization. Recognizes the Twitter markup and some basic sentiment expressions,emoticons, expressions such as dates (e.g. 07/11/2011, April 23rd), times (e.g. 4:30pm, 11:00 am), currencies (e.g. $10, 25mil, 50e), acronyms, censored words (e.g. s**t), words with emphasis (e.g. *very*) and more. • Text Postprocessing. Performs spell correction, word normalization and segmentation and decides which tokens to omit, normalize or annotate (surround or replace with special tags)
Architecture for Task A The MSA model: A 2-layer bidirectional LSTM with attention over that last layer
Task C The TSA model: A Siamese Bidirectional LSTM with context-aware attention
Conclusions • Paradigm shift in 2015-2016 • From traditional SoTA classifiers with features (SVM, Makov Models) • To Deep Recurrent or Convolutional Neural Networks, using word embeddings as features