Automatic Sentiment Analysis in On-line Text

Erik Boiy, Pieter Hens, Koen Deschacht, Marie-Francine Moens. CS & ICRI, Katholieke Universiteit Leuven

Introduction • Goal: determine the sentiment of a person towards a topic • Practical use • Customer feedback • Marketing research • Monitoring newsgroups and forums (flame detection) • Augmentation of search engines (e.g. Opinmind.com) • Opportunity • Blogs • Forums • Review sites • Noisy texts

Overview • Introduction • Emotions • Machine learning (ML) techniques • Challenges • Experiments, results & discussion • Conclusion & future work

Concepts of emotions • “Sentiments are either emotions, or they are judgements or ideas prompted or coloured by emotions” • An emotion • Is usually caused by a person consciously or unconsciously evaluating an event, which is called appraisal in psychology • Gives priority to one or a few kinds of actions, to which it lends a sense of urgency

Emotions in written text • Appraisal: evaluation • e.g. It was an amazing show. • Direct expressions • e.g. I am delighted of the final results. • Elements of actions • e.g. I was grinning the whole way through it and laughing out loud more than once.

ML: Document representation (1) • Feature extraction • Features are used to represent a document as a vector • Values in the vector indicate frequency or presence of the feature at the corresponding index in a dictionary • The dictionary consists of all features encountered in the training documents
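The vector representation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `build_dictionary` and `vectorize`, the lowercase whitespace tokenisation, and the toy training sentences are all assumptions made for the example.

```python
from collections import Counter

def build_dictionary(training_docs):
    """Assign an index to every feature (here: lowercase token) seen in training."""
    dictionary = {}
    for doc in training_docs:
        for token in doc.lower().split():
            if token not in dictionary:
                dictionary[token] = len(dictionary)
    return dictionary

def vectorize(doc, dictionary, binary=False):
    """Return a frequency vector, or a presence vector when binary=True."""
    counts = Counter(doc.lower().split())
    vector = [0] * len(dictionary)
    for token, count in counts.items():
        index = dictionary.get(token)
        if index is not None:  # features never seen in training are ignored
            vector[index] = 1 if binary else count
    return vector

# Toy training documents (hypothetical, for illustration only).
train = ["great great show", "boring show"]
dictionary = build_dictionary(train)
freq_vector = vectorize("great great movie", dictionary)
presence_vector = vectorize("great great movie", dictionary, binary=True)
```

The word "movie" does not occur in the toy training set, so it has no index in the dictionary and contributes nothing to either vector.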

ML: Document representation (2) • Unigrams: all words • N-grams: all sequences of N successive words • N = 1: unigrams, N = 2: bigrams, N = 3: trigrams • e.g. I love, not worth, returned it • Lemmas: basic dictionary form of all words • e.g. cars -> car, was -> be, better -> good • Opinion words: use only words from a pre-defined list as features • Adjectives: use only adjectives (about 7.5% of the text)
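N-gram extraction over a token list might look like the sketch below. The function name `ngrams` and the example sentence are assumptions for illustration; tokenisation is plain whitespace splitting.

```python
def ngrams(tokens, n):
    """Return all runs of n successive tokens, each joined into one feature string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical example sentence, already lowercased.
tokens = "i love it but not worth it".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Here `bigrams` contains features such as "i love" and "not worth", the kind of two-word units given as examples on the slide.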

ML: Document representation (3) • Stopword removal • based on a list of determiners, prepositions, possessive pronouns, ... • Negation tagging • of each word following a negation, up to the next punctuation mark • e.g. I don't like this movie. -> I don't NOT_like NOT_this NOT_movie.
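Negation tagging as described above can be sketched like this. The negation word list, the punctuation set and the regular-expression tokeniser are assumptions chosen for the example; the slides do not specify which lists were actually used.

```python
import re

# Assumed, deliberately small list of negation cues.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't",
             "isn't", "wasn't", "can't", "won't"}
PUNCTUATION = set(".,;:!?")

def tag_negation(text):
    """Prefix NOT_ to every word between a negation cue and the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,;:!?]", text)
    tagged, negating = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negating = False          # punctuation closes the negation scope
            tagged.append(token)
        elif token.lower() in NEGATIONS:
            negating = True           # the cue itself stays untagged
            tagged.append(token)
        elif negating:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
    return tagged

tagged = tag_negation("I don't like this movie.")
```

Tagging turns "like" and "NOT_like" into distinct features, so a classifier can weight them differently.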

ML: Techniques • Classifiers successful for text classification • Support Vector Machines (SVM) • Naive Bayes Multinomial (NBM) • Maximum Entropy (Maxent)
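To make one of these classifiers concrete, here is a minimal multinomial Naive Bayes sketch over unigram counts with add-one (Laplace) smoothing. It is a textbook formulation, not the system used in the experiments; the function names, the smoothing parameter `alpha` and the four toy training documents are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nbm(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        log_prior = math.log(class_counts[label] / len(docs))
        log_likelihood = {
            w: math.log((word_counts[label][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model

def predict_nbm(model, doc):
    """Pick the class with the highest log posterior; unseen words are skipped."""
    tokens = doc.lower().split()
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        scores[label] = log_prior + sum(
            log_likelihood[t] for t in tokens if t in log_likelihood
        )
    return max(scores, key=scores.get)

# Toy training data (hypothetical).
train_docs = ["an amazing show", "amazing acting", "a boring movie", "boring and flat"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nbm(train_docs, train_labels)
prediction = predict_nbm(model, "amazing movie")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied on longer documents.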

Challenges (1) • Topic-sentiment relation • e.g. Competing with the vastly superior Casino Royale for the same action-movie audience, Deja Vu will likely be brushed aside and quickly forgotten. • e.g. A Good Year is a well-acted well-written well-directed movie but it just wasnt my cup of tea. • Topic-neutral text • e.g. In the movie Bond can start to untangle a terror network if he wins this big poker game at Casino Royale in Montenegro.

Challenges (2) • Cross-domain classification • Training (and testing) was done on a mixture of movie and car reviews • Text quality • e.g. Nothing but a French kiss-off Search Recent Archives Web for (rm) else • • • • • • • • • • • • • • • • ONLINE EXTRAS SITE SERVICES Movie Listings Friday Nov 10 2006 Posted on Fri Nov. 10 2006 MOVIE REVIEW A Good Year a flat bouquet Nothing but a French kiss-off Gladiator collaborators seem defeated by light-weight love story.By ROBERT W.

Corpora • Pang and Lee's movie review corpus • 1000 positive and 1000 negative reviews • Reviews mix objective and subjective information • Often used in the literature • Our blog corpus • 759 positive, 205 negative and 3527 neutral sentences • Gathered from blogs, discussion boards and other websites • Extended with reviews from Customer Review Datasets corpus by Hu and Liu for balancing positive and negative

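Evaluation measures • Accuracy: fraction of all examples that are classified correctly • Precision: fraction of the examples assigned to a class that truly belong to it • Recall: fraction of the examples truly belonging to a class that are assigned to it • Other • Speed • Available resources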
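The three measures can be computed with the standard definitions, as in the sketch below. The function name `evaluate`, the label names and the toy gold and predicted labels are assumptions for illustration; precision and recall are reported for a single target class.

```python
def evaluate(gold, predicted, target):
    """Return (accuracy, precision, recall), the last two for the target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when the class is never predicted
    recall = tp / (tp + fn) if tp + fn else 0.0     # 0.0 when the class never occurs
    return accuracy, precision, recall

# Hypothetical labels for six sentences.
gold = ["pos", "pos", "neg", "neu", "neu", "neg"]
predicted = ["pos", "neu", "neg", "neu", "pos", "neg"]
accuracy, precision, recall = evaluate(gold, predicted, "pos")
```

On a corpus dominated by one class, accuracy alone can look high even when the minority classes are rarely recognised, which is why per-class precision and recall are reported as well.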

Results (1) • Pang and Lee's movie review corpus • N-grams: + easy to extract, + require no special tools, − large feature vector size • NBM: + fast

Results (2) • Our blog corpus • The baseline approach: uses basic ML techniques as described earlier • Our latest approach: achieves considerable improvements over the baseline

Conclusion & future work • Detection of the topic-sentiment relation is far from perfect • Dirty texts make the task even more difficult • Lack of training examples
