Sentiment Detection and its Applications Michael Gamon Microsoft Research - NLP
Overview • Why bother? • Polarity: Basic techniques • Challenges • Valence shifters, sentiment in context • Domain-dependence, time dependence • More aspects of sentiment • Strength • Target • Holder • Subjectivity • Sentence and document level • The role of linguistic processing • Applications • Open questions/ideas • Some Resources
Why bother? • Ubiquitous user-generated content: • 44% of Internet users are content creators (PEW) • More than 70% said they sometimes or frequently rely on online product or book reviews • 62% rely on the popularity of information based on users' votes or ratings • 78% have recently voted for or rated something online • 28% have recently written a product or book review • 12% of Internet users have posted comments on blogs • Opinion matters: • Marketing • Politics • Shopping
Some basic definitions • Polarity: positive or negative? • Strength: ranting/raving or lukewarm? • Target: what is the sentiment about? • Holder: who holds the sentiment?
Polarity: Basic Techniques • Knowledge Engineering • Creating sentiment resources from seed terms • Classifying by using the web as a corpus • Supervised machine learning techniques • Unsupervised techniques ????
The right granularity • Document level or sentence level? • Movie reviews: typically classified at the document level • For product reviews, the sentence level is more appropriate: "Overall, this is a great camera. The resolution and lens leave nothing to be desired. The lag time, however, should be improved."
Knowledge Engineering • Hearst (1992), Sack (1994): • Manual creation of analysis components based on a cognitive linguistic theory: parser, feature structure representation, etc. • Manually creating lists of sentiment terms/sentiment dictionaries
Supervised machine learning techniques • Pang, Lee, Vaithyanathan (2002): • Supervised classification on 1400 movie reviews • Features: • Unigrams (with negation tagging) • Bigrams • Continuously valued (frequency) and binary (presence) features • Classifiers: • Maximum entropy • (linear) Support Vector Machine • Naïve Bayes
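To make this setup concrete, here is a minimal sketch of that style of classifier. It assumes scikit-learn (not part of the original slides), and four toy reviews stand in for the 1400 labeled documents; binary presence features and a linear SVM mirror two of the design choices above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Toy stand-in for the labeled movie reviews (1 = positive, 0 = negative).
texts = ["a great, moving film", "wonderful cinematography throughout",
         "dull plot and wooden acting", "a complete waste of time"]
labels = [1, 1, 0, 0]

# binary=True yields presence features, which Pang et al. found to beat
# raw frequency counts; ngram_range=(1, 1) keeps unigrams only.
vectorizer = CountVectorizer(binary=True, ngram_range=(1, 1))
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, stratify=labels, random_state=0)
clf = LinearSVC().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```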
Pang et al. Conclusions • SVMs tend to work best, Naïve Bayes worst (but: no parameter tuning was performed…) • Binary features (presence) work better than continuously valued ones (frequency) • Unigrams alone work best • Tiny (negligible) gain from negation tagging • Result on movie reviews: 82.9% accuracy
Our results (2k movie data set, Pang & Lee 2004) • Accuracy: 90.45% (Pang & Lee 2004: 87.2%) • Classifier used: linear SVM • Best performing feature set: unigrams + bigrams + trigrams • Feature reduction to top 20k features (using log likelihood ratio) on training set
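The log-likelihood-ratio feature reduction can be sketched as follows. The function below is the standard Dunning-style G² statistic over a 2×2 term/class contingency table; the counts() helper named in the closing comment is hypothetical, standing in for whatever code tallies the four cells for a term on the training set:

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 term/class table:
    k11 = positive docs containing the term, k12 = negative docs containing
    it, k21/k22 = the corresponding counts of docs without the term."""
    obs = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if obs[i][j] > 0:
                g2 += obs[i][j] * math.log(obs[i][j] / expected)
    return 2 * g2

# Score every candidate n-gram on the training set, then keep the 20,000
# highest-scoring features (counts() is a hypothetical cell-count helper):
# top_20k = sorted(vocab, key=lambda t: llr(*counts(t)), reverse=True)[:20000]
```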
Polarity so far: • Simple bag of word features (unigram, possibly bigrams/trigrams) • Manual data annotation • Some state-of-the-art classifier • Simple feature reduction
Creating sentiment resources from seed terms (weakly supervised) • Hatzivassiloglou and McKeown (1997): • Starting with seed words • Use conjunctions to find adjectives with similar orientations • wonderful and well-received, *wonderful but well-received • *terrible and well-received, terrible but well-received • Using log-linear regression to aggregate information from various conjunctions • Using hierarchical clustering on a graph representation of adjective similarities to find two groups of the same orientation • Result: up to 92% accuracy in classifying adjectives
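A toy sketch of the conjunction heuristic, for illustration only: it links any adjacent word pair joined by and/but (no part-of-speech filtering) and replaces the paper's log-linear regression and clustering with naive label propagation from the seeds:

```python
import re

# "and" suggests the same orientation, "but" suggests opposite orientations.
PAIR = re.compile(r"([\w-]+) (and|but) ([\w-]+)")

def propagate(sentences, seeds):
    """seeds: dict word -> +1 / -1; returns an extended orientation dict."""
    orient = dict(seeds)
    edges = [(a, b, 1 if conj == "and" else -1)
             for s in sentences
             for a, conj, b in PAIR.findall(s.lower())]
    changed = True
    while changed:  # propagate orientations across conjunction links
        changed = False
        for a, b, sign in edges:
            if a in orient and b not in orient:
                orient[b] = orient[a] * sign
                changed = True
            elif b in orient and a not in orient:
                orient[a] = orient[b] * sign
                changed = True
    return orient

print(propagate(["wonderful and well-received", "terrible but well-received"],
                {"wonderful": +1}))
# {'wonderful': 1, 'well-received': 1, 'terrible': -1}
```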
Classifying by using the web as a corpus (weakly supervised) • Turney (2002), Turney and Littman (2002) • using the web as a corpus • Semantic orientation (SO) of a phrase p: SO = PMI(p, “excellent”) – PMI(p, “poor”) • PMI is estimated from web search results: • Resultcount(p NEAR “excellent”), Resultcount(p NEAR “poor”) etc • A review counts as “recommend” when avg. SO in the review is positive, as “not recommend” otherwise • Focus on phrases with adverbs/adjectives • Accuracy from 84% (car reviews) to 66% (movie reviews)
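Since both PMIs are estimated from hit counts, Turney's score reduces to a single log ratio. A minimal sketch, with made-up hit counts standing in for the web queries (Turney used AltaVista's NEAR operator) and a small smoothing constant to avoid division by zero:

```python
import math

def so_pmi(hits_p_near_excellent, hits_p_near_poor, hits_excellent, hits_poor):
    """SO(p) = PMI(p, "excellent") - PMI(p, "poor"), reduced to a log ratio.
    The 0.01 terms are a small smoothing constant for zero hit counts."""
    return math.log2(((hits_p_near_excellent + 0.01) * hits_poor) /
                     ((hits_p_near_poor + 0.01) * hits_excellent))

# Made-up hit counts for two phrases from one review; the review counts as
# "recommend" when the average SO over its phrases is positive.
so = [so_pmi(120, 40, 1_000_000, 800_000), so_pmi(15, 60, 1_000_000, 800_000)]
print("recommend" if sum(so) / len(so) > 0 else "not recommend")
```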
Another look at weakly supervised learning (1) (Gamon and Aue 2005) • The basic idea of using seed words: • Given a list of seed words, get semantic orientation of other words • How about: Given a list of seed words, get more seed words, then get semantic orientation
Another look at weakly supervised learning (2) • Observation in car review data: • At the sentence level: sentiment terms (especially of opposite orientation) do not co-occur • At the document (review) level: sentiment terms do co-occur • Using these generalizations can help to rapidly bring up a sentiment system for a new domain • The method outperforms semantic orientation alone
Pulse: the bootstrapping pipeline • Domain corpus + seed words 1 → find words with low PMI at the sentence level → seed words 2 • Semantic orientation of seed words 2: PMI at the review level • Semantic orientation of all words: PMI with all seed words • Label the data using average semantic orientation • Train a classifier on the labeled data
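A rough sketch of the sentence-level step, assuming the corpus is a list of tokenized sentences; the threshold and the candidate test (low PMI with all current seeds) are illustrative simplifications of the full Pulse pipeline, which also uses review-level co-occurrence:

```python
import math
from collections import Counter
from itertools import combinations

def make_sentence_pmi(sentences):
    """Return a PMI function over word pairs, counted per sentence."""
    word_n, pair_n, total = Counter(), Counter(), 0
    for toks in sentences:
        total += 1
        uniq = set(toks)
        word_n.update(uniq)
        pair_n.update(frozenset(p) for p in combinations(sorted(uniq), 2))
    def pmi(a, b):
        joint = pair_n[frozenset((a, b))]
        if joint == 0:
            return float("-inf")
        return math.log(joint * total / (word_n[a] * word_n[b]))
    return pmi

def candidate_opposite_seeds(sentences, seeds, vocab, threshold=0.0):
    """Words with low sentence-level PMI to *all* seeds are candidates for
    seed words of the opposite orientation."""
    pmi = make_sentence_pmi(sentences)
    return [w for w in vocab
            if w not in seeds and all(pmi(w, s) < threshold for s in seeds)]
```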
Overview • Why bother? • Polarity: Basic techniques • Challenges • Valence shifters, sentiment in context • Domain-dependence, time dependence • More aspects of sentiment • Strength • Target • Holder • Subjectivity • Sentence and document level • The role of linguistic processing • Applications • Open questions/ideas • Some Resources
Challenge 1: Sentiment in context • Valence shifters: • This is great – this is not great – this could be great – if this were great – this is just great • Target dependence: • The camera easily fits into a shirt pocket – the lens cap comes off easily • The rooms are small – there is a small and cozy dining room • Complex text patterns • Probleme hatte ich mit dieser neuen Technik überhaupt keine • 'Problems had I with this new technique none'
Valence shifters (1) • Kennedy and Inkpen (2006): small but statistically significant gains from dealing with negation and intensifiers (very/hardly) • Pang et al. (2002): annotating negation does not help (not statistically significant) • Logan Dillard (2007): • 15% of all sentences in a sample of product reviews contain valence shifters • Not all errors are the same • On the errors that matter most, dealing with valence shifters helps
Valence shifters (2) • 3 way classification: positive/neutral/negative • The worst error: Improper Class (IC) error, positive as negative, vice versa • Data: • 22k sentences from product reviews, manually annotated for sentiment • 2700 sentences annotated for valence shifters
Valence shifters (3) Distribution of different types of valence shifters
Valence shifters (4) • Best results: • Small set of preprocessing heuristics (11 total), e.g.: • failed to work → did not work • would not hesitate to X → would X • Custom window size: n words to the right of a negation word count as negated; n varies with the negation word • Naïve Bayes classifier
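The rewrite heuristics and the custom negation window might look like this in code; the rewrite table, the window sizes, and the NOT_ prefix convention are illustrative assumptions, not Dillard's exact implementation:

```python
# Rewrite heuristics applied before negation handling.
REWRITES = {"failed to work": "did not work",
            "would not hesitate to": "would"}

# Each negation word negates the next n tokens, with n depending on the word.
NEG_WINDOW = {"not": 3, "never": 4, "no": 2, "hardly": 3}

def preprocess(text):
    for pattern, replacement in REWRITES.items():
        text = text.replace(pattern, replacement)
    return text

def tag_negation(tokens):
    out, remaining = [], 0
    for tok in tokens:
        if tok.lower() in NEG_WINDOW:
            remaining = NEG_WINDOW[tok.lower()]  # open a new negation window
            out.append(tok)
        elif remaining > 0:
            out.append("NOT_" + tok)             # mark token as negated
            remaining -= 1
        else:
            out.append(tok)
    return out

print(tag_negation(preprocess("it failed to work at all").split()))
# ['it', 'did', 'not', 'NOT_work', 'NOT_at', 'NOT_all']
```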
Valence shifters (5) • Reduction of IC errors: • 29% on all data • 61% on sentences with valence shifters
Valence shifters: conclusion • The problem is real • Not only adjectives matter • Reduction of most “painful” errors can be significant
Challenge 2:Domain dependence • Aue and Gamon (2005): 4 domains • Movie reviews (movie) • Book reviews (book) • Product support services web survey data (pss) • Knowledge base web survey data (kb)
Domain dependence (3) • Solutions: • Train one classifier on all data from non-target domains • Train as in (1), but limit features to those observed in target domain • Classifier ensemble: train three classifiers on non-target domains, train classifier combination on small amount of in-domain data • Using in-domain unlabeled data (Nigam et al. 2000) and a small amount of labeled data
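Option (2), training on pooled out-of-domain data while keeping only features that also occur in the (unlabeled) target domain, is easy to sketch; scikit-learn and the variable names here are assumptions, not the paper's implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_restricted(source_texts, source_labels, target_texts):
    """Train on labeled out-of-domain data, restricted to features that
    appear in the unlabeled target-domain corpus."""
    # Vocabulary observed in the target domain:
    target_vocab = set(CountVectorizer(binary=True).fit(target_texts).vocabulary_)
    # Vectorize the source data over the target vocabulary only:
    vec = CountVectorizer(binary=True, vocabulary=sorted(target_vocab))
    X = vec.fit_transform(source_texts)
    return vec, LinearSVC().fit(X, source_labels)
```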
Domain dependence (5): structural correspondence learning (Blitzer et al. 2007) • Domain 1: labeled data, unlabeled data • Domain 2: unlabeled data • Find terms that are good predictors for sentiment in Domain 1 and are frequent in Domain 2 (→ domain-neutral "pivots", e.g. excellent, awful) • Find terms in Domain 2 that correlate highly with pivots (→ sentiment terms specific to Domain 2) • Use the correlations to weight these features • Results: significant error reduction
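A much-reduced sketch of the pivot-selection step only; full SCL additionally trains a linear predictor for each pivot on unlabeled data and applies an SVD to the predictors' weight matrix, and the frequency-plus-label-skew criteria below are simplified stand-ins for the paper's mutual-information selection:

```python
from collections import Counter

def find_pivots(source_docs, source_labels, target_docs, k=50, min_freq=10):
    """source_docs/target_docs: lists of token lists; source_labels: 0/1.
    Pivots = words frequent in both domains and predictive in the source."""
    src_n, tgt_n, pos_n = Counter(), Counter(), Counter()
    for doc, y in zip(source_docs, source_labels):
        for w in set(doc):
            src_n[w] += 1
            pos_n[w] += y
    for doc in target_docs:
        tgt_n.update(set(doc))
    def label_skew(w):  # how strongly the word leans toward one label
        return abs(pos_n[w] / src_n[w] - 0.5)
    frequent = [w for w in src_n
                if src_n[w] >= min_freq and tgt_n[w] >= min_freq]
    return sorted(frequent, key=label_skew, reverse=True)[:k]
```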
Domain dependence: tentative conclusions • Using a small amount of labeled in-domain data beats using none at all • Results are … well … domain-dependent
Overview • Why bother? • Polarity: Basic techniques • Challenges • Valence shifters, sentiment in context • Domain-dependence, time dependence • More aspects of sentiment • Strength • Target • Holder • Subjectivity • Sentence and document level • The role of linguistic processing • Applications • Open questions/ideas • Some Resources
Strength of sentiment • Wilson et al (2004): strength of opinion • Training: annotations in MPQA corpus (neutral, low, medium, high) • Various classifiers, significant improvement over baseline • Pang and Lee (2005): inferring star-rating on movie reviews
Targets of sentiment • Product reviews: what is the opinion about? • Digital camera: lens, resolution, weight, battery life… • Hotel: room comfort, noise level, service, cleanliness… • Popescu and Etzioni (2005), OPINE: • Using "meronymy patterns": X contains Y, X comes with Y, X has Y, etc. • High PMI between a noun phrase and a pattern indicates candidate features • Liu et al. (2005), Opinion Observer: • Using supervised rule learning to discover features of products
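The PMI test for candidate features can be sketched as follows; the patterns are toy examples for a camera domain, and hits() is a hypothetical function returning corpus hit counts (OPINE estimated these statistics from web search results):

```python
import math

# Toy meronymy patterns for a camera domain.
PATTERNS = ["camera has", "camera comes with", "camera contains"]

def feature_pmi(np_phrase, hits, n_total):
    """PMI between a candidate noun phrase and each meronymy pattern:
    PMI = log( P(pattern + NP) / (P(pattern) * P(NP)) )."""
    p_np = hits(np_phrase) / n_total
    scores = {}
    for pat in PATTERNS:
        p_pat = hits(pat) / n_total
        p_joint = hits(f"{pat} {np_phrase}") / n_total
        scores[pat] = (math.log(p_joint / (p_pat * p_np))
                       if p_joint > 0 and p_pat > 0 and p_np > 0
                       else float("-inf"))
    return scores  # consistently high PMI across patterns -> likely feature
```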
Targets and holders of sentiment • Kim and Hovy (2005) - holders: • Using parses • Train (supervised) Maximum Entropy ranker to identify sentiment holders based on parse features • Training data: MPQA corpus subset • Kim and Hovy (2006) - holders and targets: • Collect and label opinion words manually • Find opinion-related frames (FrameNet) • Using semantic role labeling to identify fillers for the frames, based on manual mapping tables
Subjectivity • Subjective versus objective language: hard to define, but useful in practice (Wiebe et al. 2005): • Word sense disambiguation (Wiebe and Mihalcea 2006) • Information Retrieval/Opinion Retrieval • Question answering/Multi-perspective question answering • Polarity detection (Pang & Lee 2004 - really?)
Sentence AND document level: structured models (McDonald et al. 2007) • Joint classification at the sentence and document level: classification decisions at one level can influence decisions at the other • Uses a structured perceptron (Collins 2002) • Improves sentence-level classification accuracy significantly; document-level accuracy improves, but not significantly
The role of linguistic analysis (1) • Polarity: • Consensus: linguistic analysis is not very useful • Dissenting opinions: • Baayen et al. (1996) • Gamon (2004): syntactic analysis features help in a noisy customer-feedback domain
The role of linguistic analysis (2) • Holder, target identification: • Patterns, semantic role labeling, semantic resources for synonymy antonymy (FrameNet, WordNet) • Strength: • Syntactic analysis
Short summary • A quick solution works well for in-domain polarity: • bag-of-words • supervised learning • feature selection, classifier choice • There's no silver bullet for cross-domain adaptation • Finer-grained polarity assessment (sentence/phrase level) has advantages • Paying attention to valence shifters pays off • Assessing holders, strength, and targets requires more involved analysis • … it's not just about the adjectives …
Sample Applications • Hotel review mining: Opine (Popescu) • Business Intelligence • Political (http://www.ecoresearch.net/election2008/) • Pulse • OpinionIndex
Open issues/ideas • Sentiment and what we can detect from text are not the same • Languages and cultures differ in the expression of sentiment (not everyone is an "outspoken" American consumer…) – no cross-lingual studies yet! • Discourse and sentiment • Part of speech and sentiment (no cross-domain study yet) • Seed word approaches: • How much of a role does the selection of seed words play? • Can we automatically find the best seed words? • Social media opens new sources of information: • What is the history of an author's sentiment about X? • What is the social network of an author (Mac-haters versus Mac-lovers)?
Some Resources • Sentiment bibliography:http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html • Corpora: • MPQA: http://www.cs.pitt.edu/mpqa/ • Movie reviews (polarity, scale, subjectivity): http://www.cs.cornell.edu/people/pabo/movie-review-data/ • Various sentiment dictionaries: • http://www.wjh.harvard.edu/~inquirer/ • www.cs.pitt.edu/mpqa • http://www.unipv.it/wnop/ • SentimentAI group on Yahoo • Bing Liu’s tutorial on Opinion Mining: http://www.cs.uic.edu/~liub/teach/cs583-spring-07/opinion-mining.pdf