Opinion Analysis Sudeshna Sarkar IIT Kharagpur
Introduction – facts and opinions • Two main types of information on the Web • Facts and opinions • Current search engines search for facts (and assume they are true) • Facts can be expressed with topic keywords. • Search engines do not search for opinions • Opinions are hard to express with a few keywords • What do people think of Motorola cell phones? • Current search ranking strategies are not appropriate for opinion retrieval/search.
Overview • Motivation • Definitions • Coarse-grained vs. fine-grained opinion analysis • Opinion lexicons • Approaches to document-level opinion analysis • Lexicon based • Supervised learning approaches • Mixed approaches • Approaches to fine-grained opinion analysis • Rule based • Learning • Opinion mining work at IIT Kharagpur
Opinion Mining • Search for and aggregate opinions from online sources • Many reviews have both positive and negative sentences • Many products are liked by some and disliked by others – there must be different reasons • Identify the different features/aspects of the target and the opinion on each of them separately
Why do opinion analysis? • Opinion search • To extract examples of particular types of positive or negative statements on some topic • Opinion question answering • What is the reaction to the Left Front’s stand on the nuclear deal? • Is support diminishing for the UPA government? • Product review mining • Which features of the “Mr Coffee programmable coffee maker” do users like, and which do they dislike? (Microsoft Live) • Review classification • Tracking sentiment toward topics over time • To track the ups and downs of aggregate attitudes to a brand or product
Introduction – Applications • Businesses and organizations: product and service benchmarking; market intelligence. • Businesses spend a huge amount of money to find out consumer sentiments and opinions. • Consultants, surveys, focus groups, etc. • Individuals: interested in others’ opinions when • Purchasing a product or using a service, • Finding opinions on political topics, • Many other decision-making tasks. • Ad placement: placing ads in user-generated content • Place an ad when someone praises a product. • Place an ad from a competitor if someone criticizes a product. • Opinion retrieval/search: providing general search for opinions.
Question Answering • Opinion question answering: • Q: What is the international reaction to the reelection of Robert Mugabe as President of Zimbabwe? • A: African observers generally approved of his victory while Western Governments denounced it.
Opinion search (Liu, Web Data Mining book, 2007) • Can you search for opinions as conveniently as general Web search? • Whenever you need to make a decision, you may want some opinions from others. • Wouldn’t it be nice if you could find them on a search system instantly, by issuing queries such as • Opinions: “Motorola cell phones” • Comparisons: “Motorola vs. Nokia” • Cannot be done yet!
Typical opinion search queries • Find the opinion of a person or organization (opinion holder) on a particular object or a feature of an object. • E.g., what is Bill Clinton’s opinion on abortion? • Find positive and/or negative opinions on a particular object (or some features of the object), e.g., • customer opinions on a digital camera, • public opinions on a political topic. • Find how opinions on an object change over time. • How does object A compare with object B? • E.g., Gmail vs. Yahoo Mail
Find the opinion of a person on X • In some cases, a general search engine can handle it with suitable keywords. • E.g., Bill Clinton’s opinion on abortion • Reason: • One person or organization usually has only one opinion on a particular topic. • The opinion is likely to be contained in a single document. • Thus, a good keyword query may be sufficient.
Find opinions on an object X • We use product reviews as an example: • Searching for opinions in product reviews is different from general Web search. • E.g., search for opinions on “Motorola RAZR V3” • General Web search for a fact: rank pages according to some authority and relevance scores. • The user views the first page (if the search is perfect). • One fact = Multiple facts • Opinion search: ranking is desirable; however, • reading only the review ranked at the top is dangerous because it is only the opinion of one person. • One opinion ≠ Multiple opinions
Search opinions (contd) • Ranking: • Produce two rankings • Positive opinions and negative opinions • Some kind of summary of both, e.g., the number of each • Or, one ranking, but • The top (say 30) reviews should reflect the natural distribution of all reviews (assuming there is no spam), i.e., with the right balance of positive and negative reviews. • Questions: • Should the user read all the top reviews? OR • Should the system prepare a summary of the reviews?
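The single-ranking option asks for a top list whose positive/negative mix mirrors the whole review set. Below is a minimal Python sketch of that idea. It assumes every review already carries a polarity label and a ranking score. The `Review` type and `balanced_top_k` function are hypothetical names, not part of any cited system.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    polarity: str  # "pos" or "neg", assumed already assigned by a classifier
    score: float   # relevance/quality score used for ranking (assumed)


def balanced_top_k(reviews, k=30):
    """Pick k reviews whose pos/neg proportion matches the whole collection."""
    if not reviews:
        return []
    k = min(k, len(reviews))
    pos = sorted((r for r in reviews if r.polarity == "pos"), key=lambda r: -r.score)
    neg = sorted((r for r in reviews if r.polarity == "neg"), key=lambda r: -r.score)
    # Allot positive slots in proportion to the share of positive reviews.
    n_pos = min(round(k * len(pos) / len(reviews)), len(pos))
    n_neg = min(k - n_pos, len(neg))
    chosen = pos[:n_pos] + neg[:n_neg]
    # Present the balanced selection in score order.
    return sorted(chosen, key=lambda r: -r.score)
```

Each side is filled with its highest-scoring reviews. With 8 positive and 2 negative reviews and k = 5, this yields 4 positive reviews and 1 negative review.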
User generated content • Word of mouth on the web. • Review sites • Blogs • Online forums • Shopping comparison sites • User reviews • Mine opinions expressed in the user-generated content • Challenging task • Useful to individual consumers and companies.
Motivation for Consumer • I want to buy a camera. • Which model should I pick? • Ask my friends • Use the internet • CEA-CNET Study: Tech-Savvy Consumers Use Internet to Research Products Before Buying Them • Wireless News, November, 2007 • Seventy Percent of Consumers Use Internet to Research Consumer Packaged Goods, According to Prospectiv Survey • Market Wire, January, 2008
Businesses • Identify opinions about products to help position/adapt products • Much product feedback is web-based • Provided by customers/critics online through websites, discussion boards, mailing lists, blogs, and CRM portals. • Market research is becoming unwieldy • Sources are heterogeneous and multilingual in nature
Facts vs Opinions • An opinion is a person's ideas and thoughts towards something. It is an assessment, judgment or evaluation of something. An opinion is not a fact, because opinions are either not falsifiable, or the opinion has not been proven or verified. ...en.wikipedia.org/wiki/Opinion • Subjectivity: The linguistic expression of somebody’s emotions, sentiments, evaluations, opinions, beliefs, speculations, etc. • Polarity: positive and negative • This camera is awesome. • The movie is too long and boring. • Strength of opinion
Levels of opinion analysis: coarse- to fine-grained • Document level: at the document (or review) level • Subjective vs. objective • Sentiment classification: positive, negative or neutral • Sentence level, expression level • Task 1: identifying subjective/opinionated sentences (or clauses/phrases) • Classes: objective and subjective (opinionated) • Task 2: sentiment classification of sentences • Classes: positive, negative and neutral • But a document/sentence may contain multiple opinions on more than one topic from one or more opinion holders
Lexicon Development • Manual • Semi-automatic • Fully automatic • Find relevant words, phrases, patterns that can be used to express subjectivity • Determine the polarity of subjective expressions
Opinion Words • An opinion lexicon containing lists of positive and negative phrases is very useful for opinion mining at different levels • Positive: beautiful, wonderful, good, amazing • Negative: bad, poor, terrible, cost someone an arm and a leg • How to compile such a list? • Dictionary-based approaches • Corpus-based approaches • Supervised • Semi-supervised • BUT • Some opinion words are context independent (e.g., good). • Some are context dependent (e.g., long).
Hand-created lists • Manually create lists of opinion words appropriate for the domain, recording for each entry its: • Sentiment term • Polarity • Strength • These approaches, while interesting, are labor-intensive, error-prone, and costly to maintain
Dictionary-based approaches • Start from a set of seed opinion words • Use WordNet’s synsets and hierarchies to acquire opinion words • Use the seeds to search for synonyms and antonyms in WordNet (e.g., Hu and Liu, 2004).
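Below is a minimal sketch of the seed-expansion loop described above. It assumes a toy synonym/antonym table in place of real WordNet lookups; the `SYNONYMS`/`ANTONYMS` tables and `expand_seeds` are hypothetical. Synonyms inherit a word's orientation, antonyms receive the opposite one, and the search continues until no new words are found.

```python
from collections import deque

# Hypothetical toy thesaurus standing in for WordNet synonym/antonym lookups.
SYNONYMS = {
    "good": ["nice", "fine"],
    "nice": ["pleasant"],
    "bad": ["poor"],
    "poor": ["inferior"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_seeds(seeds, synonyms, antonyms):
    """Propagate orientation (+1/-1): synonyms keep it, antonyms flip it."""
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        orient = lexicon[word]
        neighbours = [(s, orient) for s in synonyms.get(word, [])]
        neighbours += [(a, -orient) for a in antonyms.get(word, [])]
        for other, o in neighbours:
            if other not in lexicon:  # first assignment wins; conflicts are ignored
                lexicon[other] = o
                queue.append(other)
    return lexicon
```

With NLTK, the tables could instead be filled from `wordnet.synsets(word)` and each lemma's `antonyms()`. Keeping the first orientation assigned to a word is just one simple way of resolving conflicts.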
Dictionary-based approaches • Use additional information (e.g., glosses) and learning from WordNet (Andreevskaia and Bergler, 2006) (Esuli and Sebastiani, 2005).
Dictionary-based approaches • Advantage: good at finding a large number of opinion words • Weakness: does not find context-dependent opinion words, e.g., small, long, fast.
Corpus-based approaches • Rely on syntactic rules and co-occurrence patterns to extract opinion words from large corpora • Use a list of seed words • A large domain corpus • Machine learning • Advantage: this approach can find domain-dependent (corpus-dependent) opinion words.
How to identify subjective terms? • Assume that contexts are coherent • Statistical association: if words of the same orientation tend to co-occur, then the presence of one makes the other more probable • Use statistical measures of association to capture this interdependence • Assume that alternatives are similarly subjective
Corpus-based approaches (contd) • Conjunctions: conjoined adjectives usually have the same orientation (Hatzivassiloglou and McKeown 1997). • E.g., “This car is beautiful and spacious.” (conjunction) • Start with seed words • Use conjunctions to find adjectives with similar orientations • Use log-linear regression to aggregate information from various conjunctions • Use hierarchical clustering on a graph representation of adjective similarities to find two groups, each of a single orientation
[Figure: adjective graph example with the words slow, scenic, nice, terrible, painful, handsome, fun, expensive, comfortable]
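The conjunction heuristic can be illustrated with a deliberately simplified sketch. It is not Hatzivassiloglou and McKeown's method: their log-linear model and clustering step are replaced here by plain propagation from seed adjectives, where “and” keeps the orientation and “but” flips it. The regular expression, the candidate adjective set (standing in for a POS tagger) and the function names are all assumptions.

```python
import re
from collections import defaultdict


def conjunction_links(sentences, adjectives):
    """Collect (a, b, same_orientation) links from 'a and b' / 'a but b' patterns."""
    links = []
    for s in sentences:
        for a, conj, b in re.findall(r"\b(\w+) (and|but) (\w+)\b", s.lower()):
            # Keep only pairs of known adjectives (stand-in for POS tagging).
            if a in adjectives and b in adjectives:
                links.append((a, b, conj == "and"))
    return links


def propagate(links, seeds):
    """Assign +1/-1 by walking links from seeds; 'and' keeps sign, 'but' flips it."""
    graph = defaultdict(list)
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))
    labels = dict(seeds)
    changed = True
    while changed:
        changed = False
        for word in list(labels):
            for other, same in graph[word]:
                if other not in labels:
                    labels[other] = labels[word] if same else -labels[word]
                    changed = True
    return labels
```

Adjectives that are never linked to a seed stay unlabeled, which is one reason the original work aggregates evidence over a large corpus.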
Growing contextual opinion words [Ding, Liu, Wu] • Intra-sentence conjunction rule: opinions on both sides of “and” (or in two consecutive sentences) tend to have the same polarity • E.g., “This camera takes great pictures and has a long battery life.” • With a “but”-like clause, the opinions tend to be of opposite polarity. • Context is important • Long battery life vs. long time to focus • Growing • By applying various conjunctive rules • By verifying the results with those conjunctive rules as the system sees more reviews • By keeping only those opinions the system is confident about, controlled by a confidence limit.
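One way to picture the confidence limit is as vote counting over (opinion word, feature) pairs. Below is a minimal sketch, assuming the pairs and their conjoined neighbours have already been extracted from sentences. That extraction step, the `min_count` and `confidence` thresholds, and the function name are all hypothetical.

```python
from collections import defaultdict


def grow_contextual_lexicon(observations, min_count=2, confidence=0.8):
    """observations: (word, feature, neighbour_polarity, conjunction) tuples, where
    neighbour_polarity (+1/-1) is the known orientation of the opinion word joined
    to `word` by `conjunction` ("and" or "but") in the same sentence."""
    votes = defaultdict(lambda: [0, 0])  # (word, feature) -> [pos votes, neg votes]
    for word, feature, neighbour, conj in observations:
        vote = neighbour if conj == "and" else -neighbour  # "but" flips polarity
        votes[(word, feature)][0 if vote > 0 else 1] += 1
    lexicon = {}
    for key, (pos, neg) in votes.items():
        total = pos + neg
        if total < min_count:
            continue  # too little evidence so far
        if max(pos, neg) / total >= confidence:  # the confidence limit
            lexicon[key] = 1 if pos > neg else -1
    return lexicon
```

Keying the lexicon on the pair rather than on the word lets “long” be positive for battery life and negative for focusing time.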
Semantic Orientation by Association • Labeled semantic orientation of words • Pwords = {good, nice, excellent, positive, fortunate, correct, superior} • Nwords = {bad, nasty, poor, negative, unfortunate, wrong, inferior} • Various approaches to calculating the semantic association of two words • Pointwise Mutual Information (PMI) [Church and Hanks 1989] • Latent Semantic Indexing (LSI) [Dumais et al. 1990] • Likelihood Ratios [Dunning 1993]
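For reference, PMI between two words is standardly defined as below. The second form estimates it from counts: c(·) is the number of co-occurrence units (documents, sentences or windows, a choice the slide leaves open) that contain the word(s), and N is the total number of units.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)}
  \approx \log_2 \frac{N \cdot c(w_1, w_2)}{c(w_1)\, c(w_2)}
```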
Turney 2002; Turney & Littman 2003 • Determine the semantic orientation of each extracted phrase based on its association with seven positive and seven negative seed words
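Below is a minimal sketch of SO-PMI. It scores a word as the sum of its PMI with the Pwords minus the sum of its PMI with the Nwords listed on the previous slide. As an assumption, co-occurrence here means appearing in the same document of a small corpus, whereas the original work estimated co-occurrence from search-engine hit counts. The `so_pmi` name and the smoothing constant are hypothetical.

```python
import math

# Seed sets as listed on the slide.
PWORDS = {"good", "nice", "excellent", "positive", "fortunate", "correct", "superior"}
NWORDS = {"bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"}


def so_pmi(word, documents, pwords=PWORDS, nwords=NWORDS, smoothing=0.01):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds,
    using same-document co-occurrence (an assumption of this sketch)."""
    docs = [set(d.lower().split()) for d in documents]
    n = len(docs)

    def count(*words):
        # Number of documents containing all the given words.
        return sum(all(w in d for w in words) for d in docs)

    def pmi(a, b):
        # Smoothing the joint count turns "never co-occur" into a large
        # negative PMI instead of log(0).
        return math.log2(n * (count(a, b) + smoothing) / (count(a) * count(b)))

    if count(word) == 0:
        return 0.0
    pos = sum(pmi(word, p) for p in pwords if count(p) > 0)
    neg = sum(pmi(word, q) for q in nwords if count(q) > 0)
    return pos - neg
```

Seeds that never occur in the corpus are skipped, since their PMI is undefined. A positive score suggests positive orientation and a negative score suggests negative orientation.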
Weakly supervised learning (Gamon & Aue 2005) • Given a list of seed words (seed words 1) • Get more seed words (seed words 2): words with low PMI at the sentence level • Get the semantic orientation of seed words 2 by PMI at the document level • Get the semantic orientation of all words by PMI with all seed words
Document-level opinion analysis • Polarity classification: classify documents (e.g., reviews) based on the overall sentiment expressed by their authors • Approaches • Use an opinion lexicon • Knowledge engineering • Supervised learning techniques • Classifying using the Web as a corpus • Semi-supervised
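The lexicon-based approach can be sketched as counting matched opinion words. This minimal illustration assumes a tiny hypothetical lexicon (`LEXICON`) and does not handle negation, intensifiers or context-dependent words.

```python
import re

# Hypothetical mini-lexicon; in practice it would come from one of the
# lexicon-construction methods described earlier.
LEXICON = {"good": 1, "great": 1, "excellent": 1, "amazing": 1,
           "bad": -1, "poor": -1, "terrible": -1, "boring": -1}


def lexicon_polarity(text, lexicon=LEXICON):
    """Classify a document by the sign of its summed word polarities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```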
Knowledge Engineering • Make use of lists of sentiment terms • Manually create analysis components based on cognitive linguistic theory: parser, feature structure representation, etc.
Supervised polarity classifier • Requirement: a labeled database of opinions • Download ratings from Amazon.com, epinions.com, etc. • Build a binary opinion classifier • From positive and negative ratings • Map 1- and 2-star ratings to negative and 3-, 4- and 5-star ratings to positive • Use thresholded SVM, maximum entropy, naïve Bayes, etc.
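Below is a minimal sketch of this pipeline. It uses the slide's star-to-label mapping and naive Bayes (one of the listed learners), implemented from scratch. The tokenizer, the add-one smoothing, and the class and function names are assumptions of this sketch rather than details from the source.

```python
import math
import re
from collections import Counter


def stars_to_label(stars):
    """Slide's mapping: 1-2 stars -> negative, 3-5 stars -> positive."""
    return "neg" if stars <= 2 else "pos"


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are skipped.
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(text) if w in self.vocab)
        return max(self.classes, key=log_posterior)
```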
Supervised Training • Obtain labeled sentences: positive, neutral, negative • Extract features: words, n-grams, multiword expressions, feature generalization [Kim & Hovy 2007] • Feature values: binary/frequency • Run a training algorithm on the features to obtain a classifier • [Optional] Do feature selection (using the log-likelihood ratio)
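Below is a minimal sketch of the feature extraction and log-likelihood-ratio feature selection steps. It assumes document-level binary presence counts for the selection table; the function names and the n-gram range are hypothetical. Dunning's G² is 2 Σ O ln(O/E), summed over the cells of a 2×2 table of feature presence versus class.

```python
import math
import re
from collections import Counter


def extract_features(text, binary=True, n_max=2):
    """Word n-grams (n = 1..n_max) as features, valued binary or by frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = Counter(" ".join(tokens[i:i + n])
                    for n in range(1, n_max + 1)
                    for i in range(len(tokens) - n + 1))
    return {g: 1 for g in grams} if binary else dict(grams)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 table: k11/k12 = in-class docs with/without the
    feature, k21/k22 = out-of-class docs with/without the feature."""
    table = [[k11, k12], [k21, k22]]
    total = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed:  # empty cells contribute 0
                expected = rows[i] * cols[j] / total
                g2 += observed * math.log(observed / expected)
    return 2 * g2


def select_features(texts, labels, target, k=10):
    """Keep the k features with the highest G^2 against `target` vs the rest."""
    feats = [set(extract_features(t)) for t in texts]
    vocab = set().union(*feats)
    scores = {}
    for f in vocab:
        k11 = sum(f in fs and y == target for fs, y in zip(feats, labels))
        k12 = sum(f not in fs and y == target for fs, y in zip(feats, labels))
        k21 = sum(f in fs and y != target for fs, y in zip(feats, labels))
        k22 = len(texts) - k11 - k12 - k21
        scores[f] = log_likelihood_ratio(k11, k12, k21, k22)
    return sorted(vocab, key=lambda f: (-scores[f], f))[:k]
```

G² measures association in either direction, so a feature that strongly indicates the other class also scores highly. This suits a binary classifier.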
Semi-supervised approaches • Fully supervised techniques require • a large amount of labeled data for the given domain • Semi-supervised systems • Use a small amount of domain knowledge • From a small set of seed words, use a domain corpus to get domain-relevant opinion words, as discussed earlier
Semi-supervised approach • Gamon & Aue 2005 • Obtain opinion words by semi-supervised approach • Given a domain corpus, label data using average semantic orientation • Train classifier on labeled data
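Below is a minimal sketch of the labeling step. It assumes word-level semantic orientation scores are already available (for instance from PMI with seed words). The `margin` parameter for skipping uncertain documents and both function names are hypothetical.

```python
import re


def average_so_label(text, so_scores, margin=0.0):
    """Label a document by the mean semantic orientation of its scored words.
    Returns None when no scored word occurs or the mean is within +/- margin."""
    tokens = re.findall(r"[a-z']+", text.lower())
    values = [so_scores[w] for w in tokens if w in so_scores]
    if not values:
        return None
    mean = sum(values) / len(values)
    if mean > margin:
        return "pos"
    if mean < -margin:
        return "neg"
    return None


def auto_label(corpus, so_scores, margin=0.0):
    """Build (text, label) training pairs from unlabeled domain text."""
    labelled = []
    for text in corpus:
        label = average_so_label(text, so_scores, margin)
        if label is not None:
            labelled.append((text, label))
    return labelled
```

The resulting (text, label) pairs can then serve as training data for a supervised classifier in the target domain.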