Opinion Extraction: Finding Strong and Weak Opinion Clauses

Just how mad are you? Finding strong and weak opinion clauses
Theresa Wilson, Janyce Wiebe, Rebecca Hwa
University of Pittsburgh
AAAI-2004

Problem and Motivation
Problem: Opinion Extraction
• Automatically identify and extract attitudes, opinions, and sentiments in text
Applications
• Information Extraction, Summarization, Question Answering, Flame Detection, etc.
Focus
• Individual clauses and strength

Motivating Example
“I think people are happy because Chavez has fallen. But there’s also a feeling of uncertainty about how the country’s obvious problems are going to be solved,” said Ms. Ledesma.

Motivating Example
Though some of them did not conceal their criticisms of Hugo Chavez, the member countries of the Organization of American States condemned the coup and recognized the legitimacy of the elected president.
Strength labels on the slide: medium strength, high strength, low strength

Our Goal
• Identify opinions below sentence level
• Characterize strength of opinions

Our Approach
Identify embedded sentential clauses
• Dependency tree representation
Supervised learning to classify strength of clauses
• Scale from NO OPINION to VERY STRONG: neutral, low, medium, high
Significant improvements over baseline
Mean-squared error < 1.0

Our Approach
I am furious that my landlord refused to return my security deposit until I sued them.
[Figure: dependency tree of the sentence, rooted at "am", with clause strength labels High Strength, Medium Strength, and Neutral; the full sentence is marked as an Opinionated Sentence (Riloff et al. (2003), Riloff and Wiebe (2003))]

Outline
• Introduction
• Opinions and Emotions in Text
• Clues and Features
  • Subjectivity Clues
  • Organizing Clues into Features
• Experiments
  • Strength Classification
  • Results
• Conclusions

Private States and Subjective Expressions
Private state: covering term for opinions, emotions, sentiments, attitudes, speculations, etc. (Quirk et al., 1985)
Subjective expressions: words and phrases that express private states (Banfield, 1982)
“The US fears a spill-over,” said Xirao-Nima.
“The report is full of absurdities,” he complained.

Corpus of Opinion Annotations
• Multi-perspective Question Answering (MPQA) Corpus
• Sponsored by NRRC ARDA
• Released November, 2003
• http://nrrc.mitre.org/NRRC/publications.htm
• Detailed expression-level annotations of private states: strength
• See Wilson and Wiebe (SIGdial 2003)
Freely Available

Clues from Previous Work
• 29 sets of clues
• Culled from manually developed resources
• Learned from annotated/unannotated data
• Words, phrases, extraction patterns
Examples
SINGLE WORDS: bizarre, hate, concern, applaud, foolish, vexing
PHRASES: long for, stir up, grin and bear it, on the other hand
EXTRACTION PATTERNS: expressed (condolences|hope|*), show of (support|goodwill|*)

Syntax Clues: Generation
Training data: I think people are happy because Chavez has fallen
Steps: Parse, then Convert to dependency
[Figure: constituency parse (S, NP, VP, SBAR with tags PRP, VBP, NNS, JJ) converted into a dependency tree: head think,VBP with subj I,PRP and obj are,VBP; are,VBP with subj people,NNS and pred happy,JJ; dependents are labeled as modifiers]

Syntax Clues: Generation
5 Classes of Clues
1. root
2. leaf
3. node
4. all-kids
5. bilex
Example: allkids(fallen,VBN,subj,Chavez,NNP,mod,has,VBZ)
Example: bilex(are,VBP,pred,happy,JJ)
[Figure: Dependency Parse Tree over the nodes think,VBP; I,PRP; are,VBP; people,NNS; happy,JJ; because,IN; fallen,VBN; Chavez,NNP; has,VBZ, with edges labeled subj, obj, pred, mod, and i]
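The five clue classes can be illustrated with a short sketch that walks a dependency tree and emits clue strings in the format of the two examples on the slide. This is only an illustration: the `Node` class and `clues` function are hypothetical names, and the exact definitions of root, leaf, and node are assumptions (root = top of the tree, leaf = a node with no children, node = any position). Only the parts of the tree whose edges are clear from the slides are built.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    tag: str
    # Each child is a (relation, Node) pair, in left-to-right order.
    children: list = field(default_factory=list)


def clues(node, is_root=True):
    """Yield clue strings for the subtree rooted at `node` (assumed definitions)."""
    head = f"{node.word},{node.tag}"
    yield f"node({head})"              # the word/tag pair anywhere in the tree
    if is_root:
        yield f"root({head})"          # the word/tag pair at the top of the tree
    if not node.children:
        yield f"leaf({head})"          # the word/tag pair with no children
        return
    parts = [head]
    for rel, child in node.children:
        # bilex: a head, one relation, and one child
        yield f"bilex({head},{rel},{child.word},{child.tag})"
        parts.append(f"{rel},{child.word},{child.tag}")
    # allkids: a head together with all of its children
    yield f"allkids({','.join(parts)})"
    for _, child in node.children:
        yield from clues(child, is_root=False)


# Edges taken from the slides: think -> (subj I, obj are); are -> (subj people, pred happy)
are = Node("are", "VBP", [("subj", Node("people", "NNS")), ("pred", Node("happy", "JJ"))])
think = Node("think", "VBP", [("subj", Node("I", "PRP")), ("obj", are)])
# The embedded clause, handled separately because its attachment is not clear from the slide.
fallen = Node("fallen", "VBN", [("subj", Node("Chavez", "NNP")), ("mod", Node("has", "VBZ"))])

main_clues = set(clues(think))
fallen_clues = set(clues(fallen, is_root=False))
```

With these inputs, `main_clues` contains `bilex(are,VBP,pred,happy,JJ)` and `fallen_clues` contains `allkids(fallen,VBN,subj,Chavez,NNP,mod,has,VBZ)`, matching the slide's examples.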

Syntax Clues: Selection
1. ≥ 70% of instances in subjective expressions in training data?
   NO: Discard
   YES: go to 2
2. Frequency ≥ 5?
   YES: Highly Reliable
   NO: go to 3
3. Any instances in AUTOGEN Corpus?
   NO: Not Very Reliable
   YES: go to 4
4. ≥ 80% of instances in subjective sentences?
   Outcomes: Somewhat Reliable, Discard
Parameters chosen on tuning set
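The selection flowchart can be read as a small decision function. The sketch below is one possible reading, not the authors' implementation: the function name and argument names are hypothetical, and the final branch (at least 80% in subjective sentences gives "somewhat reliable", otherwise discard) is an assumption, since the slide does not make that pairing explicit.

```python
def classify_clue(train_subj, train_total, autogen_subj, autogen_total,
                  min_subj_ratio=0.70, min_freq=5, min_autogen_ratio=0.80):
    """Assign a reliability level to one candidate syntax clue.

    train_subj / train_total: occurrences inside subjective expressions vs. all
        occurrences in the training data.
    autogen_subj / autogen_total: occurrences in subjective sentences vs. all
        occurrences in the AUTOGEN corpus.
    The thresholds are the values shown on the slide (chosen on a tuning set).
    """
    if train_total == 0 or train_subj / train_total < min_subj_ratio:
        return "discard"
    if train_total >= min_freq:
        return "highly reliable"
    if autogen_total == 0:
        return "not very reliable"
    # ASSUMPTION: which outcome belongs to which branch is inferred here.
    if autogen_subj / autogen_total >= min_autogen_ratio:
        return "somewhat reliable"
    return "discard"


levels = [
    classify_clue(8, 10, 0, 0),    # frequent and mostly subjective
    classify_clue(3, 4, 0, 0),     # rare, unseen in AUTOGEN
    classify_clue(3, 4, 9, 10),    # rare, mostly subjective in AUTOGEN
    classify_clue(3, 4, 5, 10),    # rare, mixed in AUTOGEN
    classify_clue(1, 4, 9, 10),    # fails the 70% test
]
```

The counts in the usage lines are made up for illustration only.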

Syntax Clues
• 15 sets of clues
• 5 classes: root, leaf, node, bilex, allkids
• 3 reliability levels: highly reliable, somewhat reliable, not very reliable

Organizing Clues into Features
SET1 = {believe, happy, sad, think, … }
SET2 = {although, because, however, …}
…
SET44 = {certainly, unlikely, maybe, …}
S1: I think people are happy because Chavez has fallen

Clue occurrences in S1:
      believe  happy  sad  think  although  because  …
S1    0        1      0    1      0         1

Input to Machine Learning Algorithm:
      SET1  SET2  …  SET44
S1    2     1     …  0
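The mapping from clue sets to features amounts to counting, per set, how many clue instances occur in the unit being classified. A minimal sketch follows, assuming whitespace tokenization and single-word clues only (phrases and extraction patterns would need real matching); `set_features` and `CLUE_SETS` are hypothetical names, and only the set members visible on the slide are included.

```python
# Only the members shown on the slide; the real sets are larger.
CLUE_SETS = {
    "SET1": {"believe", "happy", "sad", "think"},
    "SET2": {"although", "because", "however"},
    "SET44": {"certainly", "unlikely", "maybe"},
}


def set_features(sentence, clue_sets):
    """Return one count per clue set: the number of tokens that belong to it."""
    tokens = sentence.lower().split()
    return {
        name: sum(token in members for token in tokens)
        for name, members in clue_sets.items()
    }


features = set_features("I think people are happy because Chavez has fallen", CLUE_SETS)
# features == {"SET1": 2, "SET2": 1, "SET44": 0}
```

For S1 this reproduces the feature values on the slide: "think" and "happy" fall in SET1, "because" in SET2, and nothing in SET44.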

Organizing Clues by Strength
Training Data
NEUTRAL_SET = {however, … }
LOW_SET = {because, maybe, think, unlikely, …}
MEDIUM_SET = {believe, certainly, happy, sad, … }
HIGH_SET = {condemn, hate, tremendous, …}
S1: I think people are happy because Chavez has fallen

Input to Machine Learning Algorithm:
      NEUTRAL_SET  LOW_SET  MEDIUM_SET  HIGH_SET
S1    0            1        2           0

Clues and Features: Summary
Many Types/Sets of Subjectivity Clues
• 29 from previous work
• 15 new syntax clues
TYPE: features correspond to type sets
• 44 features
STRENGTH: features correspond to strength sets
• 4 features (neutral, low, medium, high)

Approaches to Strength Classification
Target Classes: neutral, low, medium, high
Classification: Boosting
• BoosTexter (Schapire and Singer, 2000)
• AdaBoost.MH
• 1000 rounds of boosting
Regression: Support Vector Regression
• SVMlight (Joachims, 1999)
• Discretize output into ordinal strength classes
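The slide does not say how the regression output is discretized. One simple possibility, shown here purely as an assumption, is to encode the classes as 0 to 3 (neutral, low, medium, high), round the real-valued prediction to the nearest integer, and clamp it to the valid range. The names `CLASSES` and `discretize` are hypothetical.

```python
import math

# ASSUMPTION: ordinal encoding neutral=0, low=1, medium=2, high=3.
CLASSES = ["neutral", "low", "medium", "high"]


def discretize(score):
    """Map a real-valued regression output to the nearest ordinal strength class."""
    index = math.floor(score + 0.5)                  # round half up
    index = max(0, min(len(CLASSES) - 1, index))     # clamp to the valid range
    return CLASSES[index]


labels = [discretize(s) for s in (-0.3, 1.4, 1.5, 7.2)]
# labels == ["neutral", "low", "medium", "high"]
```

Treating the classes as points on a numeric scale is what makes mean-squared error a meaningful measure: predicting "low" for a "high" clause costs more than predicting "medium".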

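Approaches to Strength Classification: Evaluation
Target Classes: neutral, low, medium, high
Classification: Accuracy
$\text{Accuracy} = \frac{\text{total correct}}{N}$
Regression: Mean-Squared Error
$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (t_i - p_i)^2$, where $t_i$ is the true strength of item $i$ and $p_i$ is the predicted strength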
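Both measures are straightforward to compute once the strength classes are encoded as integers. The sketch below assumes the encoding neutral=0, low=1, medium=2, high=3 (not stated on the slide) and uses made-up labels; the function names are illustrative.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted class equals the gold class."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_squared_error(gold, predicted):
    """Average squared distance between gold and predicted strengths."""
    return sum((g - p) ** 2 for g, p in zip(gold, predicted)) / len(gold)


# Toy data, assuming neutral=0, low=1, medium=2, high=3.
gold = [0, 1, 2, 3]
predicted = [0, 2, 2, 1]

acc = accuracy(gold, predicted)            # 2 of 4 correct -> 0.5
mse = mean_squared_error(gold, predicted)  # (0 + 1 + 0 + 4) / 4 -> 1.25
```

Accuracy treats every error alike, while MSE penalizes the high-to-low confusion in the last item four times as much as the low-to-medium confusion in the second.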

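Units of Classification
[Figure: dependency tree of "I think people are happy because Chavez has fallen" (nodes think,VBP; I,PRP; are,VBP; people,NNS; happy,JJ; because,IN; fallen,VBN; Chavez,NNP; has,VBZ) divided into clause Levels 1, 2, and 3, with a Train/Test pair shown for each level]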

Gold-standard Classes
[Figure: dependency tree of "I think people are happy because Chavez has fallen" annotated with gold-standard strength classes; labels shown: Level 1: medium, Level 2: medium, Level 3: neutral, plus additional labels low and medium]

Overview of Results
• 10-fold cross validation over 9313 sentences
• Bag-of-words (BAG)
• Best Results: All Clues + Bag-of-words
Boosting
• MSE: 48% to 60% improvement over baseline
• Accuracy: 23% to 79% improvement over baseline
Support Vector Regression
• MSE: 57% to 64% improvement over baseline
• Baseline: most frequent class
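A most-frequent-class baseline and a relative improvement over it can be sketched as follows. The slides do not define how "improvement over baseline" is computed; the formula used here for MSE, (baseline − system) / baseline, is an assumption, and the labels are toy data with the assumed encoding neutral=0 through high=3. All function names are hypothetical.

```python
from collections import Counter


def most_frequent_class(train_labels):
    """Return the label that occurs most often in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def mean_squared_error(gold, predicted):
    return sum((g - p) ** 2 for g, p in zip(gold, predicted)) / len(gold)


def relative_mse_improvement(baseline_mse, system_mse):
    """ASSUMPTION: improvement is measured as the relative reduction in MSE."""
    return (baseline_mse - system_mse) / baseline_mse


# Toy data, not taken from the presentation.
train_labels = [0, 0, 0, 1, 2, 3]
test_gold = [0, 1, 2, 3]
system_predictions = [0, 1, 1, 2]

baseline_label = most_frequent_class(train_labels)
baseline_predictions = [baseline_label] * len(test_gold)

baseline_mse = mean_squared_error(test_gold, baseline_predictions)   # (0+1+4+9)/4
system_mse = mean_squared_error(test_gold, system_predictions)       # (0+0+1+1)/4
improvement = relative_mse_improvement(baseline_mse, system_mse)
```

Because the baseline always predicts one class, its MSE grows quickly when the test data contains classes far from that one on the strength scale.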

Results: Mean-Squared Error
[Chart: MSE by Clause Level for SVM and Boosting; series: STRENGTH + BAG, STRENGTH Features, TYPE Features, BAG (Bag-of-words)]
Improvements over BASELINE
SVM: 57% to 64%
Boosting: 48% to 60%
BASELINE: 1.9 to 2.5

Results: Accuracy
[Chart: accuracy by Clause Level for SVM and Boosting; series: STRENGTH + BAG, STRENGTH Features, TYPE Features, BAG (Bag-of-words)]
Improvements over BASELINE
SVM: 57% (clause level 1)
Boosting: 23% to 79%
BASELINE: 30.8 to 48.3

Removing Syntax Clues: MSE
[Chart: MSE by Clause Level for SVM and Boosting; series: All Clues, MINUS Syntax Clues]

Removing Syntax Clues: Accuracy
[Chart: % Accuracy by Clause Level for SVM and Boosting; series: All Clues, MINUS Syntax Clues]

Related Work
Types of Attitude: Gordon et al. (2003), Liu et al. (2003)
Tracking sentiment timelines: Tong (2001)
Positive/Negative Language: Pang et al. (2002), Morinaga et al. (2002), Turney and Littman (2003), Yu and Hatzivassiloglou (2003), Dave et al. (2003), Nasukawa and Yi (2003), Hu and Liu (2004)
Public sentiment in message boards and stock prices: Das and Chen (2001)

Conclusions
• Promising results
  • MSE under 0.80 for sentences
  • MSE near 1 for embedded clauses
• Embedded clauses more difficult
  • less information
• Wide range of features produces best results
  • syntax clues
• Organizing features by strength is useful

Thank you!
• MPQA Corpus: http://nrrc.mitre.org/NRRC/publications.htm
