
Identifying Collocations for Recognizing Opinions

Identifying Collocations for Recognizing Opinions. Janyce Wiebe, Theresa Wilson, Matthew Bell University of Pittsburgh. Office of Naval Research grant N00014-95-1-0776. Introduction. Subjectivity: aspects of language used to express opinions and evaluations (Banfield 1982).


Presentation Transcript


  1. Identifying Collocations for Recognizing Opinions. Janyce Wiebe, Theresa Wilson, Matthew Bell, University of Pittsburgh. Office of Naval Research grant N00014-95-1-0776. ACL01 Workshop on Collocation.

  2. Introduction. Subjectivity: aspects of language used to express opinions and evaluations (Banfield 1982). Relevant for many NLP applications, such as information extraction and text categorization. This paper: identifying collocational clues of subjectivity.

  3. Outline • Subjectivity • Data and annotation • Unigram features • N-gram features • Generalized N-gram features • Document classification

  4. Subjectivity Tagging. Recognizing opinions and evaluations (subjective sentences) as opposed to material objectively presented as true (objective sentences). Banfield 1982, Fludernik 1993, Wiebe 1994, Stein & Wright 1995.

  5. Examples. Subjective: At several different levels, it’s a fascinating tale. Objective: Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share.

  6. Subjectivity? “Enthused”, “Wonderful!”, “Great product”, “Complained”, “You Idiot!”, “Terrible product”, “Speculated”, “Maybe”.

  7. Examples. Strong addressee-oriented negative evaluation • Recognizing flames (Spertus 1997) • Personal e-mail filters (Kaufer 2000). I had in mind your facts, buddy, not hers. Nice touch. “Alleges” whenever facts posted are not in your persona of what is “real.”

  8. Examples. Opinionated, editorial language • IR, text categorization (Kessler et al. 1997) • Do the writers purport to be objective? Look, this is a man who has great numbers. We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.

  9. Examples. Belief and speech reports • Information extraction, summarization, intellectual attribution (Teufel & Moens 2000). Northwest Airlines settled the remaining lawsuits, a federal judge said. “The cost of health care is eroding our standard of living and sapping industrial strength,” complains Walter Maher.

  10. Other Applications • Review mining (Terveen et al. 1997) • Clustering documents by ideology (Sack 1995) • Style in machine translation and generation (Hovy 1987)

  11. Potential Subjective Elements. Sap: potential subjective element. “The cost of health care is eroding standards of living and sapping industrial strength,” complains Walter Maher. Subjective element.

  12. Subjectivity • Multiple types, sources, and targets. Somehow grown-ups believed that wisdom adhered to youth. We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.

  13. Annotations. Manually tagged + existing annotations. Three levels: expression level, sentence level, document level.

  14. Expression Level Annotations. [Perhaps you’ll forgive me] for reposting his response. They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff].

  15. Expression Level Annotations. Probably the most natural level. Difficult for manual and automatic tagging: detailed; no predetermined classification unit. To date: used for training and bootstrapping.

  16. Expression Level Data. 1000 WSJ sentences (2J); 462 newsgroup messages (2J); 15413 words newsgroup data (1J). Single round of tagging; results promising. Used to generate features, not for evaluation.

  17. Sentence Level Annotations. A sentence is labeled subjective if any significant expression of subjectivity appears. “The cost of health care is eroding our standard of living and sapping industrial strength,” complains Walter Maher. “What an idiot,” the idiot presumably complained.

  18. Document Level Annotations. This work: opinion pieces in the WSJ (editorials, letters to the editor, arts & leisure reviews). Other work: flames; 1-star to 5-star reviews. + Free source of data. + More directly related to applications.

  19. Document Level Annotations. Opinion pieces contain objective sentences. Non-opinion pieces contain subjective sentences. News reports present reactions (van Dijk 1988): “Critics claim …”, “Supporters argue …”. Editorials contain facts supporting the argument. Reviews contain information about the product.

  20. Class Proportions in WSJ Sample. Non-opinion pieces: subjective sentences 43% (noise), objective 57%. Opinion pieces: subjective sentences 70%, objective 30% (noise).

  21. Word Distribution. 13-17% of words are in opinion pieces; 83-87% of words are in non-opinion pieces.

  22. Evaluation Metric for Feature S with Respect to Opinion Pieces. Precision(S) = # instances of S in opinions / total # instances of S. Baseline for comparison: # words in opinions / total # words. Given the distributions, precisions of even perfect subjectivity clues would be low. Improvement over baseline is taken as evidence of promising PSEs (potential subjective elements).
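The metric on slide 22 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the toy documents, and the treatment of a feature as a plain set of words are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (tokens, is_opinion_piece)
Document = Tuple[List[str], bool]

def feature_precision(docs: List[Document], feature: Set[str]) -> float:
    """# instances of the feature in opinion pieces / total # instances."""
    in_opinions = 0
    total = 0
    for tokens, is_opinion in docs:
        hits = sum(1 for tok in tokens if tok in feature)
        total += hits
        if is_opinion:
            in_opinions += hits
    return in_opinions / total if total else 0.0

def baseline_precision(docs: List[Document]) -> float:
    """# words in opinion pieces / total # words."""
    opinion_words = sum(len(tokens) for tokens, is_opinion in docs if is_opinion)
    total_words = sum(len(tokens) for tokens, _ in docs)
    return opinion_words / total_words if total_words else 0.0

# Toy data, invented for illustration only.
docs = [
    ("a fascinating tale of a fascinating man".split(), True),
    ("the company increased its quarterly dividend".split(), False),
    ("analysts called the results fascinating but costly".split(), False),
]
prec = feature_precision(docs, {"fascinating"})
base = baseline_precision(docs)
improvement = prec - base  # a positive value suggests a promising clue
```

In this toy corpus the feature occurs three times, twice in the opinion piece, while opinion pieces hold 7 of the 20 words, so the feature's precision exceeds the baseline.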

  23. Data: opinion pieces and non-opinion pieces.

  24. Document Level Data. 3 WSJ editions, each more than 150K words. Existing opinion-piece annotations used for training. Manually refined classifications used for testing: identified editorials not marked as such; 3 hours/edition; Kappa = .93 for 2 judges.

  25. Automatically Generated Unigram Features. Adjective and verb features were generated using distributional similarity (Lin 1998, Wiebe 2000). Existing opinion-piece annotations used for training. Manually refined annotations used for testing.

  26. Unigram Feature Results (entries are +prec/freq). WSJ-10, baseline 17%: Adjs +21/373; Verbs +16/721. WSJ-33, baseline 13%: Adjs +09/2137; Verbs +07/3193.

  27. Example Adjective Feature: conclusive, undiminished, brute, amazing, unseen, draconian, insurmountable, unqualified, poetic, foxy, vintage, jaded, tropical, distributional, discernible, adept, paltry, warm, reprehensible, astonishing, surprising, commonplace, crooked, dreary, virtuoso, trashy, sandy, static, virulent, desolate, ours, proficient, noteworthy, insistent, daring, unforgiving, agreeable, uncritical, homicidal, comforting, erotic, resonant, ephemeral, believable, epochal, dense, exotic, topical, …

  28. Unique Words (hapax legomena). Subjective elements contain more single-instance words than expected. Unique-1-gram feature: all words that appear once in the test data. Precision is 1.5 times baseline precision. Frequent feature!
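A minimal sketch of the unique-1-gram feature, assuming the test data is available as a flat list of tokens and that words are compared by exact surface form (the slides do not say whether case or part of speech is taken into account):

```python
from collections import Counter
from typing import Iterable, Set

def unique_unigrams(tokens: Iterable[str]) -> Set[str]:
    """Return the words that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count == 1}

# Toy example, invented for illustration.
sample = "the critics called the plan draconian and the rollout dreary".split()
hapaxes = unique_unigrams(sample)
```

Here every word except "the" occurs once, so all of them would count as instances of the feature.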

  29. Unigram Feature Results (entries are +prec/freq). WSJ-10, baseline 17%: Adjs +21/373; Verbs +16/721; Unique-1-gram +10/6065. WSJ-33, baseline 13%: Adjs +09/2137; Verbs +07/3193; Unique-1-gram +06/6048. Results are consistent, even with different identification procedures (similarly for WSJ-22).

  30. Collocational PSEs. Examples: “get out”, “what a”, “for the last time”, “just as well”, “here we go again”. Started with the observation that low precision words often compose higher precision collocations.

  31. Identifying Collocational PSEs. Searching for 2-grams, 3-grams, 4-grams. No grammatical generalizations or constraints yet. Train on the data annotated with subjective elements (expression level). Test on the manually refined opinion-piece data (document level).

  32. Identifying Collocational PSEs: Training Data (reminder). 1000 WSJ sentences (2J); 462 newsgroup messages (2J); 15413 words newsgroup data (1J). [Perhaps you’ll forgive me] for reposting his response. They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff].

  33. N-Grams. Each position is filled by a word|POS pair, e.g. in|prep the|det air|noun.

  34. Identifying Collocational PSEs: Training, Step 1. Precision with respect to subjective elements is calculated for all 1-, 2-, 3-, and 4-grams in the training data. Precision(n-gram) = # subjective instances of n-gram / total # instances of n-gram. An instance of an n-gram is subjective if each word in the instance is in a subjective element.
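Training step 1 might look roughly like the sketch below. It is not the authors' implementation: the per-token boolean flag, the function name, and the toy sentences are assumptions, and the sketch treats an instance as subjective when every token lies in some subjective element, without checking that it is the same element.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: (word|POS string, is the token inside a subjective element?)
Token = Tuple[str, bool]
NGram = Tuple[str, ...]

def ngram_precisions(sentences: List[List[Token]], max_n: int = 4) -> Dict[NGram, float]:
    """Precision of every 1..max_n-gram with respect to subjective elements."""
    total: Counter = Counter()
    subjective: Counter = Counter()
    for sent in sentences:
        for n in range(1, max_n + 1):
            for start in range(len(sent) - n + 1):
                window = sent[start:start + n]
                gram = tuple(item for item, _ in window)
                total[gram] += 1
                # Subjective instance: each word is in a subjective element.
                if all(in_se for _, in_se in window):
                    subjective[gram] += 1
    return {gram: subjective[gram] / total[gram] for gram in total}

# Toy training data, invented for illustration.
train = [
    [("really|adverb", True), ("good|adj", True), ("stuff|noun", True)],
    [("good|adj", False), ("stuff|noun", False)],
]
prec = ngram_precisions(train)
```

With these two sentences, the 2-gram `really|adverb good|adj` has precision 1.0, while `good|adj` alone has precision 0.5 because one of its two instances falls outside any subjective element.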

  35. Identifying Collocational PSEs: Training. An instance of an n-gram is subjective if each word in the instance is in a subjective element. [Perhaps you’ll forgive me] for reposting his response. They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff].

  36. Identifying Collocational PSEs: Training, Step 2. N-gram PSEs are selected based on their precisions, using two criteria: 1. Precision >= 0.1. 2. Precision >= maximum precision of its constituents.

  37. Identifying Collocational PSEs: Training, Step 2. Precision >= maximum precision of its constituents: prec(w1,w2) >= max(prec(w1), prec(w2)); prec(w1,w2,w3) >= max(prec(w1,w2), prec(w3)) or prec(w1,w2,w3) >= max(prec(w1), prec(w2,w3)).
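The two selection criteria can be sketched as a single predicate over a table of precisions. The slides spell out the constituent test only for 2-grams and 3-grams; extending the same two splits (all-but-last plus last word, first word plus the rest) to 4-grams, and applying only the threshold to unigrams, are assumptions of this sketch, as are the names and the toy numbers.

```python
from typing import Dict, Tuple

NGram = Tuple[str, ...]

def is_selected(gram: NGram, prec: Dict[NGram, float], threshold: float = 0.1) -> bool:
    """Apply both criteria: precision >= threshold and >= its constituents' maximum."""
    p = prec[gram]
    if p < threshold:
        return False
    if len(gram) == 1:
        return True  # assumption: a unigram has no constituents to compare against
    # Split A: (w1..w_{n-1}) and (w_n); split B: (w1) and (w2..w_n).
    # For a 2-gram both splits reduce to max(prec(w1), prec(w2)).
    split_a = p >= max(prec[gram[:-1]], prec[gram[-1:]])
    split_b = p >= max(prec[gram[:1]], prec[gram[1:]])
    return split_a or split_b

# Toy precisions, invented for illustration.
prec = {
    ("just",): 0.05, ("as",): 0.02, ("well",): 0.08,
    ("just", "as"): 0.04, ("as", "well"): 0.30,
    ("just", "as", "well"): 0.35,
}
selected = [gram for gram in prec if len(gram) > 1 and is_selected(gram, prec)]
```

In this toy table `as well` and `just as well` are selected, while `just as` is rejected both for falling under the 0.1 threshold and for not beating `just`.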

  38. Results (entries are +prec/freq). WSJ-10, baseline 17%: Adjs +21/373; Verbs +16/721; Unique-1-gram +10/6065; 2-grams +07/2182; 3-grams +09/271; 4-grams +05/32. WSJ-33, baseline 13%: Adjs +09/2137; Verbs +07/3193; Unique-1-gram +06/6048; 2-grams +04/2080; 3-grams +06/262; 4-grams -03/30.

  39. Generalized Collocational PSEs. Replace each single-instance word in the training data with “UNIQUE”. Rerun the same training procedure, finding collocations such as highly|adverb UNIQUE|adj. To test the new collocations on test data, first replace each single-instance word in the test data with “UNIQUE”.
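The replacement step could be sketched as follows, assuming each sentence is a list of (word, POS) pairs and that "single-instance" is judged on the word form alone across the whole data set; the slides do not say whether the part of speech or letter case enters into the count, so both choices are assumptions.

```python
from collections import Counter
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, POS), a hypothetical representation

def generalize(sentences: List[List[TaggedWord]]) -> List[List[TaggedWord]]:
    """Replace every word that occurs exactly once in the data with "UNIQUE"."""
    counts = Counter(word for sent in sentences for word, _ in sent)
    return [
        [("UNIQUE" if counts[word] == 1 else word, pos) for word, pos in sent]
        for sent in sentences
    ]

# Toy data, invented for illustration.
data = [
    [("highly", "adverb"), ("unorthodox", "adj")],
    [("highly", "adverb"), ("talented", "adj")],
]
generalized = generalize(data)
```

Both toy sentences collapse to the same pattern, highly|adverb UNIQUE|adj, which is what lets the training procedure count them as two instances of one collocation. The same function would be applied separately to the test data before matching.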

  40. Results (entries are +prec/freq). WSJ-10, baseline 17%: Adjs +21/373; Verbs +16/721; Unique-1-gram +10/6065; 2-grams +07/2182; 3-grams +09/271; 4-grams +05/32; U-2-grams +24/294; U-3-grams +27/132; U-4-grams +83/3. WSJ-33, baseline 13%: Adjs +09/2137; Verbs +07/3193; Unique-1-gram +06/6048; 2-grams +04/2080; 3-grams +06/262; 4-grams -03/30; U-2-grams +14/288; U-3-grams +13/144; U-4-grams +15/27.

  41. Example. highly|adverb UNIQUE|adj: highly unsatisfactory, highly unorthodox, highly talented, highly conjectural, highly erotic.

  42. Example. UNIQUE|verb out|IN: farm out, chuck out, ruling out, crowd out, flesh out, blot out, spoken out, luck out.

  43. Examples. UNIQUE|adj to|TO UNIQUE|verb: impervious to reason, strange to celebrate, wise to temper. they|pronoun are|verb UNIQUE|noun: they are fools, they are noncontenders. UNIQUE|noun of|IN its|pronoun: sum of its, usurpation of its, proprietor of its.

  44. How do Fixed and U-Collocations Compare? Recall the original motivation for investigating fixed n-gram PSEs: started with the observation that low precision words often compose higher precision collocations. But unique words are probably not low precision. Are we finding the same collocations two different ways? Or are we finding new PSEs?

  45. Comparison (WSJ-10). Intersecting instances: 2-grams 4; 3-grams 2; 4-grams 0. %overlap: 2-grams 0.0016; 3-grams 0.0049; 4-grams 0. WSJ-33: all 0s.

  46. Opinion-Piece Recognition using Linear Regression (entries are %correct / TP / FP). Adjs, verbs: .896 / 5 / 4. Ngrams: .899 / 5 / 3. Adjs, verbs, ngrams: .909 / 9 / 4. All features (+ max density): .912 / 11 / 4. Max density: the maximum feature count in an 11-word window.
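The max density feature can be sketched with a sliding window. The slide gives only the one-line definition, so the details here are assumptions: the window is taken to be any 11 contiguous tokens, a document shorter than the window is treated as a single window, and a feature instance is simplified to a single token drawn from a set of clue words.

```python
from typing import List, Set

def max_density(tokens: List[str], clue_words: Set[str], window: int = 11) -> int:
    """Maximum number of clue instances found in any `window` contiguous tokens."""
    hits = [1 if tok in clue_words else 0 for tok in tokens]
    current = sum(hits[:window])  # count in the first window
    best = current
    for i in range(window, len(hits)):
        # Slide the window one token right: add the new token, drop the oldest.
        current += hits[i] - hits[i - window]
        best = max(best, current)
    return best

# Toy document with clue instances at positions 0, 5, 10 and 11.
doc = ["x"] * 20
for position in (0, 5, 10, 11):
    doc[position] = "clue"
density = max_density(doc, {"clue"})
```

In the toy document no 11-token window contains all four clue instances, so the feature value is 3; a per-document value like this could then be fed to the classifier alongside the other feature counts.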

  47. Future Work. Methods for recognizing non-compositional phrases (e.g., Lin 1999). Mutual bootstrapping (Riloff and Jones 1999) to alternately recognize sequences and subjective fillers.

  48. Sentence Classification. Probabilistic classifier. Binary features: pronoun, adjective, number, modal other than “will”, adverb other than “not”, new paragraph. Lexical feature: good for subj; good for obj; good for neither. 10-fold cross validation; 51% baseline. 72% average accuracy across folds; 82% average accuracy on sentences rated certain.

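  49. Test for Bias: Marginal Homogeneity. [Figure: 4×4 contingency table over categories C1-C4 with its row and column marginals.] The model requires each row marginal to equal the matching column marginal, i+ = +i = Xi for all i (1+ = +1 = X1, 2+ = +2 = X2, 3+ = +3 = X3, 4+ = +4 = X4). The worse the fit, the greater the bias.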

  50. Test for Symmetric Disagreement: Quasi-Symmetry. [Figure: 4×4 contingency table over categories C1-C4 with the off-diagonal cells marked.] Tests relationships among the off-diagonal counts. The better the fit, the higher the correlation.
