
Predicting the Semantic Orientation of Adjectives

Vasileios Hatzivassiloglou and Kathleen R. McKeown. Presenter: Gabriel Nicolae.


Presentation Transcript


  1. Predicting the Semantic Orientation of Adjectives. Vasileios Hatzivassiloglou and Kathleen R. McKeown. Presenter: Gabriel Nicolae

  2. Introduction • Orientation (polarity) = direction of deviation from the norm • Near-synonyms can differ in orientation: simple vs. simplistic • Antonyms have opposite orientations: hot vs. cold

  3. Introduction • In linguistic constructs such as conjunctions, the choices of arguments and connective are mutually constrained • Example: The tax proposal was (a) simple and well-received, (b) simplistic but well-received, (c) *simplistic and well-received by the public • Variant (c), marked *, is anomalous: and normally joins adjectives of the same orientation

  4. Exceptions

  5. Goals • Automatically identify antonyms • Distinguish near-synonyms How? • By retrieving semantic orientation information from indirect evidence collected from a large corpus Why? • Dictionaries and similar sources (thesauri, WordNet) do not explicitly include semantic orientation information • They lack links between antonyms and between synonyms when those relations depend on the domain of discourse

  6. Overview of their approach • Correlation between indicators and semantic orientation • Direct indicators: affixes (in-, un-) • mostly negative • exceptions: independent, unbiased • Indirect indicators: conjunctions • conjoined adjectives usually have the same orientation for most connectives • the situation is reversed for but • From the corpus: fair and legitimate, corrupt and brutal • vs. semantically anomalous: fair and brutal, corrupt and legitimate

  7. General algorithm • Extract conjunctions of adjectives and morphological relations • Label each pair of conjoined adjectives as having the same or different orientation, using a log-linear regression model • Separate the adjectives into two subsets of different orientation using a clustering algorithm • Label the group with the higher average frequency as positive
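The first step above can be sketched in a few lines. The slides do not say how the conjunctions were extracted, so this is only a minimal illustration under stated assumptions: `ADJECTIVES` is a hypothetical hand-made lexicon standing in for a part-of-speech tagger, and `extract_conjunctions` is an illustrative name, not the authors' code.

```python
import re
from collections import Counter

# Hypothetical adjective lexicon (assumption); a real system would rely on
# a part-of-speech tagger or parser rather than a fixed word list.
ADJECTIVES = {"simple", "simplistic", "well-received", "fair",
              "legitimate", "corrupt", "brutal"}
CONNECTIVES = ("and", "or", "but")

def extract_conjunctions(text):
    """Count (adj1, connective, adj2) tokens where both words are known adjectives."""
    # Lowercase words, keeping internal hyphens ("well-received").
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    # Slide a three-word window over the text.
    for left, conn, right in zip(words, words[1:], words[2:]):
        if conn in CONNECTIVES and left in ADJECTIVES and right in ADJECTIVES:
            counts[(left, conn, right)] += 1
    return counts

sample = ("The tax proposal was simple and well-received. "
          "The ruling was fair and legitimate, but the regime was corrupt and brutal. "
          "Critics called the plan simplistic but well-received.")
pairs = extract_conjunctions(sample)
for (left, conn, right), n in sorted(pairs.items()):
    print(left, conn, right, n)
```

The resulting counts per pair and connective are the kind of input the later link-prediction step would consume.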

  8. Data collection • Corpus: the 21-million-word 1987 Wall Street Journal corpus • Training data: a set of adjectives with predetermined (hand-annotated) orientation labels (+ or -) • 1,336 adjectives (657 +, 679 -) • The training set was validated by four other people • 500 adjectives: 89.15% agreement • Test data: • 15,048 conjunction tokens • 9,296 distinct pairs of conjoined adjectives (types)

  9. Data collection (cont.) • Each conjunction token is classified according to three variables: • conjunction used • and, or, but, either-or, neither-nor • type of modification • attributive, predicative, appositive, resultative • number of the modified noun • singular, plural

  10. Validation of the conjunction hypothesis Results • Their conjunction hypothesis is validated overall and for almost all individual cases • There are small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes) • Conjoined antonyms appear far more frequently than expected by chance in conjunctions other than but

  11. Prediction of link type • Baseline 1: always guess that a link is of the same-orientation type => 77.84% accuracy • Baseline 2: Baseline 1, except that but is assumed to exhibit the opposite pattern => 80.82% accuracy • Morphological relationships: • Adjectives related in form almost always have different semantic orientations • Highly accurate (97.06%), but applies only to 1,336 labeled adjectives (891,780 possible pairs) • E.g. adequate-inadequate, thoughtful-thoughtless • Baseline 1 + Morphology => 78.86% accuracy • Baseline 2 + Morphology => 81.75% accuracy
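The two baselines and the morphology rule can be written down as a small sketch. The function names are illustrative, and the affix test is an assumption: it covers only the prefixes in-/un- named earlier in the slides and the -ful/-less pattern from the example above, not the authors' full rule set. The last line also checks the pair count quoted on the slide.

```python
from math import comb

# Assumption: only the prefixes named in the slides.
NEGATIVE_PREFIXES = ("in", "un")

def baseline1(connective):
    """Baseline 1: every link is a same-orientation link."""
    return "same"

def baseline2(connective):
    """Baseline 2: like Baseline 1, except that 'but' signals different orientation."""
    return "different" if connective == "but" else "same"

def morphologically_related(a, b):
    """True if one adjective looks like an affixed form of the other."""
    for x, y in ((a, b), (b, a)):
        if any(y == prefix + x for prefix in NEGATIVE_PREFIXES):
            return True  # e.g. adequate / inadequate
        if x.endswith("ful") and y == x[:-3] + "less":
            return True  # e.g. thoughtful / thoughtless
    return False

def predict_link(a, connective, b):
    """Baseline 2 + Morphology: the morphology rule overrides the connective."""
    if morphologically_related(a, b):
        return "different"
    return baseline2(connective)

# Number of unordered pairs among the 1,336 labeled adjectives.
possible_pairs = comb(1336, 2)
print(possible_pairs)
```

Because the morphology rule fires on few pairs, it can only nudge overall accuracy, which is consistent with the roughly one-point gains reported on the slide.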

  12. Prediction of link type (cont.) • Log-linear regression model • x: the vector of the observed counts in the various conjunction categories • w: the vector of weights to be learned • y: the response of the system • Using iterative stepwise refinement, they selected 9 predictor variables out of the 90 possible ones • Small improvement: 80.97% accuracy (82.05% with Morphology), but now each prediction is rated between 0 and 1
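The formula on this slide did not survive in the transcript. One common form of such a model, assumed here rather than taken from the slide, passes the weighted sum η = w·x through the logistic function, y = e^η / (1 + e^η), which yields a rating between 0 and 1. In the sketch below the three feature columns (a bias term, a count of and-conjunctions, a count of but-conjunctions) and the weights are hypothetical, and reading y as a same-orientation score with 1 - y as the dissimilarity is likewise an assumption.

```python
import math

def response(weights, counts):
    """Logistic response y = e^eta / (1 + e^eta), with eta = w . x."""
    eta = sum(w * x for w, x in zip(weights, counts))
    return 1.0 / (1.0 + math.exp(-eta))

def dissimilarity(weights, counts):
    """Assumed convention: small value means a same-orientation link."""
    return 1.0 - response(weights, counts)

# Hypothetical weights for [bias, and-count, but-count]; not fitted values.
WEIGHTS = [0.0, 1.0, -2.0]

y_same = response(WEIGHTS, [1, 3, 0])   # pair seen three times with "and"
y_diff = response(WEIGHTS, [1, 0, 2])   # pair seen twice with "but"
print(round(y_same, 3), round(y_diff, 3))
```

A graded score of this kind, unlike the hard baseline guesses, lets the clustering step weigh confident links more heavily than uncertain ones.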

  13. Clustering • Input: a graph of adjectives connected by dissimilarity links • Small dissimilarity value => same-orientation link • High dissimilarity value => different-orientation link • Method used: apply an iterative optimization procedure to each connected component, based on the exchange method, a non-hierarchical clustering algorithm • Idea: find the partition P such that the objective function Φ is minimized
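The definition of Φ is not preserved in the transcript. The sketch below assumes one plausible choice: for each of the two clusters, the sum of pairwise dissimilarities divided by the cluster size, added over both clusters. The search is a simple greedy variant of the exchange idea (move one adjective at a time and keep the move only if Φ drops), not necessarily the authors' exact procedure. Treating unlinked pairs as neutral (0.5) and the dissimilarity values themselves are also assumptions.

```python
from itertools import combinations

def objective(clusters, d, default=0.5):
    """Assumed Phi: sum over clusters of (pairwise dissimilarity sum / cluster size)."""
    total = 0.0
    for cluster in clusters:
        if cluster:
            pair_sum = sum(d.get(frozenset(p), default)
                           for p in combinations(sorted(cluster), 2))
            total += pair_sum / len(cluster)
    return total

def exchange_cluster(words, d):
    """Greedy exchange: move single words between two clusters while Phi decreases."""
    words = sorted(words)
    clusters = [set(words[::2]), set(words[1::2])]  # arbitrary deterministic start
    improved = True
    while improved:
        improved = False
        best = objective(clusters, d)
        for w in words:
            src = 0 if w in clusters[0] else 1
            if len(clusters[src]) == 1:
                continue  # keep both clusters non-empty
            clusters[src].remove(w)
            clusters[1 - src].add(w)
            score = objective(clusters, d)
            if score < best - 1e-12:
                best, improved = score, True   # keep the move
            else:
                clusters[1 - src].remove(w)    # undo the move
                clusters[src].add(w)
    return clusters

# Hypothetical dissimilarities for the four adjectives used earlier in the slides.
d = {
    frozenset({"fair", "legitimate"}): 0.1,
    frozenset({"corrupt", "brutal"}): 0.1,
    frozenset({"fair", "brutal"}): 0.9,
    frozenset({"corrupt", "legitimate"}): 0.9,
}
result = exchange_cluster(["fair", "legitimate", "corrupt", "brutal"], d)
print(result)
```

Like any local search of this kind, the result can depend on the starting partition, so a real run would typically try several starts.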

  14. Labeling the clusters as + or - • In oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the more frequent one about 81% of the time • Unmarked => positive orientation almost always • So, label as positive the group with the higher average word frequency.
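The labeling rule is simple enough to show directly. This is a minimal sketch: `label_clusters` is an illustrative name and the frequency counts are invented for the example, not corpus figures.

```python
def average_frequency(cluster, freq):
    """Mean corpus frequency of the words in a cluster (unseen words count as 0)."""
    return sum(freq.get(word, 0) for word in cluster) / len(cluster)

def label_clusters(clusters, freq):
    """Return (positive, negative): the higher-average-frequency group is positive."""
    first, second = clusters
    if average_frequency(first, freq) >= average_frequency(second, freq):
        return first, second
    return second, first

# Hypothetical frequency counts, for illustration only.
freq = {"fair": 120, "legitimate": 80, "corrupt": 40, "brutal": 30}
positive, negative = label_clusters(
    [{"corrupt", "brutal"}, {"fair", "legitimate"}], freq)
print(sorted(positive), sorted(negative))
```

The rule labels whole groups, so a single unusually frequent negative word does not flip the outcome unless it dominates its cluster's average.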

  15. Graph connectivity and performance • They tested how graph connectivity affects the overall performance
