Predicting the Semantic Orientation of Adjectives

Vasileios Hatzivassiloglou and Kathleen R. McKeown
Presenter: Gabriel Nicolae

Introduction
• Orientation/polarity = direction of deviation from the norm
  • Near-synonyms: simple vs. simplistic
  • Antonyms: hot vs. cold

Introduction
• In linguistic constructs such as conjunctions, the choices of arguments and connective are mutually constrained.
• Example: The tax proposal was {simple and well-received | simplistic but well-received | *simplistic and well-received} by the public.
  • (* marks the anomalous variant)

Exceptions

Goals
• Automatically identify antonyms
• Distinguish near-synonyms
How?
• By retrieving semantic orientation information from indirect evidence collected from a large corpus
Why?
• Dictionaries and similar sources (thesauri, WordNet) do not explicitly include semantic orientation information
• Links between antonyms and synonyms are missing when they depend on the domain of the discourse

Overview of their approach
• Correlation between indicators and semantic orientation
• Direct indicators: affixes (in-, un-)
  • mostly mark negatives
  • exceptions: independent, unbiased
• Indirect indicators: conjunctions
  • for most connectives, conjoined adjectives usually have the same orientation
  • the situation is reversed for but
  • from corpus: fair and legitimate, corrupt and brutal
  • vs. semantically anomalous: fair and brutal, corrupt and legitimate

General algorithm
• Extract conjunctions of adjectives and morphological relations
• Label each pair of conjoined adjectives as having the same or different orientation, using a log-linear regression model
• Separate the adjectives into two subsets of different orientation using a clustering algorithm
• Label the group with the higher average frequency as positive

Data collection
• Corpus: 21-million-word 1987 Wall Street Journal corpus
• Training data: a set of adjectives with predetermined (hand-annotated) orientation labels (+ or -)
  • 1,336 adjectives (657 +, 679 -)
  • The training set was validated by four other people
    • 500 adjectives: 89.15% agreement
• Test data:
  • 15,048 conjunction tokens
  • 9,296 distinct pairs of conjoined adjectives (types)

Data collection (cont.)
• Each conjunction token is classified according to three variables:
  • conjunction used: and, or, but, either-or, neither-nor
  • type of modification: attributive, predicative, appositive, resultative
  • number of the modified noun: singular, plural
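The three classification variables can be pictured as a small record per conjunction token, tallied per adjective pair. The sketch below is only an illustration: the class name `ConjunctionToken`, the helper `count_categories`, and the sample tokens are all hypothetical, and the slides do not say how the authors stored their data.

```python
from collections import Counter
from dataclasses import dataclass

# Values of the three variables, as listed on the slide.
CONJUNCTIONS = ("and", "or", "but", "either-or", "neither-nor")
MODIFICATIONS = ("attributive", "predicative", "appositive", "resultative")
NUMBERS = ("singular", "plural")


@dataclass(frozen=True)
class ConjunctionToken:
    """One occurrence of two conjoined adjectives (hypothetical record layout)."""
    adj1: str
    adj2: str
    conjunction: str
    modification: str
    number: str

    def __post_init__(self):
        # Reject values outside the three closed sets.
        if self.conjunction not in CONJUNCTIONS:
            raise ValueError(f"unknown conjunction: {self.conjunction}")
        if self.modification not in MODIFICATIONS:
            raise ValueError(f"unknown modification: {self.modification}")
        if self.number not in NUMBERS:
            raise ValueError(f"unknown number: {self.number}")

    @property
    def pair(self):
        # Unordered pair, so "fair and legitimate" equals "legitimate and fair".
        return frozenset((self.adj1, self.adj2))


def count_categories(tokens):
    """Tally tokens per adjective pair (type) and per category triple."""
    counts = {}
    for t in tokens:
        key = (t.conjunction, t.modification, t.number)
        counts.setdefault(t.pair, Counter())[key] += 1
    return counts


# Made-up tokens, for illustration only.
tokens = [
    ConjunctionToken("fair", "legitimate", "and", "predicative", "singular"),
    ConjunctionToken("legitimate", "fair", "and", "predicative", "singular"),
    ConjunctionToken("corrupt", "brutal", "and", "attributive", "plural"),
]
counts = count_categories(tokens)
```

Here two tokens collapse into one pair type, which mirrors the token/type distinction in the test data figures.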

Validation of the conjunction hypothesis
Results
• The conjunction hypothesis is validated overall and for almost all individual cases
• The behavior of conjunctions differs only slightly between linguistic environments (as represented by the three attributes)
• Conjoined antonyms appear far more frequently than expected by chance in conjunctions other than but

Prediction of link type
• Baseline 1: always guess that a link is of the same-orientation type => 77.84% accuracy
• Baseline 2: Baseline 1, except that but exhibits the opposite pattern => 80.82% accuracy
• Morphological relationships:
  • Adjectives related in form almost always have different semantic orientations
  • E.g. adequate-inadequate, thoughtful-thoughtless
  • Highly accurate (97.06%), but applies only to the 1,336 labeled adjectives (891,780 possible pairs)
• Baseline 1 + Morphology => 78.86% accuracy
• Baseline 2 + Morphology => 81.75% accuracy
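The rule-based predictors above can be sketched in a few lines. This is a rough illustration, not the authors' code: `morphologically_related` is a hypothetical, deliberately crude test that only covers the in-/un- prefixes and the -ful/-less pattern seen in the slide examples, and `predict_link` combines it with Baseline 2. The last line also checks the pair count quoted on the slide.

```python
from math import comb

# Prefixes named on the slides; a real system would need a fuller list.
NEGATING_PREFIXES = ("in", "un")


def morphologically_related(a, b):
    """Crude check (an assumption): the two adjectives differ only by a
    negating prefix, or share a stem with -ful vs. -less."""
    for x, y in ((a, b), (b, a)):
        if any(y == prefix + x for prefix in NEGATING_PREFIXES):
            return True
        if x.endswith("ful") and y.endswith("less") and x[:-3] == y[:-4]:
            return True
    return False


def predict_link(a, b, conjunction):
    """Baseline 2 + Morphology: 'same' unless the pair is related in form
    or the connective is 'but'."""
    if morphologically_related(a, b):
        return "different"
    if conjunction == "but":
        return "different"
    return "same"


print(predict_link("adequate", "inadequate", "and"))
print(predict_link("fair", "legitimate", "and"))
print(predict_link("simplistic", "well-received", "but"))

# Number of unordered pairs among the 1,336 labeled adjectives.
n_pairs = comb(1336, 2)
```

The pair count works out to 1,336 × 1,335 / 2 = 891,780, matching the figure on the slide.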

Prediction of link type (cont.)
• Log-linear regression model
  • x: the vector of the observed counts in the various conjunction categories
  • w: the vector of weights to be learned
  • y: the response of the system
• Using the method of iterative stepwise refinement, they selected 9 of the 90 possible predictor variables.
• Small improvement: 80.97% accuracy (82.05% accuracy using Morphology), but each prediction is now rated between 0 and 1
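The model equation itself did not survive in the slide text. A minimal sketch, assuming the usual logistic form y = e^(w·x) / (1 + e^(w·x)) and assuming that y is read as the likelihood of a same-orientation link, might look like this. The function names, weights, and counts are hypothetical, and the weights are not learned here.

```python
import math


def link_response(x, w):
    """Response y in (0, 1) for count vector x and weight vector w,
    assuming the standard logistic form (not shown on the slide)."""
    eta = sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable evaluation of e^eta / (1 + e^eta).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def dissimilarity(x, w):
    """Assumption: a link's dissimilarity is 1 - y, so a low value
    suggests same orientation."""
    return 1.0 - link_response(x, w)


# Made-up example: counts for two categories (say, 'and' and 'but' tokens).
w = [0.8, -1.5]   # hypothetical weights
x = [3, 0]        # pair seen three times with 'and', never with 'but'
y = link_response(x, w)
```

Unlike the baselines, this gives a graded score per link, which is what the clustering step consumes.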

Clustering
• Input: a graph of adjectives connected by dissimilarity links
  • Small dissimilarity value => same-orientation link
  • High dissimilarity value => different-orientation link
• Method used: apply an iterative optimization procedure on each connected component, based on the exchange method, a non-hierarchical clustering algorithm
• Idea: find the partition P such that the objective function Φ is minimized
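The slide text does not preserve the formula for Φ. The sketch below assumes an objective that sums, for each of the two clusters, the within-cluster dissimilarities divided by the cluster size, and it uses a simple first-improvement variant of the exchange idea (move one adjective at a time, keep the move only if Φ drops). The function names and the dissimilarity values are made up for illustration.

```python
from itertools import combinations


def objective(clusters, d):
    """Assumed form of Phi: per cluster, the sum of dissimilarities over
    linked pairs inside it, divided by the cluster size."""
    total = 0.0
    for c in clusters:
        if c:
            within = sum(d.get(frozenset(p), 0.0)
                         for p in combinations(sorted(c), 2))
            total += within / len(c)
    return total


def exchange_method(nodes, d):
    """Greedy exchange: start from an arbitrary split, then move single
    nodes across while each move lowers the objective."""
    nodes = sorted(nodes)
    half = len(nodes) // 2
    clusters = [set(nodes[:half]), set(nodes[half:])]
    best = objective(clusters, d)
    improved = True
    while improved:
        improved = False
        for n in nodes:
            src = 0 if n in clusters[0] else 1
            if len(clusters[src]) == 1:
                continue  # never empty a cluster
            clusters[src].remove(n)
            clusters[1 - src].add(n)
            candidate = objective(clusters, d)
            if candidate < best - 1e-12:
                best = candidate
                improved = True
            else:
                # Undo the move; it did not help.
                clusters[1 - src].remove(n)
                clusters[src].add(n)
    return clusters, best


# Made-up dissimilarities for one connected component.
d = {
    frozenset(("adequate", "fair")): 0.1,
    frozenset(("brutal", "corrupt")): 0.1,
    frozenset(("adequate", "brutal")): 0.9,
    frozenset(("adequate", "corrupt")): 0.9,
    frozenset(("brutal", "fair")): 0.9,
    frozenset(("corrupt", "fair")): 0.9,
}
clusters, phi = exchange_method(["adequate", "brutal", "corrupt", "fair"], d)
```

The procedure only finds a local minimum, so the result can depend on the starting split; trying several random starts is one common remedy.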

Labeling the clusters as + or -
• In oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the more frequent one about 81% of the time
• The unmarked member almost always has positive orientation
• So, label as positive the group that has the highest average frequency of words.
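The labeling rule reduces to comparing two averages. A minimal sketch follows; the function name `label_clusters` and the frequency counts are hypothetical.

```python
def label_clusters(cluster_a, cluster_b, freq):
    """Label the cluster with the higher average word frequency as positive."""
    def average_frequency(cluster):
        return sum(freq[word] for word in cluster) / len(cluster)

    if average_frequency(cluster_a) >= average_frequency(cluster_b):
        return {"+": cluster_a, "-": cluster_b}
    return {"+": cluster_b, "-": cluster_a}


# Made-up corpus frequencies, for illustration only.
freq = {"fair": 120, "adequate": 80, "corrupt": 40, "brutal": 30}
labels = label_clusters({"corrupt", "brutal"}, {"fair", "adequate"}, freq)
```

With these counts the averages are 35 and 100, so the second group is labeled positive.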

Graph connectivity and performance
• They tested how graph connectivity affects the overall performance
