Computing Word-Pair Antonymy

Computing Word-Pair Antonymy • Saif Mohammad (Univ. of Maryland), Bonnie Dorr (Univ. of Maryland), Graeme Hirst (Univ. of Toronto) • EMNLP 2008

Introduction • Antonymy: the relation between a pair of semantically contrasting words. • Ex: Strongly antonymous: hot-cold • Semantically contrasting: enemy-fan • Not antonymous: penguin-clown

Usage • Detecting contradictions • Detecting humor • Automatic thesaurus creation

Problem Definition • Given a thesaurus, identify the pairs of categories that are antonymous. • Assign a degree of antonymy to each pair of antonymous categories.

Hypothesis (1) • The Co-occurrence Hypothesis of Antonyms • Antonymous word pairs occur together in text much more often than other word pairs.

Hypothesis (1) • Empirical evidence: • 1,000 antonymous pairs from WordNet • 1,000 randomly generated word pairs • Use the British National Corpus (BNC) as the corpus, with a window size of 5. • Calculate the pointwise mutual information (MI) for each word pair and average it over each set
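The averaging step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `pmi_table` and `average_pmi` are hypothetical, the corpus is assumed to be a flat list of tokens, and word probabilities are estimated from token counts while pair probabilities are estimated from windowed pair counts, which is a simplification.

```python
import math
from collections import Counter

def pmi_table(tokens, window=5):
    """Build a pmi(x, y) function from windowed co-occurrence counts (a sketch)."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Pair each token with the next `window` tokens; order is ignored.
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                pair_counts[frozenset((w, v))] += 1
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())

    def pmi(x, y):
        joint = pair_counts[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # the pair never co-occurs in a window
        p_xy = joint / n_pairs
        p_x = word_counts[x] / n_words
        p_y = word_counts[y] / n_words
        return math.log2(p_xy / (p_x * p_y))

    return pmi

def average_pmi(pmi, pairs):
    """Average PMI over the pairs that co-occur at least once (assumed policy)."""
    finite = [s for s in (pmi(x, y) for x, y in pairs) if s != float("-inf")]
    return sum(finite) / len(finite) if finite else float("-inf")
```

Under the co-occurrence hypothesis, `average_pmi` over the antonym set would be expected to exceed the value over the random set.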

Hypothesis (2) • The Distributional Hypothesis of Antonyms • Antonyms occur in similar contexts more often than non-antonymous words do. • Ex: work: the activity of doing a job; play: the activity of relaxing

Hypothesis (2) • Empirical evidence: • Use the same sets of word pairs as for hypothesis (1) • Calculate the distributional distance between the thesaurus categories of the two words in each pair

Distributional Distance between Two Thesaurus Categories • c1, c2: thesaurus categories • I(x, y): pointwise mutual information between x and y • T(c): the set of all words w such that I(c, w) > 0
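The formula itself is not preserved in this transcript. One measure that uses exactly the symbols defined on this slide is Lin's distributional measure applied to categories; the sketch below assumes that measure, which may differ from what the slide showed. It returns a closeness score (higher means more similar contexts), and the function names and the dictionary representation of I(c, w) are hypothetical.

```python
def informative_words(pmi_by_word):
    """T(c): the words w whose PMI with category c is positive."""
    return {w for w, score in pmi_by_word.items() if score > 0}

def lin_closeness(pmi_c1, pmi_c2):
    """Lin-style closeness of two categories (assumed measure, see lead-in).

    pmi_c1 and pmi_c2 map each context word w to I(c1, w) and I(c2, w).
    """
    t1 = informative_words(pmi_c1)
    t2 = informative_words(pmi_c2)
    shared = t1 & t2
    # Information carried by the context words the two categories share ...
    numerator = sum(pmi_c1[w] + pmi_c2[w] for w in shared)
    # ... relative to the information carried by all their context words.
    denominator = sum(pmi_c1[w] for w in t1) + sum(pmi_c2[w] for w in t2)
    return numerator / denominator if denominator else 0.0
```

With this definition a category compared with itself scores 1.0, and two categories with no shared informative words score 0.0.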

Method • Determine pairs of thesaurus categories that are contrasting in meaning • Use the co-occurrence and distributional hypotheses to determine the degree of antonymy of word pairs

Method • 16 affix rules were applied to the Macquarie Thesaurus. • 2,734 word pairs were generated as a seed set. • Exceptions such as sect-insect (a rule match that is not an antonym pair) are relatively few.

Method • 10,807 semantically contrasting word pairs were taken from WordNet as additional seed pairs.
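Seed generation with affix rules can be sketched as below. The five prefixes are an illustrative subset chosen for this example, not the 16 rules from the talk, and `affix_seed_pairs` is a hypothetical name.

```python
# Illustrative subset of prefix rules of the form X -> prefix + X (an assumption).
PREFIX_RULES = ("un", "in", "im", "dis", "non")

def affix_seed_pairs(vocabulary):
    """Return (word, prefixed_word) pairs where both forms are in the vocabulary."""
    vocab = set(vocabulary)
    pairs = set()
    for word in vocab:
        for prefix in PREFIX_RULES:
            candidate = prefix + word
            if candidate in vocab:
                pairs.add((word, candidate))
    return pairs
```

Because the rules look only at spelling, a vocabulary containing both "sect" and "insect" yields that pair as well, which is the kind of exception noted on the slide.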

Method • If any word in thesaurus category C1 is antonymous to any word in category C2 as per a seed antonym pair, then the two categories are marked as contrasting. • If no word in C1 is antonymous to any word in C2, then the categories are considered not contrasting
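The marking rule above can be sketched as follows, assuming the thesaurus is available as a mapping from category name to its set of words. The function name and the data layout are hypothetical, and the exhaustive pairwise scan is chosen for clarity rather than speed.

```python
from itertools import product

def contrasting_category_pairs(categories, seed_pairs):
    """Mark two categories as contrasting if any seed pair links them.

    categories: dict mapping category name -> set of words.
    seed_pairs: iterable of (word, word) seed antonym pairs.
    """
    seeds = {frozenset(pair) for pair in seed_pairs}
    names = sorted(categories)
    contrasting = set()
    for i, c1 in enumerate(names):
        for c2 in names[i + 1:]:
            # One antonymous word pair across the two categories is enough.
            if any(frozenset((w1, w2)) in seeds
                   for w1, w2 in product(categories[c1], categories[c2])):
                contrasting.add((c1, c2))
    return contrasting
```

Category pairs that are not returned are the ones considered not contrasting.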

Method • Degree of antonymy: category level • By the distributional hypothesis of antonyms, the degree of antonymy between two contrasting thesaurus categories is taken to be directly proportional to the distributional closeness of the two concepts.

Method • Degree of antonymy: word level • Target words belong to the same thesaurus paragraphs as any of the seed antonyms linking the two contrasting categories → highly antonymous • Target words do not both belong to the same paragraphs as a seed antonym pair, but occur in contrasting categories → medium antonymous • Target words with a low tendency to co-occur → lowly antonymous
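The three word-level cases can be read as a small decision procedure. The sketch below is one interpretation under stated assumptions: each word maps to a set of hypothetical (category, paragraph) identifiers, `cooccurrence` is any caller-supplied co-occurrence score (for example PMI), `threshold` is an illustrative cut-off, and the low case is applied only to words in contrasting categories that fail the paragraph test.

```python
def degree_of_antonymy(w1, w2, paragraphs_of, seed_pairs, cooccurrence,
                       threshold=0.0):
    """Return "high", "medium", "low", or "none" for a target word pair."""
    locs1 = paragraphs_of.get(w1, set())
    locs2 = paragraphs_of.get(w2, set())
    cats1 = {cat for cat, _ in locs1}
    cats2 = {cat for cat, _ in locs2}
    in_contrasting_categories = False
    for s1, s2 in seed_pairs:
        # Try the seed pair in both orientations.
        for a, b in ((s1, s2), (s2, s1)):
            seed_a = paragraphs_of.get(a, set())
            seed_b = paragraphs_of.get(b, set())
            # Both targets share paragraphs with the seed antonyms.
            if locs1 & seed_a and locs2 & seed_b:
                return "high"
            # Targets only share the categories that the seed pair links.
            if cats1 & {c for c, _ in seed_a} and cats2 & {c for c, _ in seed_b}:
                in_contrasting_categories = True
    if not in_contrasting_categories:
        return "none"
    return "medium" if cooccurrence(w1, w2) > threshold else "low"
```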

Method • Adjacency Heuristic • Most thesauri are ordered such that contrasting categories tend to be adjacent, so adjacent categories are also treated as candidate contrasting pairs.

Evaluation • 1,112 closest-opposite questions designed to prepare students for the GRE (Graduate Record Examination) • 162 questions as the development set • 950 questions as the test set

Evaluation • Closest-opposite questions • Ex: adulterate: a. renounce b. forbid c. purify d. criticize e. correct

Evaluation • Closest-opposite questions • Ex, with glosses: adulterate (mixed with impurities): a. renounce (declare that one gives up) b. forbid (prohibit) c. purify (pure) d. criticize (criticize) e. correct (right)
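Answering a closest-opposite question amounts to picking the option with the highest degree of antonymy to the target word. The sketch below assumes a numeric, caller-supplied `antonymy_score` function and assumes that the system may leave a question unanswered when no option scores above zero, so precision and recall can differ; all names here are hypothetical.

```python
def answer_closest_opposite(target, options, antonymy_score):
    """Return the option most antonymous to target, or None if none scores > 0."""
    scored = [(antonymy_score(target, option), option) for option in options]
    best_score, best_option = max(scored, key=lambda pair: pair[0])
    return best_option if best_score > 0 else None

def precision_recall(predictions, gold):
    """Precision over attempted questions, recall over all questions."""
    attempted = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(p == g for p, g in attempted)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

For the example question, a scorer that rates purify above correct selects option c, even though correct also contrasts with adulterate to a lesser degree.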

Evaluation

Discussion • The automatic approach does indeed mimic human intuitions of antonymy. • Even for languages without a WordNet, substantial accuracy may be achieved. • WordNet seeds and affix-generated seeds are complementary.

Conclusion • Proposed an empirical approach to antonymy that combines corpus co-occurrence statistics with the structure of a thesaurus. • The system can identify the degree of antonymy between word pairs. • Provided empirical evidence that antonym pairs tend to be used in similar contexts.

Thanks
