Authors: Vasileios Hatzivassiloglou and Kathleen R. McKeown Presenter: Marian Olteanu

Towards the automatic identification of adjectival scales: clustering adjectives according to meaning Authors: Vasileios Hatzivassiloglou and Kathleen R. McKeown Presenter: Marian Olteanu

Introduction • Group adjectives according to their meaning • Semantic relatedness – between adjectives that describe the same property • Goal • Adjectival scales • Method • Statistical • Augmented with linguistic information derived from the corpus

Adjectival scales • Linguistic scale – set of words of the same grammatical category that can be ordered by their semantic strength or degree of informativeness • Example: lukewarm, warm, hot • Adjectives – elements on the scale can be partitioned into 2 groups, in each group – total order • Negative and positive degrees

Adjectival scales • Tests for acceptance • Horn: “x even y” • Data sparseness – infrequent patterns in real corpora • Scales vary across domains

Methodology • Four stages • Extract linguistic data from the parsed corpus – word pairs • Info processed by morphological component – group together similar pairs • Independent similarity modules – number between 0 and 1

Methodology • Four stages (cont) • Module that combines all the similarity measures into one dissimilarity measure • Module that clusters adjectives into groups based on dissimilarity measure • Linguistic data • That tell if adjectives are related – adj.-noun pairs • That tell if adjectives are unrelated – adj.-adj. pairs

Methodology • Adj.-noun pairs • Distribution of nouns and adjective modifiers • Expectation: similar adjectives tend to modify the same set of nouns • Adj.-adj. pairs • Adjectives that describe the same property do not appear in the same minimal NP • Antithetical: hot cold, red black • Non-antithetical: hot warm • Adjectives that modify each other: light blue shirt

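Computing similarity between adjectives • Adjective-noun pairs • Robust non-parametric method – Kendall’s τ coefficient for two random variables with paired observations • (X_i, Y_i) and (X_j, Y_j) – two pairs of observations for adjectives X and Y on the nouns i and j • Concordant if X_i > X_j and Y_i > Y_j, or X_i < X_j and Y_i < Y_j • Discordant if X_i > X_j and Y_i < Y_j, or X_i < X_j and Y_i > Y_j • τ = p_c − p_d (probability of a concordant pair minus probability of a discordant pair) • Unbiased estimator: T = (C − Q) / (n(n − 1)/2), where C and Q are the numbers of concordant and discordant pairs among n paired observations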
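The concordant/discordant counting on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `kendall_tau` and the toy frequency vectors are assumptions, and tied pairs are simply counted as neither concordant nor discordant. Each vector holds how often one adjective modifies each noun, in the same noun order for both adjectives.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Estimate Kendall's tau as (C - Q) / (n choose 2).

    x[i] and y[i] are the observations for two adjectives on noun i.
    Pairs tied in either variable count as neither concordant nor discordant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both differences have the same sign (concordant).
        # Negative product: the differences have opposite signs (discordant).
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical counts of two adjectives over the same four nouns.
hot = [3, 0, 1, 0]
warm = [2, 0, 2, 1]
tau = kendall_tau(hot, warm)
```

The result lies between -1 and 1. How that value is mapped to the 0-to-1 similarity score mentioned earlier is not stated on the slides, so the sketch stops at τ.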

Methodology • Adjective-adjective pairs • Reject pairs that occur in the same NP • High accuracy, low coverage • Combining similarity estimates • If pair was rejected by adj.-adj. module: dissimilarity = k (usually 10) • Else, dissimilarity = 1 – (similarity by adj.-noun module)
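The combination rule on this slide is small enough to write out directly. The sketch below assumes a hypothetical data layout: `rejected` is a set of unordered adjective pairs seen in the same NP, and `similarity` maps unordered pairs to the adj.-noun score between 0 and 1. Only the rule itself (k for rejected pairs, 1 minus similarity otherwise) and the default k = 10 come from the slide.

```python
def dissimilarity(a, b, similarity, rejected, k=10.0):
    """Combine the two modules into a single dissimilarity value.

    similarity: dict mapping frozenset({adj1, adj2}) to a score in [0, 1]
    rejected:   set of frozenset({adj1, adj2}) pairs seen in the same NP
    k:          penalty for rejected pairs (the slide suggests 10)
    """
    pair = frozenset((a, b))
    if pair in rejected:
        # The adj.-adj. module overrides the adj.-noun score.
        return k
    return 1.0 - similarity[pair]


# Hypothetical inputs for illustration only.
similarity = {
    frozenset(("hot", "warm")): 0.8,
    frozenset(("light", "blue")): 0.6,
}
rejected = {frozenset(("light", "blue"))}  # e.g. from "light blue shirt"

d_hot_warm = dissimilarity("hot", "warm", similarity, rejected)
d_light_blue = dissimilarity("light", "blue", similarity, rejected)
```

Because k is far larger than any value of 1 minus similarity, a rejected pair is effectively barred from sharing a cluster regardless of how similar its noun distributions look.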

Clustering the adjectives • Goal – optimal partition • Algorithm • Non-hierarchical • Number of partitions – input parameter • Exchange method • K-means is not applicable • Minimizing the objective function Φ

Clustering the adjectives • Algorithm (cont.) • Random partition • Compute the improvement by moving an adjective to a different cluster • Hill-climbing • Local minima • Call the algorithm multiple times with different starting positions
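A rough sketch of the exchange method with random restarts follows. The slides do not define Φ, so the objective here is an assumption: the sum over clusters of the within-cluster pairwise dissimilarities divided by the cluster size. The function names, the toy dissimilarity function and the restart count are likewise illustrative.

```python
import random
from itertools import combinations


def phi(clusters, d):
    """Assumed objective: per cluster, summed pairwise dissimilarity / size."""
    total = 0.0
    for members in clusters:
        if len(members) > 1:
            total += sum(d(a, b) for a, b in combinations(members, 2)) / len(members)
    return total


def exchange_once(items, k, d, rng):
    """One hill-climbing run from a random partition into k clusters."""
    assign = {w: rng.randrange(k) for w in items}

    def groups():
        out = [[] for _ in range(k)]
        for w, c in assign.items():
            out[c].append(w)
        return out

    best = phi(groups(), d)
    improved = True
    while improved:
        improved = False
        for w in items:
            home = assign[w]
            for c in range(k):
                if c == home:
                    continue
                assign[w] = c  # tentatively move w to cluster c
                score = phi(groups(), d)
                if score < best - 1e-12:
                    best, home, improved = score, c, True
            assign[w] = home  # keep the best cluster found for w
    return best, groups()


def cluster(items, k, d, restarts=20, seed=0):
    """Repeat the exchange method and keep the partition with the lowest phi."""
    rng = random.Random(seed)
    runs = (exchange_once(items, k, d, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])


# Toy dissimilarity: small within a property, large across properties.
TEMPERATURE = {"hot", "warm", "cold"}


def toy_d(a, b):
    return 0.1 if (a in TEMPERATURE) == (b in TEMPERATURE) else 1.0


words = ["hot", "warm", "cold", "big", "small", "large"]
best_phi, best_clusters = cluster(words, 2, toy_d)
```

Each run stops when no single move lowers Φ, which is a local minimum; the restarts are what give the search a chance of escaping a poor starting partition.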

Results

Results • Clusters #5 and #8 – adjectives that indicate size • Clustering discourages large clusters • Cluster #6: 5 words • Methods to increase number of pairs • Larger corpus • More syntactical patterns

Evaluation • 9 human judges • Manually created partitions (6 to 11 clusters) • “Cross-validation” among human judges: F-measure between 49% and 59%

Evaluation • Lower bound • Monte Carlo analysis • F-measure: 1 in 20,000 trials • Fallout: 4.9%
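One common way to score a partition against a reference is to treat every pair of adjectives placed in the same cluster as a retrieved item. The slides do not spell out the exact scoring procedure, so the sketch below is an assumption along those lines; the function names and the toy partitions are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(partition):
    """All unordered word pairs that share a cluster in the partition."""
    return {
        frozenset(pair)
        for members in partition
        for pair in combinations(members, 2)
    }


def pairwise_scores(system, reference):
    """Pairwise precision, recall, fallout and F-measure of system vs. reference."""
    sys_pairs = same_cluster_pairs(system)
    ref_pairs = same_cluster_pairs(reference)

    words = {w for members in reference for w in members}
    all_pairs = len(words) * (len(words) - 1) // 2

    tp = len(sys_pairs & ref_pairs)   # pairs grouped together in both
    fp = len(sys_pairs - ref_pairs)   # grouped by the system only
    fn = len(ref_pairs - sys_pairs)   # grouped by the reference only
    negatives = all_pairs - len(ref_pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fallout = fp / negatives if negatives else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "fallout": fallout,
        "f_measure": f_measure,
    }


# Hypothetical partitions for illustration only.
reference = [["hot", "warm", "cold"], ["big", "small"]]
system = [["hot", "warm"], ["cold", "big", "small"]]
scores = pairwise_scores(system, reference)
```

The same function could score one human judge against another, or a randomly generated partition against a judge, which is the kind of comparison the lower-bound analysis relies on.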
