Distributional clustering of English words

Distributional clustering of English words • Authors: Fernando Pereira, Naftali Tishby, Lillian Lee • Presenter: Marian Olteanu

Introduction • Method for automatic clustering of words • Words are clustered according to their distribution in particular syntactic contexts • Deterministic annealing • Finds lowest-distortion sets of clusters • As the annealing parameter increases, clusters subdivide, yielding a hierarchical “soft” clustering • Clusters • Serve as the basis for class models of word co-occurrence

Introduction • Simple tabulation of frequencies • Data sparseness • Hindle proposed smoothing based on clustering • Estimating likelihood of unseen events from the frequencies of “similar” events that have been seen • Example: estimating the likelihood of a particular direct object for a verb from the likelihood of that direct object for similar verbs

Introduction • Hindle’s proposal • Words are similar if there is strong statistical evidence that they tend to participate in the same events • This paper • Factor word association tendencies into associations of words to certain hidden classes and association between classes themselves • Derive classes directly from data

Introduction • Classes • Probabilistic concepts or clusters c • p(c|w) for each word w • Different from classical “hard” Boolean classes • Thus, this method is more robust: it is not strongly affected by errors in frequency counts • Problem in this paper • Two word classes: V (verbs) and N (nouns) • Relation between a transitive main verb and the head noun of its direct object

Problem • Raw knowledge: • f_vn: frequency of occurrence of a particular pair (v, n) in the training corpus • Unsmoothed probability (conditional density): • $p_n(v) = \frac{f_{vn}}{\sum_{v'} f_{v'n}}$ • This is p(v|n) • Problem • How to use p_n to classify the nouns n ∈ N
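The unsmoothed estimate on this slide is a normalized count: each pair frequency f_vn divided by the total count of the noun n over all verbs. A minimal sketch, assuming the training data arrives as a list of (verb, noun) pairs; the function name and the toy pairs are hypothetical and not from the paper:

```python
from collections import Counter, defaultdict

def conditional_verb_distributions(pairs):
    """Return {noun: {verb: p_n(v)}} from an iterable of (verb, noun) pairs."""
    counts = defaultdict(Counter)
    for verb, noun in pairs:
        counts[noun][verb] += 1  # f_vn
    dists = {}
    for noun, verb_counts in counts.items():
        total = sum(verb_counts.values())  # sum over v' of f_v'n
        dists[noun] = {v: c / total for v, c in verb_counts.items()}
    return dists

# Hypothetical toy corpus of (verb, direct object) pairs.
pairs = [("drink", "wine"), ("drink", "wine"), ("pour", "wine"), ("drink", "water")]
p = conditional_verb_distributions(pairs)
```

Any verb never seen with a noun gets probability zero under this estimate, which is the data sparseness problem the clustering is meant to address.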

Methodology • Measure of similarity between distributions • Kullback-Leibler distance (relative entropy) • This problem • Unsupervised learning: learn the underlying distribution of the data • Objects have no internal structure; the only information is statistics about their joint appearance (which gives the problem a supervised flavor)
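For reference, the Kullback-Leibler distance between two distributions p and q is D(p || q) = Σ_x p(x) log(p(x) / q(x)). A small sketch over dictionaries mapping events to probabilities; the function name and the example distributions are illustrative assumptions:

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats; p and q map events to probabilities."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q has none
        total += px * math.log(px / qx)
    return total

a = {"drink": 0.5, "pour": 0.5}
b = {"drink": 0.25, "pour": 0.75}
d_ab = kl_divergence(a, b)
d_ba = kl_divergence(b, a)
```

The two directions give different values, so despite the name this is not a distance in the metric sense: it is asymmetric, and it is zero only when the two distributions agree.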

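Distributional Clustering • Goal: find clusters such that p_n(v) is approximated by: • $\hat{p}_n(v) = \sum_{c} p(c|n)\, p_c(v)$ • where p(c|n) is the membership probability of noun n in cluster c and p_c(v) is the verb distribution of the centroid of cluster c • Solve by EM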
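The two-step iteration can be sketched as follows. This is a generic soft-clustering loop, not necessarily the exact update rules of the paper: it assumes memberships p(c|n) proportional to exp(-β D(p_n || p_c)) for a fixed β, and centroids recomputed as membership-weighted averages of the noun distributions. All function names, the toy data, and the initial centroids are hypothetical.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) for two aligned probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def memberships_for(dists, centroids, beta):
    """Assumed form: p(c|n) proportional to exp(-beta * D(p_n || p_c))."""
    result = []
    for p in dists:
        weights = [math.exp(-beta * kl(p, c)) for c in centroids]
        z = sum(weights)
        result.append([w / z for w in weights])
    return result

def update_centroids(dists, memberships, n_clusters):
    """Each centroid is the membership-weighted average of the noun distributions."""
    n_verbs = len(dists[0])
    centroids = []
    for k in range(n_clusters):
        mass = sum(m[k] for m in memberships)
        centroids.append([
            sum(m[k] * p[v] for m, p in zip(memberships, dists)) / mass
            for v in range(n_verbs)
        ])
    return centroids

def soft_cluster(dists, centroids, beta, iterations=50):
    """Alternate the two steps, then return final memberships and centroids."""
    for _ in range(iterations):
        memberships = memberships_for(dists, centroids, beta)
        centroids = update_centroids(dists, memberships, len(centroids))
    return memberships_for(dists, centroids, beta), centroids

def approximate(membership_row, centroids):
    """Cluster-based estimate: sum over c of p(c|n) * p_c(v)."""
    return [sum(m * c[v] for m, c in zip(membership_row, centroids))
            for v in range(len(centroids[0]))]

# Toy data: four nouns described by distributions over three verbs.
noun_dists = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
initial = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]]
memberships, centroids = soft_cluster(noun_dists, initial, beta=10.0)
estimate = approximate(memberships[0], centroids)
```

On this toy data the first two nouns end up almost entirely in one cluster and the last two in the other. Because every centroid is a mixture of all noun distributions, the estimate assigns nonzero probability to any verb seen with some noun, even if it was never seen with the noun in question.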

Hierarchical clustering • Deterministic annealing • Sequence of phase transitions • Increasing the parameter β • β controls the local influence of each noun on the definition of centroids
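The effect of β can be illustrated with a two-centroid toy run. This is a sketch under assumptions, not the paper's procedure: memberships are taken proportional to exp(-β D(p_n || p_c)), centroids are membership-weighted averages, and symmetry is broken by nudging each centroid slightly toward a different data point (an arbitrary choice made here). All names and data are hypothetical.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) for two aligned probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def iterate(dists, centroids, beta, iterations=100):
    """Alternate soft memberships and weighted-average centroids at fixed beta."""
    for _ in range(iterations):
        memberships = []
        for p in dists:
            weights = [math.exp(-beta * kl(p, c)) for c in centroids]
            z = sum(weights)
            memberships.append([w / z for w in weights])
        centroids = [
            [sum(m[k] * p[v] for m, p in zip(memberships, dists)) /
             sum(m[k] for m in memberships) for v in range(len(dists[0]))]
            for k in range(len(centroids))
        ]
    return centroids

def nudge(centroid, target, eps=0.01):
    """Break symmetry by mixing a centroid slightly toward one data point."""
    return [(1 - eps) * c + eps * t for c, t in zip(centroid, target)]

def anneal_two_way(dists, betas):
    """Return the L1 gap between the two centroids after settling at each beta."""
    mean = [sum(p[v] for p in dists) / len(dists) for v in range(len(dists[0]))]
    centroids = [mean, mean]
    gaps = []
    for beta in betas:
        centroids = [nudge(centroids[0], dists[0]), nudge(centroids[1], dists[-1])]
        centroids = iterate(dists, centroids, beta)
        gaps.append(sum(abs(a - b) for a, b in zip(*centroids)))
    return gaps

# Toy data: four nouns described by distributions over three verbs.
noun_dists = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
gaps = anneal_two_way(noun_dists, betas=[0.1, 10.0])
```

On this toy data the nudged centroids collapse back onto the overall mean at β = 0.1, because every noun influences every centroid almost equally, while at β = 10 they separate into two distinct clusters. That qualitative change between merged and split centroids is what the slide calls a phase transition.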

Results

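Evaluation • Relative entropy between the held-out data and the model: $D(t_n \,\|\, \hat{p}_n)$ • where t_n is the relative frequency distribution of verbs taking n as direct object in the test set and $\hat{p}_n$ is the model's estimate of p_n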
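A sketch of how such a held-out score might be computed, assuming both the test-set distributions t_n and the model estimates are available as dictionaries keyed by noun. Averaging over nouns is an assumption made here for illustration; the function names and numbers are hypothetical.

```python
import math

def relative_entropy(t, p_hat):
    """D(t || p_hat) in nats for two {verb: probability} dictionaries."""
    total = 0.0
    for verb, t_v in t.items():
        if t_v == 0.0:
            continue
        q = p_hat.get(verb, 0.0)
        if q == 0.0:
            return math.inf  # the model rules out a verb seen in the test set
        total += t_v * math.log(t_v / q)
    return total

def mean_held_out_divergence(test_dists, model_dists):
    """Average D(t_n || p_hat_n) over the nouns present in the test set."""
    values = [relative_entropy(t, model_dists[n]) for n, t in test_dists.items()]
    return sum(values) / len(values)

# Hypothetical test-set frequencies and model estimates.
test_dists = {"wine": {"drink": 0.5, "pour": 0.5}, "water": {"drink": 1.0}}
model_dists = {"wine": {"drink": 0.25, "pour": 0.75},
               "water": {"drink": 0.5, "pour": 0.5}}
score = mean_held_out_divergence(test_dists, model_dists)
```

Lower is better. An unsmoothed model that assigns zero probability to any verb-noun pair occurring in the test set scores infinity, which is one way to see why smoothed estimates are needed.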

Evaluation • Check if the model can disambiguate between two verbs, v and v’: decide which of the two is more likely to take a given noun n as direct object
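One plausible way to score this decision task is to pick whichever verb the model considers more probable for the noun and count how often that choice is wrong. The sketch below assumes the model is a dictionary of per-noun verb distributions and that each test case carries the correct answer; the names, the tie-breaking rule, and the data are all hypothetical.

```python
def choose_verb(model, noun, v, v_prime):
    """Pick whichever of the two verbs the model finds more likely for the noun."""
    p = model[noun]
    # Ties go to v; this is an arbitrary choice for the sketch.
    return v if p.get(v, 0.0) >= p.get(v_prime, 0.0) else v_prime

def error_rate(model, cases):
    """cases: iterable of (noun, v, v_prime, correct_verb) tuples."""
    cases = list(cases)
    wrong = sum(1 for noun, v, v_prime, correct in cases
                if choose_verb(model, noun, v, v_prime) != correct)
    return wrong / len(cases)

# Hypothetical model estimates and test cases.
model = {"wine": {"drink": 0.6, "pour": 0.3, "read": 0.1},
         "book": {"read": 0.7, "drink": 0.1, "pour": 0.2}}
cases = [("wine", "drink", "read", "drink"),
         ("book", "drink", "read", "read"),
         ("book", "pour", "drink", "drink")]
rate = error_rate(model, cases)
```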
