Automatic Verb Sense Grouping --- Term Project Proposal for CIS630 Jinying Chen 10/28/2002
Motivation
• “Making fine-grained and coarse-grained sense distinctions, both manually and automatically” (Martha, Hoa, Christiane, 2002)
• Consistent criteria for fine-grained sense distinctions are difficult to find, whether the distinctions are made manually or automatically
• Well-defined sense groups can alleviate this problem
• Potential application in machine translation
Model
• Unsupervised learning
• EM algorithm (similar to those in Dan Gildea 2002, Walde 2000, Rooth 1999, Ted Pedersen 1997)
EM clustering algorithm
• Soft clustering: P(v|c)
• Each verb v_i is associated with a set of features {f_i1, f_i2, …, f_in}; there are m clusters {c_1, c_2, …, c_m}
• Estimate P(v|c) by maximizing the log-likelihood
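The slide does not spell out the model, so the following is only a minimal sketch under stated assumptions: it uses a latent-class form, P(v, f) = Σ_c P(c) P(v|c) P(f|c), fitted with EM to verb-feature co-occurrence counts. The function name `em_latent_class`, the input format, and the toy counts are hypothetical and are not taken from the proposal.

```python
import math
import random


def em_latent_class(counts, n_clusters, n_iters=50, seed=0):
    """Fit P(v, f) = sum_c P(c) P(v|c) P(f|c) to co-occurrence counts with EM.

    counts: dict mapping (verb, feature) -> observed count (assumed format).
    Returns (p_c, p_v_given_c, p_f_given_c, log_likelihood), where the
    log-likelihood is the one measured during the final E-step.
    """
    rng = random.Random(seed)
    verbs = sorted({v for v, _ in counts})
    feats = sorted({f for _, f in counts})

    def random_dist(keys):
        # Random strictly positive starting distribution over keys.
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_c = [1.0 / n_clusters] * n_clusters
    p_v = [random_dist(verbs) for _ in range(n_clusters)]
    p_f = [random_dist(feats) for _ in range(n_clusters)]
    total_count = float(sum(counts.values()))

    log_lik = float("-inf")
    for _ in range(n_iters):
        new_c = [0.0] * n_clusters
        new_v = [dict.fromkeys(verbs, 0.0) for _ in range(n_clusters)]
        new_f = [dict.fromkeys(feats, 0.0) for _ in range(n_clusters)]
        log_lik = 0.0
        for (v, f), n in counts.items():
            # E-step: P(c | v, f) is proportional to P(c) P(v|c) P(f|c).
            joint = [p_c[c] * p_v[c][v] * p_f[c][f] for c in range(n_clusters)]
            z = sum(joint)
            log_lik += n * math.log(z)
            for c in range(n_clusters):
                r = n * joint[c] / z  # expected count assigned to cluster c
                new_c[c] += r
                new_v[c][v] += r
                new_f[c][f] += r
        # M-step: re-estimate every distribution from the expected counts.
        for c in range(n_clusters):
            mass = new_c[c]
            p_c[c] = mass / total_count
            if mass > 0:
                p_v[c] = {v: x / mass for v, x in new_v[c].items()}
                p_f[c] = {f: x / mass for f, x in new_f[c].items()}
    return p_c, p_v, p_f, log_lik


# Made-up toy counts, for illustration only.
toy_counts = {
    ("draw", "obj:picture"): 5,
    ("draw", "obj:water"): 3,
    ("run", "subj:person"): 6,
    ("run", "obj:company"): 2,
}
p_c, p_v, p_f, ll = em_latent_class(toy_counts, n_clusters=2)
print(ll)
```

Each EM iteration cannot decrease the log-likelihood, which gives a simple sanity check when experimenting with different numbers of clusters.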
Two problems
• How many clusters for a particular verb?
  • Human knowledge of the approximate number of verb sense groups can guide the unsupervised learning
  • Olga’s proposal
• How many features for a particular verb?
  • May not be a problem: the EM algorithm can hopefully do feature selection to some degree
  • However, a well-restricted feature set can reduce the model complexity (O(nm)) and alleviate the effect of noisy data
  • Borrow ideas from “Automatic Verb Classification based on Statistical Distribution of Argument Structure” (Paola Merlo and Suzanne Stevenson, 2001)
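To make the O(nm) point concrete, here is a small sketch of how restricting the feature set shrinks the model, with n the number of features and m the number of clusters. The frequency cutoff is only a simple stand-in chosen for illustration; it is not the linguistically motivated feature selection of Merlo and Stevenson, and the names `prune_features` and `feature_parameter_count`, the (verb, feature) count format, and the toy counts are all assumptions.

```python
from collections import Counter


def prune_features(counts, min_count):
    """Drop every feature whose total count over all verbs is below min_count.

    counts: dict mapping (verb, feature) -> observed count (assumed format).
    """
    totals = Counter()
    for (_, feature), n in counts.items():
        totals[feature] += n
    kept = {feature for feature, total in totals.items() if total >= min_count}
    return {(v, f): n for (v, f), n in counts.items() if f in kept}


def feature_parameter_count(counts, n_clusters):
    """Number of per-cluster feature parameters: n features times m clusters."""
    n_features = len({feature for _, feature in counts})
    return n_features * n_clusters


# Made-up toy counts, for illustration only.
toy_counts = {
    ("draw", "obj:picture"): 5,
    ("draw", "obj:gun"): 1,
    ("run", "subj:person"): 4,
    ("run", "obj:company"): 1,
}
pruned = prune_features(toy_counts, min_count=2)
print(feature_parameter_count(toy_counts, 3), feature_parameter_count(pruned, 3))
```

A cutoff like this removes rare and possibly noisy features, but it can also discard features that are infrequent yet informative for a sense distinction, which is one reason to prefer a hand-restricted set.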
Plan
• Phase I --- Corpus analysis
  • Automatic and manual
  • Determine the range of the feature set for each verb
• Phase II --- Automatic verb sense grouping
  • Implement the EM clustering algorithm
  • Evaluate the performance
• Phase III --- Comparison with other clustering methods
  • Ward’s minimum-variance method (Ward, 1963)
  • McQuitty’s similarity analysis (McQuitty, 1966)
  • Spectral clustering (Brew & Walde, 2002)
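As a point of reference for the Phase III comparison, the sketch below shows a naive version of Ward's minimum-variance method: at each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares, which equals |A||B| / (|A| + |B|) times the squared distance between their centroids. The function name `ward_clustering`, the representation of items as numeric feature vectors, and the toy points are assumptions made for illustration.

```python
def ward_clustering(points, n_clusters):
    """Greedy agglomerative clustering using Ward's merge cost.

    points: list of equal-length numeric tuples (assumed feature vectors).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        (members_a, centroid_a), (members_b, centroid_b) = a, b
        dist2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
        na, nb = len(members_a), len(members_b)
        return na * nb / (na + nb) * dist2

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the cheapest merge.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (members_a, centroid_a), (members_b, centroid_b) = clusters[i], clusters[j]
        na, nb = len(members_a), len(members_b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(centroid_a, centroid_b)]
        merged = (members_a + members_b, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in clusters)


# Made-up toy vectors, for illustration only.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(ward_clustering(toy_points, n_clusters=2))
```

Unlike the EM model, this produces a hard partition, so a comparison between the two would need either a hard assignment from the EM posteriors or an evaluation measure that accepts soft memberships.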