Coarse-grained Word Sense Disambiguation. Jinying Chen, Martha Palmer, March 25th, 2003. Outline: Motivation, Supervised Verb Sense Grouping, Unsupervised Verb Frameset Tagging, Future Work.
Coarse-grained Word Sense Disambiguation Jinying Chen, Martha Palmer March 25th, 2003
Outline • Motivation • Supervised Verb Sense Grouping • Unsupervised Verb Frameset Tagging • Future Work
Motivation • Fine-grained WSD is difficult for both humans and machines, and well-defined sense groups can alleviate this problem (Martha, Hoa, Christiane, 2002) • Potential application in machine translation • When building a WSD corpus, the sense hierarchy may help annotators improve sense-tagging speed and accuracy (hopefully?)
Outline • Motivation • Supervised Verb Sense Grouping • What’s VSG? • Using Semantic Features for VSG • Building Decision Tree for VSG • Experiment Results • Unsupervised Verb Frameset Tagging • Future Work
What’s VSG? [Figure: hierarchy linking verb senses (WordNet senses WN1 through WN14, WN19, WN20), verb sense groups, and Frameset1 / Frameset2]
Using Semantic Features for VSG • PropBank • Each verb is defined by several framesets • All verb instances belonging to the same frameset share a common set of roles • Roles can be ARGn (n = 0, 1, …) and ARGM-f • Framesets are consistent with verb sense groups • Frameset tags and roles serve as semantic features for VSG
Building a Decision Tree for VSG • Use the C5.0 decision tree learner • 3 feature sets; SF (Simple Feature set) works best: • VOICE: PAS, ACT • FRAMESET: 01, 02, … • ARGn (n = 0, 1, 2, …): 0 (does not occur), 1 (occurs) • CoreFrame: 01-ARG0-ARG1, 02-ARG0-ARG2, … • ARGM: 0 (no ARGM), 1 (has ARGM) • ARGM-f (f = DIS, ADV, …): i (occurs i times)
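The following is a minimal sketch of how the SF features above might be derived from one annotated verb instance. It assumes a hypothetical `PropBankInstance` record holding the voice, the frameset tag and a list of role labels. Several pieces are illustrative assumptions and not the authors' code: the class, the `simple_features` helper, the cap at ARG0 to ARG5, and the extra ARGM tags TMP, LOC and MNR beyond the slide's DIS and ADV. The resulting feature dicts would then be passed to a decision tree learner such as C5.0.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PropBankInstance:
    """Hypothetical container for one PropBank-annotated verb instance."""
    voice: str      # "PAS" or "ACT"
    frameset: str   # frameset tag, e.g. "01"
    roles: list = field(default_factory=list)  # e.g. ["ARG0", "ARG1", "ARGM-DIS"]


def simple_features(inst, max_arg=5, argm_types=("DIS", "ADV", "TMP", "LOC", "MNR")):
    """Build an SF-style feature dict (assumed encoding, not the original code)."""
    feats = {"VOICE": inst.voice, "FRAMESET": inst.frameset}
    # Numbered arguments ARG0..ARGn, excluding the ARGM-* adjuncts.
    core = [r for r in inst.roles if r.startswith("ARG") and not r.startswith("ARGM")]
    for n in range(max_arg + 1):
        feats[f"ARG{n}"] = int(f"ARG{n}" in core)  # 0 = does not occur, 1 = occurs
    # CoreFrame: frameset tag joined with the numbered arguments present.
    feats["CoreFrame"] = "-".join([inst.frameset] + sorted(set(core)))
    argm = [r for r in inst.roles if r.startswith("ARGM")]
    feats["ARGM"] = int(bool(argm))
    # ARGM-f: number of times each adjunct function tag occurs.
    counts = Counter(r.split("-", 1)[1] for r in argm if "-" in r)
    for f in argm_types:
        feats[f"ARGM-{f}"] = counts.get(f, 0)
    return feats
```

For example, `simple_features(PropBankInstance("ACT", "01", ["ARG0", "ARG1", "ARGM-DIS"]))` yields `CoreFrame` equal to `"01-ARG0-ARG1"` and `ARGM-DIS` equal to 1.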
Experiment Results Table 2 Error rate of Decision Tree on five verbs
Discussion • The simple feature set and simple DT algorithms work well • Potential sparse-data problem • More complicated DT algorithms (e.g., with boosting) tend to overfit the data • Complex features are not utilized by the model • Solution: use a large corpus, e.g., the parsed BNC without frameset annotation
Outline • Task Description • Methodology • Unsupervised Verb Frameset Tagging • EM Clustering for Frameset Tagging • Features • Preliminary Experiment Results • Future Work
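EM Clustering for Frameset Tagging [Figure: graphical model with a hidden cluster variable c and observed features f1, f2, …, fm] • We treat a set of features extracted from the parsed sentences as observed variables and assume they are independent given a hidden variable c: $$p(c, f_1, \ldots, f_m) = p(c)\prod_{i=1}^{m} p(f_i \mid c) \qquad (1)$$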
In the expectation step, we compute the probability of c conditioned on the set of observed features: $$p(c \mid f_1, \ldots, f_m) = \frac{p(c)\prod_{i=1}^{m} p(f_i \mid c)}{\sum_{c'} p(c')\prod_{i=1}^{m} p(f_i \mid c')} \qquad (2)$$ • In the maximization step, we re-compute $p(c)$ and $p(f_i \mid c)$ by maximizing the log-likelihood of all of the observed data. • Repeat the expectation and maximization steps for a fixed number of rounds or until the change in the probability parameters $p(c)$ and $p(f_i \mid c)$ is below a threshold.
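The following is a minimal pure-Python sketch of the EM loop above for categorical features. It assumes each verb instance is a tuple of feature values (f1, …, fm). The function names, the random initialisation and the add-alpha smoothing are assumptions made for illustration. The smoothing turns the M-step into a smoothed, MAP-style estimate rather than the plain maximum-likelihood update described on the slide. The sketch stops when no parameter changes by more than a threshold, or after a fixed number of rounds.

```python
import math
import random


def posterior(x, prior, cond):
    """Eq. (2): p(c | f_1..f_m) for one instance x, computed in log space."""
    logs = [
        math.log(prior[c]) + sum(math.log(cond[c][i][v]) for i, v in enumerate(x))
        for c in range(len(prior))
    ]
    top = max(logs)  # subtract the max before exp() to avoid underflow
    w = [math.exp(s - top) for s in logs]
    z = sum(w)
    return [v / z for v in w]


def m_step(data, resp, values, k, alpha):
    """Re-estimate p(c) and p(f_i | c) from soft counts (add-alpha smoothed)."""
    n = len(data)
    prior = [(sum(r[c] for r in resp) + alpha) / (n + k * alpha) for c in range(k)]
    cond = []
    for c in range(k):
        per_feature = []
        for i, vals in enumerate(values):
            counts = {v: alpha for v in vals}
            for x, r in zip(data, resp):
                counts[x[i]] += r[c]
            total = sum(counts.values())
            per_feature.append({v: cnt / total for v, cnt in counts.items()})
        cond.append(per_feature)
    return prior, cond


def max_change(old, new):
    """Largest absolute change in any p(c) or p(f_i | c) between two rounds."""
    (p_old, c_old), (p_new, c_new) = old, new
    d = max(abs(a - b) for a, b in zip(p_old, p_new))
    for feats_old, feats_new in zip(c_old, c_new):
        for dist_old, dist_new in zip(feats_old, feats_new):
            d = max(d, max(abs(dist_old[v] - dist_new[v]) for v in dist_old))
    return d


def em_cluster(data, k, max_rounds=100, threshold=1e-6, alpha=0.1, seed=0):
    """Fit a k-cluster mixture in which features are independent given c, as in Eq. (1)."""
    rng = random.Random(seed)
    # Observed values of each feature, in first-seen order.
    values = [list(dict.fromkeys(x[i] for x in data)) for i in range(len(data[0]))]
    # Random soft assignments provide the initial parameters.
    resp = []
    for _ in data:
        w = [rng.random() + 1e-3 for _ in range(k)]
        s = sum(w)
        resp.append([v / s for v in w])
    params = m_step(data, resp, values, k, alpha)
    for _ in range(max_rounds):
        resp = [posterior(x, *params) for x in data]       # E-step, Eq. (2)
        new_params = m_step(data, resp, values, k, alpha)  # M-step
        converged = max_change(params, new_params) < threshold
        params = new_params
        if converged:
            break
    return params
```

A call such as `prior, cond = em_cluster([("Person", 1), ("Event", 0), ...], k=2)` returns p(c) as a list and p(f_i | c) as per-cluster, per-feature dicts.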
To do clustering, we compute $p(c \mid f_1, \ldots, f_m)$ for each verb instance with the same formula as in (2) and assign the instance to the cluster with the maximal $p(c \mid f_1, \ldots, f_m)$. • To evaluate, we find, for each cluster, the gold-standard frameset shared by the majority of its instances and count those instances as correct. Instances not in the majority of their cluster are treated as misclassified.
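Below is a sketch of the cluster assignment and the majority-based scoring described above. It assumes parameters shaped like those in (1) and (2): `prior[c]` holds p(c) and `cond[c][i][v]` holds p(f_i = v | c). The helper names are hypothetical, and the parameter values in the usage lines are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def posterior(x, prior, cond):
    """Eq. (2): normalised p(c | f_1..f_m) for one instance x."""
    # Plain products are fine for a handful of features; use logs for long vectors.
    scores = [
        prior[c] * math.prod(cond[c][i][v] for i, v in enumerate(x))
        for c in range(len(prior))
    ]
    z = sum(scores)
    return [s / z for s in scores]


def assign_clusters(data, prior, cond):
    """Put each instance in the cluster with maximal p(c | f_1..f_m)."""
    labels = []
    for x in data:
        p = posterior(x, prior, cond)
        labels.append(max(range(len(p)), key=p.__getitem__))
    return labels


def majority_accuracy(clusters, gold):
    """Count instances carrying their cluster's majority gold frameset as correct."""
    members = defaultdict(list)
    for c, g in zip(clusters, gold):
        members[c].append(g)
    correct = sum(Counter(gs).most_common(1)[0][1] for gs in members.values())
    return correct / len(gold)


prior = [0.5, 0.5]
cond = [[{"Person": 0.9, "Event": 0.1}], [{"Person": 0.2, "Event": 0.8}]]
print(assign_clusters([("Person",), ("Event",)], prior, cond))          # [0, 1]
print(majority_accuracy([0, 0, 0, 1, 1], ["01", "01", "02", "02", "02"]))  # 0.8
```

In the second call, cluster 0's majority frameset is 01, which covers 2 of its 3 instances. Cluster 1 is entirely 02, so 4 of 5 instances count as correct.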
Features • WordNet classes for Subject: Person, Animate, State, Event, … • WordNet classes for Object • Passivization: 0, 1 • Transitivity: 0, 1 • PP as adjuncts: location, direction, beneficiary … • Double objects: 0, 1 • Clausal complements: 0, 1
Preliminary Experiment Results Table 3 Accuracy of EM clustering on five verbs
Outline • Task Description • Methodology • Unsupervised Verb Frameset Tagging • Future Work
Future Work • Improve the current model by: • Refining subcategorization extraction • Using more features, e.g.: a. He has to live with this programming work. (live 02: endure) b. He lived with his relatives. (live 01: inhabit) • Cluster nouns automatically instead of using WordNet to group them
What’s VSG? • Aggregate the senses of a verb into several groups according to their similarities. Example: Learn GROUP 1: WN1, WN3 (acquire a skill) GROUP 2: WN2, WN6 (find out) SINGLETON: WN4 (be a student) SINGLETON: WN5 (teach) WordNet Meaning (simplified): 1. acquire or gain knowledge or skills -- ("She learned dancing") 2. hear, find out -- ("I learned that she has two grown-up children") 3. memorize, con -- (commit to memory; learn by heart) 4. study, read, take -- (be a student of a certain subject; "She is learning for the bar exam") 5. teach, learn, instruct -- ("I learned them French") 6. determine, find out -- ("I want to learn whether she speaks French")
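This small sketch illustrates how grouping can make tagging easier. It scores agreement between two sets of sense tags for "learn" at two levels: the fine-grained level (WordNet sense number) and the coarse-grained level (sense group), using the groups above. The `agreement` helper and the two annotators' tag sequences are hypothetical, not data from the talk.

```python
# Sense groups for "learn" as given on the slide: WordNet sense number -> group.
LEARN_GROUPS = {1: "GROUP1", 3: "GROUP1", 2: "GROUP2", 6: "GROUP2", 4: "WN4", 5: "WN5"}


def agreement(tags_a, tags_b, groups):
    """Return (fine-grained, coarse-grained) agreement between two annotators."""
    pairs = list(zip(tags_a, tags_b))
    fine = sum(a == b for a, b in pairs) / len(pairs)
    coarse = sum(groups[a] == groups[b] for a, b in pairs) / len(pairs)
    return fine, coarse


# Hypothetical tags from two annotators for four instances of "learn".
print(agreement([1, 2, 4, 3], [3, 6, 4, 3], LEARN_GROUPS))  # (0.5, 1.0)
```

Disagreements between WN1 and WN3, or between WN2 and WN6, fall inside one group. They therefore disappear at the coarse level.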