Coarse-grained Word Sense Disambiguation

Coarse-grained Word Sense Disambiguation Jinying Chen, Martha Palmer March 25th, 2003

Outline • Motivation • Supervised Verb Sense Grouping • Unsupervised Verb Frameset Tagging • Future Work

Motivation • Fine-grained WSD is difficult for both humans and machines; well-defined sense groups can alleviate this problem (Martha, Hoa, Christiane, 2002) • Potential application in Machine Translation • When building a WSD corpus, the sense hierarchy can help annotators tag senses faster and more accurately (hopefully?)

Outline • Motivation • Supervised Verb Sense Grouping • What’s VSG? • Using Semantic Features for VSG • Building Decision Tree for VSG • Experiment Results • Unsupervised Verb Frameset Tagging • Future Work

What’s VSG? [Figure: WordNet verb senses (WN1 to WN14, WN19, WN20) arranged into verb sense groups under Frameset1 and Frameset2]

Using Semantic Features for VSG • PropBank • Each verb is defined by several framesets • All verb instances belonging to the same frameset share a common set of roles • Roles can be ARGn (n=0,1,…) and ARGM-f • Framesets are consistent with verb sense groups • Frameset tags and roles are semantic features for VSG

Building Decision Tree for VSG • Use the C5.0 decision tree (DT) learner • 3 feature sets; SF (Simple Feature set) works best: • VOICE: PAS, ACT • FRAMESET: 01, 02, … • ARGn (n=0,1,2,…): 0 (does not occur), 1 (occurs) • CoreFrame: 01-ARG0-ARG1, 02-ARG0-ARG2, … • ARGM: 0 (no ARGM), 1 (has ARGM) • ARGM-f (f=DIS, ADV, …): i (occurs i times)
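The SF feature set can be illustrated with a small Python sketch. This is only one possible encoding, not the authors' code: the function name `simple_features`, the input format (a list of PropBank-style role labels per verb instance) and the `max_arg` cutoff are assumptions made for illustration. The resulting dictionary could be passed to any decision tree learner; C5.0 itself is not called here.

```python
from collections import Counter


def simple_features(voice, frameset, roles, max_arg=5):
    """Encode one verb instance as an SF-style feature dictionary.

    voice: "ACT" or "PAS"; frameset: e.g. "01"; roles: role labels such as
    "ARG0" or "ARGM-TMP". max_arg is a hypothetical upper bound on n in ARGn.
    """
    # Numbered core roles that occur in this instance (ARG0, ARG1, ...).
    present = {r for r in roles if r.startswith("ARG") and r[3:].isdigit()}
    # Modifier roles (ARGM-f), counted per function tag f.
    modifiers = Counter(r for r in roles if r.startswith("ARGM-"))

    features = {"VOICE": voice, "FRAMESET": frameset}
    core = []
    for n in range(max_arg + 1):
        name = f"ARG{n}"
        features[name] = 1 if name in present else 0  # 0: absent, 1: occurs
        if name in present:
            core.append(name)
    # CoreFrame joins the frameset tag with the core roles that occur.
    features["CoreFrame"] = "-".join([frameset] + core)
    features["ARGM"] = 1 if modifiers else 0
    features.update(modifiers)  # ARGM-f: number of occurrences
    return features


example = simple_features("ACT", "01", ["ARG0", "ARG1", "ARGM-TMP", "ARGM-TMP"])
print(example["CoreFrame"], example["ARGM-TMP"])
```

Absent ARGM-f labels are simply left out of the dictionary here; a learner that needs fixed-width input would have to fill them with 0.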

Experiment Results Table 2 Error rate of Decision Tree on five verbs

Discussion • A simple feature set and a simple DT algorithm work well • Potential sparse data problem • More complicated DT algorithms (e.g. with boosting) tend to overfit the data • Complex features are not utilized by the model • Solution: use a large corpus, e.g. the parsed BNC corpus without frameset annotation

Outline • Task Description • Methodology • Unsupervised Verb Frameset Tagging • EM Clustering for Frameset Tagging • Features • Preliminary Experiment Results • Future Work

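EM Clustering for Frameset Tagging [Figure: cluster variable c and features f1, f2, …, fm] • We treat a set of features $f_1, \dots, f_m$ extracted from the parsed sentences as observed variables and assume they are independent given a hidden variable, c: $P(f_1, \dots, f_m, c) = P(c) \prod_{i=1}^{m} P(f_i \mid c)$ (1)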

• In the expectation step, we compute the probability of c conditioned on the set of observed features: $P(c \mid f_1, \dots, f_m) = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)}{\sum_{c'} P(c') \prod_{i=1}^{m} P(f_i \mid c')}$ (2) • In the maximization step, we re-compute $P(c)$ and $P(f_i \mid c)$ by maximizing the log-likelihood of all of the observed data. • Repeat the Expectation and Maximization steps for a fixed number of rounds or until the change in the probability parameters $P(c)$ and $P(f_i \mid c)$ is under a threshold.
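The two steps can be sketched in plain Python for categorical features. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the function name `em_cluster`, the random soft initialisation, the additive smoothing constant `alpha` and the stopping test on the log-likelihood (instead of on the parameter change) are all choices made here for brevity.

```python
import math
import random


def em_cluster(instances, n_clusters, n_rounds=50, tol=1e-6, seed=0, alpha=0.1):
    """EM for a mixture in which features are independent given the cluster c.

    instances: list of equal-length tuples of categorical feature values.
    Returns (prior, cond, post) with prior[c] = P(c),
    cond[c][i][v] = P(f_i = v | c) and post[k][c] = P(c | instance k).
    """
    rng = random.Random(seed)
    m = len(instances[0])
    values = [sorted({x[i] for x in instances}) for i in range(m)]

    # Assumption: start from random soft assignments of instances to clusters.
    post = []
    for _ in instances:
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        s = sum(w)
        post.append([v / s for v in w])

    prev_ll = -math.inf
    for _ in range(n_rounds):
        # Maximization step: re-estimate P(c) and P(f_i | c) from soft counts.
        prior = [sum(p[c] for p in post) / len(instances) for c in range(n_clusters)]
        cond = []
        for c in range(n_clusters):
            per_feature = []
            for i in range(m):
                counts = {v: alpha for v in values[i]}  # smoothing (assumption)
                for x, p in zip(instances, post):
                    counts[x[i]] += p[c]
                total = sum(counts.values())
                per_feature.append({v: n / total for v, n in counts.items()})
            cond.append(per_feature)

        # Expectation step: posterior of each cluster given the features.
        ll = 0.0
        post = []
        for x in instances:
            joint = [
                prior[c] * math.prod(cond[c][i][x[i]] for i in range(m))
                for c in range(n_clusters)
            ]
            z = sum(joint)
            ll += math.log(z)
            post.append([j / z for j in joint])

        # Stop early once the log-likelihood no longer changes noticeably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return prior, cond, post


# Toy instances with two features, e.g. (subject class, passivization).
toy = [("person", "0"), ("person", "0"), ("event", "1"), ("event", "1")]
prior, cond, post = em_cluster(toy, n_clusters=2)
print(prior)
```

With `alpha` greater than zero the M-step is a smoothed estimate rather than a pure maximum-likelihood one; setting it very small approaches the unsmoothed update.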

• To do clustering, we compute $P(c \mid f_1, \dots, f_m)$ for each verb instance with the same formula as in (2) and assign this instance to the cluster that has the maximal $P(c \mid f_1, \dots, f_m)$. • To evaluate, we find the gold-standard Frameset shared by the majority of instances in each cluster and count those instances as correct. The other instances in the cluster are treated as misclassified.
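The assignment and evaluation procedure can be written down in a few lines of Python. The sketch below assumes that the cluster posteriors are already available as one list of probabilities per verb instance; the function names `hard_assign` and `majority_accuracy` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def hard_assign(posteriors):
    """Assign each instance to the cluster with the maximal posterior."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def majority_accuracy(clusters, gold):
    """Fraction of instances whose gold Frameset is the majority in their cluster."""
    by_cluster = defaultdict(Counter)
    for c, g in zip(clusters, gold):
        by_cluster[c][g] += 1
    # Instances outside the majority Frameset of a cluster count as errors.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(gold)


posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.6, 0.4]]
gold = ["01", "01", "02", "02", "02"]
clusters = hard_assign(posteriors)
print(clusters, majority_accuracy(clusters, gold))  # [0, 0, 1, 1, 0] 0.8
```

In the toy data the last instance lands in cluster 0, whose majority Frameset is 01, so it is the single misclassified instance out of five.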

Features • WordNet classes for Subject: Person, Animate, State, Event, … • WordNet classes for Object • Passivization: 0, 1 • Transitivity: 0, 1 • PP as adjuncts: location, direction, beneficiary … • Double objects: 0, 1 • Clausal complements: 0, 1

Preliminary Experiment Results Table 3 Accuracy of EM clustering on five verbs

Outline • Task Description • Methodology • Unsupervised Verb Frameset Tagging • Future Work

Future Work • Improve the current model by • refining subcategorization extraction • using more features. Example: a. He has to live with this programming work. (live 02, endure) b. He lived with his relatives. (live 01, inhabit) • Cluster nouns automatically instead of using WordNet to group nouns

Thanks!

Table 4 lower bound on Decision Tree error rate

Table 5 Error rate of DT with different feature sets

Table 6 Accuracy of EM clustering on five verbs

What’s VSG? • Aggregate the senses of a verb into several groups according to their similarities
Example: learn
GROUP 1: WN1, WN3 (acquire a skill)
GROUP 2: WN2, WN6 (find out)
SINGLETON: WN4 (be a student)
SINGLETON: WN5 (teach)
WordNet meaning (simplified):
1. acquire or gain knowledge or skills -- ("She learned dancing")
2. hear, find out -- ("I learned that she has two grown-up children")
3. memorize, con -- (commit to memory; learn by heart)
4. study, read, take -- (be a student of a certain subject; "She is learning for the bar exam")
5. teach, learn, instruct -- ("I learned them French")
6. determine, find out -- ("I want to learn whether she speaks French")
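The grouping for learn can be expressed as a small lookup table in Python, which shows how a coarse-grained comparison differs from a fine-grained one. This is only an illustrative sketch: the dictionary keys for the singleton groups and the helper `coarse_match` are names introduced here, not part of the original material.

```python
# Sense groups for "learn" as listed above; singleton keys are hypothetical names.
LEARN_GROUPS = {
    "GROUP 1": {"WN1", "WN3"},      # acquire a skill
    "GROUP 2": {"WN2", "WN6"},      # find out
    "SINGLETON WN4": {"WN4"},       # be a student
    "SINGLETON WN5": {"WN5"},       # teach
}

# Invert the table: fine-grained sense -> coarse-grained group.
SENSE_TO_GROUP = {
    sense: group for group, senses in LEARN_GROUPS.items() for sense in senses
}


def coarse_match(predicted, gold):
    """True if two fine-grained senses fall into the same sense group."""
    return SENSE_TO_GROUP[predicted] == SENSE_TO_GROUP[gold]


print(coarse_match("WN1", "WN3"))  # True: both are in GROUP 1
print(coarse_match("WN1", "WN2"))  # False: GROUP 1 vs GROUP 2
```

A tagger that confuses WN1 with WN3 is wrong at the fine-grained level but correct at the group level, which is the sense in which grouping makes the task easier.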

Table 7 Portuguese and German translations of develop
