Unsupervised Word Sense Disambiguation Rivaling Supervised Methods December 10, 1998 Oh-Woog Kwon KLE Lab. CSE POSTECH
Introduction • An unsupervised algorithm for WSD • Avoids the need for costly hand-tagged training data • Exploits two powerful properties of human language: 1. One sense per collocation: nearby words give strong, consistent clues to the sense of a target word. ex) "동물의 눈은 물체를 보는 기관이다" ("An animal's eye is the organ that sees objects") — here 눈 has only one sense (eye), not two (eye or snow). 2. One sense per discourse: the sense of a target word is highly consistent within a single document. ex) every occurrence of bank within one text (Text 101) carries the same sense.
One Sense Per Discourse • A test for one sense per discourse • Table on p. 189 (using 37,232 hand-tagged examples) • Accuracy: is the same word used with the same sense throughout a discourse? (99.8%) • Applicability: does the word occur more than once in a discourse? (50.1%) • Advantage of one sense per discourse • Can be used in conjunction with separate models of local context for each word: the local contexts of all occurrences of bank scattered through one text (Text 101) contribute jointly to its disambiguation.
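The two quantities above can be measured directly from hand-tagged data. The sketch below computes them from a list of (discourse, word, sense) instances; the function name and input format are illustrative assumptions, not from the paper.

```python
from collections import Counter, defaultdict

def one_sense_per_discourse_stats(examples):
    """examples: list of (discourse_id, word, sense) hand-tagged instances.
    Returns (accuracy, applicability): how often repeated occurrences of a
    word within one discourse agree with the discourse-majority sense, and
    what fraction of occurrences belong to words that repeat at all."""
    by_key = defaultdict(list)
    for disc, word, sense in examples:
        by_key[(disc, word)].append(sense)

    total = repeated = agree = 0
    for senses in by_key.values():
        total += len(senses)
        if len(senses) > 1:
            repeated += len(senses)
            # occurrences matching the discourse-majority sense
            agree += Counter(senses).most_common(1)[0][1]

    applicability = repeated / total if total else 0.0
    accuracy = agree / repeated if repeated else 0.0
    return accuracy, applicability
```

On the paper's 37,232 examples this procedure yields the 99.8% / 50.1% figures quoted above.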
One Sense Per Collocation • Types of collocation (by predictive power) • Immediately adjacent collocations > collocations at a distance • At equal distance, predicate-argument relationships > arbitrary associations • Collocations with content words > collocations with function words → adjacent content words can disambiguate word sense • A supervised algorithm based on this property • Decision list algorithm [Yarowsky, ACL-94] • Applied to accent restoration in Spanish and French • Used as a component of the proposed unsupervised algorithm
Decision List Algorithm Step 1: Identify the ambiguities in the target word ex) 눈 : eye, snow Step 2: Collect training contexts for each sense ex) eye : "… 사람의 눈은 좋은 …" (a person's eyes are good …), "곤충의 눈은 머리에 …" (an insect's eyes are on its head …), … snow : "… 하늘에서 눈이 내리고 …" (snow falls from the sky …), "… 어제 눈이 내려 …" (snow fell yesterday …), … Step 3: Measure collocational distributions ex) 사람 immediately preceding 눈 : eye (1,000), snow (0); 하늘 within k words of 눈 : eye (2), snow (10,000) Step 4: Sort by log-likelihood ratio into a decision list Step 5: Optional pruning and interpolation Step 6: Train decision lists for general classes of ambiguity Step 7: Classification: use only the single most reliable collocation matched in the target context
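Steps 3, 4, and 7 can be sketched compactly. The code below is a minimal illustration, assuming a binary ambiguity and a small additive smoothing constant (the feature encoding, the 0.1 smoothing value, and the function names are assumptions, not the paper's exact formulation):

```python
import math
from collections import defaultdict

def train_decision_list(tagged_contexts, smoothing=0.1):
    """tagged_contexts: list of (sense, features) pairs, where features is a
    set of collocation strings such as 'prev_word=사람' or 'in_window=하늘'.
    Returns rules sorted by abs(log-likelihood ratio), most reliable first.
    Sketch of Steps 3-4 only; assumes exactly two senses."""
    counts = defaultdict(lambda: defaultdict(float))
    senses = set()
    for sense, features in tagged_contexts:
        senses.add(sense)
        for f in features:
            counts[f][sense] += 1
    s1, s2 = sorted(senses)
    rules = []
    for f, c in counts.items():
        llr = math.log((c[s1] + smoothing) / (c[s2] + smoothing))
        rules.append((abs(llr), f, s1 if llr > 0 else s2))
    rules.sort(reverse=True)
    return rules

def classify(features, rules, default):
    """Step 7: use only the single most reliable matching collocation."""
    for _, f, sense in rules:
        if f in features:
            return sense
    return default
```

Because classification stops at the first (strongest) matching rule, weaker conflicting evidence lower in the list is simply ignored, which is what makes decision lists robust to noisy collocates.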
Unsupervised Learning Algorithm - 1 • Illustrated by the disambiguation of 7,538 instances of plant • STEP 1: • Collect all contexts of the polysemous word in the untagged training set (right column of p. 190) • STEP 2: a) Choose a small number of seed collocations for each sense b) Tag all training examples containing a seed collocate with that seed's sense label ⇒ two seed sets (left column of p. 191, Figure 1) • Options for training seeds • Use words in dictionary definitions • Use a single defining collocate for each class (from a thesaurus such as WordNet) • Label salient corpus collocates (not fully automatic): • use words that frequently co-occur with the target word • a human judge decides which sense each collocate indicates
Unsupervised Learning Algorithm - 2 • STEP 3: (p. 192, Figure 2) a) Train the supervised classification algorithm on the two seed sets b) Classify the entire sample set with the resulting classifier; add examples tagged with probability above a threshold to the seed sets c) Optionally apply the one-sense-per-discourse constraint: • Detect the dominant sense of each discourse (using a threshold) • Augmentation: if a dominant sense exists, add the previously untagged contexts in that discourse to its seed set • Filtering: otherwise (where there is substantial disagreement about the dominant sense), return all instances in the discourse to the residual set d) Repeat Step 3 • Can escape from initial misclassifications • Two techniques to avoid a local minimum: • periodically and incrementally increasing the width of the context window • randomly perturbing the class-inclusion threshold, similar to simulated annealing
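The bootstrapping loop of STEP 3 (without the optional discourse constraint or annealing tricks) can be sketched as below. `train` and `classify_prob` stand in for the decision-list trainer and a classifier that returns a (sense, confidence) pair; all names and the 0.95 threshold are illustrative assumptions, not values from the paper.

```python
def yarowsky_bootstrap(contexts, seed_rules, train, classify_prob,
                       threshold=0.95, max_iters=20):
    """contexts: list of feature sets for one ambiguous word.
    seed_rules(ctx) returns a seed sense label or None (STEP 2).
    Repeatedly retrains on the labeled set and absorbs high-confidence
    examples from the residual set, until the labeling stabilizes."""
    labels = {}
    for i, ctx in enumerate(contexts):
        sense = seed_rules(ctx)
        if sense is not None:
            labels[i] = sense              # initial seed sets
    for _ in range(max_iters):
        model = train([(labels[i], contexts[i]) for i in labels])
        new_labels = dict(labels)
        for i, ctx in enumerate(contexts):
            sense, conf = classify_prob(model, ctx)
            if sense is not None and conf >= threshold:
                new_labels[i] = sense      # grow the seed sets
        if new_labels == labels:           # stable residual set: converged
            break
        labels = new_labels
    return labels
```

Note that already-labeled examples can be relabeled on later iterations, which is how the algorithm escapes initial misclassifications.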
Unsupervised Learning Algorithm - 3 • STEP 4: Stop when the algorithm converges on a stable residual set • STEP 5: Classify new data using the final decision lists • For error correction, optionally apply the one-sense-per-discourse constraint
Evaluation • Test data • Extracted from a 460-million-word corpus • Types of data: news articles, scientific abstracts, spoken transcripts, and novels used in previous research • Compared systems (see the table on p. 194) • (5): the supervised (decision list) algorithm • (6): unsupervised, using only two words as seeds • (7): unsupervised, using the salient words of a dictionary definition as seeds • (8): unsupervised, using quick hand tagging of a list of algorithmically identified salient collocates • (9): (7) + one sense per discourse, applied only in the classification procedure • (10): (9) + one sense per discourse, applied in learning as well
Conclusion • An unsupervised WSD algorithm whose accuracy rivals that of supervised methods, while avoiding the need for costly hand-tagged training data