IJCNLP 2008, Jan 10, 2008
Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition
Ryu Iida (Nara Institute of Science and Technology)
Diana McCarthy and Rob Koeling (University of Sussex)
Word Sense Disambiguation • Predominant sense acquisition • Exploited as a powerful back-off strategy for word sense disambiguation • McCarthy et al. (2004): • Achieved 64% precision on the Senseval2 all-words task • Strongly relies on linguistic resources such as WordNet for calculating the semantic similarity • Difficulty: porting it to other languages
Focus • How to calculate the semantic similarity score without semantic relations such as hyponymy • Explore the potential use of word definitions (glosses) instead of WordNet-style resources for porting McCarthy et al.’s method to other languages
Table of contents • Task • Related work: McCarthy et al. (2004) • Gloss-based semantic similarity metrics • Experiments • WSD on the two datasets: EDR and Japanese Senseval2 task • Conclusion and future directions
Word Sense Disambiguation (WSD) task • Select the correct sense of a word appearing in context, e.g. “I ate fried chicken last Sunday.” • Supervised approaches, which learn the context from labelled examples, have mainly been applied to this task
Word Sense Disambiguation (WSD) task (Cont’d) • Estimate the predominant sense of a word regardless of its context • English coarse-grained all-words task (SemEval-2007): • Choosing the most frequent sense: 78.9% • Best performing system: 82.5% • Systems using a first sense heuristic have relied on sense-tagged data • However, sense-tagged data is expensive to create
McCarthy et al. (2004)’s unsupervised approach • Extract the top N neighbour words of the target word according to their distributional similarity scores (simds) • Calculate a prevalence score for each sense: • Weight each neighbour’s simds by the semantic similarity score (simss) between the neighbour and the sense • Sum the weighted simds over the top N neighbours • Semantic similarity: estimated from linguistic resources (e.g. WordNet) • Output the sense with the maximum prevalence score (a minimal sketch follows)
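A minimal sketch of this ranking, assuming hypothetical callables dist_sim (simds) and sem_sim (simss) standing in for a distributional thesaurus and a WordNet-style similarity; the published method additionally normalises simss over the word's senses, which is omitted here for brevity.

```python
# Sketch of McCarthy et al. (2004)'s predominant sense ranking.
# `dist_sim` (sim_ds) and `sem_sim` (sim_ss) are hypothetical
# stand-ins; only the scoring scheme follows the slide.

def prevalence(target, sense, neighbours, dist_sim, sem_sim):
    # Each of the top-N neighbours contributes its distributional
    # similarity to the target, weighted by how semantically close
    # the neighbour is to this candidate sense.
    return sum(dist_sim(target, n) * sem_sim(n, sense) for n in neighbours)

def predominant_sense(target, senses, neighbours, dist_sim, sem_sim):
    # Output the sense with the maximum prevalence score.
    return max(senses,
               key=lambda s: prevalence(target, s, neighbours,
                                        dist_sim, sem_sim))
```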
McCarthy et al. (2004)’s approach: An example
chicken
sense2: the meat from this bird eaten as food.
sense3: informal someone who is not at all brave.
Each neighbour’s distributional similarity score (simds) is weighted by its semantic similarity score to sense2 (simss, from WordNet):

simss(word, sense2)   weighted simds
0.15                  0.0271
0.20                  0.0365
...                   ...
0.10                  0.0157

prevalence(sense2) = 0.0271 + 0.0365 + ... + 0.0157 = 0.152
McCarthy et al. (2004)’s approach: An example (Cont’d)
chicken
sense2: the meat from this bird eaten as food.
sense3: informal someone who is not at all brave.
prevalence(sense2) = 0.152
prevalence(sense3) = 0.0018 + 0.0037 + ... + 0.0016 = 0.023
prevalence(sense2) > prevalence(sense3) → predominant sense: sense2
Problem • While McCarthy et al.’s method works well for English, other sense inventories do not always have WordNet-style resources to tie the nearest neighbours to the sense inventory • While traditional dictionaries do not organise senses into synsets, they typically do have sense definitions (glosses) associated with the senses
Gloss-based similarity • Calculate the similarity between two glosses in a dictionary as the semantic similarity • simlesk: simply count the overlap of the content words in the glosses of the two word senses • simDSlesk: use distributional similarity as an approximation of the semantic distance between the words in the two glosses (sketches follow the examples below)
lesk: Example • simlesk(chicken, turkey) = 2 • “meat” and “food” overlap in the two glosses
lesk: Example (Cont’d) • simlesk(chicken, tomato) = 0 • No overlap between the two glosses
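A sketch of simlesk under naive assumptions: the tokeniser and stopword list are placeholders, the chicken gloss is the one on the slide, and the turkey gloss is hypothetical (chosen so that exactly “meat” and “food” overlap, as the slide reports).

```python
# sim_lesk: count the content words shared by two glosses.
# Preprocessing here is a naive placeholder, not the paper's.

STOPWORDS = {"the", "from", "this", "as", "a", "of"}

def content_words(gloss):
    # Lowercase, strip trailing punctuation, drop stopwords.
    return {w.strip(".,").lower() for w in gloss.split()} - STOPWORDS

def sim_lesk(gloss1, gloss2):
    return len(content_words(gloss1) & content_words(gloss2))

chicken_sense2 = "the meat from this bird eaten as food"
turkey = "meat of a turkey used as food"   # hypothetical gloss
print(sim_lesk(chicken_sense2, turkey))    # 2 ("meat" and "food")
```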
DSlesk • Calculate distributional similarity scores for all pairs of nouns in the two glosses: simds(meat, fruit) = 0.1625, simds(meat, vegetable) = 0.1843, simds(bird, fruit) = 0.1001, simds(bird, vegetable) = 0.0717, simds(food, fruit) = 0.1857, simds(food, vegetable) = 0.1772 • Output the average, over the nouns in the target sense’s gloss, of each noun’s maximum distributional similarity to the other gloss: simDSlesk(chicken, tomato) = 1/3 × (0.1843 + 0.1001 + 0.1857) = 0.1567
DSlesk

simDSlesk(s1, s2) = (1 / |N(g_s1)|) × Σ_{n1 ∈ N(g_s1)} max_{n2 ∈ N(g_s2)} simds(n1, n2)

where g_s is the gloss of word sense s and N(g_s) is the set of nouns appearing in g_s
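A sketch of simDSlesk following the formula above: each noun in the first gloss takes its best distributional match in the second gloss, and the matches are averaged. The simds values are the ones quoted on the DSlesk slide.

```python
# Distributional similarity scores quoted on the slide.
SIM_DS = {("meat", "fruit"): 0.1625, ("meat", "vegetable"): 0.1843,
          ("bird", "fruit"): 0.1001, ("bird", "vegetable"): 0.0717,
          ("food", "fruit"): 0.1857, ("food", "vegetable"): 0.1772}

def sim_dslesk(nouns1, nouns2, sim_ds):
    # Average, over nouns of the first gloss, of each noun's best
    # distributional match among the nouns of the second gloss.
    return sum(max(sim_ds[(n1, n2)] for n2 in nouns2)
               for n1 in nouns1) / len(nouns1)

# Nouns of chicken sense2's gloss vs. nouns of tomato's gloss.
print(sim_dslesk(["meat", "bird", "food"], ["fruit", "vegetable"], SIM_DS))
# (0.1843 + 0.1001 + 0.1857) / 3 = 0.1567
```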
Apply gloss-based similarity to McCarthy et al.’s approach
chicken
sense2: the meat from this bird eaten as food.
sense3: informal someone who is not at all brave.
prevalence(sense2) = 0.0623 + 0.0414 + ... + 0.0245 = 0.2387
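Porting the ranking then amounts to swapping simDSlesk in as the semantic similarity in the earlier prevalence sketch; a hedged sketch reusing sim_dslesk and SIM_DS from above, where the tiny gloss inventory is a hypothetical stand-in for a real dictionary.

```python
GLOSS_NOUNS = {                     # sense -> nouns in its gloss
    "chicken.2": ["meat", "bird", "food"],
    "tomato.1": ["fruit", "vegetable"],
}
SENSES = {"tomato": ["tomato.1"]}   # neighbour word -> its senses

def sem_sim_gloss(neighbour, sense):
    # Best sim_DSlesk score between the candidate sense's gloss and
    # any gloss of the neighbour word; no WordNet relations needed.
    return max(sim_dslesk(GLOSS_NOUNS[sense], GLOSS_NOUNS[s], SIM_DS)
               for s in SENSES[neighbour])

print(sem_sim_gloss("tomato", "chicken.2"))   # 0.1567, as above
```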
Table of contents • Task • Related work: McCarthy et al. (2004) • Gloss-based semantic similarity metrics • Experiments • WSD on the two datasets: EDR and Japanese Senseval2 task • Conclusion and future directions
Experiment 1: EDR • Dataset: EDR corpus • 3,836 polysemous nouns (183,502 instances) • Adopted the similarity score proposed by Lin (1998) as the distributional similarity score, computed from 9 years of Mainichi newspaper articles and 10 years of Nikkei newspaper articles parsed with the Japanese dependency parser CaboCha (Kudo and Matsumoto, 2002) • Used the 50 nearest neighbours, in line with McCarthy et al. (2004)
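For reference, Lin (1998)’s measure scores two words by the dependency-triple features they share; this is the standard formulation from the cited paper, not reproduced from the slide.

```latex
% Lin (1998): similarity of w1 and w2 over dependency features (r, w'),
% where T(w) is the set of features with positive pointwise mutual
% information I(w, r, w').
\mathrm{sim}_{\mathrm{Lin}}(w_1, w_2) =
  \frac{\sum_{(r,w') \in T(w_1) \cap T(w_2)}
          \bigl( I(w_1, r, w') + I(w_2, r, w') \bigr)}
       {\sum_{(r,w') \in T(w_1)} I(w_1, r, w')
        + \sum_{(r,w') \in T(w_2)} I(w_2, r, w')}
```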
Methods • Baseline: select one word sense at random for each word token and average the precision over 100 trials • Unsupervised: McCarthy et al. (2004), with semantic similarity given by Jiang and Conrath (1997) (jcn), lesk, or DSlesk • Supervised (Majority): use hand-labelled training data to obtain the predominant sense of each test word
Results: EDR • DSlesk is comparable to jcn without the requirement for semantic relations such as hyponymy
Results: EDR (Cont’d) • All methods for finding a predominant sense outperform the supervised method for items with little data (≤ 5), indicating that these methods work robustly even for low-frequency items where hand-tagged data is unreliable
Experiment 2 and Results: Senseval2 in Japanese • Dataset: 50 nouns (5,000 instances) • 4 methods: baseline, lesk, DSlesk, supervised • Precision = recall, since every instance is assigned a sense • Evaluated at both fine-grained and coarse-grained levels of the hierarchical sense-ids (e.g. 105-0-0-2-0)
Conclusion • We examined different measures of semantic similarity for automatically finding a first sense heuristic for WSD in Japanese • We defined a new gloss-based similarity (DSlesk) and evaluated it on two Japanese WSD datasets (EDR and Senseval2), outperforming lesk and achieving performance comparable to jcn, which relies on hyponym links that are not always available
Future directions • Explore other information in the glosses, such as words of other parts of speech and predicate-argument relations • Group fine-grained word senses into clusters, making the task more suitable for NLP applications (Ide and Wilks, 2006) • Use the results of predominant sense acquisition as prior knowledge for other approaches, e.g. graph-based ones (Mihalcea 2005, Nastase 2008)