
Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition

IJCNLP2008, Jan 10, 2008. Ryu Iida (Nara Institute of Science and Technology), Diana McCarthy and Rob Koeling (University of Sussex).


Presentation Transcript


  1. Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition • Ryu Iida (Nara Institute of Science and Technology) • Diana McCarthy and Rob Koeling (University of Sussex) • IJCNLP2008, Jan 10, 2008

  2. Word Sense Disambiguation • Predominant sense acquisition • Exploited as a powerful back-off strategy for word sense disambiguation • McCarthy et al. (2004): • Achieved 64% precision on the Senseval2 all-words task • Relies strongly on linguistic resources such as WordNet for calculating the semantic similarity → Difficulty: porting it to other languages

  3. Focus • How to calculate the semantic similarity score without semantic relations such as hyponymy • Explore the potential use of word definitions (glosses) instead of WordNet-style resources for porting McCarthy et al.’s method to other languages

  4. Table of contents • Task • Related work: McCarthy et al (2004) • Gloss-based semantic similarity metrics • Experiments • WSD on the two datasets: EDR and Japanese Senseval2 task • Conclusion and future directions

  5. Word Sense Disambiguation (WSD) task • Select the correct sense of a word appearing in a context, e.g. “chicken” in “I ate fried chicken last Sunday.” • Supervised approaches have mainly been applied to learn the context

  6. Word Sense Disambiguation (WSD) task (Cont’d) • Estimate the most predominant sense of a word regardless of its context • English coarse-grained all words task (2007) • Choosing most frequent senses: 78.9% • Best performing system: 82.5% • Systems using a first sense heuristic have relied on sense-tagged data • However, sense-tagged data is expensive

  7. McCarthy et al. (2004)’s unsupervised approach • Extract the top N neighbour words of the target word according to the distributional similarity score (simds) • Calculate the prevalence score of each sense • Weight simds by the semantic similarity score (simss) • Sum up the weighted simds of all top N neighbours • Semantic similarity: estimated from linguistic resources (e.g. WordNet) • Output the sense which has the maximum prevalence score

  8. McCarthy et al. (2004)’s approach: An example • chicken • sense2: the meat from this bird eaten as food. • sense3: informal someone who is not at all brave. • The distributional similarity score of each neighbour is weighted by the semantic similarity score (from WordNet) • simss(word, sense2): 0.15, 0.20, ..., 0.10 • weighted simds: 0.0271, 0.0365, ..., 0.0157 • prevalence(sense2) = 0.0271 + 0.0365 + ... + 0.0157 = 0.152

  9. McCarthy et al. (2004)’s approach: An example • chicken • sense2: the meat from this bird eaten as food. • sense3: informal someone who is not at all brave. • prevalence(sense2) = 0.152 • prevalence(sense3) = 0.0018 + 0.0037 + ... + 0.0016 = 0.023 • prevalence(sense2) > prevalence(sense3) → predominant sense: sense2
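The prevalence computation walked through in slides 7 to 9 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the neighbour list, and all scores are hypothetical, and the weighting is the plain product simds × simss as the slides describe it, with no further normalisation.

```python
from typing import Dict, List, Tuple


def prevalence_scores(
    neighbours: List[Tuple[str, float]],
    simss: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Sum simds(target, neighbour) * simss(neighbour, sense) per sense.

    neighbours: (neighbour word, distributional similarity to the target).
    simss: sense -> {neighbour word -> semantic similarity score}.
    """
    scores = {}
    for sense, per_neighbour in simss.items():
        scores[sense] = sum(
            ds * per_neighbour.get(word, 0.0) for word, ds in neighbours
        )
    return scores


def predominant_sense(scores: Dict[str, float]) -> str:
    """Return the sense with the maximum prevalence score."""
    return max(scores, key=scores.get)


# Hypothetical neighbours of "chicken" and made-up similarity scores.
example_neighbours = [("turkey", 0.18), ("meat", 0.17), ("coward", 0.05)]
example_simss = {
    "sense2": {"turkey": 0.15, "meat": 0.20, "coward": 0.01},
    "sense3": {"turkey": 0.01, "meat": 0.02, "coward": 0.30},
}
example_scores = prevalence_scores(example_neighbours, example_simss)
example_best = predominant_sense(example_scores)
```

Because the semantic similarity function is passed in as a table, swapping the WordNet-based score for a gloss-based one changes only how `simss` is filled.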

  10. Problem • While McCarthy et al.’s method works well for English, other inventories do not always have WordNet-style resources to tie the nearest neighbours to the sense inventory • While traditional dictionaries do not organise senses into synsets, they typically do have sense definitions (glosses) associated with the senses

  11. Gloss-based similarity • Use the similarity between two glosses in a dictionary as the semantic similarity • simlesk: simply count the overlap of the content words in the glosses of the two word senses • simDSlesk: use distributional similarity as an approximation of the semantic distance between the words in the two glosses

  12. lesk: Example • simlesk(chicken, turkey) = 2 • “meat” and “food” appear in both glosses

  13. lesk: Example • simlesk(chicken, tomato) = 0 • No overlap between the two glosses
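A rough sketch of the overlap count behind simlesk, under stated assumptions: the transcript does not preserve the turkey and tomato glosses, so the two glosses below are invented for illustration, and the stop-word list and regex tokeniser are likewise hypothetical stand-ins for real content-word extraction (the experiments in the talk use Japanese text, which would need a morphological analyser instead).

```python
import re
from typing import Set

# Hypothetical stop-word list; a real system would use POS tags instead.
STOP_WORDS = {"the", "from", "this", "as", "a", "of"}


def content_words(gloss: str) -> Set[str]:
    """Lower-case the gloss, split on non-letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", gloss.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def sim_lesk(gloss_a: str, gloss_b: str) -> int:
    """Number of content words shared by the two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))


# The chicken gloss is from the slides; the other two are invented.
chicken_gloss = "the meat from this bird eaten as food."
turkey_gloss = "meat of a large fowl used as food"
tomato_gloss = "a soft red fruit used as a vegetable"

chicken_turkey = sim_lesk(chicken_gloss, turkey_gloss)
chicken_tomato = sim_lesk(chicken_gloss, tomato_gloss)
```

With these glosses the chicken/turkey pair shares "meat" and "food", while chicken/tomato shares nothing, which is the zero-overlap case that motivates DSlesk.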

  14. DSlesk • Calculate the distributional similarity score of every pair of nouns drawn from the two glosses • simds(meat, fruit) = 0.1625, simds(meat, vegetable) = 0.1843, simds(bird, fruit) = 0.1001, simds(bird, vegetable) = 0.0717, simds(food, fruit) = 0.1857, simds(food, vegetable) = 0.1772 • For each noun in the gloss of the target word sense, take its maximum distributional similarity to the nouns in the other gloss, then output the average of these maxima • simDSlesk(chicken, tomato) = 1/3 (0.1843 + 0.1001 + 0.1857) = 0.1567

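  15. DSlesk • $\mathrm{sim}_{DSlesk}(s_i, s_j) = \frac{1}{|g_i|} \sum_{a \in g_i} \max_{b \in g_j} \mathrm{sim}_{ds}(a, b)$ • $g_i$: gloss of word sense $s_i$ • $a$: noun appearing in $g_i$ • $b$: noun appearing in $g_j$ • $|g_i|$: number of nouns in $g_i$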
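The averaging-of-maxima step can be sketched as below, using the six simds values quoted on slide 14. The function name and the dictionary layout are assumptions made for this illustration; extracting the nouns from each gloss is taken as already done.

```python
from typing import Dict, List, Tuple


def sim_dslesk(
    target_nouns: List[str],
    other_nouns: List[str],
    simds: Dict[Tuple[str, str], float],
) -> float:
    """Average, over nouns in the target gloss, of the best simds match
    among the nouns of the other gloss. Missing pairs count as 0.0."""
    if not target_nouns or not other_nouns:
        return 0.0
    total = 0.0
    for a in target_nouns:
        total += max(simds.get((a, b), 0.0) for b in other_nouns)
    return total / len(target_nouns)


# Distributional similarity scores as given on slide 14.
SIMDS = {
    ("meat", "fruit"): 0.1625,
    ("meat", "vegetable"): 0.1843,
    ("bird", "fruit"): 0.1001,
    ("bird", "vegetable"): 0.0717,
    ("food", "fruit"): 0.1857,
    ("food", "vegetable"): 0.1772,
}

chicken_nouns = ["meat", "bird", "food"]   # nouns in the chicken gloss
tomato_nouns = ["fruit", "vegetable"]      # nouns in the tomato gloss
dslesk_chicken_tomato = sim_dslesk(chicken_nouns, tomato_nouns, SIMDS)
```

Note that the measure is not symmetric as written: the average runs over the nouns of the first gloss only.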

  16. Apply gloss-based similarity to McCarthy et al.’s approach • chicken • sense2: the meat from this bird eaten as food. • sense3: informal someone who is not at all brave. • prevalence(sense2) = 0.0623 + 0.0414 + ... + 0.0245 = 0.2387

  17. Table of contents • Task • Related work: McCarthy et al (2004) • Gloss-based semantic similarity metrics • Experiments • WSD on the two datasets: EDR and Japanese Senseval2 task • Conclusion and future directions

  18. Experiment 1: EDR • Dataset: EDR corpus • 3,836 polysemous nouns (183,502 instances) • Adopt the similarity score proposed by Lin (1998) as the distributional similarity score • Computed from 9 years of Mainichi newspaper articles and 10 years of Nikkei newspaper articles • Parsed with the Japanese dependency parser CaboCha (Kudo and Matsumoto, 2002) • Use the 50 nearest neighbours, in line with McCarthy et al. (2004)
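For readers unfamiliar with the Lin (1998) measure, a minimal sketch follows. It assumes each word is already described by dependency features (relation, co-occurring word) with precomputed positive mutual-information weights; the feature names and weights below are invented, and the parsing and weighting steps of the actual experiment are not reproduced.

```python
from typing import Dict, Tuple

Feature = Tuple[str, str]  # (dependency relation, co-occurring word)


def lin_similarity(
    feats1: Dict[Feature, float], feats2: Dict[Feature, float]
) -> float:
    """Lin (1998): information in shared features over total information."""
    shared = feats1.keys() & feats2.keys()
    numerator = sum(feats1[f] + feats2[f] for f in shared)
    denominator = sum(feats1.values()) + sum(feats2.values())
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical feature weights (mutual information values).
chicken_feats = {("obj-of", "eat"): 2.0, ("mod", "fried"): 1.5}
turkey_feats = {("obj-of", "eat"): 1.0, ("mod", "roast"): 0.5}
lin_chicken_turkey = lin_similarity(chicken_feats, turkey_feats)
```

Ranking all candidate words by this score against the target and keeping the top 50 would give the neighbour list used in the prevalence computation.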

  19. Methods • Baseline • Select one word sense at random for each word token and average the precision over 100 trials • Unsupervised: McCarthy et al. (2004) • Semantic similarity: Jiang and Conrath (1997) (jcn), lesk, DSlesk • Supervised (Majority) • Use hand-labeled training data for obtaining the predominant sense of the test words
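The two reference points, the random baseline and the supervised majority sense, can be sketched as follows. The data structures, function names, and seed handling are assumptions made for illustration; only the procedure (random choice per token averaged over 100 trials, most frequent sense from hand-labelled data) comes from the slide.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def random_baseline(
    tokens: List[Tuple[str, str]],
    inventory: Dict[str, List[str]],
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Pick a random sense per (word, gold sense) token; average the
    precision over the given number of trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        correct = sum(
            rng.choice(inventory[word]) == gold for word, gold in tokens
        )
        total += correct / len(tokens)
    return total / trials


def majority_senses(training: List[Tuple[str, str]]) -> Dict[str, str]:
    """Most frequent sense of each word in hand-labelled training data."""
    counts: Dict[str, Counter] = {}
    for word, sense in training:
        counts.setdefault(word, Counter())[sense] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


# Hypothetical toy data.
toy_inventory = {"chicken": ["sense2", "sense3"]}
toy_tokens = [("chicken", "sense2"), ("chicken", "sense2"), ("chicken", "sense3")]
toy_random_precision = random_baseline(toy_tokens, toy_inventory)
toy_majority = majority_senses(toy_tokens)
```

The majority estimate depends entirely on how many labelled instances a word has, which is why it becomes unreliable for low-frequency words.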

  20. Results: EDR • DSlesk is comparable to jcn without the requirement for semantic relations such as hyponymy

  21. Results: EDR (Cont’d) • All methods for finding a predominant sense outperform the supervised one for items with little data (≤ 5), indicating that these methods work robustly even for low-frequency data where hand-tagged data is unreliable

  22. Experiment 2 and Results: Senseval2 in Japanese • 50 nouns (5,000 instances) • 4 methods: lesk, DSlesk, baseline, supervised • precision = recall • Sense-id example: 105-0-0-2-0 (fine-grained and coarse-grained levels)

  23. Conclusion • We examined different measures of semantic similarity for automatically finding a first sense heuristic for WSD in Japanese • We defined a new gloss-based similarity (DSlesk) and evaluated it on two Japanese WSD datasets (EDR and Senseval2) • DSlesk outperforms lesk and performs comparably to the jcn method, which relies on hyponym links that are not always available

  24. Future directions • Explore other information in the glosses, such as words of other POS and predicate-argument relations • Group fine-grained word senses into clusters, making the task suitable for NLP applications (Ide and Wilks, 2006) • Use the results of predominant sense acquisition as prior knowledge for other approaches • Graph-based approaches (Mihalcea 2005, Nastase 2008)
