
USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS

This presentation addresses the problem of finding the word (or words) whose meaning matches a given definition. It proposes a Meaning-to-Word (MTW) system that uses a Turkish monolingual dictionary and the Turkish WordNet as resources for word retrieval. The system normalizes queries with tokenization, stemming, and stop-word elimination, retrieves candidate words by stem matching or by WordNet-based query expansion, and then ranks the candidates. Query expansion gives the best results on real user queries (68% success, compared with 60% for stem matching).


Presentation Transcript


  1. USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS • İlknur Durgar El-Kahlout and Kemal Oflazer • Sabancı University, İstanbul, Turkey

  2. Problem • For a given definition, find the appropriate word (or words) • A traditional dictionary is of no use, since it maps words to definitions rather than definitions to words • Instead, search a dictionary for a word whose definition is “similar” to the given one

  3. Example • User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current) • In the dictionary: akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)

  4. Applications • Computer-assisted language learning • Solving crossword puzzles • Reverse dictionary

  5. Outline • Problem statement • Meaning-to-Word System (MTW) • Our Approach • Methods • Results • Result Summary • Conclusion

  6. Problem Statement • Find the “similarity” between two definitions: • Akımı ölçmek için kullanılan alet (A device that is used to measure the current) • Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)

  7. Meaning-to-Word (MTW) • Addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition • Two subproblems: • finding words whose definitions are “similar” to the query in some sense • ranking the candidate words using a variety of criteria

  8. Information Flow in MTW • User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words

  9. Available Resources • Turkish Monolingual Dictionary • About 50,000 entries • Turkish WordNet • About 11,000 synsets

  10. Normalization • User Definition → Normalization → query → Search in Dictionary → candidates → Rank Candidates → List of words

  11. Normalization • Tokenization • Stemming • Stop Word Elimination
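The three normalization steps can be pictured as a small pipeline. This is a minimal sketch, not the MTW implementation: the stop-word list `STOP_WORDS` and the lookup table `STEMS` are hypothetical stand-ins, and a real system would use a Turkish morphological analyzer instead of a table. Note also that Python's `str.lower()` is not Turkish-aware (it maps "I" to "i", not "ı"), so real Turkish input would need locale-aware case folding.

```python
import re

# Hypothetical stop-word list, for illustration only (not the list used by MTW).
STOP_WORDS = {"daha", "hiç", "ve", "bir"}

# Hypothetical surface-form -> stem table standing in for a morphological analyzer.
STEMS = {
    "önce": "önce",
    "evlenmemiş": "evlen",
    "kişi": "kişi",
}


def tokenize(text):
    """Lowercase the text and split it into runs of word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def normalize(definition):
    """Tokenize, drop stop words, then replace each remaining token by its stem."""
    tokens = [t for t in tokenize(definition) if t not in STOP_WORDS]
    # Unknown words are kept unchanged rather than discarded.
    return [STEMS.get(t, t) for t in tokens]
```

With these assumptions, `normalize("daha önce hiç evlenmemiş kişi")` yields the stems `["önce", "evlen", "kişi"]` used in the query-processing example.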

  12. Query Processing • User Definition → query → Query Processing → Search in Dictionary → candidates → Rank Candidates → List of words

  13. Query Processing • Subset Generation • Search with different sets of words • Select informative words from the user’s query • Query: daha önce hiç evlenmemiş kişi (a person who has never been married) • Three-word set: {önce, evlen, kişi} (before, marry, person) • Two-word subsets: {evlen, kişi} (marry, person), {önce, kişi} (before, person), {önce, evlen} (before, marry) • One-word subsets: {evlen} (marry), {önce} (before), {kişi} (person)
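Subset generation amounts to enumerating every non-empty subset of the normalized query. A minimal sketch using the standard library's `itertools.combinations`; the function name `generate_subsets` is hypothetical. Subsets are listed from largest to smallest, matching the order on the slide.

```python
from itertools import combinations


def generate_subsets(stems):
    """Return every non-empty subset of the query stems, largest subsets first."""
    subsets = []
    for size in range(len(stems), 0, -1):
        # combinations() preserves the original word order inside each subset.
        subsets.extend(combinations(stems, size))
    return subsets
```

For `["önce", "evlen", "kişi"]` this produces the 7 subsets shown above. Since a query of n words has 2^n - 1 non-empty subsets, this is practical only for short, already-normalized queries.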

  14. Query Processing • Subset Sorting • An unordered list of subsets is insufficient • Rank the generated subsets: 1) by the number of words, e.g. {önce, evlen, kişi} (before, marry, person) ranks above {evlen, kişi} (marry, person); 2) by the sum of the logarithms of the word frequencies, e.g. {evlen, kişi} (marry, person) ranks above {önce, kişi} (before, person)
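The two sorting criteria can be combined into one sort key. This is a sketch under stated assumptions: the counts in `FREQ` are hypothetical, and we assume a lower sum of log frequencies means rarer and therefore more informative words, which reproduces the slide's ordering of {evlen, kişi} above {önce, kişi}.

```python
import math

# Hypothetical corpus frequencies, for illustration only.
FREQ = {"önce": 5000, "evlen": 120, "kişi": 900}


def log_freq_sum(subset):
    """Sum of log frequencies; unseen words count as frequency 1 (log = 0)."""
    return sum(math.log(FREQ.get(word, 1)) for word in subset)


def sort_subsets(subsets):
    """More words first; among equal sizes, rarer (assumed more informative) words first."""
    return sorted(subsets, key=lambda s: (-len(s), log_freq_sum(s)))
```

Sorting by a tuple key applies the first criterion and uses the second only to break ties, which is how the slide orders its two rules.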

  15. Searching for Meanings • User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words

  16. Searching for Meanings • Two methods • Stem Matching • Query Expansion (using WordNet)

  17. Stem Matching • Morphological normalization of words • Find dictionary meanings that contain morphological variants of the words in the original definition

  18. Stem Matching (Ex.) (A device that is used to measure the current) { akımı ölçmek için kullanılan alet } • Candidate stems: akımı → ak (white), akım (current), akı (flux); ölçmek → ölç (measure); için → için (to), iç (drink); kullanılan → kullan (use), kul (slave); alet → alet (device) • On the original slide, the matching stems are shown in color
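Because one surface word can have several candidate stems, a simple way to picture stem matching is to treat each word as a set of stems and count a query word as matched when any of its stems appears in the definition. A minimal sketch: the `ANALYSES` table is hypothetical and hand-written for this example, not output from a real Turkish analyzer.

```python
# Hypothetical analyses: surface word -> every stem an analyzer might return.
ANALYSES = {
    "akımı": {"ak", "akım", "akı"},
    "ölçmek": {"ölç"},
    "için": {"için", "iç"},
    "kullanılan": {"kullan", "kul"},
    "alet": {"alet"},
    "akımının": {"akım"},
    "şiddetini": {"şiddet"},
    "ölçmeye": {"ölç"},
    "yarayan": {"yara"},
}


def stems_of(word):
    """Candidate stems of a word; unknown words are their own stem."""
    return ANALYSES.get(word, {word})


def matched_words(query_words, definition_words):
    """Query words sharing at least one candidate stem with the definition."""
    definition_stems = set().union(*(stems_of(w) for w in definition_words))
    return [w for w in query_words if stems_of(w) & definition_stems]


query = "akımı ölçmek için kullanılan alet".split()
definition = "elektrik akımının şiddetini ölçmeye yarayan araç ampermetre".split()
print(matched_words(query, definition))  # ['akımı', 'ölçmek']
```

Only "akımı" and "ölçmek" match here; "alet" does not, because the dictionary uses the synonym "araç", which is the gap that query expansion targets.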

  19. Stem Matching (A device that is used to measure the current) • Query: akımı ölçmek için kullanılan alet • Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)


  21. Stem Matching • Drawbacks • Generates noisy stems: ilim (science; my city) → ilim (science), il (city) • Conflates words with very different meanings into the same stem: ilim (science; my city), ilde (in the city) → il (city) • Cannot relate words with similar meanings: kimse (someone) / kişi (person), bölüm (part) / kısım (portion)

  22. Using Query Expansion • Two different approaches: • Expand the query with relations (synonyms, specializations, generalizations) • Expand the query with relevant answers retrieved by the unexpanded query • MTW uses WordNet synonyms, e.g. {besin, gıda} (food, nourishment), {iyileş, düzel} (to get better), {iyileş, geliş} (to improve)
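Synonym-based expansion can be sketched as replacing each query stem with the union of all synsets that contain it. The `SYNSETS` list below is hypothetical: the first three sets mirror the slide's examples, and the {alet, araç, gereç} set is an illustrative assumption, not an entry copied from Turkish WordNet.

```python
# Hypothetical synsets; not taken from the actual Turkish WordNet.
SYNSETS = [
    {"besin", "gıda"},
    {"iyileş", "düzel"},
    {"iyileş", "geliş"},
    {"alet", "araç", "gereç"},
]


def synonyms(stem):
    """Union of every synset containing the stem; the stem itself is always included."""
    result = {stem}
    for synset in SYNSETS:
        if stem in synset:
            result |= synset
    return result


def expand_query(stems):
    """Map each query stem to its synonym set; a definition word matching any of them counts."""
    return {stem: synonyms(stem) for stem in stems}
```

This sketch expands over every sense of a word (e.g. "iyileş" picks up both "düzel" and "geliş"), which raises recall but also admits noise; a system with word-sense information could restrict expansion to one synset.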

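  23. Query Expansion (Ex.) (A device that is used to measure the current) { akımı ölçmek için kullanılan alet } • Candidate stems: ak (white), ölç (measure), için (to), kullan (use), alet (device), akım (current), iç (drink), kul (slave), akı (flux) • Added WordNet synonyms: beyaz (white), faydalan (make use of), araç (tool), debi (flow rate), yararlan (benefit from), gereç (tool), akış (flow), köle (slave)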

  24. Query Expansion (Ex.) (A device that is used to measure the current) • Query: akımı ölçmek için kullanılan alet • Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)


  26. Ranking • User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words

  27. Ranking • A very important part of MTW • Having the right answer in the retrieved set is not enough • The aim is to have the right answer near the top of the retrieved set (e.g., within the top 50 answers)
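The "right answer near the top" criterion can be measured as the fraction of queries whose correct word appears within the first k results. A minimal sketch; the function name `success_at_k` is hypothetical, and we assume (not confirmed by the slides) that the success percentages reported later are computed in this way with k = 50.

```python
def success_at_k(ranked_lists, answers, k=50):
    """Fraction of queries whose correct word appears in the first k ranked results."""
    if not answers:
        return 0.0
    hits = sum(1 for ranked, answer in zip(ranked_lists, answers) if answer in ranked[:k])
    return hits / len(answers)
```

For example, with two queries where only the first has its answer in second place, the score is 0.5 for k = 2 and 0.0 for k = 1.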

  28. Ranking • Simple but effective methods: • Number of matched words • Subset informativeness (frequency of the words in the subset) • Ratio of the number of matched words to the number of words in the candidate’s dictionary definition • Longest common subsequence (order of the matched words)
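Three of these criteria can be computed directly from the stem lists. The slides do not say how the criteria are combined, so this sketch simply assumes a lexicographic order (matched count, then ratio, then LCS length); subset informativeness is omitted for brevity, and the candidate names and definitions in the usage example are hypothetical placeholders.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def score(query_stems, definition_stems):
    """(matched words, matched / definition length, LCS length); combination order is assumed."""
    matched = len(set(query_stems) & set(definition_stems))
    ratio = matched / len(definition_stems) if definition_stems else 0.0
    return (matched, ratio, lcs_length(query_stems, definition_stems))


def rank(query_stems, candidates):
    """Sort candidate words (dict: word -> definition stems) from best to worst score."""
    return sorted(candidates, key=lambda w: score(query_stems, candidates[w]), reverse=True)


query = ["akım", "ölç", "kullan", "alet"]
candidates = {
    "cand_a": ["akım", "ölç", "araç"],
    "cand_b": ["ölç", "alet", "uzunluk", "birim", "cetvel"],
    "cand_c": ["su", "iç"],
}
print(rank(query, candidates))  # ['cand_a', 'cand_b', 'cand_c']
```

Here `cand_a` and `cand_b` both match two words, and the ratio criterion breaks the tie in favor of the shorter definition.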

  29. Some Statistics • Training sets: • 50 queries from users • 50 queries from a dictionary • Test sets: • 50 queries from users • 50 queries from a separate dictionary

  30. Stem Matching (all stems included) • Low percentage of correct answers in the top 10 for user queries, but very high success on dictionary queries

  31. Stem Matching (longest stem included, heuristic) • Improvement on user queries; slightly better performance on dictionary queries

  32. Query Expansion with WordNet (all stems included) • Better results on user queries; no change on dictionary queries

  33. Query Expansion with WordNet (longest stem included, heuristic) • Better performance than longest-stem matching on user queries, but worse performance on dictionary queries

  34. Result Summary • Stem Matching (longest stem included) • 60% success in real user queries • 96% success in dictionary queries • Query Expansion (all stems included) • 68% success in real user queries • 92% success in dictionary queries

  35. Conclusion • We have implemented a ‘Meaning to Word’ system for Turkish • Results on unseen test data are satisfactory • Query expansion performs better than stem matching on real user queries • However, it does not find the intended word for every query • 68% of real user queries and 90% of dictionary queries are found in the first 50 results

  36. THANK YOU !
