Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary. Presenter: Chun-Ping Wu. Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo. 國立雲林科技大學 National Yunlin University of Science and Technology. IPM 2007.
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • Word sense disambiguation (WSD) is a common problem in natural language processing. • Traditional approaches consider only the co-occurrence probability. Sample: I deposit some money in the bank. Options: bank = financial institution? bank = embankment or shore? bank = a row or set (of things)?
Objective • To build a WSD system that is easy to implement, learns all polysemous words at once, and covers every polysemous word listed in the machine-readable dictionary (MRD). • To consider the relation between each sense of the context words and the sense of the target word. Sample: I deposit some money in the bank. Answer: bank = financial institution
Methodology • Learning step • Similarity matrix • Word vector • Vector representations of sense definitions in MRD • Disambiguation step • The definition of acyclic weighted digraph. • Selecting context words • Constructing the acyclic weighted digraph • Searching the optimal path on the acyclic weighted digraph
Methodology • Learning step • Similarity matrix • Word vector • Vector representations of sense definitions in MRD
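The slides list the components of the learning step but not their formulas. As a rough illustration only, the sketch below assumes that word vectors are co-occurrence counts within a fixed window, that a sense vector is the sum of the word vectors of the words in its dictionary definition, and that similarity is the cosine. The function names, the `window` parameter, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def sense_vector(definition_tokens, vectors):
    """Assumed representation: sum the word vectors of the words in an MRD definition."""
    total = Counter()
    for tok in definition_tokens:
        total.update(vectors.get(tok, {}))  # unknown words contribute nothing
    return total


# Toy corpus and toy "definitions" (hypothetical data, for illustration only).
corpus = [["deposit", "money", "bank"], ["river", "bank", "water"]]
vecs = cooccurrence_vectors(corpus)
finance_sense = sense_vector(["money", "deposit"], vecs)
river_sense = sense_vector(["river", "water"], vecs)
print(cosine(vecs["deposit"], finance_sense) > cosine(vecs["deposit"], river_sense))
```

On this toy data the context word "deposit" is closer to the finance sense than to the river sense, which is the kind of sense-to-word similarity the disambiguation step can then use as an edge weight.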
Methodology • Disambiguation step • The definition of acyclic weighted digraph. • Selecting context words • Constructing the acyclic weighted digraph • Searching the optimal path on the acyclic weighted digraph
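The slides do not show how the graph is laid out, so the following is only a minimal sketch of a Viterbi-style search under stated assumptions: each word (context words, then the target word) contributes one layer of candidate senses, edges run only from one layer to the next, and the best path is the one with the largest sum of edge weights. The function `best_sense_path`, the sense labels, and the weights in `sims` are hypothetical.

```python
def best_sense_path(layers, weight):
    """Find the maximum-weight path through a layered acyclic digraph.

    layers: list of lists of sense labels, one list per word, in order
            (labels must be unique within a layer).
    weight: function (sense_a, sense_b) -> edge weight between adjacent layers.
    Returns (score, path).
    """
    if not layers or any(not layer for layer in layers):
        raise ValueError("every word needs at least one candidate sense")
    # best[s] = (score of the best path ending in sense s, that path)
    best = {s: (0.0, [s]) for s in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for s in layer:
            # Keep only the best predecessor for each sense (the Viterbi step).
            score, path = max(
                ((prev_score + weight(p, s), prev_path)
                 for p, (prev_score, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (score, path + [s])
        best = nxt
    return max(best.values(), key=lambda t: t[0])


# Toy similarities (made-up numbers, for illustration only).
sims = {
    ("deposit#1", "money#1"): 0.9,
    ("money#1", "bank#finance"): 0.8,
    ("money#1", "bank#river"): 0.1,
}
layers = [["deposit#1"], ["money#1"], ["bank#finance", "bank#river", "bank#row"]]
score, path = best_sense_path(layers, lambda a, b: sims.get((a, b), 0.0))
print(path)  # ['deposit#1', 'money#1', 'bank#finance']
```

Because each layer keeps only the best path into each of its senses, the work grows with the number of edges between adjacent layers rather than with the number of complete sense combinations, which is the complexity reduction the slides attribute to the Viterbi algorithm.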
Experiments • System results
Experiments • Experiment on English • The system's average accuracy is 30.7%. • This result is very low, for the following reasons. • The selected context words were not appropriate, even though context words are crucial because they decide which sense of the target word is chosen. • Mapping English senses to Korean through an English-Korean dictionary loses some information. • Errors in the stemming process made it difficult to find the correct root of a verb in the MRD.
Conclusion • The method considers the relationship between each sense of the context words and the sense of the target word. • The Viterbi algorithm is used to reduce computational complexity. • The system performed poorly on English (30.7% accuracy) but achieved reasonable performance, 76.4% accuracy, on semantically ambiguous Korean words. • The method could be applied to other languages by studying their language characteristics.
Comments • Advantage • The method considers the relationship between each sense of the context words and the sense of the target word. • The Viterbi algorithm reduces computational complexity. • Drawback • The system performs noticeably better on Korean than on English. • Application • Word sense disambiguation