Use of WordNet and on-line dictionaries to build EN-SK synsets (experimental tool)
Ján GENČI
Technical University of Košice, Slovakia
genci@tuke.sk
Plan
• WordNet, EuroWordNet + Slovak language
• Motivation
• Solution
• Results
• Future plans
WordNet, EuroWordNet
• Well-known projects
• WordNet defines the meanings of English words and the relationships between them (it defines synsets)
• EuroWordNet (EWN) is a very similar multilingual project
• EWN doesn’t contain the Slovak language (there is no Slovak WN)
Motivation
• Text classification tasks require dimensionality reduction and intelligent search
• Morphological database
• Something like WordNet
Our approach
• We decided to try using on-line dictionaries to map Slovak meanings to WordNet synset entries
• Two approaches:
  • Intersection of the translations of each member of an EN synset
  • Intersection of the translations of related words
Architecture
• [Diagram; components shown: input word, Synset Builder, WordNet DB, local DB, Internet on-line dictionaries]
Synset “members” translation
• According to WordNet, the word “computer” has 2 meanings, specified by 2 synsets:
  • {computer, computing machine, computing device, data processor, electronic computer, information processing system}
  • {calculator, reckoner, figurer, estimator, computer}
• The result is formed as the intersection of the translations of the synset members
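The intersection step can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `translate_synset`, the `TOY_DICT` table and its placeholder tokens (`sk_1`, `sk_2`, ...) are all hypothetical, and skipping members that the dictionary does not know is an assumption the slides do not state.

```python
from typing import Callable, Iterable, Optional, Set


def translate_synset(members: Iterable[str],
                     lookup: Callable[[str], Set[str]]) -> Set[str]:
    """Intersect the translation sets of all members of one synset."""
    result: Optional[Set[str]] = None
    for member in members:
        translations = lookup(member)
        if not translations:
            # Assumption: a member unknown to the dictionary is skipped
            # instead of forcing the whole intersection to be empty.
            continue
        result = set(translations) if result is None else result & translations
    return result or set()


# Hypothetical dictionary; sk_1 ... sk_4 stand in for Slovak translations.
TOY_DICT = {
    "calculator": {"sk_1", "sk_2"},
    "reckoner": {"sk_2", "sk_3"},
    "computer": {"sk_1", "sk_2", "sk_4"},
}


def toy_lookup(word: str) -> Set[str]:
    return TOY_DICT.get(word, set())


synset = ["calculator", "reckoner", "figurer", "estimator", "computer"]
candidates = translate_synset(synset, toy_lookup)

# A synset with a single member cannot be narrowed down at all.
single = translate_synset(["computer"], toy_lookup)
print(candidates, single)
```

With several members the candidate set shrinks to the translations they share; with a single member the intersection is simply every translation of that word, so no disambiguation takes place.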
Translation of related words
• Based on the hyponym/hyperonym relationships between words:
  • Related words are translated
  • The result is formed as the intersection of the partial translations
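One possible reading of this slide is sketched below: collect the word's hyperonym and hyponyms, translate each of them together with the word itself, and intersect the resulting sets. The slides do not say exactly which sets are intersected, so this is an assumption. The hierarchy `HYPERNYM_OF`, the table `TOY_DICT`, its placeholder tokens and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: child word -> its hyperonym.
HYPERNYM_OF: Dict[str, str] = {
    "desk": "table",
    "table": "furniture",
}

# Hypothetical dictionary; sk_1 ... sk_5 stand in for Slovak translations.
TOY_DICT: Dict[str, Set[str]] = {
    "table": {"sk_1", "sk_2", "sk_3"},
    "desk": {"sk_2", "sk_3"},
    "furniture": {"sk_2", "sk_5"},
}


def related_words(word: str) -> List[str]:
    """Return the hyperonym (if any) followed by the sorted hyponyms."""
    related: List[str] = []
    if word in HYPERNYM_OF:
        related.append(HYPERNYM_OF[word])
    related.extend(sorted(c for c, p in HYPERNYM_OF.items() if p == word))
    return related


def translate_via_related(word: str) -> Set[str]:
    """Intersect the partial translations of the word and its related words."""
    result: Optional[Set[str]] = None
    for candidate in [word] + related_words(word):
        partial = TOY_DICT.get(candidate, set())
        if not partial:
            continue  # assumption: untranslatable related words are ignored
        result = set(partial) if result is None else result & partial
    return result or set()


neighbours = related_words("table")
shared = translate_via_related("table")
print(neighbours, shared)
```

Unlike the synset-member approach, this variant still has something to intersect when the synset has only one member, provided the word has neighbours in the hierarchy.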
Results
• The tool offers 4 Slovak and 2 Czech on-line dictionaries (the Slovak dictionaries seem to come from a single source)
• The result depends on:
  • The number of members in the synset (a single-member synset is a problem)
  • The related words
  • The quality(?) of the dictionary
Results (cont.)
• Parts of speech are sometimes mixed (nouns and adjectives)
• We implemented a “multilingual view”
• The approach is time-consuming (quite slow), so results are stored in the database
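Storing results so that a slow on-line lookup runs only once per word could look roughly like this. It is a sketch under assumptions: the table name `translation_cache`, its schema, and the stand-in function `fake_online_lookup` are hypothetical, and an in-memory SQLite database replaces whatever local DB the tool actually uses.

```python
import json
import sqlite3
from typing import Callable, List


def make_cached_lookup(conn: sqlite3.Connection,
                       slow_lookup: Callable[[str], List[str]]
                       ) -> Callable[[str], List[str]]:
    """Wrap a slow lookup so each word is fetched at most once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translation_cache ("
        "word TEXT PRIMARY KEY, translations TEXT NOT NULL)"
    )

    def lookup(word: str) -> List[str]:
        row = conn.execute(
            "SELECT translations FROM translation_cache WHERE word = ?",
            (word,),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no network access
        translations = sorted(slow_lookup(word))
        conn.execute(
            "INSERT INTO translation_cache (word, translations) VALUES (?, ?)",
            (word, json.dumps(translations)),
        )
        conn.commit()
        return translations

    return lookup


calls: List[str] = []


def fake_online_lookup(word: str) -> List[str]:
    """Stand-in for a real dictionary request; records every call."""
    calls.append(word)
    return ["sk_2", "sk_1"]


conn = sqlite3.connect(":memory:")
cached = make_cached_lookup(conn, fake_online_lookup)
first = cached("computer")
second = cached("computer")  # served from the database
print(first, second, calls)
```

Only the first request reaches the (simulated) dictionary; the second is answered from the stored row.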
Example: word “computer”
Example: word “table”
Future work (plans)
• To deal with the “dictionary problem”
• To eliminate mixed parts of speech in the results (at least for the Slovak language, using a morphological database)
• To add support for other languages
Local copy of new webpage
• Addresses:
  • http://ruzin.fei.tuke.sk/~laposp
  • http://ruzin.fei.tuke.sk/~sudynova (new one)