Related terms search based on WordNet / Wiktionary and its application in ontology matching
RCDL'2009
St. Petersburg Institute for Informatics and Automation of RAS
Jönköping University, Sweden
Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com)
Contents
• Wiki and Wiktionary intro
• MRD, parser and Wiktionaries comparison
• Correlation of relatedness measures
• Experiment scheme
• Result and comparison
• Results, applications and future
Goal
• Is it possible to find related terms using the current version of Wiktionary as successfully as with WordNet?
  • for ontology matching,
  • for application in text search systems,
  • etc.
• What are the advantages?
Wiki-resources
• Distributed users and authors (edit pages)
• Centralized storage (e.g. MySQL, Apache, PHP)
• Set of hyperlinked articles
• Each article has one or more categories (tree)
* Example: http://en.wikipedia.org
Wiktionary is a free-content multilingual dictionary
Wiktionary data: +, -, simplicity & complexity
• Different Wiktionaries have different levels of standardization.
• Fast-growing data, but it is created by a huge community (so the parser must be very stable)
• Rich data:
  • thesaurus (synonyms, antonyms)
  • phrase books
  • etymologies
  • pronunciations
  • sample quotations
  • translations
• Interwiki links (additional data)
• GNU FDL
Wiktionary machine-readable dictionary database scheme
Size of Wiktionaries
WordNet (2006): 150,000 words, 115,000 synsets
Correlation of relatedness measures
Correlation between human judgments (the 353-TC test collection) and relatedness measures based on WordNet, English Wikipedia, and Russian Wiktionary
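Agreement between a relatedness measure and human judgments is commonly reported as a rank correlation. The following is a minimal sketch of Spearman's rank correlation in Java; the slide does not say which coefficient was used, so treating it as Spearman is an assumption, and the class name `SpearmanSketch` and the sample scores are made up for illustration.

```java
import java.util.Arrays;

/** Sketch: Spearman rank correlation between human scores and a measure. */
final class SpearmanSketch {

    /** Returns 1-based ranks; tied values receive their average rank. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values equal to values[order[i]].
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // mean of positions i+1 .. j+1
            for (int k = i; k <= j; k++) ranks[order[k]] = avg;
            i = j + 1;
        }
        return ranks;
    }

    /** Pearson correlation; yields NaN if either input is constant. */
    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0.0);
        double my = Arrays.stream(y).average().orElse(0.0);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman correlation = Pearson correlation of the ranks. */
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Made-up scores for five word pairs (not data from the experiment).
        double[] human = {7.5, 3.0, 9.1, 1.2, 5.5};
        double[] measure = {0.60, 0.35, 0.90, 0.10, 0.20};
        System.out.println("Spearman correlation: " + spearman(human, measure));
    }
}
```

Working on ranks rather than raw scores makes the comparison independent of each measure's scale, which matters when WordNet-based and Wiktionary-based measures produce values in different ranges.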
Application of a machine-readable dictionary (MRD)
Thesaurus data:
• Related terms search
• Search request extension (by synonyms) / request reformulation (in search systems)
• Request recognition in question-answering systems
• Word sense disambiguation
Media data (audio + pictures):
• Language learning
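Search request extension by synonyms can be sketched in a few lines of Java. This is an illustration only: the in-memory `Map` stands in for thesaurus data that would come from the MRD, and the class name `QueryExpansionSketch` and the sample synonyms are hypothetical, not part of the wikokit API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch: extend a search request with synonyms from a thesaurus. */
final class QueryExpansionSketch {

    /**
     * Returns the lower-cased query terms followed by their synonyms,
     * without duplicates and in first-seen order.
     */
    static Set<String> expand(List<String> queryTerms,
                              Map<String, List<String>> synonyms) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String term : queryTerms) {
            String key = term.toLowerCase(Locale.ROOT);
            expanded.add(key);
            // Terms missing from the thesaurus contribute only themselves.
            expanded.addAll(synonyms.getOrDefault(key, List.of()));
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Hypothetical thesaurus entries; real data would be read from the MRD.
        Map<String, List<String>> thesaurus = Map.of(
                "car", List.of("automobile", "auto"),
                "fast", List.of("quick"));
        System.out.println(expand(List.of("Fast", "car"), thesaurus));
    }
}
```

A `LinkedHashSet` keeps the original terms ahead of their synonyms, so a search system could still weight the user's own words more heavily.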
Work plan: done and todo
Russian Wiktionary
• Extraction (by RE): Definition, Relations (synonyms…), Translation, Audio, Graphics
• Database API
• Visualization (MRD browser)
• Quiz & tests (test application)
Russian Wiktionary
• Database scheme: Definition, Relations (synonyms…), Translation, Audio, Graphics
• Database API
English Wiktionary
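Extraction by regular expressions (RE) can be illustrated with a small Java sketch that pulls link targets out of one section of an entry. The wikitext layout used here (a `==== Synonyms ====` heading followed by `[[...]]` links) is a simplified assumption; the actual Russian Wiktionary markup and the real parser are not shown in these slides, and the class name `WikitextRegexSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract link targets from one section of wiki markup. */
final class WikitextRegexSketch {

    // Simplified, made-up entry text (not copied from any Wiktionary).
    static final String SAMPLE = String.join("\n",
            "==== Synonyms ====",
            "# [[automobile]], [[auto#English|auto]]",
            "",
            "==== Antonyms ====",
            "# [[pedestrian]]");

    // [[target]], [[target|label]] or [[target#anchor|label]]; group 1 = target.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|#]+)[^\\]]*\\]\\]");

    /** Returns link targets found under the given level-4 heading. */
    static List<String> extractLinks(String wikitext, String heading) {
        // Capture everything after the heading line up to the next heading
        // (a line starting with "==") or the end of the text.
        Pattern section = Pattern.compile(
                "^====[ \\t]*" + Pattern.quote(heading)
                        + "[ \\t]*====[ \\t]*$(.*?)(?=^==|\\z)",
                Pattern.MULTILINE | Pattern.DOTALL);

        List<String> targets = new ArrayList<>();
        Matcher sectionMatcher = section.matcher(wikitext);
        if (!sectionMatcher.find()) {
            return targets; // section absent: nothing to extract
        }
        Matcher linkMatcher = LINK.matcher(sectionMatcher.group(1));
        while (linkMatcher.find()) {
            targets.add(linkMatcher.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println("Synonyms: " + extractLinks(SAMPLE, "Synonyms"));
        System.out.println("Antonyms: " + extractLinks(SAMPLE, "Antonyms"));
    }
}
```

Because entries are edited by a large community, a real parser has to tolerate missing sections and inconsistent formatting; returning an empty list for an absent section is one small example of that defensiveness.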
Implementation
• Software based on Synarcher code
• Java
• MySQL or SQLite database
• JUnit test framework
Results
• The scheme of the experiment for calculating the semantic relatedness measure based on Russian Wiktionary data
• The parser of the Russian Wiktionary
• Database scheme designed
• Database API implemented in Java
• Compared the results of related terms search based on Wiktionary and WordNet
• Project site (Wiki tool kit): http://code.google.com/p/wikokit/
Future work
• Finish creating the MRD
  • Database and software
  • Russian Wiktionary and English Wiktionary
• Visualization (JavaFX)
  • MRD browser
  • Quiz & tests (learning application)
• Online application (Java Web Start)