This paper presents the results and evaluation of the Hungarian Nominal WordNet project. It discusses the automatic methods used, the evaluation process, the combination of results, and future work. The project aims to expand the Hungarian WordNet by creating a nominal database. The evaluation shows the precision and coverage of the different methods used in creating the WordNet. Further work includes increasing precision and coverage, adding multiwords and derivational links, and upgrading to WordNet 2.0.
Results and Evaluation of Hungarian Nominal WordNet v1.0 Márton Miháltz MorphoLogic The 2nd Global WordNet Conference, 2004, Brno
Outline 1. About the Hungarian WordNet project 2. Automatic methods 3. Evaluation 4. Combination of results 5. Future work
Hungarian WN project 1. • Started in 2001 • MA Thesis; MorphoLogic project • 1st (current) phase: nominal database • Minimizing costs: • Expand method
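Hungarian WN project 1. [Figure: relation diagram with Hypernym, Meronym and Antonym links. Hungarian words paired with English synsets: ember, személy, individuum / {human, person}; hozzátartozó, rokon / {parent}; test / {body}; anya / {mother}; apa / {father}; kar / {arm}; láb / {leg}]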
Hungarian WN project 2. • Started in 2000 • MA Thesis; MorphoLogic project • 1st phase: nominal database • Minimizing costs: • Expand method • Semantic Similarity Hypothesis • Automatic methods
Hungarian WN project 3. • Ambiguity problem: ló → horse, knight; horse → {horse, Equus caballus}, {horse} (gymnastic apparatus), {knight, horse} (chess figure); knight → {knight, horse} (chess figure), {knight} (person of noble origin) (avg. 1.71) (avg. 2.16) • 9 disambiguation heuristics • (Atserias et al., 1997)
Hungarian WN project 4. • Electronic resources: • Princeton WN 1.6 • Hungarian-English bilingual dictionary • 17,000 / 12,400 headwords (WN) • Monolingual (Hungarian) explanatory dictionary • 42,000 nominal entries • 64,000 definitions
Disambiguation Heuristics 1. A) Heuristics based on bilingual dictionary: • Monosemous translation • Variant English words • Intersection method [Diagrams: links between Hungarian words (Hu, Hu1, Hu2, …), their English translations (En1, En2, …) and synsets ({ss1})]
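The bilingual-dictionary heuristics above can be illustrated with a minimal sketch. The slide does not spell out the exact definitions, so the code assumes two simple readings: a monosemous translation is an English word with exactly one candidate synset, and a shared-synset check links a Hungarian word to any synset that contains two or more of its English translations. The toy dictionaries, the synset labels and the function names are all hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical toy data: a bilingual dictionary and an English word -> synsets map.
# The synset labels are made up for this example.
BILINGUAL = {"ló": ["horse", "knight"], "koala": ["koala"]}
WN = {
    "horse": ["horse.animal", "horse.gym", "knight.chess"],
    "knight": ["knight.chess", "knight.noble"],
    "koala": ["koala.animal"],
}

def monosemous_links(bilingual, wn):
    """Link a Hungarian word to a synset when an English translation
    has exactly one candidate synset (assumed reading of the heuristic)."""
    links = set()
    for hu, translations in bilingual.items():
        for en in translations:
            synsets = wn.get(en, [])
            if len(synsets) == 1:
                links.add((hu, synsets[0]))
    return links

def shared_synset_links(bilingual, wn):
    """Link a Hungarian word to every synset that contains at least two
    of its English translations (assumed reading of the heuristic)."""
    links = set()
    for hu, translations in bilingual.items():
        members = defaultdict(set)
        for en in translations:
            for synset in wn.get(en, []):
                members[synset].add(en)
        for synset, words in members.items():
            if len(words) >= 2:
                links.add((hu, synset))
    return links
```

On the toy data, the first function links koala to its single synset, and the second links ló to the chess-figure synset, because both horse and knight belong to it.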
Disambiguation Heuristics 2. A) Heuristics based on bilingual dictionary (cont’d): • Identifying derivational hypernyms: • Hungarian endocentric N+N compounds • Humor analyzer: • last segment (head) = hypernym • hangverseny+zongora → zongora • (‘concert+piano’ → ‘piano’) • Conceptual Distance
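The derivational hypernym rule for compounds can be sketched in a few lines. This assumes the compound has already been segmented by a morphological analyzer and is passed in with "+" between segments, following the notation on the slide; the function name is hypothetical, and the sketch does not check that the compound is actually endocentric.

```python
def derivational_hypernym(segmented):
    """Return the last segment (the head) of a '+'-segmented compound,
    or None when the word has fewer than two segments."""
    segments = [s for s in segmented.split("+") if s]
    if len(segments) < 2:
        return None
    return segments[-1]
```

For the slide's example, `derivational_hypernym("hangverseny+zongora")` returns `"zongora"`, while an unsegmented word such as `"zongora"` yields `None`.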
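Disambiguation Heuristics 3. B) Parsing monolingual definitions: • Synonyms: • lélekelemzés_1_1: A tudat alatti lelki jelenségek vizsgálata; pszichoanalízis [psychoanalysis] (‘the study of subconscious mental phenomena; psychoanalysis’) [diagram: Hu, Syn, En1, En2, …, {ss1}] • Hypernyms: • koala_1_1: Ausztráliában honos, fán élő, medvére emlékeztető erszényes emlős. [mammal] (‘a marsupial mammal native to Australia that lives in trees and resembles a bear’) [diagram: Hu, Hyp, Eni1 {ss1}, Enj1 {ss2}, min] • Latin equivalents: • ló_1_1 [horse]: Vontatásra és lovaglásra haszn., páratlan ujjú patás háziállat (Equus Caballus) (‘an odd-toed hoofed domestic animal used for hauling and riding’) [diagram: Hu, Lat, En1, {ss1}]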
Disambiguation Heuristics 4. C) Methods for increasing coverage (+9.2%): • Derivational hypernym of hyp./syn. [diagram: Hu, Hyp/Syn ( Eng), DerivHyp, Eng1, Eng2] • Lookup of hyp./syn. in monolingual dictionary [diagram: Hu, Hyp/Syn ( Eng), Monolingual: monosemous? YES, Hyp, Eng1, Eng2]
Results & Validation • Results from 9 unsupervised heuristics: • Total: 13,948 Hungarian nouns → 12,085 PWN synsets (22,169 connections) • Different methods: different confidence! • Validation: • Gold standard: 400 nouns sampled at random from bilingual dict./Hu • Manual disambiguation (2,201 possible connections) • Inter-annotator agreement (IAA): 84.7% • Evaluation of 9 result sets against GS • Precision: 49% to 92% • Coverage: 49% to 0.5%
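Evaluating one result set against the gold standard can be sketched as follows. The slide does not define the measures, so this assumes precision is the share of proposed connections for gold-standard nouns that the annotators marked correct, and coverage is the share of gold-standard nouns that receive at least one connection. The function name, argument names and the toy data in the usage note are hypothetical.

```python
def evaluate(proposed, gold_correct, gold_words):
    """Return (precision, coverage) of a set of (word, synset) links.

    proposed     -- links produced by one heuristic
    gold_correct -- links marked correct in the gold standard
    gold_words   -- nouns contained in the gold standard
    """
    # Only links for gold-standard nouns can be judged.
    judged = {(w, s) for (w, s) in proposed if w in gold_words}
    correct = judged & gold_correct
    precision = len(correct) / len(judged) if judged else 0.0
    covered = {w for (w, _) in judged}
    coverage = len(covered) / len(gold_words) if gold_words else 0.0
    return precision, coverage
```

As a usage example, with four gold-standard nouns, three judged links of which two are correct, and two nouns covered, the function returns a precision of 2/3 and a coverage of 0.5.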
Combining results 1. • Combining different result sets: • 2 different confidence thresholds • Methods 1-4: precision 75% (2,445 n, 2,170 ss) • Methods 1-6: precision 63% (12,275 n, 12,004 ss) • Validating and combining results not included in the previous step • 8 of 13 intersection sets: precision 75% • 9 intersection sets: precision 63%
Combining results 2. • Combination of the 2 base sets & the intersection sets w.r.t. the 2 thresholds
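One way to picture the combination step is the sketch below. It is not the procedure used in the project, only an illustration under stated assumptions: result sets whose estimated precision meets a threshold are merged as a base set, and pairwise intersections of the remaining sets are added when the intersection itself meets the threshold. The function names and the `estimate_precision` callback are hypothetical.

```python
from itertools import combinations

def combine(result_sets, estimate_precision, threshold):
    """Merge result sets that meet the precision threshold, then add
    pairwise intersections of the remaining sets that also meet it.

    result_sets        -- dict: method name -> set of (word, synset) links
    estimate_precision -- callable: set of links -> precision in [0, 1]
    threshold          -- minimum acceptable precision, e.g. 0.75
    """
    accepted = set()
    remaining = {}
    for name, links in result_sets.items():
        if estimate_precision(links) >= threshold:
            accepted |= links
        else:
            remaining[name] = links
    # Links proposed by two weaker methods at once may still be reliable.
    for a, b in combinations(sorted(remaining), 2):
        common = remaining[a] & remaining[b]
        if common and estimate_precision(common) >= threshold:
            accepted |= common
    return accepted
```

Running the function once per threshold (for example 0.75 and 0.63) would give two combined sets of different size and reliability, mirroring the two confidence levels on the slides.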
Further Work 1. • Increase precision: • Complete manual checking of words in synsets • Editing of hierarchies • Increase coverage: • Use additional bilingual dictionaries w/ best auto methods • Use Hung. taxonomies from monolingual dict. • Add multiwords • Add derivational links • Upgrade to WN 2.0
Further Work 2. • Funding from IKTA grant (2004-2007?): • Manual supervision • Connect to EuroWordNet Top Ontology/ILI • Do verbs (adjectives, adverbs) • Add special domain: financial terms
Thank you for your attention! Márton Miháltz http://people.inf.elte.hu/mmarcy/huwn/