Building and Ordering a SenDiS Lexicon Network
Oana Adriana Şoica
Outline
• SenDiS operates on a specific lexicon network (LexNet):
  - “sense tagged glosses” relations
  - lexicon networks obtained from other semantic/lexical relations
• Obtaining a SenDiS LexNet:
  - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
  - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
• Preprocessing (ordering) the SenDiS LexNet before WSD (word sense disambiguation):
  - truncation of the LexNet
  - leveling of the LexNet
Semantic/Lexical Relations
• synonyms
• antonyms
• holonyms
• meronyms
• coordinate terms
• troponyms
• entailment
[Figure: example relations labeled hypernyms, hyponyms, similar to, has part]
Semantic/Lexical Relations: WordNet
[Figure: an excerpt of the WordNet semantic network*]
* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10.
Obtaining a SenDiS LexNet
• manually annotating the glosses of a lexicon (using a specific tool that can ease the process)
• importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses
Creating the SenDiS LexNet
• required a significant effort, usually measured in months, involving several trained linguists
• used a specialized collaborative tool (BuildLNTool, the Build Lexicon Network Tool)
• enriched the “gloss tagged” relation with three relative degrees of importance (in the gloss context): weak, medium, strong; a gloss word can also be ignored
• SenDiS objective, two LexNets:
  - a “gloss tagged” LexNet for the Romanian language
  - a “gloss tagged” LexNet for the English language
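The “gloss tagged” relation with its relevance degrees can be pictured as a small data model. The sketch below is hypothetical: the names `Relevance`, `GlossToken` and `Meaning`, the `word#n` sense IDs, and the rule that ignored or untagged tokens contribute no arc are assumptions for illustration, not BuildLNTool's actual storage format. Only the three degrees, the “ignored” option and the Medium default come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


# Hypothetical data model (assumption): not BuildLNTool's real format.
class Relevance(Enum):
    """Relative importance of a gloss word in its gloss context."""
    IGNORED = 0
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


@dataclass
class GlossToken:
    text: str
    sense_id: Optional[str] = None           # meaning this token is tagged with, if any
    relevance: Relevance = Relevance.MEDIUM  # the tool's default setting is Medium


@dataclass
class Meaning:
    meaning_id: str
    gloss: list[GlossToken] = field(default_factory=list)

    def arcs(self) -> list[tuple[str, str, Relevance]]:
        """LexNet arcs contributed by this gloss: (meaning, tagged sense, relevance).
        Assumption: untagged and ignored tokens produce no arc."""
        return [
            (self.meaning_id, t.sense_id, t.relevance)
            for t in self.gloss
            if t.sense_id is not None and t.relevance is not Relevance.IGNORED
        ]
```

For instance, a gloss whose tokens are tagged `slope#1` (weak) and `land#1` (strong), plus one ignored token and one untagged token, yields exactly two weighted arcs.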
BuildLNTool
BuildLNTool (Build Lexicon Network Tool) provides:
• a visual and effective mechanism to manually annotate the lexicon glosses
• a synchronized overview of the already created relations
• a browsing mechanism for inspecting the already tagged glosses and relations
BuildLNTool – Sections
[Screenshot: the BuildLNTool window with the sections “Lemmas & MWEs”, “Lemma/MWE Info”, “Root & Leaf Meanings”, “Competence & Definition Trees”, and Messages and progress]
BuildLNTool – Sections II
• “Lemmas & MWEs”: list of lexicon entries (lemmas and multiword expressions)
• “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
• “Lemma/MWE Info”: the current lexicon entry being analyzed
• “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
• a section for messages and progress
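Roots, leaves and per-meaning spanning trees can be computed directly from the arc list. This is a sketch under stated assumptions: an arc `(src, dst)` is read as “`dst` is tagged in the gloss of `src`”, roots are meanings with no incoming arc, leaves are meanings with no outgoing arc, and the definition tree of a meaning is taken to be a BFS spanning tree over the meanings reachable from it. The function names are hypothetical, and the slides do not say how BuildLNTool actually builds its competence and definition trees.

```python
from collections import deque


def roots_and_leaves(arcs):
    """arcs: iterable of (src, dst), assumed to mean 'dst is tagged in the gloss of src'.
    Roots have no incoming arc; leaves have no outgoing arc."""
    vertices, has_in, has_out = set(), set(), set()
    for src, dst in arcs:
        vertices.update((src, dst))
        has_out.add(src)
        has_in.add(dst)
    return sorted(vertices - has_in), sorted(vertices - has_out)


def spanning_tree(arcs, start):
    """BFS spanning tree of the meanings reachable from `start` (hypothetical
    definition tree). Returns a dict child -> parent; `start` maps to None."""
    succ = {}
    for src, dst in arcs:
        succ.setdefault(src, []).append(dst)
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in succ.get(v, []):
            if w not in parent:  # the first visit fixes the tree edge; cycles are skipped
                parent[w] = v
                queue.append(w)
    return parent
```

Under the same assumption, a competence tree (meanings whose definitions depend on a given meaning) would be the same traversal over the reversed arcs.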
BuildLNTool – Lemmas & MWEs
Screenshot callouts:
• selection of lexicon entry type
• selection of viewing interval
• selection of unfinished lexicon entries
• filter text
• filter
• lexicon entry text
• lexicon entry status
BuildLNTool – Selection of a current lexicon entry
• double-click an entry in the list to make it the current lexicon entry
BuildLNTool – Browsing the meanings of the current lexicon entry
Screenshot callouts:
• lexicon entry text
• morphologic interpretation
• list of meanings
• filters
• meaning/gloss status: fully tagged, partially tagged, not tagged
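The three gloss states (fully, partially, not tagged) and the meaning filters can be expressed as a small classification. This is a minimal sketch, assuming each gloss is a list of `(text, sense_id, ignored)` triples and that a token counts as done when it carries a sense or was explicitly ignored; `GlossStatus`, `gloss_status` and `filter_meanings` are hypothetical names, not part of BuildLNTool.

```python
from enum import Enum
from typing import Optional, Sequence


class GlossStatus(Enum):
    FULLY_TAGGED = "fully tagged"
    PARTIALLY_TAGGED = "partially tagged"
    NOT_TAGGED = "not tagged"


def gloss_status(tokens: Sequence[tuple[str, Optional[str], bool]]) -> GlossStatus:
    """tokens: (text, sense_id, ignored) triples for one gloss.
    Assumption: a token is done if it has a sense or was explicitly ignored."""
    done = sum(1 for _text, sense, ignored in tokens if ignored or sense is not None)
    if tokens and done == len(tokens):
        return GlossStatus.FULLY_TAGGED
    if done == 0:
        return GlossStatus.NOT_TAGGED
    return GlossStatus.PARTIALLY_TAGGED


def filter_meanings(glosses, wanted: GlossStatus):
    """glosses: {meaning_id: tokens}. Keep the meanings whose status is `wanted`."""
    return [m for m, toks in glosses.items() if gloss_status(toks) is wanted]
```

An empty gloss is reported as not tagged here; that choice is arbitrary and could reasonably go the other way.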
BuildLNTool – Selection of a current meaning for tagging
• double-click a meaning in the list to select it for tagging
BuildLNTool – Gloss constituent without interpretations
Screenshot callouts:
• unrecognized gloss constituent
• ‘Enter’
BuildLNTool – Degrees of relevance (in gloss context)
Default setting: Medium
BuildLNTool – Degrees of relevance II
• ‘Strong’ tokens
• ‘Medium’ tokens
• ‘Weak’ tokens
• Ignored (X) tokens
BuildLNTool – Gloss tagging
Screenshot callouts:
• saved annotations
• unsaved annotations
BuildLNTool – Gloss tagging protocol
Screenshot callouts:
• view of the meaning tagging tree
• selection of a constituent / group of gloss constituents
• edit the text of a gloss constituent
• set/modify the relevance degree
• without sense interpretations
• current gloss constituent
• select/modify the sense for the gloss constituent
• further annotate the meaning / save annotations
• further on, choose the next meaning
• save annotations
Imported WordNet tagged glosses
• WordNet (3.0) is organized in synsets:
  - 117,659 synsets
  - 155,287 words (lexicon entries)
  - 206,941 word-sense pairs (gloss + usage examples)
• the synsets were split and transformed into a classical lexicon format
• the resulting lexicon network was imported
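Splitting synsets into a classical lexicon format means giving every word its own entry, with one meaning per word-sense pair. The sketch below assumes a synset is a `(synset_id, lemmas, gloss)` triple; the function name, the IDs `s1`/`s2` and the glosses are made-up toy data, not WordNet's own records or the format SenDiS used.

```python
from collections import defaultdict


def synsets_to_lexicon(synsets):
    """synsets: iterable of (synset_id, lemmas, gloss) triples (assumed layout).
    Returns {lemma: [(synset_id, gloss), ...]}: one lexicon entry per word and one
    meaning per word-sense pair; synonyms share the gloss of their synset."""
    lexicon = defaultdict(list)
    for synset_id, lemmas, gloss in synsets:
        for lemma in lemmas:
            lexicon[lemma].append((synset_id, gloss))
    return dict(lexicon)
```

With two toy synsets, `("s1", ["bank"], "sloping land beside water")` and `("s2", ["bank", "depository financial institution"], "an institution that accepts deposits")`, the result has 2 entries but 3 word-sense pairs. The same effect explains why WordNet's word-sense pair count exceeds both its synset and word counts.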
Ordering a SenDiS LexNet
• “gloss tagged” lexicon nets are large and dense graphs:
  - between 100,000 and 200,000 vertices
  - over 1,000,000 edges/arcs
• to ease operations on such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized by:
  - truncation of the lexicon net
  - leveling of the lexicon net
• aims when optimizing a lexicon net:
  - elimination of loops or strongly connected components
  - a minimum number of removed edges
  - leveling on a minimum number of levels
  - minimization/maximization of root/leaf vertices
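One standard way to eliminate loops, swapped in here as an illustration and not necessarily the truncation method SenDiS uses, is to run a depth-first search and drop every back edge (an arc pointing to a vertex still on the current DFS path). The result is always acyclic, but the number of removed edges is not guaranteed to be minimal: finding a minimum feedback arc set is NP-hard. The DFS is iterative so that graphs with 100,000+ vertices do not hit Python's recursion limit; `remove_back_edges` is a hypothetical name.

```python
def remove_back_edges(succ):
    """succ: dict vertex -> list of successors.
    Returns (dag, removed): the graph without its DFS back edges, and those edges.
    Heuristic: always acyclic, not guaranteed to remove the fewest edges."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS path / finished
    color = {}
    for v, ws in succ.items():
        color.setdefault(v, WHITE)
        for w in ws:
            color.setdefault(w, WHITE)
    removed = set()
    for start in color:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(succ.get(start, ())))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if color[w] == GREY:
                    removed.add((v, w))  # back edge: closes a cycle (includes self-loops)
                elif color[w] == WHITE:
                    color[w] = GREY
                    stack.append((w, iter(succ.get(w, ()))))
                    break  # descend; this iterator resumes when we come back to v
            else:
                color[v] = BLACK  # all successors explored
                stack.pop()
    dag = {v: [w for w in succ.get(v, ()) if (v, w) not in removed] for v in color}
    return dag, removed
```

Which edges are removed depends on the DFS start order. A smarter ordering, or a dedicated feedback-arc-set heuristic, could serve the “minimum number of removed edges” aim better.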
[Figure A: Unordered LexNet. A minimal lexicon net in its original form.]
[Figure B: Ordered (leveled) LexNet. The same minimal lexicon net leveled (level axis numbered 1 to 11; vertices e1 to e11).]
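Once the net is acyclic, one common leveling scheme, assumed here because the slides do not specify SenDiS's own, is longest-path layering. Leaves get level 1, and every other vertex sits one level above its highest successor, so each arc points strictly downward. This uses the minimum possible number of levels (the vertex count of the longest path), matching the “minimum number of levels” aim, but it makes no attempt at the root/leaf aims. The direction (leaves at the bottom) and the name `level_dag` are assumptions.

```python
from collections import deque


def level_dag(succ):
    """Longest-path leveling of a DAG given as vertex -> list of successors.
    Assumption: leaves are level 1 and each vertex is one above its highest
    successor, so every arc goes strictly down and the level count is minimal."""
    vertices = set(succ)
    for ws in succ.values():
        vertices.update(ws)
    pred = {v: [] for v in vertices}
    out_deg = {v: 0 for v in vertices}
    for v, ws in succ.items():
        for w in ws:
            pred[w].append(v)
            out_deg[v] += 1
    level = {v: 1 for v in vertices if out_deg[v] == 0}  # leaves
    queue = deque(level)
    processed = 0
    while queue:
        w = queue.popleft()
        processed += 1
        for v in pred[w]:
            level[v] = max(level.get(v, 1), level[w] + 1)
            out_deg[v] -= 1
            if out_deg[v] == 0:  # all successors leveled, so level[v] is final
                queue.append(v)
    if processed != len(vertices):
        raise ValueError("graph has a cycle; truncate it before leveling")
    return level
```

The check at the end makes the dependency explicit: leveling only works on a net that has already been truncated to a DAG.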