SenDiS

Sectoral Operational Programme "Increase of Economic Competitiveness"
"Investments for your future"
General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS
Project co-financed by the European Regional Development Fund
SenDiS: WSD model, components, algorithms, methods & results
Andrei Mincă - aminca@softwin.ro

SenDiS WSD model

SenDiS System components

WSD phases
  - Order Lexicon Network (OLN)
  - Build Meaning Semantic Signatures (BMSS)
  - Compare Meaning Semantic Signatures (CMSS)
  - Compute WSD Variants (CwsdV)

OLN Algorithms
  Input: unordered lexicon network
  Lexicon network optimizations considering:
    - number of edges
    - loops or strongly connected components
    - number of roots and leaves
    - number of levels (in the case of leveling the LN)
  Output: ordered lexicon network
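The leveling step can be illustrated with a minimal sketch. It assumes the lexicon network is given as (source, target) edge pairs and that loops have already been removed or collapsed; the function name and the "longest path from a root" definition of a level are assumptions made for illustration, not the SenDiS algorithm itself.

```python
from collections import deque

def order_lexicon_network(edges):
    """Return (levels, roots, leaves) for an acyclic network of (src, dst) edges.

    Hypothetical helper: raises ValueError if a loop is still present.
    """
    nodes, successors, in_degree = set(), {}, {}
    for src, dst in edges:
        nodes.update((src, dst))
        successors.setdefault(src, []).append(dst)
        in_degree[dst] = in_degree.get(dst, 0) + 1

    roots = sorted(n for n in nodes if in_degree.get(n, 0) == 0)
    leaves = sorted(n for n in nodes if n not in successors)

    # Kahn-style traversal: a node's level is its longest distance from a root.
    levels = {n: 0 for n in roots}
    remaining = dict(in_degree)
    queue = deque(roots)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors.get(node, []):
            levels[nxt] = max(levels.get(nxt, 0), levels[node] + 1)
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                queue.append(nxt)

    if visited != len(nodes):
        raise ValueError("network contains a loop; collapse it before leveling")
    return levels, roots, leaves
```

For the edges entity -> animal, animal -> dog and entity -> dog, the sketch reports one root (entity), one leaf (dog) and three levels.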

BMSS Algorithms
  Input:
    - a lexicon network (not necessarily ordered)
    - a meaning (ID)
  Builds a semantic interpretation for the specified meaning over the lexicon network:
    - spanning trees
    - sets of nodes
    - sequences of edges
    - or combinations of the above
  Output: a semantic interpretation (signature) for the meaning
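One of the signature types listed above, a set of nodes, can be sketched as a depth-limited breadth-first traversal from the meaning ID. The dictionary representation of the network, the max_depth parameter and the meaning IDs in the usage note are all assumptions for illustration.

```python
from collections import deque

def build_signature(network, meaning_id, max_depth=2):
    """Return {meaning ID: depth} for every node reachable within max_depth.

    network is assumed to map a meaning ID to a list of related meaning IDs.
    """
    signature = {meaning_id: 0}
    queue = deque([meaning_id])
    while queue:
        node = queue.popleft()
        depth = signature[node]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in network.get(node, []):
            if neighbour not in signature:
                signature[neighbour] = depth + 1
                queue.append(neighbour)
    return signature
```

Calling build_signature(network, "bank#1", max_depth=2) on a toy network returns the start meaning at depth 0 plus everything one or two edges away; the recorded depths also describe a breadth-first spanning tree rooted at the meaning.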

CMSS Algorithms
  Input: two or more semantic signatures
  The comparison depends on the nature of the semantic signatures
  Output: degrees of similarity
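When the signatures are sets of nodes, one possible comparison is the Jaccard coefficient. This is a stand-in chosen for illustration; the slides do not say which measures SenDiS uses.

```python
def jaccard_similarity(sig_a, sig_b):
    """Degree of similarity in [0, 1] between two node-set signatures."""
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        return 0.0  # assumption: two empty signatures share nothing
    return len(a & b) / len(a | b)
```

Two signatures that share two of four distinct nodes get a degree of similarity of 0.5.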

CwsdV Algorithms
  Input: a matrix with degrees of similarity between the senses of the context words
  Output: one or several WSD variants with the highest cost
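A brute-force sketch of this step follows. It assumes that the cost of a variant is the sum of the pairwise similarities between the chosen senses, and that the matrix is stored as a dictionary keyed by ((word index, sense index), (word index, sense index)); both are illustrative assumptions, and exhaustive search is only practical for small windows.

```python
from itertools import combinations, product

def best_wsd_variants(sense_counts, similarity, top_n=1):
    """Return the top_n (cost, variant) pairs, highest cost first.

    sense_counts[i] is the number of senses of context word i; a variant is a
    tuple holding one sense index per word.
    """
    scored = []
    for variant in product(*(range(n) for n in sense_counts)):
        cost = sum(
            similarity[(i, variant[i]), (j, variant[j])]
            for i, j in combinations(range(len(variant)), 2)
        )
        scored.append((cost, variant))
    # Highest cost first; ties are broken by the variant tuple for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]
```

Raising top_n returns several ranked variants instead of a single one.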

WSD methods
  Input:
    - text
    - list of meanings
    - lexicon network
  Computing:
    - tokenization of text
    - annotation of text tokens with meaning interpretations
    - selecting a window-text for WSD
    - other context filters or topologies
    - build meaning semantic signatures for each word-sense
    - compare meaning semantic signatures and fill the matrix
    - compute best WSD variants
  Output: one or more WSD variants with one or more meaning interpretations for each text token
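The steps above can be chained into a toy end-to-end method. Everything in this sketch is a simplifying assumption: tokens are lowercased word characters, the window is just the first few tokens found in the meaning inventory, a signature is a meaning plus its direct neighbours, similarity is the Jaccard coefficient, and only the single best variant is returned.

```python
import re
from itertools import combinations, product

def disambiguate(text, meanings, network, window=3):
    """Return [(token, meaning ID)] for the best-scoring variant.

    meanings: word -> list of meaning IDs; network: meaning ID -> related IDs.
    """
    # Tokenize and keep the first `window` tokens that have known meanings.
    tokens = re.findall(r"\w+", text.lower())
    targets = [t for t in tokens if meanings.get(t)][:window]

    # Build one node-set signature per candidate meaning.
    signature = {
        m: {m, *network.get(m, [])} for t in targets for m in meanings[t]
    }

    def similarity(m1, m2):
        a, b = signature[m1], signature[m2]
        return len(a & b) / len(a | b)

    # Score every combination of senses and keep the best one.
    best_score, best_variant = -1.0, ()
    for variant in product(*(meanings[t] for t in targets)):
        score = sum(similarity(a, b) for a, b in combinations(variant, 2))
        if score > best_score:
            best_score, best_variant = score, variant
    return list(zip(targets, best_variant))
```

With a two-sense entry for "bank" and a one-sense entry for "money", the financial sense of "bank" wins whenever its neighbours overlap with those of "money".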

General WSD requirements
  - tokenization
  - part-of-speech tagging
  - lemmatization
  - sense interpretations
  - chunking
  - parsing

Testing WSD
  Performance indicators:
    - P (precision): P = noCorrectlyDisambiguated_TargetWords / noDisambiguated_TargetWords
    - R (recall): R = noCorrectlyDisambiguated_TargetWords / noTargetWords
    - F-measure: F = 2 * P * R / (P + R)
  State-of-the-art results (F-measure):
    - lexical sample task: coarse-grained ~90%, fine-grained ~73%
    - all-words task: coarse-grained ~83%, fine-grained ~65%
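The three indicators translate directly into code. The function and parameter names below are illustrative, and returning 0.0 when a denominator is zero is an assumption rather than something the slides specify.

```python
def wsd_scores(n_correct, n_disambiguated, n_targets):
    """Return (precision, recall, f_measure) from the three raw counts."""
    precision = n_correct / n_disambiguated if n_disambiguated else 0.0
    recall = n_correct / n_targets if n_targets else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, a system that attempts 80 of 100 target words and gets 60 right has P = 0.75 and R = 0.60, so F = 2 * 0.75 * 0.60 / 1.35, roughly 0.667.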

Testing SenDiS
  A test configuration for SenDiS consists of:
    - a meaning inventory
    - a lexicon network
    - an OLN algorithm
    - a BMSS algorithm
    - a CMSS algorithm
    - a CwsdV algorithm
    - a WSD method
    - a test corpus
  Number of test configurations:
    nMIs x nLNs x nOLNs x nBMSSs x nCMSSs x nCwsdVs x nWSDMs x nCorpusTests
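Because every component can vary independently, the set of test configurations is a Cartesian product. The sketch below enumerates it; the component keys and the option names in the usage note are placeholders, not actual SenDiS components.

```python
from itertools import product
from math import prod

def enumerate_configurations(options):
    """Yield one dict per test configuration.

    options maps a component name to the list of available choices.
    """
    names = list(options)
    for combo in product(*(options[name] for name in names)):
        yield dict(zip(names, combo))

def count_configurations(options):
    """Number of configurations: the product of the option counts."""
    return prod(len(choices) for choices in options.values())
```

With two hypothetical BMSS algorithms, three CMSS algorithms and one choice for each remaining component, this gives 2 x 3 = 6 configurations to run.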

SenDiS Results

SenDiS Tagged glosses as a Test Corpus

