SenDiS. Sectoral Operational Programme "Increase of Economic Competitiveness" "Investments for your future". General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS -. Project co-financed by the European Regional Development Fund.
SenDiS
Sectoral Operational Programme "Increase of Economic Competitiveness"
"Investments for your future"
General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS -
Project co-financed by the European Regional Development Fund
SenDiS – WSD model, components, algorithms, methods & results
Andrei Mincă - aminca@softwin.ro
SenDiS WSD model
SenDiS System components
SenDiS WSD phases
- Order Lexicon Network (OLN)
- Build Meaning Semantic Signatures (BMSS)
- Compare Meaning Semantic Signatures (CMSS)
- Compute WSD Variants (CwsdV)
SenDiS OLN Algorithms
Input: unordered lexicon network
Lexicon network optimizations, considering:
- number of edges
- loops or strongly connected components
- number of roots and leaves
- number of levels (in the case of leveling the LN)
Output: ordered lexicon network
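As an illustration of the leveling idea only, here is a minimal Python sketch that orders a directed network by levels using Kahn's topological algorithm. It is not the SenDiS OLN algorithm: the edge-list representation, the function name `level_lexicon_network`, the choice of "longest path from a root" as the level, and the example words are all assumptions made for this example. Nodes caught in a loop cannot be leveled this way and are reported separately.

```python
from collections import deque

def level_lexicon_network(edges):
    """Level a directed network given as (source, target) pairs.

    Roots (no incoming edges) get level 0; every other node gets the
    length of the longest path reaching it from a root. Nodes that sit
    on a loop, or are reachable only through one, are returned in
    `unplaced` because their in-degree never drops to zero.
    """
    nodes = {n for edge in edges for n in edge}
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    level = {n: 0 for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    placed = {}
    while queue:
        node = queue.popleft()
        placed[node] = level[node]  # all predecessors are already placed
        for nxt in successors[node]:
            level[nxt] = max(level[nxt], level[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    unplaced = nodes - set(placed)
    return placed, unplaced

# Hypothetical toy network: three leveled nodes plus a two-node loop.
edges = [("entity", "animal"), ("animal", "dog"), ("entity", "dog"),
         ("x", "y"), ("y", "x")]
placed, unplaced = level_lexicon_network(edges)
```

A real ordering step would also have to decide what to do with the unplaced nodes, for example by collapsing each strongly connected component into a single node before leveling.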
SenDiS BMSS Algorithms
Input:
- a lexicon network (not necessarily ordered)
- a meaning (ID)
Builds a semantic interpretation for the specified meaning over the lexicon network:
- spanning trees
- sets of nodes
- sequences of edges
- or combinations of the above
Output: a semantic interpretation (signature) for the meaning
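To make the "sets of nodes" variant concrete, the sketch below builds a signature as the set of nodes reachable from a meaning within a fixed number of steps, each tagged with its distance. This is one possible reading, not the SenDiS implementation: the adjacency-dict representation, the `max_depth` parameter, and the sense IDs such as `bank#1` are hypothetical.

```python
from collections import deque

def build_signature(network, meaning_id, max_depth=2):
    """Breadth-first signature: node -> distance from `meaning_id`.

    `network` maps a meaning ID to the IDs it points to. Only nodes
    within `max_depth` edges of the starting meaning are collected.
    """
    signature = {meaning_id: 0}
    queue = deque([meaning_id])
    while queue:
        node = queue.popleft()
        depth = signature[node]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in network.get(node, ()):
            if neighbour not in signature:
                signature[neighbour] = depth + 1
                queue.append(neighbour)
    return signature

# Hypothetical fragment of a lexicon network.
network = {
    "bank#1": ["money#1", "institution#1"],
    "money#1": ["currency#1"],
    "currency#1": ["value#1"],
}
signature = build_signature(network, "bank#1", max_depth=2)
```

Recording the predecessor of each node during the same traversal would yield a spanning tree instead of a plain node set.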
SenDiS CMSS Algorithms
Input: two or more semantic signatures
- comparison depends on the nature of the semantic signatures
Output: degrees of similarity
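Since the comparison depends on the kind of signature, the following sketch covers only one case: signatures that are sets of nodes, compared with the Jaccard index (shared nodes divided by all nodes). The Jaccard measure is a stand-in chosen for this example, not a measure the slides name, and the function name and sample node IDs are hypothetical.

```python
def jaccard_similarity(signature_a, signature_b):
    """Degree of similarity in [0, 1] between two node-set signatures."""
    a, b = set(signature_a), set(signature_b)
    if not a and not b:
        return 0.0  # two empty signatures share nothing measurable
    return len(a & b) / len(a | b)

# Hypothetical signatures for two senses.
sig_bank = {"bank#1", "money#1", "institution#1"}
sig_deposit = {"deposit#1", "money#1", "institution#1"}
similarity = jaccard_similarity(sig_bank, sig_deposit)  # 2 shared / 4 total
```

Tree or edge-sequence signatures would need a different measure, such as overlap of paths or tree edit distance.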
SenDiS CwsdV Algorithms
Input: a matrix with degrees of similarity between the senses of the context words
Output: one or several WSD variants with the highest cost
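One straightforward way to pick the highest-cost variants is exhaustive search: try every combination of one sense per word and sum the pairwise similarities. The sketch below assumes that cost definition, a similarity table keyed by unordered sense pairs, and hypothetical names throughout; the actual SenDiS CwsdV algorithms are not described in the slides. Exhaustive search grows exponentially with the number of context words, so it is only practical for small windows.

```python
from itertools import combinations, product

def best_variants(senses_per_word, similarity, top_n=1):
    """Return the `top_n` sense assignments with the highest total cost.

    `senses_per_word` maps each word to its candidate senses.
    `similarity` maps frozenset({sense_a, sense_b}) to a degree of
    similarity; missing pairs count as 0.0.
    """
    words = list(senses_per_word)
    scored = []
    for choice in product(*(senses_per_word[w] for w in words)):
        cost = sum(similarity.get(frozenset(pair), 0.0)
                   for pair in combinations(choice, 2))
        scored.append((cost, dict(zip(words, choice))))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]

# Hypothetical two-word context.
senses = {
    "bank": ["bank#finance", "bank#river"],
    "deposit": ["deposit#money", "deposit#sediment"],
}
sim = {
    frozenset({"bank#finance", "deposit#money"}): 0.8,
    frozenset({"bank#river", "deposit#sediment"}): 0.5,
    frozenset({"bank#finance", "deposit#sediment"}): 0.1,
}
best = best_variants(senses, sim, top_n=2)
```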
SenDiS WSD methods
Input:
- text
- list of meanings
- lexicon network
Computing:
- tokenization of text
- annotation of text tokens with meaning interpretations
- selecting a window-text for WSD
- other context filters or topologies
- build meaning semantic signatures for each word-sense
- compare meaning semantic signatures and fill the matrix
- compute best WSD variants
Output: one or more WSD variants with one or more meaning interpretations for each text token
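The first three computing steps (tokenization, annotation, window selection) can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are lowercased word characters, the meaning inventory is a plain dict from word form to sense IDs, and the window is a fixed radius around a target token. The function names and the sample inventory are hypothetical, and a real system would lemmatize before the inventory lookup.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())

def annotate(tokens, inventory):
    """Pair each token with its candidate meanings (empty if unknown)."""
    return [(tok, list(inventory.get(tok, ()))) for tok in tokens]

def select_window(annotated, target_index, radius):
    """Keep the target token plus `radius` tokens on each side."""
    start = max(0, target_index - radius)
    return annotated[start:target_index + radius + 1]

# Hypothetical meaning inventory and input text.
inventory = {
    "bank": ["bank#finance", "bank#river"],
    "deposit": ["deposit#money"],
}
tokens = tokenize("The bank holds the deposit.")
annotated = annotate(tokens, inventory)
window = select_window(annotated, target_index=1, radius=1)
```

The window then feeds the remaining steps: a signature is built for each candidate sense, signatures are compared pairwise to fill the similarity matrix, and the best variants are computed from it.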
SenDiS general WSD requirements
- tokenization
- part-of-speech tagging
- lemmatization
- sense interpretations
- chunking
- parsing
SenDiS Testing WSD
Performance indicators:
- P - precision: P = noCorrectlyDisambiguated_TargetWords / noDisambiguated_TargetWords
- R - recall: R = noCorrectlyDisambiguated_TargetWords / noTargetWords
- F-measure: F = 2 * P * R / (P + R)
State-of-the-art results (F-measure):
- lexical sample task
  - coarse-grained: ~90%
  - fine-grained: ~73%
- all-words task
  - coarse-grained: ~83%
  - fine-grained: ~65%
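The three indicators follow directly from three counts. The sketch below computes them; the function name, the zero-division handling, and the sample counts are assumptions made for illustration and are not results from SenDiS.

```python
def wsd_scores(n_correct, n_disambiguated, n_targets):
    """Return (precision, recall, f_measure) from raw counts.

    n_correct       -- target words disambiguated correctly
    n_disambiguated -- target words the system gave an answer for
    n_targets       -- all target words in the test corpus
    """
    precision = n_correct / n_disambiguated if n_disambiguated else 0.0
    recall = n_correct / n_targets if n_targets else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Made-up counts: 100 target words, 80 attempted, 60 correct.
precision, recall, f_measure = wsd_scores(60, 80, 100)
```

With these counts precision is 0.75 and recall is 0.6; precision and recall coincide only when the system answers for every target word.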
SenDiS Testing SenDiS
A test configuration for SenDiS consists of:
- a meaning inventory
- a lexicon network
- an OLN algorithm
- a BMSS algorithm
- a CMSS algorithm
- a CwsdV algorithm
- a WSD method
- a Corpus test
Number of test configurations:
nMIs x nLNs x nOLNs x nBMSSs x nCMSSs x nCwsdVs x nWSDMs x nCorpusTests
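The product above is the size of a Cartesian product over the eight component choices. As a rough sketch, the configurations can be counted and enumerated with `itertools.product`; every option label below (`inventory_A`, `levels`, and so on) is a hypothetical placeholder, not a component that SenDiS is known to ship.

```python
from itertools import product
from math import prod

def count_configurations(components):
    """Multiply the number of options available for each component."""
    return prod(len(options) for options in components.values())

def iter_configurations(components):
    """Yield every test configuration as a dict: component -> option."""
    names = list(components)
    for combo in product(*(components[name] for name in names)):
        yield dict(zip(names, combo))

# Placeholder option lists, one per component of a test configuration.
components = {
    "meaning_inventory": ["inventory_A"],
    "lexicon_network": ["network_A", "network_B"],
    "oln_algorithm": ["levels"],
    "bmss_algorithm": ["node_set", "spanning_tree"],
    "cmss_algorithm": ["jaccard", "overlap"],
    "cwsdv_algorithm": ["exhaustive"],
    "wsd_method": ["window"],
    "corpus_test": ["corpus_A"],
}
n_configs = count_configurations(components)  # 1*2*1*2*2*1*1*1
configs = list(iter_configurations(components))
```

Because the count multiplies, adding a single alternative to any one component scales the whole test matrix.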
SenDiS Results
SenDiS Tagged glosses as a Test Corpus