
SenDiS

SenDiS: General Word Sense Disambiguation System applied to Romanian and English Languages. Sectoral Operational Programme "Increase of Economic Competitiveness", "Investments for your future". Project co-financed by the European Regional Development Fund.


Presentation Transcript


  1. SenDiS: WSD model, components, algorithms, methods & results
     - Sectoral Operational Programme "Increase of Economic Competitiveness", "Investments for your future"
     - General Word Sense Disambiguation System applied to Romanian and English Languages (SenDiS)
     - Project co-financed by the European Regional Development Fund
     - Andrei Mincă, aminca@softwin.ro

  2. SenDiS WSD model

  3. SenDiS System components

  4. SenDiS WSD phases
     - Order Lexicon Network (OLN)
     - Build Meaning Semantic Signatures (BMSS)
     - Compare Meaning Semantic Signatures (CMSS)
     - Compute WSD Variants (CwsdV)

  5. SenDiS OLN Algorithms
     - Input: unordered lexicon network
     - Lexicon network optimizations, considering:
       - number of edges
       - loops or strongly connected components
       - number of roots and leaves
       - number of levels (in the case of leveling the LN)
     - Output: ordered lexicon network
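The quantities listed on the OLN slide (edges, loops, roots, leaves, levels) can be measured on a small directed graph. The following is only a minimal sketch, assuming the lexicon network is a plain dictionary mapping each node to the nodes it points to; the function name `network_stats` and the Kahn-style leveling are illustrative assumptions, not the ordering algorithm SenDiS actually uses.

```python
from collections import deque


def network_stats(edges):
    """Collect basic statistics for a directed lexicon network.

    `edges` maps each node to the list of nodes it points to
    (a hypothetical representation chosen for this sketch).
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1

    roots = sorted(n for n in nodes if indegree[n] == 0)
    leaves = sorted(n for n in nodes if not edges.get(n))

    # Kahn-style traversal: a node's level is the longest path from any root.
    level = {n: 0 for n in roots}
    remaining = dict(indegree)
    queue = deque(roots)
    processed = 0
    while queue:
        n = queue.popleft()
        processed += 1
        for t in edges.get(n, []):
            level[t] = max(level.get(t, 0), level[n] + 1)
            remaining[t] -= 1
            if remaining[t] == 0:
                queue.append(t)

    has_cycle = processed < len(nodes)  # nodes on a loop are never dequeued
    return {
        "edges": sum(len(t) for t in edges.values()),
        "roots": roots,
        "leaves": leaves,
        "levels": (max(level.values()) + 1) if level and not has_cycle else None,
        "has_cycle": has_cycle,
    }


stats = network_stats({"entity": ["animal", "plant"], "animal": ["dog"]})
print(stats)
```

Levels are reported only for loop-free networks here; a network with loops would first need its strongly connected components handled, which this sketch does not attempt.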

  6. SenDiS BMSS Algorithms
     - Input:
       - a lexicon network (not necessarily ordered)
       - a meaning (ID)
     - Builds a semantic interpretation for the specified meaning over the lexicon network:
       - spanning trees
       - sets of nodes
       - sequences of edges
       - or combinations of the above
     - Output: a semantic interpretation (signature) for the meaning
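One of the signature types named on the BMSS slide is a set of nodes. A minimal sketch of that variant follows, assuming the lexicon network is a dictionary from a meaning ID to the IDs it links to and that the signature is everything reachable within a fixed depth. The function `build_signature`, the `max_depth` parameter, and the sample IDs are hypothetical; the slides do not specify how SenDiS builds its signatures.

```python
from collections import deque


def build_signature(network, meaning_id, max_depth=2):
    """Return the set of nodes reachable from `meaning_id` within `max_depth` edges.

    This is a breadth-first traversal used as an illustrative stand-in
    for a "set of nodes" semantic signature.
    """
    signature = {meaning_id}
    queue = deque([(meaning_id, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in network.get(node, []):
            if neighbour not in signature:
                signature.add(neighbour)
                queue.append((neighbour, depth + 1))
    return signature


# Hypothetical toy network
network = {
    "bank#1": ["money", "institution"],
    "money": ["currency"],
    "currency": ["value"],
}
print(sorted(build_signature(network, "bank#1")))
```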

  7. SenDiS CMSS Algorithms
     - Input: two or more semantic signatures
     - The comparison depends on the nature of the semantic signatures
     - Output: degrees of similarity
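When signatures are sets of nodes, one simple way to obtain a degree of similarity is the Jaccard index (size of the intersection divided by size of the union). This is an assumed choice for illustration only; the slides say the comparison depends on the nature of the signatures and do not name a specific measure.

```python
def jaccard_similarity(sig_a, sig_b):
    """Degree of similarity in [0, 1] between two set-valued signatures."""
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        return 0.0  # convention for this sketch: two empty signatures share nothing
    return len(a & b) / len(a | b)


# Two shared nodes out of four distinct nodes
print(jaccard_similarity({"money", "institution", "loan"}, {"money", "loan", "river"}))  # 0.5
```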

  8. SenDiS CwsdV Algorithms
     - Input: a matrix with degrees of similarity between the senses of the context words
     - Output: one or several WSD variants with the highest cost
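The CwsdV step can be pictured as a search over sense assignments. The sketch below assumes that a WSD variant picks one sense per context word and that its cost is the sum of pairwise similarities between the chosen senses. Both the scoring rule and the exhaustive search are assumptions made for illustration (the slides do not define the cost), and the enumeration grows exponentially with the number of words.

```python
from itertools import combinations, product


def best_variants(senses_per_word, similarity):
    """Return (best_score, variants) over all one-sense-per-word assignments.

    `senses_per_word` is a list of sense-ID lists, one per context word.
    `similarity` maps frozenset({sense_a, sense_b}) to a degree of similarity;
    missing pairs count as 0.0.
    """
    best_score = None
    best = []
    for variant in product(*senses_per_word):
        score = sum(
            similarity.get(frozenset(pair), 0.0)
            for pair in combinations(variant, 2)
        )
        if best_score is None or score > best_score:
            best_score, best = score, [variant]
        elif score == best_score:
            best.append(variant)  # keep ties: "one or several" variants
    return best_score, best


# Hypothetical sense IDs and similarity values
senses = [["bank#1", "bank#2"], ["river#1"]]
sim = {
    frozenset({"bank#1", "river#1"}): 0.1,
    frozenset({"bank#2", "river#1"}): 0.8,
}
print(best_variants(senses, sim))
```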

  9. SenDiS WSD methods
     - Input:
       - text
       - list of meanings
       - lexicon network
     - Computing:
       - tokenization of text
       - annotation of text tokens with meaning interpretations
       - selecting a window-text for WSD
       - other context filters or topologies
       - build meaning semantic signatures for each word-sense
       - compare meaning semantic signatures and fill the matrix
       - compute best WSD variants
     - Output: one or more WSD variants with one or more meaning interpretations for each text token

  10. SenDiS General WSD requirements
      - tokenization
      - part-of-speech tagging
      - lemmatization
      - sense interpretations
      - chunking
      - parsing

  11. SenDiS Testing WSD
      - Performance indicators:
        - P (precision): P = noCorrectlyDisambiguated_TargetWords / noDisambiguated_TargetWords
        - R (recall): R = noCorrectlyDisambiguated_TargetWords / noTargetWords
        - F-measure: F = 2 * P * R / (P + R)
      - State-of-the-art results (F-measure):
        - lexical sample task: coarse-grained ~90%, fine-grained ~73%
        - all-words task: coarse-grained ~83%, fine-grained ~65%
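The three performance indicators defined on this slide translate directly into code. A minimal sketch follows; the function name `wsd_scores` and the zero-denominator convention (returning 0.0) are assumptions, and the counts in the usage example are made up for illustration, not SenDiS results.

```python
def wsd_scores(n_correct, n_disambiguated, n_target):
    """Compute precision, recall, and F-measure for a WSD run.

    n_correct       -- target words disambiguated correctly
    n_disambiguated -- target words the system attempted
    n_target        -- all target words in the test corpus
    """
    precision = n_correct / n_disambiguated if n_disambiguated else 0.0
    recall = n_correct / n_target if n_target else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts: 60 correct out of 80 attempted, 100 target words in total
p, r, f = wsd_scores(60, 80, 100)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.75 R=0.60 F=0.67
```

Precision and recall differ only when the system leaves some target words undisambiguated; if every target word receives an answer, P, R, and F coincide.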

  12. SenDiS Testing SenDiS
      - A test configuration for SenDiS consists of:
        - a meaning inventory
        - a lexicon network
        - an OLN algorithm
        - a BMSS algorithm
        - a CMSS algorithm
        - a CwsdV algorithm
        - a WSD method
        - a corpus test
      - Number of test configurations: nMIs x nLNs x nOLNs x nBMSSs x nCMSSs x nCwsdVs x nWSDMs x nCorpusTests
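The product on this slide is the size of a Cartesian product over the available choices for each component. A short sketch of enumerating and counting such configurations is given below; the component keys mirror the slide, while the option names (`"LN-A"`, `"trees"`, and so on) are placeholders invented for the example.

```python
from itertools import product
from math import prod


def enumerate_configurations(options):
    """Yield one dict per test configuration (Cartesian product of all options)."""
    names = list(options)
    for combo in product(*(options[name] for name in names)):
        yield dict(zip(names, combo))


def count_configurations(options):
    """Number of configurations: the product of the option counts."""
    return prod(len(choices) for choices in options.values())


# Placeholder option names, for illustration only
options = {
    "meaning_inventory": ["MI-1"],
    "lexicon_network": ["LN-A", "LN-B"],
    "oln": ["OLN-1"],
    "bmss": ["trees", "node-sets", "edge-sequences"],
    "cmss": ["CMSS-1"],
    "cwsdv": ["CwsdV-1"],
    "wsd_method": ["WSDM-1"],
    "corpus_test": ["corpus-1"],
}
print(count_configurations(options))  # 1 * 2 * 1 * 3 * 1 * 1 * 1 * 1 = 6
```

Because the count multiplies across all eight components, adding even one alternative per component quickly produces a large test matrix.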

  13. SenDiS Results

  14. SenDiS Tagged glosses as a Test Corpus

  15. SenDiS
