
Dynamic Building of Domain Specific Lexicons Using Emergent Semantics

Final presentation by Matt Selway (100079967). Supervisor: Professor Markus Stumptner, Knowledge and Software Engineering Laboratory, School of Computer and Information Science.


Presentation Transcript


  1. Dynamic Building of Domain Specific Lexicons Using Emergent Semantics
     Final Presentation
     Matt Selway 100079967
     Supervisor: Professor Markus Stumptner
     Knowledge and Software Engineering Laboratory
     School of Computer and Information Science

  2. Contents
     • Motivations and Goals
     • Research Questions
     • Method
     • Experiments and Results
     • Summary and Conclusions
     • Limitations and Future Work

  3. Motivations and Goals
     • Kleiner et al. (2009) developed a very different approach to Natural Language Processing (NLP)
       • Treat NLP as a model transformation problem
       • Utilise configuration as a model transformation
     • Model transformation is the process of taking input models and creating output models from them
       • Foundation of Model Driven Engineering
     • Configuration is a constraint-based search technique
       • In this case the constraints are conformance to the desired meta model
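
To make "configuration as constraint-based search" concrete, here is a minimal sketch. It is not the approach of Kleiner et al. (2009): the lexicon, the single sentence pattern standing in for the meta model, and the function names are all assumptions made up for illustration. A real configurator would prune the search space rather than enumerate it.

```python
from itertools import product

# Hypothetical predefined lexicon: each word lists the categories it may take.
CANDIDATES = {
    "the": ["article"],
    "customer": ["noun"],
    "orders": ["noun", "verb"],
    "a": ["article"],
    "book": ["noun", "verb"],
}

# Toy stand-in for a meta model: the only sentence shape that conforms.
SENTENCE_PATTERN = ("article", "noun", "verb", "article", "noun")


def conforms(categories):
    """Constraint check: the category sequence must match the pattern."""
    return tuple(categories) == SENTENCE_PATTERN


def configure(words):
    """Enumerate category assignments and keep those that conform."""
    options = [CANDIDATES[word] for word in words]
    return [
        list(zip(words, combo))
        for combo in product(*options)
        if conforms(combo)
    ]


solutions = configure(["the", "customer", "orders", "a", "book"])
print(solutions)
```

Of the four candidate assignments, only the one that reads "orders" as a verb and "book" as a noun satisfies the constraint, which is how ambiguity in the lexicon gets resolved by the search.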

  4. Motivations and Goals
     • Overview of Process (Kleiner et al. 2009)
     • Method shows promising results
     • However, it requires the use of a predefined lexicon

  5. Motivations and Goals
     Issues for practical applications:
     • Manually building a complete lexicon can take a long time, even for a specific domain
     • A predefined lexicon is static
     • Reduces the level of automation

  6. Motivations and Goals
     Short-range Goals:
     • At least partially automated creation of domain specific lexicons directly from the input text, using external resources to retrieve lexical data
     • Make updates a natural part of the system
     • Allow sharing/reuse of lexical information
     Long-range Goals:
     • Improve the automated analysis of specifications
     • Support research into semantic interoperability
     • Develop global agreement on lexicons/ontologies

  7. Research Questions
     • Can we reduce or eliminate the need to manually predefine a lexicon by dynamically building a lexicon based on the input text?
       • How much of a reduction can be gained?
       • How well does it work? (i.e. accuracy of retrieved data, how much data is automatically retrieved)
       • What are its limitations?

  8. Method
     • Developed an experimental system
     • Attempted to use emergent semantics and semiotic dynamics in a way similar to that described by Steels and Hanappe (2006) for the interoperability of collective information systems
       • They propose a multi-agent system that uses communication to arrive at an agreement on the meaning of the data, its tags, and its categories
       • They take advantage of the semiotic triad between data, tags, and categories in user taxonomies (e.g. bookmarks in a web browser)
       • A semiotic triad implies a meaningful relationship between its three components

  9. Method
     Basic semiotic triad (Steels & Hanappe, 2006)
     • Similarly, a semiotic triad exists between a word, its use, and the domain in which it is used
     • The idea is that this triad can be used to develop domain specific lexicons dynamically between information agents
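
One way to picture the word/use/domain triad as data is sketched below. The presentation's actual lexicon structure (slide 12) is not recoverable from this transcript, so the `Triad` and `DomainLexicons` names, the fields, and the sample entries are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triad:
    """Hypothetical record linking a word, its use, and its domain."""
    word: str
    use: str      # here simply a lexical category
    domain: str


class DomainLexicons:
    """Keeps one lexicon per domain: domain -> word -> set of uses."""

    def __init__(self):
        self._by_domain = defaultdict(lambda: defaultdict(set))

    def add(self, triad):
        self._by_domain[triad.domain][triad.word].add(triad.use)

    def uses(self, word, domain):
        """Return the uses recorded for a word within one domain."""
        return set(self._by_domain.get(domain, {}).get(word, set()))


lexicons = DomainLexicons()
lexicons.add(Triad("order", "verb", "retail"))
lexicons.add(Triad("order", "noun", "biology"))

print(lexicons.uses("order", "retail"))
print(lexicons.uses("order", "biology"))
print(lexicons.uses("order", "finance"))
```

The same word resolves to different uses depending on the domain, which is the relationship the triad is meant to capture.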

  10. Method (Design)
      • Multi-agent system
        • Lexical information retrieved from other agents
        • Initial data downloaded from online sources
      • User feedback adjusts the retrieved data
        • Agents update their lexicons and their associations to lexicons based on user feedback (using the semiotic relationship)
        • Many changes indicate that the agents are actually using different domains
        • Few changes indicate updates to the lexicon within the same domain
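
The feedback loop in this design can be sketched roughly as follows. This is an illustration under stated assumptions, not the presented system: the `Agent` class, the reward, penalty, and threshold values, and the rule that a strength below the threshold means "different domain" are all hypothetical.

```python
class Agent:
    """Hypothetical agent that tracks how strongly it is associated with each source."""

    def __init__(self, initial=0.5, reward=0.1, penalty=0.2, threshold=0.3):
        # All four numbers are made-up values chosen for the example.
        self.initial = initial
        self.reward = reward
        self.penalty = penalty
        self.threshold = threshold
        self.lexicon = {}        # word -> category
        self.associations = {}   # source name -> strength in [0, 1]

    def apply_feedback(self, source, entries, corrections):
        """Store entries from a source, applying the user's corrections.

        entries: word -> category as retrieved from the source.
        corrections: word -> category as fixed by the user.
        """
        strength = self.associations.get(source, self.initial)
        for word, category in entries.items():
            if word in corrections:
                # The user changed this entry: weaken the association.
                self.lexicon[word] = corrections[word]
                strength = max(0.0, strength - self.penalty)
            else:
                # The entry was accepted as is: strengthen the association.
                self.lexicon[word] = category
                strength = min(1.0, strength + self.reward)
        self.associations[source] = strength
        return strength

    def shares_domain(self, source):
        """Few changes keep the strength above the threshold."""
        return self.associations.get(source, self.initial) >= self.threshold


agent = Agent()
agent.apply_feedback(
    "agent_a",
    {"customer": "noun", "rents": "verb", "car": "noun"},
    corrections={},
)
agent.apply_feedback(
    "agent_b",
    {"order": "verb", "family": "verb", "class": "noun"},
    corrections={"order": "noun", "family": "noun"},
)
print(agent.shares_domain("agent_a"), agent.shares_domain("agent_b"))
```

With these example values, the source whose entries needed no changes stays above the threshold, while the source that needed two corrections out of three drops below it and is treated as belonging to a different domain.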

  11. Method (Online Sources)
      • Surveyed online lexicons/ontologies (CYC, WordNet, EDR) and dictionaries (Oxford, ‘The Free Dictionary’, ‘Your Dictionary’)
      • Excluded CYC, WordNet, and EDR as not suitable
      • Turned to standard online dictionaries
      • Official dictionaries (Oxford/Harvard) not suitable, as access requires payment
      • Discovered ‘The Free Dictionary’
        • Large number of entries
        • Enough detail in definitions (Transitive/Intransitive Verbs, Definite/Indefinite Articles, etc.)
        • Reasonably standard pages for parsing
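
Extracting lexical categories from a dictionary page might look like the sketch below. The markup in `SAMPLE_PAGE` is invented for the example and is not the real structure of The Free Dictionary's pages; the `PosExtractor` class and the label table are likewise hypothetical. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Invented markup for illustration; a real page will be structured differently.
SAMPLE_PAGE = """
<div class="entry">
  <h2>order</h2>
  <section><i class="pos">n.</i> <span class="def">A condition of arrangement.</span></section>
  <section><i class="pos">tr.v.</i> <span class="def">To request to be supplied with.</span></section>
</div>
"""

# Hypothetical mapping from abbreviations to category names.
POS_LABELS = {
    "n.": "noun",
    "tr.v.": "transitive verb",
    "intr.v.": "intransitive verb",
}


class PosExtractor(HTMLParser):
    """Collects the text of every <i class="pos"> element."""

    def __init__(self):
        super().__init__()
        self._in_pos = False
        self.categories = []

    def handle_starttag(self, tag, attrs):
        if tag == "i" and ("class", "pos") in attrs:
            self._in_pos = True

    def handle_endtag(self, tag):
        if tag == "i":
            self._in_pos = False

    def handle_data(self, data):
        if self._in_pos:
            label = data.strip()
            # Unknown abbreviations are kept as they appear on the page.
            self.categories.append(POS_LABELS.get(label, label))


parser = PosExtractor()
parser.feed(SAMPLE_PAGE)
print(parser.categories)
```

A parser of this kind depends entirely on the page layout staying stable, which is why "reasonably standard pages" matters when choosing a source.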

  12. Method (Lexicon)

  13. Method (Agent Communication)

  14. Method (Agent Communication)

  15. Method (Agent Communication)

  16. Experiments and Results

  17. Experiments and Results

  18. Experiments and Results

  19. Experiments and Results

  20. Summary and Conclusions
      • It works!
      • How well?
        • A high percentage of words had data retrieved; however, too much unnecessary data reduces the effectiveness
        • Accuracy is affected by many factors
          • Incomplete/incorrect parsing of the web page
          • Small SBVR specification sample
          • SBVR keywords
      • Believe it is worth pursuing and improving
        • Fix parsing, use multiple sources
        • Define keyword lexicons, dynamically generate the rest
        • Fill in gaps/cull using words with only one category
        • Etc.
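
The two effects summarised here (many words get data, but much of it is unnecessary) can be expressed as two simple measures. The sketch below is not the evaluation used in the experiments, whose details are not in this transcript; the function names, the definitions, and the sample data are assumptions for illustration.

```python
def coverage(retrieved, words):
    """Fraction of words for which at least one category was retrieved."""
    found = sum(1 for word in words if retrieved.get(word))
    return found / len(words)


def precision(retrieved, expected):
    """Fraction of retrieved categories that are actually needed."""
    total = sum(len(categories) for categories in retrieved.values())
    correct = sum(
        len(categories & expected.get(word, set()))
        for word, categories in retrieved.items()
    )
    return correct / total if total else 0.0


# Made-up sample data, not results from the presentation.
words = ["customer", "orders", "book", "rental"]
retrieved = {
    "customer": {"noun"},
    "orders": {"noun", "verb"},
    "book": {"noun", "verb"},
    "rental": set(),
}
expected = {
    "customer": {"noun"},
    "orders": {"verb"},
    "book": {"noun"},
    "rental": {"noun"},
}

print(coverage(retrieved, words))
print(precision(retrieved, expected))
```

In this made-up sample, three of four words have data, yet only three of the five retrieved categories are needed, which mirrors the pattern of high retrieval with excess data.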

  21. Limitations and Future Work
      • Choice of dictionary
        • Potentially use multiple data sources
      • Joint words, i.e. most SBVR keywords
      • Implementation not perfect
        • Parsing of the data source
        • No synonyms
        • Communication protocol
        • Errors in adjusting association strengths
      • The strength adjustment values and threshold values used for lexicon classifiers need more research to find more appropriate values
      • Etc.

  22. References
      • Kleiner, M, Albert, P & Bézivin, J 2009, ‘Configuring Models for (Controlled) Languages’, in Proceedings of the IJCAI–09 Workshop on Configuration (ConfWS–09), Pasadena, CA, USA, pp. 61-68.
      • Farlex 2010, The Free Dictionary, viewed 11 September 2010, <www.thefreedictionary.com>.
      • Steels, L & Hanappe, P 2006, ‘Interoperability Through Emergent Semantics: A Semiotic Dynamics Approach’, in Journal on Data Semantics VI, vol. 4090, Springer Berlin / Heidelberg, pp. 143-167.
