
LIRICS WP2 – NLP Lexica

LIRICS WP2 – NLP Lexica. Monica Monachini (monica.monachini@ilc.cnr.it), CNR-ILC, Pisa, 23rd May 2006. Summary of the presentation: overview of WP2; first-year objectives; main results in T2.1 and T2.2; work done; synergies with other LIRICS WPs, ISO activities, meetings.


Presentation Transcript


  1. LIRICS WP2 – NLP Lexica Monica Monachini monica.monachini@ilc.cnr.it CNR-ILC - Pisa 23rd May 2006

  2. Summary of the presentation • Overview of WP2 • First-year objectives • Main results in T2.1 and T2.2 • Work done • Synergies with other LIRICS WPs, ISO activities, meetings • Priorities for future activities

  3. WP2 overall objective Define a “family” of standards for NLP lexicons. Two-level standards: • the high-level specifications provide structural elements, i.e. lexical classes and the relations between them (the meta-model); • the low-level specifications provide standardized constants, i.e. data categories used to “adorn” the lexical classes (ISO 12620)

  4. WP2 T2.1 overview and objectives • Gather, from past and ongoing standardization activities, the linguistic information considered relevant for lexical description, to be combined with the layers of the lexical model • Provide coherent input to the revision of the ISO Data Category Registry

  5. WP 2 T2.1 results • Proposal for a unified set of lexical information and unified descriptors as draft set of Data Categories • Maximum set of candidate lexical data categories subdivided along the layers of linguistic description: morphosyntax, syntax and semantics. • Data Categories shared between WP2 and WP3 relevant to Morphosyntactic description have been incorporated in the Syntax Tool: the Morphosyntactic Profile.

  6. WP2 T2.1 Deliverables • D.2.1 Survey and evaluation of existing standard for Lexica • D.2.1 Survey and evaluation of existing standard for Lexica (revision) (version foreseen in conjunction with the Data Categories, to be issued together with the data model in T2.2) • D.2.1 Survey and evaluation of existing standard for Lexica

  7. WP2 T2.2 overview and objectives • Define a lexical framework, a general and abstract meta-model as a set of structural nodes relevant for lexical description, enabling specific implementations on the basis of common Data Categories • Definition of the common set of related Data Categories

  8. WP2 T2.2 results • Formulation of a high-level lexical meta-model, the Lexical Markup Framework (LMF), a flexible environment for user-defined mark-up languages • Proofs of concept: exercises mapping well-known NLP lexicon practices onto the model

  9. WP2 T2.2 Deliverables • NLP Lexica standard for CD ballot (submitted at the beginning of 2006) • NLP Lexica standard for ISO DIS ballot • Internal milestone for internal quality control

  10. WP2 Activities, Meetings, Synergies...
      LIRICS WPs bilateral and trilateral working meetings:
      • CNR-ILC – MPI, 15.2.2005: PAROLE-SIMPLE lexical architecture and LEXUS tool
      • WP2 internal meeting, 16.2.2005: basic structure of the meta-model for lexicons (core model + extensions)
      • CNR-ILC – DFKI, 5.5.2005: convergences between morpho-syntactic and syntactic data; issues for the submission of the NWI on Syntax (SynAF) to ISO
      • Pisa, 23-24.11.2005. WP2 internal meeting: basic structure of the meta-model for representation of multiword expressions
      LIRICS meetings:
      • Paris, 16-17.3.2005. Progress of work within WP2. Presentation of the standard core model for lexicons and the extensions for NLP lexicons
      • Barcelona, 21-22.6.2005. LIRICS Industrial Advisory Board Meeting
      • Barcelona, 22.6.2005. Presentation of the first bulk of information relevant for lexical description
      • Nancy, 8-9.12.2005. WP4 TDG3 Workshop: connections between lexico-semantic representation and semantic roles in the lexicon
      ISO meetings:
      • Berlin, 8-9.4.2005. ISO TC37/SC4 WG4 meetings
      • Warsaw, 21-26.08.05. Plenary meeting of ISO TC37/SC4. Task force for designating generic data category sets for alignment with the levels of the metamodel; task force on the representation of MWEs
      • Rome, 27.10.2005. UNI-DIAM Commission: candidature of Italy as P-member in ISO TC37/SC4 (CNR-ILC reference expert)

  11. What is LMF for?
      • provide a common model for the creation and use of lexical resources
      • manage the exchange of data between and among these resources
      • enable the merging of electronic resources to form extensive global resources
      • Range of topics: monolingual, bilingual and multilingual lexical resources
      • Scalability: the same specifications are to be used for both small and large lexicons
      • Coverage:
        • linguistic description ranges from morphology, syntax and semantics to multilingual representation
        • languages are not restricted to European languages
        • the range of targeted NLP applications is not restricted

  12. Future activities/Priorities/Plans
      • Data Categories
        • deliver rev 2 of D2.1: candidate data categories will receive the necessary adjustments after discussion
        • extend the ISO Registry to cover further layers of linguistic description: do we need an ISO Syntactic Profile (Beijing)?
      • LMF model
        • refine the NLP multilingual and MWE extensions
        • XML representation of LMF linguistic objects, in order to allow unified access to LMF-conformant lexicons through APIs
      • Provide an implementation of test-suite lexical entries: PAROLE-SIMPLE lexicons ready to be described according to LMF (LEXUS), to be put on the LMF server and made accessible via the web

  13. Structure of LMF • Core package: structural skeleton, with the basic hierarchy of information in a lexical entry • Extensions: extend a subset of core-model classes; conform to the core model; cannot be used independently of the core model • LMF specifications comply with UML modeling principles

  14. Core package • Container for managing the top-level language components. • The number of words or MWEs of the lexicon is equal to the number of lexical entries in a given lexicon. • It is a cross-reference pivot that can link to many Lexical Entries within or across Lexicons. • Form consists of a text string that represents a single word or a multi-word expression. • One or more Representation Frames can be associated with Form, each of which contains a form and data categories that specify the orthographic types and name of the word. • Sense specifies or disambiguates the meaning and context of a form.
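The core classes named on this slide can be sketched as plain Python data classes. This is only an illustrative reading of the slide, not the LMF specification: the attribute names (`text`, `written_form`, `data_categories`, `sense_id`, `gloss`, `language`) and the `size` helper are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepresentationFrame:
    # A written form plus data categories describing its orthography.
    written_form: str
    data_categories: Dict[str, str] = field(default_factory=dict)


@dataclass
class Form:
    # Text string for a single word or a multi-word expression.
    text: str
    representation_frames: List[RepresentationFrame] = field(default_factory=list)


@dataclass
class Sense:
    # Specifies or disambiguates the meaning of a form.
    sense_id: str
    gloss: str = ""


@dataclass
class LexicalEntry:
    form: Form
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)

    def size(self) -> int:
        # One lexical entry per word or MWE, so the entry count is the lexicon size.
        return len(self.entries)


lexicon = Lexicon(language="en")
lexicon.entries.append(
    LexicalEntry(
        form=Form("bank", [RepresentationFrame("bank", {"script": "Latin"})]),
        senses=[Sense("bank_1", "financial institution"), Sense("bank_2", "river side")],
    )
)
lexicon.entries.append(LexicalEntry(form=Form("by and large")))
print(lexicon.size())
```

Data categories are kept as a free-form dictionary here so that the structural classes stay independent of the constants that adorn them, mirroring the two-level split between meta-model and data categories.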

  15. Package for extensional morphology 1st strategy: describe the morphology by explicitly representing all inflections

  16. Package for inflectional paradigm 2nd strategy: declare an inflectional paradigm; use the inflectional paradigm extension for defining it
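The difference between the two morphology strategies can be illustrated with a small Python sketch. It assumes a toy suffix-based paradigm; the names `PARADIGMS`, `regular_noun` and `inflect` are hypothetical and are not part of LMF.

```python
from typing import Dict

# Strategy 1 (extensional): every inflected form is listed explicitly in the entry.
extensional_entry: Dict[str, str] = {
    "singular": "table",
    "plural": "tables",
}

# Strategy 2 (paradigm): the entry only names a paradigm; forms are derived on demand.
# Each paradigm maps a grammatical feature to a suffix appended to the stem.
PARADIGMS: Dict[str, Dict[str, str]] = {
    "regular_noun": {"singular": "", "plural": "s"},
}


def inflect(stem: str, paradigm_name: str) -> Dict[str, str]:
    """Generate all inflected forms of `stem` from a named paradigm."""
    paradigm = PARADIGMS[paradigm_name]
    return {feature: stem + suffix for feature, suffix in paradigm.items()}


generated = inflect("table", "regular_noun")
print(generated == extensional_entry)
```

The extensional listing is simple to query but grows with every inflected form, whereas a declared paradigm is stored once and reused by all entries that follow it.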

  17. Package for NLP syntax • Syntactic Behavior represents one of the behaviors of one (or more) senses • Construction describes one syntactic construction and can be shared by all words with the same syntactic behavior • Self refers to the head lexical entry and describes its syntactic properties • Syntactic Argument describes a syntactic actant • Construction Set groups various Syntactic Constructions and factorizes syntactic descriptions, so that the lexicon holds a minimum of syntactic behavior elements.
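A rough Python sketch of how these syntax classes might relate is given below. The class names follow the slide, but the attributes (`function`, `category`, `self_features`, `sense_ids`) and the sample verbs are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyntacticArgument:
    function: str   # e.g. "subject"
    category: str   # e.g. "NP"


@dataclass
class Construction:
    construction_id: str
    # Properties of the head lexical entry (the "Self" of the slide).
    self_features: Dict[str, str] = field(default_factory=dict)
    arguments: List[SyntacticArgument] = field(default_factory=list)


@dataclass
class SyntacticBehavior:
    sense_ids: List[str]
    construction: Construction


@dataclass
class ConstructionSet:
    set_id: str
    constructions: List[Construction] = field(default_factory=list)


transitive = Construction(
    "transitive",
    self_features={"pos": "verb"},
    arguments=[SyntacticArgument("subject", "NP"), SyntacticArgument("object", "NP")],
)
intransitive = Construction(
    "intransitive", {"pos": "verb"}, [SyntacticArgument("subject", "NP")]
)

# A set groups constructions so that alternating verbs can point at one object.
alternating = ConstructionSet("trans_intrans", [transitive, intransitive])

# Two verbs share the same Construction instead of each repeating its description.
behaviors = {
    "eat": SyntacticBehavior(["eat_1"], transitive),
    "read": SyntacticBehavior(["read_1"], transitive),
}
shared = behaviors["eat"].construction is behaviors["read"].construction
print(shared)
```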

  18. XML representation

  19. Package for NLP semantics • Predicative Representation describes the link between Sense and Semantic Predicate • Semantic Predicate describes an abstract meaning • Semantic Argument describes a semantic actant and is linked with its syntactic counterpart
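The link between senses, predicates and arguments might be sketched in Python as follows. The attribute names (`role`, `syntactic_function`, `label`, `sense_id`) and the helper `roles_by_function` are hypothetical; the slide only names the three classes and their relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticArgument:
    role: str                 # semantic actant, e.g. "agent"
    syntactic_function: str   # its syntactic counterpart, e.g. "subject"


@dataclass
class SemanticPredicate:
    label: str                # abstract meaning, independent of any one word
    arguments: List[SemanticArgument] = field(default_factory=list)


@dataclass
class PredicativeRepresentation:
    sense_id: str
    predicate: SemanticPredicate


def roles_by_function(representation: PredicativeRepresentation) -> Dict[str, str]:
    """Map each syntactic function to the semantic role it realizes."""
    return {
        arg.syntactic_function: arg.role
        for arg in representation.predicate.arguments
    }


ingest = SemanticPredicate(
    "INGEST",
    [SemanticArgument("agent", "subject"), SemanticArgument("patient", "object")],
)
eat_sense = PredicativeRepresentation("eat_1", ingest)
print(roles_by_function(eat_sense))
```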

  20. Package for NLP semantics (cont.)

  21. XML representation

  22. Package for NLP semantics (cont.)

  23. Package for Multilingual representation • Sense Axis Relation describes the link between two different Sense Axes • Source Test and Target Test express conditions on the translation on the source/target language side
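One possible reading of the multilingual package is sketched below in Python. It assumes that a Sense Axis groups equivalent senses across languages and that a source test is a condition on the translation context; the `translate` function, the `Context` type and the sample river data are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, str]


@dataclass
class SenseAxis:
    axis_id: str
    senses: Dict[str, str]   # language code -> sense identifier
    # Optional condition on the source side; None means "always applies".
    source_test: Optional[Callable[[Context], bool]] = None


@dataclass
class SenseAxisRelation:
    source_axis_id: str
    target_axis_id: str
    label: str


def translate(
    sense_id: str,
    source_lang: str,
    target_lang: str,
    context: Context,
    axes: List[SenseAxis],
) -> Optional[str]:
    """Return the target sense of the first axis whose source test passes."""
    for axis in axes:
        if axis.senses.get(source_lang) != sense_id:
            continue
        if axis.source_test is not None and not axis.source_test(context):
            continue
        return axis.senses.get(target_lang)
    return None


axes = [
    SenseAxis(
        "A1",
        {"en": "river_1", "fr": "fleuve_1"},
        source_test=lambda ctx: ctx.get("flows_into") == "sea",
    ),
    SenseAxis("A2", {"en": "river_1", "fr": "riviere_1"}),
]
# The relation records that axis A1 is a narrower case of axis A2.
relation = SenseAxisRelation("A1", "A2", "more specific")

print(translate("river_1", "en", "fr", {"flows_into": "sea"}, axes))
print(translate("river_1", "en", "fr", {}, axes))
```

Ordering the conditional axis before the unconditional one makes the unconditional axis act as a fallback when no test matches.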

  24. Package for Multiword expressions
