
XMELLT

XMELLT: Cross-lingual Multi-word Expression Lexicons for Language Technology. Multilingual Information Access and Management International Research Co-operation. Nancy Ide, Department of Computer Science, Vassar College.





Presentation Transcript


  1. XMELLT: Cross-lingual Multi-word Expression Lexicons for Language Technology • Multilingual Information Access and Management International Research Co-operation • Nancy Ide, Department of Computer Science, Vassar College

  2. Participants • Department of Computer Science, Vassar College • International Computer Science Institute, University of California, Berkeley • Department of Computer Science, New York University • Computing Research Laboratory, New Mexico State University

  3. Framework • Planning project • one-year time frame • Originally submitted as a joint NSF-EU project with additional European partners • Istituto di Linguistica Computazionale, CNR, Pisa • Institut für Maschinelle Sprachverarbeitung, Stuttgart • LexiQuest, Paris

  4. Overall goal • define a core international infrastructure to support the creation of a multi-lingual multi-word expression lexicon incorporating both morpho-syntactic and semantic information

  5. Specific aims • determine the type and dimensions of information to serve the needs of critical NLP applications • specify an overall architecture for a joint software and lingware development project

  6. Aims... • Explore the possibilities for recognizing and acquiring multi-word lexical units from corpora by means of partial parsing, statistics, etc. • Outline a collaborative project to acquire and represent multi-word lexical entries for multiple languages
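Slide 6 names statistics as one route to acquiring multi-word units but does not say which statistics. As one illustration only, a widely used association measure is pointwise mutual information (PMI) over adjacent word pairs. The sketch below assumes that measure, plain whitespace tokenization, and a hypothetical function name `pmi_bigrams`; it is not the project's method.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), estimated from raw counts.
    Pairs seen fewer than min_count times are skipped, because PMI
    strongly overrates rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "the board made a decision . the board made a decision again".split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 3))
```

Because PMI favors rare pairs, it is usually combined with a frequency threshold (here `min_count`) or replaced by a measure such as the log-likelihood ratio. Candidates proposed by partial parsing, for example only noun-noun or verb-object pairs, could be scored the same way.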

  7. Motivation • Multi-word constructions are extremely frequent in language • ~30% of the lexical stock • Existing resources do not adequately treat multi-word expressions

  8. Limitations • constructed for a particular system or application • incorporate tailored information (e.g., primarily syntax with little semantics) • not reusable • most are devoted to a single language and/or approach

  9. Limitations... • not flexible or expandable to multiple languages • MT systems' lexicons are typically little more than "translation memories" • No interface among single-word entries, multi-word entries, syntax, and semantics

  10. XMELLT Approach • Broad view of multi-word expressions • idioms, compounds, collocations, co-occurrence patterns • focus on linking of individual language lexicons • individual words and multi-word expressions • different types of multi-word expressions • e.g., English noun-noun vs. Romance noun-PP
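Linking individual-language lexicons, as slide 10 describes, can be pictured as language-specific entries attached to a shared, language-neutral node. That lets an English noun-noun compound point to a Romance noun-PP construction, or to a single-word German compound. The class names, fields, and the identifier "coffee_cup" below are hypothetical illustrations, not the project's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexEntry:
    """One language's realization of a concept (hypothetical structure)."""
    lang: str        # ISO 639-1 language code, e.g. "en"
    form: str        # citation form of the word or multi-word expression
    pattern: str     # part-of-speech pattern, e.g. "N N", "N P N", "N"
    multiword: bool  # False when the language uses a single (compound) word


@dataclass
class LinkedConcept:
    """Language-neutral node that links entries across lexicons."""
    concept_id: str
    entries: dict[str, LexEntry] = field(default_factory=dict)

    def add(self, entry: LexEntry) -> None:
        self.entries[entry.lang] = entry

    def translate(self, src: str, tgt: str) -> Optional[str]:
        """Return the target-language form, or None if either side is missing."""
        if src not in self.entries or tgt not in self.entries:
            return None
        return self.entries[tgt].form


cup = LinkedConcept("coffee_cup")
cup.add(LexEntry("en", "coffee cup", "N N", True))
cup.add(LexEntry("it", "tazza da caffè", "N P N", True))
cup.add(LexEntry("fr", "tasse à café", "N P N", True))
cup.add(LexEntry("de", "Kaffeetasse", "N", False))
print(cup.translate("en", "fr"))
```

Keeping the link at the concept level, rather than pairing languages directly, means each new language lexicon needs only one set of links instead of one per existing language.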

  11. Considerations • internal variation • sub-categorization properties • idiosyncratic constraints on inflection • meaning (non-)compositionality
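The considerations on slide 11 become constraints that an entry has to carry. The sketch below uses the often-cited non-compositional idiom "kick the bucket" ('die') and assumes a deliberately strict encoding: the verb may inflect, the noun may not pluralize, and nothing may be inserted inside the expression. The field names and the matching function are hypothetical; real entries would need finer-grained rules for internal variation.

```python
from dataclasses import dataclass


@dataclass
class IdiomEntry:
    """Hypothetical MWE entry with per-slot inflection constraints."""
    lemma: str
    slots: list          # one set of permitted surface forms per word position
    compositional: bool  # False: meaning cannot be derived from the parts
    gloss: str           # meaning the lexicon must supply when non-compositional


KICK_THE_BUCKET = IdiomEntry(
    lemma="kick the bucket",
    slots=[
        {"kick", "kicks", "kicked", "kicking"},  # verb inflects freely
        {"the"},                                 # determiner is fixed
        {"bucket"},                              # no plural: *"kick the buckets"
    ],
    compositional=False,
    gloss="die",
)


def find_mwe(tokens, entry):
    """Return the start index of the first contiguous match, or -1.

    Requiring contiguity means no modifier may be inserted inside the expression.
    """
    n = len(entry.slots)
    lowered = [t.lower() for t in tokens]
    for i in range(len(lowered) - n + 1):
        if all(lowered[i + k] in entry.slots[k] for k in range(n)):
            return i
    return -1


print(find_mwe("He kicked the bucket yesterday".split(), KICK_THE_BUCKET))
```

A compositional collocation, such as many noun-noun compounds, would set `compositional=True` and could relax its slot sets, for instance by allowing a plural head.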

  12. Encoding Model • Compatible and integrated with existing and de facto standards • e.g., EAGLES, PAROLE/SIMPLE, NOMLEX

  13. Activities • Assessment of existing lexical resources for multi-word expressions • Delivery of survey

  14. Activities... • Creation of a small set of sample entries • add lexical information on support verb constructions to 50 nouns drawn from NOMLEX for English, Italian, German, and French • create lexical entries for 50 N-N English constructs from the PAROLE/SIMPLE lexicons and corresponding constructs in Italian, German, and French
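The first sample-entry task adds support verb information to nominalizations. Below is a minimal sketch of what such an entry might record for two illustrative nouns, which are not necessarily among the 50 NOMLEX nouns the project selected. The dictionary layout, the `OV_LANGS` word-order flag, and the function name are assumptions, not the NOMLEX or PAROLE/SIMPLE format.

```python
# Hypothetical support-verb entries:
# noun -> related verb + per-language (support verb, object noun phrase).
SUPPORT_VERB_ENTRIES = {
    "decision": {
        "nominalizes": "decide",  # "make a decision" is roughly "decide"
        "realizations": {
            "en": ("make", "a decision"),
            "it": ("prendere", "una decisione"),
            "fr": ("prendre", "une décision"),
            "de": ("treffen", "eine Entscheidung"),
        },
    },
    "walk": {
        "nominalizes": "walk",
        "realizations": {
            "en": ("take", "a walk"),
            "it": ("fare", "una passeggiata"),
            "fr": ("faire", "une promenade"),
            "de": ("machen", "einen Spaziergang"),
        },
    },
}

# Languages whose infinitival citation form puts the object before the verb.
OV_LANGS = {"de"}


def citation_form(noun: str, lang: str) -> str:
    """Build the citation form of a support verb construction for one language."""
    verb, obj = SUPPORT_VERB_ENTRIES[noun]["realizations"][lang]
    return f"{obj} {verb}" if lang in OV_LANGS else f"{verb} {obj}"


for lang in ("en", "it", "fr", "de"):
    print(citation_form("decision", lang))
```

The support verb itself differs across languages (English "make", Italian "prendere", French "prendre", German "treffen"), which is why it has to be recorded per language rather than derived by word-for-word translation.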

  15. Activities... • Develop preliminary specifications for structuring and encoding multi-lingual, multi-word expression lexicons • required linguistic information • harmonized data architecture and encoding format
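One possible encoding format is XML, with each linked entry serialized as a single element. The element and attribute names below (`mwe`, `realization`, `lang`, `pattern`) are hypothetical and do not follow any particular EAGLES or PAROLE/SIMPLE schema. The sketch uses only Python's standard `xml.etree.ElementTree` module and shows a round trip between the in-memory form and the serialized form.

```python
import xml.etree.ElementTree as ET


def to_xml(concept_id: str, realizations: dict) -> str:
    """Serialize {lang: (form, pattern)} as a hypothetical <mwe> element."""
    root = ET.Element("mwe", {"id": concept_id})
    for lang, (form, pattern) in sorted(realizations.items()):
        node = ET.SubElement(root, "realization", {"lang": lang, "pattern": pattern})
        node.text = form
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> tuple:
    """Parse an element produced by to_xml back into (id, {lang: (form, pattern)})."""
    root = ET.fromstring(text)
    realizations = {
        node.get("lang"): (node.text, node.get("pattern"))
        for node in root.findall("realization")
    }
    return root.get("id"), realizations


entry = {"en": ("coffee cup", "N N"), "fr": ("tasse à café", "N P N")}
print(to_xml("coffee_cup", entry))
```

A harmonized specification would go further and fix which attributes are obligatory, for example subcategorization or inflection constraints, so that lexicons built at different sites can be merged.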

  16. Activities... • Exploration of techniques for automatic acquisition • Months 1-6: Survey of acquisition techniques, typology of MWEs • Months 7-12: Design of architecture for MWE acquisition

  17. Project information • Start date: June (?) • Web site: http://www.cs.vassar.edu/~ide/XMELLT.html • Contact: Nancy Ide (PI), Department of Computer Science, Vassar College, ide@cs.vassar.edu
