Report on the Multilingual ISLE Lexical Entry (MILE) meeting discussing modularity, methodology, advantages, macrostructure, microstructure, multilingual apparatus, and transfer-based solutions.
REPORT on Computational Lexicon Working Group on Multilingual Lexicon, EU-WG Meeting, December 1st-2nd 2000, Pisa. UPenn, December 11 2000
The Multilingual ISLE Lexical Entry (MILE): General methodological principles (from EAGLES) • Basic requirements for the MILE: • Modular and layered • Granular • Allow for underspecification • ISLE should discover and list (the maximal set of) basic notions to be included in the MILE • The leading principle for the design of the MILE should be the edited union of existing lexicons/models (redundancy should not be a problem)
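The "edited union" principle can be pictured as a mechanical first step: pool the basic notions of every existing model and record where each one comes from, so that redundancy becomes provenance rather than a conflict. The editing itself remains a manual, expert task. A minimal Python sketch, assuming each model is reduced to a set of notion names (the model-to-notion assignments below are illustrative placeholders, not taken from the actual specifications):

```python
from collections import defaultdict

def edited_union(models: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map every notion found in any model to the sorted list of models encoding it.

    Redundancy (the same notion in several models) is kept as provenance
    instead of being treated as a conflict.
    """
    provenance: dict[str, list[str]] = defaultdict(list)
    for model, notions in sorted(models.items()):
        for notion in notions:
            provenance[notion].append(model)
    return dict(sorted(provenance.items()))

# Hypothetical inventories: which notions each lexicon model is assumed to encode.
models = {
    "PAROLE-SIMPLE": {"subcat_frame", "qualia_structure", "semantic_type"},
    "EuroWordNet": {"synset", "hyponymy", "meronymy"},
    "COMLEX": {"subcat_frame", "control"},
}

for notion, sources in edited_union(models).items():
    print(f"{notion}: {', '.join(sources)}")
```

Notions shared by several models (here `subcat_frame`) surface immediately as candidates for common MILE notions.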
MILE • Objective: define the MILE, its basic notions and its architecture precisely enough that • we can write a DTD for it • we can have a tool to support it • Discover a working methodology towards this
Modularity in MILE • Some advantages: • Flexibility of representation • Easy to customise and update • Easy integration of existing resources • High versatility towards different applications • Modularity in at least three respects: • in the macrostructure and general architecture of the MILE • in the microstructure of the MILE • in the specific microstructure of the MILE word-sense
Modularity in MILE • Modularity in the macrostructure and general architecture of the MILE Meta-information - versioning of the lexicon, languages, updates, status, project, origin, etc. (see e.g. OLIF, GENELEX) Possible architecture(s) of multilingual lexicon(s) - interactions of the different modules within the general structure. Issues related to transfer-based, interlingua-based approaches, and hybrid solutions.
Modularity in MILE • Modularity in the microstructure of the MILE – The MILE could be organized in at least the following modules: Monolingual linguistic representation Collocational information Multilingual apparatus (e.g. transfer conditions and actions)
Monolingual Linguistic Representation • It includes the morphosyntactic, syntactic, and semantic information characterizing the MILE in a certain source language. • It possibly corresponds to the typology of information contained in existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet (EWN), COMLEX, FrameNet, etc.
Monolingual Linguistic Representation: a Provisional List • Morphological layer • Grammatical category and subcategory • Gender, number, person, mood • Inflectional class • Modifications of the lemma • Mass/count, 'pluralia tantum' • …
Monolingual Linguistic Representation: a Provisional List • Syntactic layer • Idiosyncratic behaviour with respect to specific syntactic rules (passivisation, middle, etc.) • Attributive vs. predicative function, gradability • List of syntactic positions forming subcategorization frames • Syntactic constraints and properties of the possible 'slot filler' • Morphosyntactic and/or lexical features (agreement, auxiliary, prepositions and particles introducing clausal complements) • Information on control (subject control, object control, etc.) and raising properties • …
Monolingual Linguistic Representation: a Provisional List • Semantic layer • Characterization of senses through links to an Ontology • Domain information, gloss • Argument structure, semantic roles, selectional preferences • Event type for verbs, to characterize their actionality behaviour • Link to the syntactic realization of the arguments • Basic semantic relations between word senses (synonymy / synset, hyponymy, meronymy, etc.) • Semantic/world-knowledge relations among word senses (such as EWN relations and SIMPLE Qualia Structure) • Information about regular polysemous alternation • Information concerning cross-part-of-speech relations • …
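The three layers above, together with the requirement to allow for underspecification, suggest an entry in which each layer is a separate record and unspecified attributes are simply left empty. The following Python sketch is hypothetical: the class and attribute names are illustrative choices covering only a few of the listed information types, not a proposed MILE schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MorphologicalLayer:
    pos: str                                  # grammatical category
    gender: Optional[str] = None              # None means underspecified
    inflectional_class: Optional[str] = None

@dataclass
class SyntacticLayer:
    # Each frame is a tuple of syntactic positions, e.g. ("subj:NP", "obj:NP").
    frames: list[tuple[str, ...]] = field(default_factory=list)
    control: Optional[str] = None             # e.g. "subject" or "object" control

@dataclass
class SemanticLayer:
    ontology_type: Optional[str] = None       # link to an ontology node
    relations: dict[str, list[str]] = field(default_factory=dict)  # e.g. {"hyponym_of": [...]}
    event_type: Optional[str] = None          # actionality class for verbs

@dataclass
class MonolingualEntry:
    lemma: str
    morph: MorphologicalLayer
    syn: SyntacticLayer = field(default_factory=SyntacticLayer)
    sem: SemanticLayer = field(default_factory=SemanticLayer)

    def specified_fields(self) -> list[str]:
        """List layer.attribute names that carry a value; underspecified ones are skipped."""
        out = []
        for layer_name in ("morph", "syn", "sem"):
            layer = getattr(self, layer_name)
            for name, value in vars(layer).items():
                if value not in (None, [], {}):
                    out.append(f"{layer_name}.{name}")
        return out

dog = MonolingualEntry(
    lemma="dog",
    morph=MorphologicalLayer(pos="N"),
    sem=SemanticLayer(ontology_type="Animal", relations={"hyponym_of": ["animal"]}),
)
print(dog.specified_fields())  # ['morph.pos', 'sem.ontology_type', 'sem.relations']
```

Keeping the layers apart means an entry imported from a purely syntactic lexicon and one from a purely semantic lexicon can share the same shape, each leaving the other layer empty.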
Collocational Information: more or less typical and/or fixed syntactic-semantic patterns • Typical or idiosyncratic syntactic constructions • Typical collocates • Support verb constructions • Phraseological or multiword constructions • Compounds (e.g. noun-noun, noun-PP, adjective-noun, etc.) • Corpus-driven examples of MILE • …
Multilingual Apparatus Transfer conditions and actions • possible starting points: OLIF, GENELEX, etc. • devise possible cases of problematic transfer (cf. e.g. the list of linguistic phenomena circulated) • identify which conditions must be expressible and which transformation actions are necessary • select which types of information these conditions must access • examine the variability in granularity needed when translating in different languages, and the architectural implications of this • which role for an Interlingua?
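To make the questions about transfer conditions and actions concrete, a transfer rule can be viewed as a pair: a condition that inspects selected information about the source item, and an action that produces the target. The Python sketch below uses a deliberately simplified version of the familiar English "know" to French "savoir"/"connaître" split, where the condition must access both the syntactic complement and the semantic type of the object. The data model, rule format and function names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SourceContext:
    lemma: str
    complement: str                    # syntactic category of the complement: "NP" or "S" (clause)
    object_type: Optional[str] = None  # semantic type of an NP object, if known

@dataclass
class TransferRule:
    source_lemma: str
    condition: Callable[[SourceContext], bool]  # which information the condition inspects
    action: Callable[[SourceContext], str]      # what the transfer produces

# Simplified, hypothetical rules for English "know" into French.
RULES = [
    TransferRule("know", lambda c: c.complement == "S", lambda c: "savoir"),
    TransferRule("know", lambda c: c.complement == "NP" and c.object_type == "Human",
                 lambda c: "connaître"),
]

def transfer(ctx: SourceContext, rules: list[TransferRule]) -> Optional[str]:
    """Apply the first rule whose lemma matches and whose condition holds."""
    for rule in rules:
        if rule.source_lemma == ctx.lemma and rule.condition(ctx):
            return rule.action(ctx)
    return None  # no rule applies: a candidate for the list of problematic transfers

print(transfer(SourceContext("know", "S"), RULES))               # savoir
print(transfer(SourceContext("know", "NP", "Human"), RULES))     # connaître
print(transfer(SourceContext("know", "NP", "Abstract"), RULES))  # None
```

The last case, which these two rules do not cover, shows why the conditions need to be collected, ranked and checked against the information each one must access.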
Modularity in MILE • Modularity in the specific microstructure of the MILE word-sense • Word-senses are the basic units at the multilingual level • Senses should also have a modular structure • Coarse-grained (general purpose) characterisation in terms of prototypical properties, captured by the formal means in (B.1) • Fine-grained (domain or text dependent) characterisation mostly in terms of collocational/syntagmatic properties (B.2) (particularly useful for specific tasks, such as WSD and translation)
MILE Architecture (figure) • A. MILE Macrostructure: Meta-information, Architecture • B. MILE Microstructure: 1. Monolingual, 2. Collocational, 3. Multilingual • C. Word-Sense Microstructure: 1. Coarse-grained, 2. Fine-grained
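The three-level organisation of the MILE architecture (A. macrostructure, B. entry microstructure, C. word-sense microstructure) could be mirrored by nested records, as in the hypothetical Python sketch below. The class names and the use of free-form dictionaries for each module are assumptions, chosen only to show where each kind of information would live; how the modules interact (the "Architecture" part of A) is left open here.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class WordSense:                       # C. word-sense microstructure
    sense_id: str
    coarse: dict[str, Any] = field(default_factory=dict)  # C.1 prototypical properties
    fine: dict[str, Any] = field(default_factory=dict)    # C.2 collocational/syntagmatic properties

@dataclass
class MileEntry:                       # B. microstructure of one entry
    lemma: str
    language: str
    monolingual: dict[str, Any] = field(default_factory=dict)    # B.1
    collocational: dict[str, Any] = field(default_factory=dict)  # B.2
    multilingual: dict[str, Any] = field(default_factory=dict)   # B.3, e.g. transfer conditions
    senses: list[WordSense] = field(default_factory=list)        # basic units at the multilingual level

@dataclass
class MileLexicon:                     # A. macrostructure
    meta: dict[str, str]               # meta-information: version, languages, status, origin, ...
    entries: dict[tuple[str, str], MileEntry] = field(default_factory=dict)

    def add(self, entry: MileEntry) -> None:
        """Index an entry by (language, lemma)."""
        self.entries[(entry.language, entry.lemma)] = entry

    def sense_ids(self, language: str, lemma: str) -> list[str]:
        """Return the identifiers of the senses of one entry."""
        return [s.sense_id for s in self.entries[(language, lemma)].senses]

lexicon = MileLexicon(meta={"version": "0.1", "languages": "en,it", "status": "draft"})
lexicon.add(MileEntry(
    lemma="bank",
    language="en",
    senses=[
        WordSense("bank_1", coarse={"ontology_type": "Institution"},
                  fine={"collocates": ["open an account"]}),
        WordSense("bank_2", coarse={"ontology_type": "Location"},
                  fine={"collocates": ["river"]}),
    ],
))
print(lexicon.sense_ids("en", "bank"))  # ['bank_1', 'bank_2']
```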
Monolingual Linguistic Representation A strategy: • consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models) • evaluate their notions wrt EAGLES recommendations for syntax and semantics • evaluate their usefulness & adequacy for multilingual tasks • evaluate the integrability of their notions in a unitary MILE • look for deficient areas. To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values?
Collocational Information • Open issues: • what is relevant • what can be generalised and formally characterised • what must simply be listed (but even lists may be partially categorised) • what type of representation and analysis should be provided for these phenomena (e.g. a Mel'cuk-style analysis for support verb constructions, a FrameNet-style description of syntactic-semantic “constructions”, etc.)
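One way to see why a Mel'cuk-style analysis of support verb constructions is attractive for a multilingual lexicon: if the support verb is recorded as the value of a lexical function (Oper1) of the noun, the verb is chosen in the target language from the target noun instead of being translated literally. A minimal Python sketch with tiny hand-made tables (the table layout and function name are hypothetical; the English and French collocations are standard examples):

```python
from typing import Optional

# Hypothetical Oper1 tables: language -> noun -> support verb.
OPER1 = {
    "en": {"decision": "make", "attention": "pay"},
    "fr": {"décision": "prendre", "attention": "prêter"},
}

# Hypothetical noun-level transfer table for one language pair.
NOUN_TRANSFER = {("en", "fr"): {"decision": "décision", "attention": "attention"}}

def translate_svc(noun: str, src: str, tgt: str) -> Optional[tuple[str, str]]:
    """Translate the noun, then look up its support verb in the target language.

    Returns a (verb, noun) pair; articles and prepositions are left to generation.
    Returns None when nothing is recorded, i.e. the expression must be listed.
    """
    tgt_noun = NOUN_TRANSFER.get((src, tgt), {}).get(noun)
    if tgt_noun is None:
        return None
    verb = OPER1.get(tgt, {}).get(tgt_noun)
    return (verb, tgt_noun) if verb is not None else None

print(translate_svc("decision", "en", "fr"))   # ('prendre', 'décision'), cf. "make a decision"
print(translate_svc("attention", "en", "fr"))  # ('prêter', 'attention'), cf. "pay attention"
```

The English verbs "make" and "pay" never enter the transfer step; only the lexical function does, which is what makes this part of the phenomenon generalisable rather than simply listed.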
Agreed Principles • MILE incorporates previous recommendations: it is the “complete” entry (to be evaluated wrt usefulness & adequacy for multilingual tasks) • MILE builds on the monolingual entry & expands it (at least) with an additional module where correspondences between languages are defined • We consider 2 broad categories of applications: • translation • CLIR (the linking module may be simpler) (label info types wrt application)
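Labelling information types with respect to application could be as simple as tagging each type with the applications that need it and deriving a per-application profile, which would make visible that the CLIR linking module may be simpler than the translation one. In the Python sketch below, the type names and their labels are invented for illustration, not agreed MILE decisions.

```python
# Hypothetical labelling: information type -> applications that use it.
INFO_TYPE_APPLICATIONS: dict[str, set[str]] = {
    "translation_equivalents": {"translation", "clir"},
    "synonymy": {"translation", "clir"},
    "subcat_frame": {"translation"},
    "transfer_conditions": {"translation"},
    "collocates": {"translation"},
}

def profile(application: str) -> list[str]:
    """Return, in alphabetical order, the information types labelled for an application."""
    return sorted(t for t, apps in INFO_TYPE_APPLICATIONS.items() if application in apps)

for app in ("translation", "clir"):
    print(app, profile(app))
```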
Paths to discover Basic Notions of MILE • Clues in dictionaries to decide on target equivalent • Guidelines for lexicographers • Clues (to disambiguate/translate) in corpus concordances • Lexical requirements from various types of transfer conditions and actions in MT systems • Lexical requirements from interlingua-based systems • Examined guidelines for bilingual dictionaries provided by SA
Classification of Basic Notions of MILE • For all the notions: • notion already in previous work (Eagles/ Parole/ Simple/ EWN/ Comlex/ Framenet/…) • evaluate if the existing specs are adequate • draw a list of “not yet recommended/adopted” notions: • method of work • priorities • for which applications • assign tasks • need of further development
Organisational Proposal • Start from available EAGLES recommendations, e.g. as instantiated in Parole/Simple • adopt the P/S DTD as starting point, to be revised & augmented • see Barcelona tool • Evaluate if we can combine the transfer & interlingua approaches in a “hybrid super-model”
Organisational Proposal The tasks should lead to: • Select a list of critical information types that will compose each module of the MILE • Start an in-depth analysis of each of these areas aiming at identifying: • The most stable solutions adopted in the community • Linguistic specifications and criteria • Possible representational solutions, their compatibility, etc. • An evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations) • Identify the open issues and the current boundaries of the state of the art (which cannot be standardised yet) • …..
Information Types Semantic relations: • Typology (e.g. hyponymy, meronymy, etc.) • Available tests • Representational format(s) • Applicative constraints and needs • Expressive limits • Open issues Argument structure: • How to represent it (e.g. frames, a selection of theta-roles) • Typology of arguments • Representational problems • Applicative constraints and needs • Linking with syntax (how to express it) • Open issues
Information Types Selectional preferences: • How to represent them (e.g. features, reference to an ontology, word-senses, etc.) • Different status of the preferences • Criteria to identify them • Expressive limits of existing formal resources Multiword Expressions (MWEs): • Typology • How to represent the “internal” structure of MWEs (e.g. Mel’cuk relations, etc.) • Encoding criteria • Application needs and biases • Open issues Modification relations: • Types of modifiers • Representational issues • Open issues
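Two of the options listed, reference to an ontology and the different status of preferences, can be combined: a filler satisfies a preference if its ontological type is the preferred type or inherits from it, and a violated preference either rules the reading out (a hard restriction) or only lowers its score (a soft preference). The toy ontology, the penalty value and all names below are assumptions in this Python sketch:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy ontology: type -> parent type (single inheritance, no cycles).
ONTOLOGY = {
    "Human": "Animate",
    "Animal": "Animate",
    "Animate": "Entity",
    "Food": "Concrete",
    "Concrete": "Entity",
}

def is_a(onto_type: str, ancestor: str) -> bool:
    """True if onto_type equals ancestor or inherits from it via the parent links."""
    current: Optional[str] = onto_type
    while current is not None:
        if current == ancestor:
            return True
        current = ONTOLOGY.get(current)
    return False

@dataclass
class Preference:
    slot: str           # argument position, e.g. "subj" or "obj"
    onto_type: str      # preferred ontological type of the filler
    hard: bool = False  # True = restriction, False = soft preference

SOFT_PENALTY = 0.5  # hypothetical score factor for each violated soft preference

def score(prefs: list[Preference], fillers: dict[str, str]) -> float:
    """Score a reading: 1.0 if all preferences hold, 0.0 if a hard one is violated."""
    result = 1.0
    for pref in prefs:
        filler = fillers.get(pref.slot)
        if filler is None or is_a(filler, pref.onto_type):
            continue  # unfilled slot or satisfied preference
        if pref.hard:
            return 0.0
        result *= SOFT_PENALTY
    return result

EAT = [Preference("subj", "Animate"), Preference("obj", "Food")]
print(score(EAT, {"subj": "Human", "obj": "Food"}))   # 1.0
print(score(EAT, {"subj": "Human", "obj": "Human"}))  # 0.5
```

Whether the object of "eat" should carry a preference or a restriction is exactly the kind of status decision the slide leaves open.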
Information Types Collocational patterns: • Typology • How to represent them • Interaction with selectional preferences Ontology: • Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.) • Inheritance • Which roles for ontologies in the MILE • Representational issues • Customisation and development criteria • Limits Transfer conditions and actions: • Identification of categories of transfer phenomena • Ranking of hard cases • Possible parameterisation wrt language types • How to formalise them • Types of actions
Organisational Proposal • Highlighted some hot issues & assigned tasks: • sense indicators (Issco) • selectional preferences (Thurmair) • argument structure (US?….) • MWE (Pisa) • modifiers (Jock) • semantic relations (Piek?) • transfer conditions (…) • collocational patterns (…) • ontology (…) • ….
Organisational Proposal • Ask the Americans, e.g., to: • evaluate existing EAGLES etc. recommendations wrt usefulness, coverage, adequacy,… • analyse some of the above info types • look at other languages (Japanese, Chinese, Korean, …) for transfer conditions • look at transfer-based MT systems • look at interlingua MT systems (e.g. Mikrokosmos): additional info types? • … Joint US & EU meeting, e.g. end of February or beginning of March?
DIET Tool • From ISSCO: • for text annotation (of test suites for semantic annotation) • to be used for evaluation purposes • …. • … • ...
Survey: List of Received Materials
Other Surveys Expected • Surveys from US? • Microsoft • IBM • CMU • NMSU • ISI • Systran • Logos