Developing OLIF, Version 2
Susan M. McCormick, Christian Lieske
OLIF2 Consortium
SAP/Walldorf, Germany

The Original OLIF
The Open Lexicon Interchange Format
• Developed as part of the OTELO and Aventinus projects:
  • an attempt to define common formats and interfaces for different NLP tools, especially MT systems
• Aim of the OLIF format:
  • a simple, user-friendly vehicle for interfacing with multiple electronic lexical and terminological resources

OLIF Lexicon/Terminology Handling
• Grammatical description:
  • relatively complex, to meet the needs of MT systems
  • linguistic analysis must represent a common base
• Terminology coverage:
  • adequate to handle basic term exchange
  • no duplication of well-established term exchange formats, e.g., MARTIF
Purpose: generate a basic, usable NLP-system entry from an OLIF record

OLIF2 Consortium
www.olif.net
Initiated by SAP in March 2000
• Xerox, Lotus, SAP, Microsoft, Trados, IBM, Logos, Sail Labs, The EC, L10NBRIDGE
• Build and improve on OLIF by revising for:
  • XML-compliance
  • improved language coverage
  • more comprehensive linguistic analysis

Concertation with SALT
• Integrate the exchange standards generated by the OLIF2 and SALT initiatives
[Diagram: XLT; Terminology: SALT; Lexicon: OLIF2]

Structure and Content of OLIF2
• Maintains the straightforward structure of OLIF:
  • minimal nesting
  • features informally grouped by the character of the information being represented, e.g., semantic, syntactic, administrative
• Supports representation of vital system data, rather than an exhaustive store of features
  • implies implementation of defaulting strategies on the part of vendors using OLIF2

Body of the OLIF2 File
• Monolingual entries are identified uniquely by:
  • language
  • part of speech
  • canonical form
  • subject field
  • semantic reading
• Entries may include:
  • unidirectional, bilingual transfer links
  • monolingual cross-reference links

Sample OLIF2 Entry

<body>
  <entry>
    <mono id="1" lang="DE" ptOfSpeech="noun">
      <canForm>Briefkurs</canForm>
      <subjField>gac-fi</subjField>
      <semReading>meas</semReading>
    </mono>
    <transfer target="2" equival="full"> </transfer>
  </entry>
  <entry>
    <mono id="2" lang="EN" ptOfSpeech="noun">
      <canForm>bank selling rate</canForm>
      <subjField>gac-fi</subjField>
    </mono>
  </entry>
</body>

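As a rough illustration of how an importer might read such a body, the sketch below parses the sample entry with Python's standard library and builds the five-part identity key (language, part of speech, canonical form, subject field, semantic reading). The element and attribute names are copied from the sample; the function names `entry_key` and `load_entries`, and the choice to represent a missing semantic reading as `None`, are assumptions made for this example and are not part of the OLIF2 specification.

```python
import xml.etree.ElementTree as ET

# The sample body from the slide, reproduced so the sketch is self-contained.
SAMPLE = """<body>
  <entry>
    <mono id="1" lang="DE" ptOfSpeech="noun">
      <canForm>Briefkurs</canForm>
      <subjField>gac-fi</subjField>
      <semReading>meas</semReading>
    </mono>
    <transfer target="2" equival="full"> </transfer>
  </entry>
  <entry>
    <mono id="2" lang="EN" ptOfSpeech="noun">
      <canForm>bank selling rate</canForm>
      <subjField>gac-fi</subjField>
    </mono>
  </entry>
</body>"""


def entry_key(mono):
    """Return the five values that identify a monolingual entry.

    Assumption: a missing child element (e.g. no <semReading>) yields None.
    """
    return (
        mono.get("lang"),
        mono.get("ptOfSpeech"),
        mono.findtext("canForm"),
        mono.findtext("subjField"),
        mono.findtext("semReading"),
    )


def load_entries(xml_text):
    """Map each mono id to its identity key and its transfer-link targets."""
    body = ET.fromstring(xml_text)
    entries = {}
    for entry in body.findall("entry"):
        mono = entry.find("mono")
        targets = [t.get("target") for t in entry.findall("transfer")]
        entries[mono.get("id")] = {"key": entry_key(mono), "targets": targets}
    return entries


entries = load_entries(SAMPLE)
for mono_id, info in entries.items():
    print(mono_id, info["key"], "->", info["targets"])
```

In this reading, the German entry carries a transfer link pointing at the id of the English entry, while the English entry has none, which matches the unidirectional character of transfer links.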
Improvements
• Inflection class patterns for all languages
• Expanded syntactic frame analysis
• More detailed semantic type hierarchy
• Cross-reference options augmented by ISO 12620 categories and EuroWordNet (July, 2000)
• Improved syntax for transfer conditions and actions
• User guidelines for formulating canonical forms

Transfer Conditions
Specifies the context in the source language for which the transfer is valid

<transCond>
  <context>head</context>
  <transTest>
    <featTest type="case">d</featTest>
  </transTest>
</transCond>

The transfer is valid if the source word is in the dative case

Transfer Actions
Action performed in the transfer language based on the context specified for the source

<transCond>
  <context>head</context>
  <transTest>
    <featTest type="case">d</featTest>
  </transTest>
  <transAct>
    <addToHead type="prep">for</addToHead>
  </transAct>
</transCond>

If the source word is dative, the corresponding target word is the object of the preposition ‘for’

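The condition-then-action logic can be sketched in a few lines of Python. This is only one possible reading of the sample, under several assumptions that the slides do not confirm: the source word's features are available as a plain dictionary, multiple `featTest` elements would all have to match, and `addToHead` with `type="prep"` is realized by placing the preposition in front of the target word. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The transfer condition and action from the slide.
TRANS_COND = """<transCond>
  <context>head</context>
  <transTest>
    <featTest type="case">d</featTest>
  </transTest>
  <transAct>
    <addToHead type="prep">for</addToHead>
  </transAct>
</transCond>"""


def condition_holds(trans_cond, source_features):
    """True if every <featTest> matches the source word's features.

    Assumption: tests are combined with logical AND.
    """
    tests = trans_cond.findall("./transTest/featTest")
    return all(
        source_features.get(test.get("type")) == (test.text or "").strip()
        for test in tests
    )


def apply_actions(trans_cond, target_word):
    """Apply each <addToHead> action to the target word.

    Assumption: the added element is simply placed before the target word.
    """
    additions = [
        (act.text or "").strip()
        for act in trans_cond.findall("./transAct/addToHead")
    ]
    return " ".join(additions + [target_word])


def transfer(trans_cond_xml, source_features, target_word):
    """Return the modified target phrase, or None if the condition fails."""
    cond = ET.fromstring(trans_cond_xml)
    if not condition_holds(cond, source_features):
        return None
    return apply_actions(cond, target_word)


# A dative source word triggers the action; any other case does not.
print(transfer(TRANS_COND, {"case": "d"}, "customer"))
print(transfer(TRANS_COND, {"case": "a"}, "customer"))
```

The word `customer` and the feature value `a` are made-up inputs for the example. A real MT system would attach the preposition in its own syntactic representation instead of concatenating strings.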
Plans for Completion of OLIF2
• Final specifications: February 2001
• DTD: February 2001
• Testing: April 2001
• Harmonization with SALT: April 2001
• Implementation (import and export facilities for vendors within the consortium): 2001