Using OLIF, The Open Lexicon Interchange Format
Susan McCormick
OLIF2 Consortium
October 1, 2004
The OLIF Format
• The Open Lexicon Interchange Format
• XML-compliant standard
• Supports exchange of lexical and terminological data for language technology applications
• Handles basic exchange as well as more complex applications such as MT lexicons
The OLIF2 Consortium
• OLIF v.2 was developed by the OLIF2 Consortium, a group of language technology companies and organizations interested in the exchange of MT data and terminology data
• Led by SAP
• Members include Xerox, Microsoft, Trados, IBM, Systran, IAI, DFKI and Comprendium
Developing OLIF v.2
• Based on the OLIF prototype, developed in the EC-funded OTELO project, which proposed standards for users of disparate language tools
• The original purpose of OLIF was to facilitate terminology exchange for industrial users of MT
Developing OLIF v.2
• Version 2 was adapted from the OLIF prototype using input from:
  • Developers/users of 3+ MT systems
  • Developers/users of terminology management systems
  • Other language standards projects: EAGLES, SALT, ISLE, MARTIF and TBX
OLIF Version 2
• Released as an open standard in 2002
• XML-compliant
• Covers 6 European languages: English, German, French, Spanish, Danish, Portuguese
• Includes options for modeling administrative, morphological, syntactic and semantic data
Available to Users
• XML implementation of the OLIF specification as a DTD
• Available from the OLIF2 Consortium web site: www.olif.net
The OLIF File
• Follows the Terminology Markup Framework (TMF) structure:
  • Header
  • Body
  • Shared resources
The OLIF Entry
• A collection of monolingual data on one specified sense of a word or phrase
• Optional links for cross-reference and transfer
• Transfer is bilingual and unidirectional
• A single word sense may have multiple transfers, into multiple languages
Key Data Categories
• The OLIF entry is uniquely identified by 5 key data categories:
  • Canonical form
  • Language
  • Part of speech
  • Subject field
  • Semantic reading
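Because the five key data categories together identify an entry, a tool that merges lexicons can treat them as one composite key and flag collisions. The Python sketch below is a minimal illustration, not part of any OLIF toolkit; the field names mirror the XML element names, and the second semantic reading value ("87") is hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class EntryKey(NamedTuple):
    """The five key data categories that together identify an OLIF entry."""
    can_form: str      # <canForm>
    language: str      # <language>
    pt_of_speech: str  # <ptOfSpeech>
    subj_field: str    # <subjField>
    sem_reading: str   # <semReading>


def find_duplicate_keys(keys: list[EntryKey]) -> list[EntryKey]:
    """Return every key that occurs more than once, i.e. colliding entries."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


keys = [
    EntryKey("table", "en", "noun", "general", "86"),
    # Same word, different semantic reading: a distinct entry (hypothetical value).
    EntryKey("table", "en", "noun", "general", "87"),
    # Identical on all five categories: collides with the first entry.
    EntryKey("table", "en", "noun", "general", "86"),
]

for key in find_duplicate_keys(keys):
    print("duplicate entry key:", key)
```

Changing any single category (here only the semantic reading) is enough to make two entries distinct.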
Basic Well-Formed OLIF Entry
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
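A basic entry like this one can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is an assumption-laden illustration, not an official OLIF reader: it pulls the five key data categories out of `<mono>/<keyDC>` and rejects entries that lack any of them.

```python
import xml.etree.ElementTree as ET

BASIC_ENTRY = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
"""

# Key data categories, in the order they appear in <keyDC>.
KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")


def entry_key(entry: ET.Element) -> tuple[str, ...]:
    """Read the five key data categories from <mono>/<keyDC> of an entry."""
    key_dc = entry.find("mono/keyDC")
    if key_dc is None:
        raise ValueError("entry has no mono/keyDC element")
    values = []
    for name in KEY_FIELDS:
        text = key_dc.findtext(name)
        if text is None:
            raise ValueError(f"missing key data category: {name}")
        values.append(text.strip())
    return tuple(values)


entry = ET.fromstring(BASIC_ENTRY.strip())
print(entry_key(entry))  # ('table', 'en', 'noun', 'general', '86')
```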
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>[gencomp-opt]</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
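The monolingual data on this slide falls into administrative, morphological, syntactic and semantic groups. The Python sketch below collects each group into a dictionary. It assumes, as the slide's `<monoDC>` placeholder suggests, that `monoAdmin`, `monoMorph`, `monoSyn` and `monoSem` are children of `<monoDC>`; check the DTD before relying on that nesting.

```python
import xml.etree.ElementTree as ET

# Assumption: the four data groups are children of <monoDC>.
ENTRY = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>[gencomp-opt]</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
"""


def mono_data(entry: ET.Element) -> dict[str, dict[str, str]]:
    """Map each <monoDC> group (monoAdmin, monoMorph, ...) to its category values."""
    mono_dc = entry.find("mono/monoDC")
    if mono_dc is None:
        return {}  # the basic well-formed entry has no monoDC at all
    return {
        group.tag: {item.tag: (item.text or "").strip() for item in group}
        for group in mono_dc
    }


data = mono_data(ET.fromstring(ENTRY.strip()))
print(data["monoSyn"]["synFrame"])
```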
OLIF Entry with Cross-Reference
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
<crossRefer>
  <keyDC>
    <canForm>row</canForm>
    <language>en</language>
    <ptOfSpeech>noun</ptOfSpeech>
    <subjField>general</subjField>
    <semReading>69</semReading>
  </keyDC>
  <crLinkType>has-meronym</crLinkType>
</crossRefer>
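A cross-reference points from one entry to another entry's full key, with a link type. The Python sketch below turns each `<crossRefer>` into a (source key, link type, target key) triple. Where exactly `<crossRefer>` sits in the entry is an assumption here (it is placed inside `<mono>`); the code uses `iter()`, so it finds the element anywhere within the entry.

```python
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Assumption: <crossRefer> is nested inside <mono> of the source entry.
ENTRY = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <crossRefer>
      <keyDC>
        <canForm>row</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>69</semReading>
      </keyDC>
      <crLinkType>has-meronym</crLinkType>
    </crossRefer>
  </mono>
</entry>
"""


def key_of(key_dc: ET.Element) -> tuple[str, ...]:
    """Collect the five key data categories of a <keyDC> element."""
    return tuple((key_dc.findtext(name) or "").strip() for name in KEY_FIELDS)


def cross_references(entry: ET.Element) -> list[tuple]:
    """Return (source key, link type, target key) for every <crossRefer>."""
    # "mono/keyDC" matches only the entry's own key, not the one inside crossRefer.
    source = key_of(entry.find("mono/keyDC"))
    links = []
    for ref in entry.iter("crossRefer"):
        target = key_of(ref.find("keyDC"))
        links.append((source, ref.findtext("crLinkType", "").strip(), target))
    return links


for source, link, target in cross_references(ET.fromstring(ENTRY.strip())):
    print(source[0], link, target[0])
```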
OLIF Entry with Transfer
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
<transfer>
  <keyDC>
    <canForm>Tabelle</canForm>
    <language>de</language>
    <ptOfSpeech>noun</ptOfSpeech>
    <subjField>general</subjField>
    <semReading>86</semReading>
  </keyDC>
</transfer>
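Since a transfer is bilingual and unidirectional, an MT lookup table built from transfers maps source keys to target keys in one direction only. The Python sketch below illustrates this; the placement of `<transfer>` inside `<mono>` is an assumption (the code uses `iter()`, so it tolerates other nesting), and the table structure is a hypothetical design, not part of OLIF.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Assumption: <transfer> is nested inside <mono> of the source entry.
ENTRY = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <transfer>
      <keyDC>
        <canForm>Tabelle</canForm>
        <language>de</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </transfer>
  </mono>
</entry>
"""


def key_of(key_dc: ET.Element) -> tuple[str, ...]:
    """Collect the five key data categories of a <keyDC> element."""
    return tuple((key_dc.findtext(name) or "").strip() for name in KEY_FIELDS)


def transfer_table(entries: list[ET.Element]) -> dict[tuple, list[tuple]]:
    """Map each source key to the target keys of its transfers (one direction only)."""
    table = defaultdict(list)
    for entry in entries:
        source = key_of(entry.find("mono/keyDC"))
        # A single word sense may carry several transfers, so targets form a list.
        for transfer in entry.iter("transfer"):
            table[source].append(key_of(transfer.find("keyDC")))
    return dict(table)


table = transfer_table([ET.fromstring(ENTRY.strip())])
en_table = ("table", "en", "noun", "general", "86")
de_tabelle = ("Tabelle", "de", "noun", "general", "86")
print(table[en_table])      # the German target key
print(de_tabelle in table)  # False: no reverse transfer is implied
```

A German-to-English transfer would need its own entry for "Tabelle" with its own `<transfer>` element.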
Data Category Values
• Allowed values are specified by OLIF
• Administrative, terminological and linguistic values are based on:
  • General industry standards
    • E.g., allowed values for date are derived from the recommendations of ISO 8601:1988
  • MT/terminology standards
    • E.g., suggested values for subject field are adapted from EC
  • Widely recognized linguistic standards
    • E.g., allowed values for gender are based on the long-standing description of gender in European languages
User Extensions: The OLIF Data Category Registry
• Users may declare and use their own values for certain data categories:
  • Subject field
  • Semantic reading
  • Morphological structure
  • Part of speech
  • Inflection
  • Aspect
  • Syntactic type
  • Syntactic frame
  • Semantic type
  • Concept hierarchy
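The registry idea can be pictured as validation against two value sets: the values OLIF specifies, plus values the user has declared, with declarations allowed only for the extensible categories listed above. The Python sketch below is a hypothetical in-memory model, not the actual OLIF Data Category Registry; the category names are the slide's plain-English labels, and the sample values ("automotive", "common" and the standard gender values) are illustrative only.

```python
from dataclasses import dataclass, field

# The data categories that, per the slide, accept user-declared values.
USER_EXTENSIBLE = frozenset({
    "subject field", "semantic reading", "morphological structure",
    "part of speech", "inflection", "aspect", "syntactic type",
    "syntactic frame", "semantic type", "concept hierarchy",
})


@dataclass
class CategoryRegistry:
    """Hypothetical registry: OLIF-specified values plus user-declared ones."""
    standard: dict[str, set[str]]
    user: dict[str, set[str]] = field(default_factory=dict)

    def declare(self, category: str, value: str) -> None:
        """Register a user value, but only for categories that allow extension."""
        if category not in USER_EXTENSIBLE:
            raise ValueError(f"{category!r} does not accept user-declared values")
        self.user.setdefault(category, set()).add(value)

    def is_allowed(self, category: str, value: str) -> bool:
        """A value is valid if OLIF specifies it or the user has declared it."""
        return (value in self.standard.get(category, set())
                or value in self.user.get(category, set()))


# Hypothetical standard values, for illustration only.
registry = CategoryRegistry(standard={
    "part of speech": {"noun"},
    "gender": {"masculine", "feminine", "neuter"},
})
registry.declare("subject field", "automotive")
print(registry.is_allowed("subject field", "automotive"))  # True
try:
    registry.declare("gender", "common")  # gender is not user-extensible
except ValueError as err:
    print(err)
```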
Organizing Based on Concept
• Users may link monolingual entries via a concept identifier
• These IDs can be used to organize entries as equivalent word senses that share a concept, rather than as source word senses linked to their targets by transfers
Entries Linked by Concept
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
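With concept IDs, equivalents are found by grouping entries on the shared attribute instead of following transfer links. The Python sketch below groups (language, canonical form) pairs by `ConceptUserId`. The `<lexicon>` wrapper is hypothetical, used only to hold both entries in one string; it is not an OLIF element name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# <lexicon> is a hypothetical wrapper, not part of the OLIF format.
ENTRIES = """
<lexicon>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>table</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>Tabelle</canForm>
        <language>de</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
</lexicon>
"""


def group_by_concept(root: ET.Element) -> dict[str, list[tuple[str, str]]]:
    """Group (language, canonical form) pairs by their shared concept ID."""
    groups = defaultdict(list)
    for entry in root.iter("entry"):
        concept = entry.get("ConceptUserId")
        if concept is None:
            continue  # entry is not linked to any concept
        key_dc = entry.find("mono/keyDC")
        groups[concept].append((key_dc.findtext("language"), key_dc.findtext("canForm")))
    return dict(groups)


for concept, senses in group_by_concept(ET.fromstring(ENTRIES.strip())).items():
    print(concept, senses)
```

Unlike a transfer, this grouping is symmetric: every entry in a group is an equivalent of every other.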
What’s Available to the OLIF User?
• On www.olif.net:
  • Complete XML DTD for download
  • Hyperlinked DTD for viewing
  • Graphical view of the structure of the DTD
  • Current specification for OLIF v.2
  • Formalization of OLIF data categories
  • Alphabetic list of XML elements and attributes
  • Fixed and recommended values for elements and attributes
  • Guidelines for formulating canonical forms
  • Sample OLIF entries
Using OLIF
• Some applications:
  • SAP has implemented an OLIF converter to exchange terminological data from its central termbase, SAPterm
  • MT developers in the OLIF2 Consortium (Comprendium, Systran) are currently developing OLIF converters
• The OLIF User Forum has 60+ members
What’s New: XML Schema
• The OLIF XSD offers:
  • 40+ built-in data types
  • Creation of user-defined data types
  • Support for inheritance
What’s New: The OLIF API
• Java classes created from the OLIF XSD
• Supports:
  • Converting .csv files to OLIF
  • Converting from other XML formats to OLIF
  • Creating OLIF documents from scratch
  • Modifying OLIF documents
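The OLIF API itself is a set of Java classes whose names are not given here, so the following is only a rough Python analogue of its first use case, CSV-to-OLIF conversion, built on the standard library. The column layout (one column per key data category, headed by the XML element name) is an assumption, and the output holds only the key data categories, not a complete OLIF file with header and body.

```python
import csv
import io
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")


def csv_to_entries(csv_text: str) -> list[ET.Element]:
    """Build one <entry> per CSV row; columns are named after the key elements."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.Element("entry")
        key_dc = ET.SubElement(ET.SubElement(entry, "mono"), "keyDC")
        for name in KEY_FIELDS:
            ET.SubElement(key_dc, name).text = row[name].strip()
        entries.append(entry)
    return entries


# Hypothetical input file contents.
SAMPLE = """canForm,language,ptOfSpeech,subjField,semReading
table,en,noun,general,86
Tabelle,de,noun,general,86
"""

for entry in csv_to_entries(SAMPLE):
    print(ET.tostring(entry, encoding="unicode"))
```

Each printed entry has the same shape as the basic well-formed entry, one line per CSV row.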
What to Expect this Year from OLIF
• OLIF XSD and API available to users from www.olif.net
• OLIF web site upgraded and updated
• Requirements for modeling Japanese entries integrated
OLIF User Forum
• Users of OLIF can read and post questions, messages and sample data on the OLIF group site: http://groups.yahoo.com/group/olifConsortium/