370 likes | 496 Views
LIFT ing LEGO with RELISH : L exicon I nterchange F orma T in Use. Helen Aristar-Dry Institute for Language Information and Technology Eastern Michigan U. Outline. Background: The RELISH project The LEGO project Project interface Project workflow LIFT: Lexicon Interchange Format
E N D
LIFTingLEGO with RELISH:Lexicon Interchange FormaT in Use Helen Aristar-Dry Institute for Language Information and Technology Eastern Michigan U
Outline • Background: The RELISH project • The LEGO project • Project interface • Project workflow • LIFT: Lexicon Interchange Format • What it is • Use in the LEGO project (“LL-LIFT”) • Sample entry: comparison with LMF-compliant XML output by LEXUS
The RELISH Project • RELISH: Rendering Endangered Lexicons Interoperable through Standards Harmonization • Joint project: U of Frankfurt, MPI-Nijmegen, LINGUIST List • Goal: markup harmonization at two levels • Semantic • Structural • Test cases: 6 lexicons fr. LEGO & Lexus
LEGO & RELISH LEGO Test Lexicons LEXUS Test Lexicons RELISH - LL-LIFT - GOLD - LMF-compliant XML (various) - ISOCats Interchange format: TEI? LIFT?
The LEGO Project • 3-year project sponsored by the NSF • Participants: ILIT (Linguist List) & U at Buffalo • Goal: a “datanet” of interoperable lexicons. • Interoperability based on: • grammatical information mapped to GOLD • structure mapped to a common schema (LL-LIFT) • output in RDF
The LEGO Project • Initial set of lexical resources: • 17 EL lexicons prepared by LINGUIST List: • Shoshone, Archi, Kayardild, Fulfulde, Mocovi, Biao Mien, Potawatomi, Saliba, W. Pantar, W. Sissala, Wichi, … • 3000+ wordlists prepared by U at Buffalo: • Usher-Whitehouse lists, Loanword Typology Lists, Intercontinental Dictionary Series lists. • Extensible: • In practice: to Lexicons in LIFT • Possibly: RDF, TEI (or another official serialization of LMF)
The LEGO Project • Purposes • Not intended to develop a lexicon creation or lexicon display tool • Intended to • support multi-lexicon search and comparison • demonstrate the value of digital standards in linguistic research
LEGO Workflow Descriptive XML LL-LIFT (XML) LEGO db mapping to GOLD LL Lexicon - MS Word Excel Access Toolbox Filemaker LEGO Interoperable Lexicon
LEGO: Search multiple lexicons by LIFT field (Definition, Example, Variant, etc.
LEGO: Search multiple lexicons by grammatical information (GOLD concept).
LIFT • LIFT = Lexicon Interchange Format • XML format for storing and exchange of lexical information • Developed by SIL International • Designed to be easy to convert into and out of MDF and Fieldworks formats • Current version: http://code.google.com/p/lift-standard/downloads/detail?name=lift_13.pdf
Programs that support LIFT • WeSay uses LIFT as its primary format. • FieldWorks Language Explorer (FLEx) can import and export LIFT files. • Lexique Pro can open LIFT documents for viewing, printing, and making web pages. It can also save to LIFT format. (fr. http://code.google.com/p/lift-standard/)
Utilities for LIFT • Solid can convert basic SFM (standard format markers, e.g. Toolbox format) to LIFT (see: http://lingtransoft.info/apps/solid • LiftTweaker Can selectively modify a LIFT file for targeting different audiences
Lexicons in LIFT • LIFT chosen as upload format for LEGO because of the large number of lexicons potentially available in LIFT • About 50 published lexicons in Lexique Pro • 180+ lexicons in Fieldworks Language Explorer (FLEx) ? • 300+ lexicons in Shoebox/Toolbox ? • With the owner’s permission, these could easily be integrated into the LEGO system
Notes • Grammatical Info is attached to Sense, not to Entry or Form (differs from LMF) • Variant is attached to Entry, not ‘sense’ – can’t add a ‘sense’ to a variant • Multiple senses and variants allowed • Highly customizable: Field, Type, and Range can be added to virtually any element (can be defined in the document header)
LL-LIFT • Lack of constraint on use of Field and Trait constituted a problem for LEGO • Developed ‘LL-LIFT’ • a constrained form of LIFT • which still validates against the LIFT schema
LL-LIFT • Major constraints • No Header • Grammatical Information confined to a single element • Delimited within db field • Parsed out during GOLD mapping • Minor entries, comparison forms, etc • separate entries • unified via ‘relation’ element
<entry id="d1e56244"> <trait name="original-id" value="2c9090a22632946601267b22f98e5098"/> <lexical-unit> <form lang="syv"> <text>дадагалзаар</text> </form> </lexical-unit> <variant> <form lang="syv"> <text>dadagalzaar</text> </form> </variant> <sense> <grammatical-info value="n."/> <definition> <form lang="eng"> <text>doubt</text> </form> </definition> </sense> </entry> Tuva entry in LIFT
<LexicalEntry id="2c9090a22632946601267b22f98e5098"><DataCategory type="rank">5155</DataCategory><DataCategory type="lexeme">дадагалзаар</DataCategory><DataCategory type="part of speech">n.</DataCategory><Form id="2c9090a22632946601267b22f9fd50a0" type="form"> <DataCategory type="image"/> <DataCategory type="Audio">tvn_5155.mp3</DataCategory></Form><Sense id="2c9090a22632946601267b22f9fd50a6" type="sense"> <DataCategory type="gloss">doubt</DataCategory> <DataCategory type="transcription">dadagalzaar</DataCategory></Sense><ListOfComponents/> </LexicalEntry> Tuva entry exported from LEXUS (LMF-compliant)
<LexicalEntry id="2…"><DatCat type="rank">5155</DatCat><DatCat type="lexeme"> дадагалзаар</DatCat><DatCattype="part of speech">n. • </ DatCat ><Form id="2…" type="form"> < DatCattype="image"/> < DatCattype="Audio"> tvn_5155.mp3</DatCat></Form><Sense id="2…" type="sense"> < DatCattype="gloss">doubt</DatCat >< DatCattype="transcription"> dadagalzaar</DatCat></Sense><ListOfComponents/> </LexicalEntry> <entry id="d1e56244"> <trait name="original-id” value=“2…"/> <lexical-unit> <form lang="syv"> <text>дадагалзаар</text> </form> </lexical-unit> <variant> <form lang="syv"> <text>dadagalzaar</text> </form> </variant> <sense> <grammatical-info value="n."/> <definition> <form lang="eng"> <text>doubt</text> </form> </definition> </sense> </entry> LMF LL-LIFT
Summary • LL-LIFT = current XML schema used in LEGO • Easy to transform (at least one) LMF-compliant XML output of LEXUS to LL-LIFT • Transformation could be extended to TEI serialization of LMF. May require constraint