SIL FieldWorks Language Explorer: The lexicon component. Gary Simons, SIL International. Lexicon Tools and Lexicon Standards, Nijmegen, 4–5 August 2010.
SIL FieldWorks • FieldWorks is: • a suite of integrated software tools to help field workers manage language and cultural data, with support for complex scripts. • http://fieldworks.sil.org/ • The Language Explorer tool is designed to: • manage a lexical database • produce dictionaries • interlinearize texts • analyze morphology
Quick Tour • A short screen movie gives a quick tour of the look and feel • It is the first of 55 narrated screen movies available at: • http://downloads.sil.org/FieldWorks/Movies/brief demo menu.html
Integration among areas • The Lexicon, Texts, and Grammar areas all operate over the same database. • In the Lexicon area, users enter lexical entries directly. • In the Texts area, as new morphemes are glossed in text, new lexical entries are created behind the scenes. • In the Grammar area, users describe the categories and features used in lexical description, plus the inflectional templates that guide automatic parsing in Texts.
Conceptual-modeling approach • Lexicon, texts, and grammar are all stored in a single, normalized relational database. • We began by working with domain experts to build a conceptual model of the areas and how they integrate. • The conceptual model was expressed in UML and then transformed into an SQL relational database schema. • See the full model with over 100 classes at: http://fieldworks.sil.org/ModelDoc/ModelDocumentation.chm
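To make the single-database idea concrete, here is a minimal sketch in Python with SQLite. The table and column names (`lex_entry`, `lex_sense`, `text_morph`) and the sample data are invented for illustration; they are not the actual FieldWorks schema, which has over 100 classes. The point is only that a morpheme in a text refers to a lexical sense rather than storing its own copy of the gloss.

```python
import sqlite3

# Hypothetical, heavily simplified tables (NOT the real FieldWorks schema).
SCHEMA = """
CREATE TABLE lex_entry (
    id INTEGER PRIMARY KEY,
    lexeme_form TEXT NOT NULL
);
CREATE TABLE lex_sense (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES lex_entry(id),
    gloss TEXT NOT NULL
);
CREATE TABLE text_morph (
    id INTEGER PRIMARY KEY,
    text_name TEXT NOT NULL,
    position INTEGER NOT NULL,
    sense_id INTEGER NOT NULL REFERENCES lex_sense(id)
);
"""


def build_demo():
    """Create an in-memory database with one entry used once in one text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO lex_entry (id, lexeme_form) VALUES (1, 'kata')")
    conn.execute("INSERT INTO lex_sense (id, entry_id, gloss) VALUES (1, 1, 'word')")
    conn.execute(
        "INSERT INTO text_morph (text_name, position, sense_id) "
        "VALUES ('Story 1', 1, 1)"
    )
    conn.commit()
    return conn


def interlinear_gloss(conn, text_name):
    """Return (form, gloss) pairs for a text by joining through the lexicon."""
    return conn.execute(
        """
        SELECT e.lexeme_form, s.gloss
        FROM text_morph AS m
        JOIN lex_sense AS s ON s.id = m.sense_id
        JOIN lex_entry AS e ON e.id = s.entry_id
        WHERE m.text_name = ?
        ORDER BY m.position
        """,
        (text_name,),
    ).fetchall()
```

Because the text stores only a reference, correcting a gloss in `lex_sense` changes what `interlinear_gloss` returns for every text that uses that sense.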
Some key features • Use automatic parsing to empirically verify the morphological description within the lexicon • Build a wordnet via lexical relations • Build richness into the lexicon by eliciting through semantic domains • Use “bulk edit” for global cleanup • Repurpose content by developing multiple presentation views • Clean separation between stored data and presentation (see the example in the next two slides)
Root-based dictionary (Cherokee) • Stem entries just cross-refer to root • Root entries list stems as subentries • Subentries give full description
Stem-based dictionary (Cherokee) • Stem entries give full description • Root entries cross-refer to stems • No subentries
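The two dictionary layouts can be read as two views over the same stored data. The sketch below illustrates that separation in Python; the `Stem` record, the function names, and the placeholder forms (`root-1`, `stem-a`) are hypothetical and are not taken from FLEx or from the Cherokee data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    form: str   # stem headword
    gloss: str  # stands in for the "full description"
    root: str   # root this stem derives from


# Placeholder data, invented for illustration only.
STEMS = [
    Stem("stem-a", "gloss A", "root-1"),
    Stem("stem-b", "gloss B", "root-1"),
]


def _by_root(stems):
    """Group stems under their root, keeping stems sorted by form."""
    groups = {}
    for stem in sorted(stems, key=lambda s: s.form):
        groups.setdefault(stem.root, []).append(stem)
    return groups


def root_based(stems):
    """Root entries carry stems as subentries; stem entries cross-refer."""
    entries = {}
    for root, members in _by_root(stems).items():
        subentries = "".join(f"\n    {s.form}: {s.gloss}" for s in members)
        entries[root] = root + subentries
        for s in members:
            entries[s.form] = f"{s.form}: see {root}"
    return [entries[headword] for headword in sorted(entries)]


def stem_based(stems):
    """Stem entries carry the description; root entries cross-refer."""
    entries = {}
    for root, members in _by_root(stems).items():
        entries[root] = f"{root}: see " + ", ".join(s.form for s in members)
        for s in members:
            entries[s.form] = f"{s.form}: {s.gloss}"
    return [entries[headword] for headword in sorted(entries)]
```

Both functions read the same `STEMS` list; only the rendering differs, so switching layouts requires no change to the stored data.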
Pathways to publishing • First create a “configured view” to display the lexical entries as desired • Then use the Pathway plug-in to take this stream of configured content and lay it out onto pages for a publishable dictionary • http://code.google.com/p/pathway/ • Publishing tools supported so far: • Prince XML (to PDF) • Open Office (to ODF) • Adobe InDesign
Lexical interchange • Supports two import formats: • From Shoebox / Toolbox via SFM • “Standard Format Markers” = backslash codes • User configures the mapping of markers to conceptual equivalents in FLEx database • The default mapping is for MDF SFM • From WeSay / Lexique Pro via LIFT • Lexicon Interchange FormaT: an XML application for interchange of lexicons • http://code.google.com/p/lift-standard/
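As an illustration of mapping backslash codes to conceptual fields, here is a minimal Python sketch. It is not the FLEx importer: the target field names are hypothetical, the sample records are made up, and only four common MDF markers (`\lx`, `\ps`, `\ge`, `\de`) are mapped. It assumes `\lx` starts each record, and it skips unmapped markers and continuation lines.

```python
# Assumed mapping from MDF-style markers to hypothetical field names.
DEFAULT_MAPPING = {
    "lx": "lexeme",
    "ps": "part_of_speech",
    "ge": "gloss_en",
    "de": "definition_en",
}

# Made-up sample records; \xx stands for a marker the user has not mapped.
SAMPLE = r"""
\lx kata
\ps n
\ge word
\xx ignored

\lx lari
\ps v
\ge run
"""


def parse_sfm(text, mapping=DEFAULT_MAPPING, record_marker="lx"):
    """Split SFM text into records, renaming markers via the mapping."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # blank or continuation line: ignored in this sketch
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}  # a record marker opens a new entry
            entries.append(current)
        if current is None or marker not in mapping:
            continue  # field before the first record, or unmapped marker
        # A marker may repeat within a record, so collect values in a list.
        current.setdefault(mapping[marker], []).append(value.strip())
    return entries
```

Calling `parse_sfm(SAMPLE)` yields one dictionary per `\lx` record. Passing a different `mapping` plays the role of the user-configured mapping described above.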
Lexicon export • The entire database for a language project can be dumped to FieldWorks XML • http://fieldworks.sil.org/supportdocs/FieldWorks XML model.doc • The complete lexical database (a subset of the whole project) can be exported to: • LIFT XML • MDF-based SFM (either root- or stem-based) • http://fieldworks.sil.org/supportdocs/Export options in Flex.doc
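For readers unfamiliar with LIFT, the following Python sketch shows roughly what a LIFT-style export of simple entries might look like. It is not the FLEx exporter. The element names (`entry`, `lexical-unit`, `form`, `text`, `sense`, `grammatical-info`, `gloss`) follow my understanding of the LIFT format and should be checked against the LIFT specification. The version string, the language tags, the entry ids, and the input dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_lift(entries, vernacular="x-demo", analysis="en"):
    """Serialize simple entry dicts as a LIFT-style XML string (sketch)."""
    # The version value is an assumption; use whatever the spec requires.
    root = ET.Element("lift", version="0.13")
    for number, item in enumerate(entries, start=1):
        entry = ET.SubElement(root, "entry", id=f"entry-{number}")

        # Headword in the vernacular language.
        unit = ET.SubElement(entry, "lexical-unit")
        form = ET.SubElement(unit, "form", lang=vernacular)
        ET.SubElement(form, "text").text = item["lexeme"]

        # One sense with a part of speech and a gloss.
        sense = ET.SubElement(entry, "sense")
        ET.SubElement(sense, "grammatical-info", value=item["pos"])
        gloss = ET.SubElement(sense, "gloss", lang=analysis)
        ET.SubElement(gloss, "text").text = item["gloss"]
    return ET.tostring(root, encoding="unicode")
```

For example, `to_lift([{"lexeme": "kata", "pos": "Noun", "gloss": "word"}])` returns a single `lift` element containing one `entry`.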
More lexicon export • Any configured view can be exported to: • A streamlined version of FieldWorks XML • MDF-based SFM • XHTML + CSS for presentation • Furthermore, one can create a FieldWorks XML Template (FXT) to define a custom export format (XML, SFM, plain text) • http://fieldworks.sil.org/supportdocs/FXT export options.doc
Interoperation with GOLD • FLEx is preloaded with a grammatical categories catalog that is based on an early version of GOLD (the General Ontology for Linguistic Description) • http://www.sil.org/computing/fieldworks/flex/categories.html • Similarly, a Morphosyntactic Gloss Assistant is preloaded with morphosyntactic properties from an early version of GOLD; see p. 10 of: • http://www.sil.org/~simonsg/preprint/FLExParser Preprint.pdf • Thus morphosyntactic information in lexicon and texts is implicitly aligned with GOLD • The remaining step is for us to map to GOLD ids when they are standardized; then we can easily export GOLD ids in LIFT and other XML
Uptake • October 2009: FLEx 3.0 released in FieldWorks 6.0. Free download from: • http://www.sil.org/computing/fieldworks/FW_downloads.htm • 323 members of a reasonably active Google Group (~3,000 messages) • http://groups.google.com/group/flex-list • 185 language projects have registered as users • Over 30 people attended a 4-day FLEx workshop led by Beth Bryson at InField 2010. Beth will also lead a one-day FLEx workshop at ICLDC, Feb 2011.