
SIL FieldWorks Language Explorer: The lexicon component

Gary Simons, SIL International. Lexicon Tools and Lexicon Standards, Nijmegen, 4–5 August 2010.


Presentation Transcript


  1. SIL FieldWorks Language Explorer: The lexicon component. Gary Simons, SIL International. Lexicon Tools and Lexicon Standards, Nijmegen, 4–5 August 2010

  2. SIL FieldWorks
     • FieldWorks is a suite of integrated software tools to help field workers manage language and cultural data, with support for complex scripts.
       • http://fieldworks.sil.org/
     • The Language Explorer tool is designed to:
       • manage a lexical database
       • produce dictionaries
       • interlinearize texts
       • analyze morphology

  3. Quick Tour
     • A short screen movie gives a quick tour of the look and feel
     • It is the first of 55 narrated screen movies available at:
       • http://downloads.sil.org/FieldWorks/Movies/brief demo menu.html

  4. Integration among areas
     • The Lexicon, Texts, and Grammar areas all operate over the same database.
     • In the Lexicon area, users enter lexical entries directly.
     • In the Texts area, as new morphemes are glossed in text, new lexical entries are created behind the scenes.
     • In the Grammar area, users describe the categories and features used in lexical description, plus the inflectional templates that guide automatic parsing in Texts.
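The integration described above can be pictured with a minimal Python sketch. It is not FieldWorks code: the `Project` and `LexEntry` classes, their methods, and the placeholder forms are all hypothetical, and a plain dictionary stands in for the shared database. It only illustrates the idea that glossing a new morpheme in a text adds an entry to the same lexicon that users edit directly.

```python
from dataclasses import dataclass, field


@dataclass
class LexEntry:
    form: str
    gloss: str


@dataclass
class Project:
    """Hypothetical stand-in for one shared project database."""
    lexicon: dict = field(default_factory=dict)  # form -> LexEntry

    def add_entry(self, form, gloss):
        """Lexicon area: the user enters an entry directly."""
        entry = LexEntry(form, gloss)
        self.lexicon[form] = entry
        return entry

    def gloss_morpheme(self, form, gloss):
        """Texts area: reuse an existing entry, or create one on first glossing."""
        entry = self.lexicon.get(form)
        if entry is None:
            entry = self.add_entry(form, gloss)
        return entry


project = Project()
project.add_entry("form-a", "gloss-a")       # entered in the Lexicon area
project.gloss_morpheme("form-a", "gloss-a")  # already known: nothing new is created
project.gloss_morpheme("form-b", "gloss-b")  # new morpheme: entry created behind the scenes
print(sorted(project.lexicon))
```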

  5. Conceptual-modeling approach
     • Lexicon, texts, and grammar are all stored in a single, normalized relational database.
     • We began by working with domain experts to build a conceptual model of the areas and how they integrate.
     • The model was expressed in UML and then transformed into a SQL relational database schema.
     • See the full model, with over 100 classes, at:
       • http://fieldworks.sil.org/ModelDoc/ModelDocumentation.chm

  6. Some key features
     • Use automatic parsing to empirically verify morphological description within lexicon
     • Build the word net via lexical relations
     • Build richness into the lexicon by eliciting through semantic domains
     • Use “bulk edit” for global cleanup
     • Repurpose content by developing multiple presentation views
     • Clean separation between stored data and presentation (see example in next 2 slides)

  7. Root-based dictionary (Cherokee)
     • Stem entries just cross-refer to root
     • Root entries list stems as subentries
     • Subentries give full description

  8. Stem-based dictionary (Cherokee)
     • Stem entries give full description
     • Root entries cross-refer to stems
     • No subentries
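The contrast between the two dictionary layouts can be sketched in Python as two presentation functions over one stored data set. This is an illustration only, not how FieldWorks implements configured views: the data structures, function names, and placeholder forms are hypothetical, and no real Cherokee data is used.

```python
# Hypothetical stored data: one root with two stems. All forms are placeholders.
ROOTS = {"root-1": ["stem-1a", "stem-1b"]}
STEMS = {
    "stem-1a": {"root": "root-1", "description": "description of stem-1a"},
    "stem-1b": {"root": "root-1", "description": "description of stem-1b"},
}


def root_based_view():
    """Roots list their stems as full subentries; stem entries only cross-refer."""
    lines = []
    for root, stems in sorted(ROOTS.items()):
        lines.append(root)
        for stem in stems:
            lines.append(f"  {stem}: {STEMS[stem]['description']}")
    for stem, data in sorted(STEMS.items()):
        lines.append(f"{stem}: see {data['root']}")
    return lines


def stem_based_view():
    """Stems carry the full description; root entries only cross-refer; no subentries."""
    lines = []
    for stem, data in sorted(STEMS.items()):
        lines.append(f"{stem}: {data['description']}")
    for root, stems in sorted(ROOTS.items()):
        lines.append(f"{root}: see {', '.join(stems)}")
    return lines


print("\n".join(root_based_view()))
print("\n".join(stem_based_view()))
```

Both functions read the same `ROOTS` and `STEMS` structures, so switching layouts requires no change to the stored data.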

  9. Pathways to publishing
     • First create a “configured view” to display the lexical entries as desired
     • Then use the Pathway plug-in to take this stream of configured content and lay it out onto pages for a publishable dictionary
       • http://code.google.com/p/pathway/
     • Publishing tools supported so far:
       • Prince XML (to PDF)
       • Open Office (to ODF)
       • Adobe InDesign

  10. Lexical interchange
     • Supports two import formats:
       • From Shoebox / Toolbox via SFM
         • “Standard Format Markers” = backslash codes
         • User configures the mapping of markers to conceptual equivalents in FLEx database
         • The default mapping is for MDF SFM
       • From WeSay / Lexique Pro via LIFT
         • Lexicon Interchange FormaT: an XML application for interchange of lexicons
         • http://code.google.com/p/lift-standard/
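The SFM import idea above, backslash-coded fields mapped to conceptual equivalents, can be sketched in Python as follows. This is a simplified illustration and not the FLEx importer: the `parse_sfm` function, the `MARKER_MAP` dictionary, and the target field names are hypothetical. The markers `\lx`, `\ps`, and `\ge` are common MDF markers for lexeme, part of speech, and English gloss; the sample values are placeholders.

```python
# Placeholder SFM data: each record starts with the \lx (lexeme) marker.
SAMPLE_SFM = r"""
\lx form-a
\ps n
\ge gloss-a

\lx form-b
\ps v
\ge gloss-b
"""

# Hypothetical user-configured mapping from markers to database fields.
MARKER_MAP = {"lx": "lexeme", "ps": "part_of_speech", "ge": "gloss"}


def parse_sfm(text, marker_map, record_marker="lx"):
    """Split SFM text into records and rename markers via marker_map."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything without a marker
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}
            records.append(current)
        if current is None:
            continue  # ignore fields that appear before the first record
        field_name = marker_map.get(marker)
        if field_name is not None:
            current[field_name] = value.strip()
    return records


for record in parse_sfm(SAMPLE_SFM, MARKER_MAP):
    print(record)
```

Unmapped markers are silently dropped in this sketch; a real importer would more likely report them so the user can extend the mapping.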

  11. Lexicon export
     • The entire database for a language project can be dumped to FieldWorks XML
       • http://fieldworks.sil.org/supportdocs/FieldWorks XML model.doc
     • The complete lexical database (a subset of the whole project) can be exported to:
       • LIFT XML
       • MDF-based SFM (either root- or stem-based)
       • http://fieldworks.sil.org/supportdocs/Export options in Flex.doc
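To give a feel for what a tool consuming a LIFT XML export might do, here is a minimal Python sketch that reads a tiny LIFT-like fragment. The element names (`entry`, `lexical-unit`, `form`, `text`, `sense`, `grammatical-info`, `gloss`) reflect the editor's understanding of the LIFT format and should be checked against the LIFT specification; the sample content and the `read_lift` helper are hypothetical, and a real export carries much more information.

```python
import xml.etree.ElementTree as ET

# Placeholder fragment; not output from an actual FLEx export.
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="form-a">
    <lexical-unit>
      <form lang="und"><text>form-a</text></form>
    </lexical-unit>
    <sense>
      <grammatical-info value="Noun"/>
      <gloss lang="en"><text>gloss-a</text></gloss>
    </sense>
  </entry>
</lift>"""


def read_lift(xml_text):
    """Collect the headword, part of speech, and gloss of each entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("entry"):
        senses = []
        for sense in entry.findall("sense"):
            info = sense.find("grammatical-info")
            senses.append({
                "pos": info.get("value") if info is not None else None,
                "gloss": sense.findtext("gloss/text"),
            })
        entries.append({
            "id": entry.get("id"),
            "form": entry.findtext("lexical-unit/form/text"),
            "senses": senses,
        })
    return entries


for item in read_lift(SAMPLE_LIFT):
    print(item)
```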

  12. More lexicon export
     • Any configured view can be exported to:
       • A streamlined version of FieldWorks XML
       • MDF-based SFM
       • XHTML + CSS for presentation
     • Furthermore, one can create a FieldWorks XML Template (FXT) to define a custom export format (XML, SFM, plain text)
       • http://fieldworks.sil.org/supportdocs/FXT export options.doc

  13. Interoperation with GOLD
     • FLEx is preloaded with a grammatical categories catalog that is based on an early version of GOLD
       • http://www.sil.org/computing/fieldworks/flex/categories.html
     • Similarly, a Morphosyntactic Gloss Assistant is preloaded with morphosyntactic properties from an early version of GOLD; see p. 10 of:
       • http://www.sil.org/~simonsg/preprint/FLExParser Preprint.pdf
     • Thus morphosyntactic information in lexicon and texts is implicitly aligned with GOLD
     • The remaining step is for us to map to GOLD ids when they are standardized; then we can easily export GOLD ids in LIFT and other XML

  14. Uptake
     • October 2009: FLEx 3.0 released in FieldWorks 6.0. Free download from:
       • http://www.sil.org/computing/fieldworks/FW_downloads.htm
     • 323 members of a reasonably active Google Group (~3,000 messages)
       • http://groups.google.com/group/flex-list
     • 185 language projects have registered as users
     • Over 30 people attended a 4-day FLEx workshop led by Beth Bryson at InField 2010. Beth will also lead a one-day FLEx workshop at ICLDC, Feb 2011.
