The E-MELD Project


Presentation Transcript


  1. The E-MELD Project
     Helen Aristar Dry & Anthony Aristar
     The LINGUIST List
     Eastern Michigan U & Wayne State U
     University of Texas at Austin

  2. E-MELD: Electronic Metastructure for Endangered Languages Documentation
     • 5-year NSF project, 2001-6
     • Linguist List, ELF, LDC
     • Goal: To aid in
       • …the preservation of endangered languages data
       • …the development of infrastructure for electronic archives

  3. Summary of the problem (2001)
     EL (endangered language) resources were/are:
     • Difficult to find
     • Difficult to use
     • Difficult to preserve
     Needed:
     • More uniformity in naming, cataloguing, annotating, i.e., interoperable standards
     • More knowledge of how to create digital resources that last

  4. Problems with EL resources
     • Difficult to find
       • At distributed sites
       • Language names ambiguous
       • No central catalog of resources or cataloging information (metadata)
       • Lack of interoperability among archives
     • Difficult to display accurately
       • Idiosyncratic character encoding
       • Specific fonts needed
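
As an illustration of the character-encoding problem above, here is a minimal Python sketch of migrating text typed with a font-specific convention to standard Unicode. The mapping table is entirely hypothetical (it assumes a legacy font that displayed "N" as ŋ, "S" as ʃ and "E" as ɛ); a real migration would need the documented conventions of the actual font.

```python
import unicodedata

# Hypothetical legacy convention: a custom font rendered these ASCII
# letters as phonetic symbols. Not taken from any real font.
LEGACY_TO_UNICODE = {
    "N": "\u014b",  # ŋ LATIN SMALL LETTER ENG
    "S": "\u0283",  # ʃ LATIN SMALL LETTER ESH
    "E": "\u025b",  # ɛ LATIN SMALL LETTER OPEN E
}

def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace font-dependent characters with Unicode code points,
    then normalize to NFC so equivalent sequences compare equal."""
    converted = "".join(table.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", converted)

print(legacy_to_unicode("SiN"))  # ʃiŋ
```

Once converted, the text no longer depends on a specific font being installed in order to display correctly.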

  5. Problems with EL resources, 2
     • Difficult to compare
       • Non-standard terminology
       • Idiosyncratic markup & annotation schemes
     • Difficult to manipulate or reuse
       • Specific software needed (incl. specific software version), e.g. MSWord 1.0
       • Meaning represented via formatting, which was not documented
         • bold represents “headword”
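
To make the "meaning represented via formatting" problem concrete, the following Python sketch turns a lexical entry whose headword is marked only by bold type into explicit, self-describing markup. The input format (a list of text runs with a bold flag) and the element names `entry`, `headword` and `gloss` are assumptions made for illustration, not an E-MELD schema.

```python
import xml.etree.ElementTree as ET

def runs_to_entry(runs):
    """Convert (text, is_bold) runs into an explicit XML entry.

    Assumes the undocumented convention 'bold = headword';
    everything that is not bold is treated as the gloss.
    """
    entry = ET.Element("entry")
    headword = "".join(text for text, bold in runs if bold).strip()
    gloss = "".join(text for text, bold in runs if not bold).strip()
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "gloss").text = gloss
    return entry

# Illustrative entry: the first run was bold in the word-processor file.
entry = runs_to_entry([("kitabu", True), (" n. book", False)])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The resulting markup states what each piece of text is, so the entry stays interpretable even if the original software and its formatting are lost.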

  6. Problems with EL resources, 3
     Impermanent; vulnerable to:
     • Deterioration of the physical media
     • Hardware obsolescence
     • Software obsolescence

  7. PHONOGRAMMARCHIV - AUSTRIAN ACADEMY OF SCIENCES
     (slide from Dietrich Schüller, Director)

  8. Toward a Solution: E-MELD Components
     • Involve linguistics community in developing standards
     • Promote consensus about:
       • Language Identification
       • Metadata
       • Annotation and markup
     • Teach and facilitate implementation of “best practices” in the creation of digital language documentation

  9. Promoting consensus: annual workshops
     • 2001, Santa Barbara, CA: The Need for Standards
     • E-MELD 2002, Ann Arbor, MI: Digitizing Lexical Information
     • E-MELD 2003, Lansing, MI: Digitizing Texts
     • E-MELD 2004, Detroit, MI: Databases and Best Practice
     • E-MELD 2005, Cambridge, MA: Linguistic Ontologies & Terminology

  10. 2006 E-MELD Workshop on Digital Language Documentation
      • Michigan State University
      • June 20-22, 2006
      • In conjunction with the 2006 Summer Meeting of the Linguistic Society of America
      • Topic: Electronic Archiving and Digital Tools: Current State & Future Directions
      Please come!

  11. Finding resources: metadata
      • OLAC metadata standards (subcommunity of OAI)
      • OLAC search engine on LL site: http://linguistlist.org/olac
      • OLAC metadata editor on LL site: http://linguistlist.org/olac/ore
      • XSL Stylesheets for transformation / presentation of OLAC metadata
      • Ethnologue/LL language codes proposed as ISO standard
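
A rough idea of what a catalog record for a language resource involves can be sketched in Python. OLAC metadata builds on Dublin Core, so this sketch uses only plain Dublin Core elements; the `record` wrapper element, the field selection and the sample values (including the placeholder language code `xxx`) are assumptions for illustration and do not reproduce the actual OLAC schema.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def make_record(title, creator, language_code, fmt):
    """Build a simplified Dublin Core-style description of a resource."""
    record = ET.Element("record")  # hypothetical wrapper, not an OLAC element
    fields = [
        ("title", title),
        ("creator", creator),
        ("language", language_code),
        ("format", fmt),
    ]
    for name, value in fields:
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record

# Sample values are placeholders, not a real resource.
rec = make_record("Sample wordlist", "A. Linguist", "xxx", "text/xml")
print(ET.tostring(rec, encoding="unicode"))
```

Because every archive fills in the same named fields, a single search engine can query records from many distributed sites.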

  12. Using resources: comparing and finding annotation
      • Ontologies developed (as interlanguage between markups and as search aids)
        • GOLD: General Ontology for Linguistic Description (morphosyntax)
        • OPF: Ontology of Phonetic Features (based on Ladefoged & Maddieson)
      • ODIN Project: mining interlinear glossed text on the web (Will Lewis et al.)
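
The idea of an ontology as an "interlanguage between markups" can be sketched as follows: each project maps its own gloss tags to shared concepts, and a search is then phrased in terms of the concepts rather than any one project's tags. The tag sets and the concept labels (`PastTense`, `PluralNumber`) are invented for this sketch and are not claimed to be actual GOLD identifiers.

```python
# Hypothetical per-project mappings from local gloss tags to shared concepts.
PROJECT_A = {"PST": "PastTense", "PL": "PluralNumber"}
PROJECT_B = {"past": "PastTense", "plur": "PluralNumber"}

def normalize(tags, mapping):
    """Translate project-specific tags into shared concepts.
    Tags without a mapping become None."""
    return [mapping.get(tag) for tag in tags]

def search(concept, corpora):
    """Return the names of corpora whose annotation uses the concept.
    `corpora` maps a name to a (tags, mapping) pair."""
    return [
        name
        for name, (tags, mapping) in corpora.items()
        if concept in normalize(tags, mapping)
    ]

corpora = {
    "A": (["PST", "PL"], PROJECT_A),
    "B": (["plur"], PROJECT_B),
}
print(search("PluralNumber", corpora))  # ['A', 'B']
```

Neither project has to change its own tags; only the mapping to the shared vocabulary needs to be published.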

  13. Using resources: Tools
      • Tools to encourage use of the ontology:
        • OntoElan: text annotation (modification of MPI’s Elan)
        • OntoGloss: stand-off annotation tool
        • FIELD: lexical input
      • Tool to encourage use of Unicode
        • CharWrite: input of Unicode characters
      • Facility to encourage use of OLAC metadata
        • Stylesheet library
        • ORE

  14. Facilitating ‘Best Practices’ in resource creation
      • Creation of reference website
        • School of Best Practices in Digital Language Documentation
        • http://emeld.org/school/
        • Addressed to the individual linguist who creates language documentation

  15. What should the linguist do?
      To ensure that digital data endure long into the future:
      • Create an archival copy: Put the materials into an enduring file format.
      • Deposit the materials with an archive that will make a practice of periodically migrating them to new storage media as needed.

  16. Organization of the School
      • Entrance Hall: orientation
      • Classroom: lessons & tutorials
      • Reading Room: bibliography
      • Work Room: online work
      • Tool Room: links to tools
      • Help (incl. Ask an Expert)
      • Case Studies: documentation of 10 ELs digitized according to best practices

  17. Currently School has:
      Documentation from 12 ELs

  18. Current Initiatives
      • Identify and record metadata for legacy documentation
      • Improve the ontology (GOLD): incorporate suggestions from 2005 E-MELD workshop
      • Finish prototyped software

  19. Future: finish prototyped software
      • OntoElan: ontology-aware modification of MPI’s Elan annotation tool
      • OntoGloss: ontology-aware stand-off annotation tool
      • CharWrite: downloadable tool for web-input of Unicode characters
      • FIELD: Field Input Environment for Linguistic Data
      • All but OntoGloss available through the School of Best Practices website

  20. Current Initiatives: School of BP
      • Make the School even more practical
      • Distinguish between good, better, best practice
      • Emphasize
        • Explicit ‘how-to’ pages
        • Different paths for different user types
        • Advice from experts, e.g. “equipment on a budget” page, Ask-An-Expert

  21. Practices in resource creation
      • Good practice: ensure preservation
      • Better practice: ensure long-term intelligibility
        • “We don’t want to create another Rosetta Stone” - Whalen, 2003
      • Best practice: promote interoperability

  22. School of Best Practices in Digital Language Documentation
      http://emeld.org/school/

  23. Future Directions
      • MultiTree
      • LL-MAP

  24. What is MultiTree?
      • 3-year grant
      • Database of all hypothesized language relations
      • Ultimately linked to GIS database
      • Interface to allow linguists to input updates
      • Panel of experts to assess input

  25. LL-MAP
      • Collect geographically linked linguistic data
      • Build this into a GIS system, allowing layers of information to be built into a single map
      Then…
      • Build tools for querying, annotating and discussing this data
      • Build tools which allow new language data from linguists and anthropologists to be incorporated into this system
