Review of DBMS for Linguistic Purposes

Presentation Transcript


  1. Review of DBMS for Linguistic Purposes
     Marisa Ferrara & Steven Moran, Eastern Michigan University
     E-MELD 2004 Linguistic Databases & Best Practice

  2. Project Purpose
     • Linguists specializing in language documentation confront the problem of how to digitally organize and store their data
     • Best practice recommends that linguists create an archival copy in text format with XML markup
     • However, most linguists use database software to create a working format
     • A review of DBMS for linguistic purposes with best practice in mind has not, to our knowledge, been done
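To make the archival-copy recommendation concrete, here is a minimal Python sketch of generating an XML copy from a working database. It assumes a hypothetical SQLite table named `lexicon` with `headword`, `pos` and `gloss` columns; these names are illustrative and are not the fields of the authors' test file, and the sample rows are invented placeholders, not Sisaala data. Only the standard-library `sqlite3` and `xml.etree.ElementTree` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_lexicon_xml(conn: sqlite3.Connection) -> str:
    """Serialize every row of the hypothetical `lexicon` table as an XML document."""
    root = ET.Element("lexicon")
    rows = conn.execute("SELECT id, headword, pos, gloss FROM lexicon ORDER BY id")
    for entry_id, headword, pos, gloss in rows:
        entry = ET.SubElement(root, "entry", id=str(entry_id))
        ET.SubElement(entry, "headword").text = headword
        ET.SubElement(entry, "pos").text = pos
        ET.SubElement(entry, "gloss").text = gloss
    # encoding="unicode" returns a str, so tone-marked characters are kept as-is
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Working copy: an in-memory SQLite database holding placeholder rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon (id INTEGER PRIMARY KEY, headword TEXT, pos TEXT, gloss TEXT)"
    )
    conn.executemany(
        "INSERT INTO lexicon (headword, pos, gloss) VALUES (?, ?, ?)",
        [("b\u00e1", "n", "placeholder gloss 1"), ("k\u00e0l\u00e0", "v", "placeholder gloss 2")],
    )
    print(export_lexicon_xml(conn))
```

Keeping the XML generation separate from the working database means the archival text copy can be regenerated whenever the working copy changes.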

  3. Goals
     • Evaluate database software according to criteria relevant to linguistic documentation
     • Not to select the best software overall
     • Ongoing project
     • Feedback is appreciated on other software and our criteria

  4. DBMS and their interfaces
     Database software
     • Can be developed explicitly for language data
       • Shoebox, FIELD
     • Can be developed for general purposes
       • FileMaker Pro, MS Access, MySQL
     • The first type is often an interface built on top of a general-purpose DBMS
     • Both types are used by linguists and will be evaluated according to the same criteria

  5. Software
     • All software was tested on either Windows XP or Mac OS X
     • Unix software was left out of this evaluation
     • The software was chosen based on our experience with E-MELD and on recommendations from various field linguists
     • 14 software applications were chosen for evaluation:
       • Access 2003
       • askSam 5.1
       • Emdros 1.1.17
       • Excel 2003
       • eXist 1.0
       • FIELD
       • FileMaker Pro 7
       • Kura 2.0-1-2.1.2
       • LinguaLinks Workshops
       • MATES
       • MySQL 5.0
       • PostgreSQL 7.4
       • Shoebox 5.0
       • Word 2003

  6. Test data
     • We evaluated each application by entering original data
     • The data came from Western Sisaala [SSL] and was collected by Steven Moran (2003)
     • Typological features include:
       • SVO word order
       • Left-headed NP
       • Contrasting High/Low tonal system
     • The data included a 20-entry subset of the lexicon, archived in an Excel 2002 file with the following fields

  7. Problems encountered
     • Unicode
       • We had to find either a way of inserting the characters or a function for cutting and pasting them
     • Syllabic tone
       • More than one tone mark per lexical entry
     • Link tables
       • Morphological breakdown
         • Verb phrases had to be marked in a consistent manner
         • The author's shorthand included common delimiters that posed a problem for importing and exporting in certain formats
       • Semantic categories
         • For consistency, link tables should be used
         • Could be problematic
         • FIELD worked best, but missing semantic fields cannot be added
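The delimiter problem above typically arises when a field's own content contains the character used to separate fields. A minimal Python sketch, assuming CSV as the exchange format: the standard `csv` module quotes any field that contains a comma, a quote or a line break, so a shorthand value like the invented one below survives a round trip that a naive `split(",")` would break. The helper names `to_csv` and `from_csv` are hypothetical.

```python
import csv
import io

def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV; fields containing delimiters or quotes are quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # default dialect uses QUOTE_MINIMAL
    return buf.getvalue()

def from_csv(text: str) -> list[list[str]]:
    """Parse CSV text back into rows, honouring the quoting added by to_csv."""
    return list(csv.reader(io.StringIO(text)))

if __name__ == "__main__":
    # Hypothetical morphological shorthand containing a comma, a semicolon and quotes.
    rows = [["id", "breakdown"], ["1", 'stem-SFX, stem2; "alt"']]
    text = to_csv(rows)
    print(text)
    assert from_csv(text) == rows               # the quoted export round-trips intact
    print(text.splitlines()[1].split(","))     # naive splitting yields three pieces
```

The same idea applies to any delimited format: either quote fields that contain the delimiter or choose a delimiter that cannot occur in the data.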

  8. Criteria
     • We developed three categories of criteria that we considered essential for databasing linguistic information:
       • General Information
       • Technical Information
       • Ability to Handle Linguistic Data
     • The criteria include, but are not limited to, those used in other evaluations:
       • BIFoCAL
       • Open Source Database Software Comparison

  9. General Information
     • Developer/Release Date
     • Price
     • Licensing options
     • Platforms
     • Other software needed
     • Help functionality/Tutorial availability
     • Support

  10. Technical Information
     • Database Type
     • Pre-defined DB design?
     • ACID compliance
     • Data integrity
     • Collaborative or single-user
     • Network connection necessary?
     • SSL access?
     • Web accessible?
     • Programming Interfaces/API
     • Imports and Exports
       • XML (best practice)
       • Other formats
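The data integrity and ACID-compliance criteria can be illustrated together with the link tables listed under Problems encountered. The minimal sketch below uses Python's built-in `sqlite3` module; the schema (a `semantic_category` link table referenced by a `lexeme` table), the category names and the headwords are all hypothetical. A foreign key stops a lexeme from pointing at a category that does not exist, and wrapping a batch of inserts in a transaction means a failed batch leaves no partial rows behind.

```python
import sqlite3

# Hypothetical schema: lexemes point at semantic categories through a foreign key.
SCHEMA = """
CREATE TABLE semantic_category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE lexeme (
    id          INTEGER PRIMARY KEY,
    headword    TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES semantic_category(id)
);
"""

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO semantic_category (name) VALUES (?)",
            [("body part",), ("kinship",)],  # placeholder categories, ids 1 and 2
        )
    return conn

def add_lexemes(conn: sqlite3.Connection, entries: list[tuple[str, int]]) -> bool:
    """Insert (headword, category_id) pairs as one transaction: all succeed or none are kept."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO lexeme (headword, category_id) VALUES (?, ?)", entries
            )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    conn = make_db()
    print(add_lexemes(conn, [("form-a", 1), ("form-b", 2)]))   # True
    print(add_lexemes(conn, [("form-c", 1), ("form-d", 99)]))  # False: category 99 does not exist
    print(conn.execute("SELECT COUNT(*) FROM lexeme").fetchone()[0])  # 2: form-c was rolled back
```

Because the category names live in one link table, a typo in a category can only be made once and corrected once, which is the consistency argument made on the Problems encountered slide.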

  11. Ability to Handle Linguistic Data
     • Designed exclusively for linguists?
     • Unicode compatibility (best practice)
     • Special character input method/ease of input
     • Search functionality
     • Allows input and interlinearization of primary text
     • Ability to link primary text to lexicon
     • Multi-Dictionary Format
     • Ability to export lexicon in a presentation format
     • Ability to export grammar in a presentation format
     • Audio/Video/Image support
     • Ability to add missing features
     • Overall Assessment
       • Pros
       • Cons
       • Recommended for…
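Unicode compatibility and search functionality interact with tone marking: a tone-marked vowel such as á can be stored either as one precomposed code point or as a base letter plus a combining accent, and the two spellings do not match in a plain string comparison. A minimal Python sketch using the standard `unicodedata` module, assuming (hypothetically) that high tone is written with a combining acute accent and low tone with a combining grave accent; the sample word is an invented placeholder and the helper names are illustrative.

```python
import unicodedata

# Assumption: high tone = combining acute (U+0301), low tone = combining grave (U+0300).
TONE_MARKS = {"\u0301": "H", "\u0300": "L"}

def nfc(text: str) -> str:
    """Normalize to NFC, so precomposed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)

def search(entries: list[str], query: str) -> list[str]:
    """Substring search that normalizes both the query and each entry first."""
    q = nfc(query)
    return [entry for entry in entries if q in nfc(entry)]

def tone_pattern(word: str) -> str:
    """Read the tone marks of a word in order (e.g. 'HL') from its NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS)

if __name__ == "__main__":
    stored = ["b\u00e1l\u00e0"]     # invented word typed with precomposed á and à
    typed = "ba\u0301"              # the same syllable typed as a + combining acute
    print(typed in stored[0])       # False: the code points differ
    print(search(stored, typed))    # ['bálà']
    print(tone_pattern(stored[0]))  # HL
```

Normalizing to one form (NFC here) both when storing data and when querying is one common way to keep search results consistent across different input methods; decomposing to NFD makes it easy to handle entries with more than one tone mark.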

  12. Not evaluated
     • For a variety of reasons, we could not evaluate all of the software we chose
     • Software that is too technical
       • Emdros 1.1.17
         • User must program an interface
       • PostgreSQL 7.4
         • GUI and Cygwin difficult to set up on a Windows machine
     • Software that represents bad practice
       • MS Word 2003
         • Already reviewed by BIFoCAL
       • askSam 5.1
         • Not Unicode compliant
     • Software still in development or unavailable
       • Kura 2.0-1-2.1.2
         • Still discussing installation with developer
       • MATES
         • Unavailable on the web (still in development)?
       • LinguaLinks Workshop
         • Will be reviewed shortly

  13. Ongoing Commitment
     • This evaluation of DBMS for linguistic purposes is ongoing
     • Our evaluation will be linked to the School of Best Practice Toolroom software reviews
     • Users can add their opinions to this system
     • We welcome any suggestions regarding our criteria and other software that should be considered
     • Email us!

  14. Thank you! Any questions?

  15. References
     • Anonymous. 2004. “Open Source Database Software Comparison”. Retrieved June 1, 2004 at http://www.geocities.com/mailsoftware42/db/ .
     • BIFoCAL. 2003a. “Software functionality for non-technical users”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareDims.html .
     • BIFoCAL. 2003b. “Questions to Help Evaluate Linguistic Tools”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareQuestions.html .
     • Buszard-Welcher, Laura. 2003. “Shoebox: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/ShoeboxRev.html .
     • E-MELD School of Best Practice. 2004. Retrieved July 1, 2004 at http://www.emeld.org/school .
     • Engelberg, Miriam. 2000. “Choosing between Microsoft Access and FileMaker Pro”. Retrieved June 30, 2004 at http://www.techsoup.org/howto/articlepage.cfm?ArticleId=207&topicid=6 .
     • Ethnologue. 2004. “Sisaala, Western: a language of Ghana”. Retrieved July 1, 2004 at http://www.ethnologue.com/show_language.asp?code=SSL .
     • Frieb, Werner. 2003. “XML Databases compared”. Retrieved June 21, 2004 at http://www.studierstube.org/world/xml_databases_compared.html .
     • Good, Jeff. 2003. “Microsoft Word: A Review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/WordRev.html .
     • Holub, Martin and Pavel Míka. 2001. “MATES – an experimental linguistic database system”. Proceedings of the IRCS Workshop on Linguistic Databases. Retrieved June 1, 2004 at http://www.ldc.upenn.edu/annotation/database/papers/Mika_Holub/21.2.mika.pdf .
     • Nerbonne, John. 1998. “Introduction to John Nerbonne (ed.) Linguistic Databases”. Stanford: CSLI. 1-12. Retrieved June 1, 2004 at http://odur.let.rug.nl/~nerbonne/papers/intro-db.pdf .
     • Rempt, Boudewijn. 2002. “Kura”. Retrieved June 1, 2004 at http://www.ats.lmu.de/kura/manual.pdf .
     • Sprouse, Ronald. 2003. “Filemaker Pro: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/FileMakerRev.html .
     • Walkenbach, John. 2004b. “Excel 2003 Review”. Retrieved July 1, 2004 at http://www.j-walk.com/ss/excel/xl2003.htm .
     • Webopedia. 2004a. “Database Management System”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/D/database_management_system_DBMS.html .
     • Webopedia. 2004b. “SSL”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/S/SSL.html .