Review of DBMS for Linguistic Purposes
Marisa Ferrara & Steven Moran
Eastern Michigan University
E-MELD 2004 Linguistic Databases & Best Practice
Project Purpose
• Linguists specializing in language documentation confront the problem of how to digitally organize and store their data
• Best practice recommends that linguists create an archival copy in text format with XML markup
• However, most linguists use database software to create a working format
• A review of DBMS for linguistic purposes with best practice in mind has not, to our knowledge, been done
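The best-practice recommendation is to keep an XML archival copy alongside the working database. A minimal sketch of that export step, assuming the working copy is an SQLite table named `lexicon` with `headword`, `gloss`, and `pos` columns. These names and the element names are hypothetical, and the two entries are invented placeholders, not Sisaala data. It uses only Python's standard `sqlite3` and `xml.etree.ElementTree` modules:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical working copy: table and column names are assumptions, and
# the two rows are invented placeholders rather than Sisaala data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lexicon (id INTEGER PRIMARY KEY, headword TEXT, gloss TEXT, pos TEXT)"
)
conn.executemany(
    "INSERT INTO lexicon (headword, gloss, pos) VALUES (?, ?, ?)",
    [("ká", "gloss-1", "n"), ("bàlá", "gloss-2", "v")],
)


def lexicon_to_xml(db: sqlite3.Connection) -> bytes:
    """Serialize each lexicon row as an <entry> element in a UTF-8 XML document."""
    root = ET.Element("lexicon")
    rows = db.execute("SELECT id, headword, gloss, pos FROM lexicon ORDER BY id")
    for entry_id, headword, gloss, pos in rows:
        entry = ET.SubElement(root, "entry", id=str(entry_id))
        ET.SubElement(entry, "headword").text = headword
        ET.SubElement(entry, "gloss").text = gloss
        ET.SubElement(entry, "pos").text = pos
    # With encoding="utf-8", tone-marked characters are written as literal
    # UTF-8 bytes; xml_declaration=True (Python 3.8+) adds the <?xml ...?> line.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


print(lexicon_to_xml(conn).decode("utf-8"))
```

The working database stays the place where data is edited. The XML file is a plain-text snapshot that does not depend on any particular DBMS. A real archive would follow an agreed schema rather than these illustrative element names.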
Goals
• Evaluate database software according to criteria relevant to linguistic documentation
• We do not aim to select the best software overall
• This is an ongoing project
• We welcome feedback on other software and on our criteria
DBMS and their interfaces
• Database software can be developed:
  • explicitly for language data (e.g., Shoebox, FIELD)
  • for general purposes (e.g., FileMaker Pro, MS Access, MySQL)
• The first type is often an interface built on top of a general-purpose DBMS
• Both types are used by linguists and will be evaluated according to the same criteria
Software
• All software was tested on either Windows XP or Mac OS X
• Unix software was left out of this evaluation
• The software was chosen based on our experience with E-MELD and on recommendations from various field linguists
• 14 software applications were chosen for evaluation:
  • Access 2003
  • askSam 5.1
  • Emdros 1.1.17
  • Excel 2003
  • eXist 1.0
  • FIELD
  • FileMaker Pro 7
  • Kura 2.0-1-2.1.2
  • LinguaLinks Workshops
  • MATES
  • MySQL 5.0
  • PostgreSQL 7.4
  • Shoebox 5.0
  • Word 2003
Test data
• We evaluated each application by entering original data
• The data are from Western Sisaala [SSL] and were collected by Steven Moran (2003)
• Typological features include:
  • SVO word order
  • Left-headed NP
  • Contrasting High/Low tonal system
• The data included a 20-entry subset of the lexicon, archived in an Excel 2002 file with the following fields:
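The test data has a contrasting High/Low tonal system, and a single lexical entry can carry more than one tone mark. A database can read the tone pattern straight off the transcription. A minimal sketch, assuming High tone is written with a combining acute accent and Low tone with a combining grave accent. This is an assumed convention, not necessarily the one used in the Sisaala file, and `tone_pattern` is a hypothetical helper:

```python
import unicodedata

# Assumed orthography: High tone = combining acute (U+0301), Low tone =
# combining grave (U+0300). This is an illustrative convention, not a
# claim about how the Sisaala data were transcribed.
TONE_MARKS = {"\u0301": "H", "\u0300": "L"}


def tone_pattern(word: str) -> str:
    """Return the tones marked in `word`, left to right, e.g. 'LH'."""
    # NFD splits precomposed letters (U+00E1 'á') into base letter + combining
    # mark, so precomposed and decomposed spellings give the same result.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS)


print(tone_pattern("b\u00e0l\u00e1"))    # precomposed input -> LH
print(tone_pattern("ba\u0300la\u0301"))  # decomposed input, same word -> LH
```

Normalizing to NFD first matters because input methods differ in whether they produce precomposed characters or base letters followed by combining marks.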
Problems encountered
• Unicode
  • We needed either a way to insert special characters directly or a cut-and-paste function
• Syllabic tone
  • More than one tone mark per lexical entry
• Link tables
  • Morphological breakdown
    • Verb phrases had to be marked in a consistent manner
    • The author's shorthand included common delimiters that caused problems when importing and exporting certain formats
  • Semantic categories
    • For consistency, link tables should be used
    • This could be problematic
    • FIELD worked best, but missing semantic fields cannot be added
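The link-table approach for semantic categories can be sketched with Python's built-in `sqlite3` module. In this hypothetical schema (the table and column names are assumptions), each category name is stored once. A junction table pairs entries with categories, so an entry can belong to several categories, and adding a category the data needs is a new row rather than a change to the database design. Headwords and category names are placeholders.

```python
import sqlite3

# Hypothetical schema: entries and semantic categories live in separate
# tables; entry_category is the link (junction) table between them.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, headword TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE entry_category (
        entry_id    INTEGER NOT NULL REFERENCES entry(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (entry_id, category_id)
    );
""")


def add_entry(headword: str, categories: list[str]) -> int:
    """Insert an entry and link it to each named category, creating missing ones."""
    entry_id = conn.execute(
        "INSERT INTO entry (headword) VALUES (?)", (headword,)
    ).lastrowid
    for name in categories:
        # INSERT OR IGNORE reuses an existing category row, so each
        # category name is stored exactly once.
        conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (name,))
        (cat_id,) = conn.execute(
            "SELECT id FROM category WHERE name = ?", (name,)
        ).fetchone()
        conn.execute("INSERT INTO entry_category VALUES (?, ?)", (entry_id, cat_id))
    return entry_id


def entries_in(category: str) -> list[str]:
    """Return headwords linked to `category`, in insertion order."""
    rows = conn.execute("""
        SELECT e.headword FROM entry e
        JOIN entry_category ec ON ec.entry_id = e.id
        JOIN category c ON c.id = ec.category_id
        WHERE c.name = ? ORDER BY e.id
    """, (category,))
    return [headword for (headword,) in rows]


add_entry("word-a", ["body part"])
add_entry("word-b", ["body part", "kinship"])
print(entries_in("body part"))
```

Because category names are typed only once, spelling variants of the same category cannot creep in. That is the consistency argument for link tables.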
Criteria
• We developed three categories of criteria that we consider essential for databasing linguistic information:
  • General Information
  • Technical Information
  • Ability to Handle Linguistic Data
• The criteria include, but are not limited to, those used by other evaluations:
  • BIFoCAL
  • Open Source Database Software Comparison
General Information
• Developer/Release date
• Price
• Licensing options
• Platforms
• Other software needed
• Help functionality/Tutorial availability
• Support
Technical Information
• Database type
• Pre-defined DB design?
• ACID compliance
• Data integrity
• Collaborative or single user?
• Network connection necessary?
• SSL access?
• Web accessible?
• Programming interfaces/API
• Imports and exports
  • XML (best practice)
  • Other formats
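Under "Problems encountered," the shorthand in the morphological breakdown used common delimiters that clashed with some export formats. For the "Other formats" criterion, that clash can often be avoided by letting a proper CSV writer quote fields instead of hunting for a "safe" delimiter. A minimal sketch with Python's standard `csv` module. The field names and records are hypothetical placeholders:

```python
import csv
import io

# Placeholder records: the morpheme field uses '-' and '=' and the glosses
# contain a comma and double quotes, the kind of characters that break naive
# delimiter-based import/export. Field names are assumptions.
FIELDS = ["headword", "morphemes", "gloss"]
rows = [
    {"headword": "word-a", "morphemes": "root-SUFF=CL", "gloss": "go, walk"},
    {"headword": "word-b", "morphemes": "root", "gloss": 'say "yes"'},
]


def export_csv(records: list[dict]) -> str:
    """Write records as CSV; fields containing the delimiter or quotes get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_MINIMAL,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def import_csv(text: str) -> list[dict]:
    """Read CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


text = export_csv(rows)
print(text)
print(import_csv(text) == rows)  # round trip check
```

Here the comma inside "go, walk" is protected by quoting, and the embedded double quotes are escaped by doubling them. Any reader that follows the same CSV conventions recovers the original fields unchanged.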
Ability to Handle Linguistic Data
• Designed exclusively for linguists?
• Unicode compatibility (best practice)
• Special character input method/ease of input
• Search functionality
• Allows input and interlinearization of primary text
• Ability to link primary text to lexicon
• Multi-dictionary format
• Ability to export lexicon in a presentation format
• Ability to export grammar in a presentation format
• Audio/video/image support
• Ability to add missing features
• Overall assessment
  • Pros
  • Cons
  • Recommended for…
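The Unicode-compatibility, input-method, and search criteria interact. Different input methods can produce the same tone-marked letter either as one precomposed code point or as a base letter plus a combining mark, and a plain string search treats those as different text. A minimal sketch of normalization-aware search using Python's standard `unicodedata` module. `nfc`, `search`, and the sample strings are hypothetical:

```python
import unicodedata


def nfc(text: str) -> str:
    """Compose characters, so 'a' + U+0301 and precomposed 'á' compare equal."""
    return unicodedata.normalize("NFC", text)


def search(entries: list[str], query: str) -> list[str]:
    """Return entries that contain `query`, ignoring NFC/NFD encoding differences."""
    q = nfc(query)
    return [entry for entry in entries if q in nfc(entry)]


# Placeholder data: the stored entry uses precomposed characters, while the
# query uses a combining accent (as some input methods produce).
stored = ["b\u00e0l\u00e1", "ka"]
query = "la\u0301"
print(query in stored[0])     # raw comparison: the code points differ
print(search(stored, query))  # normalized comparison finds the entry
```

Normalization only removes encoding differences. It does not make the search tone-insensitive. That would need an extra step, such as decomposing with NFD and dropping the combining marks before comparing.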
Not evaluated
• We could not evaluate all of the software we chose, for a variety of reasons
• Software that is too technical
  • Emdros 1.1.17: the user must program an interface
  • PostgreSQL 7.4: the GUI and Cygwin are difficult to set up on a Windows machine
• Software that represents bad practice
  • MS Word 2003: already reviewed by BIFoCAL
  • askSam 5.1: not Unicode compliant
• Software still in development or unavailable
  • Kura 2.0-1-2.1.2: still discussing installation with the developer
  • MATES: unavailable on the web (still in development?)
  • LinguaLinks Workshop: will be reviewed shortly
Ongoing Commitment
• This evaluation of DBMS for linguistic purposes is ongoing
• Our evaluation will be linked to the software reviews in the E-MELD School of Best Practice Toolroom
• Users can add their opinions to this system
• We welcome any suggestions regarding our criteria and other software that should be considered
• Email us!
Thank you! Any questions?
References
• Anonymous. 2004. “Open Source Database Software Comparison”. Retrieved June 1, 2004 at http://www.geocities.com/mailsoftware42/db/
• BIFoCAL. 2003a. “Software functionality for non-technical users”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareDims.html
• BIFoCAL. 2003b. “Questions to Help Evaluate Linguistic Tools”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareQuestions.html
• Buszard-Welcher, Laura. 2003. “Shoebox: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/ShoeboxRev.html
• E-MELD School of Best Practice. 2004. Retrieved July 1, 2004 at http://www.emeld.org/school
• Engelberg, Miriam. 2000. “Choosing between Microsoft Access and FileMaker Pro”. Retrieved June 30, 2004 at http://www.techsoup.org/howto/articlepage.cfm?ArticleId=207&topicid=6
• Ethnologue. 2004. “Sisaala, Western: a language of Ghana”. Retrieved July 1, 2004 at http://www.ethnologue.com/show_language.asp?code=SSL
• Frieb, Werner. 2003. “XML Databases compared”. Retrieved June 21, 2004 at http://www.studierstube.org/world/xml_databases_compared.html
• Good, Jeff. 2003. “Microsoft Word: A Review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/WordRev.html
• Holub, Martin and Pavel Míka. 2001. “MATES – an experimental linguistic database system”. Proceedings of the IRCS Workshop on Linguistic Databases. Retrieved June 1, 2004 at http://www.ldc.upenn.edu/annotation/database/papers/Mika_Holub/21.2.mika.pdf
• Nerbonne, John. 1998. “Introduction to John Nerbonne (ed.) Linguistic Databases”. Stanford: CSLI. 1-12. Retrieved June 1, 2004 at http://odur.let.rug.nl/~nerbonne/papers/intro-db.pdf
• Rempt, Boudewijn. 2002. “Kura”. Retrieved June 1, 2004 at http://www.ats.lmu.de/kura/manual.pdf
• Sprouse, Ronald. 2003. “FileMaker Pro: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/FileMakerRev.html
• Walkenbach, John. 2004b. “Excel 2003 Review”. Retrieved July 1, 2004 at http://www.j-walk.com/ss/excel/xl2003.htm
• Webopedia. 2004a. “Database Management System”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/D/database_management_system_DBMS.html
• Webopedia. 2004b. “SSL”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/S/SSL.html