A comprehensive guide with recommendations on stylesheets, case studies, software, hardware, and new features for linguistic databases.
Quotable Quotes :-)
• “If you’re running Windows and using a scripting language, it’s just all difficult” - Ed Garrett
• “In this case, WALS covers too many languages” - Terry Langendoen
E-MELD 2004 Linguistic Databases & Best Practice
EMELD 2004 Working Group 6 Report
Stylesheets / Case Studies / Software / Hardware
Baden Hughes, Ljuba Veselinova, Terry Langendoen, Manuela Noske, Mike Maxwell, Ed Garrett, Lori Levin, Zhenwei Chen, Prashant Nagara, Neil Salmon
Stylesheet Recommendations #1
• Clarify audience
  • Programmers? Refer to authoritative guides
  • Linguists? Possibly revise the approach
• Define & Refine
  • Remove unhelpful references, e.g. SGML
  • Define core concepts, e.g. XML
• Annotated Examples
  • Just use inline commentary in the sources!
• Consistency in Exemplars
  • Example file, with 6 different renderings based on stylesheets
  • Natural precursors: “to get your data into this format …”
• Missing stylesheets
  • Interlinear text; Paradigms; Trees; Bibliography
  • Output Formats: PDF, SVG
  • Anything for non-Roman script?
Stylesheet Recommendations #2
• Access to real data (not CF engine rendered output, but raw DTDs, schemas and XML data)
• Check the validity of instructions
• Potential for a service provider model (online validation and stylesheet library)
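As a rough illustration of what the simplest tier of an online validation service might do, the sketch below checks whether a submitted XML string is well-formed. The function name `check_well_formed` is hypothetical and not part of any EMELD tool. Note the assumption: this only tests well-formedness with the Python standard library; validating against a DTD or schema would need additional tooling that is not shown here.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message).

    Hypothetical helper: well-formedness only, no DTD or schema validation.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError carries the parser's message, including line and column.
        return False, str(err)
    return True, None


if __name__ == "__main__":
    # Made-up sample documents, for illustration only.
    print(check_well_formed("<text><word>example</word></text>"))
    print(check_well_formed("<text><word>example</text>"))
```

A service would presumably report the parser message back to the submitter so that the source file can be corrected before any stylesheet is applied.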
Case Studies Recommendations
• Missing Case Studies
  • Multimodal, particularly video-centric
  • Systematising random archival collections of legacy data
• “Meet the Author/Linguist/X?”
• Guided tours based on features of case studies which are pertinent to the user (e.g. source data format, desired outcomes, project type, software)
• Quantification of effort for activities of specific types
• Support commentary on case studies
Software Recommendations #1
• New functional categories in software catalogue
• Require contributor information for review comments
• Catalogue is low on content - many bulk listings are available in structured formats; leverage these to create a larger catalogue
• Motivating reviews: contrast the book review model with the incentive for software reviews of substance
• Ranking systems are problematic if arbitrary or non-transparent
• Contextualisation of
  • Location: field, office, community use
  • Audience: linguist, technical support, others
Software Recommendations #2
• Disambiguate the open source/open format/proprietary/closed format dichotomy
• Consider working format vs archival format distinction in making recommendations
• Linking to other thematic, functionally-grounded software surveys
• New proposal for software “smaller than an application” - later discussion
Hardware Recommendations
• Other general sites list and review hardware; linking to them is a more efficient option
• Sites which provide specifically linguistic insight should also be included
• Addressing common misconceptions, e.g. the minidisk debacle, would be a valuable contribution from EMELD
• Including complementary technologies which enable the use of hardware in field linguistics: solar panels, batteries, ziplock bags :-)
• Important inclusions: handheld devices, scanners
Possible New Features #1
• “Small Tools”
  • Smaller than an application
  • Primary concern for data manipulation
  • Not GUI point and shoot solutions, but scripts, libraries etc
• Project Guidance: “So you want to collect language data …”
  • Last speaker scenario
  • Non-documentary linguist
  • Incidental acts by non-linguists
• Service Provider Model
  • Stylesheet library
  • Data conversion and inclusion
• Navigational Enhancements
  • Where am I?
  • Guided tours
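To make the “Small Tools” idea concrete, here is a minimal sketch of the kind of script meant: something smaller than an application whose only job is data manipulation. Everything in it is an assumption for illustration: the function name `tsv_to_xml`, the two-column tab-separated input (form, gloss), and the `lexicon`/`entry`/`form`/`gloss` element names are hypothetical and do not reflect any EMELD-recommended format.

```python
import xml.etree.ElementTree as ET


def tsv_to_xml(tsv_text):
    """Convert 'form<TAB>gloss' lines into a small XML lexicon string.

    Hypothetical element names; blank lines are skipped.
    """
    root = ET.Element("lexicon")
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue
        if "\t" not in line:
            # Fail loudly rather than silently dropping a malformed record.
            raise ValueError(f"line {number}: expected a tab-separated form and gloss")
        form, gloss = line.split("\t", 1)
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "form").text = form.strip()
        ET.SubElement(entry, "gloss").text = gloss.strip()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    print(tsv_to_xml("kitabu\tbook\nmaji\twater\n"))
```

A script of this size can be read, adapted and rerun by a linguist with modest scripting experience, which is the contrast with GUI applications drawn above.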
Possible New Features #2
• Media
  • Incidental discussion of media needs to be formalised
• Interactive Forums
  • While directories and reviews are a good starting point, active communities may help to engage new users with the site
• “How to Systematise Language Data”
  • Draw on the experience of the EMELD team in building the case studies
• Workflow Approach
  • Logical pathways within the School
  • Decision Tree model
General Issues #1
• Which conceptual model best suits the resource model within the School?
  • download.com - a data provider?
  • dmoz.org - a directory service?
  • sourceforge.net - a collaborative repository?
• Leadership ambitions
  • The “all things to all people” model is inherently inefficient - what is EMELD’s competitive advantage?
  • While electronic language documentation is a “niche market”, a complementary approach may be mutually beneficial with other projects
• Long-term sustainability
  • Operationally sustainable in terms of resources, currency and applicability
  • How enduring are the methods, data formats and advice anyway (what would EMELD look like if we did it again in 5 years’ time?)
General Issues #2
• Integration vs Dependency
  • While LL provides much of the manpower for EMELD efforts at present, the goal of enabling documentary linguists directly needs to return to focus
• Perspectives on Best Practice
  • Top-down best practice: tendency to focus on “best”
  • Bottom-up best practice: grounded in “practice”, and its improvement
• Standardization vs Community Building