Week 1: Overview Tools for Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013
Tools for documentation • Physical tools: • Hardware • Software • Stimuli • Conceptual tools: • What makes a good documentary corpus • Procedural tools: • How to go about documenting a language • Tools for disseminating results
Overview • Week 1: overview, hardware, software • Week 2: elicitation techniques, grammar writing • Week 3: narratives, conversation, corpus building • Week 4: lexicon, archiving
About the class • “How to describe/document a language” • *No practical component* (in that we won’t be working with speakers) • However, there will be time (I hope!) to talk about your own field data • And we will be doing some exercises with existing data • I will provide datasets for exercises (if you don’t have data of your own to use) • You can also use data from the field methods class here at the Institute.
A few assumptions for this class • Not talking about community-oriented materials here (I see documentary materials as feeding into that though) • Assuming that the language doesn’t have a lot of other materials apart from what the linguist will be producing • Assuming that the linguist will be the one doing most of the writing. • Implicitly assuming a grammar/dictionary/texts model (more on this below). • None of these assumptions is crucial; they’re just there so we can limit the topic a bit.
What is language documentation? • Documentary Linguistics as its own subfield. • Doing things with linguistic data: • Getting the data • Preserving it • Processing it • (Analyzing it) • Cf Woodbury (2002): Language documentation is the creation, annotation, preservation, and dissemination of transparent records of a language. • Important for both theoretical and empirical branches of linguistics: • typology, historical linguistics, etc
What shapes the language record? • The linguist (i.e. you!) • Their interests • Their abilities • The speakers and their interests! • External circumstances • funding • time available • lucky breaks • unlucky breaks
Language Documentation as a Language Legacy • Particularly relevant for endangered languages. • Your work might be the only substantive record of a language: • few speakers • field might view the language as “done” • speakers might view the language as “done”
Planned Documentation vs “Collect it all” • “making a record of the language” : ‘comprehensive grammar’ • You can’t collect everything. • All documentation is sampling. • Unstructured, unanalyzed corpora usually aren’t very useful • They are hard to use; • They don’t get worked on; • They usually aren’t big enough to test hypotheses computationally; • They require native speakers (or people who are already very familiar with the language) -> fine for languages with a major presence, but what about the quarter of the world’s languages with fewer than 10,000 speakers?
What counts as documentation? • When is a collection big enough to count as language documentation? • Is an article in Linguistic Inquiry language documentation? • creation • annotation • preservation • dissemination • but only a very small fragment of a language.
How much time/space does a documentary corpus take? • Depends on the resources: • Time • Speakers • Money • Levels of Interest
Grammar, Dictionary, Texts • “The Boasian Trilogy” • Structure, Lexicon, Culture • Way to present the analysis and also allow others to recreate it (or challenge it) from the underlying data. • Conceived broadly: • Capture language structure • Capture language in use • Capture lexicon and meaning
Sampling: Documentation as snapshots • A big part of documentation is constructing a good set of “samples”. • To do that, you will need to consider what the purpose of the documentary record is. That is, why are you collecting data on the language? • “to make a lasting record of the language” • “to reclaim the language for future speakers” • “to write a reference grammar” • “to document the culture in the traditional language” • “to investigate a particular aspect of the language” • all of the above… • …
Sampling • Are your “snapshots” representative? • Speakers • Subjects/Topics • Grammatical constructions • Lexicon • …
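One rough way to check whether the snapshots are representative is to tally what has been collected so far against what you hoped to cover. The sketch below is only an illustration: the session dictionaries, their keys (`speakers`, `genre`), and the function names are all hypothetical, not part of any standard tool.

```python
from collections import Counter


def coverage(sessions, key):
    """Count how many sessions carry each value of `key`.

    `sessions` is a list of dicts (a hypothetical layout); a value may be
    a single string or a list of strings.
    """
    counts = Counter()
    for session in sessions:
        value = session[key]
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


def underrepresented(counts, expected, minimum=1):
    """Return the expected values seen in fewer than `minimum` sessions."""
    return sorted(v for v in expected if counts.get(v, 0) < minimum)


# Invented example data, for illustration only.
sessions = [
    {"speakers": ["SPK1"], "genre": "elicitation"},
    {"speakers": ["SPK1", "SPK2"], "genre": "narrative"},
    {"speakers": ["SPK1"], "genre": "elicitation"},
]

genre_counts = coverage(sessions, "genre")
speaker_counts = coverage(sessions, "speakers")
gaps = underrepresented(genre_counts, ["elicitation", "narrative", "conversation"])
print(genre_counts, speaker_counts, gaps)
```

The same tally could be run over topics, grammatical constructions, or lexical domains, provided those are recorded for each session.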
Planned versus opportunistic collection • Planned: • translated sentences. • grammaticality judgments • etc. • Unplanned (or planning gone wrong): • Speakers reinterpret your prompts and construct narratives from them. • New speaker comes to a session and wants to tell stories. • You find a new (to you) morpheme in your data and want to find out how it works. • You overhear a new construction in conversation.
What constitutes a documentary corpus? • ***Everything*** • sound files • videos • transcripts • (elicitation prompts – part of the annotation) • photographs • maps • (artifacts) • metadata (data about the data) • metametadata • …
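To make "metadata (data about the data)" concrete, here is a minimal sketch of a per-session metadata record kept alongside the recordings. The field names and the JSON layout are assumptions made for illustration; they do not follow any particular archive's schema, and a real project should check what its archive requires.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SessionRecord:
    """Metadata for one recording session (all field names are illustrative)."""
    session_id: str
    date: str                 # ISO 8601 date, e.g. "2013-07-01"
    speakers: List[str]       # speaker codes rather than names
    genre: str                # e.g. "elicitation", "narrative", "conversation"
    files: List[str] = field(default_factory=list)    # audio, video, transcripts
    prompts: List[str] = field(default_factory=list)  # elicitation prompts used
    notes: str = ""


def to_json(record: SessionRecord) -> str:
    """Serialize a record as readable JSON, keeping non-ASCII characters intact."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False, sort_keys=True)


# Invented example values, for illustration only.
example = SessionRecord(
    session_id="S001",
    date="2013-07-01",
    speakers=["SPK1"],
    genre="elicitation",
    files=["S001.wav", "S001_transcript.txt"],
    prompts=["numerals 1-10"],
)
print(to_json(example))
```

A plain-text format such as JSON is used here so that the record stays readable without special software.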
Workflow: • What do you need to do to document a language? • What order do you need to do it in? • (How will you know if it’s been done right?)
Scaled workflow • Project as a whole (timescale of years) • e.g. “Bardi language documentation” • Immediate tasks (timescale of weeks or months) • e.g. “Bardi learners guide” • Subtasks (timescale of days or weeks) • e.g. “write the section on numbers” • Data gathering (timescale of single session) • e.g. “get data on numerals in use”
Sample field kit: • Equipment: • Laptop • Audio recorder • Video recorder • + microphones • + backup means of recording (e.g. from laptop, second recorder) • Media: • backup devices [hard drive, DVDs, etc] • memory cards for recorders • paper! pens! • Other • ways of keeping the equipment clean • carry bag • stills camera (cell phone, ipad, etc) • batteries, other power equipment • tripod • Stimuli/research prompts
Audio • The field has converged on solid state recorders using SD cards • Zoom H2 or H4 Handy Recorder (or H6 coming soon!) • Edirol R-09 • Marantz PMD 660 or 670 • And/or laptops • (or laptop plus external sound card/preprocessor) • small/portable • AA batteries • high quality, lossless formats • easy to use • easy to transfer data
Not recommended: • Dictaphones • Cassette recorders • DAT
Video • Less consensus on models • Major component of the documentation or side-project? • Options: • smart phone • ipad • stills camera with video function • dedicated video camera • SD card • mic jack • Problems: • mpeg vs other proprietary video formats • large files • memory-intensive
Microphones • headset vs lapel vs meeting microphone • dynamic vs cardioid • wired vs wireless • XLR vs 1/8” jack • The built-in mics in the Edirol, Zoom, etc, are also ok • You get what you pay for, approximately. • Remember that microphone placement and volume monitoring are much more important than the quality of the microphone (far more recordings are ruined through the former than the latter).
Computer • Laptop • Lots of memory • Lots of hard drive space • Usually don’t need ruggedization features • Get cheapest possible and assume it won’t last for more than a season, or try for a higher end model • Special considerations for high altitude, high humidity, or low temperature work. • High altitude: hard drives fail: use solid state • High humidity: condensation issues • Low temperatures: battery issues (See Lanz 2010)
Tablets? • Most language software won’t run on ipads or other tablets. • Great for stimuli, backup recorder, camera, etc. • Too much data