18th International Congress of Linguists
21 July 2008

A Web-Accessible Dictionary of Southeastern Pomo

Charles B. Chang, Shira Katseff, Russell Lee-Goldman, Marta Piqueras-Brunet, and Yao Yao
University of California, Berkeley

Outline
• Background
• Dictionary structure
• Behind the scenes: MySQL, XML, XSL
• Searches: words, audio, texts
• Future work

Background: Southeastern Pomo (SEP)
• Haynie (2007)
• Southeastern Pomo (Northern Hokan, Pomoan) is an acutely endangered language historically spoken in the area around Clear Lake, CA (Moshinsky 1974, Gordon 2005).
• Speakers/learners are mostly affiliated with the Elem Pomo Tribe.
• Only two fluent speakers remain.

Background: Southeastern Pomo (SEP)
• Revitalization efforts are underway, led by Loretta Kelsey and Robert Geary (cf. Shavelson 2006, Fagan 2007), and have resulted in:
   • community orthography
   • teaching materials
   • language camps
   • print dictionary with pictures of flora & fauna
• Another component of documentation and revitalization work: a web-accessible dictionary.

Background: online dictionaries
• As resources for linguistic analysis have entered cyberspace, so have the products of linguistic documentation (cf. Babel et al. 2006, Dick & Haynes 2006).
• Online dictionaries have been created for a number of languages native to the Americas (e.g. Yurok, Hupa, Northern Paiute, Washo).
• These online dictionaries make the information gathered through fieldwork on these languages accessible to researchers, tribe members, and in particular, young learners.

Dictionary structure
• Three main parts:
   1. lexicon
      • entries for individual affixes, words, and fixed expressions
      • entry displays the following information about an item:
         • transcription
         • part of speech
         • gloss
         • links to sound clips in the audio dictionary (if available)
      • different lexicon entries correspond to different forms, but not necessarily different lexemes
   2. audio dictionary
      • entry for each audio file clipped out for a lexicon entry

Dictionary structure
• Three main parts:
   3. texts database
      • entries for elicited sentences, narratives, and other discourse above the sentence level
      • entry displays the following information about an item:
         • speaker
         • genre
         • transcriptions of individual sentences
         • free translations
         • interlinear glosses
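
The three-part structure above can be pictured as three linked record types. The following is only a sketch under assumptions: the class and attribute names are hypothetical and are not taken from the actual dictionary, and the only data used is the sample lexicon entry given later in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioClip:
    """One clip in the audio dictionary, cut out for a lexicon entry."""
    lexicon_id: int      # ID of the lexicon entry the clip belongs to
    source_file: str     # recording the clip was cut from
    start_time: str      # position of the clip in that recording


@dataclass
class LexiconEntry:
    """One form (affix, word, or fixed expression) in the lexicon."""
    entry_id: int
    transcription: str
    part_of_speech: str
    gloss: str
    audio_clips: List[AudioClip] = field(default_factory=list)


@dataclass
class Sentence:
    """One sentence of a text, with its translation and glosses."""
    transcription: str
    free_translation: str
    interlinear_gloss: str


@dataclass
class TextEntry:
    """One elicited sentence, narrative, or other piece of discourse."""
    speaker: str
    genre: str
    sentences: List[Sentence] = field(default_factory=list)


# The sample lexicon entry from the talk, linked to its audio clip.
cap = LexiconEntry(entry_id=101, transcription="kachuchu",
                   part_of_speech="n", gloss="cap")
cap.audio_clips.append(
    AudioClip(lexicon_id=cap.entry_id, source_file="21sep06_LK1b",
              start_time="18:39"))
```

Keeping the clip's `lexicon_id` separate from the entry itself mirrors the description above, in which the audio dictionary has its own entries that point back to the lexicon.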

Entering data into a MySQL database
• Lexicon entries are made in a MySQL database with fields for:
   • transcription
   • variants (if any)
   • community orthography (if available)
   • part of speech
   • free gloss & interlinear gloss
   • semantic domain
   • source file & start time
   • links to other morphologically related entries
   • notes

Entering data into a MySQL database
• Desirable features:
   • automatically generates ID numbers for entries
   • fully sortable and searchable
   • allows several researchers to make/edit entries at the same time without overwriting each other’s work
   • easily exportable to XML (eXtensible Markup Language) format
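
Two of the features listed above, automatic ID numbers and export to XML, can be illustrated with a small self-contained sketch. It rests on several assumptions: Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a server, the table name `lexicon` and the helper functions are hypothetical, and the column names simply reuse the tags of the sample entry from the talk. It is not the project's actual schema or export routine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Column names borrowed from the tags of the sample entry (an assumption).
COLUMNS = ["lx", "community_orthography", "ps", "ge", "sd", "ref", "time"]


def make_db():
    """Create an in-memory database whose IDs are generated automatically."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "lx TEXT NOT NULL, community_orthography TEXT, ps TEXT, "
        "ge TEXT, sd TEXT, ref TEXT, time TEXT)"
    )
    return conn


def add_entry(conn, **fields):
    """Insert one entry and return the ID the database assigned to it."""
    names = [c for c in COLUMNS if c in fields]
    placeholders = ", ".join("?" for _ in names)
    cur = conn.execute(
        f"INSERT INTO lexicon ({', '.join(names)}) VALUES ({placeholders})",
        [fields[n] for n in names],
    )
    return cur.lastrowid


def export_xml(conn):
    """Dump every entry as a <lemma> element, skipping empty fields."""
    root = ET.Element("lexicon")
    rows = conn.execute(
        f"SELECT id, {', '.join(COLUMNS)} FROM lexicon ORDER BY id")
    for row in rows:
        lemma = ET.SubElement(root, "lemma")
        for tag, value in zip(["id"] + COLUMNS, row):
            if value is not None:
                ET.SubElement(lemma, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


conn = make_db()
first_id = add_entry(conn, lx="kachuchu", community_orthography="kuchechoo",
                     ps="n", ge="cap", sd="clothes",
                     ref="21sep06_LK1b", time="18:39")
second_id = add_entry(conn, lx="mko", ge="see")
xml_text = export_xml(conn)
```

Because the database assigns the IDs, researchers entering data never have to coordinate numbering by hand; the export step then turns each row into plain text that no longer depends on the database software.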

Displaying XML with XSL
• When the database is exported to XML, the data becomes essentially text.
• Sample XML for a lexicon entry:

<lemma>
  <id>101</id>
  <lx>kachuchu</lx>
  <community_orthography>kuchechoo</community_orthography>
  <ps>n</ps>
  <ge>cap</ge>
  <short-gloss>cap</short-gloss>
  <ref>21sep06_LK1b</ref>
  <time>18:39</time>
  <sd>clothes</sd>
  <is-headword>yes</is-headword>
</lemma>

Displaying XML with XSL
• Such a text format allows the data to be easily manipulated into other formats as the technology of documentation changes over time.
• A separate XSL (eXtensible Stylesheet Language) file controls how the data from the XML file is displayed.
• Summary of what goes on in a dictionary query:
   QUERY → XML → XSL → DISPLAY
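
The query pipeline summarized above can be imitated in a few lines. This is a sketch under stated assumptions, not the site's implementation: the real dictionary uses an XSL stylesheet for the display step, whereas here a plain Python function (`render`, a hypothetical name) plays that role, and the HTML layout, the `<lexicon>` wrapper element, and the function names are all invented for illustration. The point it shows is the separation of concerns: the data stays in XML, and a separate piece of code decides how it looks.

```python
import html
import xml.etree.ElementTree as ET

# The sample entry from the talk, wrapped in a hypothetical <lexicon> root.
LEXICON_XML = """<lexicon>
  <lemma>
    <id>101</id>
    <lx>kachuchu</lx>
    <community_orthography>kuchechoo</community_orthography>
    <ps>n</ps>
    <ge>cap</ge>
    <sd>clothes</sd>
  </lemma>
</lexicon>"""


def query(root, gloss):
    """QUERY -> XML: select the <lemma> elements whose gloss matches."""
    wanted = gloss.lower()
    return [lemma for lemma in root.iter("lemma")
            if (lemma.findtext("ge") or "").lower() == wanted]


def render(lemma):
    """Stand-in for the XSL step: decide how one entry is displayed."""
    lx = html.escape(lemma.findtext("lx", default=""))
    ps = html.escape(lemma.findtext("ps", default=""))
    ge = html.escape(lemma.findtext("ge", default=""))
    return f'<div class="entry"><b>{lx}</b> <i>{ps}</i> {ge}</div>'


def lookup(xml_text, gloss):
    """Run the whole pipeline and return the HTML to display."""
    root = ET.fromstring(xml_text)
    return "\n".join(render(lemma) for lemma in query(root, gloss))


result = lookup(LEXICON_XML, "cap")
```

Changing the look of the dictionary then only requires editing `render` (or, in the real system, the XSL file); the XML data itself is left untouched.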

The sounds of Southeastern Pomo
• Consonant inventory of SEP:

LAB    | DEN   | ALV    | PAL      | VEL  | P-VEL | GL
p p’ b | t̪ t̪’  | t t’ d |          | k k’ | q q’  | ʔ
       |       | ts ts’ | (tʃ tʃ’) |      |       |
f      |       | s      | ʃ        | x    | χ     | h
m      |       | n      |          | (ŋ)  |       |
       |       | ɾ      |          |      |       |
       |       | l      |          |      |       |
w      |       |        | j        |      |       |

The sounds of Southeastern Pomo
• Linguistic orthography of SEP consonants:

LAB    | DEN    | ALV    | PAL      | VEL  | P-VEL | GL
p p’ b | th th’ | t t’ d |          | k k’ | q q’  | 7
       |        | ts ts’ | (ch ch’) |      |       |
f      |        | s      | sh       | x    | X     | h
m      |        | n      |          | (ng) |       |
       |        | r      |          |      |       |
       |        | l      |          |      |       |
w      |        |        | y        |      |       |

The sounds of Southeastern Pomo
• Vowel inventory of SEP:

FRONT | CENTRAL | BACK
i (ɪ) |         | u (ʊ)
e (ɛ) | (ə)     | o
      | a (ɐ)   |

The sounds of Southeastern Pomo
• Linguistic orthography of SEP vowels:

FRONT | CENTRAL | BACK
i     |         | u
e     |         | o
      | a       |
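
Comparing the IPA charts with the orthography charts above, only a handful of symbols change; the rest are written the same way in both. The sketch below turns those differences into a conversion routine. The mapping is read off the charts in this talk, while the function name and the longest-match strategy are assumptions made for illustration; symbols not listed in the table (including the parenthesized vowel variants) are passed through unchanged.

```python
# Symbols that differ between the IPA chart and the linguistic orthography.
IPA_TO_ORTHO = {
    "t\u032a": "th",   # dental stop: t + combining bridge below
    "t\u0283": "ch",   # palatal affricate
    "\u0283": "sh",    # palatal fricative
    "\u03c7": "X",     # post-velar fricative
    "\u0294": "7",     # glottal stop
    "\u014b": "ng",    # velar nasal
    "\u027e": "r",     # alveolar tap
    "j": "y",          # palatal glide
}

# Try longer keys first so the two-character sequences win over bare "t".
_KEYS = sorted(IPA_TO_ORTHO, key=len, reverse=True)


def to_orthography(ipa):
    """Convert an IPA string to the linguistic orthography."""
    out = []
    i = 0
    while i < len(ipa):
        for key in _KEYS:
            if ipa.startswith(key, i):
                out.append(IPA_TO_ORTHO[key])
                i += len(key)
                break
        else:
            # No special spelling: the symbol is the same in both systems.
            out.append(ipa[i])
            i += 1
    return "".join(out)
```

Going in the other direction would be harder, since an orthographic sequence such as "th" could in principle stand for the dental stop or for "t" followed by "h"; the sketch therefore only converts from IPA.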

Sample dictionary queries
• Word searches
   • “What does lq’olq’okin mean?”
   • “How do you say ‘red’ in SEP?”
   • “What are some kinship terms in SEP?”
• Audio searches
   • “I want to hear all the words that contain the cluster /mf/.”
• Text searches
   • “I want to see all the contexts in which the word mko ‘see’ appears.”
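
The word and audio searches above amount to filtering lexicon entries on different fields. The following sketch shows one way this could look, under clear assumptions: the entries are held in a plain Python list rather than in the actual database, the function names and the `audio` field format are hypothetical, and the only data used are the two forms whose glosses appear in this talk.

```python
# A two-entry stand-in for the lexicon; field names follow the sample XML.
LEXICON = [
    {"lx": "kachuchu", "ps": "n", "ge": "cap", "sd": "clothes",
     "audio": ["21sep06_LK1b@18:39"]},   # "file@time" format is invented
    {"lx": "mko", "ge": "see", "audio": []},
]


def search_form(lexicon, form):
    """'What does X mean?': look an entry up by its transcription."""
    return [e for e in lexicon if e["lx"] == form]


def search_gloss(lexicon, gloss):
    """'How do you say X in SEP?': look an entry up by its gloss."""
    return [e for e in lexicon if e.get("ge", "").lower() == gloss.lower()]


def search_domain(lexicon, domain):
    """'What are some X terms?': list entries in a semantic domain."""
    return [e for e in lexicon if e.get("sd") == domain]


def audio_with_cluster(lexicon, cluster):
    """Audio search: clips for every form containing a given sequence."""
    return [clip for e in lexicon if cluster in e["lx"]
            for clip in e.get("audio", [])]


meaning = [e["ge"] for e in search_form(LEXICON, "mko")]
word_for_cap = [e["lx"] for e in search_gloss(LEXICON, "cap")]
clothing_terms = [e["lx"] for e in search_domain(LEXICON, "clothes")]
clips = audio_with_cluster(LEXICON, "ch")
```

A text search would follow the same pattern, filtering the sentences of the texts database for those that contain a given form.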

Future work
• In the near future, we hope to:
   • add different types of multimedia (e.g. photos of local flora & fauna, videos of the actions described by verbs of motion and placement)
   • have multimedia display within the same window as the lexicon entry
   • merge the data of the print dictionary with that of the online dictionary
   • update all entries with their spelling in the Elem orthography
   • have teachers and learners make use of this as a CALL (Computer-Aided Language Learning) tool

Thank you!
Acknowledgements: Jocelyn Ahlers ◆ Zhenya Antić ◆ Thera Crane ◆ Donna Fenton ◆ Andrew Garrett ◆ Robert Geary ◆ Hannah Haynie ◆ Leanne Hinton ◆ Jisup Hong ◆ Loretta Kelsey ◆ Julius Moshinsky ◆ Lindsey Newbold ◆ Ronald Sprouse ◆ Maziar Toosarvandani ◆ Corey Yoquelet ◆ UCB Linguistics

References
Babel, Molly, Andrew Garrett, Erin Haynes, Michael Houser, Reiko Kataoka, Fanny Liu, Nicole Marcus, Ruth Rouvier, Ronald Sprouse, Ange Strom-Weber, and Maziar Toosarvandani. 2006. A web-accessible Mono Lake Paiute dictionary and text archive. Paper presented at the Friends of Uto-Aztecan Conference. Salt Lake City, UT: University of Utah, August 24.
Dick, Grace, and Erin Haynes. 2006. A web-accessible Mono Lake Paiute dictionary and text archive. Paper presented at the Great Basin Language Conference. Bishop, CA, October 21.
Fagan, Kevin. 2007. Only living Elem Pomo speaker teaches so she won’t be the last. San Francisco Chronicle, September 30. http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2007/09/30/MNAISEMAH.DTL. Retrieved 1 July 2008.
Gordon, Raymond G., Jr., ed. 2005. Ethnologue: Languages of the World, 15th edition. Dallas, TX: SIL International. Online version: http://www.ethnologue.com.
Haynie, Hannah. 2007. Southeastern Pomo. http://hjhaynie.berkeley.edu/southeasternpomo. Retrieved 5 November 2007.
Moshinsky, Julius. 1974. A Grammar of Southeastern Pomo. University of California Publications in Linguistics 72. Berkeley, CA: University of California Press.
Shavelson, Lonny. 2006. California tribe tries to save its language. Voice of America News, March 30. http://www.voanews.com/english/archive/2006-03/2006-03-30-voa46.cfm?CFID=88126261&CFTOKEN=81958375. Retrieved 1 October 2006.