Heritage language learning: A corpus-based inquiry OLESYA KISSELEV, PORTLAND STATE UNIVERSITY Sixth Heritage Language Research Institute 2012
The plan • Overview of the Russian Flagship Program at Portland State University (RFP at PSU) • Introduction to the corpus-linguistic approach to the study of language • Pilot Russian Learner Corpus of Academic Writing (piRULEC) • A sample corpus-based, data-driven contrastive study of advanced learners of Russian as a foreign and as a heritage language
RFP at PSU strives • to create Superior speakers of Russian “In addition to requiring a great deal of language proficiency, a speaker at a high level of proficiency must also possess academic skills, such as the ability to hypothesize and persuade, and discourse skills that any educated person in the target culture would have required ... and professional background or a profession about which to conduct discourse at this level.” Malone, Rifkin, Christian, & Johnson in "Attaining High Levels of Proficiency: Challenges for Language Education in the United States," JDLS 2 (2004)
RFP student demographic • young adults studying Russian with the goal of achieving a Superior level of proficiency • regularly admitted students at PSU • most major in fields other than Russian • are at least Intermediate-Mid speakers of Russian (Advanced Track) or absolute beginners (Beginner Track) • come from different language backgrounds and have different language experience: • speakers of Russian as a foreign language (domestic, FLL) with varying learning backgrounds • heritage speakers of Russian (HLL) who differ in age of exit, schooling experience, number of classes, use of Russian outside the classroom, etc.
RFP pedagogical challenges • Relative lack of studies of advanced interlanguage • Shortage of research on Russian as a heritage language (RHL) • Dearth of teaching materials for advanced levels of Russian • Need to assess the program • Need to assess students’ progress • + readily available data • => corpus of learner Russian
Corpus linguistics and learner corpora • a learner corpus is a collection of authentic texts produced by L2, FL, or HL speakers of a language • principled • representative • variable • searchable (requires concordance software such as WordSmith Tools) • annotated • e.g., ICLE (Granger, 2002) is a corpus of essays on various topics produced by advanced EFL students from 11 different L1 backgrounds, studying in non-English-speaking countries
Why investigate learner and heritage corpora? • A good corpus can help • identify patterns of language development • inform theory of language • identify segments of language that are particularly difficult for FL/L2/HL learners • inform pedagogical practices • assess the effectiveness of pedagogical intervention • assess students’ language progress
Pilot Russian Learner Corpus of Academic Writing (piRULEC) • 2009: data collection begins • in-class and homework written assignments • approx. 800 texts amounting to approx. 250,000 words, written by 36 learners (17 FLL + 19 HLL) • text length varies from 40 to 2,000 words • more texts are being processed • non-annotated
piRULEC • Types of texts • data is restricted to one register – academic writing; • various formats: essay, summary of an article, critical analysis of text, answers to questions, research paper; • different functions (skills): paraphrase, narration, description, compare/contrast, supported opinion, argumentation, and hypothesizing; • timed/non-timed; • individual work / group work.
piRULEC: header ID and file names • Each text contains a header ID that reflects the corpus design • The name of the file functions similarly • The names allow researchers to assemble sub-corpora quickly
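The idea of assembling sub-corpora from file names can be sketched as follows. The slides do not show the actual piRULEC naming scheme, so the pattern `<group>_<student>_<format>_<timing>.txt`, the field names, and the sample file names below are all hypothetical and serve only as an illustration.

```python
# Hypothetical naming scheme (NOT the actual piRULEC one):
#   <group>_<student>_<format>_<timing>.txt
FIELDS = ("group", "student", "text_format", "timing")


def parse_name(filename):
    """Split a file name into the design fields it encodes."""
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected file name: {filename}")
    return dict(zip(FIELDS, parts))


def subcorpus(filenames, **criteria):
    """Return the file names whose fields match every given criterion."""
    return [
        name for name in filenames
        if all(parse_name(name)[field] == value
               for field, value in criteria.items())
    ]


# Invented file names, for illustration only.
files = [
    "HLL_S07_essay_timed.txt",
    "FLL_S02_summary_untimed.txt",
    "HLL_S11_essay_untimed.txt",
]
hll_essays = subcorpus(files, group="HLL", text_format="essay")
```

Because the design variables live in the name itself, a sub-corpus such as "all heritage-learner essays" can be selected without opening any of the files.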
Preliminary studies of piRULEC • Designing protocols to assess development of writing complexity and accuracy (2010-2012) • Negative transfer in the writing of proficient learners of Russian: a comparative study of FLL and HLL (2012)
An illustration of a data-driven contrastive study • guiding research questions: • What are persistent difficulties in the language of advanced FLL and HLL writers of Russian? Are there differences between the two groups? • students in their two last years of RFP • 9 FLL • 9 HLL • two sub-corpora, FL (27,448 tokens) and HL (27,559 tokens) • in-class writing • home-work assignments • paragraphs and essays • app. 3,000 words from each student
The study, first step: text statistics • the corpus text analyzer WordSmith Tools (Scott, 2001) provides text statistics: • words (word tokens) • word types • type/token ratio (TTR) • mean length of a word • number of sentences and paragraphs • mean length of a sentence / paragraph • etc.
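For readers without access to WordSmith Tools, the statistics listed above can be approximated in a few lines. This is a minimal sketch, not WordSmith's own procedure: the regular-expression tokenizer and the punctuation-based sentence splitter are simplifying assumptions, and the function name `text_stats` is introduced here for illustration.

```python
import re

# Assumption: a word is a run of letters, optionally joined by hyphens.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def text_stats(text):
    """Compute basic text statistics for a plain-text sample."""
    tokens = [t.lower() for t in WORD.findall(text)]
    types = set(tokens)
    # Assumption: sentences end in '.', '!', '?' or an ellipsis character.
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    n = len(tokens)
    return {
        "tokens": n,
        "types": len(types),
        "ttr": len(types) / n if n else 0.0,
        "mean_word_length": sum(map(len, tokens)) / n if n else 0.0,
        "sentences": len(sentences),
        "mean_sentence_length": n / len(sentences) if sentences else 0.0,
    }


stats = text_stats("Я читаю книгу. Он читает книгу.")
```

TTR falls as a text gets longer, so it is only comparable across samples of similar size; the two sub-corpora in this study are close in size (27,448 and 27,559 tokens).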
The study: text statistics • Lexical diversity
The study, step 2: a word-list • typos • lexical borrowings: • FLL: виктимизация • HLL: оппонирует • word-creation • FLL: попутник, трудолюбимость, пропагандическая работа • HLL: насажденята, ленивость • orthography errors
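A word-list of this kind can be sketched as a frequency count, with forms absent from a reference vocabulary flagged for manual inspection as possible typos, borrowings, or coinages. The tokenizer, the function names, and the three-word reference set are assumptions made for this illustration; a real study would check against a full dictionary or a native-speaker corpus, and every flagged form would still need to be judged by a person.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by hyphens.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_list(text):
    """Return a frequency list of lower-cased word forms."""
    return Counter(t.lower() for t in WORD.findall(text))


def unknown_words(freqs, reference_vocabulary):
    """List word forms that do not occur in the reference vocabulary."""
    return sorted(w for w in freqs if w not in reference_vocabulary)


freqs = word_list("Это ленивость. Ленивость мешает работе.")
reference = {"это", "мешает", "работе"}  # toy stand-in for a real word list
candidates = unknown_words(freqs, reference)
```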
The study, step 3: concordance lines, participles • Russian participles: • present active, past active, present passive, past passive: F=attributive, • short form: F=predicative, derived from past passive, part of predicate • FLL and HLL use all 5 forms of participles
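Concordance lines show each hit together with its left and right context. The sketch below builds a simple keyword-in-context (KWIC) view with a regular expression; the suffix pattern for present active participles is a crude heuristic assumed for illustration (it also matches adjectives such as настоящий), so hits have to be checked by hand, which is what reading concordance lines involves in any case.

```python
import re


def kwic(text, pattern, width=30):
    """Return (left context, match, right context) for each regex match."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


# Assumption: a rough, over-generating pattern for present active
# participles (stem + -ущ-/-ющ-/-ащ-/-ящ- + an adjectival ending).
PRESENT_ACTIVE = (
    r"\b\w+(?:ущ|ющ|ащ|ящ)"
    r"(?:ий|ая|ее|ие|его|ей|ую|их|им|ими|ем|ому)\b"
)

sample = "Студент, читающий книгу, сидит у окна."
hits = kwic(sample, PRESENT_ACTIVE, width=10)
for left, match, right in hits:
    print(f"{left:>10} [{match}] {right}")
```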
The study: participle use • Comparison of percentage of participles in written speech used by FLL and HLL in the study and NLL (Russian National Corpus) • The diversity of lexemes used to form participles is higher for HLL than FLL
The study: participles in independent clauses and short participle as a part of predicate
The study: types of errors in participles • Errors are qualitatively different: • FLL: • wrong participle form (n=14) • gender (n=8) • case (n=5) • number (n=4) • odd forms (n=4) • missing comma (n=2) • n / nn (n=2) • HLL: • case (n=2); • odd form (n=3) • missing comma (n=15) • n / nn (n=21);
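The qualitative difference becomes clearer when the counts above are turned into proportions. In the sketch below the counts come from the slide, while grouping "missing comma" and "n / nn" together as written-convention errors is an assumption made for this illustration.

```python
# Error counts as reported on the slide.
fll_errors = {
    "wrong participle form": 14, "gender": 8, "case": 5, "number": 4,
    "odd form": 4, "missing comma": 2, "n / nn": 2,
}
hll_errors = {"case": 2, "odd form": 3, "missing comma": 15, "n / nn": 21}

# Assumption: punctuation and n/nn spelling are treated as one group.
WRITTEN_CONVENTIONS = {"missing comma", "n / nn"}


def share(errors, categories):
    """Proportion of all errors that fall into the given categories."""
    total = sum(errors.values())
    return sum(n for kind, n in errors.items() if kind in categories) / total


fll_share = share(fll_errors, WRITTEN_CONVENTIONS)  # 4 of 39 errors
hll_share = share(hll_errors, WRITTEN_CONVENTIONS)  # 36 of 41 errors
print(f"FLL: {fll_share:.1%}  HLL: {hll_share:.1%}")
```

Under this grouping, roughly one in ten FLL participle errors concerns punctuation or n/nn spelling, against almost nine in ten for HLL, even though the two groups make a similar number of errors overall (39 and 41).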
The study: conclusions • Both groups of learners, FLL and HLL, continue to struggle with participles even at an advanced level of language proficiency. • HLL seem to have a certain advantage over FLL in participle use • lexical diversity • choice of correct form • production of correct form • Both groups might benefit from pedagogical intervention.
Challenges for learner and heritage corpora • very few learner and heritage corpora exist • no parallel corpora • concordance software for “exotic” languages is lacking • taggers for non-standard varieties of language are under-developed • still, the field is developing fast and the results are promising!
Acknowledgements: Many thanks to my tireless colleague and supporter, Anna Yatsenko. THANK YOU! OLESYA KISSELEV: kisselev@pdx.edu