Data-Driven South Asian Language Learning SALRC Pedagogy Workshop June 8, 2005 J. Scott Payne Penn State University jspayne@psu.edu
Corpus-Based Approaches to L2 Instruction • Traditional Paradigm: present > practice > produce • Data-Driven Learning: observe > hypothesize > experiment
What is a corpus? • "A corpus is a body of text assembled according to explicit design criteria for a specific purpose" (Atkins & Clear, 1992: p.5). • John Sinclair (1991) on the development of the field of Corpus Linguistics: "Thirty years ago when this research started it was considered impossible to process texts of several million words in length. Twenty years ago it was considered marginally possible but lunatic. Ten years ago it was considered quite possible, but still lunatic. Today it is very popular."
Comparing Genres • Corpora of different genres: • Journalistic • Conversational (formal, informal, etc.) • Literary • Scientific • Academic • Second-language learner
Analyzing Corpora • Qualitative techniques: • Concordances, or keyword-in-context (KWIC) displays • Quantitative techniques: • Frequency analysis of individual words and collocations • Lexical diversity • Lexical density
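The qualitative and quantitative techniques above can be sketched in a few lines of Python. This is a minimal illustration, not how KWICionary or OCAT work internally. It assumes whitespace-delimited text and strips leading and trailing Unicode punctuation, so the Devanagari danda "।" is removed. It also treats adjacent word pairs as a rough stand-in for collocations. The function names `tokenize`, `kwic`, `word_frequencies`, and `bigram_frequencies` are hypothetical.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Split on whitespace, then strip leading/trailing Unicode punctuation
    (category P*), e.g. commas, quotes, or the Devanagari danda."""
    tokens = []
    for raw in text.split():
        start, end = 0, len(raw)
        while start < end and unicodedata.category(raw[start]).startswith("P"):
            start += 1
        while end > start and unicodedata.category(raw[end - 1]).startswith("P"):
            end -= 1
        if start < end:
            tokens.append(raw[start:end].lower())
    return tokens


def kwic(tokens, keyword, width=4):
    """Keyword-in-context lines as (left context, keyword, right context)."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def word_frequencies(tokens):
    """Raw frequency of each word form."""
    return Counter(tokens)


def bigram_frequencies(tokens):
    """Counts of adjacent word pairs, a crude proxy for collocations."""
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    text = "The students read the text. Then the students wrote a summary of the text."
    tokens = tokenize(text)
    for left, kw, right in kwic(tokens, "students", width=3):
        print(f"{left:>20} [{kw}] {right}")  # right-align left context, KWIC style
    print(word_frequencies(tokens).most_common(3))
    print(bigram_frequencies(tokens).most_common(2))
```

Collocation studies usually rank word pairs with an association measure such as mutual information rather than with raw counts. Raw bigram counts are used here only to keep the sketch short.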
What is Data-Driven Learning (DDL)? • The application of tools (concordancers) and techniques from corpus linguistics in the service of language learning. • Concordances as a tool for developing instructional exercises. • Places “raw” primary-source linguistic material in the hands of learners: learners as “researchers”. • Learners have the opportunity to discover language rules for themselves.
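Concordance lines can be turned into an instructional exercise. The hypothetical `gapped_concordance` function below blanks out target words in their contexts and returns a shuffled exercise together with an answer key. It assumes the teacher supplies the sentences and a short list of target words, for example easily confused verbs. The context width and random seed are arbitrary choices.

```python
import random
import unicodedata


def strip_punct(word):
    """Remove leading/trailing Unicode punctuation (category P*)."""
    start, end = 0, len(word)
    while start < end and unicodedata.category(word[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[start:end]


def gapped_concordance(sentences, targets, width=4, seed=0):
    """Build a fill-in-the-gap exercise from concordance lines.

    Each occurrence of a target word keeps up to `width` words of context on
    each side, and the target itself is replaced by a blank. Returns the
    shuffled exercise lines and an answer key aligned with them.
    """
    targets = {t.lower() for t in targets}
    items = []
    for sentence in sentences:
        words = sentence.split()
        for i, raw in enumerate(words):
            core = strip_punct(raw)
            if core and core.lower() in targets:
                gap = raw.replace(core, "_____", 1)  # keep attached punctuation
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                line = " ".join(part for part in (left, gap, right) if part)
                items.append((line, core))
    random.Random(seed).shuffle(items)  # deterministic order for a given seed
    lines = [line for line, _ in items]
    answers = [answer for _, answer in items]
    return lines, answers


if __name__ == "__main__":
    corpus = ["She made a decision quickly.", "They made progress.", "He did his homework."]
    lines, answers = gapped_concordance(corpus, ["made", "did"], seed=1)
    for n, line in enumerate(lines, 1):
        print(f"{n}. {line}")
    print("Key:", ", ".join(answers))
```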
DDL Examples • Tim Johns’s DDL website
Research on DDL • Vocabulary acquisition • Vocabulary acquisition improved through the use of concordances (Stevens, 1991; Cobb, 1997). • Horst and Cobb (2001) compared four tools supplied to students for learning and acquiring vocabulary (traditional bilingual and monolingual dictionaries, online dictionaries, and concordances); after monolingual dictionaries, concordance use was the strongest indicator of learning gains. • Writing instruction • Cobb (2004) used Lextutor to help students correct their own writing errors; only 8% of students indicated that the concordance had helped them.
Corpus Tools for Data-Driven Learning • KWICionary: http://conic.la.psu.edu/kwic/ • OCAT: http://conic.la.psu.edu/ocat/
Activity • Construct a corpus in your language by selecting texts from the Web. • Explore the corpus using the concordance query. • Generate at least one activity that you could use in the classroom this summer.
Learner Corpora • Criteria for Learner Corpora: • Continuous stretches of discourse, not isolated sentences or words, containing both erroneous and correct uses of the language. • Resulting from authentic activity – classroom or naturalistic interactions. • Explicit design criteria – metadata about the learners’ background, setting, level of proficiency, etc.
Learner Corpus Typology • Monolingual <> Bilingual • General <> Technical • Synchronic <> Diachronic • Written <> Spoken
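One way to make a learner corpus's explicit design criteria concrete is to store the metadata with each text and record the typology choices once for the whole corpus. The sketch below uses Python dataclasses. Every field and class name (`learner_id`, `proficiency`, `setting`, `TimeDesign`, and so on) is illustrative and is not drawn from an existing learner-corpus standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    WRITTEN = "written"
    SPOKEN = "spoken"


class TimeDesign(Enum):
    SYNCHRONIC = "synchronic"  # learners sampled at one point in time
    DIACHRONIC = "diachronic"  # the same learners sampled repeatedly over time


@dataclass
class LearnerText:
    """A continuous stretch of learner discourse with its metadata."""
    learner_id: str
    text: str
    first_language: str
    proficiency: str   # hypothetical label, e.g. "novice", "intermediate"
    setting: str       # e.g. "classroom" or "naturalistic"
    mode: Mode
    collected_on: str  # ISO 8601 date, so string order equals date order


@dataclass
class LearnerCorpus:
    """Corpus-wide design choices, one per typology axis, plus the texts."""
    target_language: str
    bilingual: bool          # False: target-language texts only
    technical: bool          # False: general rather than technical language
    time_design: TimeDesign
    texts: List[LearnerText] = field(default_factory=list)

    def add(self, item: LearnerText) -> None:
        self.texts.append(item)

    def history(self, learner_id: str) -> List[LearnerText]:
        """All texts by one learner in chronological order."""
        mine = [t for t in self.texts if t.learner_id == learner_id]
        return sorted(mine, key=lambda t: t.collected_on)
```

A diachronic design only pays off if every text carries a learner ID and a collection date, which is why `history` sorts on `collected_on`.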
Analyzing Learner Corpora • Contrastive Interlanguage Analysis: • Compares native-speaker (NS) and non-native-speaker (NNS) language • Can highlight features of non-nativeness in learner writing and speech (e.g. under- and overrepresentation of words, phrases, and structures)
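Contrastive Interlanguage Analysis needs a way to decide whether a frequency difference between NS and NNS data is large enough to count as over- or underrepresentation. One common choice is the log-likelihood (G²) statistic, computed from a word's frequency in each corpus; this is an assumption here, since the slide does not name a method. The sketch compares two token lists and flags words whose G² exceeds 3.84, the chi-square critical value for p < 0.05 with one degree of freedom. The function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, size_a, size_b):
    """G2 for a word seen a times in corpus A (size_a tokens) and b times in B.

    G2 = 2 * sum(observed * ln(observed / expected)), where each corpus's
    expected count is its share of the combined frequency. Zero counts
    contribute nothing to the sum.
    """
    total = size_a + size_b
    expected_a = size_a * (a + b) / total
    expected_b = size_b * (a + b) / total
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def compare(learner_tokens, native_tokens, min_g2=3.84):
    """Flag words over- or underrepresented in NNS data relative to NS data."""
    nns, ns = Counter(learner_tokens), Counter(native_tokens)
    n_nns, n_ns = len(learner_tokens), len(native_tokens)
    results = []
    for word in set(nns) | set(ns):
        a, b = nns[word], ns[word]
        g2 = log_likelihood(a, b, n_nns, n_ns)
        if g2 >= min_g2:
            # Compare relative frequencies, since the corpora may differ in size.
            direction = "over" if a / n_nns > b / n_ns else "under"
            results.append((word, direction, round(g2, 2)))
    return sorted(results, key=lambda r: -r[2])  # strongest differences first


if __name__ == "__main__":
    # Toy data: learners use "very" ten times as often as native speakers.
    learner = ["very"] * 20 + ["good"] * 80
    native = ["very"] * 2 + ["good"] * 98
    for word, direction, g2 in compare(learner, native):
        print(word, direction, g2)
```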
Corpus-Based Assessment of Language Development • Goal: to construct an evolving, performance-based linguistic profile for individual learners. • Elements of a learner linguistic profile: • A detailed diagnostic analysis of linguistic features: lexical inventory, morphology, syntax, and a variety of discourse properties, including coherence and cohesion devices. • Possibly, comparisons of learner performance with native-speaker usage and with the performance of other learners. • Support for a genuinely longitudinal approach to language assessment, since teachers and researchers could track individual learner development over time for any relevant linguistic feature.
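A first approximation of the lexical part of such a profile can be computed for each text and then followed across dates. The sketch below assumes pre-tokenized texts. It estimates lexical density (content words divided by all tokens) using a tiny, hypothetical English function-word list. Lexical diversity is the plain type-token ratio, which falls as texts get longer, so samples of similar length compare best. Morphological, syntactic, and discourse features would need language-specific tools and are not covered.

```python
# Hypothetical, deliberately tiny function-word list for English; a real
# profile would need a full list (or a POS tagger) for the target language.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in",
                  "on", "at", "is", "are", "was", "were", "i", "he", "she",
                  "it", "we", "they", "my", "his", "her"}


def profile(tokens):
    """Lexical measures for one text: inventory size, diversity, density."""
    n = len(tokens)
    types = set(tokens)
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        "tokens": n,
        "types": len(types),
        "ttr": len(types) / n if n else 0.0,  # type-token ratio
        "lexical_density": len(content) / n if n else 0.0,
    }


def longitudinal_profile(samples):
    """samples: (ISO date, tokens) pairs for one learner, in any order.

    Returns (date, profile) pairs in date order. Each profile also records how
    many word types appear for the first time relative to earlier samples.
    """
    seen = set()
    rows = []
    for date, tokens in sorted(samples, key=lambda s: s[0]):
        p = profile(tokens)
        p["new_types"] = len(set(tokens) - seen)
        seen |= set(tokens)
        rows.append((date, p))
    return rows


if __name__ == "__main__":
    early = "i go to the market and i buy rice".split()
    later = "yesterday i went to the crowded market and bought fresh rice".split()
    for date, p in longitudinal_profile([("2005-09-01", later), ("2005-06-08", early)]):
        print(date, p)
```

Running `profile` on native-speaker texts, or on other learners' texts from the same task, gives the comparison points the slide mentions.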