
Tracking Learning: Using Corpus Linguistics to Assess Language Development

James Lantolf and Steve Thorne. CALPER, Center for Advanced Language Proficiency Education and Research, The Pennsylvania State University.


Presentation Transcript


  1. Tracking Learning: Using Corpus Linguistics to Assess Language Development James Lantolf Steve Thorne CALPER Center for Advanced Language Proficiency Education and Research The Pennsylvania State University

  2. Tracking Learning: Approaches to Assessment • Traditional Classroom Assessment • Achievement, Placement, Formative • Standardized Tests • AP, TOEFL, OPI, STAMP • Alternative Assessment • Portfolio & LinguaFolio • Performance Assessment, Task-Based • CALPER Assessment • Dynamic Assessment • Corpus-Informed Assessment

  3. Today’s Talk • What is a corpus? • Types of corpora • Corpus-informed assessment • Developmental learner corpora • Contrastive learner corpus analysis against baseline • Two examples of corpus-informed assessment • Advanced ESL academic discourse competence • German modal particles

  4. What is a Corpus? • A corpus (plural corpora) • Large collection of texts • Gathered according to specific criteria • Stored in an electronic database with relevant meta-data associated with each text entry • Student ID • Time/date • Activity type • Corpora can be constructed from written language use (especially digital texts) or transcribed from spoken interaction
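
The entry structure described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the CALPER tool's actual data model: the class name `CorpusEntry`, the helper `select`, the activity labels, and the sample sentences are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CorpusEntry:
    """One text in a learner corpus plus the meta-data listed on the slide."""
    student_id: str       # who produced the text
    timestamp: datetime   # time/date of production
    activity_type: str    # e.g. "chat" or "email" (hypothetical labels)
    text: str             # the written or transcribed language itself


def select(entries, student_id=None, activity_type=None):
    """Return the entries matching every criterion that is not None."""
    return [
        e for e in entries
        if (student_id is None or e.student_id == student_id)
        and (activity_type is None or e.activity_type == activity_type)
    ]


# Invented sample entries, for illustration only.
corpus = [
    CorpusEntry("s01", datetime(2005, 9, 12, 10, 0), "chat", "Hallo, wie geht es dir?"),
    CorpusEntry("s01", datetime(2005, 10, 3, 10, 0), "email", "Das ist ja interessant."),
    CorpusEntry("s02", datetime(2005, 9, 12, 10, 5), "chat", "Komm doch mal vorbei."),
]

chat_entries = select(corpus, activity_type="chat")
```

Keeping the meta-data on every entry is what later allows the same texts to be sliced by learner, by task, or by date.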

  5. Basic Tenets of Corpus Analysis • Data driven, highly empirical • Objective approach • A grammar of use based on attested utterance types • A grammar of probability based on frequency and distribution • Language use and structure: • Collocational patterns • Lexicon heart of systematicity in language, i.e., grammar • Formulaic sequences comprise ~60% of language use (Wray, 2002; Schmitt & Carter, 2004)
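
As a rough illustration of "frequency and distribution", the sketch below counts adjacent word pairs (bigrams), which is one simple way to surface collocational patterns. The tokenizer is deliberately crude, the function names are hypothetical, and the two sample sentences are invented rather than drawn from any corpus mentioned in the talk.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-zäöüß']+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs, never pairing words across two texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented sample sentences, for illustration only.
sample = [
    "You might want to check the syllabus.",
    "You want to start with the first problem.",
]
counts = bigram_counts(sample)
top_pair, top_count = counts.most_common(1)[0]
```

Real collocation work usually goes beyond raw counts to an association measure, but raw bigram frequency is enough to show the idea.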

  6. Corpora & Language Assessment • For advanced proficiency -- develop and/or utilize genre, modality, and context-specific corpora • Focus can be on grammatical, lexical, metaphoric, discourse, pragmatic features • Typical problems and errors of usage can be found in learner data • Teachers and learners themselves can observe and assess their own and one another’s performance • Expert-speaker corpora can reveal what learners are not using/doing, as well as how appropriately, successfully, and differentially they are using the target language

  7. Comparing Assessment Approaches • Testing: • Elicited performance indicative of competence • “Authenticity” and/or ecological validity of test instrument • Sampling issues • Reliability • Critical question: Is the elicited performance representative of the individual’s state of language development? • Corpus-based: • Naturally-occurring language performance indicates competence • Volume of language learners produce across tasks/genres and time • Sampling issues become irrelevant • Reconceptualize reliability • Critical question: Have enough data been collected to conclude that an individual’s performance is representative of her state of language development?

  8. ITA Project: Describing, assessing, and developing academic discourse with international teaching assistants • Steve Thorne • Jonathan Reinhardt • Paula Golombek

  9. ITAcorp Project • ITAs are highly competent researchers • Expand repertoire of options for performing often complex social roles (instructor, adjudicator, tutor, advisor, fellow student, mediator) • Assessment --> Contrastive corpus analysis of ITAcorp with baseline corpus -- MICASE • Grammar as choice as it relates to meaning and social actions • Formulaic sequences, small words, modulation • Corpus-informed pedagogical intervention to prepare students to participate successfully in spoken and written genres of academic discourse

  10. Methodology • Contrastive corpus analysis of MICASE and ITAcorp --> what are the differences in language use between expert/native and ITA/advanced ESL speakers? • Identified directive and obligative constructions • Quantified usage of directive language in both corpora • The case of wanna / want to

  11. Corpus Assessment: Time 1 • The case of “you want to” | “you could …” • Please + [imperative]

  12. Corpus Assessment: Time 2 • Post-intervention usage of “you want to” • 10 instances of usage across 25 advanced ESL students • Concordance lines of proceduralized usage in context

  13. Corpus Assessment Corpus-informed Assessment and Materials Development: German Modal Particles Nina Vyatkina

  14. Teaching the MPs: Challenges • Modal Particles: ja, doch, denn, mal • Rampant polysemy in MPs • Strongly context-bound meaning • Absence of a direct counterpart in English (translated by tag questions, intonation, omitted) • Absence of an informal “particle-friendly climate” in traditional language classrooms • Overly formal treatment in textbooks • Sentence-based rather than utterance-based [interactive]

  15. Participants 7 American students and 16 German students discussing intercultural topics in German and in English using email and chat during 8 semester weeks (Fall 2005)

  16. German Modal Particles • German modal particles: indeclinable “smallwords” typical of conversations • ‘The German listener expects a particle. If it is absent, the sentence acquires a specific stylistic value: without a particle it sounds choppy, harsh, unfriendly, its utterance is apodictic, abrupt, blatantly noncommittal.’ (Weydt, 1969)

  17. Pedagogical intervention • Classroom intracultural sessions: explicit instruction based on the data produced by the participants in • Internet-mediated intercultural sessions: practice in language use in CMC with native speakers (Belz, 2006)

  18. Relative frequency: modality/intervention effect * Statistically significant difference in mean relative frequencies (no. MPs/1000 German words), p<.05
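
The measure named on this slide, modal particles per 1,000 German words, is simply

    relative frequency = (number of MPs / number of word tokens) × 1000

A minimal sketch of that calculation follows. It rests on a strong assumption: it counts every token that looks like ja, doch, denn, or mal, whereas these forms also have non-particle uses (ja as "yes", denn as a conjunction, mal as "times"), so a real analysis would need manual or tagged disambiguation. The function name and the sample sentence are invented, and the significance test reported on the slide is not reproduced here.

```python
import re

MODAL_PARTICLES = ("ja", "doch", "denn", "mal")  # the four MPs named in the talk


def relative_frequency(text, targets=MODAL_PARTICLES, per=1000):
    """Occurrences of the target forms per `per` word tokens of `text`.

    Form-based only: it cannot tell a modal particle from a homograph.
    """
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in targets)
    return hits / len(tokens) * per


# Invented sample sentence: 9 word tokens, 3 of them candidate particles.
sample = "Komm doch mal vorbei, das ist ja nicht weit."
rate = relative_frequency(sample)
```

Normalizing per 1,000 words is what makes learner and native-speaker output comparable when the two groups write different amounts.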

  19. MP Dispersion in the corpus • Learners: • ja • denn • doch • mal • NSs: • ja • denn • doch • mal

  20. MP use by NSs and learners (absolute numbers)

  21. Corpus-informed Assessment: Conclusions, Questions, & Resources • Representativeness and ecological validity? • Assemble corpus data to adequately and significantly represent production • Use benchmark corpora for assessing learner language successes and problems • Developmental corpus assessment of individuals and class-cohorts • CALPER materials: • Corpus tutorial -- see calper.la.psu.edu • INVESTIGATING REAL LANGUAGE -- June 25-27, 2007 • DYNAMIC ASSESSMENT workshop June 25-27, 2007 • CALPER Corpus Tool available Summer, 2007

  22. Thanks -- please visit our website for more information on CALPER materials, events, and services: http://calper.la.psu.edu

  23. Challenges to Corpus Approaches • One data source among many: ethnographic details, visual field, introspection, clinical and experimental elicitation • Descriptive, not explanatory • Focus on externalized language use / performance – psycholinguistics and language processing inferred • Corpora are “real” (representation of actual use), but are they “authentic” (meaningful and applicable to learners, e.g., Widdowson, 2002)? • A corpus is only as good as its representativeness • Harkening back to contrastive error analysis? No, contrastive analysis of actual use that does not need to include incapacity evaluations of learners

  24. Types of Corpora & Analyses Synchronic Diachronic

  25. Corpus Design and Construction Synchronic • Aggregative • Genre, register • Meta-data: • Situational context • Activity • Level of proficiency

  26. Corpus Design and Construction Diachronic • Role of meta-data: • Individual • Task • Time • Corpus construction as a form of experimental research

  27. Corpus Annotation • Frequency and location of tags • Laughter for hyperbole • Language use as social action • Part-of-speech • Lemmatization • Syntactic tagging • Error tagging • Semantic tagging

  28. Corpus Informing Language Theory • Not only what is possible (e.g., nativist and UG approaches), but what is likely or frequent in usage • Illustrates the limits of introspection about language (enormous differences between intuition and actual use) • Language structure, i.e., formulaic sequences comprise ~60% of language use (Wray, 2002; Schmitt & Carter, 2004) • Emergent grammar (Hopper, 2002; Bybee, 2001) • Grammar a consequence, not a precondition -- epiphenomenal • Grammar = observable repetition in discourse • Grammar contingent upon lexical environment • “Grammar contracts as text expands” --> fragments and repertoires

  29. Revisioning Ellipsis • Speakers add features as necessary rather than taking away from what would be required in written discourse (see also Wittgenstein, 1953; Rommetveit, 1974) • Omission of auxiliaries is common (be, have, do) but not often from speaker’s or 1st person perspective • Empty “it” and existential “there is” are often dropped in spoken discourse • Pronouns are often dropped before modal verbs, e.g., can happen, should be • Overall, beginning bits are left out • Grammatical description SHOULD represent spoken language use and should relate items and structures to interactional and situational functions

  30. Importance of Measuring & Understanding Process • Alfred Binet (1909) advocated process assessment, though he never designed an instrument to measure it. • Buckingham (1921): accounting for learning processes is as important as accounting for products.

  31. Challenges of Assessing Process • Feasibility • “the most direct procedure for determining an individual’s proficiency…would simply be to follow that individual surreptitiously over an extended period of time…It is clearly impossible, or at least highly impractical, to administer a ‘test’ of this type in the language learning situation” (Clark 1978, as quoted in Bachman, 1990). • Scalability - the bane of “alternative” assessment

  32. Depicting Process in SLA • Accuracy of production of L2 forms and IL development suggests a curvilinear rather than a linear relationship (Norris & Ortega, 2003). • Threshold and stage effects (Meisel, Clahsen & Pienemann, 1991). • U-shaped behavior (Kellerman, 1985). • Omega-shaped behavior - temporary increase in frequency followed by a normalization (Wolfe-Quintero, Inagaki, & Kim, 1998).

  33. Using Corpus to Assess IL Development • Addressing feasibility and scalability • Proliferation of technology-mediated language learning • More powerful computers and more refined software. • Automated speech recognition - “dirty” ASR

  34. “Complementary” Assessment • Use testing techniques (traditional or performance) in conjunction with corpus-based assessment to generate a more detailed and broad-based account of IL development.

  35. Academic Discourse Performance • ITAs’ success as instructors and future faculty depends on successful participation in written and spoken academic discourse • e.g. spoken genres: • small lecture presentation • large lecture presentation • discussion leading • lab section leading • seminar leading • advising • colloquia participation • interviewing • meeting participation • office hours conducting • service encounters • tutorial leading • socializing • conference presentation

  36. The ITA “problem” • Jan 2005: North Dakota proposed legislation that would have forced universities to reimburse class fees to students who complained about an instructor’s lack of English proficiency. If ten percent or more of the students had complained, the instructor would have been relieved from teaching pending further review. A watered-down version of the bill passed. • High number of international graduate students in the U.S. -- 50% of US graduate students in engineering and sciences are international

  37. Directive Language • DL is language with directive illocutionary force (Searle, 1979) used functionally for making suggestions or giving advice • In traditional frameworks, DL has primarily deontic qualities of obligative modality • In textbooks, DL is taught as a series of modals & semi-modals (must, mustn’t, have to, should, ought to, need to, needn’t) • In SYS-FUNC, DL would be considered part of the MODULATION system, a continuum between obligation (what I want you to do) and inclination (what you want to do)

  38. Why Study Directive Language? • DL is an important part of several academic discourse genres and professional competence • Inappropriate or unintended use of DL may result in miscommunication or misunderstanding of speaker intention • DL is highly interpersonal, involving speaker authority and power hierarchies

  39. Research • Contrastive genre-comparable spoken corpora • ITAcorp (ITA language use): office hours role plays (CMC, presentation, post-evaluation)-- approx. 120,000 tokens • MICASE (base-line ‘expert’ corpus): Advising and Office Hours sub-corpora--180,000 tokens • MICASE data as model • Analytical framework: • Corpus: usage-based, frequency & distribution • Qualitative: (professional) discourse analysis, SYS-FUNC & APPRAISAL
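
Because the two corpora differ in size (roughly 120,000 versus 180,000 tokens), raw counts of a form such as want to / wanna are not directly comparable and are usually normalized first. The sketch below shows one way that might be done. The regular expression is only a rough stand-in for the directive pattern, whitespace splitting is a crude token count, the function name is hypothetical, and the utterances are invented toy examples, not data from ITAcorp or MICASE.

```python
import re

# "want to" or "wanna" as whole words; a rough stand-in for the directive pattern.
WANT_TO = re.compile(r"\b(?:want to|wanna)\b", re.IGNORECASE)


def rate_per_10k(texts, pattern=WANT_TO):
    """Return (hits, tokens, hits per 10,000 tokens) for a list of texts."""
    hits = sum(len(pattern.findall(t)) for t in texts)
    tokens = sum(len(t.split()) for t in texts)  # whitespace tokens only
    rate = hits / tokens * 10_000 if tokens else 0.0
    return hits, tokens, rate


# Invented toy utterances, for illustration only.
learner_texts = ["You must read chapter two.", "Please finish the lab report."]
expert_texts = ["You might wanna look at chapter two.", "So you want to start here."]

learner = rate_per_10k(learner_texts)
expert = rate_per_10k(expert_texts)
```

Comparing the normalized rates, rather than the raw hit counts, is what supports a statement of the form "corpus A shows N times the use of corpus B".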

  40. Preliminary Contrastive Analysis of wanna / want to

  41. You [+ hedge] want to / wanna [+ hedge]: ITAcorp vs. MICASE • MICASE shows 12x the hedged use of want to / wanna • ITAcorp uses followed a pedagogical intervention on hedged wanna DL

  42. Additional Preliminary Descriptive Findings • In comparison to MICASE data, ITAs as represented in ITAcorp: • Generally use very few hedges or intensifiers • Generally underuse periphrastic forms • Overuse obligative modals (must, should) and please + imperative • Use ‘can’ for obligative ‘should’ • Use only the basic conditional, underuse ‘you could’, and make no use of ‘I would’ • Navigate between ‘I’ and exclusive ‘we’ strategically, invoking departmental or professorial authority when the going gets tough

  43. Next Steps • Complementary ethnographic data (survey, interviews) for ITAcorp participants • Use audio to produce narrow transcriptions of select data • Focus on differences across modality (CMC vs. F2F) • Focus on classroom presentation of a concept (contrasting with MICASE) • Gather data from non-role play ITA professional activity (section leader, lecturer, office hours) • Develop set of corpus-informed pedagogical interventions focusing on professional discourse competencies

  44. What is Data-Driven Learning? • Application of tools (concordancers) and techniques from corpus linguistics in the service of language learning. • Inquiry-based pedagogy • Learner as researcher • "Research is too important to leave to researchers" (Johns, 1991, p. 2)
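
A concordancer, at its simplest, finds every occurrence of a search word and shows it with a little context on either side (a keyword-in-context, or KWIC, display). The sketch below is a minimal single-word version written for illustration. It is not the CALPER Corpus Tool or any existing concordancer, and the function names and sample sentences are invented.

```python
import re


def kwic(texts, keyword, width=3):
    """Return (left, node, right) tuples for each occurrence of `keyword`.

    `width` is the number of context tokens kept on each side.
    Only single-word keywords are supported in this sketch.
    """
    lines = []
    for text in texts:
        tokens = re.findall(r"\w+", text)
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


def format_kwic(lines):
    """Right-align the left context so the node word lines up in a column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return "\n".join(f"{left:>{pad}}  [{node}]  {right}" for left, node, right in lines)


# Invented sample sentences, for illustration only.
sample = ["Komm doch mal vorbei.", "Das ist doch klar."]
lines = kwic(sample, "doch", width=2)
print(format_kwic(lines))
```

Lining the node word up in a column is the point of the display: learners can scan down it and look for recurring neighbors on either side.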

  45. Paradigms of L2 Instruction • Traditional approaches: Present -> Practice -> Produce • Data-driven learning: Observe -> Hypothesize -> Experiment

  46. Impact of Corpus Techniques on L2 Pedagogy • Materials development • How do native/expert speakers actually use the target language? • What drives sequencing? • Instructional activities • Example - link • Data-driven learning tools • KWICionary - link

  47. Research on Data-Driven Learning • Vocabulary Acquisition: improved through the use of concordances (Stevens, 1991; Cobb, 1997) • Writing Instruction: students can correct their own errors with concordances (Gaskell & Cobb, 2004; Ross & Payne, 2005)

  48. Pedagogical Issues for DDL • Learning a new way to learn language • Relationship between proficiency level and data-driven learning approach • Should frequency of use drive materials development?

  49. Next Generation Corpus Tools • Text files => relational databases • Storing data as smallest atomic unit • Associate extensive meta-data with each data entry • Application-based => web-based • Promote aggregation and sharing of data • Location-independent collaborative research • Integration with online learning environments • Online Corpus Analytic Tool (OCAT)
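
The move from text files to a relational database can be sketched with Python's built-in sqlite3 module. The schema below is an assumption made for illustration: it treats the word token as the "smallest atomic unit" and attaches the meta-data to each text. The table names, column names, and sample rows are invented and do not describe OCAT or any CALPER system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript("""
    CREATE TABLE texts (
        text_id       INTEGER PRIMARY KEY,
        student_id    TEXT NOT NULL,
        produced_at   TEXT NOT NULL,   -- ISO 8601 date
        activity_type TEXT NOT NULL
    );
    CREATE TABLE tokens (
        text_id  INTEGER NOT NULL REFERENCES texts(text_id),
        position INTEGER NOT NULL,     -- 0-based index within the text
        form     TEXT NOT NULL,
        PRIMARY KEY (text_id, position)
    );
""")


def add_text(conn, student_id, produced_at, activity_type, text):
    """Store one text's meta-data and one row per whitespace-separated token."""
    cur = conn.execute(
        "INSERT INTO texts (student_id, produced_at, activity_type) VALUES (?, ?, ?)",
        (student_id, produced_at, activity_type),
    )
    text_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tokens (text_id, position, form) VALUES (?, ?, ?)",
        [(text_id, i, tok.lower()) for i, tok in enumerate(text.split())],
    )
    return text_id


# Invented sample rows, for illustration only.
add_text(conn, "s01", "2005-09-12", "chat", "komm doch mal vorbei")
add_text(conn, "s01", "2005-10-03", "email", "das ist ja interessant")
add_text(conn, "s02", "2005-09-12", "chat", "das ist doch klar")

# How often does each student use "doch", broken down by activity type?
rows = conn.execute("""
    SELECT t.student_id, t.activity_type, COUNT(*) AS n
    FROM tokens k JOIN texts t ON t.text_id = k.text_id
    WHERE k.form = 'doch'
    GROUP BY t.student_id, t.activity_type
    ORDER BY t.student_id
""").fetchall()
```

With one row per token, questions that combine a linguistic form with meta-data (who, when, in which activity) become ordinary SQL joins instead of custom scripts over text files.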
