The 2006 International Symposium of Computer-Assisted Language Learning, June 2-4, 2006, Beijing. Chinese learner corpora and second language research. Qiufang Wen, The National Research Center for Foreign Language Education, BFSU
Topics to be addressed • English corpora of Chinese learners • Corpus-based studies on English learners in mainland China • Several corpus-based studies on English learners’ interlanguage by myself or together with my colleagues • Advantages and disadvantages of corpus-based studies on interlanguage
Topic One English corpora of Chinese learners
Chinese learner English Corpus (CLEC) • College Learners’ Spoken English Corpus (COLSEC) • Spoken and Written Corpus of Chinese Learners (SWECCL) • Version 1 • Version 2 (under construction) • Bilingual Corpus of Chinese English Learners (BICCEL): under construction
1. Chinese Learner English Corpus (CLEC) by Gui & Yang in 2003 • Written corpus: 1 million • Timed and untimed compositions • Levels of proficiency • Middle school students • Non-English majors (Band 4) • Non-English majors (Band 6) • English majors (Band 4) • English majors (Band 8) • Error-tagged
Two Types of English Learners in University [diagram: English majors and non-English majors, each shown across Years 1-4, with the band labels Band 8, Band 6, Band 4, Band 4, Band 2]
2. College Learners’ Spoken English Corpus (COLSEC) by Yang & Wei in 2005 • Tokens: 0.7 million • Source: National spoken English test for non-English majors • Test items • Teacher-student conversation • Student-student discussion • Teacher-student discussion • Data format: written transcripts
3. Spoken and Written Corpus of Chinese Learners (SWECCL) by Wen, Wang & Liang in 2005 (Version 1) SWECCL WECCL SECCL 1.46 million 1.18 million
Spoken (SECCL) • Source of data • National spoken English test: 1996-2002 • Second-year English majors • Data format • Digital sounds as well as transcripts of the speeches
National spoken English test for English majors — Band 4 • Test format • Test in a lab • The number of testees annually • 2006: more than 16,000 • Expect to have 50,000 in the future • Scoring procedures • A random sample (30-35 tapes) • Two raters scoring one tape independently
Number of subjects • 6 groups from each year (1996-2002) • 42 groups (30-35 students each) = about 1400 students • About 230 hours of speech • Testing items
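The sample-size figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers stated on the slides (6 groups per year, 1996-2002, 30-35 tapes per group, about 230 hours of speech). The variable names are illustrative, and the per-student average is a rough estimate rather than a reported figure.

```python
# Sample-size arithmetic for SECCL, using only figures stated on the slides.
years = list(range(1996, 2003))      # 1996-2002 inclusive, i.e. 7 test years
groups_per_year = 6
total_groups = groups_per_year * len(years)

tapes_low, tapes_high = 30, 35       # tapes (students) sampled per group
min_students = total_groups * tapes_low
max_students = total_groups * tapes_high

# Rough average recording length per student, assuming about 1400 students
# and about 230 hours of speech in total.
avg_minutes_per_student = 230 * 60 / 1400

print(total_groups, min_students, max_students)
print(round(avg_minutes_per_student, 1))
```

The stated total of about 1400 students falls inside the range implied by 42 groups of 30 to 35 students.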
The structure of SECCL [diagram: SECCL divides into text and sound files (1996-2002); node labels include Tagged, Raw, Whole, Task (Task A, Task B, Task C), Year, Special, Article, Past Tense]
The written component Written Year 1 Year 2 Year 3 Year 4
The written component • Source of data • Timed compositions in class (40 minutes, no less than 300 words) • Take-home compositions (no word limit) • Types of compositions • Argumentative (a list of topics provided) • Narrative
SWECCL in 2007 (Version 2) SWECCL WECCL SECCL Two million Two million
SECCL (Version 2) • 2003-2006 National Spoken English Test for second-year English majors (Band 4) • 2000-2006 National Spoken English Test for 4th-year English majors, Band 8 (Task 3) • Longitudinal data (2001-2004)
Spoken (Band 8) • Testing item (Task C) • Make a comment on a given topic • Data format • Digital sounds as well as transcripts of the speeches
Spoken (Longitudinal) • 72 students 56 students • 40 hours’ speech
Tasks • Reading aloud • Retelling a story • Talking on a given topic (Narrative) • Talking on a given topic (argumentative) • Conversation (Role play) • Discussion on a given topic
4. Bilingual Corpus of Chinese English Learners (BICCEL) BICCEL Spoken Written E-C C-E E-C C-E 0.5 million 0.5 million 0.5 million 0.5 million
Spoken component of BICCEL • National Oral English test — Band 8 • The 4th year English majors • Interpreting from English to Chinese (Task A) • Interpreting from Chinese to English (Task B) • 2001-2005: 1100 testees
Written component of BICCEL • Source of data: in-class assignment • E-C and C-E translation • Across the 3rd and 4th years • 30 universities across the country
Topic Two A brief review of corpus-based studies on Chinese learner English
Sources • China National Knowledge Infrastructure (CNKI)(On-line journals) • Digital dissertation database
Conferences & workshop • The International conference on “Corpus Linguistics” 25-27 October, 2003 • The First National Symposium on corpus linguistics and ELT Education 11-13 October, 2004 • Workshop on the use of corpus in teaching and research 17-19 March, 2006
Topic Three Several corpus-based studies on English learners’ interlanguage by myself or together with my colleagues
Study One Features of oral style in English compositions of advanced Chinese EFL learners (Wen, Q. F., Ding, Y. R. & Wang, W. Y. 2003. Foreign Language Teaching & Research (4): 268-274.)
Study Two A Study on Frequency Adverbs Used by Advanced English Learners in China Wen, Q. F. & Ding, Y. R. 2004. Modern Foreign Languages (2): 141-147.
Study Three An analysis of English Majors’ Abstracting abilities through their English compositions Wen, Q.F. & Liu, R.Q. 2006. Foreign Languages (2)
Study Four • A longitudinal study on the developmental features of speaking vocabulary by English majors in mainland China Wen, Q. F. 2006. Foreign Language Teaching and Research (3).
Study Five • A comparison of developmental features of Speaking and Writing vocabulary by English majors • Wen, Q. F. 2006. Foreign languages and Foreign Language Teaching (4)
Study Six Patterns of change in speaking vocabulary development by English majors
Study Two A Study on Frequency Adverbs Used by Advanced English Learners in China Wen, Q. F. & Ding, Y. R. 2004. Modern Foreign Languages (2): 141-147.
Frequency Adverbs • Adverbs used for describing “how often” something happens • never, sometimes, usually, always
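Counting such adverbs in a corpus usually means tokenizing the text, tallying the target words, and normalizing by corpus size so that corpora of different lengths can be compared. The sketch below is a minimal illustration under stated assumptions: the tokenizer is a simple regular expression, the target set holds only the four example adverbs from this slide, and the function name `per_million` and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# The four example adverbs from the slide; a real study would use the full list.
FREQUENCY_ADVERBS = {"never", "sometimes", "usually", "always"}

def per_million(text, targets=FREQUENCY_ADVERBS):
    """Return occurrences per million tokens for each target word."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter(t for t in tokens if t in targets)
    return {w: counts[w] * 1_000_000 / len(tokens) for w in sorted(targets)}

# Hypothetical sample text, for illustration only.
sample = "I always walk. She never runs. I always read."
rates = per_million(sample)
print(rates)
```

Normalizing to a per-million rate is what allows a learner corpus and a reference corpus of different sizes to be placed side by side.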
Top Twenty Frequency Adverbs (TTFAs) • Most frequently used by native speakers according to the analyses of the British National Corpus (BNC) by Leech, Rayson and Wilson (2001)
Common features • All high-frequency words • Different frequencies in speech and writing except sometimes and twice (Leech et al. 2001)
A comparison of TTFAs in speech and writing • The overall difference • TTFAs are more likely to occur in writing than in speech. • The specific differences • Speech: never, always, ever, normally • Neutral: sometimes, twice • Writing: 14 words
Previous corpus-based studies • e.g. Altenberg & Granger, 2001; Cobb, 2002; Ringbom, 1998; Wen, Ting, & Wang, 2003 • Conflicting finding one: overuse vs. underuse
Examples • Overuse high-frequency words in writing (Cobb, 2001) • Overuse modal verbs (Aijmer, 2002) • Underuse adverbial connectors (Altenberg & Tapper, 1998) • No study on frequency adverbs
Conflicting finding two • Tend to use written style features in their speech • Tend to use a mixed register in either speech or in writing • Tend to use oral style features in their writing • Did not compare the use of high-frequency words in speech with writing
General purposes of this study • Whether Chinese EFL learners simply overuse the TTFAs, or overuse some while underusing others • Whether they use the TTFAs similarly or differently when their speech is compared with their writing
Research questions • Do they overuse or underuse the TTFAs differently between speech and writing? • Do they differ more from native speakers in writing or in speaking with regard to the use of the TTFAs? • Do they demonstrate a similar pattern of writing-speaking difference as native speakers in the use of the TTFAs?
Data analysis: four comparisons • Learners’ speech vs. native speakers’ speech: SECCL vs. BNCS • Learners’ writing vs. native speakers’ writing: CLEC vs. BNCW • Learner-native difference in speech vs. learner-native difference in writing: (SECCL vs. BNCS) vs. (CLEC vs. BNCW) • Speech-writing difference in learners vs. speech-writing difference in native speakers: (SECCL vs. CLEC) vs. (BNCS vs. BNCW)
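Each pairwise comparison above comes down to asking whether a word is relatively more or less frequent in one corpus than in another. The slides do not say which statistic was used, so the sketch below assumes the log-likelihood measure that is common in corpus comparison. The function names and all counts are hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(learner_count, learner_size, native_count, native_size):
    """Log-likelihood (G2) for one word across two corpora."""
    total_count = learner_count + native_count
    total_size = learner_size + native_size
    # Expected counts if the word were equally frequent in both corpora.
    expected_learner = learner_size * total_count / total_size
    expected_native = native_size * total_count / total_size
    ll = 0.0
    for observed, expected in ((learner_count, expected_learner),
                               (native_count, expected_native)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def direction(learner_count, learner_size, native_count, native_size):
    """Label the learner corpus as overusing or underusing the word."""
    learner_rate = learner_count / learner_size
    native_rate = native_count / native_size
    if learner_rate > native_rate:
        return "overuse"
    if learner_rate < native_rate:
        return "underuse"
    return "same"

# Hypothetical counts: a word occurring 200 times in a 1-million-token learner
# corpus and 100 times in a 1-million-token native corpus.
score = log_likelihood(200, 1_000_000, 100, 1_000_000)
label = direction(200, 1_000_000, 100, 1_000_000)
print(round(score, 2), label)
```

Running the same two functions on a speech pair (SECCL vs. BNCS) and a writing pair (CLEC vs. BNCW) would give the per-word inputs for the third comparison. The choice of statistic is an assumption here, not something the slides report.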
Results (1) TTFA use in learners’ spoken corpus (SECCL)
Results (2) TTFA use in learners’ written corpus (CLEC)