CELEA/CAAL 2007 Beijing The Assessment of English Oral Proficiency: Alternative Measures Lynne Hansen, C. Ray Graham, Jeremi Brewer, Rebecca Brewer, Wariyaporn Tieocharoen Contact e-mail: hansenl@byuh.edu. Toward the automatic administration and scoring of speaking tests.
Toward the automatic administration and scoring of speaking tests • 1. Computerized elicited imitation (EI) • 2. Automated measurement of temporality: FAST (Fully Automated Speaking Test) • Purpose: to develop a valid and reliable automated instrument to measure oral proficiency in L2 English
Comparing Theoretical Constructs for Speaking Tests • Psycholinguistic/Empirical: language processing; temporality; sentence orientation; EI, FAST, Versant (automated tests) • Communicative/Functional: authenticity; negotiation of meaning; turn-taking; OPIs, SOPIs
This paper is dedicated to the memory of Craig Chaudron. He created L2 elicitation instruments for Vietnamese and Indonesian. He pointed out the need for work on an English EI test. C. Chaudron et al. (2005, July). Elicited imitation as an oral proficiency measure. Paper presented at the AILA Conference.
What is Elicited Imitation (EI)?
History of Elicited Imitation • 1963: Child L1 development • (e.g. Fraser, Bellugi & Brown, 1963) • 1964: Diagnose language abnormalities • (e.g. Menyuk, 1964) • 1970s: L2 acquisition • (e.g. Naiman, 1974)
Two Major Thrusts in L2 Elicited Imitation Research • Psycholinguistic research into language competence and SLA processes (Erlam, 2006) • Indirect measurement of oral language proficiency (Bley-Vroman & Chaudron, 1994; Chaudron et al., 2005)
Bley-Vroman & Chaudron (1994) • “We regard it as premature to view elicited imitation as a proven method for inferring learner competence…” But… • “the more you know of a foreign language, the better you can imitate the sentences of the language. Thus EI is a reasonable measure of global proficiency.” (p. 247)
Pilot Study Instruments • Three forms of 60 sentences (13 repeated on all three forms, 47 unique to each form) • Sentence length 3 to 24 syllables • Wide variety of morphological and syntactic forms • Variety of lexical items (81.3% = K1, 6.7% = K2, .23% = AWL, 11.6% = Off-list) • Sentences selected according to criteria (Chaudron et al., 2005) • Recorded in studio; male and female voices
Subjects of Pilot Study • 223 learners of English in an IEP in U.S. • 13 L1 backgrounds (Chinese, Spanish, Korean, Japanese, Mongolian, etc.) • English proficiency levels from Novice to Advanced • Ages 18 to 53, mean = 24.5, SD = 6.9
Form A Reliability: 58 items, 78 persons • Person RAW SCORE-TO-MEASURE CORRELATION = .98 • CRONBACH ALPHA (KR-20) Person RAW SCORE RELIABILITY = .97 • 58 Measured Items ITEM RELIABILITY = .98
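For readers unfamiliar with the raw score reliability statistic, the following is a minimal sketch of Cronbach's alpha, which reduces to KR-20 when every item is scored 0 or 1. The function name `cronbach_alpha` and the response matrix are hypothetical illustrations, not the study's data or the software that produced the figures on the slide.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix.

    scores: list of per-person lists, one score per item.
    With 0/1 item scores this is the KR-20 coefficient.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item across persons (population variance).
    item_vars = [pvariance([person[i] for person in scores])
                 for i in range(n_items)]
    # Variance of each person's total raw score.
    total_var = pvariance([sum(person) for person in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses (4 persons x 3 items), NOT data from the study.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.75
```

Population variance is used for both the item and total-score terms; any consistent choice of variance estimator gives the same ratio.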
Figure 1. Form A Person/Item Map: Persons (N=78) plotted against Items (N=60) on a common scale running from low ability / low item difficulty to high ability / high item difficulty. [Map not reproduced]
Form A Items with Unacceptable Point Measure Correlations • 1016 Have you slept? • 3015 Maybe she likes cats. • 3025 She quickly jumped down. • 4018 They play games. • 4008 The situation in Iraq calls for diplomacy and sensitivity.
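As background for the item screening above, a point-measure correlation is generally computed as the Pearson correlation between the scores persons obtained on one item and those persons' overall ability measures; items with low or negative values discriminate poorly. The sketch below assumes that definition. The function name `point_measure_correlation` and all numbers are hypothetical, and no cut-off value is implied because the slide does not state one.

```python
from math import sqrt


def point_measure_correlation(item_scores, person_measures):
    """Pearson correlation between one item's scores and person measures."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(person_measures) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, person_measures))
    sxx = sum((x - mean_x) ** 2 for x in item_scores)
    syy = sum((y - mean_y) ** 2 for y in person_measures)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical person ability measures (logits) and item scores (0-4).
measures = [-1.0, 0.0, 1.0, 2.0]
well_behaved_item = [0, 1, 3, 4]   # abler persons score higher
misbehaving_item = [3, 4, 0, 1]    # abler persons score lower

r_good = point_measure_correlation(well_behaved_item, measures)
r_bad = point_measure_correlation(misbehaving_item, measures)
```

Here `r_good` is strongly positive and `r_bad` is negative, which is the pattern that would flag an item for review.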
Instrument • The 60 best discriminating items from the pilot study • Sentence length 5 to 22 syllables • Recorded in studio by male and female voices
Subjects • 156 learners in a university ESL program in the U.S. • 12 L1 backgrounds (Chinese, Mongolian, Portuguese, Spanish, Korean, Japanese, etc.) • English proficiency levels from Novice to Advanced • Ages 18 to 55, mean = 24.3, SD = 6.8
Test Administration 1. Orientation; subjects logged on to the computer. 2. Responses were recorded as subjects spoke. 3. Subjects logged off; wave files were saved to the server.
Scoring Method 1 • Similar to Chaudron et al. (2005) • Divide sentences into syllables • Mark each syllable with 1 (correct) or 0 (incorrect) • Transcribe each mistake below the correct syllable • Score each sentence from 0 to 4: start at 4 and subtract 1 for each error
Scoring the Imitations
1. If she lis tens, she will un der stand.
   Marks: 1 1 1 1 1 1 1 1 1 | Score: 4
2. Why had they liked peas so much?
   Marks: 1 0 1 1 1 1 1 | Score: 3
3. Big ships will al ways make noise.
   Marks: 1 1 0 1 1 1 1 | Score: 3 | Transcribed errors: (are)
4. We should have ea ten break fast by now.
   Marks: 0 1 0 1 0 1 1 0 1 | Score: 0 | Transcribed errors: (They) (eat) (right)
5. If her heart were to stop beat ing we might not be a ble to help her!
   Marks: 1 1 1 0 0 1 1 1 1 0 1 0 0 0 1 1 1 | Score: 0 | Transcribed errors: (will) (be) (will) (being)
Scoring Method 2 • Correct syllable count • 1 point for each syllable repeated accurately • 0 points for incorrect syllables
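The two scoring methods can be sketched side by side using the syllable marks from the five sample sentences shown under "Scoring the Imitations". This is only an illustration: the function names are hypothetical, and the Method 1 sketch assumes that each syllable marked 0 counts as one error and that the sentence score cannot fall below 0, which is consistent with the sample scores but is not stated explicitly on the slides.

```python
def score_method_1(marks, max_score=4):
    """Method 1: start at max_score, subtract 1 per error, floor at 0.

    Assumption: every syllable marked 0 counts as one error.
    """
    errors = marks.count(0)
    return max(0, max_score - errors)


def score_method_2(marks):
    """Method 2: one point for each accurately repeated syllable."""
    return sum(marks)


# Syllable marks (1 = correct, 0 = incorrect) for the five sample sentences.
sample_marks = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1],
]

method_1 = [score_method_1(m) for m in sample_marks]
method_2 = [score_method_2(m) for m in sample_marks]
print(method_1)  # [4, 3, 3, 0, 0]
print(method_2)  # [9, 6, 6, 5, 11]
```

Under these assumptions Method 1 reproduces the sample scores of 4, 3, 3, 0 and 0. Method 2 still separates sentences 4 and 5, which Method 1 both scores as 0.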
Additional speaking tests administered to the subjects • 15 min. face-to-face placement interview • 30 min. simulated oral proficiency test (SOPI) scored by human raters • 30 min. computer elicited oral achievement test (LAT) scored by human raters • OPI administered by certified ACTFL testers (stratified random sample)
Reliability: 57 items, 154 persons • 57 Measured Items ITEM RELIABILITY = .98 • Person RAW SCORE-TO-MEASURE CORRELATION = .96 • CRONBACH ALPHA (KR-20) Person RAW SCORE RELIABILITY = .96
Summary and Conclusions • We have presented large numbers of EI items to almost 400 ESL students • Student responses to EI are very consistent • Overall comparisons between EI scores and scores on other measures of oral language proficiency are promising • The EI task involves mechanisms similar to those used in spontaneous speech
Where do we go from here? • We need to continue experimenting with the interrelationships between student responses and EI variables such as sentence length, sentence complexity, and vocabulary. • We need to examine responder variables such as working memory, L1, age, etc. • We need to use new analysis tools to examine factors which contribute to learner responses.
Where do we go from here? (contd.) • We need to experiment with new ways of scoring and weighting items. • We need to develop speech technology tools to do the automatic scoring. • We need to develop an automated adaptive speaking test which includes EI, similar to those used currently in reading and listening
FAST (Fully Automated Speaking Test) • FAST was originally conceived as a test of oral fluency • Fluency: the temporal aspect of oral proficiency (Cucchiarini, Strik & Boves, 2000)
Hesitation Phenomena in Speech • L1: Goldman-Eisler, 1968 • L2: Lennon, 1990; Riggenbach, 1991; Kuwahara, 1995; Chaimanee, 1999 • Language attrition: Russell, 1996; Kenny, 1996; Nakuma, 1997; Yukawa, 1997, 1998; Hansen et al., 1998, 2002; Tomiyama, 1999; Nagasawa, 1999
Variables measured automatically for calculation in the FAST algorithm • Total length of silence • Average length of silence • Number of runs of speech • Total length of speech • Average length of speech run
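The five temporal variables listed above can be derived from a list of speech-run boundaries. The sketch below is not the FAST algorithm itself, whose details the slides do not give. It assumes that some earlier step (not shown) has already segmented the recording into non-overlapping, time-ordered speech runs, and that silence before the first run and after the last run counts as silence. The function name and the sample timings are hypothetical.

```python
def temporal_measures(speech_runs, total_duration):
    """Compute temporal variables from speech-run boundaries.

    speech_runs: list of (start, end) times in seconds, sorted and
                 non-overlapping (assumed output of a segmentation step).
    total_duration: length of the whole recording in seconds.
    """
    run_lengths = [end - start for start, end in speech_runs]

    # Collect silences: gaps before, between, and after speech runs.
    silences = []
    cursor = 0.0
    for start, end in speech_runs:
        if start > cursor:
            silences.append(start - cursor)
        cursor = end
    if total_duration > cursor:
        silences.append(total_duration - cursor)

    total_speech = sum(run_lengths)
    total_silence = sum(silences)
    return {
        "total_silence": total_silence,
        "average_silence": total_silence / len(silences) if silences else 0.0,
        "number_of_runs": len(speech_runs),
        "total_speech": total_speech,
        "average_run": total_speech / len(run_lengths) if run_lengths else 0.0,
    }


# Hypothetical 7-second recording with three runs of speech.
measures = temporal_measures([(0.5, 2.0), (2.5, 4.0), (5.0, 6.0)], 7.0)
```

For this hypothetical recording the result is 4.0 s of speech in 3 runs and 3.0 s of silence spread over 4 pauses.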
Fluency studies of missionary language • FAST • Manual measurement of temporality
Talk and Silence in the English Narratives of Fluent and Nonfluent Speakers
Relationships of ESL level to temporality in L1 and English narratives: ANOVA

              English           Mother tongue
              F       sig.      F       sig.
SP time       6.91    .001      .082    .921
SP length     4.84    .009      .157    .855
Talk time     6.72    .002      .107    .898
Run length    3.60    .030      2.38    .071