EALTA MILSIG: Standardising the assessment of writing across nations
Ülle Türk, Language Testing Unit, Estonian Defence Forces
STANAG 6001 testing conference, 7-9 July 2009, Zagreb, Croatia
Outline Background Aims of the project Procedure Standard setting Results Conclusions
Background: EALTA • EALTA = European Association for Language Testing and Assessment • Established in 2004 as a professional association for language testers in Europe. • Mission: to promote the understanding of theoretical principles of language testing and assessment, and the improvement and sharing of testing and assessment practices throughout Europe. • Annual conferences • Discussion lists • ealta-members@lists.lancs.ac.uk • specialist lists
Background: MILSIG • March 2008 – MILSIG mailing list established: ealta-mil@lists.lancs.ac.uk • EALTA conference in 2008: • a meeting of language testers working in the military • participating countries/institutions: Denmark, Estonia, Latvia, Lithuania, SHAPE, Slovenia, Sweden • agreement to co-operate in standardising writing assessment
Aims of the project • To select a number of sample scripts that • have been written in response to a variety of prompts • demonstrate English language proficiency at STANAG levels 1-3 (4) • could later be used as • benchmark performances in assessing writing and in rater training • sample performances for teachers and test takers • To study the possibility of carrying out standardisation via email.
Procedure and timeline • Each participating country/institution selects 4 scripts, including problem scripts, at levels 1-3 – end of May • Scripts are collected, coded and sent to all participants – middle of June • Scripts are marked following the procedures established in each country – end of September • STANAG level descriptors used • Weak, standard and strong performances at each level identified • Comments provided • Results analysed; decisions taken
Participants • Denmark (1) • Estonia (5) • Latvia (4) • Lithuania (3) • SHAPE (2) • Slovenia (5)
Council of Europe: A manual
Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR)
• Pilot version: September 2003
• Final version: January 2009
'Relating an examination or test to the CEFR can best be seen as a process of "building an argument" based on a theoretical rationale.' (p 9)
Standard setting procedures:
• Familiarisation
• Specification
• Standardisation training/benchmarking
• Standard setting
• Validation
Table 5.2 of the Manual: Time Management for Assessing Written Performance Samples
Familiarisation: Raters rating descriptors • Mean correlation: 0.89 (SD = 0.04) • Range: 0.83 (R14) to 0.98 (R05)
Task types and original ratings
• 27 scripts: 6 L1, 14 L2, 7 L3
• 12 letters: 3 L1, 8 L2, 1 L3
• 4 (+ 5) essays: 2 L1, 4 L2, 3 L3
• 1 report: L3
• 1 memorandum: L2
• A first draft of a lecture (2): 1 L2, 1 L3
• Paper for a newsletter (1): L1
• Paper/letter/essay (1): L3
Rating scripts • Task: • Use STANAG 6001 writing descriptors, NOT your own rating scale. • If the script was written for a STANAG 6001 test in your country/institution, which level would it be awarded? • Do you consider it a weak, standard or strong performance at the awarded level? • Why?
Analysis of ratings • Coding: • L1 weak = 1 • L1 standard = 2 • L1 strong = 3 • L2 weak = 4 • L2 standard = 5 • L2 strong = 6 • L3 weak = 7 • L3 standard = 8 • L3 strong = 9
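To make the coding concrete, here is a minimal sketch in Python of how a rater's judgement maps onto the 1-9 scale above. The function name and the idea of storing the scheme as a lookup table are illustrative assumptions; the project did not prescribe any software.

```python
# The 1-9 coding scheme from the slide above, stored as a lookup table.
CODES = {
    ("L1", "weak"): 1, ("L1", "standard"): 2, ("L1", "strong"): 3,
    ("L2", "weak"): 4, ("L2", "standard"): 5, ("L2", "strong"): 6,
    ("L3", "weak"): 7, ("L3", "standard"): 8, ("L3", "strong"): 9,
}

def encode(level: str, strength: str) -> int:
    """Turn a rater's (level, strength) judgement into its numeric code."""
    return CODES[(level, strength)]

print(encode("L2", "standard"))  # -> 5
```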
Scripts recoded • MILSIGPR_01–MILSIGPR_12a = MSP-01–MSP-12 • MILSIGPR_12b = MSP-13 • MILSIGPR_12c = MSP-14 • MILSIGPR_12d = MSP-15 • MILSIGPR_12e = MSP-16 • MILSIGPR_12f = MSP-17 • MILSIGPR_12g = MSP-18 • MILSIGPR_12h = MSP-19 • MILSIGPR_13 = MSP-20 • MILSIGPR_14 = MSP-21 • etc
Script ratings • Mean rating: 2.8-7.8 (SD: 0.00-1.47) • 1-3 (L1): 1 script (6 scripts) • 4-6 (L2): 24 scripts (12 scripts) • 7-9 (L3): 2 scripts (7 scripts) • 15 scripts (55.6%) – agreement on the level, though usually not on whether the performance was weak, standard or strong at that level
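As an illustration of how such per-script summaries can be derived from the coded ratings, here is a small Python sketch. The script IDs echo the examples discussed below, but the rating values and the aggregation method are hypothetical assumptions, not the project's documented procedure.

```python
import statistics

# Hypothetical coded ratings (1-9 scale) from five raters per script.
ratings = {
    "MSP-07": [4, 4, 5, 3, 4],
    "MSP-20": [7, 5, 6, 7, 6],
    "MSP-21": [8, 8, 9, 8, 8],
}

def level(code: int) -> int:
    """Map a 1-9 code back to its STANAG level (1-3)."""
    return (code - 1) // 3 + 1

for script, rs in ratings.items():
    mean = statistics.mean(rs)
    sd = statistics.pstdev(rs)
    agreed = len({level(r) for r in rs}) == 1  # all raters on the same level?
    print(f"{script}: mean={mean:.1f}, SD={sd:.2f}, level agreement={agreed}")
```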
Three examples • MILSIGPR_07 (MSP-07) • A lot of grammatical mistakes, spelling, very basic range. Not enough for Level 2. • MILSIGPR_13 (MSP-20) • task at level 3, but the writing is not coherent, very incorrect, sometimes difficult to understand the meaning and very uninteresting – getting even worse towards the end • MILSIGPR_14 (MSP-21) • well written with control of grammar, good vocabulary and abstract concepts and arguments clearly conveyed, the person might be able to write at a high level 3, but does not quite prove it here
Mean ratings for scripts Mean rating: 5.2 (SD = 1.44)
Correlations between country ratings N = 27; N = 23 All significant at the 0.01 level
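A sketch of how pairwise correlations between country ratings might be computed from the coded scores, aligned by script. The countries shown, the rating values, and the choice of Pearson's r via scipy are all assumptions for illustration; the slides do not say which correlation coefficient or software was used.

```python
from itertools import combinations
from scipy.stats import pearsonr

# Hypothetical 1-9 codes for the same seven scripts, one list per country.
country_ratings = {
    "Denmark":  [4, 5, 5, 6, 2, 7, 5],
    "Estonia":  [4, 5, 6, 6, 3, 7, 4],
    "Slovenia": [5, 5, 5, 7, 2, 8, 5],
}

# Correlate every pair of countries on their shared scripts.
for a, b in combinations(country_ratings, 2):
    r, p = pearsonr(country_ratings[a], country_ratings[b])
    print(f"{a} vs {b}: r = {r:.2f}, p = {p:.3f}")
```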
Conclusions • Such a project is indeed needed!
Way forward • 1 L1 script, 12 L2 scripts, 2 L3 scripts • Analysis of scripts: good benchmarks? • Collecting more scripts, particularly at L3 • Scripts based on a variety of task types • Did we start at the wrong end? • Looking at scripts that caused disagreement • Can we reach agreement? • What features make them problematic? • Expanding the circle to include more countries
References • EALTA website: http://www.ealta.eu.org • Council of Europe. 2009. Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR): http://www.coe.int/t/dg4/linguistic/Manuel1_EN.asp