NATO Language Proficiency Study: OPI Ratings Across Countries

STANAG 6001 - OPI Testing Julie J. Dubeau Bucharest BILC 2008

Bill Who???

Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale*
*This research was completed in 2006 as part of an M.A. thesis in Applied Linguistics

Presentation Outline
• Context
• Research Questions
• Literature Review
• Methodology
• Results
  • Ratings
  • Raters
  • Scale
• Conclusion

NATO Language Testing Context
• Standardized Language Profile (SLP) based on the NATO STANDARDIZATION AGREEMENT (NATO STANAG) 6001 Language Proficiency Levels (Ed 1? Ed 2?)
• 26 NATO countries, 20 Partnership for Peace (PfP) countries & others…

Interoperability Problem?
Language training is central within armed forces because of the increasing number of peace-support operations, and it is considered to play an important role in achieving interoperability among the various players.
“The single most important problem identified by almost all partners as an impediment to developing interoperability with the Alliance has been shortcomings in communications” (EAPC (PARP) D, 1997, 1, p.10).

Overarching Research Question
• Since no known study had investigated inter-rater reliability in this context, the main research question was: How comparable or consistent are ratings across NATO raters and countries?

Research Questions
• Research questions pertaining to the ratings (RQ1)
• Research questions pertaining to raters’ training and background (RQ2)
• Research questions pertaining to the rating process and to the scale (RQ3)

Research Questions
RQ1-Ratings:
• How do ratings of the same oral proficiency interviews (OPIs) compare from rater to rater?
• Would the use of plus levels increase rater agreement?
• How do the ratings of the OPIs compare from country to country?
• Are there differences in scores within the same country?
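One simple way to put a number on how ratings of the same OPI compare from rater to rater is the share of rater pairs that assigned exactly the same level. The sketch below is only an illustration of that idea: the function name and the ratings are invented for the example, and it is not presented as the measure used in the thesis.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Return the share of rater pairs that gave one OPI the same rating."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two ratings")
    matches = sum(1 for a, b in pairs if a == b)
    return matches / len(pairs)


# Invented ratings for a single OPI sample (not data from the study).
sample_ratings = ["1", "1+", "2", "2", "2", "3"]

# 6 raters give 15 pairs; only the three raters who chose "2" agree (3 pairs).
print(round(pairwise_agreement(sample_ratings), 2))  # 0.2
```

Exact agreement is a strict measure. Counting ratings within half a level of each other as matching would be a looser alternative.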

Research Questions
RQ2-Raters’ training and background:
• Are there differences in ratings between raters who have received varying degrees of tester/rater training and STANAG training?
• Did very experienced raters score more reliably than less experienced ones? Are experienced raters scoring as reliably as trained raters?
• Are there differences in ratings between participants who test part-time versus full-time, are native or non-native speakers of English, and are from ‘Older’ and ‘Newer’ NATO countries?

Research Questions
RQ3-Rating process and scale use:
• Do differing rating practices affect ratings?
• Do raters appear to use the scale in similar ways?
• What are the raters’ comments regarding the use and application of the scale?

Literature Review
• Testing Constructs
• What are we testing?
• General proficiency & Why
• Rating scales
• Rater Variance
• How do raters vary?
• Rater/scale interaction
• Rater training & background

Methodology
• Design of study: Exploratory survey
  • 2 Oral Proficiency Interviews (OPIs A & B)
  • Rater data questionnaire
  • Questionnaire accompanying each sample OPI
• Participants: Countries recruited at BILC Seminar in Sofia 2005
  • 103 raters from 18 countries and 2 NATO units

Analysis:
• Rating comparisons
  • Original ratings
  • ‘Plus’ ratings
• Rater comparisons
  • Training
  • Background
• Country to country comparisons
  • Within country dispersion
• Rating process
  • Rating factors
  • Rater/scale interaction
  • Scale user-friendliness

Results RQ1 - Summary
• Ratings: To compare OPI ratings and to explore the efficacy of ‘plus ratings’.
• Some rater-to-rater differences
• ‘Plus’ levels brought ratings closer to the mean
• Some country-to-country differences
• Greater ‘within-country’ dispersion in some countries
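The phrases "closer to the mean" and "within-country dispersion" can be made concrete with two small calculations: the average distance of each rating from the overall mean, and the standard deviation of ratings inside each country. The sketch below shows only the mechanics. Scoring a plus level as half a level is an assumption made for the example, the function names are hypothetical, and the ratings are invented and say nothing about the study's data.

```python
from statistics import mean, pstdev


def to_number(level):
    """Map a label such as '2' or '1+' to a number.

    Treating a plus as +0.5 is an assumption made for this sketch.
    """
    base = int(level.rstrip("+"))
    return base + (0.5 if level.endswith("+") else 0.0)


def mean_abs_deviation(levels):
    """Average distance of each rating from the mean of all ratings."""
    values = [to_number(x) for x in levels]
    centre = mean(values)
    return mean([abs(v - centre) for v in values])


def within_country_spread(ratings_by_country):
    """Population standard deviation of the ratings inside each country."""
    return {
        country: pstdev([to_number(x) for x in levels])
        for country, levels in ratings_by_country.items()
    }


# Invented ratings of one OPI, first with whole levels only, then with plus levels.
whole_levels = ["1", "2", "2", "3"]
with_plus = ["1+", "2", "2", "2+"]
print(mean_abs_deviation(whole_levels))  # 0.5
print(mean_abs_deviation(with_plus))     # 0.25

# Invented country groupings: identical ratings give zero spread.
print(within_country_spread({"Country 1": ["2", "2", "2"], "Country 2": ["1", "2", "3"]}))
```

A smaller mean absolute deviation means the ratings cluster more tightly around the mean. A larger within-country value means raters from the same country disagreed more with each other.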

View of OPI ratings sample A

Results Sample A (L1) All Ratings (with +)

All Countries’ Means for Sample A
[Chart. Country numbers 1 to 20 against Overall Country Mean; axis from 1.00 to 2.40.]

All Ratings for Sample B (level 2)

View of OPI ratings sample B

All Countries’ Means for Sample B

Results RQ2 - Summary
• Raters: To investigate rater training and scale training and see how (or if) they impacted the ratings, and to explore how various background characteristics impacted the ratings
• Trained raters scored within the mean, especially for sample B
• Experienced raters did not do as well as scale-trained raters
• Full-time raters scored closer to mean
• ‘New’ NATO raters scored slightly closer to mean
• NNS raters scored slightly closer to mean

Tester (Rater) Training
[Bar chart. Y-axis: Frequency. Categories: none to little, substantial to lots. Bar labels as extracted: 63.27%, 36.73%.]

Years of Experience
[Bar chart. Y-axis: Frequency. Categories: 0 to 1 year, 2 to 3 years, 4 to 5 years, 5 years +. Bar labels as extracted: 49.5%, 19.8%, 15.84%, 14.85%.]

STANAG Scale Training
[Bar chart. Y-axis: Percent. Categories: none to little, substantial to lots. Bar labels as extracted: 60.0%, 40.0%.]

‘Old’ vs. ‘New’ NATO Countries
Summary of Tester Trg

Newer NATO member?   Little   Lots   Total
Yes                       6     30      36
No                       23     28      51
Total                    29     58      87

‘Old’ vs. ‘New’ NATO Countries
Rating OPI B Correct?

Newer NATO member?   Yes   No   Other/Missing   Total
Yes                   27    6               4      37
No                    27   26               2      55
Total                 54   32               6      92
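The counts on this slide can be turned into a share of correct OPI B ratings for each group. The sketch below rests on one reading of the flattened table, which is an assumption: raters from newer member countries are taken as 27 correct, 6 incorrect and 4 other/missing (37 in total), and raters from older member countries as 27, 26 and 2 (55 in total). The dictionary keys and the function name are hypothetical.

```python
# Counts as read from the slide (the row/column assignment is an assumption).
counts = {
    "newer": {"correct": 27, "incorrect": 6, "other_missing": 4},
    "older": {"correct": 27, "incorrect": 26, "other_missing": 2},
}


def share_correct(row):
    """Share of a group's raters whose OPI B rating was correct.

    The denominator includes the other/missing category.
    """
    return row["correct"] / sum(row.values())


for group, row in counts.items():
    print(f"{group}: {share_correct(row):.1%} of {sum(row.values())} raters")
# newer: 73.0% of 37 raters
# older: 49.1% of 55 raters
```

Leaving the other/missing raters out of the denominator would raise both shares, so the choice of denominator should be stated whenever such figures are compared.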

Results: Raters’ Background
• Conducts Testing Full-time?
  • Yes 34 (33.0%)
  • No 67 (65.0%)
• Full-time testers more reliable (accurate)
• NNS (60%) raters better trained?
• ‘New’ raters better trained?

Results RQ3 - Summary
• Scale: To explore the ways in which raters used the various STANAG statements and rating factors to arrive at their ratings.
• Rating process did not affect ratings significantly
• 3 main ‘types’ of raters emerged:
  • Evidence-based
  • Intuitive
  • Extra-contextual

Results
• An ‘evidence-based’ rating for Sample B (level 2):
I compared the candidate’s performance with the STANAG criteria (levels 2 and 3) and decided that he did not meet the requirements for level 3 with regard to flexibility and the use of structural devices. Errors were frequent not only in low frequency structures, but in some high frequency areas as well. (Rater 90 – rated 2)

Results
• An ‘intuitive’ rating for Sample A (level 1):
I would say that just about every single sentence in the interpretation of the level 2 speaking could be applied to this man. And because of that I would say that he is literally at the top of level 2. He is on the verge of level 3 literally. So I would automatically up him to a low 3. (Rater 1 – rated 3)

Results
• An ‘extra-contextual’ rating for Sample A (level 1):
Level 3 is the basic level needed for officers in (my country). I think the candidate could perform the tasks required of him. He could easily be bulldozed by native speakers in a meeting, but would hold his own with non-native speakers. He makes mistakes that very rarely distort meaning and are rarely disturbing. (Rater 95 – rated 2)

Implications
• Training not equal in all countries
• Scale interpretation
• Plus levels useful
• Different grids, speaking tests
• Institutional perspectives

Limitations & Future Research
• Participants may not have rated this way in their own countries
• OPIs new to some participants
• Future research could
  • Get participants to test
  • Investigate rating grids
  • Look at other skills

Conclusion of Research
So, are we all on the same page? YES! BUT…
• Plus levels were instrumental in bridging the gap
• Training was found to be key to reliability
• More in-country training should be the first step toward international benchmarking.

Thank You!
Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale
Dubeau.JJ@forces.gc.ca
The full thesis is available on the CDA website http://www.cda-acd.forces.gc.ca
Or google Dubeau thesis
