Testing the Test – Serbian STANAG 6001 English Language Test
STANAG 6001 Testing Team, PELT Directorate, Serbian MOD
STANAG 6001 Testing Workshop, Brno, Czech Republic, 6–8 September 2016
General and Specific Concerns
• Any kind of testing/examination raises both general and specific points of concern.
• On the general points, relevant to any language examination, we follow the principles set out in the Principles of Good Practice for ALTE Examinations (Association of Language Testers in Europe).
• The specific points of concern arise from the following:
• STANAG 6001 is a high-stakes examination;
• It is a language proficiency test, testing general English in a military setting;
• It is a criterion-referenced test, based on the STANAG 6001 table of level descriptors, and not directly comparable with other criterion-referenced tests (e.g. Cambridge ESOL exams, IELTS) or language proficiency scales (CEFR, ALTE levels, etc.).
Limiting Factors
• Bearing this in mind, there are many serious constraints when designing the test (including things beyond your control):
• What are the actual needs of the particular nation? (NATO member? PfP member? MD member? Test all levels? Test L4?)
• What kind of test? (Multi-level 1-2-3? Bi-level L1/2, L2/3? Single level?)
• STANAG 6001 language descriptors are uniform, not open to individual/national interpretation
• Number of test takers per cycle
• Number of testing cycles per year
• Testing facilities at your disposal: premises (small/large testing rooms?), amenities (multimedia equipment? PCs/laptops? Headphones/loudspeakers?), staff (number of invigilators? trained OPI-ers?), etc.
Your Responsibilities
• Things you are in control of and can make individual decisions on are the following:
• Test format (based on the test specifications you designed)
• Number of questions, type of questions, elicitation techniques, etc.
• Rating criteria (analytic/holistic? mixed?), cut-off scores, etc.
• But even these decisions are heavily influenced by the constraints above.
• Whatever your test eventually comes to be, it has to meet the following examination qualities:
• Validity
• Reliability
• Impact
• Practicality
Testing the Test • Test analyses are done in different modes and at different stages of test development and test administration.
Scoring Criteria for STANAG 6001 Speaking & Writing Tests
• The interlocutor frame (scripted interview) enhances standardization of the speaking test and reduces variability among different raters.
• Analytic rating scales enhance reliability in the speaking and writing tests through more consistent scores, and also reduce "rater-candidate interaction" and bias.
• Recorded speaking responses and writing responses are cross-rated for a higher degree of consistency/reliability.
Inter-Rater Reliability
[chart of inter-rater reliability figures not reproduced]
*Calculated on 12 randomly selected, independently rated speaking samples
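The slide does not state which agreement statistic was used for those figures. As a hedged illustration only, the Python sketch below computes Cohen's kappa, one standard agreement measure for categorical level ratings, on invented paired ratings for 12 speaking samples:

```python
# Hypothetical paired level ratings (rater A vs. rater B) for 12 speaking
# samples -- invented illustration data, not the Serbian team's figures.
from collections import Counter

rater_a = [2, 2, 3, 1, 2, 3, 2, 1, 3, 2, 2, 3]
rater_b = [2, 2, 3, 1, 2, 2, 2, 1, 3, 2, 3, 3]

n = len(rater_a)
# Observed agreement: share of samples where both raters awarded the same level.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement, from each rater's marginal level frequencies.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum(freq_a[lvl] * freq_b[lvl] for lvl in freq_a) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```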
Scoring Criteria for STANAG 6001 Reading & Listening Tests
• Adapted REDS method (originally: Sustained = 70-100%, Developing = 55-65%, Emerging = 40-50%, Random = 0-35%)
• Total: 40 questions; maximum: 40 points / 100%
• Sustained thresholds:
Level 1: 8 points out of 10 (80%)
Level 2: 11 points out of 15 (73.3%)
Level 3: 11 points out of 15 (73.3%)
Scoring Criteria for STANAG 6001 Reading & Listening Tests
Simplified table for awarding levels: [table not reproduced]
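Since the simplified table itself is not reproduced, here is a minimal sketch of how level awarding could work under the cut-offs quoted above. The rule "highest level sustained, with all lower levels also sustained" is an assumed interpretation, not the team's confirmed procedure:

```python
# Cut-offs quoted on the previous slide: points needed per level section
# for a "sustained" result (sections have 10, 15 and 15 items; 40 in total).
SUSTAINED_CUTOFFS = {1: 8, 2: 11, 3: 11}

def award_level(scores: dict[int, int]) -> int:
    """Return the highest level whose section, and every section below it,
    met the sustained cut-off (assumed rule, see note above)."""
    awarded = 0
    for level in (1, 2, 3):
        if scores[level] >= SUSTAINED_CUTOFFS[level]:
            awarded = level
        else:
            break  # a non-sustained section caps the awarded level
    return awarded

print(award_level({1: 10, 2: 12, 3: 9}))   # -> 2
print(award_level({1: 10, 2: 13, 3: 12}))  # -> 3
```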
Statistical Operations in Reading Test Analysis
• Distribution of candidates' scores (0-15) per level shows some overlap among outliers, but the majority of scores do not overlap (L1 is excluded due to its smaller number of items: 10).
• Distribution of items' facility indices also shows some overlap.
• Average facility value per level: L1 = 98.8%, L2 = 77.5%, L3 = 42.8%
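An item's facility value is simply the proportion of candidates who answered it correctly; averaging the items in one level section gives the per-level figures above. A minimal sketch on invented response data:

```python
# Rows = candidates, columns = items; 1 = correct, 0 = incorrect.
# Invented illustration data, not the Serbian test's response matrix.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

n_candidates = len(responses)
# Facility value per item: share of candidates answering it correctly.
facility = [sum(item) / n_candidates for item in zip(*responses)]
print([f"{p:.0%}" for p in facility])        # ['75%', '75%', '25%', '100%']

# Average facility for a section (e.g. the items belonging to one level):
print(f"{sum(facility) / len(facility):.1%}")
```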
Comparison Chart
[comparison chart not reproduced]
*Approximations for comparison purposes, not exact equivalences
SLPs Re-Testing Results
The results of retested candidates are as expected. There is typically a 3-5 year gap between testing and retesting, during which most candidates have had some language training that improved their skills. However, the shifts are not dramatic:
[results table not reproduced]
Pretesting
• Conducted locally at the Military Academy
• Selected 50-80 senior-year Military Academy cadets with 4 years of continual English language training
• The upside: cheap, easy to organize, a good sample, with demographics similar enough to those of operational test takers
• The downside: certain limitations due to cadets' lack of real-life and job experience
• Pretesting abroad is currently unavailable due to budget cuts and organizational complexity
• Pretesting materials remain secure because cadets are normally tested in a separate testing session and are not eligible for retesting for another 3 years
Cooperation with Other Language Professionals
• Cooperation with English language professionals, experts and teachers within the defence system exists at all levels and in all forms (Military Academy Department of Foreign Languages, GS J-7 Training and Doctrine Department – Group for English Language Training, PELT part-time English language experts and lecturers, etc.)
• English teachers act as invigilators, interlocutors and expert judges when determining content and face validity, cut-off scores, feedback, etc.
Cooperation with Other Functional Units in the HR Sector
• Reporting the test results
• Interpreting STANAG 6001 language proficiency levels to language non-professionals
• Consulting with personnel departments in the MoD and GS, the Centre for Peacekeeping Operations, the Military Academy, the National Defence School, etc. about language-related career matters: language requirements for appointments, attending courses abroad, participation in peacekeeping missions, etc.
Thank you for your attention Questions?