Challenges of Piloting Test Items Branka Petek School of Foreign Languages Slovenia
Content • Challenges Slovenia had to face when piloting test items • What we learned from experience
Why pilot test items? To get a clear picture of candidates’ language skills. To get a clear picture, we need good test items. It is impossible to have good test items without pre-testing.
Challenges SFL had to face • Appropriate population for piloting • Administration of the items • Test format • Statistical analyses
Population for piloting • Size • Similarity to the Slovenian testing population • Level of proficiency • Test fatigue
Lessons learned • SIZE: the population should be as big as possible, but anything is better than nothing; • SIMILARITY: the population should be similar to the testing population; • LEVEL OF PROFICIENCY: proficiency should follow a normal (or near-normal) distribution, otherwise the results will be unreliable; • TEST FATIGUE: Have the candidates piloted before? Are they tired of taking tests and piloting?
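One rough way to check the "near-normal" point above is to look at the skewness of the pilot group's total scores. The sketch below is only an illustration: the function name `skewness` and the sample scores are hypothetical, and the slides do not say how distribution shape was actually checked.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Population skewness of a list of total scores.

    Values near 0 suggest a roughly symmetric distribution;
    clearly positive values suggest most candidates scored low,
    clearly negative values suggest most candidates scored high.
    Returns None when all scores are identical (no spread).
    """
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        return None
    return mean([((x - m) / s) ** 3 for x in scores])

# Hypothetical pilot totals, for illustration only.
symmetric_group = [10, 12, 14, 16, 18]
bottom_heavy_group = [1, 1, 1, 1, 10]

print(skewness(symmetric_group))
print(skewness(bottom_heavy_group))
```

With very small groups the skewness estimate is itself unstable, so it is best read as a prompt to look at the score list, not as a formal test.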
Administration • Administrators • Time • Courses • Collecting data on test takers
Lessons learned • ADMINISTRATORS: results are most reliable when we administer the tests ourselves; • TIME: depends on the course cycle; • COURSES: courses designed to prepare students for STANAG tests normally give the most reliable results; • QUESTIONNAIRES: if well designed, they help investigate the face validity of tests, time allocated, clarity of rubrics, appropriacy of test methods and text topics.
Test format • Length • Number of items • Task types • Topics (cultural background, influence of the course)
Lessons learned • LENGTH: similar to the live test version; • NUMBER OF ITEMS: approximately the same number of items as the live test; • TASK TYPES: different countries use different methods, so candidates might not be familiar with the task types we use; • FAMILIARITY WITH THE TOPICS: e.g. military topics (cultural background).
Statistical analyses • CTT (classical test theory) • IRT (item response theory) • ‘Manual check’ • The influence of a particular population
Lessons learned • Small population: CTT is the only option; • Sometimes fewer than 30 test takers: manual checking for odd answers and strange behaviour can help eliminate some problems and improve the items; • With a small population the data are less reliable, so there is always an element of risk.
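As an illustration of what a CTT analysis of a small pilot can look like, the sketch below computes two common classical item statistics: facility (the proportion of correct answers) and a corrected item-total correlation as a discrimination index. The function names, the 0/1 scoring and the sample data are assumptions made for this example; the slides do not specify which CTT statistics or software were used.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation; returns None if either variable has no variance."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return None
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)

def item_analysis(responses):
    """Classical item statistics for dichotomously scored items.

    responses: one row per test taker, each row a list of 0/1 item scores.
    Returns a list of (facility, discrimination) tuples, one per item.
    Discrimination is the correlation between the item and the total
    score on the remaining items (the item itself is excluded).
    """
    totals = [sum(row) for row in responses]
    results = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]
        results.append((mean(item), pearson(item, rest)))
    return results

# Hypothetical pilot data: 6 test takers, 3 items (1 = correct, 0 = wrong).
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
]

for number, (facility, discrimination) in enumerate(item_analysis(pilot), start=1):
    print(number, facility, discrimination)
```

In this made-up data set every test taker answers item 1 correctly, so its discrimination cannot be computed; that is the kind of item a manual check would pick up. With so few test takers the figures are only indicative.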
Piloting in a perfect world and in the real world • A perfect-world piloting session would mean at least 300 test takers, IRT analysis, revising the test items, re-piloting, IRT analysis again, a final version of the test, and experts to determine cut-off scores. • In the real world, piloting is difficult to plan and carry out. • It is an absolutely essential part of the testing cycle. • Piloting internationally can produce more reliable results but also presents many pitfalls we have to be aware of. • Being aware of possible problems might help us plan. • The more we invest (in time, effort and money), the more we get.
Thank you. Questions, suggestions?