
Fundamental Testing Assumptions Revisited: Examination Length and Number of Options






Presentation Transcript


  1. Fundamental Testing Assumptions Revisited: Examination Length and Number of Options Karine Georges & Kelly Piasentin Assessment Strategies Inc.

  2. Overview • Credentialing organizations must balance factors such as program validity and credibility against more tangible concerns such as cost and ease of development. Two such aspects are investigated here: • A method to reduce the total number of test questions while retaining validity and reliability. • The effect of reducing the typical number of options per question from four (4) to three (3).

  3. Part I Examination Length: A Case Study Karine Georges, MSc.

  4. Case Study: Certification Program • Tasked in 2007 with determining whether 180-item, 4-hour examinations could be shortened in light of a potential move to computer-based testing (CBT).

  5. Validity and Examination Length • Content Validity: The number of items on an examination must be sufficient to ensure adequate, representative coverage of the content domain. • Face Validity: If the examination is shortened, stakeholder perceptions must be considered vis-à-vis comparable professions.

  6. Examination Length and Reliability • What is an acceptable reliability index for credentialing? • “A reliability correlation coefficient should fall in the high .80s or above for longer examinations (e.g., 150 or more items)” (NOCA, 2004). • What is the range of reliability indices for the current 180-item certification examinations? • Average: .84 • Min: .78 • Max: .92

  7. Examination Length and Practical Considerations • If reliability is related to examination length, why shorten the examination? Costs and efficiency: • Each item costs between $300 and $1,000 to develop (Vale, 2006). • Additional items are needed as safeguards and for ancillary materials such as preparation guides or readiness tests. • Given the client's intention to move to CBT, shorter examinations reduce seat time so that more candidates can be accommodated within the testing period.

  8. Research Approaches • Two approaches: • Classical Test Theory (CTT) approach: examine the reliability coefficient using the Spearman-Brown formula. • Item Response Theory (IRT) approach: examine the item information function using empirical data.

  9. CTT Results for the Two Certification Programs • Spearman-Brown formulation: ρ*xx = N·ρxx / (1 + (N − 1)·ρxx), where ρxx is the current reliability and N is the ratio of the new test length to the old. • Results show that the examinations can be shortened by 20-30 questions (or about 10%) and still remain above .80.
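  A minimal sketch of this prophecy calculation in Python (the 180-item and .84 figures come from the slides above; the function itself is the standard CTT formula, not code from the presentation):

```python
def spearman_brown(rho, old_len, new_len):
    """Predicted reliability when a test with reliability `rho` and
    old_len items is shortened (or lengthened) to new_len parallel items."""
    n = new_len / old_len  # ratio of new length to old length
    return n * rho / (1 + (n - 1) * rho)

# A 180-item examination with reliability .84, cut by 30 items:
print(round(spearman_brown(0.84, 180, 150), 3))  # 0.814 -- still above .80
```

  This matches the slide's conclusion that a 20-30 item reduction keeps the predicted reliability above .80.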

  10. Limitations of CTT Results • General limitations of Spearman-Brown: • Assumes the shortened and original examinations are exactly parallel • Yields only a single reliability value across the whole range of abilities • Heavily influenced by the particular candidate cohort

  11. IRT Approach: Item Information Curve • Research has shown that higher-stakes examinations with pass/fail decisions, such as certification examinations, can be shortened without impacting classification accuracy (Schulz & Wang, 2001). • What would be the impact if the certification examinations had 10% fewer items? • How about 25% or 50%?

  12. IRT - Item Information Curve • IRT models specify the probability of a discrete outcome, such as a correct response to an item, in terms of person and item parameters. • Person parameter: candidate ability (theta) • Item parameters: a: discrimination (slope); b: difficulty (location); c: guessing (lower asymptote)

  13. IRT - Test Information Curve • The item information curves sum to form the test information curve • The amount of information along the ability scale differs with examination length and item quality • The pass/fail decision must be made where error is minimal (ideally at the pass mark) and where levels of ability can be clearly differentiated
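  To make these curves concrete, here is a small sketch of the three-parameter logistic (3PL) item information function and its sum into test information; the item parameters below are invented for illustration and are not the programs' actual calibrations:

```python
import math

def p3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information contributed by one 3PL item at ability theta."""
    p = p3pl(theta, a, b, c)
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

# Hypothetical (a, b, c) parameters for a tiny item bank.
items = [(1.2, -0.5, 0.25), (0.9, 0.0, 0.20), (1.5, 0.4, 0.25)]

def test_information(theta):
    """Test information is simply the sum of the item informations."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

theta_cut = 0.0  # illustrative ability level at the pass mark
info = test_information(theta_cut)
print(f"I = {info:.2f}, SE = {1 / math.sqrt(info):.2f} at the cut score")
```

  Since the standard error of ability is 1/√I(θ), removing items hurts least where the test information curve peaks near the pass mark.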

  14. IRT Results for Program A

  15. IRT Results for Program B

  16. IRT - Results and Implications • The examinations can be reduced by at least 10% without significantly impacting the pass/fail decision. • Other factors to take into consideration: • Number of candidates • Robustness of the item bank

  17. Other Considerations • What about face validity? • How would an examination of 90 items be viewed by other professionals, relative to a comparable examination of 180 items?

  18. Other Certification Programs • Review of over 75 certification programs within the same profession. • Average number of items: 164, with most programs between 150 and 175 items (including experimental items) • Minimum: 100 • Maximum: 250

  19. Summary • Data suggest that the number of items can be reduced by 10% with minimal impact on validity and reliability.

  20. Part II How Many Options is Optimal in Multiple Choice Testing? Kelly Piasentin, PhD

  21. Multiple Choice Testing • Most common format used in licensure and certification examinations • Consists of a stem (i.e., the question being asked) and a series of options to choose from (usually 4) • Example: • Stem: In which state is the 2008 CLEAR conference being held? • Options: (a) Arkansas (b) Alaska (c) Arizona (d) Alabama

  22. Advantages of Multiple Choice • Versatility • Efficiency • Scoring accuracy and economy • Reliability • Diagnosis • Control of difficulty • Amenable to item analysis

  23. Disadvantages of Multiple Choice • Time-consuming to write • Difficult to create effective distracters (i.e., options that are plausible but incorrect)

  24. Time Spent Writing MCQs • Sample of 75 item writers for 3 different licensing/certification examinations • Average time spent writing an MCQ: 52 minutes • Percentage of time spent writing each item component: (breakdown chart not reproduced in the transcript)

  25. Effort Spent Writing Distracters Of the 75 Item Writers… • 25% reported that it was difficult to write the 1st distracter • 40% reported that it was difficult to write the 2nd distracter • 75% reported that it was difficult to write the 3rd distracter

  26. How many options should an MCQ have? • 4-option MCQs are widely used in standardized testing • But are 4 options ideal? • Some item-writing guidelines say, “develop as many options as feasible” (Haladyna & Downing, 1989) • More recently: “develop as many functional distractors as are feasible” (Haladyna, Downing, & Rodriguez, 2002) • Increasing emphasis is placed on the quality of distracters rather than their quantity

  27. Definition of a Functional Distracter “A functional distracter is one that has (a) a significant negative point-biserial correlation with the total test score, (b) a negative sloping item characteristic curve, and (c) a frequency of response greater than 5% for the total group.” Haladyna & Downing (1988)
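  Two of these criteria can be checked directly from response data; here is a sketch of that check (my own illustration, not the presenters' code: the significance test and the ICC-slope criterion are omitted for brevity):

```python
import numpy as np

def functional_distracters(responses, key, min_freq=0.05):
    """Flag distracters that (a) draw more than min_freq of candidates and
    (b) have a negative point-biserial correlation with total score,
    following Haladyna & Downing's (1988) definition."""
    responses = np.asarray(responses)        # candidates x items, option chosen
    key = np.asarray(key)
    scores = (responses == key).sum(axis=1)  # total test score per candidate
    flags = {}
    for item in range(responses.shape[1]):
        for opt in np.unique(responses[:, item]):
            if opt == key[item]:
                continue                     # skip the correct answer
            chose = (responses[:, item] == opt).astype(float)
            if chose.std() == 0:             # option never (or always) chosen
                flags[(item, opt)] = False
                continue
            freq = chose.mean()
            pbis = np.corrcoef(chose, scores)[0, 1]
            flags[(item, opt)] = freq > min_freq and pbis < 0
    return flags
```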

  28. How does the number of options impact guessing? • With 4 options, candidates have a 25% chance of getting any one question correct by simply guessing • The probability is reduced to 20% with 5 options • The probability is increased to 33% with 3 options • BUT… if a typical examination has 25 items, each with 3 options, the chance of scoring at least 70% on the examination by pure blind guessing is about 1 in 25,000 • So, do you get more bang for your buck by having more options?
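  The odds of passing by guessing follow a binomial model; here is a quick sketch of that computation (my assumed model, and the exact odds depend on how "at least 70% of 25 items" is rounded to a whole number of correct answers):

```python
from math import comb, ceil

def p_pass_by_guessing(n_items, n_options, pass_fraction=0.70):
    """Chance of reaching the pass mark by blind guessing, modeling the
    number correct as Binomial(n_items, 1 / n_options)."""
    p = 1 / n_options
    need = ceil(pass_fraction * n_items)  # correct answers needed to pass
    return sum(comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
               for k in range(need, n_items + 1))

for opts in (3, 4, 5):
    prob = p_pass_by_guessing(25, opts)
    print(f"{opts} options: about 1 in {round(1 / prob):,}")
```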

  29. Are 4-option MCQs optimal? Factors to consider: • Time and cost required to develop distracters • Time it takes candidates to complete the examination • Psychometric properties of the examination • Item difficulty • Item discrimination • Test reliability (coefficient alpha)

  30. Arguments in favour of 3-options: • Less time is needed to develop two plausible distracters • More 3-option items can be administered without increasing testing time • Inclusion of additional high quality items per unit of time should improve test score reliability • Having fewer options decreases the likelihood of exposing additional aspects of the domain to candidates (e.g., context clues to other questions)

  31. Data from a Licensing/Certification Examination • Number of MCQs: 235 • Number of candidates: 5,393 • Mean item difficulty (proportion correct): .721 • Mean discrimination index: .166 • Test reliability (coefficient alpha): .88 • Mean proportion choosing the most-chosen distracter: .167 • 2nd most-chosen distracter: .077 • Least-chosen distracter: .035

  32. Reducing Examination Items to 3 Options • What would be the effect on item difficulty, discrimination, and reliability of reducing the items on the examination to 3 options if responses to the least-chosen distracter were: • Attributed to the correct answer? • Attributed to the 2nd least-chosen distracter? • Randomly distributed across the other 3 choices?
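  Discrimination and alpha require candidate-level response data, but the expected effect on mean item difficulty can be sketched directly from the mean option proportions on slide 31 (a back-of-envelope check, not the presenters' actual reanalysis):

```python
# Mean option proportions from slide 31: correct answer, most-chosen,
# 2nd most-chosen, and least-chosen distracter (they sum to 1.000).
p_correct, p_d1, p_d2, p_least = 0.721, 0.167, 0.077, 0.035

# Expected mean difficulty (proportion correct) under each collapsing rule:
to_correct = p_correct + p_least        # least-chosen merged into the key
to_distracter = p_correct               # merged into another distracter
random_split = p_correct + p_least / 3  # spread evenly over the other 3 options

print(f"to correct answer:    {to_correct:.3f}")     # .756 vs .752 observed
print(f"to 2nd distracter:    {to_distracter:.3f}")  # .721 vs .720 observed
print(f"random reallocation:  {random_split:.3f}")   # .733 vs .731 observed
```

  The close match to the difficulties reported on the next three slides is expected, since difficulty is just the proportion of candidates choosing the key.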

  33. Reducing Examination Items to 3 Options • If the least-chosen distracter is attributed to the correct answer: • Mean item difficulty: .752 • Mean discrimination index: .136 • Test reliability (coefficient alpha): .834

  34. Reducing Examination Items to 3 Options • If the least-chosen distracter is attributed to the 2nd least-chosen distracter: • Mean item difficulty: .720 • Mean discrimination index: .168 • Test reliability (coefficient alpha): .881

  35. Reducing Examination Items to 3 Options • If the least-chosen distracter is distributed randomly across the other 3 choices: • Mean item difficulty: .731 • Mean discrimination index: .158 • Test reliability (coefficient alpha): .868

  36. Summary • Original 4-option examination: difficulty .721, discrimination .166, reliability .88 • Least-chosen attributed to correct answer: .752, .136, .834 • Least-chosen attributed to 2nd least-chosen distracter: .720, .168, .881 • Least-chosen distributed randomly: .731, .158, .868

  37. 4 Options vs. 3 Options • Moving from 4 options to 3 options did not have a significant impact on average item difficulty, discrimination or test reliability.

  38. Summary • Two primary benefits of using 3 options (as opposed to 4 options): • Faster item writing: better quality items and cost savings • Better testing: shorter test time, or more questions in the same amount of time (potential for increased reliability)

  39. Conclusion • These two presentations demonstrate that efficiencies can be gained by reducing test length and the number of response options without compromising test validity. • Further research is needed to confirm these findings.

  40. Contact Information • Assessment Strategies Inc., 1400 Blair Place, Suite 210, Ottawa, ON K1J 9B8, Canada • Telephone: 613-237-0241 • Website: www.asinc.ca • Karine Georges, MSc: kgeorges@asinc.ca • Kelly Piasentin, PhD: kpiasentin@asinc.ca
