The Science and Art of Exam Development
Paul E. Jones, PhD, Thomson Prometric

What is validity and how do I know if my test has it?

Validity
"Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing and evaluating tests." (APA Standards, 1999, p. 9)
A test may yield valid judgments about people…
• If it measures the domain it was defined to measure.
• If the test items have good measurement properties.
• If the test scores and the pass/fail decisions are reliable.
• If alternate forms of the test are on the same scale.
• If you apply defensible judgment criteria.
• If you allow enough time for competent (but not necessarily speedy) candidates to take the test.
• If it is presented to the candidate in a standardized fashion, without environmental distractions.
• If the test taker is not cheating and the test has not deteriorated.
Is this a Valid Test?
1. 4 - 3 = _____     6. 3 - 2 = _____
2. 9 - 2 = _____     7. 8 - 7 = _____
3. 4 - 4 = _____     8. 9 - 5 = _____
4. 7 - 6 = _____     9. 6 - 2 = _____
5. 5 - 1 = _____    10. 8 - 3 = _____
Validity = the Technical Quality of the Testing System
[Diagram: the testing system (Design, Item Bank)]
The Validity Argument is Part of the Testing System
[Diagram: documentation attached to the testing system (Design, Item Bank)]
A Testing System Begins with Design
[Diagram: the testing system (Design, Item Bank)]
Test Design Begins with Test Definition
• Test Title
• Credential Name
• Test Purpose ("This test will certify that the successful candidate has important knowledge and skills necessary to…")
• Intended Audience
• Candidate Preparation
• High-Level Knowledge and Skills Covered
• Products or Technologies Addressed
• Knowledge and Skills Assumed but Not Tested
• Knowledge and Skills Related to the Test but Not Tested
• Borderline Candidate Description
• Testing Methods
• Test Organization
• Test Stakeholders
• Other Information
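Hypothetically, a program might capture these definition fields in a structured record so that later artifacts (blueprints, forms, score reports) can refer back to them. The sketch below is illustrative only; the class and field names simply mirror the list above, and the example values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class TestDefinition:
    """High-level test definition that drives all later design work."""
    title: str
    credential_name: str
    purpose: str                      # "This test will certify that..."
    intended_audience: str
    candidate_preparation: str
    skills_covered: list[str] = field(default_factory=list)
    skills_assumed_not_tested: list[str] = field(default_factory=list)
    borderline_candidate: str = ""    # description of the minimally qualified candidate
    testing_methods: list[str] = field(default_factory=list)

# Invented example values
definition = TestDefinition(
    title="Widget Administration Certification",
    credential_name="Certified Widget Administrator",
    purpose="This test will certify that the successful candidate can install, "
            "configure, and maintain the widget platform.",
    intended_audience="Administrators with about one year of hands-on experience",
    candidate_preparation="Instructor-led course or equivalent on-the-job experience",
    skills_covered=["Installation", "Configuration", "Troubleshooting"],
)
print(definition.title)
```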
Test Definition Leads to Practice Analysis
[Diagram: test objectives]
Once I have a blueprint, how do I develop appropriate exam items?
The Testing System
[Diagram: Design, Item Bank]
Creating Items
• Content characteristics (content options): Text, Graphics, Audio, Video, Simulations, Applications
• Response modes: Single M/C (choose one), Multiple M/C (choose many), Single P&C, Multiple P&C, Drag & Drop, Brief FR, Essay FR, Simulation/App Scoring
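One hedged way to picture how a content option and a response mode come together in a banked item record: the sketch below shows a single multiple-choice ("choose one") item with a text stem, linked to a blueprint objective. The field names and example item are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SingleMCItem:
    """One banked item: text content, a 'choose one' response mode, an objective link."""
    item_id: str
    objective_id: str          # blueprint objective this item measures
    stem: str                  # question text presented to the candidate
    options: list[str] = field(default_factory=list)
    key: int = 0               # index of the correct option

item = SingleMCItem(
    item_id="ITM-0001",                         # invented identifiers
    objective_id="OBJ-2.3",
    stem="Which command reloads the service configuration without downtime?",
    options=["restart", "reload", "refresh", "reset"],
    key=1,
)
print(item.options[item.key])   # -> "reload"
```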
Desirable Measurement Properties of Items
• Item-objective linkage
• Appropriate difficulty
• Discrimination (difficulty and discrimination are sketched in code below)
• Interpretability
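For the classical notions of "appropriate difficulty" and "discrimination", a common starting point is the item p-value and the corrected item-total (point-biserial) correlation. A minimal sketch, assuming a 0/1-scored response matrix; operational programs would use dedicated psychometric software.

```python
import numpy as np

def item_statistics(scores: np.ndarray) -> list[dict]:
    """Classical difficulty (p-value) and discrimination (corrected item-total
    correlation) for a 0/1-scored matrix of shape (candidates, items)."""
    stats = []
    for j in range(scores.shape[1]):
        item = scores[:, j]
        rest = scores.sum(axis=1) - item          # total score excluding this item
        p_value = item.mean()                     # proportion answering correctly
        discrimination = np.corrcoef(item, rest)[0, 1]
        stats.append({"item": j,
                      "difficulty": round(float(p_value), 3),
                      "discrimination": round(float(discrimination), 3)})
    return stats

# Tiny invented example: 6 candidates x 4 items
demo = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])
print(item_statistics(demo))
```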
Good Item Development Practices
• SME writers in a social environment
• Industry-accepted item writing principles
• Item banking tool
• Mentoring
• Rapid editing
• Group technical reviews
The Testing System
[Diagram: Design, Item Bank]
Classical Option Analysis: Good Item
[Table: for each answer option, n, proportion choosing, discrimination, and proportion choosing within score groups Q1–Q5]

Classical Option Analysis: Problem Item
[Table: the same statistics for an item with poor option-level behavior]
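A rough sketch of what a classical option (distractor) analysis like the tables above computes: for one item, the proportion of candidates choosing each option, overall and within each total-score quintile (Q1 = lowest fifth, Q5 = highest). The data and option labels below are invented.

```python
import numpy as np

def option_analysis(choices: np.ndarray, totals: np.ndarray, options=("A", "B", "C", "D")):
    """Proportion choosing each option, overall and by score quintile.
    A healthy key is chosen more often as ability rises; distractors fall off."""
    edges = np.quantile(totals, [0.2, 0.4, 0.6, 0.8])
    quintile = np.searchsorted(edges, totals, side="right")   # 0..4 per candidate
    report = {}
    for opt in options:
        picked = choices == opt
        by_q = [picked[quintile == q].mean() if (quintile == q).any() else 0.0
                for q in range(5)]
        report[opt] = {"n": int(picked.sum()),
                       "proportion": round(float(picked.mean()), 2),
                       "Q1..Q5": [round(float(p), 2) for p in by_q]}
    return report

# Invented data: option chosen by each candidate and their total test score
choices = np.array(list("BBABCBDBBABBCBBB"))
totals  = np.array([12, 15, 8, 14, 7, 18, 6, 20, 16, 9, 13, 17, 5, 19, 11, 10])
print(option_analysis(choices, totals))
```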
IRT Item Analysis: Difficulty and Discrimination
[Plot: item characteristic curves for three items with parameters
 a = 0.6, b = -1.5, c = 0.4; a = 1.2, b = -0.5, c = 0.1; a = 1.0, b = 1.0, c = 0.25]
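The a/b/c parameters on the slide suggest the three-parameter logistic (3PL) model, in which c is the lower asymptote (pseudo-guessing), b the difficulty, and a the discrimination. A minimal sketch evaluating the item response function at the slide's parameter values; the ability points are arbitrary.

```python
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Parameters taken from the slide
items = [dict(a=0.6, b=-1.5, c=0.40),
         dict(a=1.2, b=-0.5, c=0.10),
         dict(a=1.0, b=1.0, c=0.25)]

for theta in (-2.0, 0.0, 2.0):
    probs = [round(float(p_correct_3pl(theta, **it)), 2) for it in items]
    print(f"theta={theta:+.1f}  P(correct) per item: {probs}")
```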
The Testing System
[Diagram: Design, Item Bank]
Reliability “Reliability refers to the degree to which test scores are free from errors of measurement.” (APA Standards, 1985, p. 19)
How to Enhance Reliability When Assembling Test Forms
• Score reliability/generalizability (see the KR-20 sketch below)
  - Select items with good measurement properties.
  - Present enough items.
  - Target items at candidate ability level.
  - Sample items consistently from across the content domain (use a clearly defined test blueprint).
• Score dependability
  - Same as above.
  - Minimize differences in test difficulty.
• Pass/fail consistency
  - Select enough items.
  - Target items at the cut score.
  - Maintain the same score-distribution shape between forms.
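As one concrete handle on "score reliability", internal consistency for dichotomously scored forms is often estimated with KR-20 (equivalent to Cronbach's alpha for 0/1 items). A minimal sketch with an invented response matrix; this is a standard textbook formula, not the presenter's own procedure.

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """Kuder-Richardson 20: internal-consistency reliability for 0/1-scored items.
    scores has shape (candidates, items)."""
    k = scores.shape[1]                     # number of items
    p = scores.mean(axis=0)                 # item difficulties
    total_var = scores.sum(axis=1).var()    # variance of candidates' total scores
    return (k / (k - 1)) * (1.0 - (p * (1.0 - p)).sum() / total_var)

# Invented 6-candidate x 5-item response matrix
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
])
print(f"KR-20 = {kr20(scores):.2f}")
```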
Setting Cut Scores Why not just set the cut score at 75% correct?
Setting Cut Scores Why not just set the cut score so that 80% of the candidates pass?
The logic of criterion-based cut score setting
• Certain knowledge and skills are necessary for practice.
• The test measures an important subset of these knowledge and skills, and thus readiness for practice.
• The passing [cut] score is such that those who pass have a high enough level of mastery of the KSJs to be ready for practice [at the level defined in the test definition], while those who fail do not.
(Kane, Crooks, and Cohen, 1997)
The Main Goal in Setting Cut Scores Meeting the “Goldilocks Criteria” “We want the passing score to be neither too high nor too low, but at least approximately, just right.” Kane, Crooks, and Cohen, 1997, p. 8
Two General Approaches to Setting Cut Scores
• Test-Centered Approaches: Modified Angoff (see the sketch below), Bookmark
• Examinee-Centered Approaches: Borderline, Contrasting Groups
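In a modified Angoff study, each subject-matter expert judges, for every item, the probability that a borderline (minimally qualified) candidate would answer correctly; a recommended cut score is then derived from those judgments, typically by averaging. A minimal sketch with invented ratings.

```python
import numpy as np

# Rows = SME judges, columns = items; each cell is the judged probability that a
# borderline (minimally qualified) candidate answers the item correctly.
angoff_ratings = np.array([
    [0.60, 0.75, 0.40, 0.85, 0.55],
    [0.65, 0.70, 0.45, 0.80, 0.60],
    [0.55, 0.80, 0.35, 0.90, 0.50],
])

item_means = angoff_ratings.mean(axis=0)   # consensus expectation per item
raw_cut = item_means.sum()                 # expected raw score of a borderline candidate
percent_cut = raw_cut / angoff_ratings.shape[1] * 100

print(f"Recommended raw cut score: {raw_cut:.2f} of {angoff_ratings.shape[1]} items")
print(f"As a percentage: {percent_cut:.0f}%")
```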
The Testing System
[Diagram: Design, Item Bank]
Security of a Testing System
[Diagram: Design, Item Bank]
• Write more items!!!
• Create authentic items.
• Use isomorphs.
• Use Automated Item Generation.
• Use secure banking software and connectivity.
• Use in-person development.
Security of a Testing System
[Diagram: Design, Item Bank]
• Establish prerequisite qualifications.
• Use narrow testing windows.
• Establish test/retest restrictions.
• Use identity verification and biometrics.
• Require test takers to sign NDAs.
• Monitor test takers on site.
• Intervene if cheating is detected.
• Monitor individual test center performance.
• Track suspicious test takers over time.
Security of a Testing System
[Diagram: Design, Item Bank]
• Perform frequent detailed psychometric review.
• Restrict the use of items and test forms.
• Analyze response times (see the sketch below).
• Perform DRIFT analyses.
• Calibrate items efficiently.
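One simple way to act on "analyze response times" is to screen for candidates who answer implausibly fast yet score very high, a possible sign of item preknowledge. The thresholds, data, and function below are invented illustrations; real test-security forensics use far more sophisticated models.

```python
import numpy as np

def flag_fast_high_scorers(response_times, scores, min_seconds=10.0, min_pct_correct=0.85):
    """Flag candidates whose median per-item time is implausibly short while their
    percent correct is very high: a crude preknowledge screen with arbitrary thresholds."""
    median_time = np.median(response_times, axis=1)   # seconds per item, per candidate
    pct_correct = scores.mean(axis=1)
    return np.where((median_time < min_seconds) & (pct_correct > min_pct_correct))[0]

# Invented data: 4 candidates x 6 items (seconds per item, then 0/1 scores)
times = np.array([
    [45, 60, 30, 55, 40, 50],
    [ 6,  7,  5,  8,  6,  7],    # very fast...
    [35, 42, 28, 60, 33, 47],
    [ 9, 11, 10,  8, 12,  9],    # fast but low-scoring
])
scores = np.array([
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],          # ...and perfect: flagged
    [0, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
])
print("Flagged candidate indices:", flag_fast_high_scorers(times, scores))
```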
Security of a Testing System
[Diagram: Design, Item Bank]
• Many unique fixed forms
• Linear on-the-fly testing (LOFT)
• Computerized adaptive testing (CAT; see the sketch below)
• Computerized mastery testing (CMT)
• Multi-stage testing (MST)
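Of the delivery models listed, computerized adaptive testing is the most algorithmic: after each response the ability estimate is updated, and the next item is chosen to be maximally informative at that estimate. A minimal sketch of the item-selection step under the 3PL model, using an invented four-item bank (the first three parameter sets echo the earlier IRT slide).

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return (a ** 2) * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def pick_next_item(theta_hat, bank, administered):
    """CAT step: choose the unadministered item with maximum information at theta_hat."""
    best, best_info = None, -1.0
    for idx, (a, b, c) in enumerate(bank):
        if idx in administered:
            continue
        info = item_information(theta_hat, a, b, c)
        if info > best_info:
            best, best_info = idx, info
    return best

# Invented calibrated bank: (a, b, c) per item
bank = [(0.6, -1.5, 0.40), (1.2, -0.5, 0.10), (1.0, 1.0, 0.25), (1.4, 0.2, 0.15)]
print("Next item for theta = 0.0:", pick_next_item(0.0, bank, administered={1}))
```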