~ Test Construction and Validation ~ Fundamental Points and Practices Stephen J. Vodanovich, Ph.D.
~ Identifying The Item Domain ~
[a.k.a. Where do the questions come from?]
Test Item Domain
• Specific, defined content area (e.g., course exam, training program)
• Expert opinion, observation (e.g., professional literature)
• Job analysis (identification of major job tasks, duties)
Job Analysis Overview
[Diagram: a Job (or Job Category) is broken down through Task Identification (Task 1, Task 2, Task 3, Task 4) and KSA Identification (KSA 1, KSA 2, KSA 3, KSA 4)]
• Rate Tasks and KSAs
• Connect KSAs to Tasks
Sample Task -- KSA Matrix
To what extent is each KSA needed when performing each job task?
5 = Extremely necessary, the job task cannot be performed without the KSA
4 = Very necessary, the KSA is very helpful when performing the job task
3 = Moderately necessary, the KSA is moderately helpful when performing the job task
2 = Slightly necessary, the KSA is slightly helpful when performing the job task
1 = Not necessary, the KSA is not used when performing the job task
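One way the ratings in a Task -- KSA matrix might be summarized is sketched below. The ratings are hypothetical (the slide does not give a filled-in matrix), and both the cutoff of 4 for "needed" and the function name `ksa_summary` are assumptions made for illustration only.

```python
# Hypothetical Task x KSA ratings on the 1-5 scale above
# (outer keys = job tasks, inner keys = KSAs). All values are made up.
ratings = {
    "Task 1": {"KSA 1": 5, "KSA 2": 3, "KSA 3": 1, "KSA 4": 2},
    "Task 2": {"KSA 1": 4, "KSA 2": 5, "KSA 3": 2, "KSA 4": 1},
    "Task 3": {"KSA 1": 2, "KSA 2": 4, "KSA 3": 5, "KSA 4": 1},
    "Task 4": {"KSA 1": 5, "KSA 2": 1, "KSA 3": 3, "KSA 4": 2},
}


def ksa_summary(ratings, critical=4):
    """For each KSA, return (mean rating, number of tasks rated >= critical)."""
    ksas = sorted({ksa for row in ratings.values() for ksa in row})
    summary = {}
    for ksa in ksas:
        values = [row[ksa] for row in ratings.values()]
        mean_rating = sum(values) / len(values)
        n_critical = sum(1 for v in values if v >= critical)
        summary[ksa] = (mean_rating, n_critical)
    return summary


summary = ksa_summary(ratings)
# List KSAs from most to least necessary on average.
for ksa, (mean_rating, n_critical) in sorted(
    summary.items(), key=lambda kv: -kv[1][0]
):
    print(f"{ksa}: mean {mean_rating:.2f}, rated 4 or 5 for {n_critical} task(s)")
```

With these made-up ratings, KSA 1 comes out as the most broadly necessary attribute and KSA 4 as the least, which is the kind of ordering a test developer could use when deciding where to concentrate items.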
~ Writing Test Items ~
• Write a lot of questions
• Write more questions for the most critical KSAs
• Consider the reading level of the test takers
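"Write more questions for the most critical KSAs" could be made concrete in several ways. The sketch below assumes one simple option, allocating a draft item pool in proportion to each KSA's importance weight. The weights, the pool size of 60, and the helper name `allocate_items` are all hypothetical.

```python
def allocate_items(weights, total_items):
    """Split total_items across KSAs in proportion to their weights.

    Uses the largest-remainder method so the counts sum exactly to total_items.
    """
    weight_sum = sum(weights.values())
    exact = {k: total_items * w / weight_sum for k, w in weights.items()}
    counts = {k: int(share) for k, share in exact.items()}  # floor of each share
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the KSAs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical importance weights (e.g., mean necessity ratings per KSA).
importance = {"KSA 1": 4.0, "KSA 2": 3.25, "KSA 3": 2.75, "KSA 4": 1.5}
plan = allocate_items(importance, total_items=60)
print(plan)
```

Proportional allocation is only one possible rule. A developer could equally set a minimum number of items per KSA or drop low-rated KSAs entirely.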
~ Selecting Test Items ~
• Initial review by Subject Matter Experts (SMEs)
• Connect items to KSAs
• Assess difficulty of items relative to job requirements
• Suggest revisions to items and answers
Sample Item Rating Form
• Connect each item to a KSA or two
• Rate the difficulty of each item (5-point scale) relative to the level of KSA needed in the job
~ Statistical Properties of Items ~
• Item difficulty levels. Goal is to keep items of moderate difficulty (e.g., p-values between .40 and .60)
• The "p-value" is the % of people getting each item correct
[Figure: distribution curve with the horizontal axis running from -4 to +4 around the Mean]
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (A L L)

        Mean     Std Dev   Cases
Q1      .7167    .4525     120.0
Q2      .7583    .4299     120.0
Q3      .8167    .3886     120.0
Q4      .9333    .2505     120.0
Q5      .9583    .2007     120.0
Q6      .9000    .3013     120.0
Q7      .6333    .4839     120.0
Q8      .8750    .3321     120.0
Q9      .8000    .4017     120.0
Q10     .6167    .4882     120.0
Q11     .9750    .1568     120.0
Q12     .8083    .3953     120.0
Q13     .7583    .4299     120.0
Q14     .5083    .5020     120.0

Answers are scored as correct ("1") or wrong ("0"). So, the mean is the p-value of each item (its difficulty level, or the % of people getting the item correct).
Slide callouts on the table: "Easy items" and "Acceptable items"
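As a rough illustration of how the means above can be screened, the sketch below labels each item using the .40 to .60 example range given earlier. The p-values are copied from the output. The label names and the helper functions are assumptions, and the slide's own "Easy" and "Acceptable" callouts may have used a looser cutoff that is not recoverable here. The sketch also checks that, for 0/1 scoring, the reported standard deviation follows from the p-value alone.

```python
# Item means (p-values) copied from the reliability output above.
p_values = {
    "Q1": .7167, "Q2": .7583, "Q3": .8167, "Q4": .9333, "Q5": .9583,
    "Q6": .9000, "Q7": .6333, "Q8": .8750, "Q9": .8000, "Q10": .6167,
    "Q11": .9750, "Q12": .8083, "Q13": .7583, "Q14": .5083,
}
N_CASES = 120


def classify(p, low=0.40, high=0.60):
    """Label an item by difficulty using the example .40-.60 target range."""
    if p > high:
        return "easy"
    if p < low:
        return "hard"
    return "moderate"


def sample_sd(p, n):
    """Sample standard deviation of a 0/1 item with proportion correct p."""
    return (p * (1 - p) * n / (n - 1)) ** 0.5


labels = {item: classify(p) for item, p in p_values.items()}
moderate = [item for item, label in labels.items() if label == "moderate"]

for item, p in p_values.items():
    print(f"{item}: p = {p:.4f}, SD = {sample_sd(p, N_CASES):.4f}, {labels[item]}")
print("Items inside the .40-.60 range:", moderate)
```

Under the strict .40 to .60 range only Q14 qualifies; Q7 and Q10 fall just above it, and the remaining items are answered correctly by most test takers.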
~ Statistical Properties of Items (cont.) ~
Internal Consistency
• Item correlations with each other. Goal is to select items that relate moderately to each other or "hang together" reasonably well (e.g., item-total score correlations between .40 and .60, "alpha if item deleted" information)
~ Item-Total Statistics ~

        Scale mean   Scale variance   Corrected     Alpha
        if item      if item          item-total    if item
        deleted      deleted          correlation   deleted
Q1      43.3750      67.0599          .2285         .8356
Q2      43.3333      67.7031          .1513         .8370
Q3      43.2750      66.5708          .3527         .8335
Q4      43.1583      67.7814          .2700         .8354
Q5      43.1333      68.6711          .0741         .8374
Q6      43.1917      68.8117          .0111         .8385
Q7      43.4583      65.8302          .3685         .8327
Q8      43.2167      67.0283          .3346         .8341
Q9      43.2917      65.9562          .4353         .8319
Q10     43.4750      67.4952          .1526         .8373
Q11     43.1167      68.8938          .0152         .8378
Q12     43.2833      67.9022          .1381         .8371
Q13     43.3333      65.9216          .4085         .8322
Q14     43.5833      65.2871          .4214         .8315

Alpha = .8374
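The two right-hand columns above can be computed from raw 0/1 responses. The sketch below uses the standard formulas for Cronbach's alpha and for the corrected item-total correlation (each item correlated with the sum of the other items). The tiny response matrix is hypothetical, since the slides do not include raw data, and the function names are illustrative only.

```python
def sample_variance(x):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: list of item columns, each a list with one score per person."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(sample_variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / sample_variance(totals))


def item_total_stats(items):
    """Corrected item-total correlation and alpha-if-deleted for each item."""
    stats = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]  # all items except item i
        rest_totals = [sum(person) for person in zip(*rest)]
        stats.append({
            "corrected_r": pearson(col, rest_totals),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return stats


# Hypothetical responses: 4 items (rows) answered by 6 people (columns).
items = [
    [1, 1, 1, 0, 0, 0],  # Q1
    [1, 1, 0, 1, 0, 0],  # Q2
    [1, 1, 1, 1, 0, 0],  # Q3
    [1, 0, 1, 0, 1, 0],  # Q4
]

alpha = cronbach_alpha(items)
stats = item_total_stats(items)
print(f"Alpha = {alpha:.4f}")
for name, s in zip(["Q1", "Q2", "Q3", "Q4"], stats):
    print(f"{name}: corrected r = {s['corrected_r']:.4f}, "
          f"alpha if deleted = {s['alpha_if_deleted']:.4f}")
```

In this made-up data Q4 is unrelated to the rest of the scale, so alpha rises when it is dropped. That is the same pattern to look for in the table above, where an "alpha if item deleted" value above the overall alpha flags an item that is not hanging together with the others.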
~ Legal Concerns ~
Kirkland v. Department of Correctional Services (1974)
"Without such an analysis (job analysis) to single out the critical knowledge, skills and abilities required by the job, their importance relative to each other, and the level of proficiency demanded as to each attribute, a test constructor is aiming in the dark and can only hope to achieve job relatedness by blind luck"
• The KSAs tested for must be critical to successful job performance
• Portions of the exam should be accurately weighted to reflect the relative importance to the job of the attributes for which they test
• The level of difficulty of the exam material should match the level of difficulty of the job
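Correlation matrix (lower triangle; the diagonal entries are the Reliability Figures)

      1     2     3     4     5     6     7     8     9
1    .89
2    .49   .91
3    .33   .36   .87
4    .55   .20   .08   .92
5    .20   .46   .12   .54   .93
6    .15   .15   .53   .62   .55   .82
7    .55   .20   .15   .61   .35   .41   .90
8    .21   .46   .13   .40   .54   .37   .49   .93
9    .15   .15   .53   .31   .32   .66   .54   .52   .87

Regions labeled on the slide:
• Hetero-Trait; Mono-Method
• Hetero-Trait; Hetero-Method
• Mono-Trait; Hetero-Method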
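The labeled regions of the matrix above can be separated programmatically. The sketch below assumes the nine variables are ordered as three traits measured within each of three methods. That ordering is an assumption: the row and column labels did not survive on the slide, although the pattern of values is consistent with it. The function name `cell_type` is illustrative.

```python
# Lower triangle of the 9 x 9 matrix above, reliabilities on the diagonal.
rows = [
    [.89],
    [.49, .91],
    [.33, .36, .87],
    [.55, .20, .08, .92],
    [.20, .46, .12, .54, .93],
    [.15, .15, .53, .62, .55, .82],
    [.55, .20, .15, .61, .35, .41, .90],
    [.21, .46, .13, .40, .54, .37, .49, .93],
    [.15, .15, .53, .31, .32, .66, .54, .52, .87],
]

N_TRAITS = 3  # assumption: 3 traits nested within each of 3 methods


def cell_type(i, j):
    """Classify the correlation between variables i and j (0-based)."""
    if i == j:
        return "reliability"
    same_trait = i % N_TRAITS == j % N_TRAITS
    same_method = i // N_TRAITS == j // N_TRAITS
    if same_trait:
        return "mono-trait; hetero-method"
    if same_method:
        return "hetero-trait; mono-method"
    return "hetero-trait; hetero-method"


groups = {}
for i, row in enumerate(rows):
    for j, r in enumerate(row):
        groups.setdefault(cell_type(i, j), []).append(r)

for name, values in groups.items():
    print(f"{name}: n = {len(values)}, mean = {sum(values) / len(values):.2f}")
```

Under the assumed layout, the mono-trait; hetero-method correlations (same trait, different methods) average higher than either hetero-trait group, while the reliabilities on the diagonal are the largest values in the matrix.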