1. Practitioner’s Concerns When Conducting Military Test Validation Studies
Presentation for the Human Factors Engineering – Technical Advisory Group, Selection & Classification Sub-TAG
May 7, 2008
Janet Held
Navy Personnel Research, Studies, and Technology (NPRST/PERS-1)
Janet.held@navy.mil
2. Overview
An ASVAB Review Panel recently identified areas for improvement, including the need for the military services to conduct more frequent ASVAB validation studies
Caveat - Navy historically and currently conducts such studies
Purpose of the presentation
Describe the Navy’s ongoing ASVAB validation program
Discuss training transformation issues that could impact validation efforts
Highlight validation technical issues that are currently being addressed that may concern others
3. Navy’s Ongoing ASVAB Validation Program
Objectives
Provide ASVAB standards for enlisted ratings that
consider training and remediation expense
consider student replacement expense
Customers
Recruiting
Wants an increased number of school-qualified recruits
Training
Wants fewer students failing or requiring remediation
Enlisted Community Managers (Honest Broker)
Want improved fit/fill and health of their ratings for both the short and long term
4. When Does the Navy Conduct an ASVAB Validation Study?
New ratings are formed
Consolidation of ratings into an occupational group to enhance job assignment flexibility
High academic non-grad rates or setback rates
Student difficulty in an advanced training pipeline
Major revision in the curriculum
Change in course delivery system
Scheduled within cycle review
5. Navy Validation Process
6. Navy Concern in Test Validation Technical Topics
Criterion quality
Correction for range restriction
Multiple hurdle (sequential) selection and correction procedures
Composite formulation trading off validity and adverse impact
Simulating job classification
Multiple cutscores
7. Criterion Quality
Transformation in schoolhouse training may produce an unstable criterion (school performance measurement)
Mode of training delivery is changing
Prior – Instructor-led, group-paced
Transformation – computer-based training (CBT), self-paced
Transformation again for some to group-paced blended solutions
Concern that the validity of the ASVAB will be inestimable without a meaningful criterion
Need to understand the link between
job requirements and CBT
CBT performance and job performance
8. Some Possible Reasons for Low Validity
Criterion
is compromised, unreliable, or deficient in covering the training performance dimensions
Possibly because schools
have insufficient funding or tools for adequate performance measurement
are required to pass everyone
Predictor
is compromised, unreliable, or deficient in covering the relevant aptitudes, abilities, skills, and knowledge domains reflected in the training performance measures
9. Correction for Range Restriction: The Impact of Score Curtailment on the Validity Coefficient
Bivariate normal tables for every correlation value have been issued by the Dept. of Commerce and are otherwise available. The properties of the bivariate normal distribution allow specification of the total distribution given only a partial segment from it.
10. Higher Validity Results in a Higher Graduation Rate (all other things being equal)
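The relationship on this slide can be sketched numerically. The following is a minimal illustration, not the chart from the presentation: it assumes the ASVAB composite and the training criterion are standard bivariate normal, and the function names, cutscores, and validities are hypothetical. It computes the graduation rate among selected students, P(criterion > y_cut | predictor > x_cut), by numerical integration.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def graduation_rate(validity, x_cut, y_cut, steps=4000, upper=8.0):
    """Success rate among selected students under a bivariate normal model.

    validity: predictor-criterion correlation (hypothetical input).
    x_cut:    selection cutscore on the predictor, in z-score units.
    y_cut:    passing standard on the criterion, in z-score units.
    """
    if not -1.0 < validity < 1.0:
        raise ValueError("validity must be strictly between -1 and 1")
    resid_sd = math.sqrt(1.0 - validity ** 2)
    h = (upper - x_cut) / steps
    total = 0.0
    # Trapezoid rule over the selected region x >= x_cut.
    for i in range(steps + 1):
        x = x_cut + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        p_pass_given_x = 1.0 - normal_cdf((y_cut - validity * x) / resid_sd)
        total += weight * normal_pdf(x) * p_pass_given_x
    joint = total * h
    return joint / (1.0 - normal_cdf(x_cut))


if __name__ == "__main__":
    # Same cutscore and passing standard, increasing validity.
    for r in (0.2, 0.4, 0.6):
        print(r, round(graduation_rate(r, x_cut=0.0, y_cut=-0.5), 3))
```

With the cutscore and passing standard held fixed, the printed rate rises as the validity rises, which is the "all other things being equal" comparison.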
11. Correction for Range Restriction: Two Equalities Used to Estimate Population Values
12. Linearity and Homoscedasticity Assumptions Graphically Linked
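As a hedged illustration of the two preceding slides, the sketch below implements the standard univariate correction for direct range restriction (Pearson, 1903). It assumes the two equalities are the usual ones: the regression slope of the criterion on the predictor is the same in the selected group and the population (linearity), and the residual variance is the same in both (homoscedasticity). The function name and the example values are hypothetical, and this is not necessarily the multivariate procedure the Navy uses.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Univariate correction for direct selection on the predictor.

    r_restricted: validity observed in the selected (restricted) group.
    sd_ratio:     unrestricted predictor SD divided by restricted predictor SD.

    Follows from equal slopes and equal residual variances in the
    restricted and unrestricted groups.
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("r_restricted must be a correlation")
    if sd_ratio <= 0.0:
        raise ValueError("sd_ratio must be positive")
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


if __name__ == "__main__":
    # Hypothetical values: observed validity 0.30, selection halved the SD.
    print(round(correct_for_range_restriction(0.30, 2.0), 3))
```

When the SD ratio is 1 (no curtailment) the function returns the observed validity unchanged, and the corrected value grows as the curtailment becomes more severe.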
13. Multiple Hurdles: Sequential Selection Situations Can Lead to Inaccurate (Low) Validity Estimates
Scenario 1: ASVAB standard used for job classification and entry into initial job training
followed by progression to advanced training if the student passed initial training
Scenario 2: ASVAB standard used for job classification and entry into “Common Core” training
followed by progression to initial job training if the student passed Common Core training
Scenario 2 is becoming the more prevalent training model as common curriculum elements from various jobs are extracted, consolidated, and administered centrally (RTC) to save training and travel dollars and expedite reclassification of failures
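A small simulation can show why sequential hurdles push the observed validity down. This is only a sketch under invented assumptions: the three variables (ASVAB composite, first-course performance, later-course performance), their coefficients, and the cutscores of zero are all hypothetical, chosen so that the population validity of the composite for the later course is 0.55.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_hurdles(n=30000, seed=0):
    """Return validity in the full group, after hurdle 1, and after hurdle 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        asvab = rng.gauss(0.0, 1.0)
        # First course (hurdle 2), correlated 0.5 with the composite.
        first = 0.5 * asvab + math.sqrt(0.75) * rng.gauss(0.0, 1.0)
        # Later course: the criterion of interest, unit variance by construction.
        later = 0.3 * asvab + 0.5 * first + math.sqrt(0.51) * rng.gauss(0.0, 1.0)
        rows.append((asvab, first, later))

    def validity(sample):
        return pearson([r[0] for r in sample], [r[2] for r in sample])

    after_asvab = [r for r in rows if r[0] >= 0.0]        # hurdle 1: ASVAB standard
    after_both = [r for r in after_asvab if r[1] >= 0.0]  # hurdle 2: passed first course
    return validity(rows), validity(after_asvab), validity(after_both)


if __name__ == "__main__":
    full, one_hurdle, two_hurdles = simulate_hurdles()
    print(round(full, 3), round(one_hurdle, 3), round(two_hurdles, 3))
```

The later-course criterion exists only for students who cleared both hurdles, so the last value is the one a researcher would actually observe. A correction that accounts only for the ASVAB cutscore would not recover the first value.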
14. Two Potential Solutions to Sequential Multiple Hurdles
Use correction formulas in a sequential “back correction” to the unrestricted population
Should score missing criterion data if unavailable due to academic attrition
(e.g., Alf & Abrahams, 1993)
Maximum likelihood procedure to estimate population parameters, also a sequential correction process
Eliminates the need to score missing criterion data if unavailable due to academic attrition
(e.g., Mendoza et al., 2004)
15. Adverse Impact: Trading it off with Validity (ASVAB)
ASVAB Tests
General Science (GS)
Arithmetic Reasoning (AR)
Mathematics Knowledge (MK)
Word Knowledge (WK)*
Paragraph Comprehension (PC)*
Mechanical Comprehension (MC)
Auto & Shop Information (AS)
Electronics Information (EI)
Assembling Objects (AO)
Coding Speed (CS, a former ASVAB test, now a Navy Special Test)
16. Navy ASVAB Classification Composites
17. Adverse Impact/Validity Tradeoff Formula
Johnson & Abrahams (2003)
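The tradeoff formula itself is not reproduced in this transcript, and the sketch below does not attempt to reconstruct it. It only illustrates, with standard composite algebra, the two quantities such a tradeoff weighs: the validity of a weighted composite of standardized tests and the composite's standardized subgroup mean difference. All names and numbers are hypothetical.

```python
import math


def composite_stats(weights, test_corr, validities, group_diffs):
    """Validity and subgroup difference of a weighted composite.

    weights:     weight per standardized test.
    test_corr:   test intercorrelation matrix (list of lists).
    validities:  each test's correlation with the criterion.
    group_diffs: each test's standardized subgroup mean difference.
    """
    k = len(weights)
    # Composite variance for standardized tests: w' R w.
    variance = sum(
        weights[i] * weights[j] * test_corr[i][j]
        for i in range(k)
        for j in range(k)
    )
    sd = math.sqrt(variance)
    validity = sum(w * r for w, r in zip(weights, validities)) / sd
    difference = sum(w * d for w, d in zip(weights, group_diffs)) / sd
    return validity, difference


if __name__ == "__main__":
    # Hypothetical two-test composite with unit weights.
    corr = [[1.0, 0.5], [0.5, 1.0]]
    print(composite_stats([1.0, 1.0], corr, [0.4, 0.3], [0.8, 0.2]))
```

Comparing these two outputs across candidate composites is one simple way to see which alternatives lower the subgroup difference at little cost in validity.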
18. Simulating Job Classification
How does the most valid composite operate in concert with the other composites?
Simulating job classification assignments across Navy ratings
allows assessment of the total job classification requirements when one job’s ASVAB standard is changed
allows evaluation of new tests that lower adverse impact
Lewin Group, Inc. Excel/SAS application
Just qualified algorithm
EDS operational RIDE application (SCORE)
School success and curtailment on overqualified algorithm
Both algorithms show benefits for using AO and CS
19. Multiple Cutscores: Navy Nuclear Field Ratings (1997-1998)
VE+AR = 113* 30%
VE+AR = 103* 60%
AR+MK+EI+GS = 218 39%
MK+EI+GS = 156 54%
MK+AS = 96 75%
AR+2MK+GS = 196 76%
VE = 41 99%
The Nuclear Field ratings (EM, ET, MM) had the most prolific layering of multiple cutscores. We can find no documentation on when and why these or other ratings' multiple standards were set.
20. Over-Screening Effect of Multiple Cutscores on Recruits
This graphic depicts the effect of multiple cutscores on the number of available sailors who can qualify for a school.
21. Student Score Profile Showing Compounded Test Measurement Error Resulting from Multiple Cutscores
This graphic shows the interval of test measurement error that bounds an individual’s true score for each requirement. The true score is exactly in the middle of each bar. The observed score can be anywhere on that bar. For this particular recruit, all standards were met except one. The sheer number of standards increases the probability that the person will be rejected from the school on the basis of test measurement error.
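The compounding effect can be illustrated with simple arithmetic. The sketch below assumes, for illustration only, that each standard carries the same probability of falsely rejecting a truly qualified recruit and that the errors are independent. Composites that share tests have correlated errors, so real figures would differ. The function name and the 5% figure are hypothetical.

```python
def prob_any_false_rejection(p_single, n_standards):
    """Chance that at least one of several standards falsely rejects.

    p_single:    false-rejection probability for one standard (assumed equal).
    n_standards: number of cutscores applied conjunctively.
    Assumes independent measurement errors across standards.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_standards < 0:
        raise ValueError("n_standards must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_standards


if __name__ == "__main__":
    # Hypothetical 5% false-rejection risk per standard.
    for k in (1, 3, 7):
        print(k, round(prob_any_false_rejection(0.05, k), 3))
```

Under these assumptions, seven standards carry a compounded risk of roughly 30%, against 5% for a single standard.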
22. Nuclear Field Multiple Additive Cutscores: Remedy
Eliminate multiple cutscores and replace with
2 alternative ASVAB composites with equally high validity that
tap into attributes that are equally relevant to school performance
expand the recruit qualification rate
NAPT testing depends upon ASVAB cutscores
NAPT not required if
252 on either AR+MK+EI+GS or VE+AR+MK+MC
(242 requires NAPT)
500 out of 2000 Nuclear Field shortfall resolved the following year (1999)
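The either/or rule on this slide can be written as a short decision function. This is a sketch of one reading of the slide, not the operational rule: it assumes the 242 threshold also applies to either composite, and it assumes a recruit below 242 on both does not qualify, which the slide does not state. The function and label names are hypothetical.

```python
def nuclear_field_status(ar_mk_ei_gs, ve_ar_mk_mc):
    """Classify a recruit from the two alternative composite scores.

    ar_mk_ei_gs: score on AR+MK+EI+GS.
    ve_ar_mk_mc: score on VE+AR+MK+MC.
    """
    best = max(ar_mk_ei_gs, ve_ar_mk_mc)  # qualifying on either composite suffices
    if best >= 252:
        return "qualified, NAPT not required"
    if best >= 242:
        return "NAPT required"
    # Assumption: below 242 on both composites means not qualified.
    return "not qualified"


if __name__ == "__main__":
    print(nuclear_field_status(255, 230))
    print(nuclear_field_status(240, 245))
    print(nuclear_field_status(200, 210))
```

Because only the higher of the two composites is checked, a recruit who is strong on one set of aptitudes is not screened out by a weaker score on the other, which is how the remedy expands the qualification rate.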
23. Some Advice for Test Validation Researchers
Establish criterion integrity as you would for the predictor
Correct for range restriction or you may underestimate the validity (value) of the selection instrument
Assess for multiple hurdle selection situations and make the necessary corrections
Establish selection composites as alternatives that lower adverse impact but maintain adequate validity
Simulate job classification to determine the impact of a standard revision for one job on the availability of personnel for other jobs
Evaluate multiple additive selection standards that are highly correlated - eliminate them if they are barriers
Large samples are statistically better, but small samples with a good criterion can produce good results
ASVAB Monte Carlo research shows N = 200 results in accurate detection of an ASVAB composite with highest validity
Air Traffic Control standards replicated with N=79
24. References
Gross, A. L. (1982). Relaxing the assumptions underlying corrections for restriction in range. Educational and Psychological Measurement, 42, 795-801.
Held, J. D. & Foley, P. P. (1994). Explanations for accuracy of the general multivariate formulas for correcting for range restriction. Applied Psychological Measurement, 18, 355-367.
Held, J. D., Fedak, G. E., Crookenden, M. P., & Blanco, T. A. (2002). Test Evaluation for Augmenting the Armed Services Vocational Aptitude Battery. Proceedings of the 44th International Military Testing Association, 281-297. Ottawa, Canada.
Hogan, P. & Simonson, B. (2004). Selection and classification cost effectiveness model. An Excel spreadsheet model developed for NPRST by the Lewin Group, Inc. VA.
Johnson, J. W., & Abrahams, N. (2003). Exploring alternative methods of creating and weighting ASVAB composite component tests for classifying personnel into U.S. Navy jobs (Institute Report #434). Minneapolis: Personnel Decisions Research Institutes, Inc.
Lawley, D. (1943). A note on Karl Pearson's selection formula. Royal Society of Edinburgh, Proceedings, Section A, 62, 28-30.
Mendoza, Jorge L., Bard, David E., Mumford, Michael D. & Siew, Ang C. (2004). Criterion related validity in multiple-hurdle designs: Estimation and bias. Organizational Research Methods, 7, 418-444.
Pearson, K. (1903). On the influence of natural selection on the variability and correlations of organs. Philosophical Transactions of the Royal Society, London, Series A, 200, 1-66.