Practitioner’s Concerns When Conducting Military Test Validation Studies

Presentation Transcript


    1. Practitioner’s Concerns When Conducting Military Test Validation Studies. Presentation for the Human Factors Engineering Technical Advisory Group, Selection & Classification Sub-TAG, May 7, 2008. Janet Held, Navy Personnel Research, Studies, and Technology (NPRST/PERS-1), Janet.held@navy.mil

    2. Overview. An ASVAB Review Panel recently identified areas for improvement, including the need for the military services to conduct more frequent ASVAB validation studies. Caveat: the Navy historically has conducted, and currently conducts, such studies. Purpose of the presentation: describe the Navy’s ongoing ASVAB validation program; discuss training transformation issues that could affect validation efforts; highlight technical validation issues currently being addressed that may concern others.

    3. Navy’s Ongoing ASVAB Validation Program. Objectives: provide ASVAB standards for enlisted ratings that consider training and remediation expense and student replacement expense. Customers: Recruiting wants an increased number of school-qualified recruits; Training wants fewer students failing or requiring remediation; Enlisted Community Managers (honest brokers) want improved fit/fill and health of their ratings for both the short and long term.

    4. When Does the Navy Conduct an ASVAB Validation Study? When new ratings are formed; when ratings are consolidated into an occupational group to enhance job assignment flexibility; when academic non-graduation or setback rates are high; when students have difficulty in an advanced training pipeline; when the curriculum undergoes a major revision; when the course delivery system changes; and as a scheduled within-cycle review.

    5. Navy Validation Process

    6. Navy Concerns in Test Validation: Technical Topics. Criterion quality; correction for range restriction; multiple hurdle (sequential) selection and correction procedures; composite formulation trading off validity and adverse impact; simulating job classification; multiple cutscores.

    7. Criterion Quality. Transformation in schoolhouse training may produce an unstable criterion (school performance measurement). The mode of training delivery is changing: previously instructor-led and group-paced; then transformed to computer-based training (CBT), self-paced; now transforming again, for some schools, to group-paced blended solutions. The concern is that the validity of the ASVAB will be inestimable without a meaningful criterion. We need to understand the links between job requirements and CBT, and between CBT performance and job performance.

    8. Some Possible Reasons for Low Validity. The criterion is compromised, unreliable, or deficient in covering the training performance dimensions, possibly because schools have insufficient funding or tools for adequate performance measurement, or are required to pass everyone. The predictor is compromised, unreliable, or deficient in covering the relevant aptitudes, abilities, skills, and knowledge domains reflected in the training performance measures.

    9. Correction for Range Restriction: The Impact of Score Curtailment on the Validity Coefficient. Bivariate normal tables for every correlation are issued by the Dept. of Commerce and are otherwise available. The properties of the bivariate normal distribution allow specification of the total distribution given only a partial segment of it.

    10. Higher Validity Results in a Higher Graduation Rate (all other things being equal)

    11. Correction for Range Restriction: 2 Equalities Used to Estimate Population Values
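
The slide's graphic is not part of the transcript. As a rough illustration of what a range restriction correction does, here is a minimal sketch of the standard univariate correction for direct restriction on the predictor (often called Thorndike's Case II). It rests on two equalities between the selected group and the applicant population: the same regression slope (linearity) and the same error variance (homoscedasticity). That these are the two equalities the slide refers to is an assumption, as are the function name and the example numbers. The references in this deck point to the more general multivariate formulas (Lawley, 1943), which this sketch does not implement.

```python
import math


def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate correction for direct range restriction on the predictor.

    r_restricted    -- validity observed in the selected (restricted) group
    sd_unrestricted -- predictor standard deviation in the applicant population
    sd_restricted   -- predictor standard deviation in the selected group
    """
    k = sd_unrestricted / sd_restricted
    return (r_restricted * k) / math.sqrt(
        1.0 - r_restricted ** 2 + (r_restricted * k) ** 2
    )


# Hypothetical numbers: observed validity .30, predictor SD 10 among
# applicants but only 6 among those selected into training.
corrected = correct_range_restriction(0.30, 10.0, 6.0)
print(f"corrected validity: {corrected:.3f}")
```

With no restriction (equal standard deviations) the function returns the observed validity unchanged; the more the selected group's spread is curtailed, the larger the upward correction.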

    12. Linearity and Homoscedasticity Assumptions Graphically Linked

    13. Multiple Hurdles: Sequential Selection Situations Can Lead to Inaccurate (Low) Validity Estimates. Scenario 1: the ASVAB standard is used for job classification and entry into initial job training, followed by progression to advanced training if the student passes initial training. Scenario 2: the ASVAB standard is used for job classification and entry into “Common Core” training, followed by progression to initial job training if the student passes Common Core training. Scenario 2 is becoming the more prevalent training model as common curriculum elements from various jobs are extracted, consolidated, and administered centrally (RTC) to save training and travel dollars and to expedite reclassification of failures.

    14. Two Potential Solutions to Sequential Multiple Hurdles. (1) Use correction formulas in a sequential “back correction” to the unrestricted population; criterion data that are missing because of academic attrition should be scored (e.g., Alf & Abrahams, 1993). (2) Use a maximum likelihood procedure to estimate population parameters, also a sequential correction process; this eliminates the need to score criterion data that are missing because of academic attrition (e.g., Mendoza et al., 2004).

    15. Adverse Impact: Trading It Off with Validity (ASVAB). ASVAB tests: General Science (GS); Arithmetic Reasoning (AR); Mathematics Knowledge (MK); Word Knowledge (WK)*; Paragraph Comprehension (PC)*; Mechanical Comprehension (MC); Auto & Shop Information (AS); Electronics Information (EI); Assembling Objects (AO); Coding Speed (CS, a former ASVAB test, now a Navy Special Test).

    16. Navy ASVAB Classification Composites

    17. Adverse Impact/Validity Tradeoff Formula (Johnson & Abrahams, 2003)

    18. Simulating Job Classification. How does the most valid composite operate in concert with the other composites? Simulating job classification assignments across Navy ratings allows assessment of the total job classification requirements when one job’s ASVAB standard is changed, and allows evaluation of new tests that lower adverse impact. Applications: Lewin Group, Inc. Excel/SAS application (just-qualified algorithm); EDS operational RIDE application (SCORE) (school success and curtailment on overqualified algorithm). Both algorithms show benefits from using AO and CS.

    19. Multiple Cutscores: Navy Nuclear Field Ratings (1997-1998). VE+AR = 113* (30%); VE+AR = 103* (60%); AR+MK+EI+GS = 218 (39%); MK+EI+GS = 156 (54%); MK+AS = 96 (75%); AR+2MK+GS = 196 (76%); VE = 41 (99%). The Nuclear Field ratings (EM, ET, MM) had the most prolific layering of multiple cutscores. We can find no documentation on when and why these or other ratings’ multiple standards were set.

    20. Over-Screening Effect of Multiple Cutscores on Recruits. This graphic depicts the effect of multiple cutscores on the number of available sailors who can qualify for a school.

    21. Student Score Profile Showing Compounded Test Measurement Error Resulting from Multiple Cutscores. This graphic shows the interval of test measurement error that bounds an individual’s true score for each requirement. The true score is exactly in the middle of each bar. The observed score can be anywhere on that bar. For this particular recruit, all standards were met except one. The sheer number of standards increases the probability that the person will be rejected from the school on the basis of test measurement error.
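
To make the compounding concrete, the sketch below computes how the chance of a measurement-error rejection grows with the number of standards. It assumes a recruit whose true scores meet every standard and whose observed score clears any single standard with probability 0.95, and it treats the standards as independent. Both the 0.95 figure and the independence assumption are illustrative, not taken from the presentation. The Nuclear Field composites share subtests, so their errors are positively correlated and the real compounding would be smaller than shown here; the direction of the effect is the point.

```python
def false_rejection_rate(p_pass_single, n_standards):
    """Probability of failing at least one of n independent standards,
    given probability p_pass_single of clearing each one."""
    return 1.0 - p_pass_single ** n_standards


P_SINGLE = 0.95  # hypothetical chance of clearing one standard

# Seven standards matches the count listed for the Nuclear Field ratings.
rates = {k: false_rejection_rate(P_SINGLE, k) for k in range(1, 8)}
for k, rate in rates.items():
    print(f"{k} standard(s): rejection probability {rate:.3f}")
```

Under these assumptions a 5% risk with one standard grows to roughly 30% with seven.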

    22. Nuclear Field Multiple Additive Cutscores: Remedy. Eliminate multiple cutscores and replace them with 2 alternative ASVAB composites with equally high validity that tap into attributes that are equally relevant to school performance and that expand the recruit qualification rate. NAPT testing depends upon ASVAB cutscores: NAPT is not required with a score of 252 on either AR+MK+EI+GS or VE+AR+MK+MC (a score of 242 requires NAPT). 500 out of 2000 Nuclear Field shortfall resolved the following year (1999).

    23. Some Advice for Test Validation Researchers. Establish criterion integrity as you would for the predictor. Correct for range restriction, or you may underestimate the validity (value) of the selection instrument. Assess for multiple hurdle selection situations and make the necessary corrections. Establish selection composites as alternatives that lower adverse impact but maintain adequate validity. Simulate job classification to determine the impact of a standard revision for one job on the availability of personnel for other jobs. Evaluate multiple additive selection standards that are highly correlated, and eliminate them if they are barriers. Large samples are statistically better, but small samples with a good criterion can produce good results: ASVAB Monte Carlo research shows that N = 200 results in accurate detection of the ASVAB composite with the highest validity, and Air Traffic Control standards were replicated with N = 79.
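
The Monte Carlo research mentioned in the last item is not described in the transcript. The toy simulation below only illustrates the general idea: draw many samples of size N and count how often the composite with the highest population validity also has the highest sample validity. The two population validities (.50 and .30), the number of replications, and the seed are hypothetical. The simulated composites are also independent of each other given the criterion, which real ASVAB composites are not, so the detection rates it prints say nothing about the Navy's actual results.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def detection_rate(n, validities=(0.5, 0.3), reps=300, seed=1):
    """Share of replications in which the truly most valid composite
    also shows the highest sample validity (hypothetical setup)."""
    rng = random.Random(seed)
    best = max(range(len(validities)), key=lambda i: validities[i])
    hits = 0
    for _ in range(reps):
        criterion = [rng.gauss(0, 1) for _ in range(n)]
        sample_r = []
        for rho in validities:
            # Composite score correlating rho with the criterion in the population.
            noise_sd = math.sqrt(1 - rho ** 2)
            scores = [rho * c + noise_sd * rng.gauss(0, 1) for c in criterion]
            sample_r.append(pearson(scores, criterion))
        if max(range(len(sample_r)), key=lambda i: sample_r[i]) == best:
            hits += 1
    return hits / reps


rate_small = detection_rate(20)
rate_large = detection_rate(200)
print(f"N = 20:  {rate_small:.2f}")
print(f"N = 200: {rate_large:.2f}")
```

Narrowing the gap between the two validities, or adding more competing composites, lowers the detection rate at any given N, which is why a good criterion matters as much as sample size.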

    24. References. Gross, A. L. (1982). Relaxing the assumptions underlying corrections for restriction in range. Educational and Psychological Measurement, 42, 795-801. Held, J. D., & Foley, P. P. (1994). Explanations for accuracy of the general multivariate formulas for correcting for range restriction. Applied Psychological Measurement, 18, 355-367. Held, J. D., Fedak, G. E., Crookenden, M. P., & Blanco, T. A. (2002). Test evaluation for augmenting the Armed Services Vocational Aptitude Battery. Proceedings of the 44th International Military Testing Association, 281-297. Ottawa, Canada. Hogan, P., & Simonson, B. (2004). Selection and classification cost effectiveness model. An Excel spreadsheet model developed for NPRST by the Lewin Group, Inc., VA. Johnson, J. W., & Abrahams, N. (2003). Exploring alternative methods of creating and weighting ASVAB composite component tests for classifying personnel into U.S. Navy jobs (Institute Report #434). Minneapolis: Personnel Decisions Research Institutes, Inc. Lawley, D. (1943). A note on Karl Pearson's selection formula. Royal Society of Edinburgh, Proceedings, Section A, 62, 28-30. Mendoza, J. L., Bard, D. E., Mumford, M. D., & Siew, A. C. (2004). Criterion related validity in multiple-hurdle designs: Estimation and bias. Organizational Research Methods, 7, 418-444. Pearson, K. (1903). On the influence of natural selection on the variability and correlations of organs. Philosophical Transactions of the Royal Society, London, Series A, 200, 1-66.
