A comparison of K-fold and leave-one-out cross-validation of empirical keys Alan D. Mead, IIT mead@iit.edu
What is “Keying”? • Many selection tests do not have demonstrably correct answers • Biodata, SJTs, some simulations, etc. • Keying is the construction of a valid key • What the “best” people answered is probably “correct” • Most approaches use a correlation, or something similar
Correlation approach • Create 1-0 indicator variables for each response • Correlate indicators with a criterion (e.g., job performance) • If r > .01, key = 1 • If r < -.01, key = -1 • Else, key = 0 • Little is lost by using a 1, 0, -1 key
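A minimal sketch of this correlation approach, assuming item responses are coded 0-3 in a NumPy array; the names `responses` and `criterion` and the ±.01 cutoff default are illustrative, not taken from the author's materials:

```python
import numpy as np

def correlation_key(responses, criterion, n_options=4, cutoff=0.01):
    """Return an (items x options) key of +1 / 0 / -1 weights."""
    n, n_items = responses.shape
    key = np.zeros((n_items, n_options), dtype=int)
    for item in range(n_items):
        for option in range(n_options):
            indicator = (responses[:, item] == option).astype(float)  # 1-0 indicator
            if indicator.std() == 0:          # option never (or always) chosen
                continue
            r = np.corrcoef(indicator, criterion)[0, 1]
            if r > cutoff:
                key[item, option] = 1
            elif r < -cutoff:
                key[item, option] = -1
    return key

def score(responses, key):
    """Sum the keyed weight of each person's chosen option across items."""
    items = np.arange(responses.shape[1])
    return key[items, responses].sum(axis=1)
```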
How valid is my key? • Now that I have a key, I want to compute a validity… • But I based my key on the responses of my “best” test-takers • Can/should I compute a validity in this sample? • No! Cureton (1967) showed that very high validities will result even for invalid keys • What shall I do?
Validation Approaches • Charge ahead! • “Sure, .60 is an over-estimate; there will be shrinkage. But even half would still be substantial” • Split my sample into “calibration” and “cross-validation” samples • Fine if you have a large N… • Resample
LOOCV procedure • Leave-one-out cross-validation (LOOCV) resembles Tukey’s jackknife resampling procedure • Hold out person 1 • Compute a key on the remaining N-1 • Score the held-out person • Repeat with person 2, 3, 4, … • Produces N scores that do not capitalize on chance • Correlate the N scores with the criterion • (But use the total-sample key for operational scoring)
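A sketch of the LOOCV procedure as described on this slide, reusing the hypothetical `correlation_key()` and `score()` helpers from the earlier sketch:

```python
import numpy as np

def loocv_validity(responses, criterion):
    """Estimate cross-validity by scoring each person with a key built without them."""
    n = responses.shape[0]
    held_out_scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                          # everyone except person i
        key = correlation_key(responses[keep], criterion[keep])
        held_out_scores[i] = score(responses[i:i + 1], key)[0]
    # The N held-out scores do not capitalize on chance, so their
    # correlation with the criterion estimates the key's cross-validity.
    return np.corrcoef(held_out_scores, criterion)[0, 1]
```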
Mead & Drasgow, 2003 • Simulated test responses & criterion • Three approaches • Charge ahead • LOOCV • True cross-validation • Varying sample sizes: • N=50,100,200,500,1000
LOOCV Conclusions • LOOCV was much better than simply “charging ahead” • But consistently slightly worse than actual cross-validation • LOOCV has a large standard error • An elbow appeared at N=200
K-fold keying • LOOCV is like using cross-validation samples of N=1 • Break the sample into K groups • E.g., N=200 and k=10 • Compute the key 10 times • Each calibration sample N=180 • Each cross-validation sample N=20 • Does not capitalize on chance • Potentially much more stable results
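A sketch of the K-fold variant under the same assumptions; the random fold assignment and the default k are illustrative choices, not the study's exact implementation:

```python
import numpy as np

def kfold_validity(responses, criterion, k=10, seed=0):
    """Estimate cross-validity from K calibration / hold-out splits."""
    n = responses.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % k    # near-equal random folds
    held_out_scores = np.empty(n)
    for fold in range(k):
        test = folds == fold
        key = correlation_key(responses[~test], criterion[~test])  # key on the other K-1 folds
        held_out_scores[test] = score(responses[test], key)
    return np.corrcoef(held_out_scores, criterion)[0, 1]
```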
Present study • Simulation study • Four levels of sample size • N=50, 100, 200, 500 • Several levels of K • K=2, 5, 10, 25, 50, 100, 200, 500 • K=2 is double cross-validation • True validity = 0.40 • 35-item test with four responses per item
Main Effect of Sample Size • [Results table not reproduced here] • Note: values reported as Mean (Standard Error)
Summary • N=50 is really too small a sample for empirical keying • Using a k that produces hold-out samples of 4-5 people seemed best • N=100, k=20 • N=200, k=50 • N=500, k=100 • Traditional double cross-validation was almost as good for N>100