Brian Lukoff Stanford University October 13, 2006 Detecting Faking on Noncognitive Assessments Using Decision Trees
Acknowledgements • Based on a draft paper that is joint work with Eric Heggestad, Patrick Kyllonen, and Richard Roberts
Overview • The decision tree method and its applications to faking • Evaluating decision tree performance • Three studies evaluating the method • Study 1: Low-stakes noncognitive assessments • Study 2: Experimental data • Study 3: Real-world selection • Implications and conclusions
What are decision trees? • A technique from machine learning for predicting an outcome variable from (a possibly large number of) predictor variables • Outcome variable can be categorical (classification tree) or continuous (regression tree) • Algorithm builds the decision tree based on empirical data (a training set) [Figure: example tree — Is it snowing? Yes: drive; No: Is it raining? Yes: drive; No: walk]
What are decision trees? [Figure: the same tree, shown against the training set] • Not all cases are accounted for correctly • Wrong decision on Day 4 • Need to choose variables predictive enough of the outcome
What are decision trees? [Figure: the same tree, applied to a test set] • Not all cases are predicted correctly • Maybe the decision to drive or walk is determined by more than just the snow and rain?
Advantages of decision trees • Ease of interpretation • Simplicity of use • Flexibility in variable selection • Functionality to build decision trees readily available in software (e.g., the R statistical package)
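The slides point to R for building trees; as an equivalent minimal sketch in Python with scikit-learn, here is a classification tree fit to the toy snow/rain example (the day-by-day training data are assumptions reconstructed from the slide figure):

```python
# Minimal sketch: fit a classification tree to the toy snow/rain example.
# The day-by-day training data below are assumptions; the slides show
# only the resulting tree.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [snowing, raining]; outcome: drive or walk
X = [
    [1, 0],  # Day 1: snowing -> drive
    [0, 1],  # Day 2: raining -> drive
    [0, 0],  # Day 3: clear   -> walk
    [0, 0],  # Day 4: clear   -> drive (the tree gets this one wrong)
    [0, 0],  # Day 5: clear   -> walk
]
y = ["drive", "drive", "walk", "drive", "walk"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["snowing", "raining"]))
```

The printed rules mirror the slide's tree: snow or rain leads to driving, and the mixed clear-day leaf misclassifies Day 4.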
Application to faking: Outcome variables and training sets • Outcome variable = faking status (“faking” or “honest”) • Training set = an experimental data set where some participants were instructed to fake • Training set = a data set where some respondents are known to have faked • Outcome variable = lie scale score • Training set = a data set where the target lie scale was administered to some subjects
Application to faking: Predictor variables • So far, we have used individual item responses only • Other possibilities (sketched below): • Variance of item responses • Number of item responses in the highest (or lowest) category • Modal item response • Decision tree method permits some sloppiness in variable selection
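A minimal sketch of how these derived predictors might be computed from one respondent's raw responses (the function name and the 1-5 scale are assumptions):

```python
# Sketch of the derived predictor variables suggested above, computed
# from one respondent's raw item responses (assumed 1-5 Likert items).
import numpy as np

def derived_features(responses):
    """responses: sequence of one respondent's item responses (1-5)."""
    r = np.asarray(responses)
    values, counts = np.unique(r, return_counts=True)
    return {
        "variance": r.var(ddof=1),                # spread of item responses
        "n_highest": int((r == 5).sum()),         # responses in the top category
        "n_lowest": int((r == 1).sum()),          # responses in the bottom category
        "modal_response": int(values[counts.argmax()]),  # most frequent response
    }

print(derived_features([5, 5, 4, 5, 1, 5, 5, 3]))
```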
Evaluating decision tree performance: Metrics • Classification trees (dichotomous outcome case, e.g., predicting faking or not faking) • Accuracy rate • False positive rate • Hit rate • Regression trees (continuous outcome case, e.g., predicting a lie scale score) • Average absolute error • Correlation between actual and predicted scores
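A sketch of these metrics in Python with numpy and scikit-learn, using illustrative hand-made predictions (the values are placeholders, not study data):

```python
# Sketch of the performance metrics named above.
import numpy as np
from sklearn.metrics import confusion_matrix

# Classification case: 1 = faking, 0 = honest
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / len(y_true)   # accuracy rate
fpr = fp / (fp + tn)                 # honest respondents flagged as faking
hit_rate = tp / (tp + fn)            # fakers correctly flagged

# Continuous case: actual vs. predicted lie scale scores (standardized)
actual = np.array([0.2, 1.1, -0.4, 0.9])
predicted = np.array([0.1, 0.7, -0.1, 1.2])
avg_abs_error = np.abs(actual - predicted).mean()
correlation = np.corrcoef(actual, predicted)[0, 1]
```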
Evaluating decision tree performance: Overfitting • Algorithm can “overfit” to the training data, so performance metrics computed on the training data are not indicative of future performance • Thus we will often partition the data: • Training set (data used to build tree) • Test set (data used to compute performance metrics)
Evaluating decision tree performance: Cross-validation • A single training/test split leaves a lot to chance in which cases end up in each set • Instead, partition the data into k equal subsets • Use each subset in turn as the test set for a tree trained on the rest of the data • Average the resulting performance metrics to get better estimates of performance on new data • Here we will report cross-validation estimates
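A minimal sketch of k-fold cross-validation for a classification tree, using scikit-learn's cross_val_score on random placeholder data:

```python
# Sketch of k-fold cross-validation; data here are random placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 20))   # 200 respondents x 20 items (1-5)
y = rng.integers(0, 2, size=200)         # 1 = faking, 0 = honest

# Each of the k=10 folds serves once as the test set for a tree
# trained on the other nine; average the fold accuracies.
scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=10)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```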
Study 1 • Data sets • Two sets of students (N = 431 and N = 824) who took a battery of noncognitive assessments as well as two lie scales as part of a larger study • Measures • Predictor variables • IPIP (“Big Five” personality measure) items • Social Judgment Scale items • Outcomes (lie scales) • Overclaiming Questionnaire • Balanced Inventory of Desirable Responding • Method • Build regression trees to predict scores on each lie scale based on students’ item responses
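The slides include no code; a sketch of the Study 1 pipeline on simulated placeholder data (the real IPIP / Social Judgment Scale responses and lie scale scores are not reproduced here) might look like:

```python
# Sketch of the Study 1 method: a regression tree predicting a lie
# scale score from item responses, evaluated with cross-validated
# predictions. All data below are simulated placeholders.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
items = rng.integers(1, 6, size=(431, 50)).astype(float)  # 431 students x 50 items
lie_scale = items[:, :5].mean(axis=1) + rng.normal(0, 1, size=431)

predicted = cross_val_predict(DecisionTreeRegressor(max_depth=4),
                              items, lie_scale, cv=10)
print("r =", round(float(np.corrcoef(lie_scale, predicted)[0, 1]), 2))
print("avg |error| =", round(float(np.abs(lie_scale - predicted).mean()), 2))
```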
Study 1: Results • Varying performance, depending on the items used for prediction and the lie scale used as the outcome • Correlations between actual lie scale scores and predicted scores ranged from -.02 to .49 • Average prediction errors ranged from .74 to .95 SD
Study 1: Limitations • Low-stakes setting: how much faking was there to detect? • Nonexperimental data set: students with high scores on the lie scales may or may not have actually been faking
Study 2 • Data set • An experimental data set of N = 590 students in two conditions (“honest” and “faking”) • Measures • Predictor variables • IPIP (“Big Five” personality assessment) items • Method • Build decision trees to classify students as honest or faking based on their personality test item responses
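A sketch of the Study 2 pipeline, again on simulated placeholder data, computing the accuracy, false positive, and hit rates reported on the next slide:

```python
# Sketch of the Study 2 method: classify respondents as honest or
# faking from item responses. All data below are simulated placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(590, 50))   # item responses (1-5)
y = rng.integers(0, 2, size=590)         # 1 = faking condition, 0 = honest

pred = cross_val_predict(DecisionTreeClassifier(max_depth=3), X, y, cv=10)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print("accuracy:", (tp + tn) / len(y))
print("false positive rate:", fp / (fp + tn))
print("hit rate:", tp / (tp + fn))
```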
Study 2: Results • Decision trees correctly classified students into experimental condition with varying success • Accuracy rates of 56% to 71% • False positive rates of 25% to 41% • Hit rates of 52% to 68%
Study 2: An example • Two items on a 1-5 scale form a decision tree (a hypothetical reconstruction follows below) • Item 19: “I always get right to work” • Item 107: “Do things at the last minute” (reversed) • Extreme values of either one are indicative of faking
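The slide's tree figure is not reproduced here; the following is a hypothetical reconstruction of its logic with assumed split thresholds (the slide states only that extreme values on either item indicate faking):

```python
# Hypothetical reconstruction of the two-item tree described on the
# slide. The split thresholds (response of 5) are assumptions.
def flag_faking(item19, item107_reversed):
    """Both items are on a 1-5 scale; item 107 is reverse-scored."""
    if item19 >= 5:             # extreme endorsement of "right to work"
        return "faking"
    if item107_reversed >= 5:   # extreme on the reversed last-minute item
        return "faking"
    return "honest"

print(flag_faking(5, 3))  # -> faking
```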
Study 2: Discussion • Many successful trees utilized only a few item responses • Range of tree performance • Laboratory (not real-world) data • Although this was an experimental study, we still don’t know: • If students in the faking condition really faked • If the degree to which they faked is indicative of how people fake in an operational setting • If any of the students in the honest condition faked
Study 3 • Data set • N = 264 applicants for a job • Measures • Predictor variables • Achievement striving, assertiveness, dependability, extroversion, and stress tolerance items of the revised KeyPoint Job Fit Assessment • Outcome (lie scale) • Candidness scale of the revised KeyPoint Job Fit Assessment • Method • Build decision trees predicting the candidness (lie scale) score from the other item responses
Study 3: Results • Correlations between actual and predicted candidness (lie scale) scores ranged from .26 to .58 • Average prediction errors ranged from .61 to .78 SD
Study 3: An example • Items are on a 1-5 scale, where 5 indicates the highest level of Achievement Striving • Note that most of the tree’s split tests are for extreme item responses
Study 3: Discussion • Similar methodology to Study 1, but better results (e.g., stronger correlations) • Difference in results likely due to the fact that motivation to fake was higher in this real-world, high-stakes setting
General discussion • Wide variation in decision tree quality across groups of predictor variables (e.g., conscientiousness scale vs. openness scale) • Examining trees can give insight into the structure of the assessment
Detecting faking in an operational setting • Some decision trees in each study used only a small number of items and achieved a moderate level of accuracy • Use decision trees for real-time faking detection on computer-administered noncognitive assessments • Real-time “warning” system • Need to study how this changes the psychometric properties of the assessment
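A sketch of how such a real-time warning might work: because a fitted tree tests only a handful of items, it can be evaluated as soon as those items have been answered. The tree, item counts, and flagging message below are illustrative assumptions:

```python
# Sketch of a real-time warning system on a computer-administered
# assessment. The fitted tree here is trained on random placeholder data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(500, 50))
y = rng.integers(0, 2, size=500)      # 1 = faking, 0 = honest
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Item (feature) indices actually used by the fitted tree's splits;
# leaves are marked with negative values in tree_.feature.
used = {f for f in tree.tree_.feature if f >= 0}

def check_respondent(answers):
    """answers: dict mapping item index -> response given so far."""
    if used.issubset(answers):          # all items the tree needs are answered
        row = np.zeros((1, X.shape[1]))
        for i, v in answers.items():
            row[0, i] = v
        if tree.predict(row)[0] == 1:   # tree predicts faking
            return "warning: response pattern resembles faking"
    return None                         # too early to evaluate, or no flag

print(check_respondent({i: 5 for i in used}))
```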
Future work • Address whether decision trees can be effective in an operational setting—are current decision trees accurate enough to reduce faking? • Comparisons of decision tree faking/honest classification with classifications from IRT mixture models • Develop additional features to be used as predictor variables • Explore other machine learning techniques