Implications of Ignoring Examinee Motivation in Validity Studies

Mandalyn Swanson · PSYCH 812 · Fall 2013

Purpose

High Stakes vs. Low Stakes
• Low Stakes:
  • Lack of personal consequences
  • Examinees may not be concerned with a high score
  • Examinees vary in amount of motivation
  • Three common situations:
    • Program Evaluation
    • Test Development
    • Basic Research
• High Stakes:
  • Examinee needs test score to achieve desired benefit
  • Involves personal consequences
  • Can assume examinees will exert good effort

Levels of Motivation in Low-Stakes Tests
• Motivated
  • Put forth best effort
  • Value purpose/trained to give best effort/personality/etc.
• Moderately or Low-Motivated
  • Start giving best effort, then drop off
  • Answer only easy items
• Amotivated
  • Extreme case
  • Random responders
  • Do not try
• Levels of motivation:
  • Lower in low-stakes (LS) than in high-stakes (HS) settings
  • Test scores underestimate examinee proficiency to the degree that examinees fail to give their best effort on the test
  • Scores will not accurately reflect true proficiency and may not be valid indicators of what students know and can do

Exploring Implications of Ignoring Motivation
• Benson’s (1998) Construct Validity Framework
  • Substantive Stage
  • Structural Stage
  • External Stage

Benson’s Construct Validity Framework
• Substantive Stage: Operationally defining the theoretical domain of the construct
• Problem: construct irrelevancy
  • Scores may measure not only the construct, but also motivation
  • If low- or amotivated examinees are present, scores will be negatively biased (look lower than true proficiency)
  • Validity of test scores is reduced, because they do not accurately represent proficiency

Benson’s Construct Validity Framework
• Structural Stage: Involves internal domain studies, which determine the extent to which the variables relate to one another and to the overall construct (e.g., intercorrelations, IRT)
• Because of skipping and guessing on items, item parameter estimates are affected:
  • Items appear harder than they actually are
  • b parameter: appears harder (more pronounced for easier items)
  • a parameter: appears more discriminating (more pronounced for items with high a’s)
• Cronbach’s alpha is decreased
• Mean and variance are biased
Thus, conclusions drawn from scores will be misinformed!
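The reliability index named on this slide can be computed directly from an examinee-by-item score matrix. The following is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the tiny 0/1 data set are hypothetical and only illustrate the calculation; they are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows (one score per item)."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two examinees")
    # Sample variance of each item across examinees.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the examinees' total scores.
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 examinees x 3 dichotomously scored items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

Computing the index once on the full data set and once after setting aside examinees flagged as low-effort is one simple way to check how sensitive a reliability estimate is to such responses.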

Benson’s Construct Validity Framework
• External Stage: Does the measure relate to other measures in expected ways?
• Criterion validity coefficients are attenuated
• Demonstrated by Wise, Wise, & Bhola (2006):
  • Incrementally removed low-motivated examinees from a data set
  • The average total score on the test and the correlation between the total score and SAT scores both increased
  • Thus, the validity of the test scores increased
• Wise and DeMars (2005) and Wise and Kong (2005) also concluded that external validity of test scores was higher when low-motivated examinees were filtered out.
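The attenuation described on this slide can be illustrated with a small simulation. The sketch below is not a reproduction of Wise, Wise, & Bhola (2006): the effort index, the 70/30 split of motivated and less-motivated examinees, the noise levels, and the cutoffs are all assumptions chosen for illustration. It generates a test score that tracks proficiency only to the extent that effort is high, then removes low-effort examinees at increasingly strict cutoffs and reports the mean score and the correlation with an external criterion.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_examinees(n=5000, seed=812):
    """Return (effort, test_score, criterion) tuples under assumed parameters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta = rng.gauss(0, 1)  # true proficiency
        # Assumed effort index in [0, 1]: 70% give full effort, the rest vary.
        effort = 1.0 if rng.random() < 0.7 else rng.random()
        effortful = theta + rng.gauss(0, 0.5)  # score when trying
        guessing = rng.gauss(-1.5, 0.5)        # low score unrelated to proficiency
        score = effort * effortful + (1 - effort) * guessing
        criterion = theta + rng.gauss(0, 0.5)  # external measure of the same proficiency
        rows.append((effort, score, criterion))
    return rows


def filter_summary(rows, min_effort):
    """Keep examinees with effort >= min_effort; return (n, mean score, r)."""
    kept = [(s, c) for e, s, c in rows if e >= min_effort]
    scores = [s for s, _ in kept]
    crit = [c for _, c in kept]
    return len(kept), sum(scores) / len(scores), pearson_r(scores, crit)


rows = simulate_examinees()
for cutoff in (0.0, 0.5, 0.9):
    n_kept, mean_score, r = filter_summary(rows, cutoff)
    print(f"effort >= {cutoff:.1f}: n={n_kept}, mean={mean_score:.2f}, r={r:.2f}")
```

In real data the effort index would come from a self-report scale or from response times rather than being known, and filtering is only defensible if effort is unrelated to true proficiency.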

Dealing With Low-/Amotivated Examinees
The extent to which low motivation affects test results must be addressed by test administrators.
• Potential strategies:
  • Increasing examinee effort
  • Identifying and/or modeling low- and amotivated examinees

Increasing Examinee Effort
• Raising the stakes (Liu, Bridgeman, & Adler, 2012; Wise & DeMars, 2005)
  • Registration prevention
  • Scores appear on transcripts
• Providing incentives (Liu et al., 2012; O’Neil, Sugrue, & Baker, 1995/1996; Wise & DeMars, 2005)
  • Compensation for correct answers
• Selecting instruments that are not mentally taxing (Schmeiser & Welch, 2006; Wise & DeMars, 2005)
  • Multiple-choice items
• Providing feedback (Liu et al., 2012; Wise & DeMars, 2005)
  • Give examinees their test scores and an explanation of their meaning

Detecting and Modeling Examinees
• Motivation Filtering (Lau, 2009; Wise & DeMars, 2005)
  • Used to remove low- or amotivated examinee responses from the data set
• Response-Time Effort (RTE) (Lau, 2009; Liu et al., 2012; Wise & Kong, 2005)
  • Based on the length of time it took examinees to respond to each item
• IRT Mixture Models (Lau, 2009; Lau & Pastor, 2010; Wise & DeMars, 2005)
  • Detect and weight responses of low- or amotivated examinees
Note: Each of these methods has its own associated validity issues.
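A response-time-based effort index of the kind described by Wise and Kong (2005) can be sketched as the proportion of items on which an examinee's response time meets or exceeds an item-level threshold, with responses faster than the threshold treated as rapid guesses. In the Python sketch below, the function names, the 5-second thresholds, the 0.90 retention cutoff, and the response times are all hypothetical values chosen for illustration.

```python
def response_time_effort(response_times, thresholds):
    """Proportion of items answered at or above the per-item time threshold."""
    if not response_times or len(response_times) != len(thresholds):
        raise ValueError("need one threshold per item and at least one item")
    # True where the response time suggests an effortful (non-rapid) response.
    effortful = [rt >= t for rt, t in zip(response_times, thresholds)]
    return sum(effortful) / len(effortful)


def motivation_filter(examinees, thresholds, min_rte=0.90):
    """Keep only examinees whose RTE is at least min_rte (assumed cutoff)."""
    return {
        name: times
        for name, times in examinees.items()
        if response_time_effort(times, thresholds) >= min_rte
    }


# Hypothetical per-item thresholds (seconds) and response times.
thresholds = [5.0, 5.0, 5.0, 5.0]
examinees = {
    "A": [22.0, 31.5, 18.2, 40.1],  # all responses slower than the threshold
    "B": [12.0, 2.0, 15.0, 1.0],    # two rapid responses
}

for name, times in examinees.items():
    print(name, response_time_effort(times, thresholds))

kept = motivation_filter(examinees, thresholds)
print(sorted(kept))
```

The choice of thresholds drives the result: a threshold set too high flags fast but genuine responders, which is one of the validity issues noted on the slide.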

Conclusions
When examinees exert low or no effort on tests, a threat to test score validity exists.
• Construct-irrelevant variance: test scores underrepresent examinees’ true proficiency
• Biases item parameter estimates and descriptive statistics
• Indices of reliability decrease
• Affects correlations of the measure with other related (or non-related) measures (convergent and divergent validity)
• Scores fail to represent examinees’ true ability to the extent that examinees exert low (or no) effort on the test
• In turn, this affects the validity of the entire assessment program and the inferences drawn from the scores
“Scores from low-stakes tests may not represent what the student knows. Rather, such scores represent what students will demonstrate with minimal effort” (O’Neil, Sugrue, & Baker, 1995/1996, p. 135).

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2000). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Benson, J. (1998). Developing a strong program of construct validation: A test anxiety example. Educational Measurement: Issues and Practice, 17, 10-17.
Lau, A. (2009). Using a mixture IRT model to improve parameter estimates when some examinees are amotivated (Doctoral dissertation). ProQuest. (UMI: 3366561).
Lau, A., & Pastor, D. (2010). Application of a mixture IRT model to improve parameter estimates when some examinees are amotivated. Unpublished manuscript.
Liu, O. L., Bridgeman, B., & Adler, R. M. (2012). Measuring learning outcomes in higher education: Motivation matters. Educational Researcher, 41(9), 352-362. DOI: 10.3102/0013189X12459679
O’Neil, H. F., Jr., Sugrue, B., & Baker, E. L. (1995/1996). Effects of motivational interventions on the National Assessment of Educational Progress mathematics performance. Educational Assessment, 3, 135-157.
Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational Measurement (pp. 307-353). Westport, CT: Praeger Publishers.
Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1-17.
Wise, S. L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18, 163-183.
Wise, V. L., Wise, S. L., & Bhola, D. S. (2006). The generalizability of motivation filtering in improving test score validity. Educational Assessment, 11, 65-83.
Zerpa, C., Hachey, K., van Barneveld, C., & Simon, M. (2011). Modeling student motivation and students’ ability estimates from a large-scale assessment of mathematics. SAGE Open, 1-9. DOI: 10.1177/2158244011421803
