Reliability and Item Response Modelling: Myths, Observations and Applications

1. Reliability and Item Response Modelling: Myths, Observations and Applications. Raymond J. Adams, University of Melbourne

2. Overview. What is reliability? Does reliability have a role in an IRT context (JML, MML, CML)? How reliable can a test be? Is high reliability important?

3. Classical Reliability. Idea introduced by Spearman (1904): "accidental errors" attenuate relations between observed scores. Reliability coefficient (Spearman, 1910): the correlation between one half and the other half of several measures of the same thing.

4. Spearman's Definition. Under the classical model, the observed score is the true score plus error. Reliability is the true variance divided by the observed variance.
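The definition on this slide can be checked numerically. The following is a minimal sketch, assuming normally distributed true scores and errors with illustrative standard deviations (1.0 and 0.5, not values from the talk). It compares the ratio of true to observed variance with the correlation between two parallel measures, which is the quantity Spearman's coefficient targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_sd, error_sd = 1.0, 0.5  # illustrative assumptions, not from the slides

T = rng.normal(0.0, true_sd, n)          # true scores
X1 = T + rng.normal(0.0, error_sd, n)    # observed score on one measure
X2 = T + rng.normal(0.0, error_sd, n)    # observed score on a parallel measure

# Reliability as true variance divided by observed variance
theoretical = true_sd**2 / (true_sd**2 + error_sd**2)
variance_ratio = T.var() / X1.var()

# Reliability as the correlation between two parallel measures
parallel_corr = np.corrcoef(X1, X2)[0, 1]

print(theoretical, variance_ratio, parallel_corr)
```

Under these assumptions the three printed values should agree up to sampling noise.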

5. An aside. Standard formulae are estimates of this under certain assumptions: Kuder-Richardson formula 20 (KR-20) and Cronbach's alpha. They have the features of what I describe in the following.
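As a reminder of what these standard formulae compute, here is a short sketch of Cronbach's alpha in its usual textbook form, alpha = k/(k-1) * (1 - sum of item variances / variance of total score). The function name and the tiny data set are illustrative assumptions. For 0/1 items, alpha computed this way coincides with KR-20 as long as the same variance convention is used for items and totals.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                           # number of items
    item_vars = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Four persons, three dichotomous items (made-up data)
example = [[1, 1, 1],
           [1, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
alpha = cronbach_alpha(example)
print(alpha)
```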

6. Properties of RX -- 1




10. What about reliability and IRT? Person separation reliability (Wright & Stone, 1979), assuming an unbiased estimator of ability.
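The formula from this slide is not in the transcript. The sketch below uses the commonly cited form of person separation reliability, (variance of ability estimates minus mean squared standard error) divided by the variance of ability estimates; treat that as an assumption about what the slide showed. The simulated estimates are unbiased by construction, with an illustrative error standard deviation of 0.5.

```python
import numpy as np

def person_separation_reliability(estimates, standard_errors):
    """(observed variance - mean error variance) / observed variance."""
    estimates = np.asarray(estimates, dtype=float)
    standard_errors = np.asarray(standard_errors, dtype=float)
    observed_var = estimates.var(ddof=1)
    mean_error_var = np.mean(standard_errors**2)
    return (observed_var - mean_error_var) / observed_var

rng = np.random.default_rng(2)
n = 100_000
theta = rng.normal(0.0, 1.0, n)               # latent abilities
se = np.full(n, 0.5)                          # assumed standard errors
theta_hat = theta + rng.normal(0.0, 1.0, n) * se  # unbiased, noisy estimates

psr = person_separation_reliability(theta_hat, se)
print(psr)
```

With a latent variance of 1 and an error variance of 0.25, the value should be close to 1 / 1.25 = 0.8, matching the classical ratio of true to observed variance.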

11. Some features of Person Separation Reliability

12. Properties of Person Separation Reliability. Requires an unbiased ability estimate (Warm, JML?, not EAP). Has all properties of Spearman's definition. Implications: variance estimates are biased; correlations are biased; loss of precision in population parameters is hidden.

13. How reliable can a test be?

14. Measurement Error Design Effect

15. Increasing the Accuracy of the Estimate of Group Means

16. Reliability and Fit

17. Reliability and Fit

18. Summary so far. Person separation and classical reliability are analogous. Reliability doesn't describe the accuracy of individuals' measures. Reliability describes biases in population parameter estimates when based upon fallible measures. For many applications unreliability can be compensated for by larger samples. Reliability doesn't depend on fit. Is reliability required for validity? I don't think so.
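The point about larger samples can be made concrete for a group mean. Under the classical model the variance of a sample mean of observed scores is (true variance + error variance) / N, which equals true variance / (N * R). The sketch below is based on that derivation only; it is not necessarily the design effect defined on the earlier slides, whose formulas are missing from this transcript. Function names are hypothetical.

```python
import math

def design_effect(reliability):
    """Inflation in the variance of a group mean due to measurement error.

    Classical-model assumption: Var(mean of X) = Var(T) / (N * R),
    so the inflation relative to error-free measurement is 1 / R.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return 1.0 / reliability

def compensating_sample_size(n_error_free, reliability):
    """Sample size giving the same precision for the mean as
    n_error_free persons measured without error."""
    # Small guard so floating-point noise does not push ceil() up by one.
    return math.ceil(n_error_free * design_effect(reliability) - 1e-9)

print(compensating_sample_size(400, 0.8))
print(compensating_sample_size(400, 0.5))
```

For example, a test with reliability 0.5 needs twice the sample to estimate a mean as precisely as an error-free measure would.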

19. Reliability and Marginal IRT Models. Abilities are often not estimated. Population parameters are directly estimated from item responses. Reliability as the ratio of true to estimated variance is meaningless. If abilities are estimated (EAP) they are biased: the observed variance is less than the latent variance, so reliability as the ratio of true to estimated variance is greater than one.
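The shrinkage of EAP estimates can be illustrated with a simple stand-in for the IRT case. The sketch below assumes a normal population and normal measurement error (not an item response model), with illustrative variances. In that setting the posterior mean is the observed value multiplied by latent variance / (latent variance + error variance), so the variance of the EAPs falls below the latent variance and the ratio of latent to EAP variance exceeds one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
latent_var, error_var = 1.0, 0.25   # illustrative assumptions

theta = rng.normal(0.0, np.sqrt(latent_var), n)      # latent abilities
x = theta + rng.normal(0.0, np.sqrt(error_var), n)   # noisy observations

# Posterior mean under a N(0, latent_var) prior: shrink x toward 0
shrink = latent_var / (latent_var + error_var)
eap = shrink * x

eap_var = eap.var()
ratio = latent_var / eap_var   # "true over estimated" variance

print(eap_var, ratio)
```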

20. Expected a-posteriori Predictions. The EAP is the mean of the posterior distribution. The variance of the posterior is used to represent uncertainty. The EAP can be viewed as a prediction, and the posterior variance as the uncertainty in that prediction.

21. EAP Reliability -- 1. Mislevy, Beaton, Kaplan and Sheehan (1992) argued that reliability can be viewed as the amount by which the measurement process has reduced uncertainty in the prediction of each individual's ability.

22. EAP Reliability -- 2. rE is an individual-level reliability that explains how much we have improved the prediction of this individual's ability over assuming they were randomly sampled from the population and no item responses were observed.
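The slide's formula for rE did not survive in the transcript. One reading of the verbal description is rE = 1 - (posterior variance / prior variance): zero when no responses are observed, approaching one as the posterior tightens. The sketch below computes this for a single person under a Rasch model with a normal prior, using a simple grid approximation. The function name, item difficulties and responses are all hypothetical.

```python
import numpy as np

def eap_and_reliability(responses, difficulties, prior_sd=1.0):
    """EAP, posterior variance and 1 - post_var / prior_var for one person.

    Assumes a Rasch model and a N(0, prior_sd**2) population distribution.
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)
    grid = np.linspace(-6.0 * prior_sd, 6.0 * prior_sd, 601)
    prior = np.exp(-0.5 * (grid / prior_sd) ** 2)

    # Probability of a correct response at each grid point for each item
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - difficulties[None, :])))
    likelihood = np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)

    posterior = prior * likelihood
    posterior /= posterior.sum()

    eap = np.sum(grid * posterior)
    post_var = np.sum((grid - eap) ** 2 * posterior)
    return eap, post_var, 1.0 - post_var / prior_sd**2

# No item responses: the prediction is just the population mean
_, _, r_none = eap_and_reliability([], [])

# Twenty items with a made-up response pattern
difficulties = np.linspace(-2.0, 2.0, 20)
responses = (difficulties < 0.5).astype(int)
eap, post_var, r_e = eap_and_reliability(responses, difficulties)

print(r_none, eap, post_var, r_e)
```

With no responses the value is essentially zero, which matches the description of the baseline as a person randomly sampled from the population.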

23. EAP Reliability -- 3

24. EAP Reliability -- 4 Adams (2005) shows that under the marginal model the variance in the direct estimate of the mean is:

25. Properties of EAP Reliability. Shares all of the characteristics of person separation reliability. EAP reliability describes biases in population parameter estimates when based upon fallible measures. EAP reliability doesn't depend upon fit. EAP reliability doesn't describe the accuracy of individuals' measures; it describes how much a prediction has been improved. For many applications unreliability can be compensated for by larger samples.

26. Reliability: What is it good for? Evidence of fit to the IRT model? Evidence of test validity? Information about the accuracy of individuals' estimates?

27. Measurement Error Design Effect

28. Examples

29. Reliability and Design Effect: The Functional Relationship

30. Conclusion. Limited importance of reliability: it doesn't describe the accuracy of measurement of individuals; it doesn't indicate fit or validity; it can be compensated for by increased samples (if analyses are done correctly, which is another story). Perhaps most valuable as an indicator of loss of precision due to the test design.
