Reliability and Item Response Modelling: Myths, Observations and Applications

Presentation Transcript


    1. Reliability and Item Response Modelling: Myths, Observations and Applications
       Raymond J. Adams, University of Melbourne

    2. Overview
       - What is reliability?
       - Does reliability have a role in an IRT context? (JML, MML, CML)
       - How reliable can a test be?
       - Is high reliability important?

    3. Classical Reliability
       - Idea introduced by Spearman (1904): "…accidental errors attenuated relations between observed scores"
       - Reliability coefficient (Spearman, 1910): the correlation between one half and the other half of several measures of the same thing

    4. Spearman’s Definition
       Under the classical model:
       - Observed score is true score plus error
       - Reliability is true variance divided by observed variance
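
The equations on this slide did not survive in the transcript. In standard classical test theory notation (the symbols X, T, E and R_X are assumptions here, with R_X presumably the "RX" of the following slide titles), the two statements can be written as:

```latex
X = T + E, \qquad \operatorname{Cov}(T, E) = 0
\;\Longrightarrow\;
\sigma_X^2 = \sigma_T^2 + \sigma_E^2,
\qquad
R_X = \frac{\sigma_T^2}{\sigma_X^2}
    = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

The middle step relies on true score and error being uncorrelated, which is the usual classical assumption and is what keeps R_X between 0 and 1.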

    5. An aside
       - Standard formulae are estimates of this under certain assumptions
         - Kuder-Richardson formula 20 (KR-20)
         - Cronbach’s alpha
       - These have the features of what I describe in the following slides
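
As a rough illustration of the two coefficients named on this slide, the sketch below computes Cronbach's alpha and KR-20 from a small person-by-item score matrix. The function names and the toy data are hypothetical, and population variances are used throughout so that the two coefficients coincide for dichotomous items.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of persons, each a list of item scores."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across persons (population variance).
    item_vars = [pvariance([person[i] for person in scores]) for i in range(k)]
    # Variance of the total (sum) score across persons.
    total_var = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """KR-20 for 0/1 items: item variance is p * (1 - p)."""
    k = len(scores[0])
    n = len(scores)
    p = [sum(person[i] for person in scores) / n for i in range(k)]
    total_var = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)


# Hypothetical responses: 5 persons by 4 dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
kr = kr20(responses)
print(round(alpha, 3), round(kr, 3))
```

For 0/1 items the item variance equals p(1 - p), so KR-20 is simply alpha specialised to dichotomous scoring.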

    6. Properties of RX -- 1

    7. Properties of RX -- 2

    8. Properties of RX -- 3

    9. Properties of RX -- 4

    10. What about reliability and IRT?
        - Person separation reliability (Wright & Stone, 1979)
        - Assuming an unbiased estimator of ability
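
The formula on this slide is missing from the transcript. A commonly quoted form of person separation reliability is the observed variance of the ability estimates minus the mean squared standard error, divided by the observed variance. The sketch below assumes that form; the function name and the data are hypothetical.

```python
from statistics import variance


def person_separation_reliability(estimates, standard_errors):
    """(observed variance - mean error variance) / observed variance.

    Assumes the ability estimates are unbiased, so that the observed
    variance is (approximately) latent variance plus error variance.
    """
    observed_var = variance(estimates)  # sample variance of the estimates
    mean_error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed_var - mean_error_var) / observed_var


# Hypothetical ability estimates (logits) and their standard errors.
estimates = [-1.5, -0.5, 0.0, 0.5, 1.5]
standard_errors = [0.5] * 5
reliability = person_separation_reliability(estimates, standard_errors)
print(round(reliability, 3))
```

The numerator is an estimate of the "true" variance, which is why the coefficient mirrors Spearman's ratio of true to observed variance.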

    11. Some features of Person Separation Reliability

    12. Properties of Person Separation Reliability
        - Requires an unbiased ability estimate (Warm, JML?, not EAP)
        - Has all properties of Spearman’s definition
        - Implications
          - Variance estimates are biased
          - Correlations are biased
          - Loss of precision in population parameters is hidden

    13. How reliable can a test be?

    14. Measurement Error Design Effect

    15. Increasing the Accuracy of the Estimate of Group Means

    16. Reliability and Fit

    17. Reliability and Fit

    18. Summary so far
        - Person separation and classical reliability are analogous
        - Reliability doesn’t describe the accuracy of individuals’ measures
        - Reliability describes biases in population parameter estimates when based upon fallible measures
        - For many applications unreliability can be compensated for by larger samples
        - Reliability doesn’t depend on fit
        - Is reliability required for validity? I don’t think so
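
The point about larger samples can be made concrete for a group mean. Under the classical model the observed variance is the true variance divided by the reliability, so the sampling variance of a mean of n fallible scores is inflated by a factor of 1 / reliability. The sketch below assumes simple random sampling and this simple reading of a "measurement error design effect"; it is not necessarily the definition used on the design-effect slides, whose content is missing from the transcript. The function names are hypothetical.

```python
import math


def variance_of_mean(true_var, reliability, n):
    """Sampling variance of the mean of n observed (fallible) scores."""
    observed_var = true_var / reliability  # since R = true_var / observed_var
    return observed_var / n


def sample_size_to_match(n_error_free, reliability):
    """Sample size needed to match the precision of n error-free measures."""
    return math.ceil(n_error_free / reliability)


# With reliability 0.8, 500 fallible scores give the same precision
# for the mean as 400 error-free scores would.
print(variance_of_mean(1.0, 0.8, 500), 1.0 / 400)
print(sample_size_to_match(400, 0.5))
```

This only covers the precision of a mean; biases in variances and correlations are a separate matter and are not removed by a larger sample.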

    19. Reliability and Marginal IRT Models
        - Abilities often not estimated
          - Population parameters are directly estimated from item responses
          - Reliability as ratio of true to estimated variance is meaningless
        - If abilities are estimated (EAP) they are biased
          - The observed variance is less than the latent variance
          - Reliability as ratio of true to estimated variance is greater than one
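
The shrinkage of EAP estimates can be seen in a deliberately simplified setting. The sketch below is not an IRT model: it assumes a normal latent distribution with mean zero and a single noisy normal measurement per person, in which case the posterior mean is the observed value multiplied by latent_var / (latent_var + error_var). All names and parameter values are hypothetical.

```python
import random
from statistics import pvariance


def simulate_eap_variance(latent_var=1.0, error_var=0.5, n=20000, seed=1):
    """Return (variance of simulated EAPs, theoretical value shrink * latent_var)."""
    rng = random.Random(seed)
    shrink = latent_var / (latent_var + error_var)
    eaps = []
    for _ in range(n):
        theta = rng.gauss(0.0, latent_var ** 0.5)             # latent ability
        observed = theta + rng.gauss(0.0, error_var ** 0.5)   # noisy measure
        eaps.append(shrink * observed)  # posterior mean under a N(0, latent_var) prior
    return pvariance(eaps), shrink * latent_var


eap_var, expected_var = simulate_eap_variance()
print(round(eap_var, 3), round(expected_var, 3))
```

In this toy model the variance of the EAPs is the latent variance multiplied by the shrinkage factor, so the ratio of latent to EAP variance exceeds one, matching the last bullet on the slide.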

    20. Expected a-posteriori Predictions
        - The EAP is the mean of the posterior distribution
        - The variance of the posterior is used to represent uncertainty
        - EAP can be viewed as predictions
        - The posterior variance is uncertainty in that prediction

    21. EAP Reliability -- 1
        Mislevy, Beaton, Kaplan and Sheehan (1992) argued that reliability can be viewed as the amount by which the measurement process has reduced uncertainty in the prediction of each individual’s ability.

    22. EAP Reliability -- 2
        rE is an individual-level reliability that explains how much we have improved the prediction of this individual’s ability over assuming they were randomly sampled from the population and no item responses were observed.
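
The expression for rE is not preserved in the transcript. One way to formalise the description above is one minus the ratio of an individual's posterior variance to the population (prior) variance. The sketch below assumes that form, a Rasch model with known item difficulties, and a normal prior with mean zero evaluated on a simple grid; the function names and data are hypothetical.

```python
import math


def rasch_posterior(responses, difficulties, prior_sd=1.0, n_points=81):
    """Return (EAP, posterior variance) for one person on a fixed grid."""
    half_width = 6.0 * prior_sd
    step = 2.0 * half_width / (n_points - 1)
    grid = [-half_width + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        log_w = -0.5 * (theta / prior_sd) ** 2  # log of the normal prior kernel
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))  # Rasch success probability
            log_w += math.log(p if x == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    eap = sum(t * w for t, w in zip(grid, weights)) / total
    post_var = sum((t - eap) ** 2 * w for t, w in zip(grid, weights)) / total
    return eap, post_var


def eap_reliability(post_var, prior_var):
    """Proportional reduction in uncertainty relative to the prior."""
    return 1.0 - post_var / prior_var


# Hypothetical person: 5 items, 3 answered correctly.
eap, post_var = rasch_posterior([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
r_e = eap_reliability(post_var, prior_var=1.0)
print(round(eap, 3), round(r_e, 3))
```

With no item responses the posterior equals the prior and the coefficient is zero, which matches the slide's reference point of a person sampled at random from the population.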

    23. EAP Reliability -- 3

    24. EAP Reliability -- 4
        Adams (2005) shows that under the marginal model the variance in the direct estimate of the mean is:

    25. Properties of EAP Reliability
        - Shares all of the characteristics of person separation reliability
        - EAP reliability describes biases in population parameter estimates when based upon fallible measures
        - EAP reliability doesn’t depend upon fit
        - EAP reliability doesn’t describe the accuracy of individuals’ measures; it describes how much a prediction has been improved
        - For many applications unreliability can be compensated for by larger samples

    26. Reliability: What is it good for?
        - Evidence of fit to the IRT model?
        - Evidence of test validity?
        - Information about the accuracy of individuals’ estimates?

    27. Measurement Error Design Effect

    28. Examples

    29. Reliability and Design Effect: The Functional Relationship

    30. Conclusion
        Limited importance of reliability:
        - Doesn’t describe accuracy of measurement of individuals
        - Doesn’t indicate fit or validity
        - Can be compensated for by increased samples (if analyses are done correctly, which is another story)
        - Perhaps most valuable as an indicator of loss of precision due to the test design
