Two Approaches to Estimation of Classification Accuracy Rate Under Item Response Theory

Quinn N. Lathrop and Ying Cheng
Assistant Professor; Ph.D., University of Illinois at Urbana-Champaign; Undergraduate Institution: University of California, Santa Cruz

Introduction • Simulation • Results • Discussion

Introduction
• Classification consistency is the degree to which examinees would be classified into the same performance categories over parallel replications of the same assessment (Lee, 2010).
• Classification accuracy refers to the extent to which actual classifications using observed cut scores agree with “true” classifications based on known true cut scores (Lee, 2010).
• Distribution method: estimates the true score distribution and the observed score distribution.
• Individual/person method: works from each individual’s classification status.
• True CA increases with test length.
• True CA decreases when the cut score is located near the mean of the examinee distribution.

Classification accuracy (CA) rate
• 1. Total summed scores: the Lee approach
• Bergeson (2007): language proficiency tests; CA increased with grade
• 2. Latent trait estimates: the Rudner approach
• Sireci et al. (2008): math tests
• Examinees > 20: good estimate

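The Lee Approach
• The marginal probability of the total summed score X is given by
$$\Pr(X = x) = \int_{-\infty}^{\infty} \Pr(X = x \mid \theta)\, g(\theta)\, d\theta,$$
where Pr(X = x | θ) is the conditional summed-score distribution and g(θ) is the density of θ.
• Let x_1, x_2, ..., x_{K−1} denote a set of observed cut scores that are used to classify examinees into K categories. Given the conditional summed-score distribution and the cut scores, the conditional category probability can be computed by summing conditional summed-score probabilities for all x values that belong to category h:
$$p_\theta(h) = \sum_{x = x_{h-1}}^{x_h - 1} \Pr(X = x \mid \theta), \qquad h = 1, 2, \ldots, K,$$
with x_0 = 0 and x_K equal to the maximum possible score plus one.
• Expected summed scores can be obtained from the θ cut scores as
$$\tau = \sum_{j=1}^{J} \sum_{m=0}^{M} m \, \Pr(V_j = m \mid \theta),$$
where V_j is the response to item j, scored m = 0, 1, ..., M, and J is the test length.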
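The conditional summed-score distribution and the category probabilities above can be sketched in a few lines. This is an illustration only, not the authors' code: it assumes dichotomous 2PL items, computes Pr(X = x | θ) with the Lord-Wingersky recursion (a standard shortcut that gives the same result as summing over response patterns), and uses hypothetical item parameters and cut scores.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (model assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def summed_score_dist(theta, items):
    """Pr(X = x | theta) for x = 0..J via the Lord-Wingersky recursion."""
    dist = [1.0]  # with zero items, the score is 0 with probability 1
    for a, b in items:
        p = p_correct(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for x, pr in enumerate(dist):
            new[x] += pr * (1.0 - p)  # item answered incorrectly
            new[x + 1] += pr * p      # item answered correctly
        dist = new
    return dist

def category_probs(dist, cuts):
    """Sum Pr(X = x | theta) within each category.

    `cuts` holds the lowest summed score of categories 2..K.
    """
    bounds = [0] + list(cuts) + [len(dist)]
    return [sum(dist[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])]

def expected_score(theta, items):
    """Expected summed score tau at a given theta."""
    return sum(p_correct(theta, a, b) for a, b in items)

# Hypothetical (a, b) item parameters and observed cut scores.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]
dist = summed_score_dist(0.0, items)
probs = category_probs(dist, cuts=[2, 4])  # categories {0,1}, {2,3}, {4}
```

The same `expected_score` function maps a θ cut score to the summed-score metric, which is how the expected summed scores at the cut points would be obtained in this sketch.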

• Suppose a set of true cut scores on the summed-score metric, τ_1, τ_2, ..., τ_{K−1}, determines the true categorical status of each examinee with θ or τ (i.e., expected summed score).
• If the true categorical status, η (= 1, 2, ..., K), of an examinee is known, the conditional probability of accurate classification is simply
$$\gamma_\theta = p_\theta(\eta),$$
where p_θ(η) is the conditional probability of being classified into category η given θ.
• The true category η can be determined by comparing the expected summed score for θ with the true cut scores.
• The marginal classification accuracy index, γ, is given by
$$\gamma = \int_{-\infty}^{\infty} \gamma_\theta \, g(\theta)\, d\theta.$$
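A minimal sketch of the conditional and marginal accuracy indices follows. It makes several assumptions that are not in the slides: dichotomous 2PL items, true cut scores equal to the observed cut scores, a standard normal g(θ), and a simple equally spaced grid in place of formal quadrature. All function names and parameter values are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def summed_score_dist(theta, items):
    """Pr(X = x | theta) via the Lord-Wingersky recursion."""
    dist = [1.0]
    for a, b in items:
        p = p_correct(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for x, pr in enumerate(dist):
            new[x] += pr * (1.0 - p)
            new[x + 1] += pr * p
        dist = new
    return dist

def category_of(score, cuts):
    """0-based category index; a score equal to a cut falls in the upper category."""
    return sum(score >= c for c in cuts)

def conditional_accuracy(theta, items, cuts):
    """gamma_theta: probability that X lands in the true category eta."""
    dist = summed_score_dist(theta, items)
    tau = sum(p_correct(theta, a, b) for a, b in items)  # expected summed score
    eta = category_of(tau, cuts)                          # true category
    return sum(pr for x, pr in enumerate(dist) if category_of(x, cuts) == eta)

def marginal_accuracy(items, cuts, n_points=41, lo=-4.0, hi=4.0):
    """Approximate gamma on a grid weighted by the N(0, 1) density (assumed g)."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + i * step for i in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(weights)
    return sum(w / total * conditional_accuracy(t, items, cuts)
               for t, w in zip(thetas, weights))

items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]  # hypothetical
gamma = marginal_accuracy(items, cuts=[2])
```

Conditional accuracy is lowest for θ values whose expected summed score sits near a cut, which is why the marginal index drops when the cut is close to the center of the examinee distribution.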

The Lee Approach
• Assumes that classifications are made on the basis of the total score x.
• Response pattern V′ = (V_1, V_2, V_3, ..., V_J)
• V_j is the response to item j; j = 1, 2, ..., J
• J is the test length
• V_j can take values m = 0, 1, ..., M
• Its goal is to find the probability of each possible total score, P(X = x | θ), by summing the probabilities of all possible response patterns that would lead to that total score given θ. It then aggregates these probabilities according to the cut scores: the probability of scoring in category k is the sum of P(X = x | θ) over all total scores x that fall in category k.

CA
• Using the sample estimate θ̂, the conditional CA estimate under the Lee approach is the probability that an examinee’s total score and his or her estimated expected true score based on θ̂ fall into the same category.

Rudner-Based Indices
• Inputs: C + 1 cut scores, estimated examinee scores, and standard error estimates
• The index works from the expected probability of scoring in each performance-level category c under these assumptions.
• Define an N × C matrix of weights; the index can then be written in terms of these weights.

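Rudner approach
• The cut scores are aligned on the latent trait scale.
• It is based on normally distributed measurement error.
• The probability of scoring in category k with the Rudner approach is calculated as
$$p_k(\theta) = \Phi\!\left(\frac{c_k - \theta}{SE(\theta)}\right) - \Phi\!\left(\frac{c_{k-1} - \theta}{SE(\theta)}\right),$$
where c_{k−1} and c_k are the lower and upper cut scores of category k, SE(θ) is the standard error, and Φ is the standard normal distribution function.
• Conditional CA is the probability of being placed in the category that the examinee truly belongs to, given θ.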
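As a rough illustration of the Rudner calculation, the sketch below computes category probabilities as differences of normal distribution functions at adjacent cut scores. The function names, the cut score, and the standard error are hypothetical; the outermost categories are given bounds of minus and plus infinity.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rudner_category_probs(theta, se, cuts):
    """Probability of scoring in each category, given theta and its standard error."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    cdf = [norm_cdf((c - theta) / se) for c in bounds]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

def rudner_conditional_accuracy(theta, se, cuts):
    """Probability of landing in the category that theta itself falls in."""
    probs = rudner_category_probs(theta, se, cuts)
    k = sum(theta >= c for c in cuts)  # 0-based index of the true category
    return probs[k]

# Hypothetical example: one cut score at 0.0 and a standard error of 0.3.
probs = rudner_category_probs(0.5, 0.3, cuts=[0.0])
acc = rudner_conditional_accuracy(0.5, 0.3, cuts=[0.0])
```

An examinee sitting exactly on a cut score gets a conditional accuracy of 0.5 with a single cut, and the value approaches 1 as θ moves away from the cut relative to the standard error.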

Marginal Indices
• With the D-method, or distribution-based method, the marginal CA is found by integrating the conditional CA over the θ domain. In practice, estimated quadrature points and weights are used and the integrals are replaced by summations.
• The P-method is person based: it simply averages the conditional indices computed for each examinee in the sample (using the individual θ estimates).
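The difference between the two marginal indices can be sketched as follows. For brevity the conditional CA here is a Rudner-style index with a single cut score and a constant standard error, and the D-method uses an equally spaced grid weighted by the N(0, 1) density rather than estimated quadrature points; both choices are simplifying assumptions, and all names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_ca(theta, se, cut):
    """Rudner-style conditional accuracy with a single cut score."""
    p_above = 1.0 - norm_cdf((cut - theta) / se)
    return p_above if theta >= cut else 1.0 - p_above

def d_method(cut, se, n_points=40, lo=-4.0, hi=4.0):
    """Distribution-based: weighted sum of conditional CA over grid points."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + i * step for i in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]  # assumed N(0, 1) density
    total = sum(weights)
    return sum(w / total * conditional_ca(t, se, cut)
               for t, w in zip(thetas, weights))

def p_method(theta_hats, ses, cut):
    """Person-based: plain average of conditional CA over examinees."""
    values = [conditional_ca(t, s, cut) for t, s in zip(theta_hats, ses)]
    return sum(values) / len(values)

# Hypothetical estimates for five examinees.
theta_hats = [-1.2, -0.3, 0.1, 0.8, 1.5]
ses = [0.35, 0.30, 0.30, 0.32, 0.40]
ca_d = d_method(cut=0.0, se=0.3)
ca_p = p_method(theta_hats, ses, cut=0.0)
```

The D-method needs a distributional assumption for θ (here the normal weights), whereas the P-method needs only the examinees' own estimates and standard errors.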

Simulation
• Dichotomous:
• Items: 10 to 80 in steps of 10
• Models: 1PL, 2PL, 3PL
• Difficulty parameters: N(0, 1)
• Discrimination parameters: narrow N(0, 0.3) / wider N(0, 0.5)
• Guessing: U(0, 0.25)
• Polytomous:
• Items: 10, 20, 40, and 80
• GRM with five ordered response categories
• Threshold 1: N(1, 0.5); thresholds 2-4: N(1, 0.2)
• D-methods: 40 quadrature points and weights from N(0, 1)
• P-methods: N = 250, 500, 1000; θ drawn from N(0, 1)
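One cell of the dichotomous design could be generated roughly as below. This is a sketch under stated assumptions, not the study's actual procedure: it uses the 2PL only, treats the second argument of each normal distribution as a standard deviation, and treats the N(0, 0.3) draw as a log-discrimination so that discriminations stay positive. The function name and seed are hypothetical.

```python
import math
import random

def simulate_2pl(n_persons, n_items, disc_sd=0.3, seed=0):
    """Draw item parameters, abilities, and 0/1 responses under a 2PL model."""
    rng = random.Random(seed)
    b = [rng.gauss(0.0, 1.0) for _ in range(n_items)]                # difficulties
    # Assumption: the normal draw is exponentiated (lognormal discrimination).
    a = [math.exp(rng.gauss(0.0, disc_sd)) for _ in range(n_items)]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]         # abilities
    responses = []
    for theta in thetas:
        row = []
        for a_j, b_j in zip(a, b):
            p = 1.0 / (1.0 + math.exp(-a_j * (theta - b_j)))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return thetas, a, b, responses

thetas, a, b, responses = simulate_2pl(n_persons=250, n_items=10)
total_scores = [sum(row) for row in responses]  # observed total scores X
```

Because the true θ values are kept, the true category of every simulated examinee is known, which is what makes an empirical accuracy rate computable.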

Two empirical accuracies were calculated:
• one based on making a classification with θ̂, and
• the other based on the observed total score X.

RESULTS

1PL • Sample size • SE • Bias • D-method > Lee • Shorter test

2PL
• Classifications made on the basis of θ̂ were more accurate than classifications made on the basis of x. When discrimination parameters vary more between items, the superiority of using θ̂ over x is more pronounced.

DISCUSSION
• Results indicate that if the classification is made with x, Lee’s approach estimated the accuracy well. Lee’s approach, when coupled with the P-method, was slightly positively biased for short tests. While the D-method performed as well as or better than the P-method, the D-method required an assumption about the distribution of the latent trait.
• Rudner’s approach estimated the true accuracy of using θ̂ well, but the pattern of bias changed with the IRT model.

• Model fit will affect both the Rudner and Lee approaches
• Item parameters and ability distribution are unknown in practice
• Multiple cut scores
• Cognitive diagnostic models
• Multiple dimensions or multiple tests
• Parameter accuracy?

• The wrong model
• Robustness of the Lee and Rudner approaches
• Signal detection theory
• Conditional false positive/negative error rates

Thank you
