Detecting Item Parameter Drift in a CAT program using the Rasch Measurement Model

Mayuko Simon, David Chayer, Pam Hermann, and Yi Du • Data Recognition Corporation • April 2012

How should banked item parameters be checked? • The idea for this study came about when the authors were faced with a large existing bank of CAT items with estimated item parameters that needed augmentation.

Re-calibration of banked item parameters and item parameter drift • Recalibration is recommended at periodic intervals • CAT item data form a sparse matrix, and the range of student ability for each item is limited

What would be a reasonable way to recalibrate items? • The methods can be applied to • Maintenance of CAT item bank • Detecting item parameter drift • Calibration of field test items

How did other researchers calibrate/re-calibrate CAT data? • Impute missing responses to avoid sparseness (Harmes, Parshall, and Kromrey, 2003) • Calibrate field test (FT) items by anchoring operational items (Wang and Wiley, 2004) • Calibrate FT items by anchoring person ability (Kingsbury, 2009) • Use ability estimates to calibrate item parameters and detect drift (Stocking, 1988)

Simulation study • 300 items in item bank • 20,000 students’ simulated responses, N(0,1) • Known item parameter drift (10% of item bank) • Various drift sizes
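
The setup above can be sketched in a few lines. This is only an illustration, not the authors' simulation: it assumes that N(0,1) refers to student ability, draws banked difficulties from N(0,1) (the slides do not give that distribution), uses a hypothetical drift size of 0.5 logits in one direction, and generates a full response matrix rather than simulating adaptive item selection.

```python
import numpy as np

def simulate_rasch_responses(theta, b, rng):
    """Draw 0/1 responses under the Rasch model for every person-item pair."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(2012)
n_items, n_students = 300, 20_000

theta = rng.normal(0.0, 1.0, n_students)   # assumed: ability ~ N(0, 1)
b_bank = rng.normal(0.0, 1.0, n_items)     # assumed difficulty distribution

# Apply known drift to 10% of the bank (30 items).
drift_items = rng.choice(n_items, size=n_items // 10, replace=False)
b_true = b_bank.copy()
b_true[drift_items] += 0.5                 # hypothetical drift size

# Full (non-adaptive) response matrix; a CAT would administer only a
# subset of items to each student, which is what makes real data sparse.
responses = simulate_rasch_responses(theta, b_true, rng)
```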

Design

Four calibration methods in this study • Anchor person ability (AP) • Anchor person ability and the difficulty of 200 of the 300 items (API) • Use the Displacement value from Winsteps output • Item-by-item calibration (IBI)

IBI: Item-by-item calibration • A vector of responses to one item • A vector of ability estimates for the students who took that item • Same concept as logistic regression, but uses Winsteps to calibrate • No sparseness involved • Less data is needed (especially when not all items in a bank need to be checked)
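
The IBI idea can be illustrated without Winsteps. The sketch below is a stand-in, not the study's procedure: it treats the abilities as fixed and finds the maximum likelihood Rasch difficulty for a single item by Newton-Raphson, which is equivalent to a logistic regression with ability as a fixed offset and only an intercept to estimate. The function name `calibrate_item` and the example values are hypothetical.

```python
import numpy as np

def calibrate_item(responses, theta, max_iter=50, tol=1e-6):
    """ML estimate of one Rasch item difficulty with abilities held fixed.

    Returns (difficulty, standard_error).
    """
    x = np.asarray(responses, dtype=float)
    theta = np.asarray(theta, dtype=float)
    if x.sum() == 0 or x.sum() == x.size:
        raise ValueError("all-correct or all-incorrect item has no finite estimate")

    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        info = np.sum(p * (1.0 - p))
        # Newton-Raphson step: (expected score - observed score) / information
        step = (p.sum() - x.sum()) / info
        step = float(np.clip(step, -1.0, 1.0))  # damp large steps
        b += step
        if abs(step) < tol:
            break

    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    se = 1.0 / np.sqrt(np.sum(p * (1.0 - p)))
    return b, se

# Example: one item seen only by students in a limited ability range.
rng = np.random.default_rng(0)
theta_seen = rng.normal(0.3, 0.5, 2000)   # hypothetical abilities
b_item = 0.4                              # hypothetical true difficulty
x_item = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(theta_seen - b_item)))).astype(int)
b_hat, se_hat = calibrate_item(x_item, theta_seen)
```

Because only one item's response vector and the matching ability vector are needed, the sparse person-by-item matrix never has to be built.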

Evaluation • One sample t-test with alpha 0.01 for AP, API, and IBI • Cutoff value 0.4 for Displacement method • Type I error rate • Type II error rate • Sensitivity (Type II + Sensitivity = 1) • RMSE (average difference from banked value for flagged items) • BIAS (average bias from banked value for flagged items)
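
The evaluation measures above can be computed from a vector of flags and the known drift status of each item. This is a minimal sketch under stated assumptions: RMSE is read as the root mean squared difference between recalibrated and banked difficulty over flagged items, BIAS as the mean signed difference over the same items, and the function name `evaluate_flags` and the toy numbers are hypothetical.

```python
import numpy as np

def evaluate_flags(flagged, drifted, b_recal, b_bank):
    """Summarize drift detection for one replication."""
    flagged = np.asarray(flagged, dtype=bool)
    drifted = np.asarray(drifted, dtype=bool)
    diff = (np.asarray(b_recal, dtype=float) - np.asarray(b_bank, dtype=float))[flagged]
    return {
        # Non-drifted items that were flagged.
        "type1": float(np.mean(flagged[~drifted])),
        # Drifted items that were missed.
        "type2": float(np.mean(~flagged[drifted])),
        # Drifted items that were flagged (1 - Type II).
        "sensitivity": float(np.mean(flagged[drifted])),
        # Assumed definitions, computed over flagged items only.
        "rmse": float(np.sqrt(np.mean(diff ** 2))) if diff.size else float("nan"),
        "bias": float(np.mean(diff)) if diff.size else float("nan"),
    }

# Toy example with six items, two of which truly drifted.
drifted = [True, True, False, False, False, False]
flagged = [True, False, True, False, False, False]
b_bank = [0.0, 0.5, -0.3, 1.0, -1.0, 0.2]
b_recal = [0.6, 0.8, -0.5, 1.0, -1.1, 0.2]
result = evaluate_flags(flagged, drifted, b_recal, b_bank)
```

In the study these quantities are averaged over 40 replications; the function here covers a single replication.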

Type I error rate * Average over 40 replications • Type I error for Control is also inflated • Condition 1 had higher Type I error rate

Type II error rate * Average over 40 replications • Type II error for Displacement method is too high. • Condition 1 had higher Type II error rate

Sensitivity * Average over 40 replications • Sensitivity for Displacement method is too low. • Condition 1 had lower sensitivity rate

Items with small sample sizes and small drift are difficult to flag correctly.

Type II errors occurred for items with a small sample size and/or small drift • [Figure: panels for items with large drift, items with small N, and items with small drift; annotations mark the same items]

Which method has re-calibrated item difficulty closer to the banked value? • The median RMSE is similar across the three methods • IBI has less variance in RMSE than AP

Which method has less bias in the re-calibrated item difficulty? • All three methods have very small bias • IBI has less variance in BIAS than AP

Conclusion • Use caution with the Displacement value to identify item parameter drift. • AP, API, and IBI worked reasonably well. • Item parameter drift is difficult to detect for items with small drift or small sample sizes. • Compared to AP, IBI had less variance in RMSE and BIAS. • Item parameter drift in one direction (condition 1) would cause more bias in the final ability estimate, leading to higher Type I and Type II errors.

Limitations and Future Study • The proportion of items with item parameter drift was 10% of the bank. • How would the results change with various proportions? How about the size of drift? • Only the Rasch model was used. • How about other models and software? • The minimum sample size was 10. • How about different minimum sample sizes (e.g., 30, 50, etc.)? • No iterative procedure (no update of the item difficulty for items with drift). • Do the results get better if we proceed iteratively, updating the difficulty after detecting drift?
