Software for IRT and Plausible Value Imputations Matthias von Davier

Software for IRT models
• This is a vast and growing field!
• Special-purpose software (free or $$$)
  • Faster, well tested, used in operational analyses
• General-purpose software, IRT (R) packages
  • Slow(ish); some are well tested, some were developed and then abandoned
• Tweaking existing packages to estimate IRT models and IRT model extensions, or DIY programs
  • e.g. WinBUGS, JAGS, Stan, or Python scripts
• This topic could fill one day or more

My IRT Software Development Path (since 1990):
• Lacord, Polyra (1990-1993, Fortran 77)
• Winmira (1994, GFA Basic)
• Winmira2001 (1997-2004, Delphi)
• mgroup/ygroup/mcmcgroup/saemgroup (2000-2010, Fortran)
• mdltm (since 2005, ANSI C)
• RPCM (2017, Python 2.7)
• Extended Ising Models (2018, Python 3.6)

Special Purpose IRT Software
• Rasch (quite incomplete list)
  • Polyra (Rost), WINMIRA (1994, 2001, von Davier)
  • Winsteps, Bigsteps, Facets (Wright, Linacre)
  • RUMM (Andrich), OPLM (Verhelst), Multira (Carstensen)
  • Quest, Conquest (Masters, Wu, Adams, …)
• 2PL and MIRT (quite incomplete list)
  • Logist (Wingersky et al.), LPCM (Fischer)
  • Parscale, Bilog (Bock, Mislevy, Muraki, etc.)
  • Multilog (Thissen), flexMIRT (Cai), IRTPRO
  • MIRT (Glas), MIRT (Haberman)
  • mdltm (von Davier), new Conquest (Adams, Wu, …)

mdltm: Used in PISA and PIAAC
• IRT: Rasch, polytomous Rasch, 2PL, GPCM
• Mixture IRT
• Latent Class Models, located latent class models
• Cognitive Diagnostic Models (CDMs)
• Multidimensional IRT Models
• Multilevel and Mixture MIRT and CDMs
• Multiple-population models
• Global model fit, item fit, person fit
• EAP, MLE, WLE for person ability estimates
• …
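For readers who want the IRT models named above in concrete form, here is a minimal Python sketch of the 2PL and GPCM response functions. It uses one common textbook parameterization (slope `a`, difficulty `b`, step parameters `thresholds`). The function names and this parameterization are illustrative assumptions and are not taken from mdltm.

```python
import math


def irf_2pl(theta, a, b):
    """P(correct | theta) under the 2PL; setting a = 1 gives the Rasch model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def gpcm_probs(theta, a, thresholds):
    """Category probabilities (categories 0..m) under the GPCM.

    `thresholds` holds the m step parameters b_1..b_m (assumed parameterization).
    """
    # Cumulative logits: category 0 is the reference category with logit 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(irf_2pl(theta=0.0, a=1.0, b=0.0))
    print(gpcm_probs(theta=0.0, a=1.2, thresholds=[-0.5, 0.8]))
```

With a single step parameter the GPCM reduces to the 2PL, which is a quick sanity check on any implementation.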

mdltm: Used in PISA and PIAAC
• von Davier, M. (2016). High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. ETS Research Report Series, 2016: 1–11. doi:10.1002/ets2.12120
• von Davier, M. (2017). New Results on an Improved Parallel EM Algorithm for Estimating Generalized Latent Variable Models. In: van der Ark L., Wiberg M., Culpepper S., Douglas J., Wang WC. (eds) Quantitative Psychology. IMPS 2016. Springer Proceedings in Mathematics & Statistics, vol 196. Springer
• von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, Vol. 61, No. 2 (November), pp. 287-307. https://doi.org/10.1348/000711007X193957
• von Davier, M., Yamamoto, K., Shin, H.-J., Chen, H., Khorramdel, L., Weeks, J., Davis, S., Kong, N., & Kandathil, M. (2019). Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1586642
• Shin, H.J., Khorramdel, L. & von Davier, M. (2019). GDM Software MDLTM Including Parallel EM Algorithm. Chapter 30 in: von Davier, M. & Lee, Y-S. (eds.): Handbook of Diagnostic Classification Models. Springer: New York.

Some R packages for IRT
• eRm (Mair, Hatzinger)
• mRm: mixture Rasch (Preinerstorfer)
• TAM (Wu, Ping, Robitzsch, et al.)
• mlirt (Fox), LNIRT (Fox)
• mirt (Chalmers)
• lme4 (Bates, Maechler et al.)
• …

Stata support for IRT
• Stata IRT module
  • https://www.stata.com/manuals/irt.pdf
• gllamm (Skrondal, Rabe-Hesketh)
  • http://www.gllamm.org/faqs/models/irtfitb.html
• RaschTest (Hardouin)
  • https://www.stata-journal.com/article.html?article=st0119
• StataStan for IRT
  • https://arxiv.org/pdf/1601.03443.pdf

Software for Extended IRT
• lme4
  • Explanatory IRT models (De Boeck & Wilson, etc.)
• PyStan / RStan
  • Several papers and the Stan manual give IRT examples
  • Uses Hamiltonian Monte Carlo and the NUTS sampler (Gelman et al.)
  • Can be used to estimate IRT as well as extended IRT and speed models, e.g.:
    • Engagement, speed & ability model (Ulitzsch, von Davier & Pohl ’19)
  • Needs to be ‘programmed’, but standard IRT Stan scripts exist
    • https://mc-stan.org/docs/2_19/stan-users-guide/item-response-models-section.html
  • Slow(ish): fully Bayesian approach / exploring posteriors
  • Very flexible, new models can be developed “easily”
• Later implementations using ML frameworks seem promising

Generating Plausible Values
• Most IRT software produces point estimates
  • JMLE (only good for very long tests)
  • MML and then EAP, WLE, or MLE
  • CML (Rasch or OPLM only) and then EAP…
• Plausible values are a different animal
  • Not ideal(!) for anything, but good for many things, unless ‘outside’ variables (variables not included in the imputation model) are used
  • PVs are imputations from the posterior distribution of proficiency, given responses and covariates
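To make the contrast between a point estimate (EAP) and plausible values concrete, here is a minimal Python sketch for one test taker. It assumes a Rasch model with known item difficulties, a normal prior whose mean is a linear function of the covariates, and a discrete grid approximation of the posterior. The function names, the grid, and the regression weights `gammas` are illustrative assumptions. Operational programs such as those in the GROUP family are considerably more elaborate.

```python
import math
import random


def log_lik_rasch(theta, responses, difficulties):
    """Rasch log-likelihood of scored responses (1/0); None marks a missing item."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        if x is None:
            continue  # skip items that were not administered
        z = theta - b
        total += -math.log1p(math.exp(-z)) if x == 1 else -math.log1p(math.exp(z))
    return total


def posterior_on_grid(responses, difficulties, prior_mean, prior_sd, grid):
    """Normalized posterior weights of theta on a fixed grid."""
    log_post = [
        log_lik_rasch(t, responses, difficulties)
        - 0.5 * ((t - prior_mean) / prior_sd) ** 2
        for t in grid
    ]
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def draw_plausible_values(responses, difficulties, covariates, gammas,
                          intercept=0.0, resid_sd=1.0, n_pv=5, seed=0):
    """Return (EAP, list of n_pv plausible values) for one person."""
    rng = random.Random(seed)
    # Assumed latent regression: the prior mean depends on the covariates.
    prior_mean = intercept + sum(g * z for g, z in zip(gammas, covariates))
    grid = [-6.0 + 0.1 * k for k in range(121)]
    post = posterior_on_grid(responses, difficulties, prior_mean, resid_sd, grid)
    eap = sum(t * p for t, p in zip(grid, post))      # posterior mean (point estimate)
    pvs = rng.choices(grid, weights=post, k=n_pv)     # random draws from the posterior
    return eap, pvs


if __name__ == "__main__":
    eap, pvs = draw_plausible_values(
        responses=[1, 0, 1, None], difficulties=[-1.0, 0.0, 0.5, 1.0],
        covariates=[1.0], gammas=[0.5])
    print(eap, pvs)
```

The EAP is a single number per person, while the PVs are several random draws from the same posterior. The draws depend on the covariates only through the prior mean, which is why variables left out of that model are not reflected in the PVs.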

Generating Plausible Values
• Occasionally, the packages above can also generate PVs, but typically not with the same complexity of background information.
• Some functionality exists in:
  • TAM
  • dexter
  • Mplus
  • miceadds (addition to mice)
• Several R packages can use PVs, however…

Generating Plausible Values
• The software selection is much smaller if the criterion is operational use or support for large numbers of background variables:
  • MGROUP (Mislevy & Sheehan, 1992)
  • CGROUP (Thomas, 1993)
  • YGROUP (von Davier, 2004)
  • MCEMGROUP (von Davier & Sinharay, 2007)
  • SAEMGROUP (von Davier & Sinharay, 2010)
  • Conquest (Adams, Wu, …)

IRT and Generating PVs
• Summary:
  • Complex IRT calibrations need very thorough QC, just like all statistical modeling / estimation
  • PVs are the products of a complex imputation model. No single model is ‘right’ for all purposes
  • Point estimates (and posterior variance or measurement error) can be generated by most IRT software packages
  • Best to build a custom model containing all needed variables (challenging for practitioners)
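Because PVs are multiple imputations, an analysis is typically run once per PV and the results are then combined. The slides do not spell this step out. The sketch below applies the standard multiple-imputation combining rules (Rubin's rules) in Python. The function name is hypothetical, and the per-PV sampling variances are assumed to be supplied by the caller.

```python
import statistics


def combine_pv_estimates(estimates, sampling_variances):
    """Combine one statistic computed separately on each of M plausible values.

    Returns (combined estimate, total variance) using Rubin's rules:
    total variance = mean sampling variance + (1 + 1/M) * between-PV variance.
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two PVs and one variance per estimate")
    q_bar = sum(estimates) / m                    # combined point estimate
    u_bar = sum(sampling_variances) / m           # average within-PV variance
    between = statistics.variance(estimates)      # between-PV variance (M - 1 divisor)
    total_var = u_bar + (1.0 + 1.0 / m) * between
    return q_bar, total_var


if __name__ == "__main__":
    # Hypothetical group means and sampling variances from five PVs.
    est, var = combine_pv_estimates(
        [501.2, 499.8, 500.5, 502.0, 500.1],
        [4.0, 4.1, 3.9, 4.2, 4.0])
    print(est, var ** 0.5)
```

Analysing a single PV, or averaging the PVs per person before the analysis, drops the between-PV component and therefore understates the uncertainty.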
