Software for IRT and Plausible Value Imputations
Matthias von Davier
Software for IRT models
• This is a vast and growing field!
• Special-purpose software (free or $$$)
  • Faster, well tested, used in operational analyses
• General-purpose software, IRT (R) packages
  • Slow(ish); some are well tested, some were developed and then abandoned
• Tweaking existing packages to estimate IRT and IRT model extensions, or DIY programs
  • e.g. WinBUGS, JAGS, Stan, or Python scripts
• This topic could fill one day or more
My IRT Software Development Path (since 1990):
• Lacord, Polyra (1990-1993, Fortran 77)
• Winmira (1994, GFA Basic)
• Winmira 2001 (1997-2004, Delphi)
• mgroup/ygroup/mcmcgroup/saemgroup (2000-2010, Fortran)
• mdltm (since 2005, ANSI C)
• RPCM (2017, Python 2.7)
• Extended Ising Models (2018, Python 3.6)
Special Purpose IRT Software
• Rasch (quite incomplete list)
  • Polyra (Rost), WINMIRA (1994, 2001, von Davier)
  • Winsteps, Bigsteps, Facets (Wright, Linacre)
  • RUMM (Andrich), OPLM (Verhelst), Multira (Carstensen)
  • Quest, Conquest (Masters, Wu, Adams, …)
• 2PL and MIRT (quite incomplete list)
  • Logist (Wingersky et al.), LPCM (Fischer)
  • Parscale, Bilog (Bock, Mislevy, Muraki, etc.)
  • Multilog (Thissen), FlexMirt (Cai), IRTPro
  • MIRT (Glas), MIRT (Haberman)
  • mdltm (von Davier), new Conquest (Adams, Wu, …)
mdltm: Used in PISA and PIAAC
• IRT: Rasch, polytomous Rasch, 2PL, GPCM
• Mixture IRT
• Latent Class Models, located latent class models
• Cognitive Diagnostic Models (CDMs)
• Multidimensional IRT Models
• Multilevel and Mixture MIRT and CDMs
• Multiple-population models
• Global model fit, item fit, person fit
• EAP, MLE, WLE for person ability estimates
• …
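Of the person estimates listed above, the EAP is the mean of the posterior distribution of ability. The following is a minimal sketch of that idea, not mdltm's implementation. It assumes a Rasch model with known item difficulties, a standard normal prior, and a fixed evaluation grid; all function names and numbers are illustrative.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def posterior_on_grid(responses, difficulties, grid, prior_mean=0.0, prior_sd=1.0):
    """Normalized posterior weights of theta on a grid (normal prior assumed)."""
    weights = []
    for theta in grid:
        # Log prior (up to a constant) plus log likelihood of the responses.
        log_w = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            log_w += math.log(p) if x == 1 else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    return [w / total for w in weights]

def eap(responses, difficulties, grid):
    """Posterior mean (EAP) and posterior standard deviation of theta."""
    post = posterior_on_grid(responses, difficulties, grid)
    mean = sum(t * w for t, w in zip(grid, post))
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post))
    return mean, math.sqrt(var)

# Illustrative values: five items, grid from -4 to 4 in steps of 0.1.
grid = [-4.0 + 0.1 * i for i in range(81)]
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
eap_high, sd_high = eap([1, 1, 1, 1, 0], difficulties, grid)
eap_low, sd_low = eap([1, 0, 0, 0, 0], difficulties, grid)
print(eap_high, sd_high, eap_low, sd_low)
```

Under the Rasch model the raw score is sufficient, so two response patterns with the same number of correct answers give the same EAP in this sketch.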
mdltm: Used in PISA and PIAAC
• von Davier, M. (2016). High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. ETS Research Report Series, 2016: 1–11. doi:10.1002/ets2.12120
• von Davier, M. (2017). New Results on an Improved Parallel EM Algorithm for Estimating Generalized Latent Variable Models. In: van der Ark, L., Wiberg, M., Culpepper, S., Douglas, J., Wang, W.-C. (eds.) Quantitative Psychology. IMPS 2016. Springer Proceedings in Mathematics & Statistics, vol. 196. Springer.
• von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287-307. https://doi.org/10.1348/000711007X193957
• von Davier, M., Yamamoto, K., Shin, H.-J., Chen, H., Khorramdel, L., Weeks, J., Davis, S., Kong, N., Kandathil, M. (2019). Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1586642
• Shin, H.-J., Khorramdel, L. & von Davier, M. (2019). GDM Software MDLTM Including Parallel EM Algorithm. Chapter 30 in: von Davier, M. & Lee, Y.-S. (eds.) Handbook of Diagnostic Classification Models. Springer: New York.
Some R packages for IRT
• eRm (Mair, Hatzinger)
• mRm: mixture Rasch (Preinerstorfer)
• TAM (Wu, Ping, Robitzsch, et al.)
• MLIRT (Fox), LNIRT (Fox)
• mirt (Chalmers)
• lme4 (Bates, Maechler et al.)
• …
Stata support for IRT
• Stata IRT module
  • https://www.stata.com/manuals/irt.pdf
• Gllamm (Skrondal, Rabe-Hesketh)
  • http://www.gllamm.org/faqs/models/irtfitb.html
• RaschTest (Hardouin)
  • https://www.stata-journal.com/article.html?article=st0119
• StataStan for IRT
  • https://arxiv.org/pdf/1601.03443.pdf
Software for Extended IRT
• lme4
  • Explanatory IRT models (De Boeck & Wilson, etc.)
• PyStan / RStan
  • Several papers and the Stan manual give IRT examples
  • Uses Hamiltonian Monte Carlo & the NUTS sampler (Gelman et al.)
  • Can be used to estimate IRT as well as extended IRT & speed models, e.g.:
    • Engagement, speed & ability model (Ulitzsch, von Davier & Pohl ‘19)
  • Needs to be ‘programmed’, but standard IRT Stan scripts exist
    • https://mc-stan.org/docs/2_19/stan-users-guide/item-response-models-section.html
  • Slow(ish): fully Bayesian approach / exploring posteriors
  • Very flexible, new models can be developed “easily”
• Later implementations using ML frameworks seem promising
Generating Plausible Values
• Most IRT software produces point estimates
  • JMLE (only good for very long tests)
  • MML and then EAP, WLE, or MLE
  • CML (Rasch or OPLM only) and then EAP…
• Plausible values are a different animal
  • Not ideal(!) for anything, but good for many things, unless ‘outside’ variables are used
  • PVs are imputations from the posterior distribution of proficiency, given responses and covariates
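The definition above can be illustrated with a short sketch that draws PVs for one respondent. This is a simplified illustration, not the procedure of any operational program. It assumes a Rasch model with known item difficulties, a single covariate entering through a normal latent regression with known (hypothetical) intercept, slope, and residual SD, and a grid approximation of the posterior.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def draw_plausible_values(responses, difficulties, covariate, n_pv=5,
                          intercept=0.0, slope=0.5, resid_sd=0.9, rng=None):
    """Draw n_pv values from a grid approximation of p(theta | responses, covariate).

    intercept, slope and resid_sd are hypothetical latent regression
    parameters; in practice they would be estimated, not fixed.
    """
    rng = rng or random.Random()
    grid = [-5.0 + 0.05 * i for i in range(201)]
    # The covariate shifts the prior mean of proficiency.
    prior_mean = intercept + slope * covariate
    log_post = []
    for theta in grid:
        lp = -0.5 * ((theta - prior_mean) / resid_sd) ** 2
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            lp += math.log(p if x == 1 else 1.0 - p)
        log_post.append(lp)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    # Sample grid points in proportion to their posterior weight.
    return rng.choices(grid, weights=weights, k=n_pv)

rng = random.Random(2019)
difficulties = [-1.0, 0.0, 1.0]
pvs_low = draw_plausible_values([1, 0, 0], difficulties, covariate=-1.0, rng=rng)
pvs_high = draw_plausible_values([1, 0, 0], difficulties, covariate=1.0, rng=rng)
print(pvs_low)
print(pvs_high)
```

Two respondents with identical responses but different covariate values get draws from different posteriors, which is why the background model matters for PVs. The sketch treats the regression parameters as known, whereas a full imputation model would also reflect their uncertainty.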
Generating Plausible Values
• Occasionally, the above also allow generation of PVs, but typically not at the same level of complexity of background information.
• Some functionality exists in:
  • TAM
  • dexter
  • Mplus
  • miceadds (addition to mice)
• Several R packages can use PVs, however…
Generating Plausible Values
• The software selection is much smaller if the criterion is operational use or support for large numbers of background variables:
  • MGROUP (Mislevy & Sheehan, 1992)
  • CGROUP (Thomas, 1993)
  • YGROUP (von Davier, 2004)
  • MCEMGROUP (von Davier & Sinharay, 2007)
  • SAEMGROUP (von Davier & Sinharay, 2010)
  • Conquest (Adams, Wu, …)
IRT and Generating PVs
• Summary:
  • Complex IRT calibrations need very thorough QC, just like all statistical modeling / estimation
  • PVs are the products of a complex imputation model. No single model is ‘right’ for all purposes
  • Point estimates (and posterior variance or measurement error) can be generated by most IRT software packages
  • Best to build a custom model containing all needed variables; this is challenging for practitioners
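Because PVs are imputations, an analysis is usually run once per PV and the results are then pooled with the standard multiple-imputation combination rules (Rubin's rules). The sketch below shows that pooling step only. The five estimates and their sampling variances are made-up numbers for illustration, and the function name is hypothetical.

```python
import math

def combine_pv_estimates(estimates, sampling_variances):
    """Pool one statistic computed separately on each plausible value.

    Returns the pooled estimate and its standard error, where the total
    variance is the average sampling variance plus (1 + 1/m) times the
    variance of the estimates between plausible values.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(sampling_variances) / m
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)

# Made-up group means and sampling variances, one per plausible value.
estimates = [501.2, 499.8, 500.5, 502.1, 500.9]
sampling_variances = [4.0, 4.2, 3.9, 4.1, 4.0]
pooled, se = combine_pv_estimates(estimates, sampling_variances)
print(pooled, se)
```

The between-PV term is what a single point estimate per person would miss: it carries the measurement uncertainty into the standard error of the group statistic.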