
Software for IRT and Plausible Value Imputations Matthias von Davier


Presentation Transcript


  1. Software for IRT and Plausible Value Imputations • Matthias von Davier

  2. Software for IRT models • This is a vast and growing field! • Special purpose software (free or $$$) • Faster, well tested, used in operational analyses • General purpose software, IRT (R) packages • Slow(ish); some are well tested, some were developed and then abandoned • Tweaking existing packages to estimate IRT models and IRT model extensions, or DIY programs • e.g. WinBUGS, JAGS, Stan, or Python scripts • This topic could fill one day or more

  3. My IRT Software Development Path (since 1990): • Lacord, Polyra (1990-1993, Fortran 77) • Winmira (1994, GFA Basic) • Winmira2001 (1997-2004, Delphi) • mgroup/ygroup/mcmcgroup/saemgroup (2000-2010, Fortran) • mdltm (since 2005, ANSI C) • RPCM (2017, Python 2.7) • Extended Ising Models (2018, Python 3.6)

  4. My IRT Software Development Path (since 1990):

  5. Special Purpose IRT Software • Rasch (quite incomplete list) • Polyra (Rost), WINMIRA (1994, 2001, von Davier) • Winsteps, Bigsteps, Facets (Wright, Linacre) • RUMM (Andrich), OPLM (Verhelst), Multira (Carstensen) • Quest, Conquest (Masters, Wu, Adams, …) • 2PL and MIRT (quite incomplete list) • Logist (Wingersky et al.), LPCM (Fischer) • Parscale, Bilog (Bock, Mislevy, Muraki, etc.) • Multilog (Thissen), flexMIRT (Cai), IRTPRO • MIRT (Glas), MIRT (Haberman) • mdltm (von Davier), new Conquest (Adams, Wu, …)

  6. mdltm: Used in PISA and PIAAC

  7. mdltm: Used in PISA and PIAAC • IRT: Rasch, polytomous Rasch, 2PL, GPCM • Mixture IRT • Latent Class Models, located latent class models • Cognitive Diagnostic Models (CDMs) • Multidimensional IRT Models • Multilevel and Mixture MIRT and CDMs • Multiple-population models • Global model fit, item fit, person fit • EAP, MLE, WLE for person ability estimates • …
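
As a point of reference for the model names on this slide, here is a minimal sketch of the item response functions for the 2PL model and the GPCM in their usual textbook parameterization. This is not mdltm code; the function names `p_2pl` and `p_gpcm` and the parameter names are assumptions made for illustration. The Rasch model is the special case with slope `a = 1`.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: person ability, a: item slope, b: item difficulty.
    Setting a = 1 gives the Rasch model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_gpcm(theta, a, steps):
    """Category probabilities for scores 0..len(steps) under the GPCM.

    steps: list of step (threshold) parameters b_1..b_m.
    The exponent for category k is the sum of a * (theta - b_v) for v <= k,
    with the exponent for category 0 fixed at 0.
    """
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(exponents)
    weights = [math.exp(e - m) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single step parameter the GPCM reduces to the 2PL, which is a quick sanity check on any implementation.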

  8. mdltm: Used in PISA and PIAAC • von Davier, M. (2016). High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. ETS Research Report Series, 2016: 1–11. doi:10.1002/ets2.12120 • von Davier, M. (2017). New Results on an Improved Parallel EM Algorithm for Estimating Generalized Latent Variable Models. In: van der Ark L., Wiberg M., Culpepper S., Douglas J., Wang W.-C. (eds.) Quantitative Psychology. IMPS 2016. Springer Proceedings in Mathematics & Statistics, vol 196. Springer • von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, Vol. 61, No. 2 (November), pp. 287-307. https://doi.org/10.1348/000711007X193957 • von Davier, M., Yamamoto, K., Shin, H.-J., Chen, H., Khorramdel, L., Weeks, J., Davis, S., Kong, N., & Kandathil, M. (2019). Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assessment in Education: Principles, Policy & Practice. DOI: 10.1080/0969594X.2019.1586642 • Shin, H.-J., Khorramdel, L., & von Davier, M. (2019). GDM Software MDLTM Including Parallel EM Algorithm. Chapter 30 in: von Davier, M. & Lee, Y.-S. (eds.): Handbook of Diagnostic Classification Models. Springer: New York.

  9. Some R packages for IRT • eRm (Mair, Hatzinger) • mRm: mixture Rasch (Preinerstorfer) • TAM (Wu, Ping, Robitzsch, et al.) • MLIRT (Fox), LNIRT (Fox) • mirt (Chalmers) • lme4 (Bates, Maechler et al.) • …

  10. Stata support for IRT • Stata IRT module • https://www.stata.com/manuals/irt.pdf • gllamm (Skrondal, Rabe-Hesketh) • http://www.gllamm.org/faqs/models/irtfitb.html • RaschTest (Hardouin) • https://www.stata-journal.com/article.html?article=st0119 • StataStan for IRT • https://arxiv.org/pdf/1601.03443.pdf

  11. Software for Extended IRT • lme4 • Explanatory IRT models (De Boeck & Wilson etc.) • PyStan / RStan • Several papers and the Stan manual give IRT examples • Uses Hamiltonian Monte Carlo & the NUTS sampler (Gelman et al.) • Can be used to estimate IRT as well as extended IRT & speed models, e.g.: • Engagement, speed & ability model (Ulitzsch, von Davier & Pohl, 2019) • Needs to be ‘programmed’, but standard IRT Stan scripts exist • https://mc-stan.org/docs/2_19/stan-users-guide/item-response-models-section.html • Slow(ish): fully Bayesian approach / exploring posteriors • Very flexible, new models can be developed “easily” • Later implementations using ML frameworks seem promising

  12. Generating Plausible Values • Most IRT software produces point estimates • JMLE (only good for very long tests) • MML and then EAP, WLE, or MLE • CML (Rasch or OPLM only) and then EAP… • Plausible values are a different animal • Not ideal(!) for anything, but good for many things, unless ‘outside’ variables are used • PVs are imputations from the posterior distribution of proficiency, given responses and covariates
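
The difference between a point estimate and plausible values can be illustrated with a minimal sketch, assuming 2PL items with known parameters and a normal prior evaluated on a fixed grid. None of this is taken from the operational programs named in the slides; the function names and the grid range are assumptions. In an operational latent regression model, `prior_mean` would be the prediction from the background variables (covariates); here it is simply passed in as a number.

```python
import math
import random


def posterior_on_grid(responses, items, prior_mean=0.0, prior_sd=1.0, n_points=81):
    """Normalized posterior of theta on a grid from -4 to 4.

    responses: list of 0/1 item scores.
    items: list of (a, b) pairs, i.e. 2PL slope and difficulty per item.
    """
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    log_post = []
    for theta in grid:
        # Normal prior, up to an additive constant on the log scale.
        lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for x, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            lp += math.log(p if x == 1 else 1.0 - p)
        log_post.append(lp)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def eap(grid, post):
    """EAP point estimate (posterior mean) and posterior variance."""
    mean = sum(t * p for t, p in zip(grid, post))
    var = sum((t - mean) ** 2 * p for t, p in zip(grid, post))
    return mean, var


def draw_pvs(grid, post, n_pv=5, rng=None):
    """Plausible values: random draws from the (discretized) posterior."""
    rng = rng or random.Random()
    return rng.choices(grid, weights=post, k=n_pv)
```

The EAP collapses the posterior to its mean, while the plausible values retain the posterior spread, which is why the two behave differently in secondary analyses.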

  13. Generating Plausible Values • Occasionally, the programs above also allow generation of PVs, but typically not at the same level of complexity of background information. • Some functionality exists in: • TAM • dexter • Mplus • miceadds (add-on to mice) • Several R packages can use PVs, however…
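
For readers unfamiliar with how PVs are used once they exist: the standard approach is to run the analysis once per plausible value and pool the results with the usual multiple-imputation combining rules (Rubin's rules). The slides do not spell this out, so the following is a minimal sketch under that assumption, with a hypothetical function name.

```python
import statistics


def combine_pv_estimates(estimates, sampling_variances):
    """Pool one statistic computed separately on each plausible value.

    estimates: the statistic (e.g. a group mean) computed once per PV.
    sampling_variances: the sampling variance of that statistic, per PV.
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    point = statistics.fmean(estimates)
    within = statistics.fmean(sampling_variances)   # average sampling variance
    between = statistics.variance(estimates)        # variance across PVs
    total = within + (1.0 + 1.0 / m) * between
    return point, total
```

The between-PV component is what a single point estimate per person cannot provide: it carries the measurement uncertainty into the standard error of the reported statistic.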

  14. Generating Plausible Values • The selection of software is much smaller if the criterion is operational use or support for large numbers of background variables: • MGROUP (Mislevy & Sheehan, 1992) • CGROUP (Thomas, 1993) • YGROUP (von Davier, 2004) • MCEMGROUP (von Davier & Sinharay, 2007) • SAEMGROUP (von Davier & Sinharay, 2010) • Conquest (Adams, Wu, …)

  15. IRT and Generating PVs • Summary: • Complex IRT calibrations need very thorough QC (just like all statistical modeling and estimation) • PVs are the products of a complex imputation model. No single model is ‘right’ for all purposes • Point estimates (and posterior variance or measurement error) can be generated by most IRT software packages • Best to build a custom model containing all needed variables, which is challenging for practitioners
