
Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG?

Presentation Transcript


  1. Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG? Bayesian Statistics, Fall 2009 Chunyan Liu & James Gambrell

  2. Introduction • 3 Parameter IRT Model • Assigns each item a logistic function with a variable lower asymptote.

  3. Purpose • Compare BILOG-MG and WinBUGS estimation of item parameters under the 3 parameter logistic (3PL) IRT model • Investigate the effect of sample size on the estimation of item parameters

  4. BILOG-MG (Mislevy & Bock 1985) • Proprietary software • Uses undocumented estimation shortcuts • Sometimes gives poor results • “Black Box” program • Very fast estimation • Provides only point estimates and standard errors for model parameters • Estimation method: marginal maximum likelihood via the Expectation-Maximization (EM) algorithm (Bock and Aitkin, 1981)

  5. WinBUGS • More open source (closely related to OpenBUGS) • More widely studied • Might give more robust results • Much more flexible • Provides full posterior densities for model parameters • More output to evaluate convergence • Very slow estimation!

  6. Literature Review • Most researchers have used custom-built MCMC samplers based on the Metropolis-Hastings-within-Gibbs algorithm • as recommended by Cowles, 1996! • Patz and Junker (1999a & b) • Wrote an MCMC sampler in S-PLUS • Found that their sampler produced estimates identical to BILOG for the 2PL model, but had some trouble with 3PL models • Found MCMC was superior at handling missing data.

  7. Literature Review • Jones and Nediak (2000) • Developed a “commercial grade” sampler in C++ • Improved the Patz and Junker algorithm • Compared MCMC results to BILOG using both real and simulated data • Found that item parameters varied substantially, but the ICCs they described were close according to the Hellinger deviance criterion • MCMC and BILOG were similar for real data • MCMC was superior for simulated data • Note that MCMC provides much more diagnostic output for assessing convergence problems

  8. Literature Review • Proctor, Teo, Hou, and Hsieh (2005 project for this class!) • Compared BILOG to WinBUGS • Fit a 2PL model • Only simulated a single replication • Did not use deviance or RMSE to assess error

  9. Data • Test: 36-item multiple choice • Item parameters (a, b and c) come from Chapter 6 of Equating, Scaling and Linking (Kolen and Brennan) • Treated as true item parameters (see Appendix) • Item responses simulated using the 3PL model: P(X_ij = 1 | θ_i) = c_j + (1 − c_j) · exp[a_j(θ_i − b_j)] / (1 + exp[a_j(θ_i − b_j)]), where a is the slope, b the difficulty, c the guessing (lower-asymptote) parameter, and θ the examinee ability

  10. Methods • N (N = 200, 500, 1000, or 2000) θ values were generated from the N(0,1) distribution. • N item responses were simulated based on the θ’s generated in step 1 and the true item parameters, using the 3PL model. • Item parameters (a, b, c for the 36 items) were estimated using BILOG-MG based on the N item responses. • Item parameters (a, b, c for the 36 items) were estimated using WinBUGS based on the same N item responses, with the same priors as specified in BILOG-MG. • Steps 2 through 4 were repeated 100 times, so for each item we have 100 sets of estimated parameters from both programs (a simulation sketch of steps 1 and 2 follows below).
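A minimal R sketch of steps 1 and 2 for a single replication. The item-parameter values and object names below are illustrative placeholders, not the true values from the Appendix:

  # Step 1: draw N abilities; Step 2: simulate 3PL item responses
  set.seed(1)
  n_items <- 36
  a_true  <- rlnorm(n_items, 0, 0.5)   # placeholder slopes; replace with the true values
  b_true  <- rnorm(n_items, 0, 1)      # placeholder difficulties
  c_true  <- rbeta(n_items, 5, 17)     # placeholder guessing parameters
  N       <- 500                       # one of the sample-size conditions (200/500/1000/2000)
  theta   <- rnorm(N, 0, 1)            # abilities from N(0,1)
  # 3PL response probabilities: one row per examinee, one column per item
  p    <- t(sapply(theta, function(th) c_true + (1 - c_true) * plogis(a_true * (th - b_true))))
  resp <- matrix(rbinom(N * n_items, 1, p), nrow = N)   # simulated 0/1 responses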

  11. Priors • a[i] ~ dlnorm(0, 4) • b[i] ~ dnorm(0, 0.25) • c[i] ~ dbeta(5,17) • Same priors used in BILOG and WinBUGS
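As a side note (not from the original slides): BUGS parameterizes dnorm and dlnorm by precision (1/variance), so the scales implied by the priors above can be sanity-checked with equivalent draws in R:

  # Translate the BUGS precision parameterization into R's sd parameterization
  a_prior <- rlnorm(10000, meanlog = 0, sdlog = 1 / sqrt(4))   # dlnorm(0, 4): sd of log(a) = 0.5
  b_prior <- rnorm(10000, mean = 0, sd = 1 / sqrt(0.25))       # dnorm(0, 0.25): sd = 2
  c_prior <- rbeta(10000, 5, 17)                               # dbeta(5, 17): mean approx. 0.23
  summary(a_prior); summary(b_prior); summary(c_prior)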

  12. Criterion: Root Mean Square Error (RMSE) • For each item j and each parameter x (x = a, b, or c), we computed the RMSE with the same formula for both programs: RMSE(x_j) = sqrt( (1/100) · Σ_r (x̂_jr − x_j)² ), where x̂_jr is the estimate of parameter x for item j in replication r (r = 1, …, 100) and x_j is the true value
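A small R helper matching this criterion, assuming est holds the 100 replication estimates of one item parameter and truth its true value (hypothetical names):

  # RMSE over 100 replications for one item parameter (x = a, b, or c)
  rmse <- function(est, truth) sqrt(mean((est - truth)^2))

  # illustrative use: 100 estimates of item 1's slope against its true value
  # rmse(a_hat[, 1], a_true[1])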

  13. Results 1. Deciding the number of Burn-in Iterations: History Plots

  14. Results-cont. 1. Deciding the number of Burn-in Iterations: Autocorrelation and BGR plots

  15. Results-cont. 1. Deciding the number of Burn-in Iterations: Statistics

  node   mean      sd       MC error   2.5%      median    97.5%    start   sample
  a[1]    0.899    0.1011   0.004938    0.7117    0.8949    1.107    2501    3500
  a[2]    1.339    0.1159   0.004132    1.125     1.333     1.58     2501    3500
  a[3]    0.7308   0.111    0.005769    0.551     0.717     0.9893   2501    3500
  a[4]    2.012    0.2712   0.009897    1.531     1.996     2.59     2501    3500
  a[5]    1.766    0.2202   0.009585    1.394     1.745     2.243    2501    3500
  b[1]   -1.706    0.2944   0.01793    -2.253    -1.717    -1.1      2501    3500
  b[2]   -0.4277   0.1167   0.005916   -0.6571   -0.428    -0.1857   2501    3500
  b[3]   -0.7499   0.3967   0.01586    -1.409    -0.7994    0.1348   2501    3500
  b[4]    0.4324   0.09295  0.004443    0.2363    0.4384    0.6008   2501    3500
  b[5]   -0.05619  0.122    0.006737   -0.3127   -0.05246   0.1657   2501    3500
  c[1]    0.2458   0.088    0.004718    0.09253   0.2415    0.4362   2501    3500
  c[2]    0.1403   0.04745  0.002158    0.05368   0.139     0.2361   2501    3500
  c[3]    0.2538   0.09285  0.005864    0.09991   0.243     0.4557   2501    3500
  c[4]    0.2669   0.035    0.001491    0.1911    0.2693    0.3282   2501    3500
  c[5]    0.2588   0.05029  0.002589    0.1526    0.261     0.35     2501    3500

  16. Results-cont. 1. Running conditions for WinBUGS • Adaptive phase: 1000 iterations • Burn-in: 1500 iterations • Iterations used for computing the statistics: 3500 • One chain • WinBUGS was run from R with the bugs() function, which requires the BRugs and R2WinBUGS packages (an example call is sketched below)
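A hedged sketch of the R driver described above, assuming the slide-24 model is saved as 3pl.bug and resp is the N x 36 response matrix (file and object names are illustrative):

  library(R2WinBUGS)   # provides bugs(); the slides note BRugs is also required

  fit <- bugs(
    data = list(resp = resp, N = nrow(resp), n = ncol(resp)),
    inits = NULL,                           # let WinBUGS generate initial values
    parameters.to.save = c("a", "b", "c"),
    model.file = "3pl.bug",                 # illustrative name for the slide-24 model file
    n.chains = 1,
    n.iter = 5000,                          # total iterations per chain, including burn-in
    n.burnin = 1500
  )
  print(fit)   # posterior means, sds, and quantiles for a, b, and c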

  17. Results-cont. 2. Effect of Sample Size

  18. Results-cont. BILOG-MG vs. WinBUGS – a parameter

  19. Results-cont. BILOG-MG vs. WinBUGS - b parameter

  20. Results-cont. BILOG-MG vs. WinBUGS - c parameter

  21. Discussion & Conclusions • Larger sample sizes decreased RMSE for all parameters under both programs. • For N = 200, BILOG-MG had significant convergence problems; WinBUGS had none.

  22. Discussion & Conclusions-cont. • Slope parameter “a” • WinBUGS was superior to BILOG when N = 500 or less • More accurately estimated by both programs for items without extreme a or b parameters • Difficulty parameter “b” • BILOG was superior to WinBUGS when N = 500 or less • Both programs had larger error for items that were either very difficult or very easy • Guessing parameter “c” • WinBUGS was superior to BILOG at all sample sizes, but especially at N = 1,000 or less • More accurately estimated by both programs for difficult items • Both programs had larger error for items with shallow slopes.

  23. Limitations • Only one chain was used in the simulation study. • Some of the MC errors were not less than 1/20 of the posterior standard deviation; more iterations could be used in the MCMC sampler (see the check sketched below). • Simulated data • Conforms to the 3PL model much more closely than real data would • No missing responses • No omit problems • Fewer low scores
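A one-line check of the MC-error rule of thumb mentioned above, assuming the WinBUGS node statistics (as on slide 15) have been read into a data frame named stats with columns sd and MCerror (hypothetical names):

  # Flag nodes whose Monte Carlo error is not below 1/20 of the posterior sd
  needs_longer_chain <- stats$MCerror >= stats$sd / 20
  sum(needs_longer_chain)   # how many parameters would benefit from more iterations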

  24. WinBUGS code for running 3PL model

  # 3PL IRT model: resp[i,j] is examinee i's scored (0/1) response to item j
  model {
    for (i in 1:N) {                                   # examinees
      for (j in 1:n) {                                 # items
        e[i,j] <- exp(a[j] * (theta[i] - b[j]))
        p[i,j] <- c[j] + (1 - c[j]) * (e[i,j] / (1 + e[i,j]))   # 3PL response probability
        resp[i,j] ~ dbern(p[i,j])
      }
      theta[i] ~ dnorm(0, 1)                           # ability distribution
    }
    for (i in 1:n) {                                   # item parameter priors
      a[i] ~ dlnorm(0, 4)
      b[i] ~ dnorm(0, 0.25)
      c[i] ~ dbeta(5, 17)
    }
  }

  25. True Item Parameters

  26. Acknowledgement • Professor Katie Cowles

  27. Questions?
