Bridging the gap from LogR to IRT
Indebted to: Wu, A. D., & Zumbo, B. D. (2007). Thinking About Item Response Theory from a Logistic Regression Perspective: A Focus on Polytomous Models. In Shlomo S. Sawilowsky (Ed.), Real Data Analysis (pp. 241-269). Information Age Publishing, Inc., Greenwich, CT.
Bridging the gap from LogR to IRT
• The explanatory variable
  • In IRT, the exposure is a continuous latent variable
  • Hence IRT = generalized linear latent model
• The outcome variable(s)
  • Logistic regression typically models ONE outcome, whereas IRT models a number of categorical outcomes simultaneously
Aim of IRT
• To relate a subject's responses on a number of test items to an underlying ability (AKA trait) by way of a mathematical function
• Because the relationship between the trait and the response probability is non-linear, a logistic curve is often used; it is referred to as the Item Characteristic Curve (ICC) or Item Response Function (IRF)
[Figure: ICC / IRF, plotting increasing probability of a correct response against increasing level of the latent trait]
Options for form of ICC
Examples
• Step function (Guttman)
• 2 parameter normal ogive (Lord)
• 2 parameter logistic (Birnbaum)
• 1 parameter logistic (Rasch)
• Nonparametric, monotone increasing (Mokken)
Back to the logistic form
• Two-parameter binary logistic IRT model
• θ: ability level
• αi: the slope (AKA discrimination) for item i
• βi: the threshold (AKA difficulty) for item i
• (θ – βi): discrepancy between the difficulty of item i and the ability of the respondent
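The model equation itself did not survive on this slide. As a hedged reconstruction in the usual textbook notation, and assuming Y_i denotes the binary response to item i (a symbol not used in the original), the two-parameter logistic model is commonly written as:

```latex
P(Y_i = 1 \mid \theta)
  = \frac{\exp\{\alpha_i(\theta - \beta_i)\}}{1 + \exp\{\alpha_i(\theta - \beta_i)\}},
\qquad\text{equivalently}\qquad
\operatorname{logit} P(Y_i = 1 \mid \theta) = \alpha_i(\theta - \beta_i).
```

Setting θ = βi gives a probability of 0.5, which is why βi is read as the difficulty (threshold) of the item.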
For a single item: Let X = (θ – βi) & add intercept c
Recall:
• αi: the slope
• βi: the value of the covariate (ability) at the point of inflection
So
• For a uni-dimensional IRT model (a single trait θ), the 2PL IRT model is a simple LogR model
• In the binary IRT setting we model a number of items simultaneously
• The parameters may or may not vary across items
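The equivalence claimed above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the parameter values are hypothetical, and it assumes the 2PL logit has the form α(θ − β), so that the matching logistic regression has slope α and intercept −αβ.

```python
import math

def irt_2pl(theta, alpha, beta):
    """2PL item response function: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def logistic_regression(x, intercept, slope):
    """Ordinary logistic regression with a single covariate x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical item parameters, chosen only for illustration.
alpha, beta = 1.5, 0.4

# Expanding alpha * (theta - beta) gives slope alpha and intercept -alpha * beta.
intercept, slope = -alpha * beta, alpha

# Largest disagreement between the two forms over a grid of trait values.
grid = [t / 10 for t in range(-40, 41)]
max_gap = max(
    abs(irt_2pl(t, alpha, beta) - logistic_regression(t, intercept, slope))
    for t in grid
)

# At theta == beta the curve passes through 0.5 (the point of inflection).
p_at_beta = irt_2pl(beta, alpha, beta)
```

The only difference from an ordinary logistic regression is that the covariate θ is latent rather than observed.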
Conditional Independence
[Figure: Items 1 to 6 shown in two panels labelled "Before" and "After", with a Trait node]
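Conditional independence means that, once the trait is fixed, the probability of a whole response pattern is the product of the item-level probabilities. A minimal sketch under that assumption follows; the difficulties and the helper names are hypothetical and a Rasch-type (unit slope) item is assumed.

```python
import math
from itertools import product

def p_endorse(theta, difficulty):
    """Rasch-type probability of a positive response given the trait."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def pattern_prob_given_theta(pattern, theta, difficulties):
    """P(pattern | theta) as a product over items (conditional independence)."""
    prob = 1.0
    for response, difficulty in zip(pattern, difficulties):
        p = p_endorse(theta, difficulty)
        prob *= p if response == 1 else 1.0 - p
    return prob

# Hypothetical difficulties for four binary items.
difficulties = [0.3, -0.5, -0.7, -0.6]
theta = 0.25

# The 2^4 pattern probabilities must sum to 1 for any fixed theta.
total = sum(
    pattern_prob_given_theta(pattern, theta, difficulties)
    for pattern in product([0, 1], repeat=len(difficulties))
)
```

Marginally (before conditioning on the trait) the items remain correlated; the product form only holds for a fixed θ.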
The Rasch model A worked example across multiple packages
Abortion data
Source: David J. Bartholomew, Fiona Steele, Irini Moustaki, Jane Galbraith. Analysis of Multivariate Social Science Data, Second Edition.
The dataset actually comes from the first edition, so hopefully it is still in the second!
Idea
• Same Rasch model 4 ways
  • R (ltm package)
  • Mplus
  • raschtest (Stata)
  • GLLAMM (via long-format data prepping)
Table 7.1 – attitude towards abortion
Abortion should be permitted if:
1] The woman decides on her own that she does not wish to have the child
2] The couple agree that they do not wish to have the child
3] The woman is not married and does not wish to marry the man
4] The couple cannot afford any more children
Basic output
SUMMARY OF CATEGORICAL DATA PROPORTIONS

  WOMAN
    Category 1    0.562
    Category 2    0.438
  COUPLE
    Category 1    0.406
    Category 2    0.594
  NOT_MARR
    Category 1    0.364
    Category 2    0.636
  AFFORD
    Category 1    0.383
    Category 2    0.617
rasch(data = abortion[, c(2, 3, 4, 5)], IRT.param = FALSE)
> summary(rasch1)

Model Summary:
   log.Lik      AIC      BIC
 -657.7894 1325.579 1345.078

Coefficients:
        value std.err  z.vals
woman -0.7843  0.2762 -2.8395
coupl  1.1288  0.2724  4.1437
nt.mr  1.7950  0.2969  6.0453
affrd  1.5129  0.2870  5.2716
z      4.9064  0.4264 11.5057

Integration:
method: Gauss-Hermite
quadrature points: 21

Optimization:
Convergence: 0
max(|grad|): 0.00097
quasi-Newton: BFGS
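With IRT.param = FALSE the coefficients above are item intercepts plus a common slope (the row labelled z), not difficulties. The sketch below converts them to the difficulty metric, assuming the usual relation difficulty = −intercept / slope; that relation and the variable names are assumptions made here for illustration, not part of the printed output.

```python
# Intercepts and common slope copied from the summary(rasch1) output above.
intercepts = {
    "woman": -0.7843,
    "coupl": 1.1288,
    "nt.mr": 1.7950,
    "affrd": 1.5129,
}
common_slope = 4.9064  # the row labelled "z"

# Assumed reparameterisation: intercept + slope * z == slope * (z - difficulty).
difficulties = {
    item: -intercept / common_slope for item, intercept in intercepts.items()
}

# Item with the largest difficulty, i.e. the hardest to endorse.
hardest_item = max(difficulties, key=difficulties.get)
```

On this scale the first item ("woman") comes out as the only one with a positive difficulty, in line with the ordering seen in the other packages.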
par(mfrow = c(2, 2))
plot(rasch1, items = c(1), type = c("IIC"), ylim=c(0,7))
plot(rasch1, items = c(2), type = c("IIC"), ylim=c(0,7))
plot(rasch1, items = c(3), type = c("IIC"), ylim=c(0,7))
plot(rasch1, items = c(4), type = c("IIC"), ylim=c(0,7))
margins(rasch2, "two")

Response: (0,0)
  Item i Item j Obs    Exp (O-E)^2/E
1      2      4 111 119.38      0.59
2      1      4 125 133.09      0.49
3      1      2 143 140.62      0.04

Response: (1,0)
  Item i Item j Obs    Exp (O-E)^2/E
1      1      4  13   7.28      4.50 ***
2      1      2   5   9.57      2.18
3      2      4  27  20.99      1.72

Response: (0,1)
  Item i Item j Obs    Exp (O-E)^2/E
1      2      4  37  30.81      1.24
2      1      4  80  72.37      0.80
3      3      4  19  21.27      0.24

Response: (1,1)
  Item i Item j Obs    Exp (O-E)^2/E
1      1      4 147 152.26      0.18
2      1      2 155 149.97      0.17
3      3      4 208 203.36      0.11
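The last column of each block is a chi-squared residual, (O−E)^2/E, and rows above 3.5 are flagged. A minimal sketch of that calculation, using the three (1,0) rows above, is given here; the function name and the list layout are hypothetical.

```python
FLAG_THRESHOLD = 3.5  # rows above this value are marked '***' in the output

def chi_squared_residual(observed, expected):
    """Pearson-type residual for one cell of a two-way margin."""
    return (observed - expected) ** 2 / expected

# (observed, expected) pairs copied from the Response: (1,0) block.
rows = [(13, 7.28), (5, 9.57), (27, 20.99)]

residuals = [chi_squared_residual(obs, exp) for obs, exp in rows]
flags = [residual > FLAG_THRESHOLD for residual in residuals]
```

Because the expected counts are printed to two decimals, residuals recomputed this way can differ from the printed ones in the last digit.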
margins(rasch2, "three")

Response: (0,0,0)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      2      4 111 117.01      0.31
2      2      3      4 102 104.53      0.06
3      1      3      4 110 110.44      0.00

Response: (1,0,0)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      2      4   0   2.37      2.37
2      1      2      3   0   2.04      2.04
3      2      3      4  10   7.63      0.74

Response: (0,1,0)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      3      4  15  22.65      2.58
2      2      3      4   9  14.85      2.30
3      1      2      4  14  16.08      0.27

Response: (1,1,0)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      2      4  13   4.90     13.38 ***
2      1      3      4  11   5.56      5.33 ***
3      2      3      4  17  13.36      0.99

Response: (0,0,1)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      2      4  32  23.61      2.98
2      1      2      3  29  26.93      0.16
3      2      3      4  12  11.20      0.06

Response: (1,0,1)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      3      4   2   4.19      1.15
2      2      3      4   7  10.07      0.94
3      1      2      3   5   7.53      0.85

Response: (0,1,1)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      2      3      4  25  19.61      1.48
2      1      3      4  63  55.29      1.08
3      1      2      3  49  51.00      0.08

Response: (1,1,1)
  Item i Item j Item k Obs    Exp (O-E)^2/E
1      1      2      3 151 146.10      0.16
2      1      2      4 142 145.07      0.06
3      1      3      4 145 148.07      0.06

'***' denotes a chi-squared residual greater than 3.5
Read the data into Mplus
data:
  file is "abortion_attitude.txt";
variable:
  names are woman couple not_marr afford num;
  categorical are woman couple not_marr afford;
  usevariables are woman couple not_marr afford;
  freqweight = num;
analysis:
  type = basic;
Basic output – sample stats

FIRST ORDER SAMPLE PROPORTIONS :
           WOMAN     COUPLE    NOT_MARR  AFFORD
           ________  ________  ________  ________
 1         0.438     0.594     0.636     0.617

SECOND ORDER SAMPLE PROPORTIONS
           WOMAN     COUPLE    NOT_MARR  AFFORD
           ________  ________  ________  ________
 WOMAN
 COUPLE    0.420
 NOT_MARR  0.420     0.538
 AFFORD    0.396     0.512     0.559

SAMPLE THRESHOLDS
           WOMAN$1   COUPLE$1  NOT_MARR  AFFORD$1
           ________  ________  ________  ________
 1         0.156     -0.237    -0.347    -0.299
Basic output – sample stats

SAMPLE TETRACHORIC CORRELATIONS
           WOMAN     COUPLE    NOT_MARR  AFFORD
           ________  ________  ________  ________
 WOMAN
 COUPLE    0.902
 NOT_MARR  0.866     0.882
 AFFORD    0.768     0.821     0.903

STANDARD DEVIATIONS FOR SAMPLE TETRACHORIC CORRELATIONS
           WOMAN     COUPLE    NOT_MARR  AFFORD
           ________  ________  ________  ________
 WOMAN
 COUPLE    0.125
 NOT_MARR  0.158     0.137
 AFFORD    0.217     0.181     0.120
Rasch model in Mplus
data:
  file is "...abortion_attitude.txt";
variable:
  names are woman couple not_marr afford num;
  usevariables are woman couple not_marr afford;
  categorical are woman couple not_marr afford;
  freqweight = num;
analysis:
  ESTIMATOR = MLR;
model:
  F by woman* (1)
       couple (1)
       not_marr (1)
       afford (1);
  F@1;
plot:
  type = plot3;
Mplus results
TESTS OF MODEL FIT

Loglikelihood
    H0 Value                          -709.937
    H0 Scaling Correction Factor         1.009
      for MLR

Information Criteria
    Number of Free Parameters                5
    Akaike (AIC)                      1429.874
    Bayesian (BIC)                    1449.562
    Sample-Size Adjusted BIC          1433.698
      (n* = (n + 2) / 24)

Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes

    Pearson Chi-Square
        Value                           22.788
        Degrees of Freedom                  10
        P-Value                         0.0116

    Likelihood Ratio Chi-Square
        Value                           22.595
        Degrees of Freedom                  10
        P-Value                         0.0123
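The information criteria above follow directly from the log-likelihood and the number of free parameters. The sketch below recomputes them with the standard formulas; the sample size n = 379 is an assumption, obtained by summing the num frequency column of the pattern data shown elsewhere in these slides.

```python
import math

log_likelihood = -709.937  # H0 value from the Mplus output
n_parameters = 5           # number of free parameters
n = 379                    # ASSUMPTION: total of the `num` frequency weights

deviance = -2.0 * log_likelihood

aic = deviance + 2 * n_parameters
bic = deviance + n_parameters * math.log(n)

# Sample-size adjusted BIC replaces n with n* = (n + 2) / 24.
n_star = (n + 2) / 24
adjusted_bic = deviance + n_parameters * math.log(n_star)
```

Under that assumed n, all three values agree with the printed output to the displayed precision.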
Mplus results
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
 F BY
    WOMAN              4.336      0.390     11.124      0.000
    COUPLE             4.336      0.390     11.124      0.000
    NOT_MARR           4.336      0.390     11.124      0.000
    AFFORD             4.336      0.390     11.124      0.000

 Thresholds
    WOMAN$1            0.776      0.311      2.496      0.013
    COUPLE$1          -1.047      0.306     -3.417      0.001
    NOT_MARR$1        -1.573      0.315     -4.994      0.000
    AFFORD$1          -1.339      0.322     -4.161      0.000

 Variance of
    F                  1.000      0.000    999.000    999.000

IRT PARAMETERIZATION IN TWO-PARAMETER LOGISTIC METRIC
WHERE THE LOGIT IS 1.7*DISCRIMINATION*(THETA - DIFFICULTY)

 Item Discriminations
 F BY
    WOMAN              2.551      0.229     11.124      0.000
    COUPLE             2.551      0.229     11.124      0.000
    NOT_MARR           2.551      0.229     11.124      0.000
    AFFORD             2.551      0.229     11.124      0.000

 Item Difficulties
    WOMAN$1            0.179      0.071      2.514      0.012
    COUPLE$1          -0.241      0.072     -3.353      0.001
    NOT_MARR$1        -0.363      0.074     -4.913      0.000
    AFFORD$1          -0.309      0.074     -4.199      0.000

 Variance of
    F                  1.000      0.000    999.000    999.000
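The two blocks of estimates above are the same model on two scales. The sketch below reproduces the IRT metric from the loading and thresholds, assuming the logit is loading × F − threshold with the factor variance fixed at 1, so that discrimination = loading / 1.7 and difficulty = threshold / loading. The variable names are hypothetical.

```python
SCALING_CONSTANT = 1.7  # from "THE LOGIT IS 1.7*DISCRIMINATION*(THETA - DIFFICULTY)"

loading = 4.336  # common factor loading (equality-constrained across items)
thresholds = {
    "WOMAN$1": 0.776,
    "COUPLE$1": -1.047,
    "NOT_MARR$1": -1.573,
    "AFFORD$1": -1.339,
}

# loading * theta - threshold == 1.7 * discrimination * (theta - difficulty)
discrimination = loading / SCALING_CONSTANT
irt_difficulties = {name: tau / loading for name, tau in thresholds.items()}
```

The recomputed values match the printed Item Discriminations and Item Difficulties up to rounding of the inputs.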
raschtest woman couple not_marr afford, meandifficc

Estimation method: Conditional maximum likelihood (CML)
Number of items: 4
Number of groups: 5 (3 of them are used to compute the statistics of test)
Number of individuals: 365 (0 individuals removed for missing values)
Number of individuals with null or perfect score: 242
Conditional log-likelihood: -131.2562
Log-likelihood: -320.5403

            Difficulty                                    Standardized
   Items    parameters  Std. Err.     R1c  df  p-value  Outfit   Infit       U
-----------------------------------------------------------------------------
   woman       1.64747    0.19064   1.940   2   0.3790  -1.232  -0.422  -1.411
  couple      -0.19486    0.16979   2.342   2   0.3100  -0.574  -0.313  -0.838
not_marr      -0.87046    0.18302   1.580   2   0.4538  -1.272  -1.467  -0.854
  afford      -0.58216    0.17588   3.937   2   0.1397   2.336   2.113   3.015
-----------------------------------------------------------------------------
R1c test                     R1c=  15.343   6   0.0177
Andersen LR test               Z=  14.594   6   0.0237
-----------------------------------------------------------------------------
The mean of the difficulty parameters is fixed to 0
You have groups of scores with less than 30 individuals. The tests can be invalid.

                 Ability                      Expected
Group  Score  parameters  Std. Err.   Freq.      Score         ll
--------------------------------------------------------------
    0      0      -2.560      2.860     102       0.38
--------------------------------------------------------------
    1      1      -1.114      0.814      29       1.15   -31.4221
--------------------------------------------------------------
    2      2      -0.109      0.664      33       1.97   -39.3159
--------------------------------------------------------------
    3      3       1.054      0.984      61       2.84   -53.2214
--------------------------------------------------------------
    4      4       2.833      3.626     140       3.66
--------------------------------------------------------------
raschtest woman couple not_marr afford, method(mml)

Estimation method: Marginal maximum likelihood (MML)
Number of items: 4
Number of groups: 5 (5 of them are used to compute the statistics of test)
Number of individuals: 365 (0 individuals removed for missing values)
Number of individuals with null or perfect score: 242
Marginal log-likelihood: -665.8056
Log-likelihood: -281.5298

            Difficulty                                    Standardized
   Items    parameters  Std. Err.     R1m  df  p-value  Outfit   Infit
----------------------------------------------------------------------
   woman       1.25298    0.26213   4.606   2   0.0999  -2.624   0.164
  couple      -0.66034    0.30265  18.408   2   0.0001       .  -3.567
not_marr      -1.27117    0.29512  11.668   2   0.0029       .  -0.046
  afford      -1.02314    0.29784  26.037   2   0.0000       .  -0.611
----------------------------------------------------------------------
R1m test                     R1m=  31.056   8   0.0001
----------------------------------------------------------------------
   Sigma       4.12109    0.28776
----------------------------------------------------------------------
You have groups of scores with less than 30 individuals. The tests can be invalid.

                 Ability                      Expected
Group  Score  parameters  Std. Err.   Freq.      Score
---------------------------------------------------
    0      0    -4.33624    1.60486     102       0.11
---------------------------------------------------
    1      1    -2.24208    0.68050      29       0.70
---------------------------------------------------
    2      2    -1.29596    1.34362      33       1.34
---------------------------------------------------
    3      3     2.05542    0.97822      61       3.55
---------------------------------------------------
    4      4     3.91275    1.56707     140       3.91
---------------------------------------------------
Rasch model – data prepping Difficulty
The data
     +---------------------------------------+
     | woman   couple   not_marr   afford   num |
     |---------------------------------------|
  1. |     1        1          1        1   141 |
  2. |     0        0          0        0   103 |
  3. |     0        1          1        1    44 |
  4. |     0        0          1        1    21 |
  5. |     0        0          0        1    13 |
  6. |     1        1          1        0    12 |
  7. |     0        0          1        0    10 |
  8. |     0        1          0        0     9 |
     |---------------------------------------|
  9. |     0        1          1        0     7 |
 10. |     1        0          1        1     6 |
 11. |     0        1          0        1     6 |
 12. |     1        1          0        1     3 |
 13. |     1        1          0        0     3 |
 14. |     1        0          0        0     1 |
 15. |     1        0          1        0     0 |
 16. |     1        0          0        1     0 |
     +---------------------------------------+
The data (items renamed i1 to i4)
     +------------------------+
     | i1   i2   i3   i4   num |
     |------------------------|
  1. |  1    1    1    1   141 |
  2. |  0    0    0    0   103 |
  3. |  0    1    1    1    44 |
  4. |  0    0    1    1    21 |
  5. |  0    0    0    1    13 |
  6. |  1    1    1    0    12 |
  7. |  0    0    1    0    10 |
  8. |  0    1    0    0     9 |
     |------------------------|
  9. |  0    1    1    0     7 |
 10. |  1    0    1    1     6 |
 11. |  0    1    0    1     6 |
 12. |  1    1    0    1     3 |
 13. |  1    1    0    0     3 |
 14. |  1    0    0    0     1 |
 15. |  1    0    1    0     0 |
 16. |  1    0    0    1     0 |
     +------------------------+
Reshape
gen pattern = _n
reshape long i, i(pattern) j(item)
rename i score

Wide format:
  +-----------------------------------+
  | i1   i2   i3   i4   num   pattern |
  |-----------------------------------|
  |  1    1    1    1   141         1 |
  |  0    0    0    0   103         2 |
  |  0    1    1    1    44         3 |
  |  0    0    1    1    21         4 |
  |-----------------------------------|

Long format:
  +------------------------------+
  | pattern   item   score   num |
  |------------------------------|
  |       1      1       1   141 |
  |       1      2       1   141 |
  |       1      3       1   141 |
  |       1      4       1   141 |
  |------------------------------|
  |       2      1       0   103 |
  |       2      2       0   103 |
  |       2      3       0   103 |
  |       2      4       0   103 |
  |------------------------------|
  |       3      1       0    44 |
  |       3      2       1    44 |
  |       3      3       1    44 |
  |       3      4       1    44 |
  |------------------------------|
Create dummies for the 4 items
tab item, gen(d)
forvalues i=1/4 {
    gen negd`i' = -d`i'
}

  +------------------------------------------------------------------------------+
  | pattern   item   score   num   d1   d2   d3   d4   negd1  negd2  negd3  negd4 |
  |------------------------------------------------------------------------------|
  |       1      1       1   141    1    0    0    0      -1      0      0      0 |
  |       1      2       1   141    0    1    0    0       0     -1      0      0 |
  |       1      3       1   141    0    0    1    0       0      0     -1      0 |
  |       1      4       1   141    0    0    0    1       0      0      0     -1 |
  |------------------------------------------------------------------------------|
  |       2      1       0   103    1    0    0    0      -1      0      0      0 |
  |       2      2       0   103    0    1    0    0       0     -1      0      0 |
  |       2      3       0   103    0    0    1    0       0      0     -1      0 |
  |       2      4       0   103    0    0    0    1       0      0      0     -1 |
  +------------------------------------------------------------------------------+
  …etc.
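For readers without Stata, the same two data-prepping steps (wide-to-long reshape, then item dummies and their negatives) can be sketched in plain Python. This is an illustration only: the list-of-dicts layout is an assumption, and only the first three response patterns are included.

```python
# First three response patterns from the wide data (i1..i4 plus frequency).
wide = [
    {"i1": 1, "i2": 1, "i3": 1, "i4": 1, "num": 141},
    {"i1": 0, "i2": 0, "i3": 0, "i4": 0, "num": 103},
    {"i1": 0, "i2": 1, "i3": 1, "i4": 1, "num": 44},
]
N_ITEMS = 4

long_rows = []
for pattern, row in enumerate(wide, start=1):   # gen pattern = _n
    for item in range(1, N_ITEMS + 1):          # reshape long i, i(pattern) j(item)
        record = {
            "pattern": pattern,
            "item": item,
            "score": row[f"i{item}"],           # rename i score
            "num": row["num"],
        }
        for k in range(1, N_ITEMS + 1):
            dummy = 1 if k == item else 0       # tab item, gen(d)
            record[f"d{k}"] = dummy
            record[f"negd{k}"] = -dummy         # gen negd`i' = -d`i'
        long_rows.append(record)
```

Each pattern now contributes one row per item, and the frequency weight is carried along unchanged on every row.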
Relate this back to original equation
  +------------------------------------------------------------------------------+
  | pattern   item   score   num   d1   d2   d3   d4   negd1  negd2  negd3  negd4 |
  |------------------------------------------------------------------------------|
  |       1      1       1   141    1    0    0    0      -1      0      0      0 |
  |       1      2       1   141    0    1    0    0       0     -1      0      0 |
  |       1      3       1   141    0    0    1    0       0      0     -1      0 |
  |       1      4       1   141    0    0    0    1       0      0      0     -1 |
  |------------------------------------------------------------------------------|
Rasch model using GLLAMM
• Rename num wt2 (shows that weighting applies at level 2, i.e. person level)
• constraint def 1 [patt1]_cons = 1
• gllamm score negd1-negd4, i(pattern) ///
      weight(wt) ///
      link(logit) ///
      family(binomial) ///
      frload(1) constr(1) ///
      nip(15) nocons adapt trace
Rasch results
log likelihood = -852.8413222195184
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       negd1 |    .333879   .1287913     2.59   0.010     .0814527    .5863054
       negd2 |  -.4827996   .1301057    -3.71   0.000    -.7378022   -.2277971
       negd3 |  -.7138122   .1321945    -5.40   0.000    -.9729087   -.4547158
       negd4 |  -.6117231    .131175    -4.66   0.000    -.8688214   -.3546249
------------------------------------------------------------------------------
Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (pattern)
    var(1): 1 (0)
------------------------------------------------------------------------------
We constrained this to unit variance
Plot ICCs
----------------------------
                  |    Coef.
------------------+---------
negd1 (woman)     |     .334
negd2 (couple)    |    -.483
negd3 (not_marr)  |    -.714
negd4 (afford)    |    -.612
----------------------------
• Curves are parallel
• First item is most "difficult"

twoway (function Woman = invlogit(x-[score]negd1), range(-6 6)) ///
       (function Couple = invlogit(x-[score]negd2), range(-6 6) lpatt(".")) ///
       (function Not_married = invlogit(x-[score]negd3), range(-6 6) lpatt("-")) ///
       (function Afford = invlogit(x-[score]negd4), range(-6 6) lpatt("_"))
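The two remarks on this slide (parallel curves, first item hardest) can be checked without plotting. The sketch below evaluates the same invlogit(x − coefficient) curves on a grid; it assumes the rounded coefficients shown in the table, and the helper names are hypothetical.

```python
import math

def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Rounded GLLAMM coefficients, read here as item difficulties.
difficulty = {"woman": 0.334, "couple": -0.483, "not_marr": -0.714, "afford": -0.612}

def icc(theta, item):
    """Item characteristic curve with unit slope, as in the twoway call."""
    return invlogit(theta - difficulty[item])

thetas = [t / 2 for t in range(-12, 13)]  # same range as range(-6 6)

# Parallel curves: the gap between two items on the logit scale is constant.
logit_gaps = [logit(icc(t, "couple")) - logit(icc(t, "woman")) for t in thetas]

# Most difficult item: lowest endorsement probability at every trait level.
hardest_everywhere = all(
    min(difficulty, key=lambda item: icc(t, item)) == "woman" for t in thetas
)
```

The curves are parallel only on the logit scale; on the probability scale they are horizontal shifts of one another.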
GLLAMM versus raschtest
• Raschtest
  • Avoids the need to derive dummy variables
  • Needs complete dataset, not frequency-weights
  • Reformats the dataset in the background so no need to do it yourself
  • Can employ CML (an estimation method specific to Rasch models), which requires no integration
Extension to polytomous IRT
• We now have a hierarchy of parameters to model
• At the test level
  • A number of items are modelled simultaneously, with the potential for parameters to vary across items
• At the item level
  • Contrasts are used to model the response categories within each item. Parameters may or may not vary across response categories.
So when faced with a set of polytomous items
We must decide:
• The payoff from not collapsing into binary items
• The form of contrasts needed to model over response categories within items
• Any constraints required across these response categories
• Any parameter constraints across items within a single test
4 commonly used polytomous IRT models
• Partial Credit model (PCM)
  • Masters (1982)
• Rating Scale model (RSM)
  • Andrich (1978a/b)
• Graded Response model (GRM)
  • Samejima (1969)
• Nominal Response model (NRM)
  • Bock (1972)
Partial Credit Model (PCM)
• Designed for items where you can obtain partial credit, e.g. 0 = solved nothing, 1 = solved part A, 2 = solved parts A and B
• i.e. those who scored a '2' can also be thought of as having achieved a '1'
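A minimal sketch of PCM category probabilities follows, assuming the usual adjacent-category formulation in which each step from category k−1 to k has its own difficulty. The step difficulties and the function name are hypothetical, and a unit slope is assumed.

```python
import math

def pcm_probs(theta, step_difficulties):
    """Category probabilities for one PCM item.

    Category k has numerator exp(sum_{j<=k} (theta - delta_j)),
    with an empty sum (0) for category 0.
    """
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    largest = max(cumulative)  # subtract the max for numerical stability
    weights = [math.exp(value - largest) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]

# Hypothetical 3-category item (0, 1, 2) with two step difficulties.
steps = [-1.0, 1.0]
probs = pcm_probs(0.0, steps)

# Adjacent-category log-odds recover theta - delta_k for each step.
step1_log_odds = math.log(probs[1] / probs[0])
step2_log_odds = math.log(probs[2] / probs[1])
```

With a single step difficulty the function reduces to the binary Rasch curve, which is one way to see the PCM as its polytomous extension.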