Summer School

pelham

Presentation Transcript


  1. Summer School Week 2

  2. Contents • Logistic regression refresher • Some familiar + some less familiar polytomous models • 1PL/2PL in Stata and R • PCM/RSM/GRM in R • Link IRT to CFA/UV in Mplus • DIF/MIMIC in Mplus

  3. Types of outcome • Two categories • Binary / dichotomous • Ordered • e.g. low birthweight (< 2500g), height > 6ft, age > 70 • Unordered • e.g. gender, car-ownership, disease status • Presence of ordering is unimportant for binaries

  4. Types of outcome • 3+ categories • Polytomous • Ordered (ordinal) • Age (<30,30-40,41+) • “Likert” items (str disagree, disagree, …, str agree) • Unordered (nominal) • Ethnicity (white/black/asian/other) • Pet ownership (none/cat/dog/budgie/goat)

  5. Modelling options (LogR/IRT)

  6. Binary Logistic Regression

  7. Binary Logistic Regression • Probability of a positive response / outcome given a covariate: $P(u=1 \mid X) = \dfrac{\exp(c + \alpha X)}{1 + \exp(c + \alpha X)}$ • $c$: intercept • $\alpha$: regression coefficient

  8. Binary Logistic Regression • Probability of a negative response: $P(u=0 \mid X) = 1 - P(u=1 \mid X)$

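  9. Logit link function • Probabilities lie only in the range [0, 1] • The logit transformation is continuous over (−∞, ∞) • The logit is linear in the covariates: $\ln\dfrac{P(u=1 \mid X)}{1 - P(u=1 \mid X)} = c + \alpha X$, with intercept $c$ and regression coefficient $\alpha$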
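The three bullet points can be checked numerically. This Python sketch uses hypothetical values for the intercept c and slope α (they are not estimates from these slides) and shows that the probabilities stay inside (0, 1), that P(u=0|x) = 1 - P(u=1|x), and that the logit recovers the linear predictor c + αx:

```python
import math


def inv_logit(eta: float) -> float:
    """Probability in (0, 1) for a linear predictor eta on the logit scale."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for large negative eta
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Log-odds ln[p / (1 - p)], defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept c and slope alpha, chosen only for illustration.
c, alpha = -1.0, 0.5

for x in (-4.0, 0.0, 2.0, 4.0):
    p1 = inv_logit(c + alpha * x)  # P(u = 1 | x)
    p0 = 1.0 - p1                  # P(u = 0 | x)
    print(f"x={x:+.1f}  P(u=1)={p1:.3f}  P(u=0)={p0:.3f}  logit={logit(p1):+.3f}")
```

With these made-up values the linear predictor is zero at x = 2, so that row shows P(u=1) = P(u=0) = 0.5 and a logit of 0.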

  10. Simple example – cts predictor • Relationship between birthweight and head circumference (at birth) • Exposure • birthweight (standardized)
variable |    mean       sd
---------+-----------------
     bwt | 3381.5g   580.8g

  11. Simple example – cts predictor • Outcome • Head circumference ≥ 53cm
headcirc |  Freq.       %
---------+---------------
       0 |  8,898   84.3%
       1 |  1,651   15.7%
---------+---------------
   Total | 10,549

  12. Simple example – cts predictor The raw data – doesn’t show much

  13. Simple example – cts predictor • Logistic regression models the probabilities (here shown for deciles of bwt; cells show frequency and column %)
           |                               bwt_z_grp
  headcirc |     0      1      2      3      4      5      6      7      8      9 |  Total
-----------+----------------------------------------------------------------------+-------
         0 | 1,006    993  1,050    946  1,024    931    922    856    688    381 |  8,797
           | 99.80  98.12  97.95  96.04  93.35  89.95  84.98  81.84  66.67  35.94 |  84.33
-----------+----------------------------------------------------------------------+-------
         1 |     2     19     22     39     73    104    163    190    344    679 |  1,635
           |  0.20   1.88   2.05   3.96   6.65  10.05  15.02  18.16  33.33  64.06 |  15.67
-----------+----------------------------------------------------------------------+-------
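A short Python sketch that recomputes the column percentages from the counts in the decile table and adds the observed log-odds, ln(n1/n0), for each decile. The logistic model assumes linearity on this log-odds scale, not on the percentage scale. The decile means of bwt_z are not given on the slide, so the logits are listed by decile number only:

```python
import math

# Counts copied from the decile cross-tab (bwt_z_grp = 0, ..., 9).
neg = [1006, 993, 1050, 946, 1024, 931, 922, 856, 688, 381]  # headcirc = 0
pos = [2, 19, 22, 39, 73, 104, 163, 190, 344, 679]            # headcirc = 1

pct = []     # observed P(headcirc = 1) as a column percentage
logits = []  # observed log-odds, ln(n1 / n0)
for n0, n1 in zip(neg, pos):
    pct.append(100.0 * n1 / (n0 + n1))
    logits.append(math.log(n1 / n0))

for grp, (p, lg) in enumerate(zip(pct, logits)):
    print(f"decile {grp}: {p:5.2f}%  logit = {lg:+.2f}")
```

The percentages climb slowly and then sharply, while the observed logits rise steadily across deciles, which is the pattern the logistic model exploits.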

  14. Simple example – cts predictor Increasing, non-linear relationship

  15. Simple example – cts predictor
Logistic regression                    Number of obs =   10432
                                       LR chi2(1)    = 2577.30
                                       Prob > chi2   =  0.0000
Log likelihood = -3240.9881            Pseudo R2     =  0.2845
-----------------------------------------------------------------------------
headcirc | Odds Ratio  Std. Err.      z    P>|z|   [95% Conf. Interval]
---------+-------------------------------------------------------------------
   bwt_z |   7.431853    .378579   39.38   0.000    6.72569    8.212159
-----------------------------------------------------------------------------
Or in less familiar log-odds format:
-----------------------------------------------------------------------------
headcirc |      Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
---------+-------------------------------------------------------------------
   bwt_z |   2.005775   .0509401   39.38   0.000   1.905935    2.105616
   _cons |  -2.592993   .0474003  -54.70   0.000  -2.685896    -2.50009
-----------------------------------------------------------------------------
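The odds-ratio table and the log-odds table carry the same information: exponentiating the coefficient and its confidence limits gives the odds ratio and its interval. A short Python check using the printed estimates; the critical value 1.959964 (the 97.5th percentile of the standard normal) is assumed to be what Stata uses for a 95% Wald interval:

```python
import math

# Log-odds estimate and standard error for bwt_z, copied from the output above.
coef, se = 2.005775, 0.0509401
z_crit = 1.959964  # assumed 95% two-sided normal critical value

odds_ratio = math.exp(coef)
ci_low = math.exp(coef - z_crit * se)   # exponentiated lower Wald limit
ci_high = math.exp(coef + z_crit * se)  # exponentiated upper Wald limit
print(f"OR = {odds_ratio:.4f}, 95% CI {ci_low:.4f} to {ci_high:.4f}")
```

An odds ratio of about 7.4 therefore means the odds of a large head are multiplied by roughly 7.4 for each 1 SD increase in birthweight.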

  16. Simple example – cts predictor Fitted model – logit scale

  17. Simple example – cts predictor Fitted model – logit scale Cons = -2.59 Slope = 2.01

  18. Simple example – cts predictor • But also… a logit of zero represents the point at which both levels of the outcome are equally likely, i.e. P(u=1|X) = P(u=0|X) = 0.5
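Setting c + αx = 0 gives the covariate value at which the two outcome levels are equally likely: x = -c/α. A small Python sketch using the log-odds estimates from the bwt_z model, with the mean and SD from the descriptive table used to convert back to grams (this assumes bwt_z = (bwt - 3381.5)/580.8):

```python
import math

# Estimates from the log-odds output for the bwt_z model.
cons, slope = -2.592993, 2.005775
# Mean and SD of birthweight (grams) from the descriptive table.
bwt_mean, bwt_sd = 3381.5, 580.8

x0 = -cons / slope               # bwt_z at which cons + slope * x0 = 0
bwt_g = bwt_mean + x0 * bwt_sd   # the same point expressed in grams
p_at_x0 = 1.0 / (1.0 + math.exp(-(cons + slope * x0)))
print(f"logit = 0 at bwt_z = {x0:.3f} (about {bwt_g:.0f} g); P = {p_at_x0:.3f}")
```

With these estimates the crossover lies roughly 1.29 SD above the mean birthweight, a little over 4,100 g; this is also where the fitted S-shaped curve on the probability scale has its inflection point.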

  19. Simple example – cts predictor Fitted model – probability scale

  20. Simple example – cts predictor Fitted model – probability scale • Inflection point: where the curve switches from bending upwards to bending downwards

  21. Simple example – cts predictor Observed and fitted values (within deciles of bwt)

  22. LogR cts predictor - summary • The logit is linearly related to the covariate • The gradient gives the strength of association • The intercept is related to the prevalence of the outcome, but is seldom used • There is a non-linear (S-shaped) relationship between the probabilities and the covariate • The steepness of the linear section indicates the strength of association • The inflection point of the curve, where P(u=1|X) = P(u=0|X), can be thought of as the location, and is related to the prevalence of the outcome

  23. LogR – binary predictor • Define binary predictor: bwt ≥ 8lb • 32% of the sample had a birthweight of 8lb+ • Same outcome • Head circumference > 53cm • Does being 8lb+ at birth increase the chance of you being born with a larger head?

  24. Association can be cross-tabbed (cells show frequency and row %)
           |       headcirc        |
   bwt_8lb |       0          1    |    Total
-----------+-----------------------+----------
         0 |   6,704        384    |    7,088
           |   94.58       5.42    |   100.00
-----------+-----------------------+----------
         1 |   2,093      1,251    |    3,344
           |   62.59      37.41    |   100.00
-----------+-----------------------+----------
     Total |   8,797      1,635    |   10,432
           |   84.33      15.67    |   100.00

  27. Association can be cross-tabbed (same table as slide 24) • Familiar with (6704 × 1251)/(2093 × 384) = 10.43 = odds ratio • However, ln[(6704 × 1251)/(2093 × 384)] = 2.345 = log odds ratio • and ln(384/6704) = ln(0.057) = -2.86 = intercept on the logit scale (the log-odds of a large head among babies under 8lb)
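Because the predictor is binary, the slope and intercept, and also their standard errors, can be reproduced from the four cell counts alone. A minimal Python sketch (Python is used only for illustration; the slides use Stata). The standard errors use the usual large-sample Woolf formula for a 2×2 table, which the slides do not show:

```python
import math

# Cell counts from the bwt_8lb x headcirc cross-tab.
a, b = 6704, 384    # bwt_8lb = 0: headcirc = 0, headcirc = 1
c, d = 2093, 1251   # bwt_8lb = 1: headcirc = 0, headcirc = 1

odds_ratio = (a * d) / (c * b)
log_or = math.log(odds_ratio)   # slope on the logit scale
intercept = math.log(b / a)     # log-odds of headcirc = 1 when bwt_8lb = 0

# Woolf standard errors, computed directly from the cell counts.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
se_intercept = math.sqrt(1 / a + 1 / b)

print(f"OR = {odds_ratio:.2f}, log OR = {log_or:.6f} (SE {se_log_or:.6f})")
print(f"intercept = {intercept:.6f} (SE {se_intercept:.6f})")
```

The log odds ratio and intercept match the `bwt_8lb` coefficient (2.345162) and `_cons` (-2.859817) in the Stata output, and the two Woolf standard errors agree with .063486 and .0524722 to the printed precision.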

  28. Logit output (from Stata)
. logit headcirc bwt_8lb

Logistic regression                    Number of obs =   10432
                                       LR chi2(1)    = 1651.89
                                       Prob > chi2   =  0.0000
Log likelihood = -3703.6925            Pseudo R2     =  0.1823
-----------------------------------------------------------------------------
headcirc |      Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
---------+-------------------------------------------------------------------
 bwt_8lb |   2.345162    .063486   36.94   0.000   2.220732    2.469592
   _cons |  -2.859817   .0524722  -54.50   0.000  -2.962661   -2.756974
-----------------------------------------------------------------------------

  30. What lovely output figures! • Intercept = -2.86, slope = 2.35 • Linear relationship in logit space • There is still an assumed S-shape on the probability scale, although the curve is not apparent because the predictor takes only two values

  31. LogR binary predictor - summary • The same maths/assumptions underlie the models with a binary predictor • Estimation is simpler – the estimates can be obtained in closed form from the crosstab rather than by iterative maximum likelihood (ML) fitting • Regression estimates relate to a linear relationship on the logit scale

  32. Multinomial Logistic Regression

  33. Multinomial Logistic Regression • Typically used for non-ordinal (nominal) outcomes • Can be used for ordered data (some information is ignored) • 3+ outcome levels • Adding another level adds another set of parameters so more than 4 or 5 levels can be unwieldy

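  34. Multinomial Logistic Regression • For outcome levels k = 0, …, K: $P(u=k \mid X) = \dfrac{\exp(c_k + \alpha_k X)}{\sum_{j=0}^{K} \exp(c_j + \alpha_j X)}$, where $c_0 = \alpha_0 = 0$ • Here the probabilities are obtained by a “divide-by-total” procedure: each level's exponentiated linear predictor is divided by the sum over all levels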
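The divide-by-total step can be sketched in a few lines of Python. The function name `divide_by_total` is hypothetical, and the numbers plugged in are the full-precision `bwt_8lb` estimates reported for Exposure 1 later in these slides (levels 1 to 3 against base level 0):

```python
import math


def divide_by_total(x, intercepts, slopes):
    """Category probabilities for a baseline-category (multinomial) logit model.

    intercepts and slopes hold c_1..c_K and alpha_1..alpha_K; the baseline
    level has c_0 = alpha_0 = 0, so its linear predictor is fixed at 0.
    """
    etas = [0.0] + [c + a * x for c, a in zip(intercepts, slopes)]
    m = max(etas)  # subtracting the max leaves the ratios unchanged but avoids overflow
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


# bwt_8lb estimates for headcirc4 levels 1..3 vs base level 0.
cons = [-0.0664822, -0.5822197, -0.9862376]
beta = [1.56099, 3.088329, 4.389338]

for x in (0, 1):
    probs = divide_by_total(x, cons, beta)
    print(f"bwt_8lb={x}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Because the baseline level's linear predictor is 0, ln[P(u=k|X)/P(u=0|X)] recovers c_k + α_k X exactly, which is why each set of multinomial estimates is read against the baseline group.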

  35. Examples • Outcome: head circumference • 4 roughly equal groups (quartiles) • Ordering will be ignored
headcirc4   |  Freq.  Percent
------------+----------------
<= 49cm     |  2,574    24.4%
49.1–50.7cm |  2,655    25.2%
50.8–51.9cm |  2,260    21.4%
52+ cm      |  3,060    29.0%
------------+----------------
Total       | 10,549   100.0%
• Exposure 1: birthweight of 8lb or more • Exposure 2: standardized birthweight

  36. Exposure 1: bwt ≥ 8lb • 32% of the sample had a birthweight of 8lb+ • Does being 8lb+ at birth increase the chance of you being born with a larger head? • Unlike the logistic model, we are concerned with three probabilities • P(headcirc = 49.1 – 50.7cm) • P(headcirc = 50.8 – 51.9cm) • P(headcirc = 52+cm) • Each is referenced against the “negative response”, i.e. headcirc <= 49cm

  37. Exposure 1: bwt ≥ 8lb
. mlogit headcirc4 bwt_8lb, baseoutcome(0)

Multinomial logistic regression
--------------------------------------------------------------
headcirc4    | Coef.    SE       z    P>|z|   [95% CI]
-------------+------------------------------------------------
1            |
     bwt_8lb |  1.56   .135   11.53   0.000    1.30    1.83
       _cons |  -.07   .029   -2.30   0.022   -0.12   -0.01
-------------+------------------------------------------------
2            |
     bwt_8lb |  3.09   .129   23.98   0.000    2.84    3.34
       _cons |  -.58   .034  -17.33   0.000   -0.65   -0.52
-------------+------------------------------------------------
3            |
     bwt_8lb |  4.39   .127   34.43   0.000    4.14    4.64
       _cons |  -.99   .039  -25.56   0.000   -1.06   -0.92
--------------------------------------------------------------
(headcirc4==0 is the base outcome)
• 3 sets of results • Each is referenced against the “baseline” group, i.e. <=49cm

  38. Exposure 1: bwt ≥ 8lb
. mlogit headcirc4 bwt_8lb, baseoutcome(0)

Multinomial logistic regression
---------------------------------
headcirc4    |  Coef.    (SE)
-------------+-------------------
1            |
     bwt_8lb |   1.56   (.135)
       _cons |   -.07   (.029)
-------------+-------------------
2            |
     bwt_8lb |   3.09   (.129)
       _cons |   -.58   (.034)
-------------+-------------------
3            |
     bwt_8lb |   4.39   (.127)
       _cons |   -.99   (.039)
---------------------------------
(headcirc4==0 is the base outcome)

Logistic regression
----------------------------------
 head_1 |     Coef.   Std. Err.
--------+-------------------------
bwt_8lb |   1.56099    .1353772
  _cons | -.0664822    .0289287
----------------------------------
Logistic regression
----------------------------------
 head_2 |     Coef.   Std. Err.
--------+-------------------------
bwt_8lb |  3.088329    .1287576
  _cons | -.5822197    .0335953
----------------------------------
Logistic regression
----------------------------------
 head_3 |     Coef.   Std. Err.
--------+-------------------------
bwt_8lb |  4.389338     .127473
  _cons | -.9862376    .0385892
----------------------------------

  40. Exposure 1: bwt ≥ 8lb • For a categorical exposure, a multinomial logistic model fitted over 4 outcome levels gives the same estimates as 3 separate logistic models, i.e. Multinomial(0v1, 0v2, 0v3) ≡ Logit(0v1), Logit(0v2) and Logit(0v3) • In this instance, the single model is merely more convenient, and it allows equality constraints across the three contrasts to be tested

  41. Exposure 2: Continuous bwt • Using standardized birthweight, we are interested in how the probability of having a larger head, i.e. • P(headcirc = 49.1 – 50.7cm) • P(headcirc = 50.8 – 51.9cm) • P(headcirc = 52+cm), increases as birthweight increases • As with the binary logistic models, the estimates will reflect • a change in log-odds (relative to the baseline, <=49cm) per SD change in birthweight • the gradient or slope on the logit scale

  42. Exposure 2: Continuous bwt
. mlogit headcirc4 bwt_z, baseoutcome(0)

Multinomial logistic regression
--------------------------------------------------------------
headcirc4    | Coef.    SE       z    P>|z|   [95% CI]
-------------+------------------------------------------------
1            |
       bwt_z |  2.10   .063   33.11   0.000    1.97    2.22
       _cons |  1.06   .044   23.85   0.000    0.97    1.14
-------------+------------------------------------------------
2            |
       bwt_z |  3.52   .078   44.89   0.000    3.37    3.68
       _cons |  0.78   .046   16.95   0.000    0.69    0.87
-------------+------------------------------------------------
3            |
       bwt_z |  4.88   .086   56.90   0.000    4.72    5.05
       _cons |  0.33   .051    6.51   0.000    0.23    0.43
--------------------------------------------------------------
(headcirc4==0 is the base outcome)

  43. Exposure 2: Continuous bwt
Logistic regression
-------------------------------------
 head_1 |     Coef.   Std. Err.
--------+----------------------------
  bwt_z |  2.093789    .0650987
  _cons |  1.058445    .0447811
-------------------------------------
Logistic regression
-------------------------------------
 head_2 |     Coef.   Std. Err.
--------+----------------------------
  bwt_z |  3.355041    .0959539
  _cons |  .6853272    .0464858
-------------------------------------
Logistic regression
-------------------------------------
 head_3 |     Coef.   Std. Err.
--------+----------------------------
  bwt_z |  3.823597    .1065283
  _cons |  .3129028    .0492469
-------------------------------------

Multinomial logistic regression
------------------------------
headcirc4    |  Coef.  (SE)
-------------+----------------
1            |
       bwt_z |   2.10  (.063)
       _cons |   1.06  (.044)
-------------+----------------
2            |
       bwt_z |   3.52  (.078)
       _cons |   0.78  (.046)
-------------+----------------
3            |
       bwt_z |   4.88  (.086)
       _cons |   0.33  (.051)
------------------------------
(headcirc4==0 is the base outcome)
• No longer identical: with a continuous exposure the separate logistic models and the multinomial model give different estimates, most noticeably for the slopes of levels 2 and 3 (3.36 vs 3.52 and 3.82 vs 4.88)

  44. Exposure 2: Continuous bwt • Outcome level 2 = [49.1 – 50.7cm]: intercept = 1.06, slope = 2.10 (shallowest) • Outcome level 3 = [50.8 – 51.9cm]: intercept = 0.78, slope = 3.52 • Outcome level 4 = [52.0cm +]: intercept = 0.33, slope = 4.88 (steepest) • Relative to the baseline (<=49cm), the risk of being in outcome level 4 increases most sharply as bwt increases

  45. Can plot probabilities for all 4 levels
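A sketch of how the probabilities for all four levels could be tabulated before plotting, using the rounded multinomial estimates for bwt_z (intercepts 1.06, 0.78, 0.33; slopes 2.10, 3.52, 4.88) and the divide-by-total step. The helper name `category_probs` and the grid from -3 to +3 SD are arbitrary choices for illustration:

```python
import math


def category_probs(x, intercepts, slopes):
    """Divide-by-total probabilities; the base level has intercept and slope 0."""
    etas = [0.0] + [c + a * x for c, a in zip(intercepts, slopes)]
    m = max(etas)  # numerical stability only; the probabilities are unchanged
    w = [math.exp(e - m) for e in etas]
    s = sum(w)
    return [v / s for v in w]


# Rounded mlogit estimates for bwt_z (levels 1..3 vs base level 0).
cons = [1.06, 0.78, 0.33]
slopes = [2.10, 3.52, 4.88]
labels = ["<=49cm", "49.1-50.7cm", "50.8-51.9cm", "52+cm"]

grid = [-3 + 0.5 * i for i in range(13)]  # bwt_z from -3 to +3 SD
table = {x: category_probs(x, cons, slopes) for x in grid}

print("bwt_z  " + "  ".join(f"{lab:>11}" for lab in labels))
for x, probs in table.items():
    print(f"{x:+5.1f}  " + "  ".join(f"{p:11.3f}" for p in probs))
```

The probability of the highest level rises monotonically because its slope is the largest, the baseline level's falls, and the two middle levels rise and then fall as bwt_z increases; plotting each column against bwt_z gives curves of the kind shown on the slides.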

  46. Or altogether on one graph….

  48. Ordinal Logistic Models

  49. Ordinal Logistic Models • When applicable, it is useful to favour ordinal models over multinomial models • If outcome levels are increasing, e.g. in severity of a condition or agreement with a statement, we expect the model parameters to follow a systematic pattern across levels (as with the steadily increasing slopes in the multinomial models above) • The typical approach is to fit ordinal models with constraints, resulting in greater parsimony (fewer parameters)

  50. Contrasting among response categories - some alternative models • For a 4-level outcome (levels 0–3) there are three comparisons to be made • Model 1 – that used in the multinomial logistic model (baseline category): 0 v 1, 0 v 2, 0 v 3 • Model 2 – used with the proportional-odds ordinal model (cumulative): 0 v 1+2+3, 0+1 v 2+3, 0+1+2 v 3 • Model 3 – adjacent category model: 0 v 1, 1 v 2, 2 v 3
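The three sets of contrasts can be made concrete with the marginal headcirc4 counts from the earlier frequency table (2,574, 2,655, 2,260 and 3,060). Without covariates each contrast is just a log-odds; the ordinal models then add covariates with constraints, e.g. the proportional-odds model assumes a common slope across the cumulative contrasts. The direction used for the cumulative logits ("level k or above" vs "below level k") is one common convention and is an assumption here; software differs in sign convention:

```python
import math

# Marginal counts of headcirc4 (levels 0..3) from the frequency table.
counts = [2574, 2655, 2260, 3060]
n = sum(counts)
p = [k / n for k in counts]

# Model 1: baseline-category logits, level k vs level 0.
baseline = [math.log(p[k] / p[0]) for k in range(1, 4)]
# Model 2: cumulative logits, levels >= k vs levels < k (assumed direction).
cumulative = [math.log(sum(p[k:]) / sum(p[:k])) for k in range(1, 4)]
# Model 3: adjacent-category logits, level k vs level k - 1.
adjacent = [math.log(p[k] / p[k - 1]) for k in range(1, 4)]

for name, vals in (("baseline", baseline), ("cumulative", cumulative), ("adjacent", adjacent)):
    print(f"{name:>10}: " + ", ".join(f"{v:+.3f}" for v in vals))
```

The first baseline-category contrast and the first adjacent-category contrast coincide, since both compare level 1 with level 0.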
