
Binary Models 1



Presentation Transcript


  1. Binary Models 1 A (Longitudinal) Latent Class Analysis of Bedwetting

  2. How much fun can you have with 5 binary variables?

  3. Croudace paper:

  4. Gender-specific prevalence and levels of missing data

  5. Latent Class Models • Deal with patterns of response, e.g.
     11111 = Yes at all five time points
     00000 = No at all five time points
     11000 = Yes early on, followed by no
     10101 = Alternating pattern
     11*** = Yes followed by missing
     1*0*1 = etc.

  6. Latent Class Models

  7. Conditional independence • The manifest variables are assumed to be independent given latent class, so the probability of a response pattern within a class can be written as the product of the item-wise probabilities. E.g. for the pattern '01', writing pat(1) for the first element of the pattern (i.e. the first binary variable): P(pattern = '01' | class = i) = P(pat(1) = '0' | class = i) * P(pat(2) = '1' | class = i)
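To make the product concrete, here is a minimal Python sketch (not part of the original presentation) extending the two-item example to all five time points; the item probabilities are made up purely for illustration.

from math import prod

# Hypothetical class-conditional probabilities of a '1' (wet) at each of the five ages
p_wet_given_class = [0.8, 0.7, 0.4, 0.1, 0.1]   # made-up numbers, for illustration only

def pattern_prob(pattern, p_wet):
    # P(pattern | class) under conditional independence: product over the five items
    return prod(p if obs == 1 else 1 - p for obs, p in zip(pattern, p_wet))

print(pattern_prob([0, 1, 0, 0, 0], p_wet_given_class))   # P(pattern = '01000' | class)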

  8. Huh? • Basically, Bayes' rule plus the conditional independence assumption let you calculate the probability of each pattern being assigned to each class. • By selecting starting values for the probabilities, or alternatively a starting assignment for the patterns, one can iterate to convergence to find the best solution given your chosen number of classes
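A sketch of the Bayes step, again with made-up class proportions and item probabilities: the posterior probability that a pattern belongs to each class is the prior times the class-conditional pattern probability, renormalised. In the real algorithm these are the quantities the iterations update.

from math import prod

def pattern_prob(pattern, p_wet):
    return prod(p if obs == 1 else 1 - p for obs, p in zip(pattern, p_wet))

class_probs = [0.15, 0.10, 0.75]                  # hypothetical class proportions (priors)
p_wet = [[0.80, 0.70, 0.40, 0.10, 0.10],          # hypothetical item probabilities,
         [0.90, 0.90, 0.90, 0.90, 0.90],          # one row per class
         [0.05, 0.03, 0.02, 0.01, 0.01]]

pattern = [1, 1, 0, 0, 0]
lik = [pattern_prob(pattern, p_wet[k]) for k in range(len(class_probs))]
joint = [cp * l for cp, l in zip(class_probs, lik)]           # prior x likelihood
posterior = [j / sum(joint) for j in joint]                   # Bayes' rule
print(posterior)   # P(class = k | pattern = '11000') for each class k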

  9. Distribution of patterns (complete data)

        4.5  5.5  6.5  7.5  9.5      N
   1.    0    0    0    0    0    3535
   2.    1    0    0    0    0     529
   3.    1    1    1    1    1     303
   4.    1    1    0    0    0     241
   5.    1    1    1    1    0     199
   6.    1    1    1    0    0     176
   7.    0    1    0    0    0     138
   8.    0    0    1    0    0      99
   9.    1    0    1    0    0      82
  10.    0    0    0    1    0      69
  11.    1    0    1    1    0      48
  12.    0    0    0    0    1      43
  13.    1    1    0    1    0      40
  14.    0    1    1    0    0      39
  15.    1    1    1    0    1      29
  16.    1    0    0    1    0      28
  17.    0    0    1    1    0      27
  18.    0    1    1    1    0      25
  19.    1    0    0    0    1      24
  20.    0    1    1    1    1      23
  21.    1    0    1    1    1      20
  22.    1    1    0    0    1      20
  23.    1    1    0    1    1      20
  24.    0    0    1    1    1      15
  25.    0    0    0    1    1      12
  26.    0    1    0    1    0      11
  27.    1    0    0    1    1      11
  28.    0    1    1    0    1       8
  29.    0    1    0    1    1       8
  30.    0    1    0    0    1       8
  31.    1    0    1    0    1       7
  32.    0    0    1    0    1       6

  10. Distribution of patterns with some missingness

         4.5  5.5  6.5  7.5  9.5      N
    1.    0    0    0    0    .     458
    2.    0    .    .    .    .     398
    3.    0    0    0    .    .     260
    4.    0    0    .    .    .     252
    5.    0    0    0    .    0     248
    6.    1    .    .    .    .     150
    7.    0    0    .    0    0     146
    8.    .    0    0    0    0     138
    9.    0    .    0    0    0     125
   10.    .    .    .    0    0     124
   11.    .    .    .    .    0     123
   12.    0    0    .    0    .     109
   13.    .    .    .    0    .     107
   14.    0    .    0    .    .      94
   15.    .    0    .    .    .      92
   16.    1    0    0    0    .      71
   17.    0    0    .    .    0      70
   18.    1    1    1    1    .      66
   19.    0    .    .    .    0      65
   20.    1    1    .    .    .      62
   21.    0    .    .    0    0      57
   22.    .    .    0    .    .      55
   23.    1    0    0    .    0      54
   24.    0    .    0    0    .      53
   25.    1    0    .    .    .      50
   Etc.
  182.    .    .    0    1    0       1
  183.    0    0    .    1    1       1

  11. Thresholds
  • Mplus thinks of a binary variable as a dichotomised version of an underlying continuous latent variable
  • The point at which a continuous N(0,1) variable must be cut to create the binary variable is called a threshold
  • A binary variable with 50% cases corresponds to a threshold of zero
  • A binary variable with 2.5% cases corresponds to a threshold of 1.96
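A quick numerical check of the two facts above, using the N(0,1) (probit) parameterisation the slide describes; illustrative only, not the presentation's own code.

from scipy.stats import norm

# proportion of 'cases' implied by cutting a standard normal at a given threshold
for tau in (0.0, 1.96):
    print(tau, 1 - norm.cdf(tau))    # 0.0 -> 0.50, 1.96 -> ~0.025

# and the reverse: the threshold implied by an observed prevalence of cases
print(norm.ppf(1 - 0.025))           # ~1.96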

  12. Thresholds Figure from Uebersax webpage

  13. Categorical variables • A categorical variable with n-levels requires n-1 thresholds • i.e. you need to make n-1 cuts in a continuous N(0,1) variable to make your observed n-level variable
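A small simulation of the idea (thresholds chosen arbitrarily): cutting a standard normal at two points produces a three-level variable whose category proportions match the areas under the normal curve.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
z = rng.standard_normal(100_000)      # the underlying continuous N(0,1) variable
cuts = [-0.5, 1.0]                    # n - 1 = 2 thresholds for an n = 3 level variable
cat = np.digitize(z, cuts)            # observed categories 0, 1, 2

observed = np.bincount(cat) / len(z)
expected = np.diff(norm.cdf([-np.inf, -0.5, 1.0, np.inf]))
print(observed, expected)             # category proportions match the areas under the normal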

  14. Degrees of freedom
  • 32 possible response patterns, giving 32 - 1 = 31 degrees of freedom to work with (missing data patterns don't count)
  • Each additional class requires
  • 5 df to estimate the prevalence of wetting at each of the 5 time points within that class (i.e. 5 thresholds)
  • 1 df for an additional cut of the latent variable defining the class distribution
  • Hence a 5-class model uses up 5*5 + 4 = 29 degrees of freedom, leaving 31 - 29 = 2 df to test the model
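The same bookkeeping in a few lines; the 4-class count reproduces the 23 free parameters and 8 residual df reported in the model-fit output later in the slides.

n_items = 5
n_cells = 2 ** n_items                   # 32 possible response patterns

def lca_free_params(k):
    return k * n_items + (k - 1)         # k*5 thresholds + (k-1) class-distribution parameters

for k in (4, 5):
    p = lca_free_params(k)
    print(k, p, n_cells - 1 - p)         # 4 classes: 23 params, 8 df; 5 classes: 29 params, 2 df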

  15. Procedure
  • Fit 1-5 class models in turn
  • Output = within-class thresholds (prevalences) and latent class distribution
  • Select preferred model using various criteria
  • Statistical – measures of fit
  • Ease of interpretation of results
  • Parsimony
  • Face validity
  • Astrology
  • Celebrate

  16. How to fit in Mplus – black box way

  data:
    file is 'bedwetting_5tp.txt';
    listwise is on;
  variable:
    names are sex bwt marr m_age parity educ tenure ne_kk ne_km ne_kp ne_kr ne_ku;
    categorical = ne_kk ne_km ne_kp ne_kr ne_ku;
    usevariables are ne_kk ne_km ne_kp ne_kr ne_ku;
    missing are ne_kk ne_km ne_kp ne_kr ne_ku (-9);
    classes = c(4);
  analysis:
    type = mixture;
    starts = 1000 500;
    stiterations = 10;
    stscale = 20;
  model:
    %OVERALL%

  Is that it????

  17. What you're actually doing:

  model:
    %OVERALL%
    [c#1 c#2 c#3];
    %c#1%
    [ne_kk$1]; [ne_km$1]; [ne_kp$1]; [ne_kr$1]; [ne_ku$1];
    %c#2%
    [ne_kk$1]; [ne_km$1]; [ne_kp$1]; [ne_kr$1]; [ne_ku$1];
    %c#3%
    [ne_kk$1]; [ne_km$1]; [ne_kp$1]; [ne_kr$1]; [ne_ku$1];
    %c#4%
    [ne_kk$1]; [ne_km$1]; [ne_kp$1]; [ne_kr$1]; [ne_ku$1];

  4-class model => 3 estimated thresholds for the latent class variable, and 5 more thresholds for each class

  18. How many random starts?
  • Depends on
  • Sample size
  • Complexity of model
  • Number of manifest variables
  • Number of classes
  • Aim to consistently find, within each run, the solution with the best (highest) log-likelihood

  19. Loglikelihood values at local maxima, seeds, and initial stage start numbers:

  Success – the best loglikelihood is replicated by all 20 retained starts:

  -10148.718  987174  1689     -10148.718  777300  2522
  -10148.718  406118  3827     -10148.718   51296  3485
  -10148.718  997836  1208     -10148.718  119680  4434
  -10148.718  338892  1432     -10148.718  765744  4617
  -10148.718  636396   168     -10148.718  189568  3651
  -10148.718  469158  1145     -10148.718   90078  4008
  -10148.718  373592  4396     -10148.718   73484  4058
  -10148.718  154192  3972     -10148.718  203018  3813
  -10148.718  785278  1603     -10148.718  235356  2878
  -10148.718  681680  3557     -10148.718   92764  2064

  Not there yet – the best loglikelihood has not been replicated:

  -10153.627   23688  4596     -10153.678  150818  1050
  -10154.388  584226  4481     -10155.122  735928   916
  -10155.373  309852  2802     -10155.437  925994  1386
  -10155.482  370560  3292     -10155.482  662718   460
  -10155.630  320864  2078     -10155.833  873488  2965
  -10156.017  212934   568     -10156.231   98352  3636
  -10156.339   12814  4104     -10156.497  557806  4321
  -10156.644  134830   780     -10156.741   80226  3041
  -10156.793  276392  2927     -10156.819  304762  4712
  -10156.950  468300  4176     -10157.011   83306  2432

  20. What the output looks like

  TESTS OF MODEL FIT

  Loglikelihood
    H0 Value                          -10153.129
    H0 Scaling Correction Factor           1.007
      for MLR

  Information Criteria
    Number of Free Parameters                 23
    Akaike (AIC)                       20352.258
    Bayesian (BIC)                     20505.737
    Sample-Size Adjusted BIC           20432.649
      (n* = (n + 2) / 24)

  Chi-Square Test of Model Fit for the Binary outcomes
    Pearson Chi-Square
      Value                 11.543
      Degrees of Freedom         8
      P-Value               0.1728
    Likelihood Ratio Chi-Square
      Value                 11.210
      Degrees of Freedom         8
      P-Value               0.1901

  21. Measures of entropy

  Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
  by Latent Class (Column)

             1      2      3      4
    1    0.846  0.067  0.040  0.047
    2    0.110  0.808  0.037  0.046
    3    0.030  0.095  0.875  0.000
    4    0.041  0.007  0.000  0.952

  Entropy (global)  0.844

  22. Model based + modal class latent class distribution

  FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
  BASED ON THE ESTIMATED MODEL

    Latent classes
      1        790.04734    0.13521
      2        297.76475    0.05096
      3        511.59879    0.08756
      4       4243.58913    0.72627

  CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Class Counts and Proportions
    Latent classes
      1        674    0.11535
      2        211    0.03611
      3        545    0.09327
      4       4413    0.75526

  23. Model results

                       Estimates     S.E.   Est./S.E.
  Latent Class 1
    Thresholds
      NE_KK$1             -1.540    0.238     -6.475
      NE_KM$1             -0.891    0.199     -4.471
      NE_KP$1              0.346    0.121      2.864
      NE_KR$1              2.500    0.472      5.294
      NE_KU$1              2.359    0.248      9.527
  Latent Class 2
    Thresholds
      NE_KK$1             -0.294    0.208     -1.415
      NE_KM$1              0.482    0.429      1.123
      NE_KP$1             -0.632    0.230     -2.754
      NE_KR$1             -1.539    0.750     -2.053
      NE_KU$1              0.501    0.197      2.540
  Latent Class 3
    Thresholds
      NE_KK$1             -3.122    0.544     -5.739
      NE_KM$1            -15.000    0.000      0.000
      NE_KP$1             -3.172    0.481     -6.593
      NE_KR$1             -3.224    0.609     -5.294
      NE_KU$1             -0.563    0.117     -4.800
  Latent Class 4
    Thresholds
      NE_KK$1              2.093    0.073     28.484
      NE_KM$1              3.698    0.198     18.651
      NE_KP$1              3.796    0.140     27.200
      NE_KR$1              4.213    0.180     23.383
      NE_KU$1              4.420    0.169     26.172
  Categorical Latent Variables
    Means
      C#1                 -1.681    0.114    -14.746
      C#2                 -2.657    0.240    -11.092
      C#3                 -2.116    0.090    -23.551

  24. RESULTS IN PROBABILITY SCALE

  Latent Class 1
    NE_KK  Category 1  0.177  0.035   5.106
           Category 2  0.823  0.035  23.816
    NE_KM  Category 1  0.291  0.041   7.078
           Category 2  0.709  0.041  17.250
    NE_KP  Category 1  0.586  0.029  19.981
           Category 2  0.414  0.029  14.138
    NE_KR  Category 1  0.924  0.033  27.917
           Category 2  0.076  0.033   2.292
    NE_KU  Category 1  0.914  0.020  46.773
           Category 2  0.086  0.020   4.420
  Latent Class 2
    NE_KK  Category 1  0.427  0.051   8.387
           Category 2  0.573  0.051  11.258
    NE_KM  Category 1  0.618  0.101   6.103
           Category 2  0.382  0.101   3.769
    NE_KP  Category 1  0.347  0.052   6.673
           Category 2  0.653  0.052  12.555
    NE_KR  Category 1  0.177  0.109   1.620
           Category 2  0.823  0.109   7.551
    NE_KU  Category 1  0.623  0.046  13.445
           Category 2  0.377  0.046   8.150
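These probabilities are just transformed thresholds: with maximum likelihood estimation Mplus uses a logit link for categorical outcomes, so each Category 1 probability is the inverse logit of the corresponding threshold. A quick check against the Latent Class 1 estimates above (a sketch, not Mplus output):

from math import exp

def inv_logit(t):
    return 1.0 / (1.0 + exp(-t))

# Latent class 1 thresholds from the model results above
thresholds_class1 = {'NE_KK': -1.540, 'NE_KM': -0.891, 'NE_KP': 0.346,
                     'NE_KR': 2.500, 'NE_KU': 2.359}

for item, t in thresholds_class1.items():
    p_cat1 = inv_logit(t)                          # e.g. NE_KK: 0.177, as in the output
    print(item, round(p_cat1, 3), round(1 - p_cat1, 3))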

  25. These can be plotted • In Excel from the probabilities:

  26. … or in Mplus

  plot:
    type is plot2;
    series is anybw_t1 (4.5) anybw_t2 (5.5) anybw_t3 (6.5) anybw_t4 (7.5) anybw_t5 (9.5);

  Then Graph > View graphs > Estimated probabilities

  27. Yuck!

  28. Model fit stats

  29. Model fit stats - BIC
  • Bayesian Information Criterion = -2*log-likelihood + (# params)*ln(sample size)
  • A function of the likelihood which rewards a more parsimonious model
  • Typically decreases and then increases as extra classes are added
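Plugging in the 4-class values from the model-fit output above (log-likelihood -10153.129, 23 free parameters, and a sample size of 5843, the sum of the model-based class counts) reproduces the information criteria reported there:

from math import log

loglik, n_params, n = -10153.129, 23, 5843   # values from the 4-class output above

aic  = -2 * loglik + 2 * n_params                   # 20352.258
bic  = -2 * loglik + n_params * log(n)              # 20505.74
abic = -2 * loglik + n_params * log((n + 2) / 24)   # 20432.65 (sample-size adjusted BIC)
print(round(aic, 3), round(bic, 3), round(abic, 3))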

  30. Model fit stats - Entropy
  • A measure of the certainty with which patterns (and therefore subjects) are assigned to latent classes.
  • Higher values indicate greater delineation of classes – ideally approaching 1 (Celeux & Soromenho)
  • A value of 0.62 would indicate 'fuzziness' (Ramaswamy)
  • Thorough job:
  • Examine global entropy
  • Examine class-specific entropy
  • Examine the assignment probabilities for each individual pattern
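For reference, the global figure reported by Mplus is a normalised (relative) entropy based on the posterior class probabilities: 1 minus the average classification uncertainty, scaled by ln K. A sketch with hypothetical posteriors:

import numpy as np

def relative_entropy(post):
    # post: n x K array of posterior class probabilities, each row summing to 1
    n, K = post.shape
    cell = -post * np.log(np.clip(post, 1e-12, 1.0))
    return 1 - cell.sum() / (n * np.log(K))

# Hypothetical posteriors: sharp assignments give entropy near 1, fuzzy ones much lower
sharp = np.array([[0.98, 0.01, 0.01], [0.02, 0.95, 0.03]])
fuzzy = np.array([[0.40, 0.35, 0.25], [0.30, 0.40, 0.30]])
print(relative_entropy(sharp), relative_entropy(fuzzy))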

  31. Model fit stats - BLRT
  • Bootstrap Likelihood Ratio Test
  • The traditional Δ-likelihood test cannot be used to compare nested latent class models, as twice the difference in log-likelihoods is not chi-square distributed
  • The BLRT empirically estimates the distribution of the difference, providing a p-value for the observed difference which can be used in tests of model fit

  32. How to obtain BLRT
  • Fit the k-class model
  • Select an optseed which replicates the solution with the best (highest) log-likelihood for k classes
  • Re-fit the k-class model using optseed, specifying tech14 in the output section and choosing an appropriate number of runs for the bootstrapping, e.g.
    k-1starts = 100 20;
    lrtstarts = 0 0 250 50;
    lrtbootstrap = 100;
  • Ensure that the optimal (k-1)-class model has been replicated

  33. BLRT Output

  PARAMETRIC BOOTSTRAPPED LIKELIHOOD RATIO TEST FOR 4 (H0) VERSUS 5 CLASSES

    H0 Loglikelihood Value                     -2097.721
    2 Times the Loglikelihood Difference           5.513
    Difference in the Number of Parameters             7
    Approximate P-Value                           0.7600
    Successful Bootstrap Draws                       100

  Does this agree with what you obtained for the 4-class run?
  The BLRT p-value implies no improvement in fit when moving from 4 to 5 classes

  34. Model fit stats
  BLRT: the 4-class model is an adequate fit to the data and there is little improvement in fit when a 5th class is added
  BIC: attains its minimum value at the 4-class model (penalising the 5-class model for its lack of parsimony)
  Entropy: the 2-class model has the highest entropy, however all values are reasonably high

  35. Other considerations - interpretation
  [Profile plots of the 4-class and 5-class models]

  36. Final decision
  • The 4- and 5-class models both fit the data; parsimony favours 4 classes
  • Both the 4- and 5-class models make intuitive sense
  • 4 classes: 'normative', 'delayed', 'persistent', 'relapse'
  • The 5-class model brings us 'severe delay'
  • The 5-class model is in good agreement with Croudace et al (2004)
  • Job's a good 'un

  37. Conclusions
  • Like EFA, LCA is an exploratory tool with the aim of summarising the variability in the dataset in a simple/interpretable way
  • These results do not prove that there are 4 or 5 groups of children in real life.
  • LCA will find groupings in the data even if there is no reason to think such groups might exist.

  38. Bring on the covariates

  39. Predicting class membership
  • One can strengthen the assertion that subjects can be neatly packaged into little groups if one can show that these groups differ with respect to
  • Co-morbid conditions
  • Aetiological factors
  • Later outcomes
  • Even better if such findings support what is seen in (a) other epidemiological studies, (b) clinical settings

  40. Output from LCA: • Class distribution • Set of probabilities defining the likelihood that each observed pattern can be assigned to each class

  41. Incorporating covariates
  • 2-stage method
  • Export class probabilities to another package – Stata
  • Model class membership as a multinomial model with probability weighting
  • Using classes derived from repeated BW measures with partially missing data (gloss over)

  42. Save data from Mplus

  savedata:
    file is "boys_5class_output_completecase.txt";
    save cprob;

  Don't forget to add the ID variable:

  variable:
    <snip>
    idvariable is ID;

  43. Dataset:

  44. Then what?
  • Merge the Mplus output with the covariates using ID
  • Mplus will only permit one ID variable
  • ALSPAC has one ID to identify parents but two IDs to identify the kids – parent ID plus another one
  • Create a composite ID, e.g. 1000.1, 1000.2, and then re-derive the proper IDs before matching
  • Read into Stata (or similar)

  45. Reshaping the dataset • Weighted model requires a reshaping of the dataset so that each child has n-rows (for an n-class model) rather than just 1

  46. Pre-shaped – first 20 kids

     ID     sex   dev_18  dev_42  pclass1  pclass2  pclass3  pclass4  pclass5  modclass
   30004   male     3       .      .001      0       .803      0       .197       3
   30008   male     2       1      .908      0        0       .007      .085      1
   30010   male     2       2      .053     .001      .052     0        .894      5
   30023   male     1       3      .115      0        .596    .001      .288      3
   30031   male     3       4       0        0        .983     0        .016      3
   30033   male     4       4      .392      0        .397     0        .211      3
   30042   male     1       3       0        0        .983     0        .016      3
   30050   male     3       2       0        0        .983     0        .016      3
   30051   male     2       2       0        0         0       1         0        4
   30057   male     1       3      .135      0        .002     0        .864      5
   30058   male     1       4       0        0        .958     0        .041      3
   30064   male     2       4       0        0        .983     0        .016      3
   30068   male     4       3      .001      0        .803     0        .197      3
   30070   male     3       4       0        0        .983     0        .016      3
   30072   male     1       1       0        0        .983     0        .016      3
   30075   male     3       3       0        0        .982     0        .018      3
   30088   male     3       4      .03      .002      .889    .003      .076      3
   30095   male     3       .       0        0        .983     0        .016      3
   30098   male     3       .      .068     .158      .173    .018      .583      5
   30104   male     4       1      .008      0        .775     0        .217      3

  47. Pre-shaped – first 20 kids (the same listing as above, annotated): dev_18 and dev_42 are the covariates, pclass1-pclass5 are the posterior class probabilities, and modclass is the modal (most likely) class.

  48. The reshaping

  reshape long pclass, i(id) j(class)
  (note: j = 1 2 3 4 5)

  Data                               wide   ->   long
  ---------------------------------------------------------
  Number of obs.                     5584   ->   27920
  Number of variables                  66   ->      63
  j variable (5 values)                     ->   class
  xij variables:
            pclass1 pclass2 ... pclass5     ->   pclass
  ---------------------------------------------------------
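For anyone following along outside Stata, the equivalent wide-to-long reshape in pandas (the file name here is hypothetical; the column names follow the listings above):

import pandas as pd

# hypothetical merged file containing id, covariates and pclass1-pclass5
wide = pd.read_csv("boys_5class_merged.csv")

long = pd.wide_to_long(wide, stubnames="pclass", i="id", j="class")
long = long.reset_index().sort_values(["id", "class"])
# each child now has 5 rows (class = 1..5), with pclass holding the posterior probability
print(long.head(15))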

  49. Re-shaped – first 3 kids

          id    sex   dev_18  dev_42  pclass  class
    1.  30004  male     3       .      .001     1
    2.  30004  male     3       .       0       2
    3.  30004  male     3       .      .803     3
    4.  30004  male     3       .       0       4
    5.  30004  male     3       .      .197     5
    6.  30008  male     2       1      .908     1
    7.  30008  male     2       1       0       2
    8.  30008  male     2       1       0       3
    9.  30008  male     2       1      .007     4
   10.  30008  male     2       1      .085     5
   11.  30010  male     2       2      .053     1
   12.  30010  male     2       2      .001     2
   13.  30010  male     2       2      .052     3
   14.  30010  male     2       2       0       4
   15.  30010  male     2       2      .894     5

  Rows 1-5 = first kid, rows 6-10 = second kid, rows 11-15 = third kid
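The reshaped data can then be fed to the probability-weighted multinomial model. The presentation does this step in Stata; as a rough sketch of the same idea, here is a scikit-learn version that treats each child's five posterior probabilities as observation weights (the file name and covariate choices are illustrative only):

import pandas as pd
from sklearn.linear_model import LogisticRegression

long = pd.read_csv("boys_5class_long.csv")            # hypothetical reshaped file, as above
long = long.dropna(subset=["dev_18", "dev_42"])       # covariates used purely as an example

X = long[["dev_18", "dev_42"]]                        # predictors of class membership
y = long["class"]                                     # the 5 candidate classes
w = long["pclass"]                                    # posterior probability as the weight

model = LogisticRegression(max_iter=1000)             # fits a multinomial logit for >2 classes
model.fit(X, y, sample_weight=w)
print(model.coef_)                                    # one row of coefficients per class

This reproduces the weighting idea only; valid standard errors for the two-stage approach need more care (for example, accounting for the clustering of the five rows within each child, as the Stata probability-weighted setup does).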
