1 / 45

Cross-Sectional Mixture Modeling

Cross-Sectional Mixture Modeling. Shaunna L. Clark Advanced Genetic Epidemiology Statistical Workshop October 23, 2012. Outline . What is a mixture? Introduction to LCA (LPA) Basic Analysis IdeasPlan and Issues How to choose the number of classes How do we implement mixtures in OpenMx ?

elysia
Download Presentation

Cross-Sectional Mixture Modeling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cross-Sectional Mixture Modeling Shaunna L. Clark Advanced Genetic Epidemiology Statistical Workshop October 23, 2012

  2. Outline • What is a mixture? • Introduction to LCA (LPA) • Basic Analysis Ideas\Plan and Issues • How to choose the number of classes • How do we implement mixtures in OpenMx? • Factor Mixture Model • What do classes mean for twin modeling?

  3. Homogeneity Vs. Heterogeneity • Most models assume homogeneity • i.e. Individuals in a sample all follow the same model • What have seen so far (for the most part) • But not always the case • Ex: Sex, Age, Patterns of Substance Abuse

  4. What is Mixture Modeling Used to model unobserved heterogeneity by identifying different subgroups of individuals Ex: IQ, Religiosity

  5. Latent Class Analysis (LCA) Also known as Latent Profile Analysis (LPA) if you have continuously distributed variables

  6. Latent Class Analysis Introduced by Lazarsfeld & Henry, Goodman, Clogg, Dayton & Mcready • Setting • Cross-sectional data • Multiple items measuring a construct • 12 items measuring the construct of Cannabis Abuse/Dependence • Hypothesized construct represented as latent class variable (categorical latent variable) • Different categories of Cannabis Abuse\Dependence patterns • Aim • Identify items that indicate classes well • Estimateproportion of sample in each class (class probability) • Classify individuals into classes (posterior probabilities)

  7. Latent Class Analysis Cont’d C x1 x2 x3 x4 x5

  8. Latent Class Analysis Model Dichotomous (0/1) indicators u: u1, u2, ... , ur Categorical latent variable c: c = k ; k = 1, 2, ... , K Marginal probability for item uj = 1, (probability item uj =1 is the sum over all class of the product of the probability of being in class k and the probability of endorsing item uj given that you are in class k)

  9. Joint Probabilities • Joint probability of all u’s, assuming conditional independence: • Probability of observing a given response pattern is equal to the sum over all classes of the product of being in a given class and the probability of observing a response on item 1 given that you are in latent class k, . . . (repeat for each item)

  10. Posterior Probabilities • Probability of being inclass k given your response pattern • Used to assign most likely class membership • Based on highest posterior probability

  11. Model Testing • Log-likelihood ratio χ2 test (LLRT) • Overall test against the data with H1 being the unrestricted multinomial • Problem: Not distributed as χ2 due to boundary conditions • Don’t use it!!! (McLachlan& Peele, 2000) • Information Criteria • Akaike Information Criteria, AIC (Akaike,1974) AIC = 2h-2ln(L) • Bayesian Information Criteria, BIC (Schwartz, 1978) BIC = -2ln(L)+h*ln(n) • Where L = log-likelihood, h = number of parameters, n = sample size • Chose model with lowest value of IC

  12. Other Tests • Since can’t do LLRT, use test which approximate the difference in LL values between k and k-1 class models. • Vuong-Lo-Mendell-Rubin, LMR-LRT (Lo, Mendell, & Rubin, 2001) • Parametric bootstrapped LRT, BLRT (McLachlan, 1987) • P-value is probability that H0 is true • H0: k-1 classes; H1: k classes • A low p-value indicates a preference for the estimated model (i.e. k classes) • Look for the first time the p-value is non-significant or greater than 0.05

  13. Analysis Plan • Fit model with 1-class • Everyone in same class • Sometimes simple is better • Fit LCA models 2-K classes • Chose best number of classes Seems simple right???

  14. Not really . . .Lots of known issues in Mixture Analysis • Global vs. Local Maximum Log Likelihood Log Likelihood Global Global Local Local Parameter Parameter • Use multiple sets of random starting values to make sure have global solution. Make sure that best LL value has replicated

  15. Determining the number of classes: Class Enumeration • No agreed upon way to determine the correct number of latent classes • Statistical comparisons (i.e. ICs, LRTs) • Interpretability and usefulness of classes • Substantive theory • Relationship to auxiliary variables • Predictive validity of classes • Class size • Quality of Classifications (not my favorite) • Classification table based on posterior probabilities • Entropy - A value close to 1 indicates good classification in that many individuals have posterior probabilities close to 0 or 1

  16. Suggested Strategy • Nylund et al. (2007), Tofighi & Enders (2008), among others • Simulation studies comparing tests and information criteria described previously • Suggest: • Use BIC and LMR to narrow down the number of plausible models • Then run BLRT on those models because BLRT can be computationally intensive

  17. OpenMX: LCA Example Script LCA_example.R

  18. Mixtures in OpenMx • Specify class-specific models • Create MxModel objects for each class • Specify class probabilities • Create an MxMatrix of class probabilities\proportions • Specify model-wide objective function • Pull everything together in a parent model with data • Weighted sum of the class models • Estimate entire model Note: One of potentially many ways to do this

  19. Class Specific Models nameList <- names(<dataset>) class1 <- mxModel("Class1", mxMatrix("Iden", name = "R", nrow = nvar, ncol = nvar, free=FALSE), mxMatrix("Full", name = "M", nrow = 1, ncol = nvar, free=FALSE), mxMatrix("Full", name = "ThresholdsClass1", nrow = 1, ncol = nvar, dimnames = list("Threshold",nameList), free=TRUE), mxFIMLObjective(covariance="R", means="M", dimnames=nameList, thresholds="ThresholdsClass1",vector=TRUE)) Repeat for every class in your model Don’t be like me, make sure to change class numbers

  20. Define the Model lcamodel <- mxModel("lcamodel", class1, class2, mxData(vars, type="raw"), Next, specify class membership probabilities

  21. Class Membership Probabilities • When specifying need to remember: • Class probabilities must be positive • Must sum to a constant - 1 mxMatrix("Full", name = "ClassMembershipProbabilities", nrow = nclass, ncol = 1, free=TRUE, labels = c(paste("pclass", 1:nclass, sep=""))), mxBounds(c(paste("pclass", 1:nclass, sep="")),0,1), mxMatrix("Iden", nrow = 1, name = "constraintLHS"), mxAlgebra(sum(ClassMembershipProbabilities), name = "constraintRHS"), mxConstraint(constraintLHS == constraintRHS),

  22. Model-wide objective function • Weighted sum of individual class likelihoods • Weights are class probabilities • So for two classes:

  23. Model Wide Objective Function Cont’d mxAlgebra( -2*sum(log(pclass1%x%Class1.objective + pclass2%x%Class2.objective)), name="lca"), mxAlgebraObjective("lca")) ) Now we run the model: model <- mxRun(lcamodel) And we wait and wait and wait till it’s done.

  24. Profile Plot • One way to interpret the classes is to plot them. • In our example we had binary items, so the thresholds are what distinguishes between classes • Can plot the thresholds • Or you can plot the probabilities • More intuitive • Easier for non-statisticians to understand

  25. Profile Plots in R\OpenMx #Pulling out thresholds class1T <- model@output$matrices$Class1.ThresholdsClass1 class2T <- model@output$matrices$Class2.ThresholdsClass2 #Converting threshold to probabilities class1P<-t(1/(1+exp(-class1T))) class2P<-t(1/(1+exp(-class2T)))

  26. Profile Plots Cont’D plot(class1P, type="o", col="blue",ylim=c(0,1),axes=FALSE, ann=FALSE) axis(1,at=1:12,lab=nameList) axis(2,las=1,at=c(0,0.2,0.4,0.6,0.8,1)) box() lines(class2P,type="o", pch=22, lty=2, col="red") title(main="LCA 2 Class Profile Plot", col.main="black",font.main=4) title(xlab="DSM Items", col.lab="black") title(ylab="Probability", col.lab="black") legend("bottomright",c("Class 1","Class 2"), cex=0.8, col=c("blue","red"),pch=21:22,lty=1:2)

  27. OpenMx Exercise • Unfortunately, it takes long time for these to run so not feasible to do in this session • However, I’ve run the 2-, 3-, and 4- class LCA models for this data and (hopefully) the .Rdata files are posted on the website • Exercise: Using the .Rdata files • Determine which model is better according to AIC\BIC • Want the lowest value • Make a profile plot of the best solution and interpret the classes • What kind of substances users are there?

  28. Code to pull out LL and compute AIC\BIC #Pull out LL LL_2c <- model@output$Minus2LogLikelihood LL_2cnsam = 1878 #parameters npar<- (nclass-1) + (nthresh*nvar*nclass npar #Compute AIC & BIC AIC_2c = 2*npar + LL_2c AIC_2c BIC_2c = LL_2c + (npar*log(nsam)) BIC_2c

  29. Table of Results

  30. 3-Class Profile Plot

  31. Factor Mixture Modeling

  32. Problem with LCA • Once in a class, everyone “looks” the same. • In the context of substance abuse, unlikely that every user will have the same patterns of use • Withdrawal, tolerance, hazardous use • There is variation within a latent class • Severity • One proposed solution is the factor mixture model • Uses a latent class variables to classify individuals and latent factor to model severity

  33. Factor Mixture Model σ2F F C λ3 λ1 λ2 λ4 λ5 x1 x2 x3 x4 x5 Classes can be indicated by item thresholds (categorical)\ item means (continuous) or factor mean and variance

  34. General Factor Mixture Model yik = Λkηik + εik, ηik = αk+ ζik, where, ζik~ N(0, Ψk) • Similar to the FA model, except many parameters can be class varying as indicated by the subscript k • Several variations of this model which differ in terms of the measurement invariance • Lubke & Neale (2005), Clark et al. (2012)

  35. FMM Profile Plot

  36. How do we do this in OpenMx? • You’ll have to wait till tomorrow! • Factor Mixture Model is a generalization of the Growth Mixture Model we’ll talk about tomorrow afternoon.

  37. Mixtures & Twin Models How do we combine the ACDE model and mixtures?

  38. Option 1: ACE Directly on the Classes 1.0 (MZ) / 0.5 (DZ) 1.0 cB CA CB cA eA eB aA aB x1B x1A x2B x2A x3B x3A x4A x4B

  39. What would this look like for the FMM? 1.0 (MZ) / 0.5 (DZ) 1.0 1.0 (MZ) / 0.5 (DZ) 1.0 aA aA CA CB cB eA cA aA aB eB cA cA eA eA FB FA x1B x1A x2B x2A x3B x3A x4A x4B

  40. FMM & ACE CONT’D • One of many possible ways to do FMM & ACE in the same model • Can also have class specific ACE on the factors • Each class has own heritability From Muthén et al. (2006)

  41. Issue with Option 1 • Model is utilizes the liability threshold model to “covert” the latent categorical variable, C, to a latent normal variable • This requires that classes are ordered • Ex: high, medium, low users • Don’t always have nicely ordered classes • Models are VERY time intensive • Take a vacation for a week or two

  42. Option 2: Three-Step Method • Estimate mixture model • Assign individuals into their most likely latent class based on the posterior probabilities of class membership • Use the observed, categorical variable of assigned class membership as the phenotype in a liability threshold model version of ACE analysis Note: Requires ordered classes

  43. Option 2a • Contingency table analysis using most likely class membership • Concordance between twins in terms of most likely class membership • If your classes are not ordered • Odds Ratio • Excess twin concordance due to stronger genetic relationship can be represented by the OR for MZ twins compared to the OR for DZ twins. • Place restrictions on the contingency table to test specific hypotheses • Mendelian segregation, only shared environmental effects • Eaves (1993)

  44. Issues with Option 2 • Potential for biased parameter estimates and underestimated standard errors • Assigned membership ignores fractional class membership suggested by posterior probabilities • Treat the classification as not having any sampling error • Good option when entropy is high\ well separated classes

  45. Selection of Cross-Sectional Mixture Genetic Analysis Writings • Latent Class Analysis • Eaves, 1993; Muthén et al., 2006; Clark, 2010 • Factor Mixture Analysis • Neale & Gillespie, 2005 (?); Clark, 2010; Clark et al. (in preparation) • Additional References • McLachlan, Do, & Ambroise, 2004 • Mixtures in Substance Abuse • Gillespie (2011, 2012) • Great cannabis examples

More Related