Mixture Modeling Chongming Yang Research Support Center FHSS College
Classification Techniques • Latent Class Analysis (categorical indicators) • Latent Profile Analysis (continuous indicators) • Finite Mixture Modeling (multivariate normal variables) • …
Integrate Classification Models into Other Models • Mixture Factor Analysis • Mixture Regressions • Mixture Structural Equation Modeling • Growth Mixture Modeling • Multilevel Mixture Modeling
Disadvantages of Multi-step Practice • Multi-step practice • Run a classification model • Save the membership variable • Model the membership variable with other variables • Disadvantages • Biases in parameter estimates • Biases in standard errors, and hence in significance tests and confidence intervals
Latent Class Analysis (LCA) • Setting • Latent trait assumed to be categorical • Trait measured with multiple categorical indicators • Examples: drug addiction, schizophrenia • Aims • Identify heterogeneous classes/groups • Estimate class probabilities • Identify good indicators of classes • Relate covariates to classes
Graphic LCA Model • Categorical indicators u: u1, u2, u3, …, ur • Categorical latent variable C: C = 1, 2, …, or K
Probabilistic Model • Assumption: conditional independence of the u's, so that their interdependence is explained by C (as in the factor analysis model) • An item probability: P(uj = 1 | C = k) • Joint probability of all indicators: P(u1, u2, …, ur) = Σk P(C = k) Πj P(uj | C = k)
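The joint probability above can be sketched numerically. This is a minimal illustration of the LCA mixture formula for binary items under conditional independence; the class and item probabilities are hypothetical, not from a fitted model.

```python
# P(u) = sum_k P(C=k) * prod_j P(u_j | C=k), with items conditionally
# independent given class. All parameter values below are illustrative.

def pattern_probability(pattern, class_probs, item_probs):
    """pattern: tuple of 0/1 responses; item_probs[k][j] = P(u_j = 1 | C = k)."""
    total = 0.0
    for k, pi_k in enumerate(class_probs):
        p = pi_k
        for j, u in enumerate(pattern):
            p_j = item_probs[k][j]
            p *= p_j if u == 1 else (1.0 - p_j)
        total += p
    return total

# Two classes, three items (hypothetical parameters)
class_probs = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.85],   # class 1: "high" on all items
              [0.2, 0.1, 0.15]]   # class 2: "low" on all items
print(pattern_probability((1, 1, 1), class_probs, item_probs))
```

Summing this function over all 2^3 response patterns returns 1, which is a quick sanity check that the mixture is a proper probability model.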
LCA Parameters • Class probabilities: number of classes − 1 free parameters • Item probabilities: number of response categories − 1 free parameters per item, per class
Class Means (Logit) • Probability scale (multinomial logistic regression without any covariates x): P(C = k) = exp(αk) / Σj exp(αj) • Logit scale: αk = log[P(C = k) / P(C = K)] • Mean (logit) of the highest-numbered class K is fixed at 0
Latent Class Analysis with Covariates • Covariates x are related to class probabilities through multinomial logistic regression: P(C = k | x) = exp(αk + βk x) / Σj exp(αj + βj x), with the last class as reference (αK = βK = 0)
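The multinomial logistic link can be sketched directly: logits for the first K − 1 classes, a reference class with logit fixed at 0, and a softmax to get probabilities. The intercepts and slopes below are hypothetical values chosen for illustration.

```python
import math

# P(C=k | x) = exp(a_k + b_k*x) / sum_j exp(a_j + b_j*x); the last class
# is the reference with logit fixed at 0. Parameters are illustrative.

def class_probabilities(x, intercepts, slopes):
    """intercepts/slopes cover classes 1..K-1; reference class logit = 0."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three classes, one covariate (hypothetical parameters)
probs = class_probabilities(1.0, intercepts=[0.5, -0.3], slopes=[0.8, 0.2])
print(probs)  # class probabilities sum to 1
```

With all logits equal to zero (x = 0 and zero intercepts), every class gets probability 1/K, which is a useful check of the parameterization.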
Estimation • Maximum likelihood estimation via the Expectation-Maximization (EM) algorithm • E (expectation) step: compute each individual's posterior class membership probabilities given the current parameters • M (maximization) step: re-estimate the class and item parameters from those posterior probabilities • Iterate E and M steps until the likelihood is maximized
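The E and M steps above can be sketched for binary items. This is a minimal teaching sketch, not production code: it assumes conditional independence within classes and omits log-space arithmetic, convergence checks, and multiple random starts; the simulated data are hypothetical.

```python
import random

# Minimal EM for a two-class LCA with binary items (illustrative sketch).

def em_lca(data, n_classes=2, n_iter=100, seed=0):
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    class_probs = [1.0 / n_classes] * n_classes
    item_probs = [[rng.uniform(0.25, 0.75) for _ in range(m)]
                  for _ in range(n_classes)]
    for _ in range(n_iter):
        # E step: posterior class membership probabilities per respondent
        post = []
        for row in data:
            lik = []
            for k in range(n_classes):
                p = class_probs[k]
                for u, pj in zip(row, item_probs[k]):
                    p *= pj if u else (1.0 - pj)
                lik.append(p)
            s = sum(lik)
            post.append([l / s for l in lik])
        # M step: re-estimate class and item probabilities from posteriors
        for k in range(n_classes):
            nk = sum(p[k] for p in post)
            class_probs[k] = nk / n
            for j in range(m):
                item_probs[k][j] = sum(p[k] * row[j]
                                       for p, row in zip(post, data)) / nk
    return class_probs, item_probs

# Simulate two well-separated classes (60/40 mix) and recover them
sim = random.Random(1)
data = [[1 if sim.random() < p else 0
         for p in ([0.9] * 4 if sim.random() < 0.6 else [0.1] * 4)]
        for _ in range(500)]
class_probs, item_probs = em_lca(data)
print(class_probs)
```

With well-separated simulated classes, the recovered class proportions land near the true 0.6/0.4 mix and the item probabilities near 0.9 in one class and 0.1 in the other (up to label switching).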
Test against Data • O = observed frequency of each response pattern • E = model-estimated frequency of each response pattern • Pearson chi-square: χ² = Σ (O − E)² / E • Likelihood-ratio chi-square: G² = 2 Σ O log(O / E)
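Both fit statistics compare observed and model-estimated pattern frequencies. A minimal sketch, with illustrative frequencies rather than output from a real fit:

```python
import math

# Pearson chi-square and likelihood-ratio (G^2) fit statistics over
# response-pattern frequencies. The O and E values below are illustrative.

def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def lr_chi2(observed, expected):
    # G^2 = 2 * sum O * log(O / E); patterns with O = 0 contribute 0
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

observed = [120, 45, 30, 5]
expected = [115.0, 50.0, 28.0, 7.0]
print(pearson_chi2(observed, expected), lr_chi2(observed, expected))
```

When the model reproduces the data exactly (O = E for every pattern), both statistics are zero, and both grow as the model-implied frequencies drift from the observed ones.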
Determine Number of Classes • Substantive theory (parsimonious, interpretable) • Predictive validity • Auxiliary variables / covariates • Statistical information and tests • Bayesian Information Criterion (BIC) • Entropy • Testing K against K-1 Classes • Vuong-Lo-Mendell-Rubin likelihood-ratio test • Bootstrapped likelihood ratio test
Bayesian Information Criterion (BIC) • BIC = −2 log L + h log N • L = likelihood • h = number of free parameters • N = sample size • Choose the model with the smallest BIC • A BIC difference > 4 is appreciable
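The BIC formula above is simple to compute once a model's log-likelihood is known. A small sketch with hypothetical log-likelihoods for a 2-class versus a 3-class model (not real fit results):

```python
import math

# BIC = -2*logL + h*ln(N); smaller is better. Values are illustrative.

def bic(log_likelihood, n_params, n):
    return -2.0 * log_likelihood + n_params * math.log(n)

# Hypothetical comparison of a 2-class vs. a 3-class model on N = 500
bic_2 = bic(-1240.5, n_params=9, n=500)
bic_3 = bic(-1236.8, n_params=14, n=500)
print(bic_2, bic_3)
```

Here the 3-class model fits slightly better (higher log-likelihood) but pays a larger complexity penalty, so the 2-class model wins on BIC; the difference well exceeds the "> 4 is appreciable" guideline.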
Quality of Classification • Entropy • = average of each individual's highest posterior class probability • A value close to 1 indicates good classification • No clear cutoff for acceptance or rejection
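Alongside the average-highest-probability summary above, software such as Mplus reports a relative-entropy measure built from the full posterior probability matrix. A minimal sketch of that measure, with an illustrative posterior matrix:

```python
import math

# Relative entropy: 1 - [sum_i sum_k -p_ik * ln(p_ik)] / (N * ln K),
# computed from posterior class probabilities p_ik. Values near 1 mean
# individuals are classified with little ambiguity. Data are illustrative.

def relative_entropy(posteriors):
    n = len(posteriors)
    k = len(posteriors[0])
    h = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))

sharp = [[0.98, 0.02], [0.01, 0.99], [0.97, 0.03]]  # clear classification
fuzzy = [[0.55, 0.45], [0.48, 0.52], [0.60, 0.40]]  # ambiguous classification
print(relative_entropy(sharp), relative_entropy(fuzzy))
```

Perfectly certain assignments (posteriors of exactly 1 and 0) give relative entropy 1, while near-uniform posteriors push it toward 0, matching the slide's "close to 1 is good" reading.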
Testing K against K-1 Classes • Bootstrapped likelihood ratio test: LRT = 2[log L(model 1) − log L(model 2)], where model 2 (K − 1 classes) is nested in model 1 (K classes). Bootstrap steps: • Estimate the LRT from both models fitted to the sample • Use bootstrapped samples to obtain the distribution of the LRT • Compare the sample LRT with that distribution to get a p value
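The final bootstrap step, locating the sample LRT within the bootstrap distribution, can be sketched in a few lines. The LRT values below are illustrative placeholders, not output from real model fits:

```python
# p value = proportion of bootstrap LRT values at least as large as the
# observed sample LRT. All numbers below are illustrative.

def bootstrap_p_value(observed_lrt, bootstrap_lrts):
    exceed = sum(1 for b in bootstrap_lrts if b >= observed_lrt)
    return exceed / len(bootstrap_lrts)

boot = [1.2, 3.4, 0.8, 5.1, 2.2, 4.0, 1.9, 2.7, 3.1, 0.5]
print(bootstrap_p_value(4.5, boot))
```

A small p value means the observed improvement from adding a class is larger than what the bootstrap distribution produces, favoring the K-class model over the K − 1 class model.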
Testing K against K-1 Classes • Vuong-Lo-Mendell-Rubin likelihood-ratio test
Determine Quality of Indicators • Good indicators • Item response probability is close to 0 or 1 in each class • Bad indicators • Item response probability is high in more than one class, like a cross-loading item in factor analysis • Item response probability is low in all classes, like a low-loading item in factor analysis
LCA Examples • LCA • LCA with covariates • Class predicts a categorical outcome
Save Membership Variable (Mplus)
Variable: idvariable = id;
Savedata: File = cmmber.txt;
Save = cprob;
Latent Profile Analysis • Covariances of the continuous indicators are fixed at zero within classes (conditional independence given class) • Variances of the continuous indicators are constrained to be equal across classes, and within-class variation is minimized • Mean differences across classes are maximized
Finite Mixture Modeling (multivariate normal variables) • Finite = finite number of subgroups/classes • Variables are normally distributed within each class • Means differ across classes • Variances are the same across classes • Covariances can be unrestricted, or constrained to be equal across classes • Latent profile analysis is a special case with covariances fixed at zero
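The latent-profile special case can be sketched by computing posterior class probabilities for continuous indicators: class-specific means, a shared variance, and within-class covariances fixed at zero, so each class density is a product of univariate normals. All parameter values below are hypothetical.

```python
import math

# Posterior class probabilities in a latent profile model with two
# continuous indicators, class-specific means, shared variance, and
# zero within-class covariances. Parameters are illustrative.

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_class_probs(y, class_probs, class_means, var):
    joint = []
    for pi_k, means in zip(class_probs, class_means):
        density = pi_k
        for yj, mu in zip(y, means):
            density *= normal_pdf(yj, mu, var)  # independence within class
        joint.append(density)
    s = sum(joint)
    return [j / s for j in joint]

class_probs = [0.5, 0.5]
class_means = [[0.0, 0.0], [3.0, 3.0]]  # well-separated profiles
print(posterior_class_probs([2.8, 3.2], class_probs, class_means, 1.0))
```

An observation near one class's mean profile gets a posterior probability near 1 for that class, while an observation exactly midway between two equally likely profiles splits 50/50.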
Mixture Factor Analysis • Allows one to examine measurement properties of items in heterogeneous subgroups/classes • Measurement invariance is not required, given the assumed heterogeneity • Factor structure can change across classes • See Mplus outputs
Factor Mixture Analysis • Parental Control • Parental Acceptance
Mixture SEM • See mixture growth modeling
Mixture Modeling with Known Classes • Identify hidden classes within known groups • Under nonrandomized experiments • Impose equality constraints on covariates to identify similar classes from known groups • Compare classes that differ in covariates