Mixture Modeling Chongming Yang Research Support Center FHSS College
Classification Techniques • Latent Class Analysis (categorical indicators) • Latent Profile Analysis (continuous indicators) • Finite Mixture Modeling (multivariate normal variables) • …
Integrate Classification Models into Other Models • Mixture Factor Analysis • Mixture Regressions • Mixture Structural Equation Modeling • Growth Mixture Modeling • Multilevel Mixture Modeling
Disadvantages of Multi-step Practice • Multi-step practice • Run a classification model • Save the membership variable • Model the membership variable with other variables • Disadvantages • Biases in parameter estimates • Biases in standard errors, and hence in significance tests and confidence intervals
Latent Class Analysis (LCA) • Setting • Latent trait assumed to be categorical • Trait measured with multiple categorical indicators • Examples: drug addiction, schizophrenia • Aims • Identify heterogeneous classes/groups • Estimate class probabilities • Identify good indicators of classes • Relate covariates to classes
Graphic LCA Model • Categorical indicators u: u1, u2, u3, …, ur • Categorical latent variable C: C = 1, 2, …, or K
Probabilistic Model • Assumption: conditional independence of the u's, so that their interdependence is explained by C (as in the factor analysis model) • An item probability: P(uj = 1 | C = k) • Joint probability of all indicators: P(u1, u2, …, ur) = Σk P(C = k) Πj P(uj | C = k)
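The joint probability above can be sketched numerically. This is a minimal illustration of the LCA mixture formula for binary items under conditional independence; the class and item probabilities are hypothetical, not from a fitted model.

```python
# P(u) = sum_k P(C=k) * prod_j P(u_j | C=k), with items conditionally
# independent given class. All parameter values below are illustrative.

def pattern_probability(pattern, class_probs, item_probs):
    """pattern: tuple of 0/1 responses; item_probs[k][j] = P(u_j = 1 | C = k)."""
    total = 0.0
    for k, pi_k in enumerate(class_probs):
        p = pi_k
        for j, u in enumerate(pattern):
            p_j = item_probs[k][j]
            p *= p_j if u == 1 else (1.0 - p_j)
        total += p
    return total

# Two classes, three items (hypothetical parameters)
class_probs = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.85],   # class 1: "high" on all items
              [0.2, 0.1, 0.15]]   # class 2: "low" on all items
print(pattern_probability((1, 1, 1), class_probs, item_probs))
```

Summing this function over all 2^3 response patterns returns 1, which is a quick sanity check that the mixture is a proper probability model.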
LCA Parameters • Class probabilities: number of classes − 1 free parameters • Item probabilities: number of response categories − 1 free parameters per item, per class
Class Means (Logit) • Probability scale (multinomial logistic regression without any covariates x): P(C = k) = exp(αk) / Σj exp(αj) • Logit scale: αk = log[P(C = k) / P(C = K)] • Mean (logit) of the highest-numbered class K is fixed at 0
Latent Class Analysis with Covariates • Covariates x are related to class probabilities through multinomial logistic regression: P(C = k | x) = exp(αk + βk x) / Σj exp(αj + βj x), with the last class as reference (αK = βK = 0)
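The multinomial logistic link can be sketched directly: logits for the first K − 1 classes, a reference class with logit fixed at 0, and a softmax to get probabilities. The intercepts and slopes below are hypothetical values chosen for illustration.

```python
import math

# P(C=k | x) = exp(a_k + b_k*x) / sum_j exp(a_j + b_j*x); the last class
# is the reference with logit fixed at 0. Parameters are illustrative.

def class_probabilities(x, intercepts, slopes):
    """intercepts/slopes cover classes 1..K-1; reference class logit = 0."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three classes, one covariate (hypothetical parameters)
probs = class_probabilities(1.0, intercepts=[0.5, -0.3], slopes=[0.8, 0.2])
print(probs)  # class probabilities sum to 1
```

With all logits equal to zero (x = 0 and zero intercepts), every class gets probability 1/K, which is a useful check of the parameterization.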
Estimation • Maximum likelihood estimation via the Expectation-Maximization (EM) algorithm • E (expectation) step: compute each individual's posterior class membership probabilities given the current parameters • M (maximization) step: re-estimate the class and item parameters from those posterior probabilities • Iterate E and M steps until the likelihood is maximized
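The E and M steps above can be sketched for binary items. This is a minimal teaching sketch, not production code: it assumes conditional independence within classes and omits log-space arithmetic, convergence checks, and multiple random starts; the simulated data are hypothetical.

```python
import random

# Minimal EM for a two-class LCA with binary items (illustrative sketch).

def em_lca(data, n_classes=2, n_iter=100, seed=0):
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    class_probs = [1.0 / n_classes] * n_classes
    item_probs = [[rng.uniform(0.25, 0.75) for _ in range(m)]
                  for _ in range(n_classes)]
    for _ in range(n_iter):
        # E step: posterior class membership probabilities per respondent
        post = []
        for row in data:
            lik = []
            for k in range(n_classes):
                p = class_probs[k]
                for u, pj in zip(row, item_probs[k]):
                    p *= pj if u else (1.0 - pj)
                lik.append(p)
            s = sum(lik)
            post.append([l / s for l in lik])
        # M step: re-estimate class and item probabilities from posteriors
        for k in range(n_classes):
            nk = sum(p[k] for p in post)
            class_probs[k] = nk / n
            for j in range(m):
                item_probs[k][j] = sum(p[k] * row[j]
                                       for p, row in zip(post, data)) / nk
    return class_probs, item_probs

# Simulate two well-separated classes (60/40 mix) and recover them
sim = random.Random(1)
data = [[1 if sim.random() < p else 0
         for p in ([0.9] * 4 if sim.random() < 0.6 else [0.1] * 4)]
        for _ in range(500)]
class_probs, item_probs = em_lca(data)
print(class_probs)
```

With well-separated simulated classes, the recovered class proportions land near the true 0.6/0.4 mix and the item probabilities near 0.9 in one class and 0.1 in the other (up to label switching).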
Test against Data • O = observed frequency of each response pattern • E = model-estimated frequency of each response pattern • Pearson chi-square: χ² = Σ (O − E)² / E • Likelihood-ratio chi-square: G² = 2 Σ O log(O / E)
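Both fit statistics compare observed and model-estimated pattern frequencies. A minimal sketch, with illustrative frequencies rather than output from a real fit:

```python
import math

# Pearson chi-square and likelihood-ratio (G^2) fit statistics over
# response-pattern frequencies. The O and E values below are illustrative.

def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def lr_chi2(observed, expected):
    # G^2 = 2 * sum O * log(O / E); patterns with O = 0 contribute 0
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

observed = [120, 45, 30, 5]
expected = [115.0, 50.0, 28.0, 7.0]
print(pearson_chi2(observed, expected), lr_chi2(observed, expected))
```

When the model reproduces the data exactly (O = E for every pattern), both statistics are zero, and both grow as the model-implied frequencies drift from the observed ones.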
Determine Number of Classes • Substantive theory (parsimonious, interpretable) • Predictive validity • Auxiliary variables / covariates • Statistical information and tests • Bayesian Information Criterion (BIC) • Entropy • Testing K against K-1 Classes • Vuong-Lo-Mendell-Rubin likelihood-ratio test • Bootstrapped likelihood ratio test
Bayesian Information Criterion (BIC) • BIC = −2 log L + h log N • L = likelihood • h = number of free parameters • N = sample size • Choose the model with the smallest BIC • A BIC difference > 4 is appreciable
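The BIC formula above is simple to compute once a model's log-likelihood is known. A small sketch with hypothetical log-likelihoods for a 2-class versus a 3-class model (not real fit results):

```python
import math

# BIC = -2*logL + h*ln(N); smaller is better. Values are illustrative.

def bic(log_likelihood, n_params, n):
    return -2.0 * log_likelihood + n_params * math.log(n)

# Hypothetical comparison of a 2-class vs. a 3-class model on N = 500
bic_2 = bic(-1240.5, n_params=9, n=500)
bic_3 = bic(-1236.8, n_params=14, n=500)
print(bic_2, bic_3)
```

Here the 3-class model fits slightly better (higher log-likelihood) but pays a larger complexity penalty, so the 2-class model wins on BIC; the difference well exceeds the "> 4 is appreciable" guideline.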
Quality of Classification • Entropy • = average of each individual's highest posterior class probability • A value close to 1 indicates good classification • No clear cutoff for acceptance or rejection
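Alongside the average-highest-probability summary above, software such as Mplus reports a relative-entropy measure built from the full posterior probability matrix. A minimal sketch of that measure, with an illustrative posterior matrix:

```python
import math

# Relative entropy: 1 - [sum_i sum_k -p_ik * ln(p_ik)] / (N * ln K),
# computed from posterior class probabilities p_ik. Values near 1 mean
# individuals are classified with little ambiguity. Data are illustrative.

def relative_entropy(posteriors):
    n = len(posteriors)
    k = len(posteriors[0])
    h = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))

sharp = [[0.98, 0.02], [0.01, 0.99], [0.97, 0.03]]  # clear classification
fuzzy = [[0.55, 0.45], [0.48, 0.52], [0.60, 0.40]]  # ambiguous classification
print(relative_entropy(sharp), relative_entropy(fuzzy))
```

Perfectly certain assignments (posteriors of exactly 1 and 0) give relative entropy 1, while near-uniform posteriors push it toward 0, matching the slide's "close to 1 is good" reading.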
Testing K against K-1 Classes • Bootstrapped likelihood ratio test: LRT = 2[log L(model 1) − log L(model 2)], where model 2 (K − 1 classes) is nested in model 1 (K classes). Bootstrap steps: • Estimate the LRT from both models fitted to the sample • Use bootstrapped samples to obtain the distribution of the LRT • Compare the sample LRT with that distribution to get a p value
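The final bootstrap step, locating the sample LRT within the bootstrap distribution, can be sketched in a few lines. The LRT values below are illustrative placeholders, not output from real model fits:

```python
# p value = proportion of bootstrap LRT values at least as large as the
# observed sample LRT. All numbers below are illustrative.

def bootstrap_p_value(observed_lrt, bootstrap_lrts):
    exceed = sum(1 for b in bootstrap_lrts if b >= observed_lrt)
    return exceed / len(bootstrap_lrts)

boot = [1.2, 3.4, 0.8, 5.1, 2.2, 4.0, 1.9, 2.7, 3.1, 0.5]
print(bootstrap_p_value(4.5, boot))
```

A small p value means the observed improvement from adding a class is larger than what the bootstrap distribution produces, favoring the K-class model over the K − 1 class model.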
Testing K against K-1 Classes • Vuong-Lo-Mendell-Rubin likelihood-ratio test
Determine Quality of Indicators • Good indicators • Item response probability is close to 0 or 1 in each class • Bad indicators • Item response probability is high in more than one class, like a cross-loading item in factor analysis • Item response probability is low in all classes, like a low-loading item in factor analysis
LCA Examples • LCA • LCA with covariates • Class predicts a categorical outcome
Save Membership Variable (Mplus)
Variable: idvariable = id;
Savedata: File = cmmber.txt;
Save = cprob;
Latent Profile Analysis • Covariances of the continuous indicators are fixed at zero within classes (conditional independence given class) • Variances of the continuous indicators are constrained to be equal across classes, and within-class variation is minimized • Mean differences across classes are maximized
Finite Mixture Modeling (multivariate normal variables) • Finite = finite number of subgroups/classes • Variables are normally distributed within each class • Means differ across classes • Variances are the same across classes • Covariances can be unrestricted, or constrained to be equal across classes • Latent profile analysis is a special case with covariances fixed at zero
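The latent-profile special case can be sketched by computing posterior class probabilities for continuous indicators: class-specific means, a shared variance, and within-class covariances fixed at zero, so each class density is a product of univariate normals. All parameter values below are hypothetical.

```python
import math

# Posterior class probabilities in a latent profile model with two
# continuous indicators, class-specific means, shared variance, and
# zero within-class covariances. Parameters are illustrative.

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_class_probs(y, class_probs, class_means, var):
    joint = []
    for pi_k, means in zip(class_probs, class_means):
        density = pi_k
        for yj, mu in zip(y, means):
            density *= normal_pdf(yj, mu, var)  # independence within class
        joint.append(density)
    s = sum(joint)
    return [j / s for j in joint]

class_probs = [0.5, 0.5]
class_means = [[0.0, 0.0], [3.0, 3.0]]  # well-separated profiles
print(posterior_class_probs([2.8, 3.2], class_probs, class_means, 1.0))
```

An observation near one class's mean profile gets a posterior probability near 1 for that class, while an observation exactly midway between two equally likely profiles splits 50/50.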
Mixture Factor Analysis • Allows one to examine measurement properties of items in heterogeneous subgroups/classes • Measurement invariance is not required, given the assumed heterogeneity • Factor structure can change across classes • See Mplus outputs
Factor Mixture Analysis • Parental Control • Parental Acceptance
Mixture SEM • See mixture growth modeling
Mixture Modeling with Known Classes • Identify hidden classes within known groups • Under nonrandomized experiments • Impose equality constraints on covariates to identify similar classes from known groups • Compare classes that differ in covariates