Novel Applications of Mixture Models to Social Science Data. Danielle Dean, M.A. Department of Psychology University of North Carolina, Chapel Hill Nisha Gottfredson, Ph.D. Transdisciplinary Prevention Research Center Duke University Modern Modeling Methods May 2012.
Introduction: Indirect Mixture Applications Presented by Nisha Gottfredson
Mixture Models Enable Analysts to Relax Parametric Assumptions • Marron and Wand (1992) showed that it is possible to replicate nearly any univariate distribution using a finite mixture of normal distributions • Component means, variances, and proportions are free to vary • Examples: a “bimodal density” built from 2 normal distributions and a “strongly skewed” density built from 8 normal distributions
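As a minimal sketch of the idea on this slide, the snippet below evaluates a two-component normal mixture and checks that it is a proper density with two modes. The weights, means, and standard deviations are illustrative values chosen for this example, and the helper names `normal_pdf` and `mixture_pdf` are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture: weighted sum of component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

# Illustrative two-component mixture (assumed values, not taken from the slides).
weights = [0.5, 0.5]
means = [-1.0, 1.0]
sds = [2.0 / 3.0, 2.0 / 3.0]

# Evaluate the density on a grid from -4 to 4 in steps of 0.01.
step = 0.01
grid = [-4.0 + step * i for i in range(801)]
density = [mixture_pdf(x, weights, means, sds) for x in grid]

# Riemann-sum check that the mixture integrates to about 1.
area = sum(d * step for d in density)

# Count strict local maxima on the grid to confirm the density is bimodal.
n_modes = sum(
    1
    for i in range(1, len(density) - 1)
    if density[i] > density[i - 1] and density[i] > density[i + 1]
)

print(f"area under density: {area:.4f}")
print(f"number of modes: {n_modes}")
```

Adding further components with their own means, variances, and weights is how more irregular shapes, such as strong skew, would be approximated.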
Multivariate Mixture Models • Each class g has a class-specific mean vector $\mu_g$, covariance matrix $\Sigma_g$, and mixing proportion $\pi_g$ • Examples include Growth Mixture Models (mixtures of latent curve models; Verbeke & Lesaffre, 1996; Muthén & Shedden, 1999) and Structural Equation Mixture Models (Jedidi, Jagpal, & Desarbo, 1997; Dolan & van der Maas, 1998)
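The class-specific quantities on this slide combine into the usual finite normal mixture density. The notation below is an assumption for illustration (the slide's own symbols did not survive): $y_i$ is the observed vector for person $i$, $G$ is the number of classes, and $\phi$ is the multivariate normal density.

```latex
f(y_i) = \sum_{g=1}^{G} \pi_g \, \phi\!\left(y_i ;\, \mu_g, \Sigma_g\right),
\qquad \pi_g \ge 0, \qquad \sum_{g=1}^{G} \pi_g = 1 .
```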
Direct versus Indirect Applications • McLachlan & Peel (2000) distinguished between direct and indirect applications of finite mixture modeling • Direct applications more common in social sciences • Users believe that there is qualitative heterogeneity in the population • Interpret class-specific estimates • Aim is to recover true groupings • Indirect applications more common in statistics • Analysts uncomfortable with parametric assumptions • Aggregate over class-specific estimates • True groupings do not exist
Direct versus Indirect Interpretation is in the Eye of the Beholder • There is no empirical way to distinguish between groups as truth and groups as statistical convenience • AIC/BIC almost always suggest that classes improve model fit (Bauer & Curran, 2003; 2004; Bauer, 2007), for example when: • Variables are non-normal • Relationships are non-linear • When in doubt, indirect interpretation is more robust than direct interpretation (Sterba et al., 2012)
Overview of Talks • Indirect applications of mixture models • Survival Mixture Models with study on multiple survival processes during transition to adulthood • Shared Parameter Mixture Models to handle non-randomly missing data in longitudinal studies
Survival Mixture Models for Simultaneously Capturing Multiple Survival Processes An Application with Data on Transitioning to Adulthood Presented by Danielle Dean
Multiple Survival Processes • How may we analyze multiple non-repeatable events which may occur at the same point in time for an individual? • E.g. age of onset of different drugs • E.g. age of transition to multiple roles
Survival Analysis • Survival or event history models • Multivariate survival analysis
Multiple Survival Processes • Non-repeatable events which may occur at the same point in time • Many researchers currently run a separate survival analysis for each event process but don’t analyze how the events are related
Univariate Survival Analysis • One non-repeatable event • Three main functions: survival, lifetime distribution, and hazard
Univariate Survival Analysis • E.g. therapy completion: • Model the hazard: • Compute the lifetime distribution and survival function
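The three functions are linked, so modeling the hazard is enough to recover the other two. The sketch below assumes a discrete-time formulation, where the hazard at time t is the probability of the event at t given no event before t. The hazard values and function names are hypothetical.

```python
def survival_from_hazard(hazard):
    """S(t) = product over j <= t of (1 - h(j)) for a discrete-time hazard."""
    survival = []
    still_at_risk = 1.0
    for h in hazard:
        still_at_risk *= 1.0 - h
        survival.append(still_at_risk)
    return survival

def lifetime_from_survival(survival):
    """Lifetime distribution F(t) = 1 - S(t): probability the event has occurred by t."""
    return [1.0 - s for s in survival]

def median_event_time(times, survival):
    """First time at which the survival function drops to 0.5 or below (None if never)."""
    for t, s in zip(times, survival):
        if s <= 0.5:
            return t
    return None

# Hypothetical hazard of completing therapy at sessions 1 to 4.
sessions = [1, 2, 3, 4]
hazard = [0.1, 0.2, 0.3, 0.4]

survival = survival_from_hazard(hazard)
lifetime = lifetime_from_survival(survival)
median = median_event_time(sessions, survival)

print("survival:", [round(s, 4) for s in survival])
print("lifetime:", [round(f, 4) for f in lifetime])
print("median event time:", median)
```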
Multiple non-repeatable events • How are the events related? • Distribution of risk for multiple events is of unknown form • Model assumes the population is composed of a finite number of latent classes in order to parsimoniously describe the underlying distribution of risk (multiple event version of model presented by Muthén & Masyn, 2005)
Purpose of model • Parsimoniously describe the underlying distribution of risk for multiple events, without assuming a specific mathematical form for the distribution • Purpose is to draw attention to differences in the causes and consequences of different pathways (rather than to suggest the population is composed of literally distinct groups) • In spirit of indirect application, can compute model-implied functions weighting over latent classes to evaluate the effects of covariates • E.g. model implied risk for multiple events for males versus females, controlling for other covariates
Transitions to Adulthood • Life course theory • Order and timing of social roles • Meaning of a social role • E.g., “working parent” • Identify general structures of the life course
Methods • National Longitudinal Study of Adolescent Health • N = 15,701 • 4 events: Parenthood, Full-time work, Marriage, College • Ages 18-30
Model • Fit using Mplus 6.12 • Identify pathways through the life course (Model 1) • Influence of covariates on pathways (Model 2) • Gender, Race, Parental Education [Path diagram: covariates X, latent class variable C, and event indicators y1,18 to y1,30 through y4,18 to y4,30]
Model Selection • Substantively redundant latent class in 6 class solution • Lifetime Distributions:
Order / Timing of Events • Median event time • Classes aggregate back to sample observed functions, average squared residual <0.001
Influence of Covariates • In spirit of indirect application, also possible to compute model-implied functions weighting over latent classes • Model-implied functions can be computed for different levels of a covariate • In this application, small number of categorical covariates allowed empirical comparisons between model-implied functions and sample observed functions
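One way to compute a model-implied function at a given covariate level is to weight the class-specific functions by the class probabilities at that level. The sketch below assumes the covariate affects only class membership through a multinomial logit; that assumption, the parameter values, and the function names are hypothetical and are not taken from the fitted model.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Multinomial-logit class probabilities for covariate value x."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)]
    top = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]

def aggregate(class_probs, class_functions):
    """Weight class-specific functions by class probabilities at each time point."""
    n_times = len(class_functions[0])
    return [
        sum(p * f[t] for p, f in zip(class_probs, class_functions))
        for t in range(n_times)
    ]

# Hypothetical class-specific lifetime distributions at three ages.
class_lifetimes = [
    [0.10, 0.30, 0.60],  # class 1: early transition
    [0.00, 0.10, 0.20],  # class 2: late transition
]
intercepts = [0.0, 0.0]  # class 1 is the reference class
slopes = [0.0, 1.0]      # x = 1 raises the odds of class 2

implied_x0 = aggregate(class_probabilities(intercepts, slopes, 0.0), class_lifetimes)
implied_x1 = aggregate(class_probabilities(intercepts, slopes, 1.0), class_lifetimes)

print("x = 0:", [round(v, 4) for v in implied_x0])
print("x = 1:", [round(v, 4) for v in implied_x1])
```

Comparing the two aggregated curves gives the covariate effect on the overall risk without interpreting any single class as a literal group.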
Gender • Model implies: • Women are more likely to be a parent across all ages • Women are more likely to be married across all ages • Women have a higher cumulative probability of obtaining a college degree
Race • Model implies: • Caucasians have a lower probability of parenthood at early ages, but similar levels of parenthood by age 30 • Caucasians have a higher cumulative probability of obtaining a college degree by age 30 • African-Americans have a lower cumulative probability of transitioning into a marriage role by age 30
Parental Education • Model implies: • Individuals with at least one parent with a college degree: • Much higher probability of obtaining a college degree • Much lower risk of starting full-time work at early ages • Lower cumulative probability of transitioning into marriage/parenthood by age 30
Residuals • Model implied functions can be compared to sample observed functions, stratified by different levels of a covariate • Average squared residual small for all three covariates (gender = 0.03, race = 0.04, parent education = 0.03)
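The slides do not give the exact formula for the average squared residual. A minimal sketch, assuming it is the mean of the squared differences between the model-implied and sample observed functions across time points, with hypothetical values:

```python
def average_squared_residual(model_implied, observed):
    """Mean of squared differences between two functions on the same time points."""
    if len(model_implied) != len(observed):
        raise ValueError("functions must be evaluated at the same time points")
    squared = [(m - o) ** 2 for m, o in zip(model_implied, observed)]
    return sum(squared) / len(squared)

# Hypothetical lifetime distributions at three ages for one covariate level.
model_implied = [0.10, 0.25, 0.40]
observed = [0.12, 0.24, 0.43]

asr = average_squared_residual(model_implied, observed)
print(f"average squared residual: {asr:.6f}")
```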
Conclusions • In the application presented here, the model appeared stable and reproduced the observed patterns well • Able to analyze multiple non-repeatable events • Able to include individuals with censored event times • The model's complexity has the potential to increase our understanding of the risk for multiple events over time, as well as of possible mechanisms influencing this risk • Future work should extend the model to other common situations (e.g. repeatable events) and investigate the power of the model to detect the effects of covariates
Modeling Change in the Presence of Non-Randomly Missing Data An Application of Shared Parameter Mixture Models with Naturalistic Psychotherapy Data Presented by Nisha Gottfredson
Roadmap of Talk • Brief overview of traditional longitudinal models • Identify instances of potential non-random missingness in longitudinal studies • Define consequences of ignoring non-random missingness • Introduction to the shared parameter mixture model (SPMM) • Summary of Monte Carlo evaluation of SPMM performance • Analysis of psychotherapy data
Modeling Individual Change over Time using Latent Curve Models (LCM): A Linear Model with One Predictor
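The equation for this slide did not survive extraction. A standard way to write a linear latent curve model with one time-invariant predictor is sketched below; the symbols are assumptions for illustration ($y_{it}$ is the outcome for person $i$ at time $t$, $x_i$ is the predictor, $\eta_{0i}$ and $\eta_{1i}$ are the individual intercept and slope).

```latex
\begin{aligned}
y_{it}    &= \eta_{0i} + \eta_{1i}\, t + \varepsilon_{it}, \\
\eta_{0i} &= \alpha_0 + \gamma_0 x_i + \zeta_{0i}, \\
\eta_{1i} &= \alpha_1 + \gamma_1 x_i + \zeta_{1i}.
\end{aligned}
```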
Traditional Longitudinal Models Require Missing at Random Assumption • Full information estimators (like FIML) use all available information from covariates and repeated measures to inform parameter estimates • Missing data patterns are ignored → Assume ignorable missingness (i.e., missing data are missing at random) • If the probability of missingness is unrelated to the repeated measures after conditioning on observed data, the missingness mechanism is missing at random (MAR) • If observed data cannot fully account for missingness, the missing data mechanism is missing not at random (MNAR) • There is no statistical test to determine whether missing data are MAR or MNAR
Non-ignorable Missing Data Mechanisms in Longitudinal Data Analysis • Latent curve model of growth/change over time: • In general, non-random missingness implies Outcome-Dependent Missingness (e.g., Little, 1995; Little, 2009), where missingness depends on the covariates $X_i$ and on the repeated measures themselves, or Random Coefficient Dependent Missingness, where missingness depends on $X_i$ and on the individual growth coefficients
Some Examples of Non-Random Missingness in Longitudinal Studies • Older adults are less likely to report on cognitive outcomes due to dementia onset or death (caused by random slope) • Adolescents who tend to engage in externalizing behaviors are likely to be absent (suspended, expelled, skipping class) (caused by random intercept) • Patients who respond most quickly to psychotherapy treatment leave earliest (caused by random slope)
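A small simulation can illustrate the psychotherapy example, where patients who respond fastest leave earliest. Everything here is an assumption made for illustration: the slope distribution, the logistic dropout mechanism, and the sample size. It is not the authors' simulation design, and it compares only the true slopes of completers with those of the full sample.

```python
import math
import random

random.seed(2012)  # fixed seed so the illustration is reproducible

N_PATIENTS = 5000

# Hypothetical individual slopes: negative values mean symptoms decline over sessions.
slopes = [random.gauss(-1.0, 0.5) for _ in range(N_PATIENTS)]

def dropout_probability(slope):
    """Assumed mechanism: faster improvement (more negative slope) raises dropout risk."""
    return 1.0 / (1.0 + math.exp(2.0 * (slope + 1.0)))

# A patient stays in the study when a uniform draw exceeds the dropout probability.
stayed = [random.random() > dropout_probability(b) for b in slopes]

mean_all = sum(slopes) / len(slopes)
completers = [b for b, keep in zip(slopes, stayed) if keep]
mean_completers = sum(completers) / len(completers)

print(f"mean slope, everyone:   {mean_all:.3f}")
print(f"mean slope, completers: {mean_completers:.3f}")
```

Under this assumed mechanism the completers improve more slowly on average than the full sample, which is the kind of distortion that random coefficient dependent missingness introduces.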
Example of MAR Assumption with Psychotherapy Data LCM Requires MAR Assumption: Dropout is uncorrelated with individual differences in growth, after controlling for observed data
Individual Intercept and Rate of Change are Related to Dropout Occasion
Selection-Type Models are Commonly Used to Model Non-Random Missingness • In addition to longitudinal data model, users specify a model for the missing data process • Selection model is jointly estimated with the trajectory model (e.g., Heckman, 1979): • Trajectory estimates will be unbiased, assuming that the selection model is correct (i.e., no omitted covariates, correct functional form; Winship & Mare, 1992) • Selection models are notoriously sensitive to assumption violations • Theory is often insufficient to correctly specify a selection model
Shared Parameter Models • If non-random missingness is present, information about the missing data must be incorporated into the model • The relationship between missing data patterns and the growth trajectory can be explained by a latent ‘shared parameter’ (e.g., Vonesh et al., 2006) • Conditioning on the shared parameter leads to conditional independence between the trajectory estimates and the missing data indicators: • But it is still possible to misspecify the missing data process using a continuous latent variable as the shared parameter
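The conditional independence statement on this slide lost its equation. In assumed notation, with $y_i$ the repeated measures, $m_i$ the missing data indicators, and $\eta_i$ the shared parameter, it can be sketched as:

```latex
f(y_i, m_i \mid \eta_i) = f(y_i \mid \eta_i)\, f(m_i \mid \eta_i),
\qquad
f(y_i, m_i) = \int f(y_i \mid \eta_i)\, f(m_i \mid \eta_i)\, f(\eta_i)\, d\eta_i .
```

The second expression shows where misspecification can enter: the assumed distribution $f(\eta_i)$ of the continuous shared parameter.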
Shared Parameter Mixture Models: Approximating the Missingness Mechanism • Nearly any distribution of unknown form can be approximated using mixture distributions • Replacing latent continua with K latent classes allows us to semi-parametrically approximate the relationship between the trajectory and the missing data • Extension of observed pattern mixture (Little, 1993, 1995; Hedeker & Gibbons, 1997) • Estimating enough latent classes should account for nearly any form of dependence between the missing data and the growth parameters • See Roy, 2003; Lin et al., 2004; Morgan-Lopez & Fals-Stewart, 2008; Tsonaka et al., 2009; Muthén et al., 2011; Gottfredson et al., 2012
Shared Parameter Mixture Models with a Full Measurement Model • Assume independence between repeated measures and missing data indicators conditional on latent class • Estimate as many classes as necessary to obtain independence • Rely on AIC or BIC to determine when to stop adding latent classes
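The stopping rule can be sketched with the standard definitions AIC = -2 logL + 2p and BIC = -2 logL + p ln(n), where p is the number of free parameters and n the sample size. The log-likelihoods, parameter counts, and sample size below are hypothetical values, not results from the talk.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fits: number of classes -> (log-likelihood, number of free parameters).
fits = {
    1: (-5200.0, 6),
    2: (-5100.0, 10),
    3: (-5080.0, 14),
    4: (-5075.0, 18),
}
N_OBS = 500

aic_by_k = {k: aic(ll, p) for k, (ll, p) in fits.items()}
bic_by_k = {k: bic(ll, p, N_OBS) for k, (ll, p) in fits.items()}

best_aic = min(aic_by_k, key=aic_by_k.get)
best_bic = min(bic_by_k, key=bic_by_k.get)

print("classes chosen by AIC:", best_aic)
print("classes chosen by BIC:", best_bic)
```

In this made-up example the two criteria disagree, because BIC penalizes each extra parameter more heavily once n exceeds about 7.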
Summary Indicators are More Efficient • Summarize missing data patterns with a one-number summary (e.g., number of missing observations) • Avoid estimating a model with many binary outcomes • A good summary indicator will include the same information more succinctly • This model is used for simulation study and empirical example
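A minimal sketch of building a one-number summary, using the example named on the slide (number of missing observations). The data layout, with `None` marking a missing repeated measure, and the function names are assumptions for illustration.

```python
def missing_indicators(repeated_measures):
    """Binary indicators: 1 where an occasion is missing (None), 0 where observed."""
    return [1 if value is None else 0 for value in repeated_measures]

def missing_count(repeated_measures):
    """One-number summary: how many occasions are missing for this person."""
    return sum(missing_indicators(repeated_measures))

# Hypothetical outcome scores for three people over four occasions.
people = [
    [24.0, 20.0, 17.0, 15.0],  # complete data
    [30.0, 22.0, None, None],  # left after the second occasion
    [18.0, None, None, None],  # left after the first occasion
]

indicators = [missing_indicators(p) for p in people]
summaries = [missing_count(p) for p in people]

print("binary indicators:", indicators)
print("summary indicator:", summaries)
```

The four binary outcomes per person collapse to a single count, which is what keeps the summary version of the model smaller.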
Aggregating Over Latent Classes Incorporates Informative Missingness into Trajectory Estimates • The model will estimate a separate mean growth trajectory for K latent classes • Weight within-class growth factor means by estimated class proportions (Bauer, 2005):
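The weighting formula is missing from the slide text. In assumed notation, with $\hat{\pi}_k$ the estimated proportion of class $k$ and $\hat{\alpha}_k$ the vector of within-class growth factor means, the aggregate is:

```latex
\hat{\alpha} = \sum_{k=1}^{K} \hat{\pi}_k \, \hat{\alpha}_k ,
\qquad \sum_{k=1}^{K} \hat{\pi}_k = 1 .
```

As a made-up example with two classes, proportions 0.6 and 0.4 and mean slopes $-1.0$ and $-2.0$ give an aggregate slope of $0.6(-1.0) + 0.4(-2.0) = -1.4$.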
Judging Appropriateness of SPMM to Real-World Data • Can the model accommodate most of the potential missing data mechanisms that are present? • Simulation studies test model performance (i.e., bias and precision of parameter estimates) under a variety of data conditions • Hypotheses: • The model should perform better than the LCM when a variety of random coefficient dependent missing data mechanisms are present, but not necessarily with outcome dependent missingness • Both models should accommodate MAR mechanisms but LCM might be more efficient • The summary SPMM should perform as well as the full binary SPMM under most conditions