
Novel Applications of Mixture Models to Social Science Data

Danielle Dean, M.A., Department of Psychology, University of North Carolina, Chapel Hill; Nisha Gottfredson, Ph.D., Transdisciplinary Prevention Research Center, Duke University. Modern Modeling Methods, May 2012.


Presentation Transcript


  1. Novel Applications of Mixture Models to Social Science Data. Danielle Dean, M.A., Department of Psychology, University of North Carolina, Chapel Hill; Nisha Gottfredson, Ph.D., Transdisciplinary Prevention Research Center, Duke University. Modern Modeling Methods, May 2012

  2. Introduction: Indirect Mixture Applications. Presented by Nisha Gottfredson

  3. Mixture Models Enable Analysts to Relax Parametric Assumptions • Marron and Wand (1992) showed that nearly any univariate distribution can be replicated using a finite mixture of normal distributions • Components may differ in their means, variances, and mixing proportions • [Figure: a "bimodal density" built from 2 normal components; a "strongly skewed" density built from 8 normal components]
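
A minimal sketch of what "a finite mixture of normal distributions" means computationally: the density is a proportion-weighted sum of normal densities. The two-component weights, means, and standard deviations below are illustrative assumptions, not the Marron and Wand (1992) settings; the Riemann sum only checks that the mixture is a proper density, and the comparison checks for a dip between the two modes.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Finite normal mixture: proportion-weighted sum of component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

# Illustrative two-component mixture with well-separated means (bimodal shape).
weights, means, sds = [0.5, 0.5], [-1.5, 1.5], [0.5, 0.5]

# Riemann-sum check that the mixture is a proper density (integrates to about 1).
step = 0.001
grid = [-10.0 + i * step for i in range(20001)]
total = sum(mixture_pdf(x, weights, means, sds) for x in grid) * step

# Lower density between the modes than at a mode confirms bimodality.
print(round(total, 4), mixture_pdf(0.0, weights, means, sds) < mixture_pdf(1.5, weights, means, sds))
```

Adding more components (as in the 8-component "strongly skewed" example) only lengthens the three parameter lists; the density formula is unchanged.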

  4. Multivariate Mixture Models • Each class $g$ has a class-specific mean vector $\boldsymbol{\mu}_g$, covariance matrix $\boldsymbol{\Sigma}_g$, and mixing proportion $\pi_g$ • Examples include Growth Mixture Models (mixtures of latent curve models; Verbeke & Lesaffre, 1996; Muthén & Shedden, 1999) and Structural Equation Mixture Models (Jedidi, Jagpal, & Desarbo, 1997; Dolan & van der Maas, 1998)

  5. Direct versus Indirect Applications • McLachlan & Peel (2000) distinguished between direct and indirect applications of finite mixture modeling • Direct applications more common in social sciences • Users believe that there is qualitative heterogeneity in the population • Interpret class-specific estimates • Aim is to recover true groupings • Indirect applications more common in statistics • Analysts uncomfortable with parametric assumptions • Aggregate over class-specific estimates • True groupings do not exist

  6. Direct versus Indirect Interpretation is in the Eye of the Beholder • There is no empirical way to distinguish between groups as truth and groups as statistical convenience • AIC/BIC almost always favor adding classes (Bauer & Curran, 2003; 2004; Bauer, 2007), even when the classes only absorb: • Non-normality of variables • Non-linearity • When in doubt, indirect interpretation is more robust than direct interpretation (Sterba et al., 2012)
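
To make the BIC point concrete, here is a small simulation sketch, not taken from the talk: data are drawn from one homogeneous but skewed (lognormal) population, and a hand-rolled EM fit of a 1-class versus a 2-class normal mixture is compared by BIC. The sample size, seed, lognormal parameters, and fixed iteration count are all assumptions chosen for illustration; no real software's estimator is being reproduced.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_normal_mixture(data, k, iters=200):
    """Fit a k-class univariate normal mixture by EM and return its log-likelihood."""
    n = len(data)
    srt = sorted(data)
    # Start class means at spread-out quantiles, with a common variance and equal weights.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior class probabilities for every observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            tot = sum(dens)
            resp.append([d / tot for d in dens])
        # M-step: update mixing proportions, class means, and class variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing class
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def bic(loglik, n_params, n):
    return -2.0 * loglik + n_params * math.log(n)

random.seed(1)
# A single, homogeneous but strongly skewed population: no true classes exist.
data = [random.lognormvariate(0.0, 0.75) for _ in range(500)]
# Free parameters: k means + k variances + (k - 1) mixing proportions.
bics = {k: bic(fit_normal_mixture(data, k), 3 * k - 1, len(data)) for k in (1, 2)}
print({k: round(v, 1) for k, v in bics.items()})
```

Even though the population has no groups, the 2-class model has the lower BIC here because the second class absorbs the skew. That is exactly the ambiguity the slide describes.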

  7. Overview of Talks • Indirect applications of mixture models • Survival Mixture Models with study on multiple survival processes during transition to adulthood • Shared Parameter Mixture Models to handle non-randomly missing data in longitudinal studies

  8. Survival Mixture Models for Simultaneously Capturing Multiple Survival Processes: An Application with Data on Transitioning to Adulthood. Presented by Danielle Dean

  9. Multiple Survival Processes • How may we analyze multiple non-repeatable events which may occur at the same point in time for an individual? • E.g. age of onset of different drugs • E.g. age of transition to multiple roles

  10. Survival Analysis • Survival or event history models • Multivariate survival analysis

  11. Multiple Survival Processes • Non-repeatable events which may occur at the same point in time • Many researchers currently run a separate survival analysis for each event process but don’t analyze how the events are related

  12. Univariate Survival Analysis • One non-repeatable event • Three main functions: survival, lifetime distribution, and hazard

  13. Univariate Survival Analysis • E.g., therapy completion • Model the hazard, $h(t) = P(T = t \mid T \ge t)$ • Compute the survival function, $S(t) = \prod_{j \le t} [1 - h(j)]$, and the lifetime distribution, $F(t) = 1 - S(t)$
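
In discrete time, the three functions are tied together by a running product: survival multiplies the probabilities of not experiencing the event at each period, and the lifetime distribution is its complement. The sketch below assumes discrete time periods (such as therapy sessions) and uses hypothetical hazard values.

```python
def survival_and_lifetime(hazards):
    """Convert discrete-time hazards h(t) into S(t) and F(t).

    h(t) = P(event at t | no event before t)
    S(t) = prod_{j <= t} (1 - h(j));  F(t) = 1 - S(t)
    """
    survival, lifetime = [], []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h          # probability of still being event-free after period t
        survival.append(s)
        lifetime.append(1.0 - s)
    return survival, lifetime

# Hypothetical hazards of completing therapy at sessions 1, 2, and 3.
S, F = survival_and_lifetime([0.1, 0.2, 0.5])
print([round(x, 2) for x in S])  # → [0.9, 0.72, 0.36]
print([round(x, 2) for x in F])  # → [0.1, 0.28, 0.64]
```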

  14. Multiple non-repeatable events • How are the events related? • Distribution of risk for multiple events is of unknown form • Model assumes the population is composed of a finite number of latent classes in order to parsimoniously describe the underlying distribution of risk (a multiple-event version of the model presented by Muthén & Masyn, 2005)

  15. Purpose of model • Parsimoniously describe the underlying distribution of risk for multiple events, without assuming a specific mathematical form for the distribution • Purpose is to draw attention to differences in the causes and consequences of different pathways (rather than to suggest the population is composed of literally distinct groups) • In spirit of indirect application, can compute model-implied functions weighting over latent classes to evaluate the effects of covariates • E.g. model implied risk for multiple events for males versus females, controlling for other covariates
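
A minimal sketch of a model-implied function weighted over latent classes, assuming a multinomial logit for class membership on one binary covariate (for example, 0 = male, 1 = female) and class-specific discrete-time hazards for a single event. The two-class solution, logit coefficients, and hazards are hypothetical numbers, not estimates from the study; in the multiple-event model the same weighting would be repeated for each event.

```python
import math

def class_probs(x, intercepts, slopes):
    """Multinomial-logit class proportions for covariate value x.

    intercepts/slopes hold one entry per non-reference class; the last class is the
    reference with its logit fixed at 0.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cumulative_probability(hazards):
    """Discrete-time lifetime distribution F(t) = 1 - prod_{j<=t} (1 - h(j))."""
    surv, out = 1.0, []
    for h in hazards:
        surv *= 1.0 - h
        out.append(1.0 - surv)
    return out

def model_implied_cdf(x, intercepts, slopes, class_hazards):
    """Weight class-specific lifetime distributions by covariate-specific class proportions."""
    pis = class_probs(x, intercepts, slopes)
    cdfs = [cumulative_probability(h) for h in class_hazards]
    return [sum(p * c[t] for p, c in zip(pis, cdfs)) for t in range(len(cdfs[0]))]

# Hypothetical 2-class solution for one event over three ages.
class_hazards = [[0.30, 0.30, 0.30], [0.05, 0.05, 0.05]]
for x in (0, 1):
    print(x, [round(v, 3) for v in model_implied_cdf(x, [0.0], [1.0], class_hazards)])
```

Because the covariate only shifts class proportions, any covariate difference in the aggregated function is carried entirely through class membership in this sketch.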

  16. Transitions to Adulthood • Life course theory • Order and timing of social roles • Meaning of a social role • E.g., “working parent” • Identify general structures of the life course

  17. Methods • National Longitudinal Study of Adolescent Health • N = 15,701 • 4 events: Parenthood, Full-time work, Marriage, College • Ages 18-30

  18. Model • Fit using Mplus 6.12 • Identify pathways through the life course (Model 1) • Influence of covariates on pathways (Model 2) • [Figure: path diagram with covariates X (gender, race, parental education), latent class variable C, and event indicators y1,18 – y1,30 . . . y4,18 – y4,30]
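
The indicators y1,18 through y4,30 are one event-by-age grid per person. The sketch below assumes the usual discrete-time survival coding (0 while at risk, 1 in the year the event occurs, missing afterward and after censoring); the helper name, event labels, and the respondent's ages are hypothetical.

```python
AGES = list(range(18, 31))  # ages 18-30, as in the application

def event_indicators(event_age, last_observed_age):
    """Hypothetical helper: discrete-time coding for one non-repeatable event.

    0 = at risk, no event; 1 = event occurs at this age;
    None = not at risk (after the event) or not observed (censored).
    """
    row = []
    for age in AGES:
        if age > last_observed_age:
            row.append(None)      # censored: no longer observed
        elif event_age is not None and age > event_age:
            row.append(None)      # event already occurred: no longer at risk
        elif event_age is not None and age == event_age:
            row.append(1)
        else:
            row.append(0)
    return row

# Hypothetical respondent observed through age 30: parent at 24, full-time work at 22,
# never married and no college degree by 30.
person = {"parenthood": 24, "fulltime_work": 22, "marriage": None, "college": None}
rows = {event: event_indicators(age, 30) for event, age in person.items()}
print(rows["parenthood"])
```

Coding post-event and post-censoring years as missing is what lets individuals with censored event times contribute the years they were actually observed.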

  19. Model Selection

  20. Model Selection • The 6-class solution contains a substantively redundant latent class • [Figure: lifetime distributions]

  21. 5 class solution (hazards)

  22. 5 class solution (lifetime dist.)

  23. Order / Timing of Events • Median event time • Aggregated over the latent classes, the model-implied functions reproduce the sample observed functions (average squared residual < 0.001)
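
A sketch of the two summaries on this slide, under stated assumptions: the median event time is taken as the first age at which the cumulative probability reaches 0.5, and the average squared residual as the mean squared difference between model-implied and observed values across ages. The talk does not spell out either definition, and all numbers below are hypothetical.

```python
def median_event_age(ages, cdf):
    """First age at which the cumulative event probability reaches 0.5 (None if never)."""
    for age, p in zip(ages, cdf):
        if p >= 0.5:
            return age
    return None

def average_squared_residual(model_implied, observed):
    """Mean squared difference between model-implied and sample observed functions."""
    return sum((m - o) ** 2 for m, o in zip(model_implied, observed)) / len(observed)

# Hypothetical lifetime distributions for one event at ages 18-22.
ages = list(range(18, 23))
model = [0.10, 0.30, 0.45, 0.60, 0.70]
observed = [0.12, 0.28, 0.47, 0.58, 0.71]
print(median_event_age(ages, model), average_squared_residual(model, observed))
```

The same residual function applies when the observed functions are stratified by a covariate level and compared with the matching model-implied functions.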

  24. Influence of Covariates

  25. Influence of Covariates • In the spirit of indirect application, it is also possible to compute model-implied functions weighting over latent classes • Model-implied functions can be computed for different levels of a covariate • In this application, the small number of categorical covariates allowed empirical comparisons between model-implied functions and sample observed functions

  26. Gender • Model implies: • Women are more likely to be a parent across all ages • Women are more likely to be married across all ages • Women have a higher cumulative probability of obtaining a college degree

  27. Race • Model implies: • Caucasians have a lower probability of parenthood at early ages, but similar levels of parenthood by age 30 • Caucasians have a higher cumulative probability of obtaining a college degree by age 30 • African-Americans have a lower cumulative probability of transitioning into a marriage role by age 30

  28. Parental Education • Model implies: • Individuals with at least one parent with a college degree: • Much higher probability of obtaining a college degree • Much lower risk of starting full-time work at early ages • Lower cumulative probability of transitioning into marriage/parenthood by age 30

  29. Residuals • Model implied functions can be compared to sample observed functions, stratified by different levels of a covariate

  30. Residuals • Model implied functions can be compared to sample observed functions, stratified by different levels of a covariate • Average squared residual small for all three covariates (gender = 0.03, race = 0.04, parent education = 0.03)

  31. Conclusions • In the application presented here, the model appeared stable and reproduced the observed patterns well • Able to analyze multiple non-repeatable events • Able to include individuals with censored event times • Complexity of model has potential to increase our understanding of the risk for multiple events over time as well as possible mechanisms influencing this risk • Future work should extend the model to other common situations (e.g., repeatable events) and investigate the power of the model to detect the effects of covariates

  32. Modeling Change in the Presence of Non-Randomly Missing Data: An Application of Shared Parameter Mixture Models with Naturalistic Psychotherapy Data. Presented by Nisha Gottfredson

  33. Roadmap of Talk • Brief overview of traditional longitudinal models • Identify instances of potential non-random missingness in longitudinal studies • Define consequences of ignoring non-random missingness • Introduction to the shared parameter mixture model (SPMM) • Summary of Monte Carlo evaluation of SPMM performance • Analysis of psychotherapy data

  34. Modeling Individual Change over Time using Latent Curve Models (LCM): A Linear Model with One Predictor
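
The slide's equations did not survive extraction. For reference, a standard way to write a linear latent curve model with one time-invariant predictor $x_i$ is shown below; the notation (time scores $a_t$, growth factors $\eta_{0i}$ and $\eta_{1i}$, fixed effects $\alpha$ and $\gamma$, disturbances $\zeta$) is an assumption and may differ from the original slide.

```latex
\begin{aligned}
y_{ti}     &= \eta_{0i} + \eta_{1i}\, a_t + \epsilon_{ti} \\
\eta_{0i}  &= \alpha_0 + \gamma_0\, x_i + \zeta_{0i} \\
\eta_{1i}  &= \alpha_1 + \gamma_1\, x_i + \zeta_{1i}
\end{aligned}
```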

  35. Traditional Longitudinal Models Require Missing at Random Assumption • Full information estimators (like FIML) use all available information from covariates and repeated measures to inform parameter estimates • Missing data patterns are ignored → Assume ignorable missingness (i.e., missing data are missing at random) • If the probability of missingness is unrelated to the repeated measures after conditioning on observed data, the missingness mechanism is missing at random (MAR) • If observed data cannot fully account for missingness, the missing data mechanism is missing not at random (MNAR) • There is no statistical test to determine whether missing data are MAR or MNAR

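  36. Non-ignorable Missing Data Mechanisms in Longitudinal Data Analysis • Latent curve model of growth/change over time: $y_{ti} = \eta_{0i} + \eta_{1i} a_t + \epsilon_{ti}$, with time scores $a_t$ • In general, non-random missingness implies one of two mechanisms (e.g., Little, 1995; Little, 2009) • Outcome ($\mathbf{y}_i$)-Dependent Missingness: missingness depends on $X_i$ and the outcomes $\mathbf{y}_i$, including unobserved values • Random Coefficient-Dependent Missingness: missingness depends on $X_i$ and the random coefficients $\boldsymbol{\eta}_i$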

  37. Some Examples of Non-Random Missingness in Longitudinal Studies • Older adults are less likely to report on cognitive outcomes due to dementia onset (Caused by random slope) • Adolescents who engage in externalizing behaviors are likely to be absent (suspended, expelled, skipping class) • Patients who respond most quickly to psychotherapy treatment leave earliest

  38. Some Examples of Non-Random Missingness in Longitudinal Studies • Older adults are less likely to report on cognitive outcomes due to dementia onset or death • Adolescents who tend to engage in externalizing behaviors are likely to be absent (suspended, expelled, skipping class) (Caused by random intercept) • Patients who respond most quickly to psychotherapy treatment leave earliest

  39. Some Examples of Non-Random Missingness in Longitudinal Studies • Older adults are less likely to report on cognitive outcomes due to dementia onset or death • Adolescents who engage in externalizing behaviors are likely to be absent (suspended, expelled, skipping class) • Patients who respond most quickly to psychotherapy treatment leave earliest (Caused by random slope)
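
A small simulation sketch of the psychotherapy example (missingness caused by the random slope). All numbers are hypothetical: symptom intercepts and slopes are drawn from normal distributions, and the chance of missing the final assessment rises as the slope becomes more negative (faster improvement). Comparing the full-data final-wave mean with the completers-only mean shows the direction of the bias.

```python
import math
import random
import statistics

random.seed(2012)
N, LAST = 5000, 4  # hypothetical sample size; final wave at t = 4
all_final, observed_final = [], []
for _ in range(N):
    intercept = random.gauss(20.0, 3.0)   # hypothetical baseline symptom level
    slope = random.gauss(-1.0, 1.0)       # negative slope = symptom improvement
    final = intercept + slope * LAST + random.gauss(0.0, 1.0)
    all_final.append(final)
    # Hypothetical random-slope-dependent rule: faster improvers are more likely
    # to leave before the final wave, so their final score goes unobserved.
    p_dropout = 1.0 / (1.0 + math.exp(3.0 * (slope + 1.0)))
    if random.random() >= p_dropout:
        observed_final.append(final)

print(round(statistics.mean(all_final), 1))       # full-data mean (population value is 16)
print(round(statistics.mean(observed_final), 1))  # completers only: biased upward
```

Because the early leavers are the people improving fastest, the completers look less improved than the full sample. Treatment effects would be understated if dropout were treated as ignorable.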

  40. Ignoring Non-Random Missingness Has Severe Implications

  41. Example of MAR Assumption with Psychotherapy Data • The LCM requires the MAR assumption: dropout is uncorrelated with individual differences in growth, after controlling for observed data

  42. Individual Intercept and Rate of Change are Related to Dropout Occasion

  43. Selection-Type Models are Commonly Used to Model Non-Random Missingness • In addition to longitudinal data model, users specify a model for the missing data process • Selection model is jointly estimated with the trajectory model (e.g., Heckman, 1979): • Trajectory estimates will be unbiased, assuming that the selection model is correct (i.e., no omitted covariates, correct functional form; Winship & Mare, 1992) • Selection models are notoriously sensitive to assumption violations • Theory is often insufficient to correctly specify a selection model

  44. Shared Parameter Models • If non-random missingness is present, information about the missing data must be incorporated into the model • The relationship between missing data patterns and the growth trajectory can be explained by a latent 'shared parameter' $b_i$ (e.g., Vonesh et al., 2006) • Conditioning on the shared parameter leads to conditional independence between the repeated measures $\mathbf{y}_i$ and the missing data indicators $\mathbf{R}_i$: $f(\mathbf{y}_i, \mathbf{R}_i \mid b_i) = f(\mathbf{y}_i \mid b_i)\, f(\mathbf{R}_i \mid b_i)$ • But it is still possible to misspecify the missing data process using a continuous latent variable as the shared parameter

  45. Shared Parameter Mixture Models: Approximating the Missingness Mechanism • Nearly any distribution of unknown form can be approximated using mixture distributions • Replacing latent continua with K latent classes allows us to semi-parametrically approximate the relationship between the trajectory and the missing data • Extension of observed pattern mixture models (Little, 1993, 1995; Hedeker & Gibbons, 1997) • Estimating enough latent classes should account for nearly any form of dependence between the missing data and the growth parameters • Roy, 2003; Lin et al., 2004; Morgan-Lopez & Fals-Stewart, 2008; Tsonaka et al., 2009; Muthén et al., 2011; Gottfredson et al., 2012

  46. Shared Parameter Mixture Models with a Full Measurement Model • Assume independence between repeated measures and missing data indicators conditional on latent class • Estimate as many classes as necessary to obtain independence • Rely on AIC or BIC to determine when to stop adding latent classes
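
A sketch of one person's likelihood contribution under a K-class shared parameter mixture with binary missingness indicators: within each class, the observed repeated measures and the missingness pattern are independent, so their densities multiply, and the class-specific products are weighted by class proportions. As a simplifying assumption, each class here has a fixed linear trajectory with independent normal residuals and its own per-occasion missingness probabilities; the dictionary keys are hypothetical names, and a full SPMM may also keep random effects within class.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def spmm_individual_likelihood(y, times, class_params):
    """Likelihood of one person's data under a K-class shared parameter mixture.

    y: repeated measures, with None marking a missed occasion.
    class_params: list of dicts with hypothetical keys 'pi' (class proportion),
        'intercept', 'slope', 'sd' (residual SD), and 'p_miss' (per-occasion
        probabilities of missingness).
    """
    total = 0.0
    for c in class_params:
        f_y = 1.0  # density of the observed repeated measures given the class
        f_r = 1.0  # probability of the missingness pattern given the class
        for t, y_t, p in zip(times, y, c["p_miss"]):
            if y_t is None:
                f_r *= p
            else:
                f_r *= 1.0 - p
                f_y *= normal_pdf(y_t, c["intercept"] + c["slope"] * t, c["sd"])
        # Conditional independence within class: multiply, then weight by pi.
        total += c["pi"] * f_y * f_r
    return total

# Hypothetical 2-class model: class 1 improves quickly and tends to leave early.
classes = [
    {"pi": 0.4, "intercept": 22.0, "slope": -2.0, "sd": 2.0, "p_miss": [0.0, 0.2, 0.5, 0.7]},
    {"pi": 0.6, "intercept": 20.0, "slope": -0.5, "sd": 2.0, "p_miss": [0.0, 0.05, 0.1, 0.1]},
]
print(spmm_individual_likelihood([21.5, 20.0, None, None], [0, 1, 2, 3], classes))
```

Summing the log of this quantity over people gives the sample log-likelihood that AIC or BIC would be computed from when deciding how many classes to retain.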

  47. Summary Indicators are More Efficient • Summarize missing data patterns with a one-number summary (e.g., number of missing observations) • Avoid estimating a model with many binary outcomes • A good summary indicator will include the same information more succinctly • This model is used for the simulation study and the empirical example
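
A minimal sketch of the summary indicator named on the slide (number of missing observations): each person's row of binary missingness indicators collapses to a single count. The three clients' scores are hypothetical.

```python
def number_missing(y):
    """One-number summary of a missing-data pattern: count of missed occasions."""
    return sum(1 for v in y if v is None)

# Hypothetical repeated measures for three clients (None = missed session).
data = [
    [22.0, 19.5, 17.0, 15.0],
    [25.0, 21.0, None, None],
    [18.0, None, 16.5, None],
]
summary = [number_missing(row) for row in data]
print(summary)  # → [0, 2, 2]
```

Note that the second and third clients receive the same count despite different patterns (dropout versus intermittent misses); whether that loss of detail matters is what the comparison between the summary and full binary SPMM is meant to assess.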

  48. Aggregating Over Latent Classes Incorporates Informative Missingness into Trajectory Estimates • The model will estimate a separate mean growth trajectory for K latent classes • Weight within-class growth factor means by estimated class proportions (Bauer, 2005): $\hat{\boldsymbol{\alpha}} = \sum_{k=1}^{K} \hat{\pi}_k \hat{\boldsymbol{\alpha}}_k$
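
The aggregation step is a proportion-weighted average of the class-specific growth factor means. The sketch below uses a hypothetical 3-class solution with an intercept and slope mean per class.

```python
def aggregate_growth_means(class_props, class_means):
    """Overall growth factor means as a proportion-weighted average of class means."""
    n_factors = len(class_means[0])
    return [sum(p * m[f] for p, m in zip(class_props, class_means)) for f in range(n_factors)]

# Hypothetical 3-class solution: (intercept mean, slope mean) for each class.
props = [0.5, 0.3, 0.2]
means = [(20.0, -1.0), (24.0, -0.5), (18.0, -2.0)]
print(aggregate_growth_means(props, means))  # approximately [20.8, -1.05]
```

Classes that drop out early still contribute their estimated trajectories in proportion to their size, which is how the informative missingness enters the overall trajectory.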

  49. Judging Appropriateness of SPMM to Real-World Data • Can the model accommodate most of the potential missing data mechanisms that are present? • Simulation studies test model performance (i.e., bias and precision of parameter estimates) under a variety of data conditions • Hypotheses: • The model should perform better than the LCM when a variety of random coefficient dependent missing data mechanisms are present, but not necessarily with outcome dependent missingness • Both models should accommodate MAR mechanisms but LCM might be more efficient • The summary SPMM should perform as well as the full binary SPMM under most conditions
