Assessing Estimability of Latent Class Models Using a Bayesian Estimation Approach

ENAR, March 26, 2001, Charlotte, NC • Elizabeth S. Garrett and Scott L. Zeger • Johns Hopkins University, Departments of Oncology and Biostatistics

Motivation: We would like to investigate how many classes of depression exist using a latent class model applied to an epidemiologic sample of mental health symptoms. Issue: Latent class models require a “large” sample size to be estimable. Questions: Is our sample size large enough? How many classes can we fit “reliably” given the amount of data that we have?

Latent Class Model Overview (binary indicator case) • There are M classes of depression. $\pi_m$ represents the proportion of individuals in the population in class m (m = 1,…,M). • Each individual is a member of one of the M classes, but we do not know which. The latent class of individual i is denoted by $\eta_i$ ($\eta_i \in \{1,\dots,M\}$). • Symptom prevalences vary by class. The prevalence for symptom j in class m is denoted by $p_{mj}$. • Given class membership, the symptoms are independent.

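Likelihood Function For individual i with binary responses $y_{i1},\dots,y_{iK}$ on K symptoms, $P(Y_i = y_i) = \sum_{m=1}^{M} \pi_m \prod_{j=1}^{K} p_{mj}^{y_{ij}} (1 - p_{mj})^{1 - y_{ij}}$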
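The mixture structure described in the model overview (class proportions, class-specific symptom prevalences, independence of symptoms within a class) can be sketched in a few lines of Python. This is only an illustration under those stated assumptions; the function and argument names are hypothetical and do not come from the presentation.

```python
import math

def individual_likelihood(y, class_props, prevalences):
    """Likelihood of one binary response vector under a latent class model.

    y            -- list of 0/1 symptom indicators, length K
    class_props  -- list of M class proportions that sum to 1
    prevalences  -- M x K nested list; prevalences[m][j] is the prevalence
                    of symptom j in class m
    """
    total = 0.0
    for pi_m, p_m in zip(class_props, prevalences):
        # Conditional independence: multiply Bernoulli terms within a class.
        within = 1.0
        for y_j, p_mj in zip(y, p_m):
            within *= p_mj if y_j == 1 else (1.0 - p_mj)
        # Mix over the unknown class membership.
        total += pi_m * within
    return total

def log_likelihood(data, class_props, prevalences):
    """Sum of individual log-likelihoods, treating individuals as independent."""
    return sum(math.log(individual_likelihood(y, class_props, prevalences))
               for y in data)
```

Summing `individual_likelihood` over all possible response patterns gives 1, which is a quick check that the class proportions and prevalences define a proper distribution.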

Two Scenarios A latent class model with M classes may be only weakly estimable if (1) M classes are too many (over-parameterization). “True” number of classes is < M. (2) There are “truly” M classes, but there is not enough data to identify all of the classes. Better to use fewer classes than to use the M class model with weakly estimated parameters.

Estimation of LC Models: How large is large enough? Depends on • true class sizes • prevalence of items/symptoms • measurement error in items/symptoms. Example 1: Three class model, N = 500. True class sizes are 30%, 40%, 30%. Symptom prevalences range from 30% to 80%. Example 2: Three class model, N = 500. True class sizes are 80%, 18%, 2%. Symptom prevalences range from 3% to 10%. Same sample size, but are both estimable?
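To see why Example 2 is harder than Example 1 at the same N, it helps to simulate from the model and count how many individuals land in each class. The sketch below does this for Example 2. The number of symptoms (four) and the individual prevalence values are assumptions made for illustration; the slide gives only the 3% to 10% range.

```python
import random

def simulate_lc_data(n, class_props, prevalences, seed=0):
    """Draw class memberships and binary symptoms from a latent class model."""
    rng = random.Random(seed)
    # Latent class of each individual, drawn with the class proportions.
    classes = rng.choices(range(len(class_props)), weights=class_props, k=n)
    # Symptoms are independent Bernoulli draws given the class.
    data = [[1 if rng.random() < p else 0 for p in prevalences[m]]
            for m in classes]
    return classes, data

def expected_class_sizes(n, class_props):
    """Expected number of individuals per class."""
    return [n * pi_m for pi_m in class_props]

# Example 2: class sizes from the slide; prevalences are hypothetical values
# chosen inside the quoted 3%-10% range, for four assumed symptoms.
props_2 = [0.80, 0.18, 0.02]
prev_2 = [[0.03, 0.05, 0.04, 0.06],
          [0.10, 0.08, 0.09, 0.07],
          [0.05, 0.10, 0.03, 0.08]]

classes, data = simulate_lc_data(500, props_2, prev_2)
print(expected_class_sizes(500, props_2))
print([classes.count(m) for m in range(3)])
```

With N = 500 the smallest class is expected to hold only about 10 individuals, and a symptom with 3% prevalence is expected to appear in well under one of them, so the data carry very little information about that class's prevalences.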

Distinction and Definitions • Statistical identifiability refers to the “difficulty of distinguishing among two or more explanations of the same empirical phenomena.” [1] • It is not so much a concept that stems from limited data as one about the ability to distinguish between explanations even if we had unlimited data. • Classic example: • X is a random variable such that $E(X) = \theta_1 - \theta_2$ • observations of X allow us to identify $\theta_1 - \theta_2$ • but we cannot identify $\theta_1$ or $\theta_2$ regardless of the amount of information we collect on X. • Infinitely many values of $\theta_1$ and $\theta_2$ give rise to the same value of E(X). [1] Franklin Fisher, “Statistical Identifiability”, International Encyclopedia of Statistics, ed. Kruskal and Tanur, 1977.
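The classic example can be checked numerically. The sketch below assumes, purely for illustration, that X is normal with unit standard deviation and a mean equal to the difference of the two parameters; the slide specifies only the mean. Two parameter pairs with the same difference then give exactly the same log-likelihood, whatever data are observed.

```python
import math

def normal_loglik(xs, theta1, theta2, sd=1.0):
    """Log-likelihood of xs when X ~ Normal(theta1 - theta2, sd^2) (assumed model)."""
    mu = theta1 - theta2
    return sum(-0.5 * math.log(2.0 * math.pi * sd ** 2)
               - (x - mu) ** 2 / (2.0 * sd ** 2)
               for x in xs)

xs = [0.8, 1.3, 2.1, 1.7]                    # hypothetical observations
same_a = normal_loglik(xs, 3.0, 1.5)         # difference 1.5
same_b = normal_loglik(xs, 103.0, 101.5)     # difference 1.5 again
other = normal_loglik(xs, 3.0, 2.5)          # difference 0.5

print(math.isclose(same_a, same_b))
print(math.isclose(same_a, other))
```

Collecting more observations changes the log-likelihood values but never separates the first two parameter pairs, which is the sense in which the problem is one of identifiability rather than sample size.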

“Identifiability” in Latent Class Models • Latent class issues with identifiability are “twofold”. • Still have the statistical identifiability to contend with. • Local identifiability issues have been well-defined • Goodman, 1974, Biometrika • McHugh, 1956, Biometrika • Bandeen-Roche et al., 1997, JASA • To distinguish issues stemming from limited data from classical identifiability we use alternative terminology: estimability. • We say that a model is estimable in this context if there is enough data to uniquely estimate all parameters with some degree of certainty.

Local Identifiability • Define $p_0$ to be a vector of parameters which defines the LC model. • If there is a neighborhood of $p_0$ in which $L(p; y) = L(p_0; y)$ for all y only when $p = p_0$, then we say that L is locally identifiable at $p_0$. • Maximum likelihood estimation is concerned with local identifiability. • Figure annotation: NOT locally identified!

Weak Identifiability / Weak Estimability • Bayesian concept • Assume we partition the parameter vector $\theta = (\theta_1, \theta_2)$. • If $p(\theta_2 \mid \theta_1, Y) = p(\theta_2 \mid \theta_1)$, then we say that $\theta_2$ is not estimable. • The data (Y) provide no information about $\theta_2$ given $\theta_1$. • If most of the information we have about $\theta_2$ is supplied by its prior distribution, then $p(\theta_2 \mid Y) \approx p(\theta_2)$. • In words, if the prior distribution of $\theta_2$ is approximately the same as its posterior, then we say that $\theta_2$ is only weakly estimable or weakly identifiable.

Estimation Approaches • Maximum Likelihood Approach: Find estimates of $p$, $\pi$, and $\eta$ that are most consistent with the data that we observe, conditional on the number of classes, M. Often used: EM algorithm (iterative fitting procedure). • Bayesian Approach: Quantify beliefs about $p$, $\pi$, and $\eta$ before and after observing data. Often used: Gibbs sampler, MCMC algorithm.
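As a rough illustration of the maximum likelihood route, here is a minimal EM sketch for a latent class model with binary items. It is not the authors' implementation: the starting values, the fixed iteration count, and the clamping of prevalences away from 0 and 1 are all choices made for this example, and the function name is hypothetical.

```python
import math
import random

def em_latent_class(data, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary items (illustrative sketch)."""
    rng = random.Random(seed)
    n, k = len(data), len(data[0])
    props = [1.0 / n_classes] * n_classes
    # Random starting prevalences so that the classes are not identical.
    prev = [[rng.uniform(0.2, 0.8) for _ in range(k)] for _ in range(n_classes)]
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each individual.
        resp = []
        loglik = 0.0
        for y in data:
            joint = []
            for m in range(n_classes):
                w = props[m]
                for j in range(k):
                    w *= prev[m][j] if y[j] == 1 else 1.0 - prev[m][j]
                joint.append(w)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([w / total for w in joint])
        loglik_trace.append(loglik)
        # M-step: weighted class proportions and symptom prevalences.
        for m in range(n_classes):
            weight = sum(r[m] for r in resp)
            props[m] = weight / n
            for j in range(k):
                num = sum(r[m] * y[j] for r, y in zip(resp, data))
                # Clamp to keep the logarithm finite in the next E-step.
                prev[m][j] = min(max(num / weight, 1e-6), 1.0 - 1e-6)
    return props, prev, loglik_trace
```

The log-likelihood trace should be non-decreasing across iterations. Because EM converges to a local maximum, different seeds can give different solutions, and class labels are arbitrary. A Gibbs sampler for the Bayesian approach has the same two-part structure, with the E-step replaced by sampling class memberships and the M-step by sampling parameters from their conditional posteriors.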

Bayesian Model Review • Every parameter, $\theta$, in a Bayesian model has a prior distribution associated with it: $p(\theta)$. • The resulting posterior distribution is a combination of the prior distribution and the likelihood function: $p(\theta \mid Y) \propto p(Y \mid \theta)\, p(\theta)$. • If there is a lot of information in the data, then the likelihood will provide most of the information that determines the posterior. • If there is not a lot of information in the data, then the prior will provide most of the information that determines the posterior.
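A conjugate Beta prior on a single prevalence makes the balance between prior and likelihood easy to see. The sketch below is a simplification: it assumes class membership is known, which is not true in a latent class model, and the prior parameters and counts are hypothetical. With a Beta(a, b) prior and s positives among n individuals, the posterior is Beta(a + s, b + n - s), and the prior contributes a fraction (a + b) / (a + b + n) of the posterior mean.

```python
def beta_posterior(a, b, successes, n):
    """Conjugate update of a Beta(a, b) prior with binomial data.

    Returns the posterior parameters, the posterior mean, and the weight
    that the prior mean receives in the posterior mean.
    """
    post_a = a + successes
    post_b = b + (n - successes)
    post_mean = post_a / (post_a + post_b)
    prior_weight = (a + b) / (a + b + n)
    return post_a, post_b, post_mean, prior_weight

# Hypothetical counts: a large class (150 individuals) and a tiny one (10).
large = beta_posterior(2.0, 2.0, successes=90, n=150)
tiny = beta_posterior(2.0, 2.0, successes=1, n=10)
print(large)
print(tiny)
```

In the large class the prior accounts for 4/154 of the posterior mean; in the tiny class it accounts for 4/14, so the posterior stays much closer to the prior.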

Estimability Assessment Approach Compare posterior distribution to the prior distribution • visual inspection: picture tells us how much information is coming from the prior and how much from the likelihood. • quantify similarity/difference by a statistic (e.g. percent overlap). Bayes factor is also related to this idea.
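One way to put a number on the comparison is the area shared by the prior and posterior densities. The slides do not define percent overlap, so the definition used here (the integral of the pointwise minimum of the two densities, times 100) is an assumption, as is the use of Beta densities for both distributions. In practice the posterior would be estimated from MCMC draws rather than known in closed form.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x)
                    + (b - 1.0) * math.log(1.0 - x))

def percent_overlap(prior, posterior, n_grid=10000):
    """Percent overlap of two Beta densities, each given as an (a, b) pair.

    Uses a midpoint rule on (0, 1), so the endpoints are never evaluated.
    """
    width = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * width
        total += min(beta_pdf(x, *prior), beta_pdf(x, *posterior)) * width
    return 100.0 * total

prior = (1.0, 1.0)                                 # uniform prior (assumed)
well_estimated = percent_overlap(prior, (91.0, 61.0))
weakly_estimated = percent_overlap(prior, (2.0, 10.0))
print(well_estimated, weakly_estimated)
```

A posterior built from many observations is concentrated and overlaps little with a flat prior, while a posterior built from a handful of observations overlaps more. An overlap near 100% means the data added almost nothing.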

Simulated Data Examples

LCED (latent class estimability display) for 2 class data

Simulated Data Examples

LCED for 3 class data

Latent Class Analysis in Mental Health Research

Epidemiologic Catchment Area (ECA) Study • Goal: To obtain an epidemiologic sample of mental health data. • Population: Community-dwelling individuals over age 18 in five sites in the US. • Instrument: Diagnostic Interview Schedule (DIS), which includes 17 depression symptoms. • Our sample: • Data from 1981, Baltimore site only • Full information on depression symptoms as defined in the DSM-III on 2938 individuals.

Depression Symptom Groups

LCED for ECA data

Conclusions from ECA example • The four class model is not an appropriate model for this data. • This could be due to (1) Four class model is an overparameterization (2) More than three classes are needed to describe depression, but the data set is too small to estimate more classes.

Aside: Looking at $\chi^2$ statistics • One may say: just compare $\chi^2$ statistics from a maximum likelihood estimation procedure. BUT! • $\chi^2$ relies on the assumption that most patterns are relatively prevalent, which is not generally true. • May see significant differences due to summing up very small differences over a large number of samples. • M and M − 1 class models are not really “nested” in interpretation. • Not a valid statistic for comparing LC models. See Bayes factor and BIC: these are better statistics for comparing LC models.
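Two pieces of arithmetic make the aside concrete. First, with K binary symptoms there are 2^K possible response patterns, so the average count per pattern can fall far below one. Second, BIC penalizes the maximized log-likelihood by the number of free parameters, which for an M-class model with K binary items is (M − 1) + M·K. The sketch below uses the 17 symptoms and 2938 individuals quoted for the ECA sample; applying the pattern count to all 17 raw symptoms (rather than to symptom groups) is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

def n_free_parameters(n_classes, n_items):
    """Free parameters of a latent class model with binary items:
    (M - 1) class proportions plus M * K symptom prevalences."""
    return (n_classes - 1) + n_classes * n_items

def bic(log_likelihood, n_classes, n_items, n_obs):
    """Schwarz's BIC; smaller values indicate a better trade-off."""
    return (-2.0 * log_likelihood
            + n_free_parameters(n_classes, n_items) * math.log(n_obs))

def mean_count_per_pattern(n_obs, n_items):
    """Average number of individuals per possible response pattern."""
    return n_obs / 2 ** n_items

print(2 ** 17)                           # number of possible patterns
print(mean_count_per_pattern(2938, 17))  # average individuals per pattern
print(n_free_parameters(3, 17), n_free_parameters(4, 17))
```

With far more possible patterns than individuals, most cells of the pattern table are empty, which is the situation in which the usual chi-square approximation is unreliable.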

Extension: Quantifying Estimability • Calculate percent overlap between posterior and prior. • Estimate an “estimability index” for each class, where there are K symptoms. • Characteristics: • The index lies between 0 and 1 • Larger values indicate weaker estimability • Also see: Bayes factor
