Graphical Diagnostic Tools for Evaluating Latent Class Models: An Application to Depression in the ECA Study

Elizabeth S. Garrett, Department of Biostatistics, Johns Hopkins University

GOAL • Provide tools for choosing the most appropriate latent class model. • Interpret objective diagnostic methods in reference to the latent class model.

Table of Contents • Introduction • Previous Work • Model Estimation • Diagnostic Methods for Latent Class Models • Extensions to Latent Class Regression • Application to the ECA Study • Validating Diagnostic Criteria for Depression Using LCM • Discussion and Further Research

Outline • Depression in relation to the LCM • Approach to Estimation • The ECA Study • Predicted Frequency Check Plot • Latent Class Estimability Display • Interpretation of Findings • Revisions

Motivating Question How should we describe “major depression”? • not depressed, depressed • none, moderate, severe • none, mild, moderate, severe • none, mood symptoms, somatic symptoms, both

How we conceptualize “major depression” • We use indicators of symptoms, such as self-reported sadness, weight change, etc. • A combination of these indicators is thought to define depression. • Using these combinations, we commonly seek to categorize individuals into depression classes. • These classes represent the construct “depression.” • “Depression” is a latent variable. • The construct of “depression” can then be used for classification, description, and prediction.

Depression in the Diagnostic and Statistical Manual of Mental Disorders, 3rd Edition DSM-III Criteria (generally): A. Dysphoria for 2 or more weeks B. Reported symptoms in 4 or more of the following symptom groups: 1. loss of appetite, weight change 2. insomnia, hypersomnia 3. retarded movement, restlessness 4. disinterest in sex 5. fatigue 6. feelings of guilt or worthlessness 7. trouble concentrating, thoughts slow or mixed 8. morbid thoughts, suicidal thoughts/attempts
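
The DSM-III rule above is a fixed algorithm, so it can be written directly. The function below is a minimal, hypothetical sketch that assumes the dysphoria indicator already encodes the 2-week duration in criterion A, and that the eight symptom-group indicators are given in the order listed above.

```python
from typing import Sequence


def meets_dsm3_criteria(dysphoria: bool, symptom_groups: Sequence[bool]) -> bool:
    """Simplified DSM-III rule for major depression (illustrative only).

    dysphoria: criterion A (assumed to already encode the 2-week duration).
    symptom_groups: 8 indicators, one per symptom group 1-8 (criterion B).
    """
    if len(symptom_groups) != 8:
        raise ValueError("expected exactly 8 symptom-group indicators")
    # Criterion A and at least 4 of the 8 symptom groups.
    return bool(dysphoria) and sum(bool(s) for s in symptom_groups) >= 4


# Dysphoria plus appetite, sleep, fatigue and guilt (groups 1, 2, 5, 6).
print(meets_dsm3_criteria(True, [True, True, False, False, True, True, False, False]))  # True
```

A fixed rule like this assigns every individual to one of two classes; the latent class model below instead lets the data suggest how many classes there are.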

Latent Class Model: Main Ideas • There are $M$ classes of depression (e.g. none, mild, severe). $\pi_m$ represents the proportion of individuals in the population in class $m$ ($m = 1, \ldots, M$). • Each person is a member of one of the $M$ classes, but we do not know which. The latent class of individual $i$ is denoted by $\eta_i$. • Symptom prevalences vary by class. The prevalence of symptom $j$ in class $m$ is denoted by $p_{mj}$. • Given class membership, the symptoms are independent.

Latent Class Model • $M$: number of classes • $p_{\eta_i}$: vector of symptom probabilities given latent class $\eta_i$ • $\pi_m$: probability of being in latent class $m$, $m = 1, \ldots, M$ • $\eta_i$: the true latent class of individual $i$ • $Y_i$: vector of individual $i$’s reported symptoms
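
Together, the assumptions above imply that the probability of a symptom pattern $y$ is a class-weighted average of products of Bernoulli terms, $P(Y_i = y) = \sum_{m=1}^{M} \pi_m \prod_{j} p_{mj}^{y_j} (1 - p_{mj})^{1 - y_j}$, and Bayes’ rule gives the probability that an individual belongs to each class. A minimal numpy sketch of both (the function names are hypothetical):

```python
import numpy as np


def pattern_probability(y, pi, p):
    """Marginal probability of one symptom pattern under a latent class model.

    y:  length-J 0/1 vector (one individual's symptom report)
    pi: length-M class proportions pi_m
    p:  M x J array of symptom prevalences p_mj
    """
    y = np.asarray(y)
    pi = np.asarray(pi, dtype=float)
    p = np.asarray(p, dtype=float)
    # Conditional independence: within class m, multiply the J Bernoulli terms.
    within_class = np.prod(np.where(y == 1, p, 1.0 - p), axis=1)
    # Average over the unknown class, weighted by the class proportions.
    return float(pi @ within_class)


def class_posterior(y, pi, p):
    """P(eta_i = m | Y_i = y) for each class m, by Bayes' rule."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    joint = np.asarray(pi, dtype=float) * np.prod(np.where(y == 1, p, 1.0 - p), axis=1)
    return joint / joint.sum()
```

For example, with `pi = [0.7, 0.3]` and `p = [[0.1, 0.2], [0.8, 0.9]]`, the pattern `[1, 0]` has probability 0.08 under both classes, so `class_posterior` returns the prior proportions `[0.7, 0.3]`.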

Estimation Approach Bayesian Approach: Quantify beliefs about $p$, $\pi$, and $\eta$ before and after observing data. Bayesian Terminology: Prior Probability: What we believe about unknown parameters before observing data. Posterior Probability: What we believe about the parameters after observing data.

Bayesian Estimation Approach We estimated the models using a Markov chain Monte Carlo (MCMC) algorithm: • Specify a prior probability distribution: $P(p, \pi, \eta)$. • Combine the prior with the likelihood to obtain the posterior distribution: $P(p, \pi, \eta \mid Y) \propto P(p, \pi, \eta) \times L(Y \mid p, \pi, \eta)$. • Estimate the posterior distribution of each parameter using an iterative procedure, e.g. $P(\pi_1 \mid Y) = \int P(p, \pi, \eta \mid Y)$, integrating (summing, for $\eta$) over all parameters other than $\pi_1$.
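
The talk does not list its priors or the details of its sampler. The sketch below shows one standard MCMC choice, a Gibbs sampler with conjugate priors, assuming binary symptom data `Y` ($N \times J$), a Dirichlet prior on $\pi$ and independent Beta priors on each $p_{mj}$. The function name and hyperparameters are hypothetical. Each iteration draws the class memberships $\eta$ given the parameters, then $\pi$ and $p$ given $\eta$.

```python
import numpy as np


def gibbs_lcm(Y, M, n_iter=2000, alpha=1.0, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for a latent class model with binary symptoms (sketch).

    Assumed priors: pi ~ Dirichlet(alpha, ..., alpha); p_mj ~ Beta(a, b).
    Returns posterior draws of pi (n_iter x M) and p (n_iter x M x J).
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    N, J = Y.shape
    pi = np.full(M, 1.0 / M)
    p = rng.uniform(0.2, 0.8, size=(M, J))
    pi_draws = np.empty((n_iter, M))
    p_draws = np.empty((n_iter, M, J))
    for t in range(n_iter):
        # Step 1: eta_i | pi, p, Y_i, computed on the log scale for stability.
        log_w = np.log(pi) + Y @ np.log(p).T + (1.0 - Y) @ np.log1p(-p).T
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # Inverse-CDF draw of one class per row; the clip guards against round-off.
        eta = (w.cumsum(axis=1) < rng.random((N, 1))).sum(axis=1)
        eta = np.minimum(eta, M - 1)
        onehot = np.eye(M)[eta]        # N x M class indicators
        counts = onehot.sum(axis=0)    # class sizes n_m
        present = onehot.T @ Y         # symptom counts within each class
        # Step 2: pi | eta ~ Dirichlet(alpha + n_m).
        pi = rng.dirichlet(alpha + counts)
        # Step 3: p_mj | eta, Y ~ Beta(a + #present, b + #absent) within class m.
        p = rng.beta(a + present, b + counts[:, None] - present)
        pi_draws[t], p_draws[t] = pi, p
    return pi_draws, p_draws
```

Class labels are exchangeable, so in practice one would discard burn-in draws and check for label switching (e.g. by ordering classes by $\pi_m$ or by overall symptom prevalence) before summarizing each marginal posterior.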

The Epidemiologic Catchment Area Study 3481 community-dwelling individuals in Baltimore were interviewed using the NIMH Diagnostic Interview Schedule. 8 self-reported symptom groups were completed for 2938 individuals*. Six-month prevalence of symptoms was assessed. * Those with organic brain disorder were omitted, per the DSM-III criterion.

The Epidemiologic Catchment Area Study 

Predicted Frequency Check (PFC) Plot Compare observed symptom pattern frequencies with what the model predicts for a new sample of data from the same population. Symptom patterns: • 000000000: no reported symptoms • 000000001: dysphoria only • 111111111: all symptoms reported. $2^9 = 512$ possible patterns
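
The observed side of the PFC plot is a frequency table over all 512 patterns. A minimal sketch, assuming (from the examples in this talk, where 000000001 is dysphoria only and 001000001 is restlessness plus dysphoria) that positions 1-8 are DSM-III symptom groups 1-8 and position 9 is dysphoria; the names are hypothetical:

```python
from collections import Counter
from itertools import product

# All 2^9 = 512 patterns as 9-character strings, from 000000000 to 111111111.
ALL_PATTERNS = ["".join(bits) for bits in product("01", repeat=9)]


def pattern_counts(Y):
    """Observed frequency of every possible pattern (zero if never seen).

    Y: iterable of length-9 rows of 0/1 symptom indicators.
    """
    observed = Counter("".join(str(int(v)) for v in row) for row in Y)
    return {pat: observed.get(pat, 0) for pat in ALL_PATTERNS}
```

Keeping the zero counts matters: a model can also fail by predicting many individuals in patterns that were never observed.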

Example: Pattern 001000001: • restlessness/retarded movement • dysphoria. We observed 24 individuals with this symptom pattern:

Example: What is a 95% confidence interval for this frequency? Non-parametric (saturated model) estimate:
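
The slide does not say which interval was used for the saturated estimate. One simple option, shown here only as an assumption, treats the count of 24 among 2938 individuals as binomial and uses the normal approximation (the function name is hypothetical):

```python
import math


def saturated_ci(count, n, z=1.96):
    """Normal-approximation 95% CI for a pattern count under a saturated model."""
    prop = count / n
    se = math.sqrt(n * prop * (1.0 - prop))  # binomial standard deviation of the count
    return max(0.0, count - z * se), count + z * se


lo, hi = saturated_ci(24, 2938)  # roughly (14.4, 33.6)
```

For rare patterns with very small counts, an exact or bootstrap interval would be a safer choice than the normal approximation.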

Model-Based Estimation Predicted frequency of pattern 001000001 and its prediction interval (2.5% and 97.5% limits) in the 3-class model
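
For fixed parameter values, the number of individuals showing a given pattern in a new sample of size $N$ is Binomial$(N, P(\text{pattern}))$. Repeating this for every posterior draw gives a posterior predictive distribution, whose 2.5% and 97.5% percentiles form the prediction interval. A minimal sketch, assuming posterior draws of $\pi$ and $p$ are available as arrays (for example from an MCMC sampler); the function name is hypothetical:

```python
import numpy as np


def predicted_frequency_interval(pattern, pi_draws, p_draws, N, seed=0):
    """Posterior predictive frequency of one symptom pattern in a new sample of size N.

    pattern:  string of 0s and 1s, e.g. "001000001"
    pi_draws: T x M posterior draws of pi; p_draws: T x M x J draws of p.
    Returns (mean, 2.5th percentile, 97.5th percentile).
    """
    rng = np.random.default_rng(seed)
    y = np.array([int(c) for c in pattern])
    freqs = np.empty(len(pi_draws))
    for t, (pi, p) in enumerate(zip(pi_draws, p_draws)):
        # Pattern probability under draw t (conditional independence within class).
        prob = float(pi @ np.prod(np.where(y == 1, p, 1.0 - p), axis=1))
        # Each of the N new individuals shows the pattern independently.
        freqs[t] = rng.binomial(N, min(max(prob, 0.0), 1.0))
    lo, hi = np.percentile(freqs, [2.5, 97.5])
    return float(freqs.mean()), float(lo), float(hi)
```

With $N = 2938$, this interval can be set against the observed count of 24 for pattern 001000001. The PFC plot repeats the comparison across all patterns.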

Model-Based Estimation Comparison of the model-based prediction interval (2.5% and 97.5% limits) with the empirical confidence interval and the observed frequency

Predicted Frequency Check Plot

Latent Class Estimability Display (LCED) Is there enough data to estimate all of the parameters in the model? • 2-class model: 19 parameters • 3-class model: 29 parameters • 4-class model: 39 parameters Problems arise with: • a small data set • a small class size, e.g. N = 1000 and a class proportion of 0.01 leaves about 10 individuals in the class from which to estimate its symptom prevalences • both a small data set and a small class size
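
The parameter counts above follow from $M$ classes and $J = 9$ binary items (8 symptom groups plus dysphoria). Each class has 9 prevalences $p_{mj}$, and the $M$ proportions $\pi_m$ contribute $M - 1$ free values because they sum to 1. A small check, with hypothetical function names:

```python
def n_parameters(M, J=9):
    """Free parameters in an M-class model with J binary symptoms."""
    # M * J symptom prevalences p_mj, plus M - 1 class proportions
    # (the M proportions pi_m sum to 1, so one is determined by the rest).
    return M * J + (M - 1)


def expected_class_size(N, pi_m):
    """Expected number of individuals in a class with proportion pi_m."""
    return N * pi_m


for M in (2, 3, 4):
    print(M, n_parameters(M))  # 19, 29 and 39 parameters
```

For example, `expected_class_size(1000, 0.01)` is 10: those 10 individuals are all the data available for estimating that class’s 9 prevalences.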

Weak “Identifiability” (Weak Estimability) Definition: A parameter in a (Bayesian) model is weakly identified if its posterior distribution is approximately the same as its prior, e.g. $P(\pi_1) \approx P(\pi_1 \mid Y)$. A weakly identified model is still “valid”, but we cannot make inferences from the data about the weakly identified parameters.
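
The definition says “approximately the same” without giving a numerical criterion. One possible summary, offered as an assumption rather than the method used in this talk, is the overlap coefficient $\int \min(f_{\text{prior}}, f_{\text{post}})$ between prior and posterior densities, estimated here from samples with shared histogram bins. Values near 1 suggest weak identification. The function name, bin count and support are hypothetical choices.

```python
import numpy as np


def prior_posterior_overlap(prior_draws, posterior_draws, bins=50, support=(0.0, 1.0)):
    """Overlap coefficient of two densities, estimated with shared histogram bins.

    Returns a value in [0, 1]: near 1 means the posterior looks like the prior
    (the data say little about the parameter); near 0 means they barely overlap.
    """
    edges = np.linspace(support[0], support[1], bins + 1)
    f_prior, _ = np.histogram(prior_draws, bins=edges, density=True)
    f_post, _ = np.histogram(posterior_draws, bins=edges, density=True)
    return float(np.sum(np.minimum(f_prior, f_post) * np.diff(edges)))
```

For a class proportion $\pi_1$ under a uniform Dirichlet$(1, \ldots, 1)$ prior on $M$ classes, the prior draws can be taken from Beta$(1, M - 1)$, and the posterior draws from the MCMC output.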

Examples

Latent Class Estimability Display

Interpretation Depression appears to be ‘dimensional’: none, mild, severe. • 2% of the population is in the severe class. • 14% are in the mild class: are they depressed or not? • How does this compare with the DSM-III definition?

Work Not Included in Talk • MCMC Algorithm • Log Odds Ratio Check Plot • Predicted Class Assignment Display • Extensions to Regression

Revisions Already Implemented • New example for Chapter 5 (LCRR) • Background/justification of the latent class model as a “gold standard” in validation • S-Plus programs: on the website, with a “user’s guide”
