Latent Class Regression Model Graphical Diagnostics Using an MCMC Estimation Procedure
Elizabeth S. Garrett, Scott L. Zeger
Johns Hopkins University
esg@jhu.edu
Overview
• Latent class models can be useful tools for measuring latent constructs.
• Checking a latent class model is complicated because class membership is unobserved, so we cannot assess fit with standard approaches that compare fitted values to observed values.
• After fitting a latent class regression model, how can we check whether its key assumptions hold?
  • Conditional independence?
  • Non-differential measurement?
Epidemiologic Catchment Area (ECA) Study
N = 1126 in 1993 in Baltimore
Symptoms (DSM-IV):
• dysphoria
• weight/appetite change
• sleep problems
• slow/increased movement
• loss of interest/pleasure
• fatigue
• guilt
• concentration problems
• thoughts of death
Covariates of interest:
• gender
• age
• marital status
• education
• income
From the standard LC model fit:
• The symptoms listed above define depression.
• Depression is a latent class variable with 3 classes.
• Classes are “ordered”: None, Mild, Severe.
Questions:
• What is the association between depression and socio-economic status?
• How are education and income associated with depression?
Latent Class Regression Model: Main Ideas
• There are J classes of individuals. p_j represents the proportion of individuals in the population in class j (j = 1, …, J).
• Each person is a member of one of the J classes, but we do not know which. The latent class of individual i is denoted by c_i.
• Symptom prevalences vary by class. The prevalence of symptom m in class j is denoted by π_mj.
• We assume that covariates, x, are associated with class membership.
• CONDITIONAL INDEPENDENCE: given class membership, the symptoms are independent of each other.
• NON-DIFFERENTIAL MEASUREMENT: given class membership, the symptoms are independent of the covariates.
Latent Class Regression Likelihood
[Equation: latent class regression likelihood]
Assumptions:
• Conditional independence: given an individual’s depression class, his/her symptoms are independent.
  P(y_ig, y_ih | c_i) = P(y_ig | c_i) P(y_ih | c_i)
• Non-differential measurement: given an individual’s depression class, covariates are not associated with symptoms.
  P(y_ig | x_i, c_i) = P(y_ig | c_i)
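The likelihood named in this slide's title can be sketched in a standard form for M binary symptoms. Here p_j(x_i) is the probability that individual i belongs to class j and π_mj is the prevalence of symptom m in class j. The multinomial-logit link and the coefficient vectors β_j (with β_1 = 0 for the reference class) are assumptions of this sketch, not taken from the slides.

```latex
L(Y \mid \beta, \pi)
  = \prod_{i=1}^{N} \sum_{j=1}^{J} p_j(x_i)
    \prod_{m=1}^{M} \pi_{mj}^{\,y_{im}} (1 - \pi_{mj})^{1 - y_{im}},
\qquad
p_j(x_i) = \frac{\exp(x_i^{\top} \beta_j)}{\sum_{k=1}^{J} \exp(x_i^{\top} \beta_k)},
\quad \beta_1 = 0 .
```

Both assumptions are visible in this form: the product over m encodes conditional independence, and the absence of x_i from π_mj encodes non-differential measurement.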
Depression Example: LCR coefficients (log ORs)
* indicates significant at the 0.10 level
Note: class 1 is non-depressed, class 2 is mild, class 3 is severe.
Checking Conditional Independence Assumption
• For each pair of symptoms (g and h), in each class (j), consider the odds ratio (OR) between the two symptoms among individuals in class j.
• If the assumption holds, this OR will be approximately equal to 1 (the log OR will be approximately equal to 0).
• Why may this get tricky?
  • We don’t KNOW class assignments.
  • We need a strategy for assigning individuals to classes.

Checking Non-differential Measurement Assumption
• For each symptom (h), covariate (x), and class (j) combination, we can estimate an odds ratio. In the binary covariate case, this is the OR between symptom h and x among individuals in class j.
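Both checks reduce to a log odds ratio from a 2x2 table restricted to one class. The following is a minimal sketch that assumes class labels are already available. The function names and the 0.5 cell correction are illustrative choices, not part of the slides.

```python
import numpy as np

def log_odds_ratio(a, b, correction=0.5):
    """Log odds ratio between two binary (0/1) vectors.

    `correction` is added to every cell of the 2x2 table so the estimate
    stays finite when a cell is empty (an assumption of this sketch).
    """
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    n11 = np.sum(a & b) + correction
    n10 = np.sum(a & ~b) + correction
    n01 = np.sum(~a & b) + correction
    n00 = np.sum(~a & ~b) + correction
    return float(np.log(n11 * n00 / (n10 * n01)))

def within_class_log_or(u, v, classes, j, correction=0.5):
    """Log OR between u and v among individuals assigned to class j.

    For conditional independence, u and v are two symptoms.
    For non-differential measurement, u is a symptom and v a binary covariate.
    """
    in_class = np.asarray(classes) == j
    return log_odds_ratio(np.asarray(u)[in_class],
                          np.asarray(v)[in_class],
                          correction)
```

A value near 0 within every class is consistent with the assumption. Values far from 0 suggest a violation for that pair in that class.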
Model Estimation: Markov Chain Monte Carlo procedure
• Bayesian approach
  • Quantify beliefs about p (class proportions), π (symptom prevalences), and c (class memberships) before and after observing data.
  • Prior probability: what we believe about the unknown parameters before observing data.
  • Posterior probability: what we believe about the parameters after observing data.
• Model specifications:
  • Specify the prior probability distribution: P(p, π, c)
  • Combine the prior with the likelihood to obtain the posterior distribution: P(p, π, c | Y) ∝ P(p, π, c) × L(Y | p, π, c)
  • Estimate the posterior distribution of each parameter using an iterative procedure, e.g. P(p_1 | Y) = ∫ P(p, π, c | Y), integrating over all unknowns other than p_1.
Bayesian Estimation Approach
The Gibbs sampler is an iterative process used to estimate posterior distributions of parameters.
• We sample each parameter from its conditional distribution given the data and the current values of all other unknowns, e.g. P(p_1 | Y, p_(-1), c, π), where p_(-1) denotes the remaining class proportions, c the class memberships, and π the symptom prevalences.
• At each iteration, we get “sampled” values of p, π, and c.
• We use the samples from the iterations to estimate posterior distributions by averaging over the other parameter values. This is a key feature of these methods!
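The sampling cycle described above can be sketched for a simplified latent class model with binary symptoms and no covariates. The regression part is omitted here. The Dirichlet(1, …, 1) prior on p, the Beta(1, 1) priors on the prevalences, and the function name are assumptions of this sketch rather than the authors' specification.

```python
import numpy as np

def gibbs_latent_class(Y, J, n_iter=500, seed=0):
    """Gibbs sampler for a latent class model with binary items, no covariates.

    Assumed priors (not from the slides): p ~ Dirichlet(1, ..., 1),
    pi[m, j] ~ Beta(1, 1). Returns per-iteration draws of p, pi and c.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    N, M = Y.shape
    c = rng.integers(J, size=N)                  # random initial class labels
    p_draws = np.empty((n_iter, J))
    pi_draws = np.empty((n_iter, M, J))
    c_draws = np.empty((n_iter, N), dtype=int)
    for t in range(n_iter):
        # 1. p | c ~ Dirichlet(1 + class counts)
        counts = np.bincount(c, minlength=J)
        p = rng.dirichlet(1.0 + counts)
        # 2. pi[m, j] | Y, c ~ Beta(1 + positives, 1 + negatives) within class j
        pi = np.empty((M, J))
        for j in range(J):
            pos = Y[c == j].sum(axis=0)
            pi[:, j] = rng.beta(1.0 + pos, 1.0 + counts[j] - pos)
        pi = np.clip(pi, 1e-10, 1 - 1e-10)       # guard the logs below
        # 3. c_i | Y, p, pi: probability proportional to
        #    p_j * prod_m pi_mj^y_im * (1 - pi_mj)^(1 - y_im)
        log_w = np.log(p) + Y @ np.log(pi) + (1 - Y) @ np.log1p(-pi)
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        cum = np.cumsum(w / w.sum(axis=1, keepdims=True), axis=1)
        # inverse-CDF draw of one class label per individual
        c = np.minimum((cum < rng.random((N, 1))).sum(axis=1), J - 1)
        p_draws[t], pi_draws[t], c_draws[t] = p, pi, c
    return p_draws, pi_draws, c_draws
```

Each row of `c_draws` is one complete assignment of individuals to classes, which is what the diagnostic checks need. Class labels are not identified in this sketch, so labels can switch between iterations and would need to be handled before summarizing by class.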
Checking Assumptions: MCMC (Bayesian) approach
• At each iteration of the Gibbs sampler, individuals are automatically assigned to classes, so there is no need to assign them “manually”.
• At each iteration, simply calculate the log ORs of interest.
• Then “marginalize”, or average, over all iterations.
• The result is the posterior distribution of the log OR.
• From the posterior distribution, we have both a point estimate and a precision estimate of the log OR.
• We can calculate “posterior intervals” (similar to confidence intervals) to see if there is evidence of violation of the assumptions.
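The steps above can be sketched as a small post-processing function. It assumes an array of sampled class labels with one row per Gibbs iteration. The function name, the 0.5 cell correction, and the equal-tailed 95% interval are illustrative assumptions.

```python
import numpy as np

def posterior_log_or(y_g, y_h, c_draws, j, burn_in=0, level=0.95):
    """Posterior summary of the class-j log OR between two binary symptoms.

    c_draws has shape (n_iter, N): one row of sampled class labels per
    iteration. A 0.5 cell correction (an assumption of this sketch) keeps
    the log OR finite when a 2x2 cell is empty.
    """
    y_g = np.asarray(y_g).astype(bool)
    y_h = np.asarray(y_h).astype(bool)
    draws = []
    for c in np.asarray(c_draws)[burn_in:]:
        k = c == j                                # members of class j this iteration
        n11 = np.sum(y_g & y_h & k) + 0.5
        n10 = np.sum(y_g & ~y_h & k) + 0.5
        n01 = np.sum(~y_g & y_h & k) + 0.5
        n00 = np.sum(~y_g & ~y_h & k) + 0.5
        draws.append(np.log(n11 * n00 / (n10 * n01)))
    draws = np.array(draws)
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return {"mean": float(draws.mean()),
            "interval": (float(lo), float(hi)),
            "draws": draws}
```

An interval that excludes 0 for some symptom pair and class would be read as evidence against conditional independence for that pair. Passing a binary covariate in place of `y_h` gives the corresponding check for non-differential measurement.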
Implementation
• “Canned” implementation:
  • BUGS (Unix and Linux)
  • WinBUGS (Windows)
  • http://astor.som.jhmi.edu/~esg/software
• Scripts can be (and have been) written in:
  • R, S-Plus
  • SAS
Checking Assumptions: Maximum Likelihood Approach
Using the ML approach, we can get a result that will likely be quite similar:
• (a) Assign individuals to “pseudo-classes” based on their posterior probabilities of class membership.
• (b) Calculate ORs within classes.
• (c) Repeat (a) and (b) at least a few times.
• (d) Compare the ORs to 1.
Drawbacks:
• In ML, additional post hoc computations are necessary.
• We don’t get precision estimates as we do in the MCMC approach.
• By contrast, the MCMC approach is designed for computing posterior distributions of functions of parameters.
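Steps (a) through (d) can be sketched as follows, reading “assign” as a random draw of each individual's class from his or her posterior membership probabilities. That reading, the function name, the default of 20 repetitions, and the 0.5 cell correction are all assumptions of this sketch.

```python
import numpy as np

def pseudo_class_log_ors(y_g, y_h, post_prob, j, n_rep=20, seed=0):
    """Log OR between two symptoms in class j over repeated pseudo-class draws.

    post_prob has shape (N, J): row i holds individual i's posterior
    probabilities of class membership from the fitted ML model.
    """
    rng = np.random.default_rng(seed)
    y_g = np.asarray(y_g).astype(bool)
    y_h = np.asarray(y_h).astype(bool)
    post_prob = np.asarray(post_prob, dtype=float)
    N, J = post_prob.shape
    cum = post_prob.cumsum(axis=1)
    out = np.empty(n_rep)
    for r in range(n_rep):
        # (a) draw one pseudo-class per individual (inverse-CDF sampling)
        c = np.minimum((cum < rng.random((N, 1))).sum(axis=1), J - 1)
        k = c == j
        # (b) 2x2 table within class j, with a 0.5 correction per cell
        n11 = np.sum(y_g & y_h & k) + 0.5
        n10 = np.sum(y_g & ~y_h & k) + 0.5
        n01 = np.sum(~y_g & y_h & k) + 0.5
        n00 = np.sum(~y_g & ~y_h & k) + 0.5
        out[r] = np.log(n11 * n00 / (n10 * n01))
    # (c) the loop repeats (a) and (b); (d) compare the returned values to 0
    return out
```

The spread of the returned values reflects only the randomness of the pseudo-class draws, not the uncertainty in the fitted parameters, which is consistent with the drawback noted above.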