
Latent Class Analysis in SAS  : Promise, Problems, and Programming

This paper explores the use of Latent Class Analysis (LCA) in classifying patients in clinical decision-making, discussing its benefits, limitations, and the programming approaches in SAS. It also discusses strategies for estimating latent class parameters and producing standard errors.



  1. Latent Class Analysis in SAS:Promise, Problems, and Programming David M. Thompson Department of Biostatistics and Epidemiology College of Public Health, OUHSC

  2. Latent class analysis (LCA) • LCA validates classification in the absence of a gold standard for decision-making. • Incorporation into SAS is recent. Invited paper 192-2007

  3. LCA and Patient Classification Patient classification is part of many clinical decisions. • Diagnosis • Prognosis

  4. Patient classification in the absence of a gold standard Diagnosis • Diagnostic categories may be emerging or unclear. Prognosis • predicting rehabilitation outcomes • counseling patients and families regarding expectations

  5. Outline • LCA defined • SAS approaches to LCA • Producing standard errors • Curing the problem of fracturing of estimates • Limitations of LCA

  6. Latent class analysis (LCA) • LCA is a parallel to factor analysis, but for categorical responses. • Like factor analysis, LCA addresses the complex pattern of association that appears among observations…

  7. …and attributes the pattern to a set of latent (underlying, unobserved) factors or classes.

  8. A complex pattern of responses emerged when undergraduates made ethical decisions in response to four stimulus scenarios. Stouffer, S.A., & Toby, J. (1951). Role conflict and personality. American Journal of Sociology, 56, 395-406.

  9. LCA predicts latent class membership such that, within each latent class, the observed responses are independent.

  10. LCA estimates: • latent class prevalences • conditional probabilities: the probability of a specific response, given class membership, e.g. P(A-P acc | LC 1), P(St.Mkt.Info | LC 2)

  11. Conditional probabilities are analogous to sensitivities and specificities, but are calculated in the absence of a gold standard. P(A-P acc | LC 1) P(St.Mkt.Info | LC 2)

  12. LC parameter estimates for Stouffer and Toby data

  13. Indicators’ informativeness is defined by differences in conditional probabilities.

  14. LCA works on an unconditional contingency table (a table with no information on LC membership).

  15. LCA’s goal is to produce a complete (conditional) table that assigns counts for each latent class:

  16. Assumptions of LCA • Exhaustiveness: π_ijkl(ABCD) = Σ_t π_ijklt(ABCDX) • Conditional (Local) Independence: π_ijklt(ABCDX) = π_ijkl|t(ABCD|X) π_t(X) = π_i|t(A|X) π_j|t(B|X) π_k|t(C|X) π_l|t(D|X) π_t(X) (Goodman’s probabilistic parameterization of an LC model with 4 observed variables)
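To make the local independence assumption concrete, here is a minimal sketch (in Python rather than SAS, purely for illustration; the prevalences and conditional probabilities are invented) that builds cell probabilities for a hypothetical 2-class model with four binary indicators:

```python
from itertools import product

import numpy as np

# Hypothetical parameters: rows are latent classes X=1, X=2;
# columns are P(indicator = 1 | class) for indicators A-D.
cond_p = np.array([[0.90, 0.80, 0.85, 0.70],   # class X=1
                   [0.20, 0.30, 0.25, 0.10]])  # class X=2
prev = np.array([0.6, 0.4])                    # latent class prevalences P(X=t)

def cell_prob(profile, prev, cond_p):
    """Cell probability under exhaustiveness + local independence:
    pi_ijkl = sum_t P(X=t) * prod_m P(indicator m = profile[m] | X=t)."""
    p_resp = np.where(np.array(profile) == 1, cond_p, 1 - cond_p)
    return float(np.sum(prev * p_resp.prod(axis=1)))

# Probabilities over all 16 response profiles must sum to 1.
total = sum(cell_prob(p, prev, cond_p) for p in product([0, 1], repeat=4))
```

Because each class's response probabilities are multiplied independently, the 16 cell probabilities exhaust the sample space, which is what the exhaustiveness assumption asserts.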

  17. ML approach to LC estimation • The probability of obtaining the observed count n_ijklt for response profile {i,j,k,l} in latent class t is (π_ijklt(ABCDX))^n_ijklt. • The likelihood of obtaining a set of observed counts over the response profiles is L = Π_i Π_j Π_k Π_l Π_t (π_ijklt(ABCDX))^n_ijklt, so that log L = Σ_i Σ_j Σ_k Σ_l Σ_t n_ijklt ln π_ijklt(ABCDX).
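Once cell probabilities and counts are in hand, the log-likelihood is a single weighted sum; a small Python sketch (the probabilities and counts here are randomly generated stand-ins, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a 4-indicator model's 16 response profiles:
# a vector of model cell probabilities and hypothetical observed counts.
pi = rng.dirichlet(np.ones(16))      # model cell probabilities (sum to 1)
counts = rng.multinomial(100, pi)    # hypothetical observed counts, n = 100

# log L = sum over profiles of n * ln(pi)
loglik = float(np.sum(counts * np.log(pi)))
```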

  18. ML approach to LC estimation • Because LC membership (X=t) is unobserved, the likelihood function and likelihood surface are complex.

  19. EM algorithm calculates L when some data (X) are unobserved. The “E” step uses parameter estimates to update expected values for the cell counts n_ijklt in the complete contingency table. The “M” step produces ML estimates from the complete table.

  20. EM algorithm requires initial estimates. The 1st “E” step provides initial estimates to “fill in” the missing information on LC membership. The “E” and “M” step functions are achieved in SAS/IML or conventional DATA steps.

  21. EM algorithm instituted using SAS/IML or conventional DATA steps. The 1st “E” step randomly assigns each response profile to one latent class.
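The E/M loop the slides describe can be sketched end to end. The following toy version (Python rather than SAS, with invented counts, two classes, and four binary indicators; variable names are hypothetical) alternates the two steps starting from random initial estimates:

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: counts for the 16 profiles of four binary indicators.
profiles = np.array(list(product([0, 1], repeat=4)))        # shape (16, 4)
counts = rng.integers(1, 30, size=16).astype(float)

T = 2  # number of latent classes

# Initial estimates (the 1st "E" step): random starting values.
prev = np.full(T, 1.0 / T)                    # class prevalences P(X=t)
cond = rng.uniform(0.2, 0.8, size=(T, 4))     # P(indicator = 1 | class)

for _ in range(200):
    # E step: posterior P(X=t | profile) from current parameters.
    p_resp = np.where(profiles[:, None, :] == 1, cond, 1 - cond)  # (16, T, 4)
    joint = prev * p_resp.prod(axis=2)                            # (16, T)
    post = joint / joint.sum(axis=1, keepdims=True)

    # M step: ML estimates from the "completed" table counts * posterior.
    weighted = counts[:, None] * post                             # (16, T)
    prev = weighted.sum(axis=0) / counts.sum()
    cond = (weighted[:, :, None] * profiles[:, None, :]).sum(axis=0) \
           / weighted.sum(axis=0)[:, None]
```

The E step fills in the unobserved class membership as fractional expected counts; the M step then treats the completed table as if fully observed, exactly as the slides describe.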

  22. Alternative approach using SAS PROC CATMOD moon.ouhsc.edu/dthompso/latent%20variable%20research/lvr.htm • “M” step: PROC CATMOD • 1st “E” step: a SAS DATA step randomly assigns each response profile to one latent class • “E” step: SAS DATA step

  23. Other approaches • PROC LCA, Methodology Center of Penn State University methcenter.psu.edu/lca/ • LC regression macros K. Bandeen-Roche, Johns Hopkins

  24. EM algorithm does not produce standard errors. Strategies include: • converting CATMOD’s loglinear parameter SE into probabilities • bootstrapping SE • obtaining SE from multiple solutions

  25. Strategy 1: Convert SE obtained from CATMOD’s loglinear model

  26. Loglinear SE are convertible to probabilities (after Heinen, 1996). But probabilities are complex nonlinear functions of their loglinear counterparts: • latent class prevalences: P(X=t) = exp(λ_t(X)) / Σ_x exp(λ_x(X)) • conditional probabilities: P(A=i | X=t) = P(AX) / P(X) = exp(λ_i(A) + λ_it(AX)) / Σ_a exp(λ_a(A) + λ_at(AX))
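The point-estimate side of this conversion is a softmax over the loglinear parameters; a Python sketch with invented λ values (converting the SE themselves would additionally require the delta method, which is not shown here):

```python
import numpy as np

# Hypothetical loglinear parameters lambda_t(X) for a 2-class model.
lam_x = np.array([0.35, -0.35])

# Latent class prevalences: P(X=t) = exp(lam_t) / sum_x exp(lam_x)
prev = np.exp(lam_x) / np.exp(lam_x).sum()

# Conditional probabilities for a binary indicator A given X=t:
# P(A=i | X=t) = exp(lam_i(A) + lam_it(AX)) / sum_a exp(lam_a(A) + lam_at(AX))
lam_a = np.array([0.5, -0.5])                  # lam_i(A), levels i = 1, 2
lam_ax = np.array([[0.8, -0.8],                # lam_it(AX): rows i, columns t
                   [-0.8, 0.8]])

num = np.exp(lam_a[:, None] + lam_ax)          # (levels of A, classes)
cond = num / num.sum(axis=0)                   # columns sum to 1
```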

  27. Strategy 2: Bootstrap parameter estimates and SE • Generate an initial LCA solution and use its parameter estimates to generate a complete (conditional) contingency table. • From the complete table, generate B bootstrapped unconditional tables. • Perform LCA on each table, producing B sets of parameter estimates. • The mean and SD of these constitute, respectively, the parameter estimates and SE.

  28. Bootstrapping • Creating multiple samples by resampling repeatedly from the original sample • Bootstrapped samples are typically chosen randomly, with replacement, so that n equals that of the original sample • The statistical operation is repeated on each bootstrapped sample.

  29. Efficient bootstrapping code (after Barker, 2005)

data boot;
  do bootsamp=1 to 100;
    do i=1 to nobs;
      /* ceil, not round: round(ranuni(0)*nobs) can yield pick=0,
         an invalid value for point= */
      pick=ceil(ranuni(0)*nobs);
      set original nobs=nobs point=pick;
      output;
    end;
  end;
  stop;
run;

  30. Bootstrapped estimates

  31. Strategy 3: Generate multiple solutions from different starting values

  32. Estimates and SE from multiple solutions, each from a different initial assignment of response profiles

  33. Estimates of the conditional probability P(A=1|X=1) from multiple solutions; P(A=1|X=1) from bootstrapped estimates. The repeated solutions approach may be more useful than bootstrapping because it explicitly accounts for LCA’s sensitivity to initial estimates.

  34. Multiple solutions and bootstrapping approaches are useful, but present a new challenge: “fracturing” of the distributions of LC estimates. Above: distribution of multiple estimates of conditional probability P(A=1|X=1). Below: P(A=1|X=2).

  35. What fractures the distributions?

  36. What fractures the distributions? Latent classes have no intrinsic meaning. Identification of LC membership is flexible. LCA can attribute a vector of parameter estimates to LC X=1 for one solution, and to LC X=2 for the next.

  37. How to resolve fracturing Simulation studies confirm that vectors of parameter estimates are individually coherent. Consistent assignment of vectors to the appropriate latent classes should cure fracturing. What rule leads to consistent assignment?

  38. Rule: Reflect all estimates in a vector into the half-plane most heavily populated by the conditional probabilities of the most informative indicator. In this example, D is the most informative indicator, so estimates for every parameter are reflected into indicator D’s more heavily populated (upper left) half-plane.
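In code, this rule amounts to relabeling the classes of each solution so that the most informative indicator's conditional probability always falls on the same side. A Python sketch with two invented 2-class solutions that exhibit label switching (`align_solution` and `key_indicator` are hypothetical names, not from the paper):

```python
import numpy as np

def align_solution(cond, prev, key_indicator):
    """Relabel the two latent classes of one solution so that the most
    informative indicator's conditional probability is higher in class 1.
    cond: (2, n_indicators) array of P(indicator=1 | class); prev: (2,)."""
    if cond[0, key_indicator] < cond[1, key_indicator]:
        cond = cond[::-1]   # swap class labels for every parameter at once
        prev = prev[::-1]
    return cond, prev

# Two hypothetical solutions whose class labels switched between runs.
sol_1 = (np.array([[0.90, 0.70, 0.80, 0.95],
                   [0.20, 0.40, 0.30, 0.10]]), np.array([0.6, 0.4]))
sol_2 = (np.array([[0.20, 0.40, 0.30, 0.12],
                   [0.90, 0.70, 0.80, 0.94]]), np.array([0.4, 0.6]))

# Indicator D (index 3) plays the role of the most informative indicator.
aligned = [align_solution(c.copy(), p.copy(), key_indicator=3)
           for c, p in (sol_1, sol_2)]
```

Because the whole vector of estimates is swapped together, each solution stays internally coherent; only the arbitrary class labels change.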

  39. Distribution of estimates after reflection P(A=1|X=1) P(A=1|X=2)

  40. With the fracturing problem solved, the multiple solutions approach is an attractive strategy to overcome the EM algorithm’s inability to produce standard errors.

  41. Limitations of LCA • Sample size must support detection of weak latent structures, those with: rare latent class(es), uninformative indicators

  42. Limitations of LCA • Fit statistics primarily assess conditional independence and so don’t alert the analyst when LCA is struggling to characterize weak latent structure.

  43. Limitations of LCA • Violations of the assumption of conditional independence • conditional (or residual) dependence

  44. Conditional dependence • leads to poor estimation: overestimation of the informativeness of both correlated indicators; overestimation of the prevalence of the other LCs • leads to poor model fit: the analyst may respond by positing additional latent classes, which complicates interpretation; the model’s applicability is limited when modifications increasingly capitalize on the data’s idiosyncrasies.

  45. Assessing conditional dependence Z scores compare observed log odds ratios for pairs of indicators with those expected under conditional independence (Garrett & Zeger, 2000)

  Pair of        ____________Log odds____________
  indicators     Expected   Observed      ASE          z
  a b              0.2993     0.7270    0.3557     1.2024
  a c              0.3630     0.7953    0.3557     1.2154
  a d              0.7847     0.5312    0.4796    -0.5285
  b c              0.6534     0.5586    0.2760    -0.3435
  b d              1.2871     1.3876    0.3430     0.2929
  c d              1.6395     1.6994    0.3685     0.1626

  Large z scores arouse suspicion that pairs of indicators are conditionally dependent.
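The z score for one pair of indicators can be sketched as follows (Python; the observed 2x2 table is invented, and for simplicity the ASE here is the usual asymptotic SE of a log odds ratio computed from the observed table, whereas the slide's ASE comes from the fitted model):

```python
import numpy as np

def log_odds_ratio(table):
    """Log odds ratio from a 2x2 table of counts (0.5 continuity correction)."""
    t = table + 0.5
    return float(np.log(t[0, 0] * t[1, 1] / (t[0, 1] * t[1, 0])))

def lor_ase(table):
    """Asymptotic SE of the log odds ratio: sqrt of summed reciprocal counts."""
    return float(np.sqrt((1.0 / (table + 0.5)).sum()))

# Hypothetical observed 2x2 table for indicators a and b.
observed = np.array([[40.0, 20.0],
                     [15.0, 45.0]])

# Expected log odds under conditional independence (value echoed from the
# slide's a-b row, here standing in for a model-based quantity).
expected_lor = 0.2993

z = (log_odds_ratio(observed) - expected_lor) / lor_ase(observed)
```

With this invented table the observed association is much stronger than the model expects, so z is large and the pair would be flagged as conditionally dependent.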

  46. Accounting for conditional dependence • Pairwise conditional dependence can be incorporated into a revised model. • Patterns of dependence and independence are flexibly expressed in both LCA parameterizations: Probabilistic (Goodman): π_ijklt(ABCDX) = π_i|t(A|X) π_j|t(B|X) π_k|t(C|X) π_l|t(D|X) π_t(X) Loglinear (Haberman): ln π_ijklt(ABCDX) = λ + λ_i(A) + λ_j(B) + λ_k(C) + λ_l(D) + λ_t(X) + λ_it(AX) + λ_jt(BX) + λ_kt(CX) + λ_lt(DX)

  47. Accounting for conditional dependence • Take advantage of CATMOD’s loglinear modeling capabilities in the M step. • The standard M step that assumes conditional independence:

ods output estimates=mu;
proc catmod order=data;
  weight count;
  model a*b*c*d*x=_response_ / wls addcell=.1;
  loglin a b c d x a*x b*x c*x d*x;
run; quit;
ods output close;

  48. Accounting for conditional dependence • Modifying the CATMOD M step to model conditional dependence between indicators B and C:

ods output estimates=mu;
proc catmod order=data;
  weight count;
  model a*b*c*d*x=_response_ / wls addcell=.1;
  loglin a b c d x a*x b*x c*x d*x b*c b*c*x;
run; quit;
ods output close;

  49. Concluding remarks • LCA is a potentially valuable tool in clinical epidemiology for clarifying ill-defined diagnostic and prognostic classifications. • Recent work brings LCA into SAS’ analytic framework.

  50. In any approach to LCA, sensitivity to initial estimates requires caution • Employ repeated solutions from different initial estimates • The E-M loop should iterate between 3 and 40 times • Probe the assumption of conditional independence • At least four indicators are needed • An expanded model can account for dependence
