Analyzing Survey Error with Latent Class Models
Paul Biemer, RTI International and University of North Carolina
March 18, 2005
What is Latent Class Analysis?
• A special case of log-linear analysis with latent variables
• Latent variables are constructs that are measured imperfectly by indicator variables
• Traditional LCA assumes local independence
  • i.e., P(A and B | X) = P(A | X) P(B | X) for latent variable X and indicators A and B
• LCA models contain
  • A structural component: describes the relationships among the latent variables and covariates
  • A measurement component: describes the relationships among the indicators, latent variables, and covariates
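To make local independence concrete, here is a minimal Python sketch with made-up parameter values (the dictionaries `pi_x` and `p_yes` are hypothetical, not survey estimates). It builds the A×B×C cell probabilities as a mixture over the two classes of X and shows that A and B are still associated in the population even though they are independent within each class: the association is carried entirely by X.

```python
from itertools import product

# Hypothetical two-class model (illustrative numbers only):
# X = 1 ("use") or 2 ("no use"); pi_x[x] = P(X = x)
pi_x = {1: 0.10, 2: 0.90}
# p_yes[ind][x] = P(indicator = "yes" | X = x) for indicators A, B, C
p_yes = {
    "A": {1: 0.90, 2: 0.02},
    "B": {1: 0.80, 2: 0.01},
    "C": {1: 0.95, 2: 0.05},
}

def p_ind(ind, value, x):
    """P(indicator = value | X = x) for a dichotomous indicator."""
    return p_yes[ind][x] if value == "yes" else 1.0 - p_yes[ind][x]

def joint_abc(a, b, c):
    """Local independence: P(a, b, c) = sum_x P(x) P(a|x) P(b|x) P(c|x)."""
    return sum(
        pi_x[x] * p_ind("A", a, x) * p_ind("B", b, x) * p_ind("C", c, x)
        for x in pi_x
    )

cells = {abc: joint_abc(*abc) for abc in product(("yes", "no"), repeat=3)}

# Marginal (population-level) association between A and B
p_a = sum(p for (a, _, _), p in cells.items() if a == "yes")
p_b = sum(p for (_, b, _), p in cells.items() if b == "yes")
p_ab = sum(p for (a, b, _), p in cells.items() if a == "yes" and b == "yes")
print(f"P(A=yes, B=yes) = {p_ab:.5f}, P(A=yes) P(B=yes) = {p_a * p_b:.5f}")
```

With these invented values, P(A=yes, B=yes) is about 0.072 while P(A=yes)P(B=yes) is about 0.0096, so the indicators are strongly associated marginally despite being independent given X.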
Uses of Latent Class Analysis in Survey Research
• Substantive researchers focus on the structural component of the latent class model (LCM)
  • Errors are treated as nuisance parameters
• Survey methods researchers focus on the measurement component
  • Estimate components of total survey error
  • Evaluation of questionnaires and alternative survey designs
  • Population size estimation
  • Compensation for missing data
  • Survey bias adjustment
Objective of LCA for Measurement Error Analysis
• Obtain estimates of classification error for a categorical survey variable
  • e.g., false positive and false negative error rates
• Why are these LCA estimates useful? They help us
  • Quantify the measurement error in the data
  • Identify the correlates of measurement error
  • Trace error to its root causes
  • Eliminate the cause through redesign
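As a reminder of what these error rates are, the Python sketch below computes them from a hypothetical 2×2 table of true status X by observed answer A. In real survey data X is unobserved, which is exactly why LCA is needed to estimate the rates; the counts here are invented for illustration. The sketch also checks how the two rates combine into the bias of the observed prevalence: P(A = yes) − P(X = yes) = (1 − π)θ − πφ, where π = P(X = yes), θ is the false positive rate, and φ is the false negative rate.

```python
def error_rates(counts):
    """Return (false_positive_rate, false_negative_rate).

    counts[(x, a)] is the number of persons with true status x and observed
    answer a, each coded "yes" or "no" (a hypothetical input layout).
    """
    true_yes = counts[("yes", "yes")] + counts[("yes", "no")]
    true_no = counts[("no", "yes")] + counts[("no", "no")]
    false_pos = counts[("no", "yes")] / true_no   # P(A = yes | X = no)
    false_neg = counts[("yes", "no")] / true_yes  # P(A = no | X = yes)
    return false_pos, false_neg

# Invented counts for 1,000 persons, 100 of whom are true users
counts = {("yes", "yes"): 90, ("yes", "no"): 10,
          ("no", "yes"): 18, ("no", "no"): 882}
theta, phi = error_rates(counts)                                   # 0.02, 0.10
n = sum(counts.values())
true_prev = (counts[("yes", "yes")] + counts[("yes", "no")]) / n   # 0.100
obs_prev = (counts[("yes", "yes")] + counts[("no", "yes")]) / n    # 0.108
bias = obs_prev - true_prev

# The bias decomposes into (false positives gained) minus (false negatives lost)
assert abs(bias - ((1 - true_prev) * theta - true_prev * phi)) < 1e-12
```

Even small error rates matter for a rare trait: here a 2% false positive rate applied to the 900 non-users outweighs a 10% false negative rate applied to the 100 users.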
Example: Estimating the Error in Survey Measurements of Marijuana Use
Three Indicators of Marijuana Use
Indicator A: “How long has it been since you last used marijuana or hashish?”
• A = “Yes” if the response indicates use in the last 12 months; A = “No” otherwise
Indicator B: “Now think about the past 12 months from your 12-month reference date through today. On how many days in the past 12 months did you use marijuana or hashish?”
• B = “Yes” if the response is 1 or more days; B = “No” otherwise
Indicator C: a composite variable based on 7 questions, such as
• Used in the last 12 months?
• Spent a great deal of time getting it, using it, or getting over its effects?
• Used the drug much more often or in larger amounts than intended?
C = “Yes” if the response to any question suggests use in the last 12 months; C = “No” otherwise
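The coding rules for A, B, and C can be sketched as small Python functions. The input formats are assumptions made for illustration only: recency as months since last use (None if never used), frequency as a count of days, and the seven composite items as booleans that are True when an answer suggests use in the last 12 months. The survey's actual response categories are not reproduced here.

```python
from typing import Optional, Sequence

def code_a(months_since_last_use: Optional[float]) -> str:
    """Indicator A: "Yes" if the recency answer indicates use in the last 12 months."""
    if months_since_last_use is None:      # assumed encoding for "never used"
        return "No"
    return "Yes" if months_since_last_use <= 12 else "No"

def code_b(days_used_past_12_months: int) -> str:
    """Indicator B: "Yes" if 1 or more days of use in the past 12 months."""
    return "Yes" if days_used_past_12_months >= 1 else "No"

def code_c(item_suggests_use: Sequence[bool]) -> str:
    """Indicator C: "Yes" if any composite item suggests past-12-month use."""
    return "Yes" if any(item_suggests_use) else "No"

# A respondent who reports recent use on the recency item but zero days on the
# frequency item is coded A = "Yes", B = "No": an inconsistency LCA can detect.
print(code_a(3), code_b(0), code_c([True] + [False] * 6))
```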
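Statistical Framework
NOTATION
X = true drug use status, an unknown latent variable (X = 1 if use, 2 if no use)
A, B, and C are three dichotomous indicators of X. Under local independence, the probability of cell (a, b, c) of the A×B×C table is
$$\pi_{abc} = \sum_{x=1}^{2} \pi_x \, \pi_{a|x} \, \pi_{b|x} \, \pi_{c|x}$$
where $\pi_x = P(X = x)$ and $\pi_{a|x} = P(A = a \mid X = x)$, with $\pi_{b|x}$ and $\pi_{c|x}$ defined similarly for B and C.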
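Log-linear Formulation of the Latent Class Model
The latent class model is equivalent to the log-linear model
$$\ln m_{xabc} = u + u^{X}_{x} + u^{A}_{a} + u^{B}_{b} + u^{C}_{c} + u^{AX}_{ax} + u^{BX}_{bx} + u^{CX}_{cx}$$
in which $m_{xabc}$ is the expected count in cell (x, a, b, c) of the X×A×B×C table and the conditional probability $\pi_{a|x} = P(A = a \mid X = x)$ is
$$\pi_{a|x} = \frac{\exp\left(u^{A}_{a} + u^{AX}_{ax}\right)}{\sum_{a'} \exp\left(u^{A}_{a'} + u^{AX}_{a'x}\right)}$$
and similarly for B and C; i.e., the hierarchical log-linear model (LLM) {AX BX CX}.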
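The equivalence can be checked numerically. In this Python sketch the u-terms are arbitrary made-up values, levels of X, A, B, and C are coded 0/1, and no identifying constraints are imposed (they do not affect the check). It builds the expected-cell table from the {AX BX CX} log-linear model, then confirms that (1) each conditional probability computed by summing the table matches the exp(u-term) ratio, and (2) the indicators are conditionally independent given X.

```python
import math
from itertools import product

# Hypothetical u-terms; levels of X, A, B, C are coded 0 and 1.
u = 1.0
u_x = [0.0, 2.0]
u_main = {"A": [0.0, -1.0], "B": [0.0, -0.5], "C": [0.0, -1.5]}
# u_inter[name][level][x] plays the role of u^{AX}_{ax}, u^{BX}_{bx}, u^{CX}_{cx}
u_inter = {
    "A": [[0.0, 0.0], [2.5, -2.0]],
    "B": [[0.0, 0.0], [1.5, -2.5]],
    "C": [[0.0, 0.0], [3.0, -1.0]],
}
NAMES = ("A", "B", "C")

def log_m(x, a, b, c):
    """ln m_xabc under the hierarchical log-linear model {AX BX CX}."""
    total = u + u_x[x]
    for name, level in zip(NAMES, (a, b, c)):
        total += u_main[name][level] + u_inter[name][level][x]
    return total

m = {cell: math.exp(log_m(*cell)) for cell in product((0, 1), repeat=4)}
n_total = sum(m.values())
p = {cell: v / n_total for cell, v in m.items()}   # P(X=x, A=a, B=b, C=c)

def p_x(x):
    return sum(v for cell, v in p.items() if cell[0] == x)

def cond_from_table(j, level, x):
    """P(indicator j = level | X = x) by summing the table; j = 1, 2, 3 for A, B, C."""
    num = sum(v for cell, v in p.items() if cell[0] == x and cell[j] == level)
    return num / p_x(x)

def cond_from_u(name, level, x):
    """The same conditional probability from the u-terms alone."""
    num = math.exp(u_main[name][level] + u_inter[name][level][x])
    den = sum(math.exp(u_main[name][k] + u_inter[name][k][x]) for k in (0, 1))
    return num / den

for x, level in product((0, 1), repeat=2):
    for j, name in enumerate(NAMES, start=1):
        assert math.isclose(cond_from_table(j, level, x), cond_from_u(name, level, x))

# Local independence: P(a, b, c | x) = P(a | x) P(b | x) P(c | x)
for x, a, b, c in p:
    lhs = p[(x, a, b, c)] / p_x(x)
    rhs = cond_from_u("A", a, x) * cond_from_u("B", b, x) * cond_from_u("C", c, x)
    assert math.isclose(lhs, rhs)
print("log-linear {AX BX CX} reproduces the latent class model")
```

The check works for any u-values because, with only AX, BX, and CX interactions, the expected count factors into a term in x alone times one term per indicator.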
Estimation
• Use maximum likelihood estimation (MLE) to obtain estimates of the latent class probabilities and the classification error probabilities from the multinomial likelihood of the A×B×C classification table
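The slide does not say which algorithm was used to maximize the likelihood; the EM algorithm is one standard choice for latent class models. Below is a minimal pure-Python sketch for two classes and three dichotomous indicators. The function `em_lca` and its start values are illustrative, not the software used in the study. Indicators are coded 1 = yes, 0 = no; class k = 0 is "use" and k = 1 is "no use", and the start values make class 0 more likely to answer "yes", which fixes which estimated class is which.

```python
from itertools import product

def em_lca(counts, n_iter=5000, tol=1e-10):
    """EM for a two-class latent class model with three dichotomous indicators.

    counts[(a, b, c)] is the observed frequency of response pattern (a, b, c),
    with 1 = "yes" and 0 = "no". Returns (prev, p_yes) where prev = P(X = use)
    and p_yes[j][k] = P(indicator j = yes | class k), k = 0 (use), 1 (no use).
    """
    n = sum(counts.values())
    prev = 0.5
    p_yes = [[0.7, 0.3] for _ in range(3)]   # start values; also fix class labels
    for _ in range(n_iter):
        # E-step: posterior probability of the "use" class for each pattern
        post = {}
        for pattern in counts:
            like = [prev, 1.0 - prev]
            for k in (0, 1):
                for j, y in enumerate(pattern):
                    like[k] *= p_yes[j][k] if y == 1 else 1.0 - p_yes[j][k]
            post[pattern] = like[0] / (like[0] + like[1])
        # M-step: class size and "yes" rates from expected (weighted) counts
        n_use = sum(counts[pt] * post[pt] for pt in counts)
        new_prev = n_use / n
        new_p = []
        for j in range(3):
            yes_use = sum(counts[pt] * post[pt] for pt in counts if pt[j] == 1)
            yes_nouse = sum(counts[pt] * (1.0 - post[pt]) for pt in counts if pt[j] == 1)
            new_p.append([yes_use / n_use, yes_nouse / (n - n_use)])
        change = abs(new_prev - prev) + sum(
            abs(new_p[j][k] - p_yes[j][k]) for j in range(3) for k in (0, 1)
        )
        prev, p_yes = new_prev, new_p
        if change < tol:
            break
    return prev, p_yes

# Usage: exact expected counts generated from invented "true" parameters
true_prev = 0.10
true_p = [[0.90, 0.02], [0.80, 0.01], [0.95, 0.05]]   # rows A, B, C; columns use, no use
counts = {}
for pattern in product((1, 0), repeat=3):
    prob = 0.0
    for k, w in ((0, true_prev), (1, 1.0 - true_prev)):
        for j, y in enumerate(pattern):
            w *= true_p[j][k] if y == 1 else 1.0 - true_p[j][k]
        prob += w
    counts[pattern] = 100_000 * prob

prev, p_yes = em_lca(counts)
print(round(prev, 3), [[round(v, 3) for v in row] for row in p_yes])
```

Because the input is exact expected counts, EM should recover the generating parameters. Note that with three indicators and two classes the model has 7 parameters for 7 free cell probabilities, so it is just-identified and its fit cannot be tested from this table alone.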
Some Results (modeling details in Biemer and Wiesen, 2000)
• LCA models were fit to three years of data from the National Survey on Drug Use and Health
• Several important anomalies were discovered in the estimates of marijuana use
  • Low-frequency marijuana users tended to answer negatively to the frequency question
  • The composite variable was subject to false positives as a result of a questionnaire problem that was subsequently corrected
[Chart: Frequency of Use for Persons Responding ‘No’ to A]
Other Applications
Nonsampling Error Research
• Identifying flawed questions and other questionnaire problems
• Estimating census undercount in a capture-recapture framework
• Characterizing respondents, interviewers, and questionnaire elements that contribute to survey error
• Adjusting for nonresponse and missing data in surveys
Other Applications (cont’d)
Substantive Research
• Causal modeling
• Log-linear analysis compensating for measurement error
• Cluster analysis
• Variable reduction and scale construction
Importance of Model Validity Depends Upon the Application
• In the previous example, validity was “proven” by the model’s ability to identify real questionnaire problems
• In other applications, this type of validation may be quite difficult
• Further, LCA methodology is being pushed further, to adjust reported survey estimates for misclassification bias, e.g.,
  • Unemployment rate
  • Expenditures
  • Total population size in a census
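One simple way error-rate estimates can feed a bias adjustment, offered here as a hedged sketch rather than the method used in the applications listed, is to invert the relationship between observed and true prevalence for a single dichotomous measure. If θ is the false positive rate and φ the false negative rate, the observed proportion is p = π(1 − φ) + (1 − π)θ, so the adjusted estimate is π = (p − θ)/(1 − θ − φ). This assumes the error rates apply uniformly to the population being estimated; the function name and the numbers below are invented for illustration.

```python
def adjust_prevalence(observed, false_pos, false_neg):
    """Correct an observed proportion for misclassification.

    Inverts observed = true * (1 - false_neg) + (1 - true) * false_pos.
    Assumes constant error rates with false_pos + false_neg < 1.
    """
    denom = 1.0 - false_pos - false_neg
    if denom <= 0:
        raise ValueError("false_pos + false_neg must be < 1")
    estimate = (observed - false_pos) / denom
    # Sampling error can push the estimate outside [0, 1]; truncate to the valid range
    return min(1.0, max(0.0, estimate))

# Invented example: 10.8% observed, 2% false positives, 10% false negatives
print(round(adjust_prevalence(0.108, 0.02, 0.10), 4))
```

The adjustment is only as good as the error-rate estimates that go into it, which is why the validity of the LCA model matters more here than in the questionnaire-evaluation use.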
Some Issues for Future Research
Investigating the Validity of LCA Estimates
• Robustness of the estimates of classification error probabilities to violations of the model assumptions
  • Local dependence
  • Unobserved heterogeneity
  • Dependent classification errors
  • Unequal probability sampling
  • Sample clustering
Some Issues for Future Research (cont’d)
• Robustness of the model fit statistics
  • L² and X²
• Convergence problems
  • Local maxima
  • Boundary solutions
• Bias in the estimates of the standard errors of the estimates
  • Effects of weighting
  • Clustered samples
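For reference, the two fit statistics named on this slide are the likelihood-ratio chi-square L² = 2 Σ n log(n/m̂) and the Pearson chi-square X² = Σ (n − m̂)²/m̂, where n and m̂ are the observed and model-expected cell counts. Both are referred to a chi-square distribution, an approximation that can be poor in sparse tables. This minimal Python sketch computes them on unweighted counts; how weighting and clustering should enter is part of the open question above, and the function name is illustrative.

```python
import math

def fit_statistics(observed, expected):
    """Return (L2, X2) for matching sequences of observed and expected cell counts.

    Cells with an observed count of 0 contribute 0 to L2 (the limit of n log n).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    l2 = 2.0 * sum(n * math.log(n / m) for n, m in zip(observed, expected) if n > 0)
    x2 = sum((n - m) ** 2 / m for n, m in zip(observed, expected))
    return l2, x2

l2, x2 = fit_statistics([30, 20], [25, 25])
print(f"L2 = {l2:.3f}, X2 = {x2:.3f}")
```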
Some Recent Literature
• Asparouhov, T., Muthén & Muthén (2004). “Weighting for Unequal Probability of Selection in Latent Variable Modeling,” Mplus Web Notes: No. 7, Version 3.
• Patterson, B., Dayton, M., and Graubard, B. (2002). “Latent Class Analysis of Complex Sample Survey Data: Application to Dietary Data,” Journal of the American Statistical Association, Vol. 97, No. 459, pp. 721-741.
• Vermunt, J. and Magidson, J. (2001). “Latent Class Analysis with Sampling Weights,” presented at the Sixth Annual Meeting of the Methodology Section of the American Sociological Association, University of Minnesota.
• Biemer, P., Brown, G., and Judson, D. (2004). “Robustness of LCA Estimates of Population Size to Model Failure,” unpublished Census Bureau project reports.