Rejoinder Patterson, Dayton & Graubard ASA, August 2002, NYC
Purpose of Study
I. To develop, using latent variable methods, a new and useful definition of low-consumption and high-consumption groups among vegetable consumers.
II. To extend conventional methods of latent class analysis to the analysis of data from complex survey designs incorporating weights and clusters.
Points of Contention – Binary Scale
Use of a binary scale of measurement was deemed to be consistent with the reliability (validity?) of the dietary intake data available in CSFII.
Dichotomizing the data allows for relatively sophisticated analysis with weak distributional assumptions.
Other options:
Continuous measurement (grams) – e.g., fit a normal mixture model
Servings (count) – e.g., fit a mixture of Poisson processes
Points of Contention – Two Class Model
With four binary variables there are 16 unique response patterns.
An unrestricted 2-class model requires estimation of 9 unique parameters and leaves 6 degrees of freedom for assessing fit (conventional LCA).
An unrestricted 3-class model requires estimation of 14 unique parameters and appears to leave 1 degree of freedom for assessing fit (conventional LCA). However, the model is unidentified (asymptotic covariance matrix of rank 13, not 14). Hence, there is no unique unrestricted 3-class model for 4 binary responses.
We did not view this as a concern since our primary interest was to identify low- and high-consumption groups.
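The parameter and degrees-of-freedom counts above can be checked with a short sketch. The function names are hypothetical and introduced only for illustration. The count assumes an unrestricted model with binary items, and it gives only the nominal degrees of freedom: it cannot detect the identification problem noted for the 3-class model.

```python
def lca_parameter_count(n_items: int, n_classes: int) -> int:
    """Free parameters in an unrestricted latent class model with binary items."""
    class_proportions = n_classes - 1        # proportions sum to 1, so one is redundant
    conditional_probs = n_classes * n_items  # one P(item = 1 | class) per item and class
    return class_proportions + conditional_probs


def lca_nominal_df(n_items: int, n_classes: int) -> int:
    """Nominal df: independent cells of the 2^J table minus free parameters.

    This is a counting rule only; it does not check identifiability.
    """
    independent_cells = 2 ** n_items - 1     # cell proportions sum to 1
    return independent_cells - lca_parameter_count(n_items, n_classes)


for n_classes in (2, 3):
    print(n_classes, lca_parameter_count(4, n_classes), lca_nominal_df(4, n_classes))
# prints "2 9 6" then "3 14 1"
```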
Points of Contention – Two Class Model (Cont’d)
Constrained models – Since day 1 used a face-to-face interview and days 2-4 were completed by telephone, it seems reasonable to constrain conditional probabilities in a corresponding manner. In fact, a differential pattern was apparent for the low-consumption class but not for the high-consumption class. We did not pursue this post-hoc observation with additional modeling, but our methods could be used to obtain suitable estimates (model comparison is a more challenging issue).
Separating vegetables – The data table can be enlarged by disaggregating types of vegetables, and this would allow for the exploration of models with more than two latent classes. However, with a sample size of 1028, the resulting table would rapidly become sparse and pose analytical problems. On the other hand, a few categories of vegetables might be defined (e.g., deep yellow, dark-green leafy) and the resulting analyses could be of interest.
Finally, it is tempting to model the actual 6 measurement occasions, taking into account missingness. Conventional Markov latent class analysis, for example, could be (relatively) easily adapted for complex survey designs. We elected not to pursue this avenue since the 6 data points do not actually represent consistent time intervals during the year for different respondents. Also, for some respondents, observations were randomly deleted to yield 4 usable measures, whereas for other respondents there were missing data. However, the mechanism for missingness at the respondent level is not reported in the database available for analysis.
Points of Contention – Classification
Given estimates for the latent class parameters, a Bayes rule can be applied to classify each respondent as a low- or high-consumer.
Given these classifications, additional analyses can be conducted using the classes as the dependent variable (e.g., logistic regression).
In conventional LCA, this method constitutes a two-stage “approximation” to the covariate latent class model presented by Dayton & Macready (1988) in JASA.
The linear logistic covariate model for the proportion in latent class 1, conditional on covariates, Z, is of the form: $\pi_{1|Z} = \dfrac{\exp(\beta_0 + \beta' Z)}{1 + \exp(\beta_0 + \beta' Z)}$, where $\beta_0$ is an intercept and $\beta$ is a vector of regression coefficients.
Other than a recent study by Kuo (2001), very little is known about how well the two-stage strategy approximates the covariate model and nothing is known in the context of complex survey designs.
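The Bayes-rule classification step can be sketched as follows, assuming a latent class model with binary items and local independence. The function names and all parameter values are hypothetical and are not the estimates from the study.

```python
from math import prod


def posterior_class_probs(response, class_props, cond_probs):
    """Posterior P(class | response) by Bayes' rule under local independence.

    response    : tuple of 0/1 item responses
    class_props : latent class proportions (sum to 1)
    cond_probs  : cond_probs[c][j] = P(item j = 1 | class c)
    """
    joint = []
    for prop, probs in zip(class_props, cond_probs):
        likelihood = prod(p if y == 1 else 1.0 - p for y, p in zip(response, probs))
        joint.append(prop * likelihood)
    total = sum(joint)
    return [value / total for value in joint]


def classify(response, class_props, cond_probs):
    """Assign the respondent to the class with the largest posterior probability."""
    posterior = posterior_class_probs(response, class_props, cond_probs)
    return max(range(len(posterior)), key=posterior.__getitem__)


# Hypothetical parameter values for illustration only (class 0 = low, class 1 = high).
class_props = [0.2, 0.8]
cond_probs = [[0.3, 0.2, 0.2, 0.2],
              [0.9, 0.9, 0.9, 0.9]]

print(classify((0, 0, 0, 0), class_props, cond_probs))  # prints 0
print(classify((1, 1, 1, 1), class_props, cond_probs))  # prints 1
```

The resulting class labels are what the second stage (e.g., logistic regression) would treat as the dependent variable; the uncertainty in the posterior probabilities is ignored at that stage, which is one reason the two-stage strategy is only an approximation.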
Points of Contention – Sample Weights
The sampling design for CSFII is complex, with stratified multistage cluster sampling as well as adjustment, using weights, for non-response and other factors.
IF a homogeneous latent class model were appropriate (i.e., equal conditional probabilities across strata and clusters) THEN, on average, the use of weights would have no effect on the estimates for conditional probabilities. It seems unlikely that such homogeneity characterizes the population, and there is no direct method to statistically assess this assumption given the complex design.
Our model incorporating weights is aimed at estimating an overall population model (“census” model).
In fact, the patterns of conditional probabilities with and without the use of weights are very similar, with some notable depression of estimates for the low-consumption latent class. However, estimates for the proportion in the low-consumption class do differ widely (.33 unweighted versus .18 weighted).
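One way to see how weights enter the estimates is a weighted EM algorithm that maximizes a weighted (pseudo) log-likelihood. The sketch below is an illustration under that assumption and is not necessarily the estimation procedure used in the study. The function name, the starting values, and the data are hypothetical.

```python
from math import prod


def weighted_lca_em(patterns, weights, start_props, start_probs, n_iter=200):
    """EM for a latent class model with binary items and respondent weights.

    patterns    : list of 0/1 response tuples, one per respondent
    weights     : sampling weights, one per respondent
    start_props : starting class proportions
    start_probs : starting values, start_probs[c][j] = P(item j = 1 | class c)
    """
    class_props = list(start_props)
    cond_probs = [list(row) for row in start_probs]
    n_classes, n_items = len(class_props), len(patterns[0])
    total_weight = sum(weights)

    for _ in range(n_iter):
        # E-step: posterior class membership for each respondent.
        posteriors = []
        for y in patterns:
            joint = [
                class_props[c]
                * prod(cond_probs[c][j] if y[j] else 1.0 - cond_probs[c][j]
                       for j in range(n_items))
                for c in range(n_classes)
            ]
            total = sum(joint)
            posteriors.append([value / total for value in joint])

        # M-step: every sum over respondents is weighted.
        for c in range(n_classes):
            class_weight = sum(w * p[c] for w, p in zip(weights, posteriors))
            class_props[c] = class_weight / total_weight
            for j in range(n_items):
                cond_probs[c][j] = sum(
                    w * p[c] * y[j] for w, p, y in zip(weights, posteriors, patterns)
                ) / class_weight
    return class_props, cond_probs


# Hypothetical data: six respondents, four binary items.
patterns = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0),
            (1, 1, 1, 1), (1, 1, 0, 1), (0, 1, 1, 1)]
start_props = [0.5, 0.5]
start_probs = [[0.3] * 4, [0.7] * 4]

unweighted, _ = weighted_lca_em(patterns, [1.0] * 6, start_props, start_probs)
weighted, _ = weighted_lca_em(patterns, [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],
                              start_props, start_probs)
print(unweighted, weighted)
```

Setting all weights equal reproduces the conventional (unweighted) estimates, so the same routine can be used for a with-and-without-weights comparison of the kind reported above.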
Points of Contention – Sampling Variances
We chose to explore a jackknife estimate for sampling variances of the latent class parameter estimates.
The jackknife is relatively easy to program, has wide applicability and can be implemented to capture the characteristics of a complex survey design.
As expected, jackknife variances tend to slightly overestimate “true” variances, but coverage for confidence intervals is close to the nominal value (.95).
In the context of the present study, there are certainly other methods that should be explored for estimating sampling variances, including linearization, balanced half-sample replication and bootstrapping.
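A delete-one-cluster jackknife for a stratified cluster design can be sketched as below. This is a generic textbook form, assumed here for illustration; the slides do not give the exact jackknife variant used in the study. The function names, the record layout, and the data are hypothetical, and a weighted mean stands in for the latent class estimator.

```python
from collections import defaultdict


def stratified_jackknife_variance(records, estimator):
    """Delete-one-cluster jackknife variance for a stratified cluster design.

    records   : list of dicts with keys 'stratum', 'cluster', 'weight', plus data
    estimator : function mapping a list of records to a scalar estimate
    """
    clusters_by_stratum = defaultdict(set)
    for record in records:
        clusters_by_stratum[record["stratum"]].add(record["cluster"])

    full_estimate = estimator(records)
    variance = 0.0
    for stratum in sorted(clusters_by_stratum):
        clusters = sorted(clusters_by_stratum[stratum])
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError("each stratum needs at least two clusters")
        for dropped in clusters:
            replicate = []
            for record in records:
                if record["stratum"] != stratum:
                    replicate.append(record)
                elif record["cluster"] != dropped:
                    # Inflate weights of the remaining clusters in this stratum.
                    scaled = record["weight"] * n_h / (n_h - 1)
                    replicate.append({**record, "weight": scaled})
            deviation = estimator(replicate) - full_estimate
            variance += (n_h - 1) / n_h * deviation ** 2
    return full_estimate, variance


def weighted_mean(records):
    """Stand-in estimator; a latent class parameter estimate could be used instead."""
    return sum(r["weight"] * r["y"] for r in records) / sum(r["weight"] for r in records)


# Hypothetical data: two strata, two clusters each.
records = [
    {"stratum": 1, "cluster": "a", "weight": 2.0, "y": 1},
    {"stratum": 1, "cluster": "a", "weight": 2.0, "y": 0},
    {"stratum": 1, "cluster": "b", "weight": 1.0, "y": 1},
    {"stratum": 2, "cluster": "c", "weight": 3.0, "y": 0},
    {"stratum": 2, "cluster": "d", "weight": 1.5, "y": 1},
]
estimate, variance = stratified_jackknife_variance(records, weighted_mean)
print(estimate, variance)
```

Because the estimator is passed in as a function, the same loop applies to any statistic that can be recomputed on a replicate, which is what makes the jackknife easy to adapt to latent class parameter estimates.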
Future Research Directions
Assessing Model Fit – In conventional LCA, observed and expected frequencies can be used to compute a Pearson or likelihood-ratio chi-square goodness-of-fit statistic. The performance of such tests in the context of complex survey data with estimation based on pseudo-likelihoods is not known. We utilized a Wald test to assess fit, although its performance for latent class models in the present context has not been studied.
Comparing Models – In conventional LCA, difference chi-square statistics can be used to compare nested models only if no restriction to a boundary value is required to constrain the more complex model to the simpler model. For example, a legitimate application would be restricting conditional probabilities to equality for times 2-4. However, as is true for mixture models in general, boundary constraints violate the asymptotics required for the chi-square tests. Thus, for example, an unrestricted two-class model cannot be compared to an unrestricted three-class model. Information criteria such as AIC or BIC have been suggested for this latter purpose, but there are limited simulation results for these methods at present. For complex survey data, even the former (nested, non-boundary) case is questionable, although Wald tests may be useful here.
Bayesian Methods – If one were willing to assume prior information about parameters, modern computer-intensive methods such as Markov chain Monte Carlo (MCMC) could be used to simulate distributional results and offer new opportunities for analysis.
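The conventional fit statistics and information criteria mentioned above can be sketched as follows. These are the standard simple-random-sampling formulas; as noted, their behavior with complex survey data and pseudo-likelihood estimation is not established. The function names and the numbers in the usage lines are hypothetical.

```python
from math import log


def pearson_chi_square(observed, expected):
    """Pearson X^2 over the cells of the response-pattern table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_chi_square(observed, expected):
    """Likelihood-ratio G^2; cells with zero observed count contribute zero."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def aic(log_likelihood, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * log_likelihood + n_params * log(n_obs)


# Hypothetical observed and expected frequencies for a two-cell illustration.
print(pearson_chi_square([10, 20], [15, 15]))
print(likelihood_ratio_chi_square([10, 20], [15, 15]))

# Hypothetical log-likelihood; 9 parameters as in an unrestricted 2-class model.
print(aic(-100.0, 9), bic(-100.0, 9, 1028))
```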