Biostat 2065 Review November 2, 2009
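Available-case analysis • Lacks self-consistency: each estimate is computed from a different subset of cases, so, for example, pairwise correlations can combine into a matrix that is not positive semidefinite. • Of limited practical use.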
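The lack of self-consistency can be seen in a small constructed example. The sketch below is illustrative only: the data, the `NA` marker, and the helper names `pairwise_corr` and `det3` are assumptions made for this example, not part of the course material. Each correlation is computed from the cases available for that pair, and the resulting 3 x 3 "correlation matrix" has a negative determinant, which no valid correlation matrix can have.

```python
from math import sqrt

NA = None  # hypothetical missing-value marker used only in this sketch


def pairwise_corr(x, y):
    """Pearson correlation using only the cases where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not NA and b is not NA]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Twelve cases; every case is missing exactly one of the three variables.
v = [1.0, 2.0, 3.0, 4.0]
x1 = v + v + [NA] * 4
x2 = v + [NA] * 4 + v
x3 = [NA] * 4 + v + v[::-1]

r12 = pairwise_corr(x1, x2)  # uses cases 1-4
r13 = pairwise_corr(x1, x3)  # uses cases 5-8
r23 = pairwise_corr(x2, x3)  # uses cases 9-12

R = [[1.0, r12, r13],
     [r12, 1.0, r23],
     [r13, r23, 1.0]]
print("pairwise correlations:", r12, r13, r23)
print("determinant of the assembled matrix:", det3(R))
```

Here x1 is perfectly positively correlated with both x2 and x3 on the available cases, yet x2 and x3 are perfectly negatively correlated on theirs, a combination that no single complete data set could produce.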
Single imputation methods (i) • Explicit modeling: • Unconditional mean imputation. • Conditional mean imputation. • Stochastic regression imputation. Stochastic regression imputation is the best of the three in that it yields consistent estimates of higher moments and covariances. However, standard errors computed from the filled-in data as if they were complete are still not correct.
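A minimal simulation sketch of the three explicit methods follows. The data-generating model (y = 1 + 2x + noise), the 50% MCAR missingness rate, and all variable names are assumptions chosen for illustration. The point is only to compare the variance of the filled-in y under each method with the variance of the full data.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)
n = 5000

# Simulated bivariate data (assumed model): x fully observed, y missing MCAR.
x = [rng.gauss(0, 1) for _ in range(n)]
y_full = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
observed = [rng.random() < 0.5 for _ in range(n)]

# Least-squares regression of y on x, fitted on the complete cases only.
xc = [xi for xi, o in zip(x, observed) if o]
yc = [yi for yi, o in zip(y_full, observed) if o]
mx, my = mean(xc), mean(yc)
b1 = (sum((a - mx) * (b - my) for a, b in zip(xc, yc))
      / sum((a - mx) ** 2 for a in xc))
b0 = my - b1 * mx
resid_sd = (sum((b - b0 - b1 * a) ** 2 for a, b in zip(xc, yc))
            / (len(xc) - 2)) ** 0.5


def fill(rule):
    """Keep observed y values; replace missing ones with rule(x)."""
    return [yi if o else rule(xi) for xi, yi, o in zip(x, y_full, observed)]


filled = {
    "unconditional mean": fill(lambda xi: my),
    "conditional mean": fill(lambda xi: b0 + b1 * xi),
    "stochastic regression": fill(lambda xi: b0 + b1 * xi + rng.gauss(0, resid_sd)),
}
variances = {name: pvariance(values) for name, values in filled.items()}

print("variance of full data:", round(pvariance(y_full), 3))
for name, value in variances.items():
    print(f"variance after {name} imputation:", round(value, 3))
```

In a run of this kind the two mean-based methods understate the variance of y, while adding a random residual brings the filled-in variance back close to the full-data value. None of the three filled-in data sets gives valid standard errors when treated as complete data.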
Single imputation methods (ii) • Implicit modeling: • Hot-deck imputation. • Nearest-neighbor hot deck. These are ad hoc approaches: they are intuitive and easy to carry out, but their performance is difficult to evaluate.
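As a rough sketch of the nearest-neighbor idea, the function below fills each missing y with the observed y of the donor whose covariate x is closest. The function name, the use of `None` for missing values, the absolute-difference distance, and the tie-breaking rule (first donor wins) are all assumptions for this illustration; practical hot-deck procedures differ in how they define distance and choose among donors.

```python
def nearest_neighbor_hot_deck(x, y):
    """Fill each missing y (None) with the y of the donor whose x is closest.

    Donors are the cases with observed y. Ties go to the earliest donor.
    """
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not donors:
        raise ValueError("at least one case with observed y is required")
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Pick the donor minimizing |x_donor - x_recipient|.
            _, yi = min(donors, key=lambda d: abs(d[0] - xi))
        filled.append(yi)
    return filled


x = [1.0, 1.8, 3.0, 4.4, 5.0]
y = [10.0, None, 30.0, None, 50.0]
filled = nearest_neighbor_hot_deck(x, y)
print(filled)  # [10.0, 10.0, 30.0, 50.0, 50.0]
```

Because imputed values are always real observed values, the method never produces impossible values, but there is no explicit model from which to judge how well it performs.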
Inference with single imputation methods • Explicit standard error estimates are available under certain sampling/imputation schemes: for example, when imputation is carried out within each ultimate cluster, the resulting variance estimate is unbiased. • Bootstrap standard errors. • Jackknife standard errors.
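The two resampling approaches can be sketched as follows. The key design choice, assumed here, is that the imputation step is repeated inside every replicate so that imputation uncertainty enters the standard error. The estimator (mean of y after regression imputation), the simulated data, and all function names are hypothetical choices for this example.

```python
import random
from statistics import mean


def imputed_mean(data):
    """Mean of y after conditional-mean (regression) imputation from x.

    data is a list of (x, y) pairs with y set to None when missing.
    """
    cc = [(x, y) for x, y in data if y is not None]
    mx = mean(x for x, _ in cc)
    my = mean(y for _, y in cc)
    b1 = (sum((x - mx) * (y - my) for x, y in cc)
          / sum((x - mx) ** 2 for x, _ in cc))
    return mean(y if y is not None else my + b1 * (x - mx) for x, y in data)


def jackknife_se(data, estimator):
    """Delete-one jackknife: re-impute and re-estimate with each case removed."""
    n = len(data)
    reps = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(reps)
    return ((n - 1) / n * sum((r - centre) ** 2 for r in reps)) ** 0.5


def bootstrap_se(data, estimator, n_boot, rng):
    """Resample cases with replacement: re-impute and re-estimate each time."""
    n = len(data)
    reps = [estimator([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    centre = mean(reps)
    return (sum((r - centre) ** 2 for r in reps) / (n_boot - 1)) ** 0.5


# Simulated data (assumed model): y = 1 + 2x + noise, about 30% of y missing MCAR.
rng = random.Random(1)
data = []
for _ in range(80):
    xi = rng.gauss(0, 1)
    yi = 1 + 2 * xi + rng.gauss(0, 1)
    data.append((xi, None if rng.random() < 0.3 else yi))

se_jack = jackknife_se(data, imputed_mean)
se_boot = bootstrap_se(data, imputed_mean, n_boot=500, rng=rng)
print("jackknife SE:", round(se_jack, 3))
print("bootstrap SE:", round(se_boot, 3))
```

Resampling the already filled-in data instead would treat the imputed values as if they were observed and would typically understate the standard error.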
Multiple imputation • Multiple imputation accounts for the between-imputation variation. • It is very effective when the estimate is approximately normally distributed; only a handful of imputations are needed. • However, uncertainty in the prediction model itself still needs to be accounted for. For example, for bivariate data with MCAR nonresponse, this can be done by bootstrapping the complete cases, refitting the prediction model on each bootstrap sample, and drawing the imputations from the resulting predictive distribution.
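The slide does not spell out how the between-imputation variation is combined with the within-imputation variance. The sketch below uses the standard combining rules usually attributed to Rubin; the function name, argument names, and example numbers are made up for illustration. With D completed data sets, the total variance is the average within-imputation variance plus (1 + 1/D) times the between-imputation variance.

```python
from statistics import mean, variance


def combine_multiple_imputations(estimates, within_variances):
    """Combine D completed-data estimates and their variances.

    Returns the pooled estimate, the total variance, and the approximate
    degrees of freedom for a t reference distribution.
    """
    d = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    w_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor D - 1)
    total = w_bar + (1 + 1 / d) * b
    if b == 0:
        df = float("inf")
    else:
        df = (d - 1) * (1 + w_bar / ((1 + 1 / d) * b)) ** 2
    return q_bar, total, df


# Hypothetical results from D = 5 imputed data sets.
estimates = [10.0, 11.0, 12.0, 11.0, 11.0]
within_variances = [4.0, 4.0, 4.0, 4.0, 4.0]
q_bar, total, df = combine_multiple_imputations(estimates, within_variances)
print(q_bar, total, round(df, 1))
```

In this made-up example the between-imputation component is small relative to the within-imputation variance, so the degrees of freedom are large and a normal reference distribution would be adequate.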
Factored likelihood method for monotone data with an ignorable mechanism
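For the simplest monotone pattern, bivariate normal data with y1 fully observed and y2 observed only for some cases, the idea can be sketched as follows. The likelihood factors into the marginal distribution of y1, estimated from all n cases, and the regression of y2 on y1, estimated from the r complete cases; the ML estimates of the original parameters are then obtained by transforming back. The function name, the dictionary keys, and the toy data are assumptions for this illustration.

```python
from statistics import mean


def factored_likelihood_mle(y1, y2):
    """ML estimates for bivariate normal data, y1 complete, y2 partly missing (None)."""
    n = len(y1)
    cc = [(a, b) for a, b in zip(y1, y2) if b is not None]
    r = len(cc)

    # Factor 1: marginal distribution of y1, using all n cases.
    mu1 = mean(y1)
    s11 = sum((a - mu1) ** 2 for a in y1) / n

    # Factor 2: regression of y2 on y1, using the r complete cases.
    m1 = mean(a for a, _ in cc)
    m2 = mean(b for _, b in cc)
    c11 = sum((a - m1) ** 2 for a, _ in cc) / r
    c12 = sum((a - m1) * (b - m2) for a, b in cc) / r
    c22 = sum((b - m2) ** 2 for _, b in cc) / r
    beta1 = c12 / c11
    beta0 = m2 - beta1 * m1
    resid_var = c22 - c12 ** 2 / c11

    # Transform back to the means and covariance matrix of (y1, y2).
    return {
        "mu1": mu1,
        "mu2": beta0 + beta1 * mu1,
        "s11": s11,
        "s12": beta1 * s11,
        "s22": resid_var + beta1 ** 2 * s11,
    }


y1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [2.0, 4.0, 5.0, None, None, None]
est = factored_likelihood_mle(y1, y2)
print(est)
```

The estimate of the mean of y2 is the regression prediction at the overall mean of y1, so it differs from the complete-case mean of y2 whenever the complete cases are not representative in y1.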
Sweep operator • Useful for analysis of multivariate normal data. • By applying sweep and reverse sweep operators, the parameters for a multivariate normal distribution can be derived from the marginal distribution of a “baseline” variable and a sequence of conditional normal regressions.
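A minimal sketch of the two operators on a symmetric matrix stored as nested lists is given below. The sign convention assumed here is the common one in which sweeping on position k replaces the pivot g_kk by -1/g_kk; other texts use slightly different conventions, and the function names are illustrative.

```python
def sweep(g, k):
    """Sweep the symmetric matrix g on row/column k and return a new matrix."""
    p = len(g)
    d = g[k][k]
    # Entries outside row/column k: subtract the part explained by variable k.
    h = [[g[j][l] - g[j][k] * g[k][l] / d for l in range(p)] for j in range(p)]
    for j in range(p):
        h[j][k] = h[k][j] = g[j][k] / d
    h[k][k] = -1.0 / d
    return h


def reverse_sweep(g, k):
    """Undo a sweep on row/column k."""
    p = len(g)
    d = g[k][k]
    h = [[g[j][l] - g[j][k] * g[k][l] / d for l in range(p)] for j in range(p)]
    for j in range(p):
        h[j][k] = h[k][j] = -g[j][k] / d
    h[k][k] = -1.0 / d
    return h


# Covariance matrix of (y1, y2); sweeping on y1 gives the regression of y2 on y1.
cov = [[4.0, 2.0],
       [2.0, 3.0]]
swept = sweep(cov, 0)
slope = swept[0][1]         # regression coefficient of y2 on y1
resid_var = swept[1][1]     # residual variance of y2 given y1
restored = reverse_sweep(swept, 0)
print(swept, restored)
```

Sweeping on y1 turns the joint covariance matrix into the slope and residual variance of the conditional regression, and the reverse sweep recovers the joint parameters, which is the mechanism that links the marginal-plus-conditional parameterization to the original one.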
Computation algorithms • Newton-Raphson: fast (quadratic) convergence in the neighborhood of the MLE, but can be unstable. • EM: designed for the analysis of data with missing values; stable, in that each iteration increases the likelihood; slow (linear) convergence. • Extensions of EM: ECM, ECME, and PX-EM.
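As an illustration of EM and its slow convergence, the sketch below runs EM for bivariate normal data in which y1 is complete and some y2 values are missing. The function name, the starting values, the fixed iteration count, and the toy data are assumptions for this example. The E-step replaces the missing sufficient statistics by their conditional expectations given y1, and the M-step recomputes the moments.

```python
from statistics import mean


def em_bivariate_normal(y1, y2, n_iter=500):
    """EM for a bivariate normal with y1 complete and y2 partly missing (None)."""
    n = len(y1)
    obs = [b for b in y2 if b is not None]

    # y1 is fully observed, so its ML estimates do not change across iterations.
    mu1 = mean(y1)
    s11 = sum((a - mu1) ** 2 for a in y1) / n

    # Starting values: observed-case moments of y2 and zero covariance.
    mu2 = mean(obs)
    s22 = sum((b - mu2) ** 2 for b in obs) / len(obs)
    s12 = 0.0

    history = []
    for _ in range(n_iter):
        # E-step: expected sufficient statistics given y1 and current parameters.
        beta = s12 / s11
        resid_var = s22 - s12 ** 2 / s11
        t2 = t22 = t12 = 0.0
        for a, b in zip(y1, y2):
            if b is None:
                e = mu2 + beta * (a - mu1)   # E[y2 | y1]
                t2 += e
                t22 += e * e + resid_var     # E[y2^2 | y1]
                t12 += a * e                 # E[y1 * y2 | y1]
            else:
                t2 += b
                t22 += b * b
                t12 += a * b
        # M-step: update the moments from the expected sufficient statistics.
        mu2 = t2 / n
        s22 = t22 / n - mu2 ** 2
        s12 = t12 / n - mu1 * mu2
        history.append(mu2)
    return {"mu1": mu1, "mu2": mu2, "s11": s11, "s12": s12, "s22": s22}, history


y1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [2.0, 4.0, 5.0, None, None, None]
estimates, history = em_bivariate_normal(y1, y2)
print("mu2 after 1, 10, 100, 500 iterations:",
      history[0], history[9], history[99], history[-1])
```

In this toy data set half of the y2 values are missing and all of them lie at the upper end of y1, so a large share of the information is missing and the iterates approach their limit only gradually. For this monotone pattern the limit can also be computed in closed form, which is not true for general missing-data patterns, where an iterative method such as EM is needed.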