
Missing Data Mechanisms


mdewitt

Presentation Transcript


  1. Missing Data Mechanisms • MCAR (missing completely at random) • MAR (missing at random) • MNAR (missing not at random) References: • Schafer, J., Graham, J.W. Missing data: our view of the state of the art. Psychological Methods, 7(2), 147-177, 2002. • Raghunathan, T.E. What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health, 25, 99-117, 2004.

  2. Graphical representation • Y = partly missing variable • X = completely observed variable • Z = a cause of missingness (unrelated to Y) • R = missingness indicator

  3. [Figure: three graphs relating X, Z, Y and the missingness indicator R, one each for MCAR, MAR and MNAR]

  4. Use of conditional probability. Let Yc = (Yo, Ym) be the complete vector of Y observations, where Yo is the observed part and Ym the missing part. • MCAR: P(R | Yc) = P(R). The probability of missingness does not depend on Y at all (neither Yo nor Ym). • MAR: P(R | Yc) = P(R | Yo). The probability of missingness depends only on the observed Yo. • MNAR: P(R | Yc) depends on the unobserved Ym and cannot be reduced to P(R | Yo).
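
The three definitions can be made concrete with a small simulation sketch. It borrows the design of the simulation study on slide 6: a bivariate normal with mean 125, SD 25 and correlation 0.6, a cutoff of 141, and an MCAR rate of 0.73. It also assumes that X has the same mean and SD as Y, which the slides imply but do not state. The function names are hypothetical. R is True when Y is missing.

```python
import math
import random

def draw_bivariate_normal(n, mean=125.0, sd=25.0, rho=0.6, rng=random):
    """Draw n (x, y) pairs with equal means/SDs and correlation rho (assumed design)."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean + sd * z1
        y = mean + sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

def missing_indicator(pairs, mechanism, cutoff=141.0, p_mcar=0.73, rng=random):
    """Return R (True = Y missing) for each pair under MCAR, MAR or MNAR."""
    if mechanism == "MCAR":  # ignores both X and Y
        return [rng.random() < p_mcar for _ in pairs]
    if mechanism == "MAR":   # depends only on the fully observed X
        return [x < cutoff for x, _ in pairs]
    if mechanism == "MNAR":  # depends on the very Y value that goes missing
        return [y < cutoff for _, y in pairs]
    raise ValueError(f"unknown mechanism: {mechanism}")
```

For example, `missing_indicator(draw_bivariate_normal(50), "MAR")` flags every case whose X falls below 141, whatever its Y. Under MNAR the same rule is applied to Y itself, so the analyst cannot check it from the observed data.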

  5. Methods for analyzing data with missing values in the repeated-measures situation • Case deletion (complete-case analysis): delete subjects with any missing components • Available-case analysis: the analysis is based on all observed data (it uses data from subjects with complete Y vectors as well as from subjects with incomplete Y vectors)
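
A toy illustration of how the two approaches differ when estimating the mean at each time point. The data and function names are hypothetical, and None marks a missing measurement.

```python
from statistics import mean

# Hypothetical repeated-measures data: one row per subject, one column per time point.
subjects = [
    [10.0, 12.0, 14.0],
    [11.0, 13.0, None],
    [9.0, None, None],
    [12.0, 15.0, 17.0],
]

def complete_case_means(rows):
    """Case deletion: keep only subjects with no missing components."""
    complete = [r for r in rows if all(v is not None for v in r)]
    return [mean(col) for col in zip(*complete)]

def available_case_means(rows):
    """Available-case analysis: use every observed value at each time point."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]
```

Here `complete_case_means(subjects)` uses only two subjects and gives [11.0, 13.5, 15.5]. `available_case_means(subjects)` uses four, three and two values at the three time points, giving about [10.5, 13.33, 15.5]. The price of using more data is that each mean is based on a different subset of subjects.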

  6. Simulation Study
  Parameter   True   MCAR          MAR            MNAR
  -----------------------------------------------------------
  Mean(Y)     125    125 (7.0)     143.3 (19.3)   155.5 (30.7)
  Std(Y)      25     24.6 (5.3)    20.9 (5.8)     12.2 (13.2)
  Rho         0.6    0.59 (0.2)    0.33 (0.37)    0.34 (0.36)
  Beta Y|X    0.6    0.61 (0.27)   0.60 (0.51)    0.21 (0.43)
  Beta X|Y    0.6    0.60 (0.25)   0.20 (0.44)    0.60 (0.52)
  -----------------------------------------------------------
  Generate: 50 observations from a bivariate normal (Y, X) • MCAR: Y missing with probability 0.73 (high!) • MAR: Y missing if X < 141 • MNAR: Y missing if Y < 141
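
A sketch of a simulation in the spirit of this table. It assumes the estimates come from complete-case analysis: the slide does not say so, but that is consistent with the inflated MAR and MNAR means. It also assumes that X has the same mean and SD as Y. The number of replications, the seed and the rule of skipping the rare replications with fewer than 5 complete cases are arbitrary choices. Each entry of the returned dict holds the average estimate and its SD across replications.

```python
import math
import random
from statistics import fmean, stdev

def simulate_complete_case(mechanism, n=50, reps=2000, seed=1, mu=125.0, sd=25.0,
                           rho=0.6, cutoff=141.0, p_mcar=0.73, min_cases=5):
    """Average complete-case estimates over simulated bivariate normal datasets."""
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    rng = random.Random(seed)
    est = {"mean_y": [], "sd_y": [], "rho": [], "beta_y_x": [], "beta_x_y": []}
    for _ in range(reps):
        kept = []  # complete cases (Y observed) in this replication
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x = mu + sd * z1
            y = mu + sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
            if mechanism == "MCAR":
                missing = rng.random() < p_mcar
            elif mechanism == "MAR":
                missing = x < cutoff
            else:  # MNAR
                missing = y < cutoff
            if not missing:
                kept.append((x, y))
        if len(kept) < min_cases:  # too few complete cases to estimate anything
            continue
        xs, ys = zip(*kept)
        mx, my = fmean(xs), fmean(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        sxy = sum((a - mx) * (b - my) for a, b in kept)
        est["mean_y"].append(my)
        est["sd_y"].append(stdev(ys))
        est["rho"].append(sxy / math.sqrt(sxx * syy))
        est["beta_y_x"].append(sxy / sxx)  # slope of Y on X
        est["beta_x_y"].append(sxy / syy)  # slope of X on Y
    return {k: (fmean(v), stdev(v)) for k, v in est.items()}
```

Under these assumptions the MAR run shows a complete-case mean of Y well above 125, while the slope of Y on X stays near 0.6. The MNAR run keeps the slope of X on Y near 0.6. Both patterns mirror the table: complete-case regression is protected only when selection depends on the regressor.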

  7. Methods for analyzing survey data • Weight the responses that are present • Average the available items (used in the social sciences, often based on standardized scores, but not studied in any systematic fashion)
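
A minimal sketch of "average the available items". Each item is z-scored using the responses that are present, and a respondent's scale score is the mean of whichever standardized items they answered. The `min_items` threshold and the function names are hypothetical, since the slide gives no rule for how many items must be present. The sketch also assumes every item was answered by at least one respondent.

```python
from statistics import fmean, pstdev

def standardize_items(rows):
    """z-score each item (column) using only the responses present; None = missing."""
    stats = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        stats.append((fmean(present), pstdev(present)))
    return [[None if v is None else ((v - m) / s if s > 0 else 0.0)
             for v, (m, s) in zip(row, stats)]
            for row in rows]

def available_item_score(row, min_items=1):
    """Mean of the items a respondent answered, or None if too few are present."""
    present = [v for v in row if v is not None]
    return fmean(present) if len(present) >= min_items else None
```

The approach is simple, but it quietly assumes that the unanswered items would have behaved like the answered ones. That assumption is what makes it hard to study systematically.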

  8. Single imputation • MS: mean substitution • HD: hot deck • CM: conditional mean • PD: predictive distribution
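
The four single-imputation methods can be sketched for the simplest case: one partly missing Y and one fully observed X, as in the simulation design. This is an illustration under stated assumptions. The hot deck draws donors at random from all respondents, whereas real hot decks usually match donors on covariates. The PD variant adds a normal residual but does not draw the regression coefficients, so it ignores parameter uncertainty. The function names are hypothetical.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD of y on x."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, (rss / (len(xs) - 2)) ** 0.5

def impute(x, y, method, rng=None):
    """Fill in missing y values (None) given fully observed x, one value each."""
    if method not in ("MS", "HD", "CM", "PD"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(0) if rng is None else rng
    ox, oy = zip(*[(xi, yi) for xi, yi in zip(x, y) if yi is not None])
    b0, b1, s = fit_line(ox, oy)
    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
        elif method == "MS":  # mean substitution
            out.append(mean(oy))
        elif method == "HD":  # hot deck: copy a randomly chosen respondent's value
            out.append(rng.choice(oy))
        elif method == "CM":  # conditional mean: regression prediction
            out.append(b0 + b1 * xi)
        else:                 # PD: prediction plus a random draw of the residual
            out.append(b0 + b1 * xi + rng.gauss(0, s))
    return out
```

MS and CM put every imputed value on a line or a constant and so understate the variance of Y. HD and PD add noise to preserve the spread, but all four still treat the imputed values as if they were observed.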

  9. ML estimation • Widely accepted • Yields consistent (asymptotically unbiased) estimators under general regularity conditions • Provides a mechanism for inference: hypothesis tests and confidence intervals • Often relies on the EM algorithm • Newton-Raphson or Fisher scoring is used in multilevel modeling
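
A minimal EM sketch for the simulation setting. X is fully observed, Y is partly missing, (X, Y) is bivariate normal, and MAR is assumed. The E-step replaces each missing y and y² by its conditional expectation given x under the current parameters. The M-step recomputes the mean and covariance from those expected statistics. The function name and convergence settings are assumptions and are not taken from any package listed on the next slide.

```python
from statistics import fmean

def em_bivariate(x, y, max_iter=500, tol=1e-10):
    """ML estimates (mu_x, mu_y, var_x, var_y, cov_xy), divisor n, via EM.
    x is complete; missing y values are None; MAR is assumed."""
    mu_x = fmean(x)
    var_x = fmean([(xi - mu_x) ** 2 for xi in x])  # X complete: closed form
    obs = [yi for yi in y if yi is not None]
    mu_y = fmean(obs)                              # start from the observed Y
    var_y = fmean([(yi - mu_y) ** 2 for yi in obs])
    cov = 0.0
    for _ in range(max_iter):
        # E-step: conditional moments of each missing y given x
        b1 = cov / var_x
        b0 = mu_y - b1 * mu_x
        res_var = var_y - cov ** 2 / var_x
        ey = [yi if yi is not None else b0 + b1 * xi for xi, yi in zip(x, y)]
        ey2 = [yi ** 2 if yi is not None else (b0 + b1 * xi) ** 2 + res_var
               for xi, yi in zip(x, y)]
        # M-step: complete-data ML formulas applied to the expected statistics
        new_mu_y = fmean(ey)
        new_var_y = fmean(ey2) - new_mu_y ** 2
        new_cov = fmean([xi * e for xi, e in zip(x, ey)]) - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_var_y - var_y), abs(new_cov - cov))
        mu_y, var_y, cov = new_mu_y, new_var_y, new_cov
        if change < tol:
            break
    return mu_x, mu_y, var_x, var_y, cov
```

For this monotone pattern the ML estimates also have a closed form, which makes a handy check. Regress Y on X in the complete cases, then set mu_y = b0 + b1 · mu_x and cov_xy = b1 · var_x using all the X values.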

  10. Software for ML estimation • SPSS: missing data module • EMCOV • NORM • SAS: Proc Mixed • S-Plus: lme function • Stata • LISREL • Mplus • HLM / MLwiN (multilevel models)

  11. Simulation Study: ML estimation
  Parameter   True   MCAR          MAR            MNAR
  -----------------------------------------------------------
  Mean(Y)     125    124.8 (6.5)   125.2 (16.9)   151.6 (26.9)
  Std(Y)      25     24.2 (5.7)    25.5 (7.4)     12.3 (13.2)
  Rho         0.6    0.61 (0.2)    0.52 (0.38)    0.39 (0.36)
  Beta Y|X    0.6    0.61 (0.27)   0.60 (0.51)    0.21 (0.43)
  Beta X|Y    0.6    0.63 (0.25)   0.49 (0.38)    0.79 (0.68)
  -----------------------------------------------------------
  Generate: 50 observations from a bivariate normal (Y, X) • MCAR: Y missing with probability 0.73 (high!) • MAR: Y missing if X < 141 • MNAR: Y missing if Y < 141

  12. ML estimation • More attractive than ad hoc methods • Assumes a large sample • May or may not be robust to violations of model assumptions • Assumes MAR

  13. Multiple Imputation • Each missing value is replaced by m > 1 values, which effectively creates m datasets • Relative efficiency (compared with infinitely many imputations): (1 + λ/m)^(-1), where λ is the fraction of missing information; this implies that m need not be large, but it must certainly be larger than 1 • Rubin's rules for combining estimators are now well accepted • It helps to be a Bayesian! • MAR is usually assumed
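
The efficiency formula and Rubin's combining rules are short enough to sketch directly. Here λ is the fraction of missing information. The m completed-data estimates are averaged, and the total variance is T = W + (1 + 1/m)B, where W is the average within-imputation variance and B is the between-imputation variance. The degrees of freedom use Rubin's original large-sample formula. The function names are hypothetical.

```python
import math
from statistics import fmean, variance

def relative_efficiency(lam, m):
    """Efficiency of m imputations relative to infinitely many: (1 + lam/m)^(-1)."""
    return 1.0 / (1.0 + lam / m)

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = fmean(estimates)     # combined point estimate
    w = fmean(variances)         # within-imputation variance W
    b = variance(estimates)      # between-imputation variance B (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance T
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2 if b > 0 else math.inf
    return q_bar, t, df
```

With λ = 0.5, five imputations already give `relative_efficiency(0.5, 5)` of about 0.91. This is the slide's point that m need not be large.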

  14. Software • NORM • Proc MI in SAS: regression, propensity scores, MCMC (covers what NORM does, plus other routines) • SAS macro: IVE library • S-Plus: missing data library (NORM); for longitudinal data, use the function PAN • LISREL: missing data library like NORM • SOLAS (same as Proc MI?) • http://www.multiple-imputation.com

  15. Comments on MI methods • Regression-based MI methods are really based on ML estimation: they usually require a multivariate normal distribution • Should you transform skewed data to normality (log or power transformation)? Partial answer: no (Graham and Schafer, 1999) • What about rounding data to create binary/ordinal variables? Partial answer: okay, even for small samples

  16. Comments continued • However, better specialized methods are available: Schafer (1997) for nominal data; Liu et al. (2000) for clustered data • How about propensity scores? No: they can distort the covariance structure of the data (Allison, 2000)

  17. Simulation Study: MI (NORM)
  Parameter   True   MCAR          MAR            MNAR
  -----------------------------------------------------------
  Mean(Y)     125    124.9 (6.5)   125.3 (17.2)   151.6 (26.9)
  Std(Y)      25     25.9 (5.9)    28.7 (8.2)     13.6 (12.1)
  Rho         0.6    0.57 (0.2)    0.45 (0.37)    0.35 (0.36)
  Beta Y|X    0.6    0.61 (0.27)   0.59 (0.52)    0.21 (0.43)
  Beta X|Y    0.6    0.56 (0.22)   0.39 (0.38)    0.66 (0.56)
  -----------------------------------------------------------
  Generate: 50 observations from a bivariate normal (Y, X) • MCAR: Y missing with probability 0.73 (high!) • MAR: Y missing if X < 141 • MNAR: Y missing if Y < 141

  18. Methods that do not assume MAR • Selection models • Pattern-mixture models
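
One simple pattern-mixture formulation is delta adjustment; it is sketched here as an assumption and is not necessarily the model the slide has in mind. Nonrespondents' Y is modeled as the respondents' regression prediction from X, shifted by an unidentifiable δ chosen by the analyst. In this bivariate setting δ = 0 reproduces the MAR (ML) estimate of the mean. Varying δ shows how sensitive that estimate is to departures from MAR. The function name is hypothetical.

```python
from statistics import fmean

def pattern_mixture_mean(x, y, delta):
    """Mean of Y when each missing y = respondents' regression prediction + delta."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = fmean([xi for xi, _ in obs])
    my = fmean([yi for _, yi in obs])
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in obs)
          / sum((xi - mx) ** 2 for xi, _ in obs))
    b0 = my - b1 * mx
    filled = [yi if yi is not None else b0 + b1 * xi + delta for xi, yi in zip(x, y)]
    return fmean(filled)
```

In a sensitivity analysis one would report the estimate over a range of δ values, for example `[pattern_mixture_mean(x, y, d) for d in (-10, -5, 0)]`. The data cannot tell which δ is right.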

  19. Food for thought • In a longitudinal study on aging, many subjects die while on study. Is MAR a reasonable assumption? • Alternatively, joint modeling of the outcome and death may be superior
