
Missing Data: Analysis and Design

Missing Data: Analysis and Design. John W. Graham The Prevention Research Center and Department of Biobehavioral Health Penn State University. Presentation in Four Parts. (1) Introduction: Missing Data Theory (2) A brief analysis demonstration Multiple Imputation with NORM and Proc MI


Presentation Transcript


  1. Missing Data: Analysis and Design John W. Graham The Prevention Research Center and Department of Biobehavioral Health Penn State University

  2. Presentation in Four Parts • (1) Introduction: Missing Data Theory • (2) A brief analysis demonstration • Multiple Imputation with NORM and Proc MI • Amos • ...break... • (3) Attrition Issues • (4) Planned missingness designs: 3-form Design

  3. Recent Papers • Graham, J. W., Cumsille, P. E., & Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka & W. F. Velicer (Eds.), Research Methods in Psychology (pp. 87-114). Volume 2 of Handbook of Psychology (I. B. Weiner, Editor-in-Chief). New York: John Wiley & Sons. • Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330-351. • Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177. • jgraham@psu.edu

  4. Part I: A Brief Introduction to Analysis with Missing Data

  5. Problem with Missing Data • Analysis procedures were designed for complete data. . .

  6. Solution 1 • Design new model-based procedures • Missing Data + Parameter Estimation in One Step • Full Information Maximum Likelihood (FIML) • SEM and Other Latent Variable Programs (Amos, Mx, LISREL, Mplus, LTA)

  7. Solution 2 • Data-based procedures • e.g., Multiple Imputation (MI) • Two Steps • Step 1: Deal with the missing data • (e.g., replace missing values with plausible values) • Produce a product • Step 2: Analyze the product as if there were no missing data

  8. FAQ • Aren't you somehow helping yourself with imputation?. . .

  9. NO. Missing data imputation . . . • does NOT give you something for nothing • DOES let you make use of all data you have . . .

  10. FAQ • Is the imputed value what the person would have given?

  11. NO. When we impute a value . . • We do not impute for the sake of the value itself • We impute to preserve important characteristics of the whole data set . . .

  12. We want . . . • unbiased parameter estimation • e.g., b-weights • Good estimate of variability • e.g., standard errors • best statistical power

  13. Causes of Missingness • Ignorable • MCAR: Missing Completely At Random • MAR: Missing At Random • Non-Ignorable • MNAR: Missing Not At Random

  14. MCAR (Missing Completely At Random) • MCAR 1: Cause of missingness is a completely random process (like a coin flip) • MCAR 2: • Cause uncorrelated with variables of interest • Example: parents move • No bias if cause omitted

  15. MAR (Missing At Random) • Missingness may be related to measured variables • But no residual relationship with unmeasured variables • Example: reading speed • No bias if you control for measured variables

  16. MNAR (Missing Not At Random) • Even after controlling for measured variables ... • Residual relationship with unmeasured variables • Example: drug use reason for absence
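
The three mechanisms on slides 14 to 16 can be illustrated with a small simulation. This is only a sketch under assumed conditions: the variable names, the true slope of 0.5, and the hard cutoffs used to create MAR and MNAR missingness are illustrative choices, not part of the presentation. It compares the complete-case regression slope of y on a measured variable x under each mechanism.

```python
import numpy as np

def complete_case_slope(x, y, observed):
    """OLS slope of y on x using only the cases where y is observed."""
    return np.polyfit(x[observed], y[observed], 1)[0]

def simulate_mechanisms(n=50_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)              # measured variable, never missing
    y = 0.5 * x + rng.normal(size=n)    # variable of interest; true slope is 0.5
    observed = {
        "MCAR": rng.random(n) > 0.3,    # coin-flip missingness
        "MAR": x <= 0.5,                # missingness driven by the measured x
        "MNAR": y <= 0.5,               # missingness driven by y itself
    }
    return {name: complete_case_slope(x, y, obs) for name, obs in observed.items()}

slopes = simulate_mechanisms()
for name, slope in slopes.items():
    print(f"{name}: slope of y on x = {slope:.3f} (true value 0.5)")
```

Under these assumptions the MCAR and MAR slopes should land near 0.5, because the cause of missingness is either random or controlled for in the model, while the MNAR slope is noticeably attenuated.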

  17. MNAR Causes • The recommended methods assume missingness is MAR • But what if the cause of missingness is not MAR? • Should these methods be used when MAR assumptions not met? . . .

  18. YES! These Methods Work! • Suggested methods work better than “old” methods • Multiple causes of missingness • Only small part of missingness may be MNAR • Suggested methods usually work very well

  19. Revisit Question: What if THE Cause of Missingness is MNAR? • Example model of interest: X → Y • X = Program (prog vs control) • Y = Cigarette Smoking • Z = Cause of missingness: say, Rebelliousness (or smoking itself) • Factors to be considered: • % Missing (e.g., % attrition) • rYZ • rZ,Ymis

  20. rYZ • Correlation between • cause of missingness (Z) • e.g., rebelliousness (or smoking itself) • and the variable of interest (Y) • e.g., Cigarette Smoking

  21. rZ,Ymis • Correlation between • cause of missingness (Z) • e.g., rebelliousness (or smoking itself) • and missingness on variable of interest • e.g., Missingness on the Smoking variable • Missingness on Smoking (Ymis) • Dichotomous variable: • Ymis = 1: Smoking variable not missing • Ymis = 0: Smoking variable missing
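
The three factors from slide 19 (% missing, rYZ, and rZ,Ymis) can be computed directly once Z, Y, and the missingness indicator are in hand. The sketch below uses simulated data; the effect size of 0.4 and the logistic rule linking Z to missingness are assumptions made for illustration. The indicator follows the coding above, with Ymis = 1 when Y is not missing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)              # cause of missingness, e.g. rebelliousness
y = 0.4 * z + rng.normal(size=n)    # variable of interest, e.g. smoking

# Assumed rule: higher z makes y more likely to be missing
p_missing = 1 / (1 + np.exp(-(z - 1.0)))
y_mis = (rng.random(n) >= p_missing).astype(int)   # 1 = not missing, 0 = missing

r_yz = np.corrcoef(y, z)[0, 1]          # rYZ
r_z_ymis = np.corrcoef(z, y_mis)[0, 1]  # rZ,Ymis
pct_missing = 100 * (1 - y_mis.mean())

print(f"% missing = {pct_missing:.1f}")
print(f"rYZ       = {r_yz:.3f}")
print(f"rZ,Ymis   = {r_z_ymis:.3f}")
```

With this coding, rZ,Ymis comes out negative, because high values of Z go with Ymis = 0. Neither correlation is anywhere near 1.0 in absolute value.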

  22. How Could the Cause of Missingness be Purely MNAR? • rZ,Y = 1.0 AND rZ,Ymis = 1.0 • We can get rZ,Y = 1.0 if smoking is the cause of missingness on the smoking variable

  23. How Could the Cause of Missingness be Purely MNAR? • We can get rZ,Ymis = 1.0 like this: • If person is a smoker, smoking variable is always missing • If person is not a smoker, smoking variable is never missing • But is this plausible? ever?

  24. What if the cause of missingness is MNAR? Problems with this statement • MAR & MNAR are widely misunderstood concepts • I argue that the cause of missingness is never purely MNAR • The cause of missingness is virtually never purely MAR either.

  25. MAR vs MNAR: • MAR and MNAR form a continuum • Pure MAR and pure MNAR are just theoretical concepts • Neither occurs in the real world • MAR vs MNAR NOT dimension of interest

  26. MAR vs MNAR: What IS the Dimension of Interest? • Question of Interest: How much estimation bias? • when cause of missingness cannot be included in the model

  27. Bottom Line ... • All missing data situations are partly MAR and partly MNAR • Sometimes it matters ... • bias affects statistical conclusions • Often it does not matter • bias has minimal effects on statistical conclusions (Collins, Schafer, & Kam, Psych Methods, 2001)

  28. Methods: "Old" vs MAR vs MNAR • MAR methods (MI and ML) • are ALWAYS at least as good as, • usually better than "old" methods (e.g., listwise deletion) • Methods designed to handle MNAR missingness are NOT always better than MAR methods

  29. References • Graham, J. W., & Donaldson, S. I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of followup data. Journal of Applied Psychology, 78, 119-128. • Graham, J. W., Hofer, S.M., Donaldson, S.I., MacKinnon, D.P., & Schafer, J.L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle, & S. West (Eds.), The science of prevention: methodological advances from alcohol and substance abuse research. (pp. 325-366). Washington, D.C.: American Psychological Association. • Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330-351.

  30. Analysis: Old and New

  31. Old Procedures: Analyze Complete Cases (listwise deletion) • may produce bias • you always lose some power • (because you are throwing away data) • reasonable if you lose only 5% of cases • often lose substantial power

  32. Analyze Complete Cases (listwise deletion) • 1 1 1 1 • 0 1 1 1 • 1 0 1 1 • 1 1 0 1 • 1 1 1 0 • very common situation • only 20% (4 of 20) data points missing • but discard 80% of the cases
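
The arithmetic on slide 32 can be checked in a few lines. The sketch assumes the pattern is read as one row per case and one column per variable, with 1 meaning observed and 0 meaning missing.

```python
import numpy as np

# Rows are cases, columns are variables; 1 = observed, 0 = missing (assumed reading)
pattern = np.array([
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
])

pct_points_missing = 100 * (pattern == 0).mean()   # share of individual data points
complete = pattern.all(axis=1)                     # cases kept by listwise deletion
pct_cases_discarded = 100 * (~complete).mean()

print(f"data points missing: {pct_points_missing:.0f}%")
print(f"cases discarded:     {pct_cases_discarded:.0f}%")
```

Only the first case survives listwise deletion, even though each of the other four cases is missing a single value.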

  33. Other "Old" Procedures • Pairwise deletion • May be of occasional use for preliminary analyses • Mean substitution • Never use it • Regression-based single imputation • generally not recommended ... except ...

  34. Recommended Model-Based Procedures • Multiple Group SEM (Structural Equation Modeling) • Latent Transition Analysis (Collins et al.) • A latent class procedure

  35. Recommended Model-Based Procedures • Raw Data Maximum Likelihood SEM, aka Full Information Maximum Likelihood (FIML) • Amos (James Arbuckle) • LISREL 8.5+ (Jöreskog & Sörbom) • Mplus (Bengt Muthén) • Mx (Michael Neale)

  36. Amos 7, Mx, Mplus, LISREL 8.8 • Structural Equation Modeling (SEM) Programs • In Single Analysis ... • Good Estimation • Reasonable standard errors • Windows Graphical Interface

  37. Limitation with Model-Based Procedures • That particular model must be what you want

  38. Recommended Data-Based Procedures • EM Algorithm (ML parameter estimation) • Norm-Cat-Mix, EMcov, SAS, SPSS • Multiple Imputation • NORM, Cat, Mix, Pan (Joe Schafer) • SAS Proc MI • LISREL 8.5+

  39. EM Algorithm • Expectation-Maximization • Alternate between • E-step: predict missing data • M-step: estimate parameters • Excellent parameter estimates • But no standard errors • must use bootstrap • or multiple imputation
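
The E-step and M-step can be sketched for the simplest case: two normally distributed variables, with x complete and y partly missing. This is an illustrative implementation, not the algorithm as coded in NORM or EMcov. The function name `em_bivariate`, the simulated data, and the logistic missingness rule are all assumptions.

```python
import numpy as np

def em_bivariate(x, y, n_iter=200):
    """EM estimates of the mean vector and covariance matrix; y has NaNs, x is complete."""
    miss = np.isnan(y)
    mu = np.array([x.mean(), np.nanmean(y)])
    cov = np.array([[x.var(), 0.0], [0.0, np.nanvar(y)]])
    for _ in range(n_iter):
        # E-step: expected y and y^2 for the missing cases, given x and current parameters
        slope = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - slope * cov[0, 1]
        ey = np.where(miss, mu[1] + slope * (x - mu[0]), y)
        ey2 = np.where(miss, ey**2 + resid_var, y**2)
        # M-step: ML estimates from the expected sufficient statistics
        mu = np.array([x.mean(), ey.mean()])
        cov_xy = np.mean(x * ey) - mu[0] * mu[1]
        cov = np.array([[x.var(), cov_xy], [cov_xy, ey2.mean() - mu[1] ** 2]])
    return mu, cov

# Illustrative data: true mean of y is 1.0; missingness depends only on x (MAR)
rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
y[rng.random(n) < 1 / (1 + np.exp(-2 * x))] = np.nan

mu, cov = em_bivariate(x, y)
print("complete-case mean of y:  ", round(float(np.nanmean(y)), 3))
print("EM estimate of mean of y: ", round(float(mu[1]), 3))
```

The function returns point estimates only, which matches the slide: standard errors have to come from somewhere else, such as the bootstrap or multiple imputation.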

  40. Multiple Imputation • Problem with Single Imputation: Too Little Variability • Because of Error Variance • Because covariance matrix is only one estimate

  41. Too Little Error Variance • Imputed value lies on regression line

  42. Imputed Values on Regression Line

  43. Restore Error . . . • Add random normal residual
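
Slides 41 to 43 can be illustrated by comparing the variance of y after two kinds of single imputation. The data are simulated and the missingness is MCAR for simplicity; both are assumptions of this sketch. One version places imputed values on the regression line, the other adds a random normal residual.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
miss = rng.random(n) < 0.4          # MCAR, for simplicity

# Fit the regression of y on x in the complete cases
slope, intercept = np.polyfit(x[~miss], y[~miss], 1)
resid_sd = np.std(y[~miss] - (intercept + slope * x[~miss]), ddof=2)

# Imputed values placed exactly on the regression line
y_line = y.copy()
y_line[miss] = intercept + slope * x[miss]

# Same predictions plus a random normal residual
y_noise = y.copy()
y_noise[miss] = intercept + slope * x[miss] + rng.normal(scale=resid_sd, size=miss.sum())

print(f"variance of y, full data:           {y.var():.3f}")
print(f"variance, regression-line imputed:  {y_line.var():.3f}")
print(f"variance, with residual restored:   {y_noise.var():.3f}")
```

The regression-line version understates the variance of y, and adding the residual brings it back close to the full-data value. This addresses only the first source of lost variability. The regression line itself is still a single estimate.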

  44. Covariance Matrix (Regression Line) only One Estimate • Obtain multiple plausible estimates of the covariance matrix • ideally draw multiple covariance matrices from population • Approximate this with • Bootstrap • Data Augmentation (Norm) • MCMC (SAS 8.2, 9)

  45. Regression Line only One Estimate

  46. Data Augmentation • stochastic version of EM • EM • E (expectation) step: predict missing data • M (maximization) step: estimate parameters • Data Augmentation • I (imputation) step: simulate missing data • P (posterior) step: simulate parameters

  47. Data Augmentation • Parameters from consecutive steps ... • too related • i.e., not enough variability • after 50 or 100 steps of DA ... covariance matrices are like random draws from the population
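
The I-step and P-step can be sketched for a regression of y on a complete x. This is a simplified illustration under a standard noninformative prior, not the routine implemented in NORM or Proc MI. The function name `data_augmentation`, the simulated data, and the spacing of 100 steps between retained draws are assumptions.

```python
import numpy as np

def data_augmentation(x, y, n_steps=600, seed=0):
    """Alternate I- and P-steps for the regression of y on x; y has NaNs, x is complete."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    # Start from complete-case estimates
    beta = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)[0]
    sigma2 = np.var(y[~miss] - X[~miss] @ beta)
    y_work = y.copy()
    draws = []
    for _ in range(n_steps):
        # I-step: simulate the missing y values given the current parameters
        y_work[miss] = X[miss] @ beta + rng.normal(scale=np.sqrt(sigma2), size=miss.sum())
        # P-step: simulate parameters from their posterior given the completed data
        beta_hat = xtx_inv @ X.T @ y_work
        sse = np.sum((y_work - X @ beta_hat) ** 2)
        sigma2 = sse / rng.chisquare(n - 2)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        draws.append(beta[1])
    return np.array(draws)

# Illustrative data: true slope 0.5, about 30% of y missing completely at random
rng = np.random.default_rng(3)
x = rng.normal(size=2_000)
y = 0.5 * x + rng.normal(size=2_000)
y[rng.random(2_000) < 0.3] = np.nan

slopes = data_augmentation(x, y)
spaced = slopes[100::100]   # keep every 100th draw after the first 100 steps
print(spaced.round(3))
```

Keeping widely spaced draws reflects the point on slide 47: consecutive steps are too closely related, so imputed data sets are taken many steps apart.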

  48. Multiple Imputation Allows: • Unbiased Estimation • Good standard errors • provided number of imputations is large enough • too few imputations → reduced power with small effect sizes
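
The standard errors from multiple imputation are usually obtained by combining the results from the imputed data sets with Rubin's rules. The slides do not name that procedure; it is brought in here as the standard one. The sketch below pools one parameter, and the five estimates and standard errors are made-up numbers for illustration.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = est.size
    q_bar = est.mean()                  # pooled point estimate
    within = np.mean(se ** 2)           # average within-imputation variance
    between = est.var(ddof=1)           # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, np.sqrt(total)

# Hypothetical b-weights and standard errors from m = 5 imputed data sets
b = [0.50, 0.54, 0.46, 0.52, 0.48]
se = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled_b, pooled_se = pool_rubin(b, se)
print(f"pooled estimate = {pooled_b:.3f}, pooled SE = {pooled_se:.4f}")
```

The pooled standard error exceeds the 0.10 from any single imputed data set because it also carries the between-imputation variance. The (1 + 1/m) factor shrinks toward 1 as the number of imputations grows.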

  49. From Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (in press). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science.

  50. Part II: Illustration of Missing Data Analysis: Multiple Imputation with NORM and Proc MI
