
Imputation for Multi Care Data



Presentation Transcript


  1. Naren Meadem Imputation for Multi Care Data

  2. Introduction
     • What is certain in life?
        • Death
        • Taxes
     • What is certain in research?
        • Measurement error
        • Missing data
     • Missing data can be:
        • Due to preventable errors, mistakes, or lack of foresight by the researcher
        • Due to problems outside the control of the researcher
        • Deliberate, intended, or planned by the researcher to reduce cost or respondent burden
        • Due to differential applicability of some items to subsets of respondents
        • Etc.

  3. Missing Data Mechanisms (1)
     • Preliminaries:
        • Yobs: The non-missing or observed data
        • Ymiss: The missing or unobserved data
        • M: Whether the data on a given item for a given case is missing (1) or not (0)
     • Missing Completely at Random (MCAR)
        • The probability that an item is missing (M) is unrelated to either the observed (Yobs) or the unobserved (Ymiss) data
     • Missing at Random (MAR)
        • The probability that an item is missing (M) may be related to the observed data (Yobs) but is unrelated to the unobserved data (Ymiss)
     • Missing Not at Random (MNAR)
        • The probability that an item is missing (M) is related to the (unknown) value of the unobserved data (Ymiss), even after conditioning on the observed data (Yobs)
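The three mechanisms above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: a fully observed covariate `x`, an outcome `y` that depends on `x`, and a logistic missingness probability with an arbitrary slope of 2. The function name `simulate_missingness` is hypothetical.

```python
import math
import random


def simulate_missingness(n=10_000, seed=0):
    """Return the full-data mean of y and its observed mean under each mechanism."""
    rng = random.Random(seed)
    # x is always observed; y is the variable subject to missingness.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]

    def sigmoid(t):
        return 1 / (1 + math.exp(-t))

    # Probability that M = 1 (y is missing) for each case.
    # The logistic form and the slope of 2 are illustrative assumptions.
    p_missing = {
        "MCAR": [0.5] * n,                      # unrelated to x or y
        "MAR": [sigmoid(2 * xi) for xi in x],   # depends only on the observed x
        "MNAR": [sigmoid(2 * yi) for yi in y],  # depends on the value of y itself
    }

    observed_means = {}
    for name, probs in p_missing.items():
        m = [1 if rng.random() < p else 0 for p in probs]
        kept = [yi for yi, mi in zip(y, m) if mi == 0]
        observed_means[name] = sum(kept) / len(kept)

    return sum(y) / n, observed_means


full_mean, observed_means = simulate_missingness()
print(f"full-data mean of y: {full_mean:.3f}")
for mechanism, mean in observed_means.items():
    print(f"{mechanism}: mean of observed y = {mean:.3f}")
```

In this setup the observed mean of `y` stays close to the full-data mean only under MCAR. Under MAR the gap can in principle be corrected by conditioning on `x`, because missingness depends on nothing else. Under MNAR no observed variable fully explains the missingness.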

  4. Missing Data in Research Studies
     • Missing data mechanism
        • Missing completely at random (MCAR): Ignorable
        • Missing at random (MAR): Conditionally ignorable
        • Missing not at random (MNAR): Nonignorable
     • Amount of missing data
        • Percent of cases with missing data
        • Percent of variables having missing data
        • Percent of data values that are missing
     • Pattern of missing data
        • Missing by design
        • Missing data patterns
           • Univariate
           • Monotonic
           • File matching
           • General
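The three "amount of missing data" measures listed above are easy to confuse, so a short sketch may help. It assumes the data are held as a list of equal-length rows with `None` marking a missing value. The function name `missingness_summary` and the toy data are hypothetical, not from the presentation.

```python
def missingness_summary(rows):
    """Summarise missingness for a list of equal-length rows (None = missing)."""
    n_cases = len(rows)
    n_vars = len(rows[0])

    # Cases (rows) with at least one missing value.
    cases_with_missing = sum(any(v is None for v in row) for row in rows)
    # Variables (columns) with at least one missing value.
    vars_with_missing = sum(
        any(row[j] is None for row in rows) for j in range(n_vars)
    )
    # Individual cells that are missing.
    values_missing = sum(v is None for row in rows for v in row)

    return {
        "pct_cases": 100 * cases_with_missing / n_cases,
        "pct_variables": 100 * vars_with_missing / n_vars,
        "pct_values": 100 * values_missing / (n_cases * n_vars),
    }


# Toy data: 4 cases, 3 variables.
toy = [
    [1, 2, 3],
    [1, None, 3],
    [None, None, 3],
    [1, 2, 3],
]
print(missingness_summary(toy))
```

On the toy data, half of the cases and two of the three variables are affected, yet only a quarter of the individual values are missing, which is why the three percentages are worth reporting separately.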

  5. Newer Missing Data Treatments
     • Modern state-of-the-art missing data treatments for MAR data
        • Maximum likelihood
        • Multiple imputation
     • Cutting-edge investigational missing data treatments for MNAR data
        • Pattern mixture models
        • Selection models
        • Shared parameter models
        • Inverse probability weighting

  6. Clustering methods: Mean substitution
     • Substitute the mean of a variable's observed values for that variable's missing values
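As a minimal sketch of mean substitution, assuming a single variable stored as a list with `None` for missing entries (the helper name `mean_substitute` and the sample values are hypothetical):

```python
import statistics


def mean_substitute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


values = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in values if v is not None]
filled = mean_substitute(values)

print(filled)
print("variance of observed values:", statistics.variance(observed))
print("variance after substitution:", statistics.variance(filled))
```

The filled-in variable keeps the observed mean, but its sample variance is smaller than that of the observed values, because every imputed case sits exactly at the mean. This shrinkage is one reason mean substitution is usually treated as a baseline rather than a recommended method.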

  7. Graphical illustration

  8. Better methods of handling missing data
     • Full information maximum likelihood methods
        • Can handle data that are MAR and nonignorable (NI)
        • Special consideration required for NI data
        • Implemented as part of hierarchical linear modeling and structural equation modeling
        • Missing data handled during analysis
     • Multiple imputation
        • Can also handle data that are MAR and NI
        • Special consideration required for NI data
        • Simulation-based approach
        • Missing data are handled separately from analysis

  9. Multiple imputation
     • Three steps:
        • Generate multiple completed datasets (imputations) through simulation (only 5 to 10 are needed)
        • Perform analyses on each imputation
        • Combine the multiple analyses using a set of special rules (Rubin’s (1987) rules)
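The third step, combining the analyses, can be sketched as follows. The code assumes each imputed dataset has already been analysed and has produced one point estimate and one squared standard error. The function name `pool_rubin` and the example numbers are hypothetical and are not results from this presentation.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m estimates and their within-imputation variances with Rubin's rules.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b       # total variance
    return q_bar, total, math.sqrt(total)


# Illustrative values only: five imputations of a single coefficient.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

q_bar, total, se = pool_rubin(estimates, variances)
print(f"pooled estimate = {q_bar:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error exceeds the per-imputation standard error (0.2 in this example) because the between-imputation term carries the extra uncertainty due to the missing values.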

  10. Results (AUC)

                       Naive Bayes   Logistic Regression   SVM
      No imputation    0.6362        0.6025                0.635
      Imputation       0.6377        0.6033                0.649

  11. Conclusions
     • When you have missing data, think about WHY they are missing
     • Ask yourself whether you have observed variables that could explain why the data are missing
     • Missing data handled improperly can bias your conclusions
     • Multiple imputation is one good way of handling missing data
     • Caveats:
        • Multiple imputation is complex
        • An evolving field
        • The standards of reporting the results from imputed data are not well-established
     • If you need to do it (especially if you think your data are NI, i.e. nonignorable), read the source papers I referenced at the beginning of the slides

  12. Questions?
