SW 983: Missing Data Treatment Most of the slides presented here are from Modern Missing Data Methods, a 5-day course presented in 2011 by the KUCRMDA. Instructors: Wei Wu and Mijke Rhemtulla.
Why Do We Care? • A major goal of inferential statistics is to estimate parameters and their standard errors. • When parameter estimates are systematically off, they are said to be biased. • When standard error estimates are wrong, it affects our ability to do significance tests. • Missing data can impact both.
Old Methods • Deletion methods (listwise, pairwise deletion) • Ignore data from the subset of participants who have missing values on some variables (listwise deletion = throw out whole cases; pairwise deletion = selectively drop cases depending on the computation) • Throwing out data leads to loss of power • Selectively deleting data/ignoring missing data can lead to a high degree of bias
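A minimal sketch of the difference between the two deletion rules, in plain Python with no extra packages. The five-case data set, the variable names (ses, math, read), and the helper functions are hypothetical and only serve the illustration; None marks a missing value.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"ses": 1.0, "math": 2.0, "read": 1.5},
    {"ses": 2.0, "math": None, "read": 2.5},
    {"ses": 3.0, "math": 3.5, "read": None},
    {"ses": 4.0, "math": 4.0, "read": 4.5},
    {"ses": 5.0, "math": 6.0, "read": 5.0},
]

def listwise(rows):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, x, y):
    """Keep cases observed on both x and y, whatever else is missing."""
    return [(r[x], r[y]) for r in rows
            if r[x] is not None and r[y] is not None]

n_listwise = len(listwise(rows))                    # 3 cases survive
n_ses_math = len(pairwise(rows, "ses", "math"))     # 4 cases
n_ses_read = len(pairwise(rows, "ses", "read"))     # 4 cases
n_math_read = len(pairwise(rows, "math", "read"))   # 3 cases

print("listwise N:", n_listwise)
print("pairwise N:", n_ses_math, n_ses_read, n_math_read)
```

Listwise deletion gives one N for every statistic. Pairwise deletion gives a different N for each pair of variables, so each entry of a correlation matrix can rest on a different subsample.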
Old Methods • Single-imputation methods • Mean imputation • Replace each missing value with the variable’s mean • Adds no new information • Regression imputation • Use linear regression to predict each missing observation based on the other variables that are present
Mean imputation • Advantages: • Filling in missing values results in complete N • Can use all available data • Disadvantages: • Will give biased statistics (correlations, regression coefficients, standard deviations, path coefficients) under any kind of missingness • Correlations, covariances, and regression/path coefficients will be too weak • Variances and standard deviations will be too small • “The method achieves little except the illusion of progress” (Little & Rubin, 1990, p.380) 3. Traditional Methods
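The shrinkage described above can be checked with a small simulation. This is a sketch under assumed values, not course material: the sample size, the 0.6 slope, the 40% MCAR deletion rate, and the seed are all arbitrary choices, and the helper functions are written out by hand so the block runs without extra packages.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(983)  # arbitrary seed, for reproducibility only
n = 5000
ses = [rng.gauss(0, 1) for _ in range(n)]
math_score = [0.6 * s + rng.gauss(0, 0.8) for s in ses]

# MCAR: every math score is deleted with the same probability (0.4).
observed = [m if rng.random() > 0.4 else None for m in math_score]

# Mean imputation: fill every hole with the mean of the observed scores.
obs_mean = mean([m for m in observed if m is not None])
imputed = [obs_mean if m is None else m for m in observed]

sd_full, sd_imputed = sd(math_score), sd(imputed)
r_full, r_imputed = corr(ses, math_score), corr(ses, imputed)

print(f"SD of math:      full {sd_full:.2f}   mean-imputed {sd_imputed:.2f}")
print(f"r(SES, math):    full {r_full:.2f}   mean-imputed {r_imputed:.2f}")
```

Even though the missingness here is MCAR, the imputed column has a smaller standard deviation and a weaker correlation with SES than the complete data, because every filled-in value sits at the mean.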
Regression Imputation • Note that in order to get a predicted value for every missing value, predictors cannot themselves have missing values • When computing the regression equations, you must use either a deletion method or mean imputation to handle incomplete predictors • When filling in missing values, you must use mean imputation for any missing predictor values that are substituted into the regression equations • Alternatively, only use complete predictors
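A minimal sketch of the simplest case, a single complete predictor (SES) used to fill in an incomplete outcome (math). The simulated data, the 0.6 slope, the 40% MCAR deletion rate, and the helper names are assumptions made for the illustration.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = (sum((p - mx) * (q - my) for p, q in zip(x, y))
         / sum((p - mx) ** 2 for p in x))
    return my - b * mx, b

rng = random.Random(983)  # arbitrary seed
n = 5000
ses = [rng.gauss(0, 1) for _ in range(n)]            # complete predictor
math_score = [0.6 * s + rng.gauss(0, 0.8) for s in ses]
observed = [m if rng.random() > 0.4 else None for m in math_score]

# Step 1: estimate the regression from the cases with an observed math score.
pairs = [(s, m) for s, m in zip(ses, observed) if m is not None]
a, b = ols([s for s, _ in pairs], [m for _, m in pairs])

# Step 2: replace each missing math score with its predicted value.
imputed = [a + b * s if m is None else m for s, m in zip(ses, observed)]

sd_observed = sd([m for _, m in pairs])
sd_imputed = sd(imputed)
print(f"fitted line: math = {a:.2f} + {b:.2f} * ses")
print(f"SD of math: observed cases {sd_observed:.2f}, after imputation {sd_imputed:.2f}")
```

Because the predictor is complete, no deletion or mean substitution is needed on the predictor side. Every imputed value lies exactly on the fitted line, which is why the spread of the filled-in variable shrinks.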
Regression Imputation • Advantages: • Borrows information from observed data • Point estimates of missing observations are more accurate than mean imputation • Disadvantages: • Will give biased statistics (correlations, regression coefficients, standard deviations, path coefficients) under any kind of missingness • Correlations and R² will be too strong, because every imputed value falls exactly on the regression line • Variances and standard deviations will be too small
Patterns and Mechanisms • Mechanisms describe why data are missing • can affect the ease of recovering relations among variables • Patterns describe where data are missing • can affect the ease of recovering relations among variables • can affect the extent to which results will be biased
Missing Patterns [Figure: three missing-data patterns, panels A, B, and C] 2. Patterns and Mechanisms
MISSING MECHANISMS • Missing Completely at Random (MCAR) • Missing at Random (MAR) • Missing Not at Random (MNAR or NMAR)
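For reference, the three mechanisms are often stated formally in Rubin's notation. The symbols are an addition here, not taken from the slides: R is the missingness indicator, Y_obs and Y_mis are the observed and missing parts of the data, and φ holds the parameters of the missingness process.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid \phi) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid Y_{\text{obs}}, \phi) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) \text{ still depends on } Y_{\text{mis}}
                    \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```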
MCAR 1. Completely Random Missingness (MCAR) For example: You measure SES and math scores • If some students were randomly selected to go on a field trip on the day of testing and they missed the test • If you collected data from all students and your dog ate some of them • If some students forgot to fill in the answer sheet for some items at random
MCAR 1. Completely Random Missingness (MCAR) Prognosis: great! Almost any method of analysis will lead to unbiased results. Modern methods (Imputation, Maximum Likelihood) will give you more power than older methods (e.g., Deletion).
MAR 2. Random Missingness (MAR) The reason data are missing is unrelated to the missing values after controlling for the relation between missingness and measured variables.
MAR 2. Random Missingness (MAR) For example: You measure SES and math scores • Low-SES children tend to have poorer math scores and are more likely to be absent from testing, BUT after accounting for SES, there is no further relation between math scores and the propensity for data to be missing.
MAR 2. Random Missingness (MAR) Prognosis: Good. IF you use modern missing data methods that account for the relations between observed variables and missingness, results will be unbiased and power will be recovered. Using old methods (e.g., deletion), results will be biased.
MNAR 3. Non-Random Missingness (MNAR or NMAR) For example: You measure SES and math scores • If children with low math scores are more likely to avoid writing the test. Even after accounting for SES, math scores continue to be related to the propensity for missingness • If all children write the test but they skip items that they find difficult. Whether an item is missing is directly related to the child’s true score on that item
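The SES and math example can be simulated to see how the three mechanisms differ. This is a sketch under assumed values: the 0.5 slope, the missingness probabilities (0.3 for MCAR; 0.6 versus 0.1 below and above zero for MAR and MNAR), and the seed are arbitrary. In a simulation the true scores are known, so the missing cases can be compared with the observed ones, which is impossible with real data.

```python
import random

def mean(v):
    return sum(v) / len(v)

rng = random.Random(983)  # arbitrary seed
n = 20000
ses = [rng.gauss(0, 1) for _ in range(n)]
math_score = [0.5 * s + rng.gauss(0, 0.87) for s in ses]

def make_missing(driver):
    """Return booleans (True = math score missing).
    driver=None gives MCAR; otherwise cases with driver < 0 are
    more likely to be missing."""
    if driver is None:
        return [rng.random() < 0.3 for _ in range(n)]
    return [rng.random() < (0.6 if d < 0 else 0.1) for d in driver]

missing = {
    "MCAR": make_missing(None),        # unrelated to anything
    "MAR": make_missing(ses),          # driven by observed SES
    "MNAR": make_missing(math_score),  # driven by the math score itself
}

def observed_mean(miss):
    return mean([m for m, r in zip(math_score, miss) if not r])

def gap_within_low_ses(miss):
    """Mean math of missing minus observed cases, among ses < 0 only."""
    low = [(m, r) for s, m, r in zip(ses, math_score, miss) if s < 0]
    return mean([m for m, r in low if r]) - mean([m for m, r in low if not r])

true_mean = mean(math_score)
results = {name: (observed_mean(miss) - true_mean, gap_within_low_ses(miss))
           for name, miss in missing.items()}

for name, (bias, gap) in results.items():
    print(f"{name}: bias of observed mean {bias:+.2f}, "
          f"missing-vs-observed gap within low SES {gap:+.2f}")
```

Under MAR the observed mean is biased, but within an SES group the missing and observed children have similar math scores, so conditioning on SES can remove the bias. Under MNAR the gap remains even within an SES group.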
MNAR → MAR • To the extent that it is possible to collect data that correlate with the missingness (R), MNAR data can be made to approximate MAR. • e.g., if math ability and reading ability are highly correlated, then reading scores might predict MNAR missingness on math scores. If the missingness can be predicted, it becomes MAR missingness.
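A rough simulation of the reading and math example. The assumptions are hypothetical: a 0.9 correlation between reading and math, missingness probabilities of 0.6 and 0.1 for math scores below and above zero, and an arbitrary seed. Simple regression imputation is used only to show the effect on the estimated mean; it is not one of the modern methods.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = (sum((p - mx) * (q - my) for p, q in zip(x, y))
         / sum((p - mx) ** 2 for p in x))
    return my - b * mx, b

rng = random.Random(983)  # arbitrary seed
n = 20000
math_score = [rng.gauss(0, 1) for _ in range(n)]
# Reading is fully observed and correlates about 0.9 with math (assumed).
reading = [0.9 * m + rng.gauss(0, sqrt(1 - 0.81)) for m in math_score]
# MNAR: children with low math scores are more likely to skip the test.
missing = [rng.random() < (0.6 if m < 0 else 0.1) for m in math_score]

# Complete-case estimate ignores reading entirely.
obs = [(r, m) for r, m, miss in zip(reading, math_score, missing) if not miss]
true_mean = mean(math_score)
bias_complete_case = mean([m for _, m in obs]) - true_mean

# Use reading as an auxiliary variable to predict the missing math scores.
a, b = ols([r for r, _ in obs], [m for _, m in obs])
filled = [a + b * r if miss else m
          for r, m, miss in zip(reading, math_score, missing)]
bias_with_reading = mean(filled) - true_mean

print(f"bias of mean, complete cases only: {bias_complete_case:+.3f}")
print(f"bias of mean, using reading:       {bias_with_reading:+.3f}")
```

Bringing in reading removes most, though not all, of the bias in the mean. That matches the word "approximate" above: the stronger the auxiliary variable predicts the missing scores, the closer the problem gets to MAR.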
MCAR TESTS • How do you know what kind of missingness you have? • MAR and MNAR cannot be told apart by a test, because the difference lies in the values you did not observe • But there are tests for MCAR • If missingness is MCAR, then cases with missing data should be no different from cases with observed data • We can check this by examining group differences on a predictor (“auxiliary”) variable • e.g., we can ask whether those who are missing a math score have significantly different SES scores than those whose math scores are not missing
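The SES comparison can be sketched as a two-group t statistic. This is an illustration under assumptions: the data are simulated, the missingness probabilities and seed are arbitrary, and Welch's unequal-variance t is one reasonable choice among several. Only the statistic is computed; a p-value would need a t distribution from a statistics package.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two groups."""
    return (mean(a) - mean(b)) / sqrt(var(a) / len(a) + var(b) / len(b))

rng = random.Random(983)  # arbitrary seed
n = 2000
ses = [rng.gauss(0, 1) for _ in range(n)]

# Two hypothetical missingness indicators for the math score (True = missing).
miss_mcar = [rng.random() < 0.3 for _ in range(n)]
miss_mar = [rng.random() < (0.6 if s < 0 else 0.1) for s in ses]

def ses_t(miss):
    """Compare SES between cases missing math and cases with math observed."""
    missing_group = [s for s, r in zip(ses, miss) if r]
    observed_group = [s for s, r in zip(ses, miss) if not r]
    return welch_t(missing_group, observed_group)

t_mcar = ses_t(miss_mcar)
t_mar = ses_t(miss_mar)
print(f"t for SES difference, MCAR indicator: {t_mcar:+.2f}")
print(f"t for SES difference, MAR indicator:  {t_mar:+.2f}")
```

A t value far from zero is evidence against MCAR and flags SES as a variable related to missingness. A small t value does not prove MCAR; it only means this one variable shows no difference.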
MCAR TESTS • Is it worth testing for MCAR? • Most naturally-occurring missingness is not MCAR • Even if missingness is MCAR, new methods are still better (i.e., more powerful) than old methods • New methods don’t distinguish between MCAR and MAR • But: MCAR tests may be useful for identifying variables relevant to the MAR process (i.e., auxiliary variables) • Auxiliary Variables are those that we include in analysis because they predict missingness
Patterns and Mechanisms: Summary • Patterns of missingness can indicate different causes of missingness (attrition, planned missing, nonresponse) • The number and kind of missing patterns can affect covariance coverage and fraction of missing information, resulting in better or worse parameter estimates • MCAR missingness is the only missing mechanism that does not lead to bias with traditional methods (e.g., deletion) and it is also the rarest • MAR missingness is attainable by measuring covariates that predict missingness • MNAR missingness is related to the missing values themselves and will result in poor-quality estimates • Tests for MCAR are possible but of questionable value
FIML vs. Multiple Imputation (Note: with large samples they produce the same results) • Advantages of Multiple Imputation • Use of auxiliary variables (but Mplus does allow them with FIML) • Treatment of incomplete explanatory variables (?) • Item-level imputation (i.e., item nonresponse) • Advantages of Maximum Likelihood • Estimating interaction terms • SEM • Fewer procedural ambiguities (i.e., it’s easier to do)