Learn why handling missing data is crucial to avoid biased results, and explore ways to address the different types of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random. Methods covered include treating missing data as a separate category, deletion strategies such as listwise and pairwise deletion, substitution techniques such as mean substitution and the hot deck method, and more advanced approaches such as regression imputation and multiple imputation. The three steps of multiple imputation, including how to combine the results, are outlined, with references to SAS resources for further detail.
Treatment of Missing Data Wayne Jiang, FCAS Safeco Insurance Companies
Why handling missing data is important • If not properly handled, missing data can lead to biased, invalid, or statistically insignificant results.
Different kinds of missing data • Missing completely at random (MCAR). • The probability that an observation is missing is unrelated to the value of the variable or to the value of any other variable, i.e. missing values are randomly distributed across all observations.
Different kinds of missing data • Missing at random (MAR). • The probability that a value is missing does not depend on the value of the variable itself after controlling for other variables. Equivalently, the missingness is random once the data are split into subgroups defined by the other, observed variables.
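Different kinds of missing data • Missing not at random (MNAR). • Neither MCAR nor MAR: the probability that a value is missing depends on the missing value itself, even after controlling for other variables. • Very hard to analyze.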
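The three mechanisms can be compared with a small simulation. The sketch below is illustrative only: the variables `x` and `y`, the missingness probabilities, and the function name `observed_mean` are all assumptions made for this example, not part of the original slides. It generates a variable `y` with true mean 0 and reports the mean of the values that remain observed under each mechanism.

```python
import random

def observed_mean(mechanism, n=20000, seed=0):
    """Mean of the y values that remain observed under a missingness mechanism."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0, 1)        # covariate that is always observed
        y = x + rng.gauss(0, 1)    # variable that may go missing; true mean is 0
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to x and y
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.1    # depends only on the observed x
        else:  # "MNAR"
            p_missing = 0.6 if y > 0 else 0.1    # depends on the value of y itself
        if rng.random() >= p_missing:
            observed.append(y)
    return sum(observed) / len(observed)

means = {m: observed_mean(m) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, value in means.items():
    print(mechanism, round(value, 3))
```

Under MCAR the observed mean stays close to the true mean. Under MAR the naive mean drifts, but the drift is fully explained by the observed `x`, so it can be corrected by conditioning on `x`. Under MNAR no observed variable explains the drift.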
Pattern of missing data • Monotone: • When more than one variable can be missing, the variables can be ordered so that whenever a variable is missing for an observation, all later variables in that order are missing as well.
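A monotone pattern can be checked mechanically. The following is a minimal sketch, assuming missing values are represented by `None` and that the columns are already arranged in the candidate order; the function name `is_monotone` and the sample rows are hypothetical.

```python
def is_monotone(rows):
    """True if, in every row, a missing value is followed only by missing values.

    rows: list of tuples whose columns are in the candidate variable order,
    with None marking a missing value.
    """
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                # An observed value appears after a missing one: not monotone.
                return False
    return True

monotone_rows = [(1, 2, 3), (4, 5, None), (6, None, None)]
other_rows = [(1, None, 3), (4, 5, 6)]
print(is_monotone(monotone_rows), is_monotone(other_rows))
```

This only tests one column order. A data set is monotone if some ordering of the variables passes the check.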
Dealing with missing data • If the data set is large and only a few points are missing at random, the problem is not serious. • In a smaller data set with a non-random distribution of missing values, the problem may be serious.
Some ways to deal with the missing data problem (separate category) • Treat Missing as its own category • Could group very dissimilar classes together. • Severe bias could result.
Some ways to deal with the missing data problem (deletion) • Listwise deletion. • Any record with a missing value is deleted. • Yields unbiased parameter estimates if the data are MCAR. • Sacrifices predictive power because fewer data points are used. • This is the default in SAS Proc REG.
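Listwise deletion is simple to express in code. This is a minimal sketch, assuming each record is a tuple and `None` marks a missing value; the function name `listwise_delete` and the sample data are made up for illustration.

```python
def listwise_delete(rows):
    """Keep only the rows in which no value is missing (None)."""
    return [row for row in rows if all(v is not None for v in row)]

rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 1.0),
    (3.0, 5.9, None),
    (4.0, 8.1, 3.0),
    (None, 10.2, 2.0),
]
complete = listwise_delete(rows)
print(len(rows), "rows before,", len(complete), "rows after")
```

In this toy data each incomplete row is missing only one value, yet three of the five rows are discarded, which illustrates how quickly the usable sample shrinks.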
Some ways to deal with the missing data problem (deletion) • Pairwise deletion. • Each entry of the correlation matrix is calculated from all records in which both variables are observed. • Different entries can therefore rest on different sample sizes, and the resulting matrix may not be positive definite. • This is the default in SAS Proc CORR.
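The sample size issue with pairwise deletion can be seen in a short sketch. It assumes records are tuples with `None` for missing values; the function name `pairwise_corr` and the data are hypothetical, and the code is not meant to mirror how Proc CORR is implemented.

```python
import math

def pairwise_corr(rows, i, j):
    """Pearson correlation of columns i and j over rows where both are observed.

    Returns (correlation, number of rows used).
    """
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n

data = [
    (1.0, 2.0, None),
    (2.0, 4.1, 1.0),
    (3.0, 5.9, None),
    (4.0, None, 3.0),
    (5.0, 10.2, 2.0),
    (None, 12.0, 5.0),
]
results = {(i, j): pairwise_corr(data, i, j) for i, j in [(0, 1), (0, 2), (1, 2)]}
for pair, (r, n) in results.items():
    print(pair, "n =", n, "r =", round(r, 3))
```

Each correlation here is based on a different subset of records, so the entries of the matrix do not describe a single common sample.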
Some ways to deal with the missing data problem (substitution) • Mean substitution • Replace missing data with the global mean. • Simple approach. • Underestimates the error. • Hot deck method • Simple approach: replace each missing value with the value from a similar record. • Has randomness built in. • Still underestimates the error.
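Both substitution methods can be sketched in a few lines. The example below is illustrative only: the function names, the `region` and `loss` fields, and the rule that a "similar record" is one with the same `region` are assumptions for this sketch, since the slides do not define similarity.

```python
import random
import statistics

def mean_substitute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def hot_deck(records, target, match_on, rng):
    """Fill a missing records[i][target] with the value of a randomly chosen
    donor record that has the same match_on value (left missing if no donor)."""
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[match_on], []).append(rec[target])
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input is not modified
        pool = donors.get(rec[match_on])
        if rec[target] is None and pool:
            rec[target] = rng.choice(pool)
        filled.append(rec)
    return filled

values = [10.0, 12.0, None, 14.0, 16.0, None]
observed = [v for v in values if v is not None]
filled_mean = mean_substitute(values)
# The filled series has a smaller sample variance than the observed values alone.
print(statistics.variance(observed), statistics.variance(filled_mean))

records = [
    {"region": "A", "loss": 100.0},
    {"region": "A", "loss": 120.0},
    {"region": "A", "loss": None},
    {"region": "B", "loss": 300.0},
    {"region": "B", "loss": None},
]
filled_hot = hot_deck(records, "loss", "region", random.Random(0))
print(filled_hot)
```

Mean substitution adds points exactly at the mean, which shrinks the sample variance. That is one way to see why the error is underestimated.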
Some ways to deal with the missing data problem (imputation) • Regression • Replace missing data with values predicted from the other variables. • An improvement over the global mean. • Still underestimates the error.
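Regression imputation with a single predictor can be sketched as follows. This assumes a simple linear model fitted by least squares on the complete pairs; the function names `fit_line` and `regression_impute` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regression_impute(pairs):
    """pairs: list of (x, y) where x is always observed and y may be None.
    Missing y values are replaced by the fitted value at the same x."""
    complete = [(x, y) for x, y in pairs if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [(x, y if y is not None else intercept + slope * x) for x, y in pairs]

pairs = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, None), (5.0, 11.0)]
imputed = regression_impute(pairs)
print(imputed)
```

Every imputed value lies exactly on the fitted line, with no residual scatter, which is why this approach still understates the error.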
Multiple imputation • A Monte Carlo technique in which each missing value is replaced by several (typically 3-10) simulated values, giving multiple completed data sets. Each completed data set is analyzed, and the results are combined so that they incorporate the uncertainty due to missing data. • More complicated, but with far less bias. • SAS users can use Proc MI and Proc MIAnalyze.
Three steps of multiple imputation • Impute data. • The data are assumed to be multivariate normal. Parameters are first estimated from the complete cases. The missing values are then imputed by random draws from the resulting distribution. The parameters are re-estimated from the completed data, and another round of imputation follows. This is repeated until the parameters converge. Multiple data sets are then drawn at random from the distribution.
Three steps of multiple imputation • Analyze data. • Each imputed data set is analyzed using any preferred method. • Proc ####; BY _Imputation_; …; Run; • Save the parameter estimates in a data set.
Three steps of multiple imputation • Combine results. • Estimate = mean of the m estimates, where m is the number of imputed data sets. • Total variance = (average within-imputation variance) + (1 + 1/m) × (between-imputation variance). • Proc MIAnalyze parms=####; Run;
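The combining rules above can be written out directly. This is a minimal sketch for a single parameter, assuming the between-imputation variance is the sample variance of the m estimates (divisor m - 1); the function name `pool_estimates` and the numbers are invented for illustration and are not output from Proc MIAnalyze.

```python
import statistics

def pool_estimates(estimates, variances):
    """Combine m per-imputation estimates and their variances.

    Returns (pooled estimate, total variance) where
    total = average within variance + (1 + 1/m) * between variance.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed data sets.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled, total = pool_estimates(estimates, variances)
print(pooled, total)
```

The standard error of the pooled estimate is the square root of the total variance. The between-imputation term is what carries the extra uncertainty due to the missing data.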
References • SAS online manual: http://support.sas.com/rnd/app/papers/miv802.pdf • Carpenter, J. and Kenward, M.: http://www.lshtm.ac.uk/msu/missingdata/start.html