Multiple Imputation using SAS
Don Miller
812 Oswald Tower
miller@pop.psu.edu
814-863-3155
Introduction
• Missing values occur often in research: refused/don’t know answers, attrition, skip patterns…
• Dropping cases with missing values may bias results (e.g., women and/or overweight respondents tend to disclose their weight less often than others)
• Imputation attempts to “fill in” the missing values
• Single imputation (e.g., with the mean) can bias results (it shrinks variances and correlations) and gives no measure of the uncertainty added by imputing
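To make the single-imputation problem concrete, here is a small Python sketch (Python is used only for illustration; the weights are made up). Filling in the mean leaves the mean unchanged, but the filled-in values add no spread, so the variance shrinks and the data look like a complete sample of 8 rather than 5 observed values.

```python
import statistics

def mean_impute(values):
    """Replace each None with the mean of the observed values (single imputation)."""
    observed = [v for v in values if v is not None]
    m = statistics.mean(observed)
    return [m if v is None else v for v in values]

# Hypothetical self-reported weights (lbs); None marks a refusal.
weights = [150, 160, None, 200, None, 180, 170, None]
observed = [w for w in weights if w is not None]
filled = mean_impute(weights)

# Same mean, smaller variance, and an apparent sample size of 8 instead of 5.
print(statistics.mean(observed), statistics.mean(filled))
print(statistics.variance(observed), statistics.variance(filled))
```

Any standard error computed from the filled-in data treats the imputed values as if they had been observed, which is the missing "measure of uncertainty."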
Multiple Imputation: Simple Procedure
• For categorical variables: construct binary dummy variables, leaving out a reference category (e.g., Race: 1=“white”, 2=“black”, 3=“other” becomes the dummies Black and Other, with white as the reference)
• Impute using PROC MI
• Round off imputed dummies only if you want plausible 0/1 values (rounding will bias your results)
• Run the analysis (PROC REG, PROC LOGISTIC, etc.) with a by _imputation_; statement in the procedure
• Combine the results using PROC MIANALYZE
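The dummy-coding step can be sketched in Python as follows; race_dummies is a hypothetical helper, not part of SAS. A missing race leaves both dummies missing, so PROC MI can impute them.

```python
def race_dummies(race):
    """Code Race (1="white", 2="black", 3="other") as (Black, Other) dummies.

    "white" is the reference category, so it gets no dummy of its own.
    A missing race (None) leaves both dummies missing, to be imputed.
    """
    if race is None:
        return (None, None)
    return (int(race == 2), int(race == 3))

for r in (1, 2, 3, None):
    print(r, race_dummies(r))
```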
PROC MI
• Typical syntax:
proc mi data=rawdat seed=8633155 out=impdat;
   var sex black other age drivesfast;
run;
• data= the input data set: one copy of the data, with missing values
• out= the output data set: 5 stacked copies of the data with imputed values (the imputed values differ across copies; the variable _Imputation_ identifies each copy)
• seed= random-number seed; reuse the same seed to reproduce your results
• var the variables with missing values you need imputed, the variables in your analysis model, and any others that may help predict the missing values
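As a rough picture of what out= contains, the toy Python sketch below stacks nimpute copies of one variable, tags each row with its copy number (like _Imputation_), and uses a fixed seed so reruns match. The fill-in rule is an assumed hot-deck draw from the observed values, swapped in for simplicity; it is not PROC MI's MCMC method, and impute_stacked is a hypothetical name.

```python
import random

def impute_stacked(values, nimpute=5, seed=8633155):
    """Return nimpute stacked copies of values as (imputation, value) rows.

    Toy sketch only: each None is filled with a randomly drawn observed value
    (a simple hot-deck draw), not the MCMC method PROC MI uses by default.
    """
    rng = random.Random(seed)  # same seed -> same imputations every run
    observed = [v for v in values if v is not None]
    rows = []
    for imp in range(1, nimpute + 1):  # copies numbered 1..nimpute, like _Imputation_
        for v in values:
            rows.append((imp, rng.choice(observed) if v is None else v))
    return rows

data = [23, None, 41, 35, None]  # hypothetical ages, two missing
for row in impute_stacked(data):
    print(row)
```

Observed values are identical in every copy; only the filled-in values vary, and rerunning with the same seed reproduces them exactly.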
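PROC MI Options
• nimpute=5 number of imputations (default 5); nimpute=0 imputes nothing but still displays the missing-data patterns
• minimum=0 0 0 0 maximum=1 1 1 90 set minimum and maximum imputed values, listed in var-statement order; bounded imputation sometimes doesn’t converge as well
• round=1 1 1 0.01 round imputed values to the nearest multiple of the given unit
• alpha=0.05 level for the confidence limits
• mu0=0.5 0.5 0.5 25 null-hypothesis means for t tests of μ=μ0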
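The minimum=/maximum= and round= options can be pictured with the Python sketch below. It assumes, as we understand the PROC MI documentation, that out-of-range values are redrawn rather than clipped, and that rounding then goes to the nearest multiple of the round= unit. The normal draw, bounded_draw, and the max_tries cap are hypothetical stand-ins; the cap shows how a tight range can keep the procedure from finishing.

```python
import random

def bounded_draw(rng, mean, sd, lo, hi, unit, max_tries=100):
    """Draw from Normal(mean, sd), redrawing until lo <= x <= hi, then round.

    Rounding goes to the nearest multiple of unit (1 for dummies, 0.01 for age).
    max_tries is a hypothetical cap: if no draw lands in range, return None,
    mimicking an imputation that fails to converge.
    """
    for _ in range(max_tries):
        x = rng.gauss(mean, sd)
        if lo <= x <= hi:
            return round(x / unit) * unit
    return None

rng = random.Random(8633155)
print(bounded_draw(rng, 0.4, 0.5, 0, 1, 1))     # dummy: 0 or 1
print(bounded_draw(rng, 45, 15, 0, 90, 0.01))   # bounded value rounded to 0.01
print(bounded_draw(rng, 500, 1, 0, 1, 1))       # range unreachable: None
```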
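PROC MI Statements
• em maxiter=200 out=emdata; EM algorithm: maximum-likelihood estimates of the means and covariances from incomplete data (out= saves the data with missing values replaced by their EM expected values)
• freq fweight; weights each observation by the frequency variable fweight
• mcmc (options); controls the MCMC imputation method, PROC MI’s default
• class sex race; specifies categorical variables, so you don’t need dummies (new / experimental)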
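For intuition about the em statement, the Python sketch below runs EM in the simplest case: two normal variables, x complete and y missing at random. It is a hypothetical illustration, not PROC MI's implementation. The E-step fills in the expected y and y² for missing cases from the current regression of y on x; the M-step re-estimates the means, variances, and covariance.

```python
def em_bivariate(x, y, maxiter=200, tol=1e-10):
    """EM (maximum-likelihood) means, variances, and covariance for bivariate normal data.

    x must be fully observed; y may contain None, assumed missing at random.
    Returns (mean_x, mean_y, var_x, var_y, cov_xy), with variances divided by n.
    """
    n = len(x)
    y_obs = [yi for yi in y if yi is not None]
    mx = sum(x) / n
    vx = sum((xi - mx) ** 2 for xi in x) / n
    # Starting values: complete-case mean and variance of y, zero covariance.
    my = sum(y_obs) / len(y_obs)
    vy = sum((yi - my) ** 2 for yi in y_obs) / len(y_obs)
    cxy = 0.0
    for _ in range(maxiter):
        # E-step: regression of y on x under the current parameters.
        b = cxy / vx
        a = my - b * mx
        resid_var = vy - b * cxy  # Var(y | x)
        sy = syy = sxy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = a + b * xi            # E[y | x]
                eyy = ey ** 2 + resid_var  # E[y^2 | x]
            else:
                ey, eyy = yi, yi ** 2
            sy += ey
            syy += eyy
            sxy += xi * ey
        # M-step: re-estimate moments from the expected sufficient statistics.
        new_my = sy / n
        new_vy = syy / n - new_my ** 2
        new_cxy = sxy / n - mx * new_my
        change = abs(new_my - my) + abs(new_vy - vy) + abs(new_cxy - cxy)
        my, vy, cxy = new_my, new_vy, new_cxy
        if change < tol:
            break
    return mx, my, vx, vy, cxy

print(em_bivariate([1, 2, 3, 4, 5, 6], [2.1, 3.9, None, 7.9, None, 12.1]))
```

With x complete, the converged slope cxy/vx matches the complete-case regression slope, and the mean of y is that regression's prediction at the mean of all x values, a known property of this missing-data pattern.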
Regression
• Fit your model as if the data had no missing values, adding by _imputation_;
proc reg data=impdat outest=parmcov covout;
   model drivesfast=sex black other age;
   by _imputation_;
run;
• You’ll get nimpute (usually 5) sets of output
• MIANALYZE combines the estimates, covariances, and standard errors; it does not combine R², which is simply averaged across imputations
• Each procedure must save its parameter estimates and their covariance matrix in data sets (how to do this varies by procedure)
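The by _imputation_; step amounts to fitting the same model once per copy. This Python sketch (hypothetical data and a one-predictor model, for illustration only) does that and averages R² across imputations, since MIANALYZE does not pool it.

```python
from collections import defaultdict

def fit_by_imputation(rows):
    """Fit y = a + b*x separately within each imputation (like BY _Imputation_).

    rows are hypothetical (imputation, x, y) tuples from a stacked data set.
    Returns {imputation: (intercept, slope, r_squared)}.
    """
    groups = defaultdict(list)
    for imp, x, y in rows:
        groups[imp].append((x, y))
    fits = {}
    for imp, pts in sorted(groups.items()):
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        syy = sum((y - my) ** 2 for _, y in pts)
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((y - intercept - slope * x) ** 2 for x, y in pts)
        fits[imp] = (intercept, slope, 1 - sse / syy)
    return fits

rows = [(1, 1, 2), (1, 2, 4), (1, 3, 6),
        (2, 1, 1), (2, 2, 2), (2, 3, 4)]
fits = fit_by_imputation(rows)
# MIANALYZE does not pool R-squared; report its mean across imputations.
mean_r2 = sum(f[2] for f in fits.values()) / len(fits)
print(fits, mean_r2)
```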
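Parameter Est. & Covariance Matrix
• proc logistic data=impdat descending;
   model drivesfast=sex black other age /covb;
   by _imputation_;
   ods output ParameterEstimates=parmsdat CovB=covbdat;
run;
• proc mixed data=impdat;
   model drivesfast=sex black other age /solution covb;
   by _imputation_;
   ods output SolutionF=parmsdat CovB=covbdat;
run;
• For PROC MIXED, save the fixed-effect estimates (SolutionF) rather than covparms, which holds the covariance parameters; PROC MIANALYZE may also need covb(effectvar=rowcol)=covbdat to read the MIXED covariance layout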
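Parameter Est. & Covariance Matrix
• proc genmod data=impdat;
   model drivesfast=sex black other age /covb;
   by _imputation_;
   ods output ParameterEstimates=parmsdat CovB=covbdat ParmInfo=pinfodat;
run;
• proc glm data=impdat;
   model drivesfast=sex black other age /inverse;
   by _imputation_;
   ods output ParameterEstimates=parmsdat InvXPX=xpxidat;
run;
• GENMOD labels the covariance rows and columns Prm1, Prm2, …; pass parminfo=pinfodat to PROC MIANALYZE so it can match them to the model effects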
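The covariance matrix of OLS estimates is MSE × inv(X'X). GLM's inverse option prints the augmented inv(X'X), bordered by the estimates and the error sum of squares, which is why xpxi= can stand in for covb=. The Python sketch below computes both pieces for a one-predictor model; ols_with_covb is a hypothetical helper and the data are made up.

```python
def ols_with_covb(x, y):
    """Simple regression y = b0 + b1*x via the inverse of X'X.

    Returns (estimates, covb), where covb = MSE * inv(X'X) is the
    covariance matrix of the estimates.
    """
    n = len(x)
    # X'X for columns (intercept, x), and X'y.
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # two parameters estimated
    covb = [[mse * v for v in row] for row in inv]
    return (b0, b1), covb

est, covb = ols_with_covb([1, 2, 3], [1, 2, 4])
print(est)   # intercept and slope
print(covb)  # 2x2 covariance matrix of the estimates
```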
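PROC MIANALYZE
• Syntax depends on which procedure you used in the previous step:
proc mianalyze data=parmcov;                  /* after PROC REG outest= covout */
or
proc mianalyze parms=parmsdat covb=covbdat;   /* after ods output ParameterEstimates= CovB= */
or
proc mianalyze parms=parmsdat xpxi=xpxidat;   /* after PROC GLM ods output InvXPX= */
   modeleffects intercept sex black other age;
run;
• Note that the var statement is now modeleffects
• Note that the dependent variable is omitted: list only the model effects, including the intercept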
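PROC MIANALYZE combines each effect with Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The Python sketch below applies them to one coefficient using Rubin's large-sample degrees of freedom (PROC MIANALYZE can also apply a small-sample adjustment); rubin_combine and the input numbers are hypothetical.

```python
import statistics

def rubin_combine(estimates, variances):
    """Combine one parameter's estimates across m imputations (Rubin's rules).

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns (combined estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    qbar = statistics.mean(estimates)   # combined point estimate
    w = statistics.mean(variances)      # within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance
    if b == 0:
        return qbar, w, float("inf")    # no imputation uncertainty
    t = w + (1 + 1 / m) * b             # total variance
    r = (1 + 1 / m) * b / w             # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2     # Rubin's large-sample df
    return qbar, t, df

# Hypothetical slope estimates and squared SEs from 5 imputations.
qbar, t, df = rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(qbar, t ** 0.5, df)  # estimate, combined SE, df
```

The between-imputation term is what single imputation leaves out: here it raises the variance from 0.04 to about 0.07.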