Practical Missing Data Analysis in SPSS (v17 onwards) Peter T. Donnan Professor of Epidemiology and Biostatistics
Objectives • How to impute missing values in SPSS, specifically MI • How to implement analyses with multiple imputed values • Interpretation of the output • Practical tips
Example data • From a trial of pedometers + advice vs. advice vs. controls in sedentary elderly women • Follow-up at 3 and 6 months • Main outcome measure: activity from accelerometer counts • 210 randomised / 170 at 3 months
Example data – Pedometer trial • Read in the data file ‘SPSS Study databse.sav’ • Main outcome: 3-month activity, AccelVM2 • Baseline activity: AccelVM1a • Trial arm is represented by two dummy variables: Grp1 = pedometer vs. control, Grp2 = advice vs. control
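The dummy coding above can be sketched in a few lines of Python. This is only an illustration: the arm labels ("pedometer", "advice", "control") and the function name are assumptions and may not match how the study file codes the arms; only the names Grp1 and Grp2 come from the slides.

```python
def dummy_code(arm):
    """Return (Grp1, Grp2) for a trial arm, with control as the reference."""
    # Arm labels are hypothetical; the study file may code arms differently.
    if arm not in ("pedometer", "advice", "control"):
        raise ValueError(f"unknown trial arm: {arm!r}")
    grp1 = 1 if arm == "pedometer" else 0  # Grp1 = pedometer vs. control
    grp2 = 1 if arm == "advice" else 0     # Grp2 = advice vs. control
    return grp1, grp2
```

Controls get (0, 0), so each regression coefficient for Grp1 or Grp2 is a contrast against the control arm.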
Main analysis – Pedometer trial • Regression of 3-month activity on baseline activity and the two dummy variables representing the trial arm contrasts
Main analysis – Pedometer trial • Note that n = 170 in the complete-case analysis, with 40 missing, so there is potential for bias
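To make the complete-case analysis concrete, here is a minimal sketch in plain Python, assuming the model is an ordinary least-squares regression of the outcome on baseline activity, Grp1 and Grp2. The function names are hypothetical and this is not SPSS's implementation. Participants with a missing outcome (None) are dropped, which is what "complete case" means here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_complete_case(baseline, grp1, grp2, outcome):
    """OLS of outcome on intercept, baseline, Grp1, Grp2 using complete cases.

    Returns (coefficients, number_of_complete_cases).
    """
    rows, ys = [], []
    for b, g1, g2, y in zip(baseline, grp1, grp2, outcome):
        if y is None:          # listwise deletion of missing outcomes
            continue
        rows.append([1.0, float(b), float(g1), float(g2)])
        ys.append(float(y))
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty), len(rows)
```

The second return value is the analysis n, which makes the loss of participants visible alongside the coefficients.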
Missing at Random (MAR) • Given the observed data, Prob(Missing) 1) does not depend on the unobserved data, but 2) may depend on the observed data • Essentially, the observed data are a random sample of the full data within each stratum of the observed variables • MAR is a weaker assumption than MCAR • If MAR is assumed, many methods are available to impute the missing data from the observed data
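The "random sample within each stratum" idea can be illustrated with a small simulation. Everything in this sketch is made up for illustration (the two strata, the outcome model, the missingness probabilities and the function names); none of it comes from the pedometer trial. Missingness depends only on an observed stratum, so the data are MAR but not MCAR.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_mar(n=20000, seed=1):
    """Simulate an outcome whose missingness depends only on an observed stratum."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        stratum = 1 if rng.random() < 0.5 else 0   # always observed
        y = 10 + 5 * stratum + rng.gauss(0, 1)     # outcome, sometimes missing
        p_missing = 0.6 if stratum else 0.1        # depends on observed data only
        full.append((stratum, y))
        if rng.random() >= p_missing:
            observed.append((stratum, y))
    return full, observed


def stratified_mean(full, observed):
    """Weight the observed stratum means by the stratum shares in the full sample."""
    total = 0.0
    for s in (0, 1):
        share = sum(1 for st, _ in full if st == s) / len(full)
        total += share * mean([y for st, y in observed if st == s])
    return total
```

In this setup the plain complete-case mean is pulled towards the stratum with less missing data, while the stratum-weighted mean stays close to the full-data mean, because within each stratum the observed values are a random sample.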
Execution of MI in SPSS • Assuming MAR, we can use the available data to predict the missing values in SPSS: Analyze > Multiple Imputation > Impute Missing Data Values
Execution of MI in SPSS • Enter ALL variables you think are associated with missingness • Note the default number of imputations = 5 • Create a new dataset to store the results • Note the icon indicating procedures that allow MI analysis
Execution of MI in SPSS • Automatic method lets SPSS choose • Custom gives more flexibility • Can include all 2-way interactions • Linear regression model used for prediction
Execution of MI in SPSS • List of variables chosen • Define each variable as imputed, as a predictor, or BOTH • N.B. Recommend including the OUTCOME as both a predictor and a variable to be imputed
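The idea of imputing from a linear regression prediction can be sketched as follows. This is a deliberately simplified stand-in and not SPSS's algorithm: it uses a single predictor, fits the line once on the observed cases, and adds random residual noise to each prediction, whereas a proper MI procedure also reflects uncertainty in the regression coefficients. The function names and the seed are assumptions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, plus residual SD."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, resid_sd


def impute_outcome(baseline, outcome, m=5, seed=0):
    """Return m completed copies of outcome; None marks a missing value."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(baseline, outcome) if y is not None]
    intercept, slope, resid_sd = fit_line(
        [x for x, _ in observed], [y for _, y in observed]
    )
    datasets = []
    for _ in range(m):
        completed = [
            y if y is not None
            else intercept + slope * x + rng.gauss(0, resid_sd)  # prediction + noise
            for x, y in zip(baseline, outcome)
        ]
        datasets.append(completed)
    return datasets
```

Observed values are copied unchanged into every dataset; only the missing entries differ between the m copies, and that between-copy variation is what later carries the imputation uncertainty into the pooled standard errors.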
Output of MI in SPSS • Note that the main interest is in the outcome VM2, but other variables with missing values are also imputed
Step 2 - Using Imputed datasets in analysis • The new dataset has the IMPUTATION number as its first column • It contains, in order, the original dataset (n = 210) with IMPUTATION = 0 and, concatenated below it, 5 new datasets (each n = 210) with imputed values, IMPUTATION = 1 to 5 • Most analyses can now be run on the imputed data if the fossil shell spiral symbol is present
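The stacked layout described above can be mimicked in Python to show how the pieces fit together. This is a sketch under assumptions: the column name "imputation" and both function names are hypothetical, and rows are represented as plain dictionaries rather than an SPSS dataset.

```python
def stack_imputations(original, imputed_sets):
    """Stack the original rows (imputation 0) above each imputed dataset (1..m)."""
    stacked = []
    for number, dataset in enumerate([original] + list(imputed_sets)):
        for row in dataset:
            # The imputation number goes first, then the row's own variables.
            stacked.append({"imputation": number, **row})
    return stacked


def split_by_imputation(stacked):
    """Group stacked rows back into one list of rows per imputation number."""
    groups = {}
    for row in stacked:
        groups.setdefault(row["imputation"], []).append(row)
    return groups
```

With 210 original rows and 5 imputed datasets, the stacked file has 6 x 210 = 1260 rows. An analysis is then run once per imputation number from 1 to 5, and the results are pooled.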
Repeat Main analysis – Need Pooled Results • Procedure is exactly the same as before • SPSS will do the pooled analysis if the icon (above) is present in the drop-down menu
Pooled Analysis in SPSS • Results are presented for the original data and for each imputed dataset separately
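The slides do not show how the per-dataset results are combined. Pooling of this kind is conventionally done with Rubin's rules, so the following is a sketch of those standard rules and not a transcript of SPSS's internals. The function name is hypothetical, and the sketch omits the degrees-of-freedom adjustment used for pooled tests and confidence intervals.

```python
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates  : the coefficient estimated in each imputed dataset
    std_errors : its standard error in each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least 2 imputations")
    pooled = statistics.mean(estimates)
    within = statistics.mean([se ** 2 for se in std_errors])  # average sampling variance
    between = statistics.variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5
```

The between-imputation term is what distinguishes MI from filling in a single value: the more the estimates disagree across imputed datasets, the larger the pooled standard error.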
Results of pooled analysis from 5 imputed datasets • Larger effect sizes for both group contrasts than in the complete-case analysis • Greater power, so smaller p-values
Interpretation • Compare the pooled results with the original results as a form of sensitivity analysis • If the results are similar, this suggests the original results are fairly robust • Consider whether MAR is a reasonable assumption • A crucial assumption: consider whether you have included all factors related to the missingness (including the outcome) in the imputation model
Summary • SPSS now includes multiple imputation in its armoury • Consider the assumptions of MI • Compare results under different assumptions to assess the robustness of the results • If the MAR assumption is reasonable, MI provides results that are less biased than complete-case analysis