Conditions of applications
Key concepts • Testing conditions of applications in complex study design • Residuals • Tests of normality • Residuals plots • Residuals vs. fitted • QQ plots • Cook’s distance
Conditions of applications • RM ANOVA and multilevel modeling share the following conditions of application: • Normality of the DV within each cell of the IV • Few outliers • Homoscedasticity (equality of variances) • (Linearity: trivial in ANOVA, since we only estimate mean differences)
Problems with checking normality by cell • The number of cells grows with the number of IVs • What about continuous IVs? • How to deal with the number of tests?
Problems with checking homoscedasticity by pair of cells • The number of cells grows with the number of IVs • What about continuous IVs? • How to deal with the number of tests?
Residuals: definition • $Y_i = b_0 + b_1 X_i + e_i$ • Thus, $e_i = Y_i - (b_0 + b_1 X_i) = Y_i - \hat{Y}_i$ • where the $e_i$ are the residuals, which correspond to the distance between the observed value and the best predicted value
Residuals: what to look for • Residuals should have a normal distribution across (or irrespective of) groups, since differences due to the IV have been subtracted. • Residuals should have equal variances, just like the observed DV by cell • There should be no remaining structure in the residuals (this allows us to check for linearity)
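As a minimal sketch of this definition, the residuals of a one-predictor model can be computed by hand. The data and the helper names `fit_line` and `residuals` below are made up for illustration and are not part of the original slides.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(x, y):
    """Observed value minus best predicted value, for each observation."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    return [yi - fi for yi, fi in zip(y, fitted)]


# Made-up illustration data (assumption, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
e = residuals(x, y)
print(e)
```

With an intercept in the model, least-squares residuals sum to zero, which is a quick sanity check on any residuals saved from a fitted model.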
Normality tests • Many normality tests exist. Ordered by their type I and type II error rates: • Shapiro-Wilk: $W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ • where the weights $a_i$ are derived from the order statistics of a normal distribution and the $x_{(i)}$ are the values of x ordered from smallest to largest • Anderson-Darling: same idea of ordering the data • Kolmogorov-Smirnov • …
But… • All of these tests are known to be unreliable. • When the data are in fact from a normal distribution, they reject the null too often or too rarely • When the data are in fact not from a normal distribution, they do not reject the null often enough (low power)
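To illustrate the "ordering the data" idea shared by these tests, here is a rough sketch of the Kolmogorov-Smirnov distance between the ordered data and a normal distribution. It assumes the mean and standard deviation are estimated from the same data (the Lilliefors variant), so standard Kolmogorov-Smirnov critical values would not apply. It computes only the statistic, not a p-value. The function name `ks_statistic` and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    # Assumption: normal parameters are estimated from the data themselves.
    normal = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, v in enumerate(data):
        cdf = normal.cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Made-up illustration data: one roughly symmetric sample, one with an extreme value
symmetric = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
skewed = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 9.0]
d_sym = ks_statistic(symmetric)
d_skew = ks_statistic(skewed)
print(d_sym, d_skew)
```

The sample with one extreme value yields a much larger distance than the symmetric one, which is the kind of departure these tests try to detect.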
Residual plots: residuals vs. fitted or vs. each IV • Scatterplot of the residuals against the predicted values ($\hat{Y}_i$) or against each IV. • There are different versions of this type of plot (e.g., the residuals may or may not be divided by their estimated standard deviation) • They allow us to examine • Homoscedasticity, • Linearity of the relationship between IV and DV, • Normality of the residuals (the point cloud should have an ellipsoid shape), • Outliers
Residual plot: quantile-quantile (QQ) plot • Graphical method for comparing two probability distributions • Compares the quantiles of the normal distribution with mean 0 and variance $s^2$ to the ordered values of the residuals • All the points should align on the diagonal from bottom left to top right
Outliers • Outliers are extreme values on the IV, on the DV, or on both. • Leverage observations are extreme on the X-axis (IV), but may not have much influence on the estimation of the parameters. • Influential observations are extreme on both the X and Y axes, and greatly influence the estimation of the parameters
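The points of a standardized residuals vs. fitted plot can be sketched as follows for a one-predictor model. The assumptions here are that the residuals are divided by the residual standard deviation sqrt(SSE / (n - 2)), and that the data and the function name are made up for illustration. Drawing the scatterplot is left to whatever plotting tool is at hand.

```python
from math import sqrt


def fitted_and_standardized_residuals(x, y):
    """Return (fitted value, standardized residual) pairs for y = b0 + b1*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard deviation: sqrt(SSE / (n - 2)) for a two-parameter line.
    s = sqrt(sum(r ** 2 for r in resid) / (n - 2))
    return [(fi, r / s) for fi, r in zip(fitted, resid)]


# Made-up illustration data (assumption, not from the slides)
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
points = fitted_and_standardized_residuals(x, y)
for fit, z in points:
    print(f"fitted={fit:6.2f}  standardized residual={z:6.2f}")
```

In the plot, a funnel shape would suggest heteroscedasticity and a curved band would suggest non-linearity.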
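The coordinates of such a QQ plot can be sketched with the Python standard library. The plotting positions (i + 0.5) / n are one common convention among several and are an assumption here, as are the function name `qq_points` and the residual values.

```python
from statistics import NormalDist, stdev


def qq_points(resid):
    """Pair each ordered residual with the matching quantile of N(0, s^2)."""
    n = len(resid)
    s = stdev(resid)  # estimated standard deviation of the residuals
    normal = NormalDist(0.0, s)
    ordered = sorted(resid)
    # Assumption: plotting positions (i + 0.5) / n for i = 0..n-1.
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up residuals that sum to zero (assumption, not from the slides)
resid = [-1.2, 0.4, 0.1, -0.3, 0.9, 0.1]
pairs = qq_points(resid)
for q, r in pairs:
    print(f"normal quantile={q:6.2f}  residual={r:6.2f}")
```

If the residuals are close to normal, the pairs fall near the line where the normal quantile equals the residual.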
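One standard way to quantify how extreme an observation is on the X-axis is its leverage (hat value). The slides do not name this quantity, so it is brought in here as an assumption. For a model with an intercept and one predictor it has the closed form $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. The data below are made up.

```python
def leverages(x):
    """Hat values for a model with an intercept and one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Made-up predictor values with one extreme observation on the X-axis
x = [1, 2, 3, 4, 5, 10]
h = leverages(x)
for xi, hi in zip(x, h):
    print(f"x={xi:3d}  leverage={hi:.3f}")
```

The leverages sum to the number of parameters (2 here). Leverage depends only on X, which is why a high-leverage observation is not necessarily influential.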
Cook’s distances • $D_i = \frac{\sum_{j=1}^{n} (\hat{Y}_j - \hat{Y}_{j(i)})^2}{p \cdot MSE}$ • where $\hat{Y}_j$ are the predicted values of Y, and $\hat{Y}_{j(i)}$ are the predicted values of Y if observation i were removed and the model estimated again. p is the number of parameters of the model and MSE is the mean square error. • Cutoff: 1, or 4/n, or $F_{p,\,n-p}$
An example of a residual analysis • Back to the autism data again. • Step 1: obtain the residuals → use the option Save in the mixed linear model • Step 2: check normality (Analysis → Explore) • Step 3: look at the residual plots • Residuals vs. fitted • Residuals vs. time • (Standardized residuals vs. fitted) • QQ plots • (Cook’s distance)
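The definition can be followed literally: refit the model without observation i, predict all n observations again, and compare with the original predictions. The sketch below does this for a one-predictor least-squares line, which is a simplifying assumption (the slides concern mixed models). The data and the helper names are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1


def cooks_distances(x, y, p=2):
    """Cook's distance for each observation, by leave-one-out refitting."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Refit without observation i, then predict all n observations.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        fitted_without_i = [c0 + c1 * xj for xj in x]
        shift = sum((f - g) ** 2 for f, g in zip(fitted, fitted_without_i))
        distances.append(shift / (p * mse))
    return distances


# Made-up data: the last observation is extreme on X and off the trend
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 4.0]
d = cooks_distances(x, y)
for i, di in enumerate(d):
    print(f"observation {i}: Cook's distance = {di:.3f}")
```

Only the last observation exceeds the cutoff of 1 in this made-up example, since it is extreme on both axes.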